Parsing a string of binary into text/characters


I am probably overlooking something silly, but I've never had to deal with binary in code and thought it'd be a good idea to practice it in an encryption program, for kicks.
Long story short, I'm able to convert a string into binary (in the form of a string), but can't figure out how to do the reverse.
Right now, I have something like this:
public static String bytesToString(String bytes){
int i = bytes.length()/8;
int pos = 0;
String result = "";
for(int j=0; j<i; j++){
String temp = bytes.substring(pos,pos+8);
byte b = (byte) Integer.parseInt(temp);
result = result + Byte.toString(b);
pos++;
}
System.out.println("Result: " + result);
return result;
}
I think the bytes are being parsed as literal numbers. What am I missing?
Edit: To clarify, I will previously have parsed a string of text into bits and written them to a string. I want to split this string into bytes and parse them back into letters. It would take "011010000110010101111001" and return "hey".

How about using Integer.parseInt(text, 2)? As in,
public static int binaryToInt(String binary)
{
return Integer.parseInt(binary, 2);
}
I'm not sure why your bytesToString method both takes and returns a string.

Integer.parseInt(temp) will attempt to read temp as a decimal number and return the corresponding int. For example, Integer.parseInt("123") returns 123.
EDIT: Be aware that the binary value of a character or text depends on the encoding you are using. For example, "hi" is 0110100001101001 in ASCII, but it may not be in UTF-16 or UTF-32. And Java represents strings internally as UTF-16: see http://download.oracle.com/javase/6/docs/api/java/lang/String.html
(for this reason Java chars are 16-bit unsigned integers).
So your bytesToString method must treat input differently depending on the encoding of the input. Or you may write it specifically for ASCII characters, and maybe rename it to, say, asciiBytesToString.
See also:
constructor String(byte[])
http://download.oracle.com/javase/6/docs/api/java/lang/String.html
Integer.parseInt(String s, int radix) http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html
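To make the encoding point concrete, here is a small sketch that prints the bits of the same text under two charsets. The class name EncodingBits and the helper toBits are made up for this illustration; only getBytes(Charset) and Integer.toBinaryString come from the standard library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBits {
    // Renders every byte of the encoded text as exactly 8 binary digits.
    static String toBits(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            // Mask with 0xFF so negative bytes do not sign-extend to 32 bits.
            String bits = Integer.toBinaryString(b & 0xFF);
            for (int i = bits.length(); i < 8; i++) {
                sb.append('0'); // left-pad to a full byte
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same text, different bit strings depending on the charset.
        System.out.println(toBits("hi", StandardCharsets.US_ASCII)); // 0110100001101001
        System.out.println(toBits("hi", StandardCharsets.UTF_16BE)); // 00000000011010000000000001101001
    }
}
```

In UTF-16BE each of these characters takes two bytes, so a decoder that slices the string into 8-bit groups would see extra zero bytes unless it knows which encoding produced the bits.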

public class BinaryStringToChars {
public static void main(String[] args) {
String bin = "011010000110010101111001";
StringBuilder b = new StringBuilder();
int len = bin.length();
int i = 0;
while (i + 8 <= len) {
char c = convert(bin.substring(i, i+8));
i+=8;
b.append(c);
}
System.out.println(b.toString());
}
private static char convert(String bs) {
return (char)Integer.parseInt(bs, 2);
}
}

You need to advance 8 digits at a time, not digit by digit. Otherwise you are reusing bits. Also, you need to tell Integer.parseInt() what radix you want to use, since parseInt(String val) cannot detect binary (you want Integer.parseInt(String val, int radix)). You also need to choose a character encoding to convert bytes into characters (they are not the same thing!). Assuming ISO-8859-1 is ok:
public static String bytesToString(String bytes){
int i = bytes.length()/8;
int pos = 0;
String result = "";
byte[] buffer = new byte[i];
for(int j=0; j<i; j++){
String temp = bytes.substring(pos,pos+8);
buffer[j] = (byte) Integer.parseInt(temp, 2);
pos+=8;
}
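// The Charset overload avoids the checked UnsupportedEncodingException of the String-name overload
result = new String(buffer, java.nio.charset.StandardCharsets.ISO_8859_1);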
System.out.println("Result: " + result);
return result;
}

Related

convert string contains ISO 8859-1 hex characters code to UTF-8 java

I have a string which I believe contains some ISO-8859-1 hex character codes:
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
And I want to change it into this,
Áo thun bé gái cột dây xanh biển
I have tried this method but no luck
byte[] isoBytes = doc.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));
What is the proper way to convert it? Many thanks for your help!

On the assumption that the #nnnn; sequences are plain old Unicode character representation, I suggest the following approach.
class Cvt {
static String convert(String in) {
String str = in;
int curPos = 0;
while (curPos < str.length()) {
int j = str.indexOf("#x", curPos);
if (j < 0) // no more #x
curPos = str.length();
else {
int k = str.indexOf(';', j + 2); // search for the terminator after this "#x", not after curPos
if (k < 0) // unterminated #x
curPos = str.length();
else { // convert #xNNNN;
int n = Integer.parseInt(str.substring(j+2, k), 16);
char[] ch = { (char)n };
str = str.substring(0, j) + new String(ch) + str.substring(k+1);
curPos = j + 1; // after ch
}
}
}
return str;
}
static public void main(String... args) {
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
System.out.println(convert(doc));
}
}
This is very similar to the approach of the previous answer, except for the assumption that the character is a Unicode codepoint and not an 8859-1 codepoint.
And the output is
Áo thun bé gái cột dây xanh biển

There is no hex literal syntax for strings in Java. If you need to support that String format, I would make a helper function which parses that format and builds up a byte array and then parse that as ISO-8859-1.
import java.io.ByteArrayOutputStream;
public class translate {
private static byte[] parseBytesWithHexLiterals(String s) throws Exception {
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
while (!s.isEmpty()) {
if (s.startsWith("#x")) {
s = s.substring(2);
while (s.charAt(0) != ';') {
int i = Integer.parseInt(s.substring(0, 2), 16);
baos.write(i);
s = s.substring(2);
}
} else {
baos.write(s.substring(0, 1).getBytes("US-ASCII")[0]);
}
s = s.substring(1);
}
return baos.toByteArray();
}
public static void main(String[] args) throws Exception {
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
byte[] parsedAsISO88591 = parseBytesWithHexLiterals(doc);
doc = new String(parsedAsISO88591, "ISO-8859-1");
System.out.println(doc); // Print out the string, which is in Unicode internally.
byte[] asUTF8 = doc.getBytes("UTF-8"); // Get a UTF-8 version of the string.
}
}

This is a case where the code can really obscure the requirements. The requirements are a bit uncertain but seem to be to decode a specialized Unicode character entity reference similar to HTML and XML, as documented in the comments.
It is also a somewhat rare case where the advantage of the regular expression engine outweighs any studying needed to understand the pattern language.
String input = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
// Hex digits between "#x" and ";" are a Unicode codepoint value
String text = java.util.regex.Pattern.compile("(#x([0-9A-Fa-f]+);)")
.matcher(input)
// group 2 is the matched input between the 2nd ( in the pattern and its paired )
.replaceAll(x -> new String(Character.toChars(Integer.parseInt(x.group(2), 16))));
System.out.println(text);
The matcher function finds candidate strings to replace that match the pattern. The replaceAll function replaces them with the calculated Unicode codepoint. Since a Unicode codepoint might be encoded as two char (UTF-16) values the desired replacement string must be constructed from a char[].

How to achieve php ^ in java [duplicate]

How do I do a bitwise XOR operation on two strings in Java?

You want something like this:
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
import java.io.IOException;
public class StringXORer {
public String encode(String s, String key) {
return base64Encode(xorWithKey(s.getBytes(), key.getBytes()));
}
public String decode(String s, String key) {
return new String(xorWithKey(base64Decode(s), key.getBytes()));
}
private byte[] xorWithKey(byte[] a, byte[] key) {
byte[] out = new byte[a.length];
for (int i = 0; i < a.length; i++) {
out[i] = (byte) (a[i] ^ key[i%key.length]);
}
return out;
}
private byte[] base64Decode(String s) {
try {
BASE64Decoder d = new BASE64Decoder();
return d.decodeBuffer(s);
} catch (IOException e) {throw new RuntimeException(e);}
}
private String base64Encode(byte[] bytes) {
BASE64Encoder enc = new BASE64Encoder();
return enc.encode(bytes).replaceAll("\\s", "");
}
}
The base64 encoding is done because xor'ing the bytes of a string may not give valid bytes back for a string.
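The sun.misc.BASE64Encoder and BASE64Decoder classes used above are internal and are not available on current JDKs. As a rough sketch of the same idea on Java 8 or later, java.util.Base64 can take their place. The class name Base64Xor is invented for this example, and using UTF-8 for both text and key is an assumption rather than something the answer specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Xor {
    // XORs the UTF-8 bytes of s with the repeating key, then Base64-encodes the result.
    static String encode(String s, String key) {
        byte[] xored = xorWithKey(s.getBytes(StandardCharsets.UTF_8),
                                  key.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(xored);
    }

    // Reverses encode(): Base64-decode, XOR with the same key, read the bytes as UTF-8.
    static String decode(String s, String key) {
        byte[] xored = Base64.getDecoder().decode(s);
        return new String(xorWithKey(xored, key.getBytes(StandardCharsets.UTF_8)),
                          StandardCharsets.UTF_8);
    }

    // The key must not be empty, otherwise the modulo below divides by zero.
    static byte[] xorWithKey(byte[] a, byte[] key) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        String secret = encode("héllo wörld", "k3y");
        System.out.println(secret);
        System.out.println(decode(secret, "k3y")); // héllo wörld
    }
}
```

Naming the charset explicitly keeps the result independent of the platform default, which the plain getBytes() calls above rely on.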

Note: this only works for low characters, i.e. below 0x8000. It works for all ASCII characters.
I would XOR each charAt() to create a new String, like this:
String s, key;
StringBuilder sb = new StringBuilder();
for(int i = 0; i < s.length(); i++)
sb.append((char)(s.charAt(i) ^ key.charAt(i % key.length())));
String result = sb.toString();
In response to @user467257's comment:
If your input/output is utf-8 and you xor "a" and "æ", you are left with an invalid utf-8 string consisting of one character (decimal 135, a continuation character).
It is the char values which are being XORed, not the byte values, and this produces a character which can be UTF-8 encoded.
public static void main(String... args) throws UnsupportedEncodingException {
char ch1 = 'a';
char ch2 = 'æ';
char ch3 = (char) (ch1 ^ ch2);
System.out.println((int) ch3 + " UTF-8 encoded is " + Arrays.toString(String.valueOf(ch3).getBytes("UTF-8")));
}
prints
135 UTF-8 encoded is [-62, -121]

Pay attention:
A Java char corresponds to a UTF-16 code unit, and in some cases two consecutive chars (a so-called surrogate pair) are needed for one real Unicode character (codepoint).
XORing two valid UTF-16 sequences (i.e. Java Strings char by char, or byte by byte after encoding to UTF-16) does not necessarily give you another valid UTF-16 string - you may have unpaired surrogates as a result. (It would still be a perfectly usable Java String, just the codepoint-concerning methods could get confused, and the ones that convert to other encodings for output and similar.)
The same is valid if you first convert your Strings to UTF-8 and then XOR these bytes - here you quite probably will end up with a byte sequence which is not valid UTF-8, if your Strings were not already both pure ASCII strings.
Even if you try to do it right and iterate over your two Strings by codepoint and try to XOR the codepoints, you can end up with codepoints outside the valid range: for example, U+FFFFF (plane 15) XOR U+100000 (plane 16) = U+1FFFFF (which would be the last character of plane 31), way above the range of existing codepoints. And you could also end up this way with codepoints reserved for surrogates (= not valid ones).
If your strings only contain chars < 128, 256, 512, 1024, 2048, 4096, 8192, 16384, or 32768, then the (char-wise) XORed strings will be in the same range, and thus certainly not contain any surrogates. In the first two cases you could also encode your String as ASCII or Latin-1, respectively, and have the same XOR-result for the bytes. (You still can end up with control chars, which may be a problem for you.)
What I'm finally saying here: don't expect the result of encrypting Strings to be a valid string again - instead, simply store and transmit it as a byte[] (or a stream of bytes). (And yes, convert to UTF-8 before encrypting, and from UTF-8 after decrypting).
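The two failure modes for codepoint-wise XOR can be checked directly. This is only a sketch: the class name CodePointXor and the method xorStaysValid are invented here, and the sample values are picked purely to hit each case.

```java
class CodePointXor {
    // Returns true only if cp1 XOR cp2 is within the Unicode range and is not a surrogate.
    static boolean xorStaysValid(int cp1, int cp2) {
        int x = cp1 ^ cp2;
        boolean inRange = Character.isValidCodePoint(x); // 0x0 .. 0x10FFFF
        boolean surrogate = x >= Character.MIN_SURROGATE && x <= Character.MAX_SURROGATE;
        return inRange && !surrogate;
    }

    public static void main(String[] args) {
        // Plane 15 XOR plane 16 gives 0x1FFFFF, far above 0x10FFFF.
        System.out.println(xorStaysValid(0xFFFFF, 0x100000)); // false
        // Two ordinary BMP code points whose XOR is 0xD800, a surrogate.
        System.out.println(xorStaysValid(0xD000, 0x0800));    // false
        // Plain ASCII stays within ASCII.
        System.out.println(xorStaysValid('a', 'b'));          // true
    }
}
```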

This solution is compatible with Android (I've tested and used it myself). Thanks to @user467257 whose solution I adapted this from.
import android.util.Base64;
public class StringXORer {
public String encode(String s, String key) {
return new String(Base64.encode(xorWithKey(s.getBytes(), key.getBytes()), Base64.DEFAULT));
}
public String decode(String s, String key) {
return new String(xorWithKey(base64Decode(s), key.getBytes()));
}
private byte[] xorWithKey(byte[] a, byte[] key) {
byte[] out = new byte[a.length];
for (int i = 0; i < a.length; i++) {
out[i] = (byte) (a[i] ^ key[i%key.length]);
}
return out;
}
private byte[] base64Decode(String s) {
return Base64.decode(s,Base64.DEFAULT);
}
private String base64Encode(byte[] bytes) {
return new String(Base64.encode(bytes,Base64.DEFAULT));
}
}

Assuming (!) the strings are of equal length, why not convert the strings to byte arrays and then XOR the bytes. The resultant byte arrays may be of different lengths too depending on your encoding (e.g. UTF8 will expand to different byte lengths for different characters).
You should be careful to specify the character encoding to ensure consistent/reliable string/byte conversion.

This is the code I'm using:
private static byte[] xor(final byte[] input, final byte[] secret) {
final byte[] output = new byte[input.length];
if (secret.length == 0) {
throw new IllegalArgumentException("empty security key");
}
int spos = 0;
for (int pos = 0; pos < input.length; ++pos) {
output[pos] = (byte) (input[pos] ^ secret[spos]);
++spos;
if (spos >= secret.length) {
spos = 0;
}
}
return output;
}

When the Strings are not the same length, only the overlapping part is XORed, so the length of the result is the same as the minimum length of the two Strings a and b:
public String xor(String a, String b){
StringBuilder sb = new StringBuilder();
int n = Math.min(a.length(), b.length()); // stay inside both strings
for(int k=0; k < n; k++)
sb.append((char)(a.charAt(k) ^ b.charAt(k))); // cast back to char, otherwise the int value is appended as decimal digits
return sb.toString();
}

How to take a 1's complement of a binary String

I have a String of length >10^4 which contains only binary digits.
How can I take the 1's complement of it?
Example: String a = "0101"
I want String b = "1010"
Is there any better method other than replacing every character using StringBuffer/StringBuilder?

To avoid reinventing the wheel, I suggest you use BigInteger. Its not() method gives you almost what you want, only it gives you a negative number when applied to a positive one. To get back into positive, add 2^n where n is the length of the original string:
String a = "0101";
BigInteger twoToLength = new BigInteger("2").pow(a.length());
String b = twoToLength.add(new BigInteger(a, 2).not()).toString(2);
System.out.println(b);
This prints:
1010
The argument 2 to the constructor and toString() is a radix indicating binary numbers.
We are not quite there yet: if the original string has leading ones, the leading zeroes in the result are not printed. You will have to prepend these manually to get the same string length as you had originally. I think the easiest way to do this is to add 2^(n+1) instead of 2^n so we are sure there is at least one 1 bit in front of the bits we really care about. So we remove this bit only after converting back to a string:
String a = "0101";
int length = a.length();
// add a couple of more bits in front to make sure we have a positive number
BigInteger twoToLengthPlus1 = BigInteger.ONE.shiftLeft(length + 1);
String b = twoToLengthPlus1.add(new BigInteger(a, 2).not()).toString(2);
// remove extra bits from the front again
b = b.substring(b.length() - length);
With this change 1010 becomes 0101.
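Wrapped up as a reusable method, the approach above might look like the following sketch. The class and method names are made up for this illustration, and it assumes a non-empty input made of '0' and '1' only (the BigInteger constructor throws NumberFormatException otherwise).

```java
import java.math.BigInteger;

class OnesComplement {
    // Flips every bit of a non-empty binary string while keeping its length.
    static String complement(String bits) {
        int length = bits.length();
        // A guard bit above the payload keeps the sum positive and preserves leading zeroes.
        BigInteger guard = BigInteger.ONE.shiftLeft(length + 1);
        String withGuard = guard.add(new BigInteger(bits, 2).not()).toString(2);
        // Keep only the lowest 'length' digits, dropping the guard bits again.
        return withGuard.substring(withGuard.length() - length);
    }

    public static void main(String[] args) {
        System.out.println(complement("0101")); // 1010
        System.out.println(complement("1010")); // 0101
        System.out.println(complement("1111")); // 0000
    }
}
```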

Does it have to be a String? If a CharSequence is enough, you can do this:
public class BinaryComplementCharSequence implements CharSequence {
private final String source;
public BinaryComplementCharSequence(String source) {
this.source = source;
}
@Override
public int length() {
return source.length();
}
@Override
public char charAt(int index) {
switch (source.charAt(index)) {
case '0':
return '1';
case '1':
return '0';
default:
throw new IllegalStateException();
}
}
@Override
public CharSequence subSequence(int start, int end) {
return new BinaryComplementCharSequence(source.substring(start, end));
}
@Override
public String toString() {
return new StringBuilder(length()).append(this).toString();
}
}
If you really need a String, call toString() (but that uses a StringBuilder again).

You've already figured it out: use a StringBuilder and replace every bloody char individually.
You could also use a char array: char ca[] = str.toCharArray() to extract the characters, modify individual chars in ca, String newstr =new String(ca) to pack the array back into a String. Might be slightly faster.
Take your pick.

Compare every char with '1' and invert it:
char[] charsConverted = new char[a.length()];
char[] charArray = a.toCharArray();
for (int i = 0; i < charArray.length; i++) {
boolean b = charArray[i] == '1';
charsConverted[i] = !b ? '1' : '0';
}
String b= String.valueOf(charsConverted);

Why can't I store Japanese UTF-8 characters in char array in Java?

I have a string "1234567(Asics (アシックスワーキング) )". It contains Unicode characters; some are part of ASCII and some are not. What Java does is that it takes one byte for an ASCII character and two bytes for other Unicode characters.
Some part of my program is unable to process the string in this format. So I wanted to encode the values into escaped sequences.
So the string
"1234567(Asics (アシックスワーキング) )"
would map to
"\u0031\u0032\u0033\u0034\u0035\u0036\u0037\u0028\u0041\u0073\u0069\u0063\u0073\u0020\u0028\u30a2\u30b7\u30c3\u30af\u30b9\u30ef\u30fc\u30ad\u30f3\u30b0\u0029\u0020\u0029"
.
I wrote this function to do this :-
public static String convertToEscaped(String utf8) throws java.lang.Exception
{
char[] str = utf8.toCharArray();
StringBuilder unicodeStringBuilder = new StringBuilder();
for(int i = 0; i < str.length; i++){
char charValue = str[i];
int intValue = (int) charValue;
String hexValue = Integer.toHexString(intValue);
unicodeStringBuilder.append("\\u");
for (int length = hexValue.length(); length < 4; length++) {
unicodeStringBuilder.append("0");
}
unicodeStringBuilder.append(hexValue);
}
return unicodeStringBuilder.toString();
}
This was working fine outside of my program but caused issues inside my program. The problem was in the line char[] str = utf8.toCharArray();
Somehow I was losing my Japanese Unicode characters, and this was happening because it was dividing these characters into 2 in the char array.
So I decided to go with byte [] instead.
public static String convertToEscaped(String utf8) throws java.lang.Exception
{
byte str[] = utf8.getBytes();
StringBuilder unicodeStringBuilder = new StringBuilder();
for(int i = 0; i < str.length - 1 ; i+=2){
int intValue = (int) str[i]* 256 + (int)str[i+1];
String hexValue = Integer.toHexString(intValue);
unicodeStringBuilder.append("\\u");
for (int length = hexValue.length(); length < 4; length++) {
unicodeStringBuilder.append("0");
}
unicodeStringBuilder.append(hexValue);
}
return unicodeStringBuilder.toString();
}
Output :
\u3132\u3334\u3536\u3738\u2841\u7369\u6373\u2028\uffffe282\uffffa1e3\uffff81b7\uffffe283\uffff82e3\uffff81af\uffffe282\uffffb8e3\uffff82af\uffffe283\uffffbbe3\uffff81ad\uffffe283\uffffb2e3\uffff81b0\u2920
But this is also wrong as I am merging two single byte characters into one. What can I do to overcome this?

I don't know your other code's specific requirements. But my advice is to not reinvent the wheel and use the built-in encoding capabilities of the API.
For instance call getBytes with either StandardCharsets.UTF_16BE or StandardCharsets.UTF_16LE based on the endian-ness you need:
String s = "1234567(Asics (アシックスワーキング) )";
byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE); // high order byte first
System.out.println(s.length()); // 28
System.out.println(utf8.length); // 48
System.out.println(utf16.length); // 56 (2 bytes for each char)

As was commented above, the internal representation of a String in Java is UTF-16. I found
Character.codePointAt() and Integer.toHexString(), which are helpful in your case.
I renamed the parameter to just theString and removed the throws Exception clause from your original method, since no exception was thrown (it is bad practice in general to throw these generic exceptions).
public static String convertToEscaped(String theString) {
char[] charArr = theString.toCharArray();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < charArr.length; i++) {
String hexString = Integer.toHexString(Character.codePointAt(charArr, i));
sb.append("\\u");
if (hexString.length() == 2) {
sb.append("00");
}
sb.append(hexString);
}
return sb.toString();
}
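The method above only pads two-digit hex values, which happens to be enough for this input. A variant that always pads to four digits could be sketched with String.format; the class name UnicodeEscaper is invented here. It escapes one UTF-16 code unit at a time, so a character outside the Basic Multilingual Plane would come out as two escapes, one per surrogate.

```java
class UnicodeEscaper {
    // Escapes every UTF-16 code unit as a backslash, the letter u and four lowercase hex digits.
    static String convertToEscaped(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            // %04x left-pads the hex value with zeroes to four digits.
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convertToEscaped("1234567(Asics (アシックスワーキング) )"));
    }
}
```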

How do I create a padString function in Java?

Every other question I have seen in my book I had at least some understanding of what the book was asking but this one I have no idea on how to approach it. It goes:
"Write a method called padString that accepts two parameters: a String and an integer representing a length. For example,
padString ("hello", 8)
should return "hello   " (that's three spaces in there at the end). If the string's length is already at least as long as the length parameter, your method should return the original string. For example,
padString ("congratulations", 10)
should return "congratulations"."
I have no idea on how to approach this being pretty new to Java. This is supposed to be a beginner's homework so I suppose the method is very simple. Please show me how to do this and explain the steps if you can. Please and Thank you to whoever helps.

So your function should do something like this:
Determine number of padding characters required.
Need <= 0 padding characters? return input string
Otherwise, create a string with required padding characters, then return input string + required padding characters
You can find a string's length with the .length() method.

You could use the printf method in System.out (needs Java 1.5 or later, where PrintStream gained this method). Take a look at the example below; its output is given after the code. The padding is specified in the printf argument as 30, and is left-justified:
package pft;
public class PrintfTest {
public static void main(String[] args) {
int padding = 30;
String s = "hi!";
System.out.printf("'%-" + padding + "s'", s);
}
}
prints 'hi!' followed by 27 spaces before the closing quote.

Taking it piece at a time (and without giving you all the code):
"Write a method called padString that
accepts two parameters: a String and
an integer representing a length."
public static ??? padString(String str, int len)
"For example,padString("hello", 8)
should return "hello"."
public static String padString(String str, int len)
{
throw new Error("not implemented yet");
}
"If the string's length is already at
least as long as the length parameter,
your method should return the original
string. For example,
padString("congratulations", 10)
should return "congratulations"."
EDIT: you fixed the question...
public static String padString(String str, int len)
{
// if the str.length is greater than len
// return str
// this next part is very simple, not a very good way but gets you
// started. Once you have it working look at StringBuilder and .append.
// int val = the difference in length between the two strings
// for i = 0; i is less than val; i++
// str += " ";
// return str
}

public class PadString {
public static void main(String[] args) {
String str = "hello";
str = padStr(str, 10, ' ');
}
static String padStr(String s, int len, char c) {
int n = len - s.length();
if (n <= 0)
return s;
StringBuilder b = new StringBuilder(s);
for (int i = 0; i < n; i++)
b.append(c);
return b.toString();
}
}

Even though this post is about 2 years old, I just recently had this
question for homework, and I thought it might help other beginners
that might come across this problem to see a simpler way of solving
this problem.
One that will probably be more in line to where they are in their
beginner java course assuming they are getting this around the same
time that I did.
Of course you should remove the dashes in the loop and use spaces to
get credit for the assignment, that is there just to show you that it
works.
public class ex3_11_padString {
public static void main(String[] args) {
padString("hello",10);
}
public static String padString( String s, int len) {
int s_length = s.length();
int diff = len - s_length;
String newString;
newString = "";
for (int x = 0 ; x < diff; x++) {
newString += "-";
}
newString = s + newString; // padding goes after the original string
return newString;
}
}

You may want to take a look at Java's String class documentation. Look for a method that returns the length of the string...

public static String padString(String str, int len)
{
int lengt=str.length();
if(lengt>=len)
return str; // already long enough
String spacstr="";
for(int i=lengt;i<len;i++){
spacstr+="$"; // "$" makes the padding visible; use " " for the assignment
}
return str+spacstr;
}
// more generalized by accepting a pad character
public static String padString(String str, int len,String padChar)
{
int lengt=str.length();
if(lengt>=len)
return str; // already long enough
String spacstr="";
for(int i=lengt;i<len;i++){
spacstr+=padChar;
}
return str+spacstr;
}

public String padString (String s, int padding) {
return String.format("%-" + padding + "s", s);
}
This is the better solution for me. Taken from the comment of @John C, with the "%-" added.
Sorry @John C, I cannot edit your comment or add one below yours.
