Convert String to/from byte array without encoding - java

I have a byte array read over a network connection that I need to transform into a String without any encoding, that is, simply by treating each byte as the low end of a character and leaving the high end zero. I also need to do the converse where I know that the high end of the character will always be zero.
Searching the web yields several similar questions that have all got responses indicating that the original data source must be changed. This is not an option so please don't suggest it.
This is trivial in C but Java appears to require me to write a conversion routine of my own that is likely to be very inefficient. Is there an easy way that I have missed?

No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text, which would make complete sense if it isn't text. You could do it the hard way that you propose.
An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 or IBM437 both qualify. (Windows-1252 only has 251 codepoints. UTF-8 doesn't allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as your hard way.
As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.
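Here is a minimal sketch of the ISO-8859-1 approach described above. It assumes Java 7+ for `StandardCharsets`, and the class name `Latin1Codec` is hypothetical. ISO-8859-1 maps byte values 0-255 one-to-one onto U+0000-U+00FF, so it covers both directions the question asks for without a hand-written loop:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: ISO-8859-1 maps byte 0xNN to char U+00NN and back,
// which is the "treat each byte as the low end of a char" behavior asked for.
final class Latin1Codec {

    // Each byte b becomes the char (b & 0xFF); the high 8 bits are zero.
    static String bytesToString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Converse: chars must be in U+0000-U+00FF; anything higher is replaced with '?'.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String s = bytesToString(all);
        System.out.println((int) s.charAt(200));                             // 200
        System.out.println(java.util.Arrays.equals(all, stringToBytes(s)));  // true
    }
}
```

Characters above U+00FF cannot be represented. `getBytes` replaces them with `?` instead of throwing, so this only round-trips when the high byte really is zero, which the question guarantees.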

This will convert a byte array to a String, filling only the lower 8 bits of each char.
public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // & 0xFF zero-extends the signed byte, so the high 8 bits of the char stay 0
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}
The efficiency should be quite good. Like Ben Thurley said, if performance is really such an issue don't convert to a String in the first place but work with the byte array instead.
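The routine above only goes from bytes to String, but the question also asks for the converse. Below is a sketch of both directions as a pair. It assumes, as the question states, that the high byte of every char is zero. `ByteStrings` and `bytesFromString` are hypothetical names:

```java
// Hypothetical helper pairing stringFromBytes with its converse.
// bytesFromString silently drops the high byte, so it is only lossless
// when every char is in U+0000-U+00FF.
final class ByteStrings {

    static String stringFromBytes(byte[] byteData) {
        char[] charData = new char[byteData.length];
        for (int i = 0; i < charData.length; i++) {
            charData[i] = (char) (byteData[i] & 0xFF); // zero-extend, no sign extension
        }
        return new String(charData);
    }

    static byte[] bytesFromString(String s) {
        byte[] byteData = new byte[s.length()];
        for (int i = 0; i < byteData.length; i++) {
            byteData[i] = (byte) s.charAt(i); // narrowing cast keeps the low 8 bits
        }
        return byteData;
    }
}
```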

Here is sample code that converts a String to a byte array (two bytes per char, high byte first) and back to a String without using a named character encoding.
public class Test
{
    public static void main(String[] args)
    {
        new Test().run();
    }

    public void run()
    {
        String input = "Hèllo world";
        byte[] inputBytes = getBytes(input);
        String output = getString(inputBytes);
        System.out.println(output);
    }

    // Splits each char into two bytes, high byte first (big-endian)
    public byte[] getBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }
        return bytes;
    }

    // Reassembles each pair of bytes into one char; masking avoids sign extension
    public String getString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
            chars[i] = (char) (((bytes[i * 2] & 0xFF) << 8) | (bytes[i * 2 + 1] & 0xFF));
        return new String(chars);
    }
}

Use the deprecated constructor String(byte[] ascii, int hibyte) with a hibyte of 0, so the high 8 bits of every char are zero:
String string = new String(byteArray, 0);
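That deprecated constructor has a deprecated counterpart for the other direction, `String.getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin)`, which stores the low-order 8 bits of each char. The sketch below wraps both methods. The class and method names are hypothetical, and without the annotation the compiler reports deprecation warnings:

```java
// Hypothetical wrapper around the two deprecated JDK methods that skip encoding:
//  - String(byte[] ascii, int hibyte): each char = (hibyte << 8) | (b & 0xFF)
//  - getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin): low 8 bits of each char
@SuppressWarnings("deprecation")
final class DeprecatedByteStrings {

    static String toStringHiZero(byte[] bytes) {
        return new String(bytes, 0); // high byte of every char is 0
    }

    static byte[] toBytesLowByte(String s) {
        byte[] dst = new byte[s.length()];
        s.getBytes(0, s.length(), dst, 0); // discards the high byte of each char
        return dst;
    }
}
```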

A Java String is already stored as UTF-16. In UTF-16 it can take up to two string "characters" (char values forming a surrogate pair) to make one displayable character. What you really want is to use a standard encoding:
byte[] bytes = myString.getBytes(StandardCharsets.UTF_16BE);
to convert a String to an array of bytes. For well-formed text this produces the same bytes as the hand-written loop above (high byte first), but it uses the JDK's built-in encoder instead of a custom routine. If you would like to cut the transmitted data nearly in half for mostly-ASCII text, I would recommend converting it to UTF-8 (ASCII is a subset of UTF-8), the dominant encoding on the web, by calling:
byte[] bytes = myString.getBytes(StandardCharsets.UTF_8);
To convert back to a string use:
String myString = new String(bytes, StandardCharsets.UTF_16BE);
or
String myString = new String(bytes, StandardCharsets.UTF_8);
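In Java, the counterparts of these calls are `String.getBytes(Charset)` and `new String(byte[], Charset)`. The sketch below checks that UTF-16BE gives the same bytes as the hand-written two-bytes-per-char loop shown earlier, and compares the size with UTF-8 for a mostly-ASCII string. The class name is hypothetical, and the `è` is written as `\u00e8` so the example does not depend on the source file's encoding:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical comparison of the manual loop against the built-in charsets.
final class EncodingComparison {

    // Same layout as the hand-written loop: two bytes per char, high byte first.
    static byte[] manualUtf16be(String str) {
        byte[] bytes = new byte[str.length() * 2];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            bytes[i * 2] = (byte) (c >> 8);
            bytes[i * 2 + 1] = (byte) c;
        }
        return bytes;
    }

    public static void main(String[] args) {
        String input = "H\u00e8llo world";
        byte[] utf16 = input.getBytes(StandardCharsets.UTF_16BE); // no BOM for UTF-16BE
        byte[] utf8 = input.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf16, manualUtf16be(input)));               // true
        System.out.println(utf16.length + " vs " + utf8.length);                       // 22 vs 12
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(input));    // true
    }
}
```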

Related

How to convert binary string to Java String encoded using UTF-8

To send a chunk of bits from a 4-word String, I'm getting the byte array from the String and building the bit string:
StringBuilder binaryStr = new StringBuilder();
byte[] bytesFromStr = str.getBytes("UTF-8");
for (int i = 0, l = bytesFromStr.length; i < l; i++) {
binaryStr.append(Integer.toBinaryString(bytesFromStr[i]));
}
String result = binaryStr.toString();
The problem appears when I want to do the reverse operation: converting a bit string back to a Java String encoded using UTF-8.
Can someone explain the best way to do that?
Thanks in advance!
TL;DR Don't use toBinaryString(). See solution at the end.
Your problem is that Integer.toBinaryString() doesn't return leading zeroes, e.g.
System.out.println(Integer.toBinaryString(1)); // prints: 1
System.out.println(Integer.toBinaryString(10)); // prints: 1010
System.out.println(Integer.toBinaryString(100)); // prints: 1100100
For your purpose, you want to always get 8 bits for each byte.
You also need to prevent negative values from causing errors, e.g.
System.out.println(Integer.toBinaryString((byte)129)); // prints: 11111111111111111111111110000001
The easiest way to accomplish that is like this:
Integer.toBinaryString((b & 0xFF) | 0x100).substring(1)
First, it coerces the byte b to int, then retains only lower 8 bits, and finally sets the 9th bit, e.g. 129 (decimal) becomes 1 1000 0001 (binary, spaces added for clarity). It then excludes that 9th bit, in effect ensuring that leading zeroes are in place.
It's better to have that as a helper method:
private static String toBinary(byte b) {
return Integer.toBinaryString((b & 0xFF) | 0x100).substring(1);
}
In which case your code becomes:
StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
binaryStr.append(toBinary(b));
String result = binaryStr.toString();
E.g. if str = "Hello World", you get:
0100100001100101011011000110110001101111001000000101011101101111011100100110110001100100
You could of course just do it yourself, without resorting to toBinaryString():
StringBuilder binaryStr = new StringBuilder();
for (byte b : str.getBytes("UTF-8"))
for (int i = 7; i >= 0; i--)
binaryStr.append((b >> i) & 1);
String result = binaryStr.toString();
That will probably run faster too.
Thanks @Andreas for your code. I tested your function and decoded the result back to a String using this:
StringBuilder revealStr = new StringBuilder();
for (int i = 0; i < result.length(); i += 8) {
revealStr.append((char) Integer.parseUnsignedInt(result.substring(i, i + 8), 2));
}
Thanks to everyone for helping me.
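Casting each 8-bit value straight to `char`, as in the decoding loop above, only reproduces ASCII text. A UTF-8 character such as `é` spans two bytes and would come out as two unrelated chars. The sketch below collects the bytes first and decodes them as UTF-8 in one step. `BitStrings`, `toBits` and `fromBits` are hypothetical names, and `toBits` reuses the leading-zero padding trick from the answer:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical round trip: String -> UTF-8 bytes -> "0101..." -> bytes -> String.
final class BitStrings {

    // 8 bits per UTF-8 byte, leading zeros kept via the 0x100 trick.
    static String toBits(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(Integer.toBinaryString((b & 0xFF) | 0x100).substring(1));
        }
        return sb.toString();
    }

    // Parse each 8-bit group into a byte, then decode all bytes at once so
    // multi-byte UTF-8 sequences are reassembled into the right chars.
    static String fromBits(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit string length must be a multiple of 8");
        }
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            // parseInt yields 0-255; the cast wraps 128-255 to negative byte values
            bytes[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```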

Java decryption from byte to text

I want to write a java program which takes a text field with byte data. Output of my program should be string. How can I achieve that. Any inputs are appreciated.
Input is
85f960f0 82868260 f4f78486 60f8f6f
Output is string format like customer, hero, english...
I am planning to write a simple java program.
Thanks in advance.
Sorry for leaving out details the first time. I am still learning.
Your question doesn't provide enough detail for a full answer. Assuming the fragment "85f960f0 82868260 f4f78486 60f8f6f" is the output you want...
Convert the byte array to a hexadecimal string using String.format() with the %x pattern within a loop.
Use %02x to pad each octet to 2 digits if necessary.
If you need a space every 8 characters, check whether the counter is divisible by 4 using the % operator.
For example.
byte[] valueFromTextField = "hello world foo bar".getBytes();
StringBuilder builder = new StringBuilder();
int i = 0;
for (byte element : valueFromTextField) {
if (i % 4 == 0 && builder.length() > 0) {
builder.append(" ");
}
builder.append(String.format("%02x", element));
i++;
}
System.out.println(builder.toString());
Output
68656c6c 6f20776f 726c6420 666f6f20 626172
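Assuming you have a byte[] bytes, you can convert it with new String(bytes, charset) (note that bytes.toString() only returns the array's type and hash code, such as [B@..., not its contents), or you can convert it byte by byte: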
byte[] bites = new byte[]{24,4,72,56};
for(int i = 0; i < bites.length; i++)
System.out.println(new String(bites, i,1));
// Fragment from a method returning String; textField is the Swing text field holding the hex input.
// Needs java.nio.charset.StandardCharsets and javax.swing.JOptionPane.
String hexadecimals = textField.getText();
hexadecimals = hexadecimals.replaceAll("[^0-9A-Fa-f]", ""); // Remove garbage
int nn = hexadecimals.length();
if (nn % 2 != 0) {
    JOptionPane.showMessageDialog(null, "... must be even", "Error", JOptionPane.ERROR_MESSAGE);
    return "";
}
byte[] bytes = new byte[nn / 2];
for (int i = 0; i < nn - 1; i += 2) {
    int b = Integer.parseInt(hexadecimals.substring(i, i + 2), 16);
    bytes[i / 2] = (byte) b; // two hex digits fill one byte
}
return new String(bytes, StandardCharsets.UTF_8);
This uses Integer.parseInt with base 16 (digits 0-9, A-F), which yields an int from 0 to 255, so values above 127 do not trigger the out-of-range error that Byte.parseByte would raise; the cast to byte then wraps them into Java's signed range.
To convert those bytes to text (which in Java is Unicode and can hold any combination of chars), one needs to know which text encoding the bytes are in. Here I use UTF-8, which however requires the bytes to follow the UTF-8 multibyte format.
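`new String(bytes, StandardCharsets.UTF_8)` never fails: byte sequences that are not valid UTF-8 are silently replaced with U+FFFD. If you want to detect input that does not follow the UTF-8 multibyte format, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead. Here is a sketch with hypothetical class and method names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: hex text -> bytes -> strictly validated UTF-8 String.
final class StrictHexDecoder {

    // Strips non-hex characters (e.g. spaces), then parses pairs of digits into bytes.
    static byte[] hexToBytes(String hex) {
        String clean = hex.replaceAll("[^0-9A-Fa-f]", "");
        if (clean.length() % 2 != 0) {
            throw new IllegalArgumentException("hex digit count must be even");
        }
        byte[] bytes = new byte[clean.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Throws CharacterCodingException instead of substituting U+FFFD.
    static String decodeUtf8Strict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

With the question's sample data, `decodeUtf8Strict(hexToBytes("85f960f0"))` throws, because `0x85` cannot start a UTF-8 sequence. That is a hint the data is not UTF-8.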

My java class implementation of XOR encryption has gone wrong

I am new to Java but I am very fluent in C++ and C#, especially C#. I know how to do XOR encryption in both C# and C++. The problem is that the algorithm I wrote in Java to implement XOR encryption seems to be producing wrong results. The results are usually a bunch of spaces, and I am sure that is wrong. Here is the class below:
public final class Encrypter {
public static String EncryptString(String input, String key)
{
int length;
int index = 0, index2 = 0;
byte[] ibytes = input.getBytes();
byte[] kbytes = key.getBytes();
length = kbytes.length;
char[] output = new char[ibytes.length];
for(byte b : ibytes)
{
if (index == length)
{
index = 0;
}
int val = (b ^ kbytes[index]);
output[index2] = (char)val;
index++;
index2++;
}
return new String(output);
}
public static String DecryptString(String input, String key)
{
int length;
int index = 0, index2 = 0;
byte[] ibytes = input.getBytes();
byte[] kbytes = key.getBytes();
length = kbytes.length;
char[] output = new char[ibytes.length];
for(byte b : ibytes)
{
if (index == length)
{
index = 0;
}
int val = (b ^ kbytes[index]);
output[index2] = (char)val;
index++;
index2++;
}
return new String(output);
}
}
Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here; you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (but you'll get the same as you got converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding maps every possible byte value to a character (as ISO-8859-1 does), not all of the byte sequences you generate will represent valid characters.
So, two pieces of advice come from this:
1. Don't use strings as general holders for bytes.
2. Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true (see comments below). Refer back to point 1 above! Use bytes, not characters, when you want bytes.
If you use non-ASCII strings as keys you'll get pretty strange results. Some of the bytes in the kbytes array will be negative. Sign extension then means that val will come out negative. The cast to char will then produce a character in the FF80-FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
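In the spirit of point 1, here is a sketch of the XOR routine done entirely on `byte[]`, so the ciphertext never passes through a `String`. The names are hypothetical. If the result must travel as text, `java.util.Base64` can encode the bytes:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical byte-level XOR: no String is used for the ciphertext.
final class XorBytes {

    static byte[] xor(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("empty key");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]); // repeat the key cyclically
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "secret".getBytes(StandardCharsets.UTF_8);
        byte[] plain = "attack at dawn".getBytes(StandardCharsets.UTF_8);
        byte[] cipher = xor(plain, key); // keep as bytes (or Base64-encode if text is required)
        byte[] back = xor(cipher, key);  // XOR with the same key undoes itself
        System.out.println(new String(back, StandardCharsets.UTF_8)); // attack at dawn
    }
}
```

Encryption and decryption are the same operation here, which is why the question's `EncryptString` and `DecryptString` bodies are identical. The only real problem was the round trip through `char` and `String`.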

Taking a string representation of a large integer and converting it to a byte array in Java

Basically, my problem is two-fold, and refers pretty specifically to the Bitcoin RPC. I am writing a miner in Java for Litecoin (a spinoff of BTC) and need to take a string that looks like:
000000000000000000000000000000000000000000000000000000ffff0f0000
Convert it to look like
00000fffff000000000000000000000000000000000000000000000000000000
(Which I believe is switching from little endian to big endian)
I then need to turn that string into a byte array --
I've looked at the Hex class from org.apache, String.toByte(), and a piece of code that looks like:
public static byte[] toByta(char[] data) {
if (data == null) return null;
// ----------
byte[] byts = new byte[data.length * 2];
for (int i = 0; i < data.length; i++)
System.arraycopy(toByta(data[i]), 0, byts, i * 2, 2);
return byts;
}
So essentially: What is the best way, in Java to change endianness? And what is the best way to take a string representation of a number and convert it to a byte array to be hashed?
EDIT: I had the wrong result after changing the endian.
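Integer and BigInteger both have toString methods taking a radix, so you can get the hex String. You can make a StringBuffer from that String and call reverse(). You then convert back to a String using toString(), then get the bytes via getBytes(). Note that reverse() reverses individual hex digits, so the two digits inside each byte end up swapped ("0f" becomes "f0"), and getBytes() returns the encoded characters of the hex text, not the numeric byte values.
Don't know if this is "best" but it requires little work on your part.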
If you need better speed, call getBytes() on the original wrong direction hex string (from step 1) and reverse it in place using a for loop. e.g.
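for (int i = 0; i < bytes.length / 2; i++) {
    byte temp = bytes[i];
    // mirror index is length - 1 - i; bytes[bytes.length - i] would be out of bounds at i = 0
    bytes[i] = bytes[bytes.length - 1 - i];
    bytes[bytes.length - 1 - i] = temp;
}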
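Here is a sketch of the conversion the question asks for. It assumes the input is plain hex with an even number of digits. Pairs of hex digits are parsed into numeric bytes, and then the byte array is reversed. Reversing the hex characters instead would also swap the two digits inside each byte. The class and method names are hypothetical:

```java
// Hypothetical helper: hex string -> bytes, reverse byte order, bytes -> hex.
final class EndianSwap {

    // Two hex digits per byte; Character.digit returns -1 for non-hex characters.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd hex length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not hex at index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Swaps bytes pairwise from both ends: little-endian <-> big-endian.
    static void reverseInPlace(byte[] bytes) {
        for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {
            byte tmp = bytes[i];
            bytes[i] = bytes[j];
            bytes[j] = tmp;
        }
    }

    // Lower-case hex; Formatter treats a Byte as unsigned for %x.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Applied to the question's string, this turns the trailing `...ffff0f0000` into a leading `00000fffff...`, which matches the expected result. The reversed `byte[]` can be hashed directly.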

String to binary and vice versa: extended ASCII

I want to convert a String to binary by putting it in a byte array (String.getBytes()) and then storing the binary string for each byte (Integer.toBinaryString(byteArray[i])) in a String[]. Then I want to convert back to a normal String via Byte.parseByte(stringArray[i], 2). This works great for the standard ASCII table, but not for the extended one. For example, an A gives me 1000001, but an Ä returns
11111111111111111111111111000011
11111111111111111111111110000100
Any ideas how to manage this?
public class BinString {
public static void main(String args[]) {
String s = "ä";
System.out.println(binToString(stringToBin(s)));
}
public static String[] stringToBin(String s) {
System.out.println("Converting: " + s);
byte[] b = s.getBytes();
String[] sa = new String[s.getBytes().length];
for (int i = 0; i < b.length; i++) {
sa[i] = Integer.toBinaryString(b[i] & 0xFF);
}
return sa;
}
public static String binToString(String[] strar) {
byte[] bar = new byte[strar.length];
for (int i = 0; i < strar.length; i++) {
bar[i] = Byte.parseByte(strar[i], 2);
System.out.println(Byte.parseByte(strar[i], 2));
}
String s = new String(bar);
return s;
}
}
First off: "extended ASCII" is a very misleading title that's used to refer to a ton of different encodings.
Second: byte in Java is signed, while bytes in encodings are usually handled as unsigned. Since you use Integer.toBinaryString() the byte will be converted to an int using sign extension (because byte values > 127 will be represented by negative values in Java).
To avoid this, simply use & 0xFF to mask all but the lower 8 bits, like this:
String binary = Integer.toBinaryString(byteArray[i] & 0xFF);
To expand on Joachim's point about "extended ASCII" I'd add...
Note that getBytes() is a transcoding operation that converts data from UTF-16 to the platform default encoding. The encoding varies from system to system and sometimes even between users on the same PC. This means that results are not consistent on all platforms and if a legacy encoding is the default (as it is on Windows) that data can be lost.
To make the operation symmetrical, you need to provide an encoding explicitly (preferably a Unicode encoding such as UTF-8 or UTF-16.)
Charset encoding = Charset.forName("UTF-16");
byte[] b = s1.getBytes(encoding);
String s2 = new String(b, encoding);
assert s1.equals(s2);
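Masking fixes the encoding side, but the question's `binToString` still calls `Byte.parseByte(s, 2)`. That throws `NumberFormatException` for `11000011`, because 195 is outside the byte range -128..127. The sketch below does the full round trip by parsing with `Integer.parseInt` and narrowing to `byte`, with an explicit UTF-8 encoding as suggested above. The class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical fixed version of the question's BinString round trip.
final class BinStringFixed {

    static String[] stringToBin(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8); // explicit encoding, not the platform default
        String[] sa = new String[b.length];
        for (int i = 0; i < b.length; i++) {
            sa[i] = Integer.toBinaryString(b[i] & 0xFF); // 0-255, no sign extension
        }
        return sa;
    }

    static String binToString(String[] strar) {
        byte[] bar = new byte[strar.length];
        for (int i = 0; i < strar.length; i++) {
            // Byte.parseByte would reject values above 127; parse as int and narrow instead
            bar[i] = (byte) Integer.parseInt(strar[i], 2);
        }
        return new String(bar, StandardCharsets.UTF_8);
    }
}
```

For `Ä` (U+00C4), `stringToBin` yields the two UTF-8 bytes `11000011` and `10000100`, and `binToString` reassembles them into the original character.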
