Convert Java string to byte array - java

I have a byte array which I'm encrypting then converting to a string so it can be transmitted. When I receive the string I then have to convert the string back into a byte array so it can be decrypted. I have checked that the received string matches the sent string (including length) but when I use something like str.getBytes() to convert it to a byte array, it does not match my original byte array.
example output:
SENT: WzShnf/fOV3NZO2nqnOXZbM1lNwVpcq3qxmXiiv6M5xqC1A3
SENT STR: [B@3e4a9a7d
RECEIVED STR: [B@3e4a9a7d
RECEIVED: W0JAM2U0YTlhN2Q=
Any ideas how I can convert the received string to a byte array which matches the sent byte array?
Thanks

You used array.toString(), which is implemented like this:
return "[B@" + Integer.toHexString(this.hashCode());
(In fact it inherits the definition from Object, and the part before the @ is simply the result of getClass().getName().)
The hashCode here does not depend on the array's content, so the resulting string cannot be converted back into the bytes.
Instead, use new String(array, encoding).
Of course, this only works for byte arrays that are really representable as Java strings (which then contain readable characters), not for arbitrary arrays. For those, it is better to use Base64 as Bozho recommended (but make sure to use it on both sides of the channel).

This looks like Base64. Take a look at commons-codec Base64 class.
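As a rough illustration of the Base64 suggestion, here is a minimal sketch that uses java.util.Base64 from the standard library (Java 8 and later) rather than commons-codec. The class and method names are made up for this example. Incidentally, the RECEIVED value in the question, W0JAM2U0YTlhN2Q=, is the Base64 form of the text produced by array.toString(), not of the encrypted bytes.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, named only for this example.
class Base64RoundTrip {
    // Sender side: turn arbitrary (e.g. encrypted) bytes into ASCII-safe text.
    static String toText(byte[] cipherBytes) {
        return Base64.getEncoder().encodeToString(cipherBytes);
    }

    // Receiver side: recover exactly the bytes that were encoded.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Stand-in for encrypted data; includes values that are not valid text.
        byte[] sent = {(byte) 0x5b, (byte) 0x34, (byte) 0xa1, (byte) 0x9d, (byte) 0xff};
        String wire = toText(sent);
        byte[] received = fromText(wire);
        System.out.println(wire);
        System.out.println(Arrays.equals(sent, received)); // true
    }
}
```

The important point is that the byte array itself is encoded, never the result of calling toString() on it.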

You can't just use getBytes() on two different machines, since getBytes() uses the platform's default charset.
Decode and encode the array with a specified charset (e.g. UTF-8) to make sure you get the correct results.
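A small sketch of what an explicit charset buys you, and of its limits. The class and method names are invented for illustration. Text survives a UTF-8 round trip, but arbitrary binary data (such as ciphertext) generally does not, which is why Base64 is the safer choice for encrypted bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, named only for this example.
class CharsetRoundTrip {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // True if decoding the bytes as UTF-8 and encoding again yields the same bytes.
    static boolean survivesUtf8(byte[] b) {
        return Arrays.equals(b, encode(decode(b)));
    }

    public static void main(String[] args) {
        // Real text round-trips cleanly when both sides agree on the charset.
        System.out.println(survivesUtf8(encode("h\u00e9llo"))); // true
        // These bytes are not valid UTF-8; the decoder substitutes replacement characters.
        System.out.println(survivesUtf8(new byte[] {(byte) 0xff, (byte) 0xfe})); // false
    }
}
```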

First convert your byte array to a proper string:
String line = Arrays.toString(your_array);
Then send it, and on the receiving side use the function below:
public static byte[] StringToByteArray(String line)
{
    // Strip the surrounding "[" and "]" produced by Arrays.toString().
    String some = line.substring(1, line.length() - 1).trim();
    if (some.isEmpty())
    {
        // "[]" encodes an empty array.
        return new byte[0];
    }
    // Elements are separated by ", "; trim() removes the space after each comma.
    String[] parts = some.split(",");
    byte[] a = new byte[parts.length];
    for (int i = 0; i < parts.length; i++)
    {
        a[i] = Byte.parseByte(parts[i].trim());
    }
    return a;
}

Related

How to convert a Base64 String to a byte array in java (signed char)?

I need to convert a Base64 String (UTF-8) to an array of bytes in Java. The logic I've used was not able to accomplish that properly. I guess this code does not take into account whether the values are signed or unsigned chars. It runs with no errors, but the results are not the expected ones. How can I get the byte array, taking possible negative values into account?
Base64 decoder = new Base64();
byte[] decodedBytes = (byte[]) decoder.decode(getInfo().getBytes());
for (int i = 0; i < decodedBytes.length; i++) {
System.out.println(decodedBytes[i]);
}
private static String getInfo() {
return "+//q/93/4v/0//7/AAD0/+f/6v/7/wIA9P/g/9v/7P/x//H/5//b/+D/9v/+
//v/8f/g/+z/BQD+/+r/4P/i/+z/9v/5/+L/0f/q/wIABQD5/+z/5f/5/wcA
+f/l/+D/4v/v//v/7//d/93/9P8FAAIA+f/g/+f//v8FAPn/4v/b/+z/+//7
/+//5//x/wIABQD7/+r/1v/q//b/+f/s/+f/4v/s//n/9P/i/9j/7P/+/wAA
9P/q/+f/+f8FAPv/5//d/+X/9v8AAO//4P/b/+//9v/+//b/5f/v//v/AAD0
/+L/3f/q//H/+//s/93/5f/2/wAA+f/x/+X/7P/7//T/4P/R/93/6v/5//T/
4v/Y/+///v8CAPb/8f/n//T//v/0/93/4P/s/+//AAD2/+L/5f/x//v/+f/2
/+L/6v/+/wUA8f/l/93/6v/0//v/7P/b/+L/+f8AAPv/7P/b/+//AAD+/+r/
2//d/+z/+//2/+X/1v/q//v/+//0/+X/3f/x/wAA+f/l/93/6v/x//n/7//l
/+L/7/8CAPn/8f/g/+r//v8AAPT/4P/b/+L/7P/0/+f/1v/l//b/+f/0//H/
4P/n//7//v/q/9v/2//n//b/8f/d/9v/7P/5//v/9v/s/+D/9P8AAPn/5f/d
/+D/5//2//H/4P/i//H/AAD7//T/5//v//v//v/v/9v/0f/i/+z/9v/l/9P/
3f/0//7/+//v/+L/7P8HAAUA7P/d/9v/4v/2/+//4P/Y/+z/AAAAAPn/7P/g
/+//AgD5/+f/3f/l//H//v/0/+D/3f/s/wAAAAD2/+L/5//7//v/8f/b/9j/
5f/x//n/7P/b/+r/+/8FAPv/7//d//H//v/7/+z/2//d/+r/+//7/+f/3f/s
/wIABQAAAPH/6v/x/wUAAADq/+L/7//s//v/8f/l/+L/7/8AAP7/9v/l/+z/
AAAKAPv/6v/l/+//+f8AAPH/2//i//b/BQAFAPv/7P/x/xsAFAD5/+r/7P/5
/wUA+//v/93/7P/7/wUA+f/v/+X/8f/7//b/5f/b/93/5f/2//H/4v/b//H/
AAD7/+//4v/q//b/+//v/+X/2//i/+//9P/n/93/6v/2//n/9v/x/93/6v/5
//T/4v/W/9v/6v/5//H/5//d/+r//v8AAPT/6v/b/+r/+f/x/+X/3f/l/+z/
+f/x/9v/3f/s/wAA+f/x/+D/5f/v//b/8f/i/+f/5f/2/wAA+f/x//7/DAAU
ABQAEQAHABsAOQBDADkANABNAGYAjgCnAKIArAC+AOgA+gD6APoA+gAVAScB
HwEaAQsBBAEBAQYB5gDFAKwAnQCTAH0AXABBADQAMgAgAAoA6v/i/+r/6v/s
/9P/uv+//9H/zv+//7D/nP+p/8T/xP+z/6n/q/+1/8L/xP+w/6n/uv/M/8z/
x/+9/7r/0f/g/9b/xP+//8f/1v/v/+f/0//R/93/8f/v/+D/zv/W/+f/7//d
/87/zP/Y/+f/9P/l/8z/2P/s//n/9P/d/8z/4v/2/+//4v/W/9P/3f/v/+r/
3f/O/+D/9P/0/+X/2//T/+f/+f/v/93/0f/b/+f/9P/s/9H/0//l//v/+//s
/9v/5//s//T/6v/R/87/2//q//T/4v/W/+D/8f/7/+z/5f/T/+L/8f/x/+L/
zv/O/9j/6v/n/93/1v/g//T/+//s/+L/1v/l//b/7//b/87/2P/l/+//5//M
/87/2//0//v/7//Y/9v/6v/5/+r/2P/R/93/5f/q/9j/xP/O/+r/9v/x/+//
3f/i//v/9P/i/9H/zv/i//T/5//Y/8z/2P/x//v/7P/l/93/8f/+//T/4P/W
/93/4P/q/+z/0//W/+f/+f/0/+z/3f/l//T/AADv/9v/3f/n//H/+//q/9b/
4P/5/wUAAAD0/+D/7P8AAAAA7P/g/+L/8f/+//v/8f/l//T/AgAFAPn/8f/s
/wIAFAAKAPT/5//v//v/DwAKAO//7P/+/xQAFAAFAPb//v8MABsAEQD7/+//
/v8KABEABQD+/wcAFgAjACUAGwAKAA8AIwAgABEABQAAAA8AHgAZAAcA/v8P
ACMAJQAZAAoAAgAWACAAFAAHAAAADAAUACMAGwAHAAUADwAlACMAHgAFAAwA
HgAoABsABwD+/wUADAAWAAoA9v8AAAwAGQARAAoA+f/7/w8AEQAFAPH/8f/+
/woABQD2/+z/AAARAA8AAAD5/+z//v8HAAAA8f/n/+z/7/8AAAUA5//i//T/
AgAAAPv/7//0/wAACgD7/+z/4P/n/+r/9v/q/93/6v/5/wIA/v/5/93/6v/7
/wAA8f/i/9v/5f/x//b/6v/d/+z/+/8CAPb/5f/W/+r/+f/0/93/2//l/+X/
7//2/+z/9v/q//v/AAAHAO//6v/5/wAAAADl/+X/7P/l/+//7P/g/+z/8f/5
//7//v/v/+X/9v/x//T/8f/s/93/9P/5/+z/2//v//b/7//+//n/6v/v//7/
8f/l/+r/4v/i//H/+f/d/+r/7P8AAPv/BQD7//H/+/8CAPb/7P/x/+r/7//7
//H/5f/x//v/9P/7/wAA7//s/wAA+f/s/+z/6v/s//b//v/x/93/6v/0/+z/
/v8AAPH/7//5//H/7//0/+f/7P/5//n/5//q/+//8f/2/wAAAAD0//b/9v/s
/+//9P/q/+//+f/7/+X/9P/7/wIAAAAFAO//5//2/+z/8f/l/+//7P/0//n/
7//i//H/+f/s//b/+f/n/+z/AgD5//H/8f/n/+f/9P/7/+r/7P/x//T/+f/+
/+//7P/2/wAA9P/q//H/6v/x//b/7//q//H/+//0//n/9v/n/+X/+f/2/+//
5//n/93/7//0//H/4P/s//T/7//5//H/7P/0//7/8f/l/+r/6v/l//v/+//s
/+f/6v/2//T/9v/s/+f/9P/5//T/6v/s/+z/5//0/+//5//x//T/6v/7//7/
7P/n//T/9v/v/+z/5//g/+r/7P/n/93/7P/0/+///v/0/+D/5f/s/+L/4v/s
/+X/5//7//n/5//l/+r/+f/2/wIA8f/q//H/9v/0/+r/5//q/+r/8f/5/+z/
+//v/wAA/v8FAO//4v8KAAUA7P/v/+z/+f/+/wAA8f/x//T/AAACAAAA9v/s
//7/CgAAAN3/7//7//7/";
}
I say the results are not as expected because I have to plot a graph from the array values and compare it to an existing one, and currently the two drawings differ.
Update: what if I wanted to get 16 bits at a time instead of only 8?
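For the 16-bit follow-up, one possible approach is sketched below. It uses java.util.Base64 (Java 8 and later) instead of commons-codec, and it assumes the data is a sequence of little-endian signed 16-bit values; that byte order is a guess based on how the sample starts, not something the question states. The class and method names are hypothetical. Note that Java's byte is always signed, so a value such as 0xFB prints as -5; masking with 0xFF gives the unsigned view.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical helper class, named only for this example.
class Base64Samples {
    // Decode Base64 text (line breaks allowed) into 16-bit signed values.
    // ASSUMPTION: the payload is little-endian; switch to BIG_ENDIAN if it is not.
    static short[] decodeSamples(String base64) {
        // The MIME decoder skips line separators inside the text.
        byte[] raw = Base64.getMimeDecoder().decode(base64);
        short[] samples = new short[raw.length / 2];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    // Unsigned view of a single byte, in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // First eight characters of the string in the question.
        for (short s : decodeSamples("+//q/93/")) {
            System.out.println(s); // -5, -22, -35
        }
    }
}
```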

Using charAt() to change a char array

I'm utterly boggled as to why charAt() works in some scenarios but not others. I am doing exercises while learning Java and one of them was to take a string, and return it in reverse order.
My working code:
public String reverseString(String tempStr){
int initialindex = tempStr.length()-1;
int reverseindex = 0;
char tmp;
char[] array = new char[tempStr.length()];
for(int tempchar : array){
tmp = tempStr.charAt(initialindex);
array[reverseindex] = tmp;
initialindex--;
reverseindex++;
}
String returnstr = new String(array);
return returnstr;
}
The problem I ran into is using the following for statement prints gibberish:
for(int tempchar : array){
array[reverseindex] = tempStr.charAt(initialindex);
initialindex--;
reverseindex++;
}
There were perhaps a dozen different variants using while loops, standard for loops and a few other versions of code that were ugly and didn't work. Why did creating a char tmp variable, putting the inspected character in it, and then using that variable to write into the array work?
Also, why am I unable to just return the string using return array.toString();?
Edit: I'm using the latest Eclipse I downloaded today, switched from netbeans.
I copied your code into my editor and it performed fine using either version, with tmp field or without. You must have made some other error using the other method.
Java doesn't support a pretty .toString() for arrays; any object which does not override toString will produce its class name and hashCode rather than its contents/fields, and arrays are no exception here. Whilst printing the contents might seem sensible for character arrays, the same operation on an int array would produce nonsense; see the difference between Arrays.toString() and String.valueOf(array). In this case, you probably want to use the String.valueOf method.
array.toString() returns the string representation of the array object, not of its contents. You need to use the String constructor that takes a char[], new String(array), to create a String from the char[].
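To make the difference between these conversions concrete, here is a short sketch. The class name and the reverse helper are illustrative only and are not the asker's code.

```java
import java.util.Arrays;

// Hypothetical helper class, named only for this example.
class CharArrayText {
    // Reverse a string by swapping characters from both ends of a char array.
    static String reverse(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char t = chars[i];
            chars[i] = chars[j];
            chars[j] = t;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] array = {'a', 'b', 'c'};
        System.out.println(new String(array));      // abc
        System.out.println(String.valueOf(array));  // abc
        System.out.println(Arrays.toString(array)); // [a, b, c]
        // Class name plus hash code, e.g. something starting with "[C@"; varies per run.
        System.out.println(array.toString());
        System.out.println(reverse("hello"));       // olleh
    }
}
```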
As a hint to get you started: if you want to convert a char array into a String use the String constructor that takes a char array.
Update: I see you already did that in your edit. Does it work as expected now?
Your loop looks a little bit weird since you never use your loop variable. You could try this:
char[] initialArray = tempStr.toCharArray();
char[] array = new char[tempStr.length()];
for (int srcIndex = 0, destIndex = array.length - 1; destIndex >= 0; srcIndex++, destIndex--) {
    array[destIndex] = initialArray[srcIndex];
}
public String reverse(String str)
{
    if (str == null)
    {
        return null;
    }
    // Work on chars rather than getBytes(): reversing encoded bytes
    // would corrupt any character that takes more than one byte.
    char[] charArray = str.toCharArray();
    int arrayLastIndex = charArray.length - 1;
    for (int i = 0; i < charArray.length / 2; i++)
    {
        char temp = charArray[i];
        charArray[i] = charArray[arrayLastIndex - i];
        charArray[arrayLastIndex - i] = temp;
    }
    return new String(charArray);
}

Java bug? Why extra zero byte in utf8 encoding?

The following code
public class CharsetProblem {
public static void main(String[] args) {
//String str = "aaaaaaaaa";
String str = "aaaaaaaaaa";
Charset cs1 = Charset.forName("ASCII");
Charset cs2 = Charset.forName("utf8");
System.out.println(toHex(cs1.encode(str).array()));
System.out.println(toHex(cs2.encode(str).array()));
}
public static String toHex(byte[] outputBytes) {
StringBuilder builder = new StringBuilder();
for(int i=0; i<outputBytes.length; ++i) {
builder.append(String.format("%02x", outputBytes[i]));
}
return builder.toString();
}
}
returns
61616161616161616161
6161616161616161616100
That is, the utf8 encoding returns an excess byte. With fewer a's there are no excess bytes; with more a's we get more and more excess bytes.
Why?
How can one work around this?
You can't just get the backing array and use it. ByteBuffers have a capacity, position and a limit.
System.out.println(cs1.encode(str).remaining());
System.out.println(cs2.encode(str).remaining());
produces:
10
10
Try this instead:
public static void main(String[] args) {
//String str = "aaaaaaaaa";
String str = "aaaaaaaaaa";
Charset cs1 = Charset.forName("ASCII");
Charset cs2 = Charset.forName("utf8");
System.out.println(toHex(cs1.encode(str)));
System.out.println(toHex(cs2.encode(str)));
}
public static String toHex(ByteBuffer buff) {
StringBuilder builder = new StringBuilder();
while (buff.remaining() > 0) {
builder.append(String.format("%02x", buff.get()));
}
return builder.toString();
}
It produces the expected:
61616161616161616161
61616161616161616161
You're assuming that the backing array for a ByteBuffer is precisely the correct size to hold the contents, but that is not necessarily so. In fact, the contents don't even need to start at the first byte of the array! Study the API for ByteBuffer and you'll understand what's going on: the readable contents start at index arrayOffset() + position() of the backing array and end just before arrayOffset() + limit().
The answer has already been given, but as I ran into the same problem, I think it might be useful to provide more details:
Calling cs1.encode(str).array() or cs2.encode(str).array() returns a reference to the whole array allocated to the ByteBuffer at that time. The capacity of the array may be greater than what's actually used. To retrieve only the used portion (assuming, as here, that the buffer's arrayOffset() and position() are both 0) you should do something like the following:
ByteBuffer bf1 = cs1.encode(str);
ByteBuffer bf2 = cs2.encode(str);
System.out.println(toHex(Arrays.copyOf(bf1.array(), bf1.limit())));
System.out.println(toHex(Arrays.copyOf(bf2.array(), bf2.limit())));
This yields the result you expect.
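A variant that does not touch the backing array at all is sketched below; it copies only the bytes between the buffer's position and limit, so it also works when the buffer has a non-zero offset or no accessible array. The class and method names are invented for this example. If all you need is a byte[], str.getBytes(charset) gives an exactly sized array directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class EncodedBytes {
    // Copy exactly the readable bytes (position..limit) out of the buffer.
    static byte[] usedBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is left untouched.
        buffer.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        String str = "aaaaaaaaaa";
        byte[] viaBuffer = usedBytes(StandardCharsets.UTF_8.encode(str));
        byte[] direct = str.getBytes(StandardCharsets.UTF_8);
        System.out.println(viaBuffer.length); // 10
        System.out.println(direct.length);    // 10
    }
}
```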

How to detect end of string in byte array to string conversion?

I receive from a socket a string in a byte array which looks like:
[128,5,6,3,45,0,0,0,0,0]
The size given by the network protocol is the total length of the string (including zeros), so 10 in my example.
If I simply do:
String myString = new String(myBuffer);
I get 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).
To get the correct size and the correct string I do this:
int sizeLabelTmp = 0;
// Iterate over the 10 bytes to get the real size of the string
for(int j = 0; j<(sizeLabel); j++) {
byte charac = datasRec[j];
if(charac == 0)
break;
sizeLabelTmp ++;
}
// Create a temp byte array to make a correct conversion
byte[] label = new byte[sizeLabelTmp];
for(int j = 0; j<(sizeLabelTmp); j++) {
label[j] = datasRec[j];
}
String myString = new String(label);
Is there a better way to handle the problem ?
Thanks
Maybe it's too late, but it may help others. The simplest thing you can do is new String(myBuffer).trim(), which gives you exactly what you want.
0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:
int size = 0;
while (size < data.length)
{
if (data[size] == 0)
{
break;
}
size++;
}
// Specify the appropriate encoding as the last argument
String myString = new String(data, 0, size, "UTF-8");
I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.
If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
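A length-prefixed exchange could look roughly like the sketch below. The framing (a 4-byte big-endian length followed by UTF-8 bytes) and all names are assumptions made for illustration; a real protocol would define its own layout. In practice the streams would wrap the socket's input and output streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class LengthPrefixed {
    // Write the encoded length first, then the encoded bytes.
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(encoded.length);
        out.write(encoded);
    }

    // Read the length, then exactly that many bytes; no padding, no terminator.
    static String readString(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] encoded = new byte[length];
        in.readFully(encoded); // throws EOFException if the data was truncated
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the socket here.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeString(new DataOutputStream(wire), "label");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(readString(in)); // label
    }
}
```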
You can always start at the end of the byte array and go backwards until you hit the first non-zero byte. Then just copy that part into a new byte array and turn it into a String. Hope this helps:
byte[] foo = {28,6,3,45,0,0,0,0};
int i = foo.length - 1;
// The bounds check stops the loop if the array contains only zeros.
while (i >= 0 && foo[i] == 0)
{
    i--;
}
byte[] bar = Arrays.copyOf(foo, i+1);
String myString = new String(bar, "UTF-8");
System.out.println(myString.length());
Will give you a result of 4.
Strings in Java aren't ended with a 0, like in some other languages. 0 will get turned into the so-called null character, which is allowed to appear in a String. I suggest you use some trimming scheme that either detects the first index of the array that's a 0 and uses a sub-array to construct the String (assuming all the rest will be 0 after that), or just construct the String and call trim(). That'll remove leading and trailing whitespace, which is any character with ASCII code 32 or lower.
The latter won't work if you have leading whitespace you must preserve. Using a StringBuilder and deleting characters at the end as long as they're the null character would work better in that case.
It appears to me that you are ignoring the read count returned by the read() method. The trailing null bytes probably weren't sent; they are probably still left over from the initial state of the buffer.
int count = in.read(buffer);
if (count < 0) {
    // End of stream: close the socket etc.
} else {
    // Only the first count bytes of the buffer were actually received.
    String s = new String(buffer, 0, count);
}
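Without diving into the protocol considerations that the OP mentioned, how about this for dropping the zeroes? Note that it removes every zero byte, not only the trailing ones, and that the cast treats each byte as one character, so it only suits ASCII data.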
public static String bytesToString(byte[] data) {
String dataOut = "";
for (int i = 0; i < data.length; i++) {
if (data[i] != 0x00)
dataOut += (char)data[i];
}
return dataOut;
}

Can a empty java string be created from non-empty UTF-8 byte array?

I'm trying to debug something and I'm wondering if the following code could ever return true
public boolean impossible(byte[] myBytes) {
if (myBytes.length == 0)
return false;
String string = new String(myBytes, "UTF-8");
return string.length() == 0;
}
Is there some value I can pass in that will return true? I've fiddled with passing in just the first byte of a 2 byte sequence, but it still produces a single character string.
To clarify, this happened on a PowerPC chip on Java 1.4 code compiled through GCJ to a native binary executable. This basically means that most bets are off. I'm mostly wondering if Java's 'normal' behaviour, or Java's spec made any promises.
According to the javadoc for java.lang.String, the behavior of new String(byte[], "UTF-8") is unspecified when the byte array contains invalid or unexpected data. If you want more predictability in your resultant string, use a CharsetDecoder: http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/CharsetDecoder.html.
Possibly.
From the Java 5 API docs "The behavior of this constructor when the given bytes are not valid in the given charset is unspecified."
I guess that it depends on:
- which version of Java you're using
- which vendor wrote your JVM (Sun, HP, IBM, the open source one, etc.)
Once the docs say "unspecified", all bets are off.
Edit: Beaten to it by Trey.
Take his advice about using a CharsetDecoder.
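For reference, a strict decode with CharsetDecoder might look like the sketch below. The class and method names are made up for this example. With CodingErrorAction.REPORT the decoder throws on malformed input instead of silently substituting characters, so the outcome for bad bytes is well defined.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class StrictUtf8 {
    // Decode UTF-8 bytes, failing loudly on any invalid sequence.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(decodeStrict(new byte[] {'o', 'k'})); // ok
            // 0xC3 starts a two-byte sequence, so on its own it is malformed.
            decodeStrict(new byte[] {(byte) 0xC3});
        } catch (CharacterCodingException e) {
            System.out.println("invalid UTF-8: " + e);
        }
    }
}
```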
If Java handles the byte order mark (BOM) correctly (I'm not sure whether that has been fixed yet), then it should be possible to pass in a byte array containing just the BOM (U+FEFF, which in UTF-8 is the byte sequence EF BB BF) and get an empty string.
Update:
I tested that method with all values of 1-3 bytes. None of them returned an empty string on Java 1.6. Here is the test code that I used with different byte array lengths:
public static void main(String[] args) throws UnsupportedEncodingException {
byte[] test = new byte[3];
byte[] end = new byte[test.length];
if (impossible(test)) {
System.out.println(Arrays.toString(test));
}
do {
increment(test, 0);
if (impossible(test)) {
System.out.println(Arrays.toString(test));
}
} while (!Arrays.equals(test, end));
}
private static void increment(byte[] arr, int i) {
arr[i]++;
if (arr[i] == 0 && i + 1 < arr.length) {
increment(arr, i + 1);
}
}
public static boolean impossible(byte[] myBytes) throws UnsupportedEncodingException {
if (myBytes.length == 0) {
return false;
}
String string = new String(myBytes, "UTF-8");
return string.length() == 0;
}
UTF-8 is a variable length encoding scheme, with most "normal" characters being single byte. So any given non-empty byte[] will always translate into a String, I'd have thought.
If you want to play it safe, write a unit test which iterates over every possible byte value, passing in a single-element array of that value, and asserts that the string is non-empty.
