Java Byte Array to String to Byte Array - java

I'm trying to understand a byte[] to string, string representation of byte[] to byte[] conversion... I convert my byte[] to a string to send, I then expect my web service (written in python) to echo the data straight back to the client.
When I send the data from my Java application...
Arrays.toString(data.toByteArray())
Bytes to send..
[B@405217f8
Send (This is the result of Arrays.toString() which should be a string representation of my byte data, this data will be sent across the wire):
[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97]
On the Python side, the server returns a string to the caller (which I can see is the same as the string I sent to the server):
[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97]
The server should return this data to the client, where it can be verified.
The response my client receives (as a string) looks like
[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97]
I can't seem to figure out how to get the received string back into a
byte[]
Whatever I seem to try I end up getting a byte array which looks as follows...
[91, 45, 52, 55, 44, 32, 49, 44, 32, 49, 54, 44, 32, 56, 52, 44, 32, 50, 44, 32, 49, 48, 49, 44, 32, 49, 49, 48, 44, 32, 56, 51, 44, 32, 49, 49, 49, 44, 32, 49, 48, 57, 44, 32, 49, 48, 49, 44, 32, 51, 50, 44, 32, 55, 56, 44, 32, 55, 48, 44, 32, 54, 55, 44, 32, 51, 50, 44, 32, 54, 56, 44, 32, 57, 55, 44, 32, 49, 49, 54, 44, 32, 57, 55, 93]
or I can get a byte representation which is as follows:
B@2a80d889
Both of these are different from my sent data... I'm sure I'm missing something truly simple....
Any help?!

You can't just take the returned string and construct a string from it... it's not a byte[] data type anymore, it's already a string; you need to parse it. For example :
String response = "[-47, 1, 16, 84, 2, 101, 110, 83, 111, 109, 101, 32, 78, 70, 67, 32, 68, 97, 116, 97]"; // response from the Python script
String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];
for (int i=0, len=bytes.length; i<len; i++) {
bytes[i] = Byte.parseByte(byteValues[i].trim());
}
String str = new String(bytes);
** EDIT **
You get a hint of your problem in your question, where you say "Whatever I seem to try I end up getting a byte array which looks as follows... [91, 45, ...". 91 is the byte value for [, so [91, 45, ... is the byte array of the string "[-47, 1, 16, ...", not of the original data.
The method Arrays.toString() will return a String representation of the specified array, meaning that the returned value will not be an array anymore. For example:
byte[] b1 = new byte[] {97, 98, 99};
String s1 = Arrays.toString(b1);
String s2 = new String(b1);
System.out.println(s1); // -> "[97, 98, 99]"
System.out.println(s2); // -> "abc";
As you can see, s1 holds the string representation of the array b1, while s2 holds the string representation of the bytes contained in b1.
Now, in your problem, your server returns a string similar to s1, therefore to get the array representation back, you need the opposite constructor method. If s2.getBytes() is the opposite of new String(b1), you need to find the opposite of Arrays.toString(b1), thus the code I pasted in the first snippet of this answer.

String coolString = "cool string";
byte[] byteArray = coolString.getBytes();
String reconstitutedString = new String(byteArray);
System.out.println(reconstitutedString);
That outputs "cool string" to the console.
It's pretty darn easy.

What I did:
return to clients:
byte[] result = ****encrypted data****;
String str = Base64.encodeBase64String(result);
return str;
receive from clients:
byte[] bytes = Base64.decodeBase64(str);
your data will be transferred in this format:
OpfyN9paAouZ2Pw+gDgGsDWzjIphmaZbUyFx5oRIN1kkQ1tDbgoi84dRfklf1OZVdpAV7TonlTDHBOr93EXIEBoY1vuQnKXaG+CJyIfrCWbEENJ0gOVBr9W3OlFcGsZW5Cf9uirSmx/JLLxTrejZzbgq3lpToYc3vkyPy5Y/oFWYljy/3OcC/S458uZFOc/FfDqWGtT9pTUdxLDOwQ6EMe0oJBlMXm8J2tGnRja4F/aVHfQddha2nUMi6zlvAm8i9KnsWmQG//ok25EHDbrFBP2Ia/6Bx/SGS4skk/0couKwcPVXtTq8qpNh/aYK1mclg7TBKHfF+DHppwd30VULpA==

What Arrays.toString() does is create a string representation of each individual byte in your byteArray.
Please check the API documentation
Arrays API
To convert your response string back to the original byte array, you have to use split(",") or something and convert it into a collection and then convert each individual item in there to a byte to recreate your byte array.
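The parsing step described above could be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name ByteArrayText and the method parse are hypothetical, and it assumes the input is exactly the format produced by Arrays.toString(byte[]).

```java
import java.util.Arrays;

final class ByteArrayText {
    // Parses text in the format produced by Arrays.toString(byte[]),
    // e.g. "[-47, 1, 16]", back into a byte array.
    // Hypothetical helper; assumes well-formed input.
    static byte[] parse(String text) {
        String inner = text.trim();
        // Drop the surrounding '[' and ']'.
        inner = inner.substring(1, inner.length() - 1).trim();
        if (inner.isEmpty()) {
            return new byte[0]; // "[]" represents an empty array
        }
        String[] parts = inner.split(",");
        byte[] result = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Byte.parseByte accepts values from -128 to 127.
            result[i] = Byte.parseByte(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] original = {-47, 1, 16, 84};
        String sent = Arrays.toString(original);   // "[-47, 1, 16, 84]"
        byte[] received = parse(sent);
        System.out.println(Arrays.equals(original, received)); // true
    }
}
```

Byte.parseByte throws NumberFormatException on anything outside the byte range, which is usually preferable to silently truncating a corrupted value.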

It's simple to convert a byte array to a string and a string back to a byte array in Java. We need to know when to use 'new' in the right way.
It can be done as follows:
byte array to string conversion:
byte[] bytes = initializeByteArray();
String str = new String(bytes);
String to byte array conversion:
String str = "Hello";
byte[] bytes = str.getBytes();
For more details, look at:
http://evverythingatonce.blogspot.in/2014/01/tech-talkbyte-array-and-string.html

The kind of output you are seeing from your byte array ([B@405217f8) is also an output for a zero length byte array (ie new byte[0]). It looks like this string is a reference to the array rather than a description of the contents of the array like we might expect from a regular collection's toString() method.
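A small sketch of that difference, using a hypothetical class name ArrayPrinting. The hash part after the @ varies from run to run, so only the "[B@" prefix is checked.

```java
import java.util.Arrays;

final class ArrayPrinting {
    // Returns the default Object.toString() text of an array: type name, '@', hash.
    static String identityText(byte[] data) {
        return data.toString();
    }

    public static void main(String[] args) {
        byte[] data = {97, 98, 99};
        // The default text does not depend on the contents, only on the type and identity.
        System.out.println(identityText(data).startsWith("[B@"));        // true
        System.out.println(identityText(new byte[0]).startsWith("[B@")); // true
        // Arrays.toString describes the contents instead.
        System.out.println(Arrays.toString(data));                       // [97, 98, 99]
    }
}
```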
As with other respondents, I would point you to the String constructors that accept a byte[] parameter to construct a string from the contents of a byte array. You should be able to read raw bytes from a socket's InputStream if you want to obtain bytes from a TCP connection.
If you have already read those bytes as a String (using an InputStreamReader), then, the string can be converted to bytes using the getBytes() function. Be sure to pass in your desired character set to both the String constructor and getBytes() functions, and this will only work if the byte data can be converted to characters by the InputStreamReader.
If you want to deal with raw bytes you should really avoid using this stream reader layer.
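Reading raw bytes without a Reader layer might look like the sketch below. The class RawBytes and the method readAll are hypothetical names, and a ByteArrayInputStream stands in for the stream you would get from a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class RawBytes {
    // Reads every byte from the stream until end of stream, with no character decoding.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sent = {-47, 1, 16, 84};
        // Stand-in for an InputStream obtained from a network connection.
        InputStream in = new ByteArrayInputStream(sent);
        System.out.println(Arrays.equals(sent, readAll(in))); // true
    }
}
```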

Can you not just send the bytes as bytes, or convert each byte to a character and send as a string? Doing it like you are will take up a minimum of 85 characters in the string, when you only have 11 bytes to send. You could create a string representation of the bytes, so it'd be "[B@405217f8", which can easily be converted to a bytes or bytearray object in Python. Failing that, you could represent them as a series of hexadecimal digits ("5b42403430353231376638") taking up 22 characters, which could be easily decoded on the Python side using binascii.unhexlify().
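On the Java side, the hexadecimal idea could be sketched like this. The class Hex and its methods are hypothetical helpers written by hand so the sketch does not depend on a particular JDK version. Each byte becomes exactly two lowercase hex digits.

```java
import java.util.Arrays;

final class Hex {
    // Encodes each byte as two lowercase hexadecimal digits.
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // For a Byte argument, %02x prints the unsigned two's-complement value.
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Decodes a string of hex digit pairs back into bytes.
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {-47, 1, 16, 84};
        String wire = encode(data);
        System.out.println(wire);                               // d1011054
        System.out.println(Arrays.equals(data, decode(wire)));  // true
    }
}
```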

[JDK8]
import java.util.Base64;
To string:
String str = Base64.getEncoder().encodeToString(new byte[]{ -47, 1, 16, ... });
To byte array:
byte[] bytes = Base64.getDecoder().decode("JVBERi0xLjQKMyAwIG9iago8P...");

If you want to convert the string back into a byte array you will need to use String.getBytes() (or equivalent Python function) and this will allow you print out the original byte array.

Use the following API call to convert a Base64-encoded string to a byte array.
byte[] byteArray = DatatypeConverter.parseBase64Binary("JVBERi0xLjQKMyAwIG9iago8P...");

[JAVA 8]
import java.util.Base64;
String dummy= "dummy string";
byte[] byteArray = dummy.getBytes();
byte[] salt = new byte[]{ -47, 1, 16, ... };
String encoded = Base64.getEncoder().encodeToString(salt);

You can do the following to convert byte array to string and then convert that string to byte array:
// 1. convert byte array to string and then string to byte array
// convert byte array to string
byte[] by_original = {0, 1, -2, 3, -4, -5, 6};
String str1 = Arrays.toString(by_original);
System.out.println(str1); // output: [0, 1, -2, 3, -4, -5, 6]
// convert string to byte array
String newString = str1.substring(1, str1.length()-1);
String[] stringArray = newString.split(", ");
byte[] by_new = new byte[stringArray.length];
for(int i=0; i<stringArray.length; i++) {
by_new[i] = (byte) Integer.parseInt(stringArray[i]);
}
System.out.println(Arrays.toString(by_new)); // output: [0, 1, -2, 3, -4, -5, 6]
But to convert the string to byte array and then convert that byte array to string, below approach can be used:
// 2. convert string to byte array and then byte array to string
// convert string to byte array
String str2 = "[0, 1, -2, 3, -4, -5, 6]";
byte[] byteStr2 = str2.getBytes(StandardCharsets.UTF_8);
// Now byteStr2 is [91, 48, 44, 32, 49, 44, 32, 45, 50, 44, 32, 51, 44, 32, 45, 52, 44, 32, 45, 53, 44, 32, 54, 93]
// convert byte array to string
System.out.println(new String(byteStr2, StandardCharsets.UTF_8)); // output: [0, 1, -2, 3, -4, -5, 6]
I have also answered the same in the following question:
https://stackoverflow.com/a/70486387/17364272

Related

String (bytes[] Charset) is returning results differently in Java7 and java 8

import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
public class Java87String {
public static void main(String[] args) throws UnsupportedEncodingException {
// TODO Auto-generated method stub
//byte[] b = {-101, 53, -51, -26, 24, 60, 20, -31, -6, 45, 50, 103, -66, 28, 114, -39, 92, 23, -47, 32, -5, -122, -28, 79, 22, -76, 116, -122, -54, -122};
//byte[] b = {-76, -55, 85, -50, 80, -23, 27, 62, -94, -74, 47, -123, -119, 94, 90, 61, -63, 73, 56, -48, -54, -4, 11, 79};
byte[] b = { -5, -122, -28};
System.out.println("Input Array :" + Arrays.toString(b));
System.out.println("Array Length : " + b.length);
String target = new String(b,StandardCharsets.UTF_8);
System.out.println(Arrays.toString(target.getBytes("UTF-8")));
System.out.println("Final Key :" + target);
}
}
The above code returns the following output in Java 7
Input Array :[-5, -122, -28]
Array Length : 3
[-17, -65, -67]
Final Key :�
The Same code returns the following output in Java 8
Input Array :[-5, -122, -28]
Array Length : 3
[-17, -65, -67, -17, -65, -67, -17, -65, -67]
Final Key :���
Sounds like Java8 is doing the right thing of replacing with the default sequence of [-17, -65, -67].
Why is there a difference in output, and is there a known bug in JDK 1.7 that explains it?
Per the String JavaDoc:
The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
I think (-5, -122, -28) is an invalid UTF-8 byte sequence, so the JVM may output anything in this case. If it were a valid one, the different Java versions would presumably show the same output.
Does this specific byte sequence have a meaning? just curious
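The CharsetDecoder route mentioned in the JavaDoc quote could be sketched as below. The class StrictUtf8 and the method decodeOrNull are hypothetical names. The decoder is configured to report malformed input instead of substituting a replacement character, so invalid bytes are detected the same way regardless of how the String constructor behaves.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Returns the decoded text, or null if the bytes are not valid UTF-8.
    static String decodeOrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // malformed or unmappable input
        }
    }

    public static void main(String[] args) {
        byte[] invalid = {-5, -122, -28};   // the sequence from the question
        byte[] valid = {97, 98, 99};        // "abc"
        System.out.println(decodeOrNull(invalid)); // null
        System.out.println(decodeOrNull(valid));   // abc
    }
}
```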

Retrieving bytes of String returns different results in ObjC than Java

I've got a string that I'm trying to convert to bytes in order to create an md5 hash in both ObjC and Java. For some reason, the bytes are different between the two languages.
Java
System.out.println(Arrays.toString(
("78b4a02fa139a2944f17b4edc22fb175:8907f3c4861140ad84e20c8e987eeae6").getBytes()));
Output:
[55, 56, 98, 52, 97, 48, 50, 102, 97, 49, 51, 57, 97, 50, 57, 52, 52, 102, 49, 55, 98, 52, 101, 100, 99, 50, 50, 102, 98, 49, 55, 53, 58, 56, 57, 48, 55, 102, 51, 99, 52, 56, 54, 49, 49, 52, 48, 97, 100, 56, 52, 101, 50, 48, 99, 56, 101, 57, 56, 55, 101, 101, 97, 101, 54]
ObjC
NSString *str = @"78b4a02fa139a2944f17b4edc22fb175:8907f3c4861140ad84e20c8e987eeae6";
NSData *bytes = [str dataUsingEncoding:NSISOLatin1StringEncoding allowLossyConversion:NO];
NSLog(@"%@", [bytes description]);
Output:
<37386234 61303266 61313339 61323934 34663137 62346564 63323266 62313735 3a383930 37663363 34383631 31343061 64383465 32306338 65393837 65656165 36>
I've tried using different charsets with no luck and can't think of any other reasons why the bytes would be different. Any ideas? I did notice that all of the byte values are different by some factor of 18 but am not sure what is causing it.
Actually, Java is printing in decimal, byte by byte. Obj C is printing in hex, integer by integer.
Referring this chart:
Dec Hex
55 37
56 38
98 62
...
You'll just have to find a way to output byte by byte in Obj C.
I don't know about Obj C, but if that NSLog function works similar to printf() in C, I'd start with that.
A code snippet from Apple
unsigned char aBuffer[20];
NSString *myString = @"Test string.";
const char *utfString = [myString UTF8String];
NSData *myData = [NSData dataWithBytes: utfString length: strlen(utfString)];
[myData getBytes:aBuffer length:20];
The change in bytes can be due to Hex representation. The above code shows how to convert the string to bytes and store the result in a buffer.
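To check from the Java side that the two outputs describe the same bytes, the Java bytes can be printed in hex. The sketch below uses a hypothetical class HexDump whose describe method imitates the grouped layout shown in the question's ObjC output (angle brackets, a space every four bytes). That layout is copied from the output above, not taken from any Apple specification.

```java
import java.nio.charset.StandardCharsets;

final class HexDump {
    // Formats bytes as hex in groups of four bytes, e.g. "<37386234 61>".
    static String describe(byte[] bytes) {
        StringBuilder sb = new StringBuilder("<");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && i % 4 == 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i]));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // First five characters of the string from the question.
        byte[] bytes = "78b4a".getBytes(StandardCharsets.ISO_8859_1);
        // Decimal 55, 56, 98, 52, 97 is hex 37, 38, 62, 34, 61.
        System.out.println(describe(bytes)); // <37386234 61>
    }
}
```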

Java get string from byte array

I am modding a java program and in it a handler receives 2 byte arrays
When I print those arrays using a line of code like this
java.util.Arrays.toString(this.part1)
I get an output like this
[43, 83, 123, 97, 104, -10, -4, 124, -113, -56, 118, -23, -25, -13, -9, -85, 58, -66, -34, 38, -55, -28, -40, 125, 22, -83, -72, -93, 73, -117, -59, 72, 105, -17, 3, -53, 121, -21, -19, 103, 101, -71, 54, 37...
I know these byte arrays contain a string. How might I get that string from them?
Here is the code
public void readPacketData(PacketBuffer data) throws IOException
{
this.field_149302_a = data.readByteArray();
this.field_149301_b = data.readByteArray();
String packet1 = (java.util.Arrays.toString(this.field_149302_a));
String packet2 = (java.util.Arrays.toString(this.field_149301_b));
}
In order to convert Byte array into String format correctly, we have to explicitly create a String object and assign the Byte array to it. You can try this:
String str = new String(this.part1, "UTF-8"); //for UTF-8 encoding
System.out.println(str);
Please note that the byte array contains characters in a special encoding (that you must know).
String has a constructor from byte[], so you could just call new String(this.part1), or, if the bytes do not represent a string in the platform's default charset, use the overloaded flavor and pass the charset too.
Actually, to convert bytes to a String you need the encoding name. You need to change UTF-8 to the correct encoding name in the first answer to avoid wrong output; try UTF-16 or one of https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html (try to choose by your locale).
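Why the charset matters can be seen in a short sketch. The class CharsetMatters is a hypothetical name, and the two bytes are the UTF-8 encoding of the single character U+00E9. Decoded as UTF-8 they yield one character, and decoded as ISO-8859-1 they yield two.

```java
import java.nio.charset.StandardCharsets;

final class CharsetMatters {
    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = {-61, -87}; // 0xC3 0xA9, the UTF-8 encoding of U+00E9
        System.out.println(asUtf8(bytes).length());   // 1
        System.out.println(asLatin1(bytes).length()); // 2
    }
}
```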

How to decode a Base64 string in Scala or Java?

I have a string encoded in Base64:
eJx9xEERACAIBMBKJyKDcTzR_hEsgOxjAcBQFVVNvi3qEsrRnWXwbhHOmzWnctPHPVkPu-4vBQ==
How can I decode it in Scala language?
I tried to use:
val bytes1 = new sun.misc.BASE64Decoder().decodeBuffer(compressed_code_string)
But when I compare the byte array with the correct one that I generated in Python language, there is an error. Here is the command I used in python:
import base64
base64.urlsafe_b64decode(compressed_code_string)
The Byte Array in Scala is:
(120, -100, 125, -60, 65, 17, 0, 32, 8, 4, -64, 74, 39, 34, -125, 113, 60, -47, -2, 17, 44, -128, -20, 99, 1, -64, 80, 21, 85, 77, -66, 45, -22, 18, -54, -47, -99, 101, -16, 110, 17, -50, -101, 53, -89, 114, -45, -57, 61, 89, 15, -69, -2, 47, 5)
And the one generated in python is:
(120, -100, 125, -60, 65, 17, 0, 32, 8, 4, -64, 74, 39, 34, -125, 113, 60, -47, -2, 17, 44, -128, -20, 99, 1, -64, 80, 21, 85, 77, -66, 45, -22, 18, -54, -47, -99, 101, -16, 110, 17, -50, -101, 53, -89, 114, -45, -57, 61, 89, 15, -69, -18, 47, 5)
Note that there is a single difference in the end of the array
In Scala, Encoding a String to Base64 and decoding back to the original String using Java APIs:
import java.util.Base64
import java.nio.charset.StandardCharsets
scala> val bytes = "foo".getBytes(StandardCharsets.UTF_8)
bytes: Array[Byte] = Array(102, 111, 111)
scala> val encoded = Base64.getEncoder().encodeToString(bytes)
encoded: String = Zm9v
scala> val decoded = Base64.getDecoder().decode(encoded)
decoded: Array[Byte] = Array(102, 111, 111)
scala> val str = new String(decoded, StandardCharsets.UTF_8)
str: String = foo
There is unfortunately not just one Base64 encoding. The - character doesn't have the same representation in all encodings. For example, in the MIME encoding, it's not used at all. In the encoding for URLs, it is a value of 62--and this is the one that Python is using. The default sun.misc decoder wants + for 62. If you change the - to +, you get the correct answer (i.e. the Python answer).
In Scala, you can convert the string s to MIME format like so:
s.map{ case '-' => '+'; case '_' => '/'; case c => c }
and then the Java MIME decoder will work.
Both Python and Java are correct in terms of the decoding. They are just using a different RFC for this purpose. Python library is using RFC 3548 and the used java library is using RFC 4648 and RFC 2045.
Changing the hyphen (-) to a plus (+) in your input string will make the two decoded byte arrays match.
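With java.util.Base64 (Java 8 and later) the character substitution is not needed, because the class provides a decoder for the URL-safe alphabet. This is a sketch of that alternative rather than the sun.misc approach from the question, and the class name UrlSafeBase64 is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

final class UrlSafeBase64 {
    // Decodes Base64 text that uses '-' and '_' instead of '+' and '/'.
    static byte[] decodeUrlSafe(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        String s = "eJx9xEERACAIBMBKJyKDcTzR_hEsgOxjAcBQFVVNvi3qEsrRnWXwbhHOmzWnctPHPVkPu-4vBQ==";
        byte[] bytes = decodeUrlSafe(s);
        // Print the last four bytes, which is where the two arrays in the question differ.
        System.out.println(Arrays.toString(
                Arrays.copyOfRange(bytes, bytes.length - 4, bytes.length))); // [-69, -18, 47, 5]
        try {
            Base64.getDecoder().decode(s); // the standard alphabet has no '-' or '_'
        } catch (IllegalArgumentException e) {
            System.out.println("standard decoder rejects the input");
        }
    }
}
```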

byte[] to string and back to byte[]

I have a problem with interpreting a file. The file is built as follows:
"name"-#-"date"-#-"author"-#-"signature"
The signature is a byte array. When I read the file back in, I parse it to a String and split it:
myFileInpuStream.read(fileContent);
String[] data = new String(fileContent).split("-#-");
If I look at the variable fileContent I see that the bytes are all good.
But when I try to get the signature byte array:
byte[] signature= data[3].getBytes();
Sometimes I get wrong values of 63. I tried a few solutions with:
new String(fileContent, "UTF-8")
But no luck. Can someone help?
The signature does not have a fixed length, so I cannot hard-code it...
Some extra info:
Original signature:
[48, 45, 2, 21, 0, -123, -3, -5, -115, 84, -86, 26, -124, -112,
75, -10, -1, -56, 40, 13, -46, 6, 120, -56, 100, 2, 20, 66, -92, -8,
48, -88, 101, 57, 56, 20, 125, -32, -49, -123, 73, 96, 76, -82, 81,
51, 69]
filecontent(var after reading):
... 48, 45, 2, 21, 0, -123, -3, -5, -115, 84, -86, 26, -124, -112,
75, -10, -1, -56, 40, 13, -46, 6, 120, -56, 100, 2, 20, 66, -92, -8,
48, -88, 101, 57, 56, 20, 125, -32, -49, -123, 73, 96, 76, -82, 81,
51, 69]
signature (after split and getBytes()):
[48, 45, 2, 21, 0, -123, -3, -5, 63, 84, -86, 26, -124, 63, 75,
-10, -1, -56, 40, 13, -46, 6, 120, -56, 100, 2, 20, 66, -92, -8, 48, -88, 101, 57, 56, 20, 125, -32, -49, -123, 73, 96, 76, -82, 81, 51, 69]
You can't access data[4] because you have 4 String in your table. So you can access data from 0 to 3.
data[0] = name
data[1] = date
data[2] = author
data[3] = signature
The solution :
byte[] signature = data[3].getBytes();
Edit: I think I finally understand what you are doing.
You have four parts: name, date, author, signature. The name and author are strings, the date is a date and the signature is a hashed or encrypted array of bytes. You want to store them as text in a file, separated by -#-. To do this, you first need to convert each to a valid string. Name and author are already strings. Converting a date to string is easy. Converting an array of bytes to string is not easy.
You can use base64 encoding to convert a byte array to a string. Use javax.xml.bind.DatatypeConverter printBase64Binary() for encoding and javax.xml.bind.DatatypeConverter parseBase64Binary() for decoding.
For example, if you have a name denBelg, date 2013-03-19, author Virtlink and this signature:
30 2D 02 15 00 85 FD FB 8D 54 AA 1A 84 90 4B F6 FF C8 28 0D D2 06 78 C8 64 02 14
42 A4 F8 30 A8 65 39 38 14 7D E0 CF 85 49 60 4C AE 51 33 45
Then, after concatenation and base64 encoding of the signature, the resulting string became, for example:
denBelg-#-20130319-#-Virtlink-#-MC0CFQCF/fuNVKoahJBL9v/IKA3SBnjIZAIUQqT4MKhlOTgUfeDPhUlgTK5RM0U=
Later, when you split the string on -#- you can decode the base64 signature part and get back an array of bytes.
Note that when the name or author can include -#- in their name, they can mess up your code. For example, if I set name as den-#-Belg then your code would fail.
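The Base64 approach could be sketched as follows. Note the swap: this sketch uses java.util.Base64 (Java 8 and later) rather than the javax.xml.bind.DatatypeConverter methods named above, and the class SignedRecord with its methods is a hypothetical illustration. It still assumes that name, date and author never contain the separator.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.regex.Pattern;

final class SignedRecord {
    static final String SEP = "-#-";

    // Builds one text record; the binary signature is stored as Base64 text.
    static String join(String name, String date, String author, byte[] signature) {
        return name + SEP + date + SEP + author + SEP
                + Base64.getEncoder().encodeToString(signature);
    }

    // Extracts the fourth field and decodes it back to the original bytes.
    static byte[] signatureOf(String record) {
        String[] parts = record.split(Pattern.quote(SEP));
        return Base64.getDecoder().decode(parts[3]);
    }

    public static void main(String[] args) {
        // First ten bytes of the signature from the question, including -115.
        byte[] signature = {48, 45, 2, 21, 0, -123, -3, -5, -115, 84};
        String record = join("denBelg", "20130319", "Virtlink", signature);
        System.out.println(Arrays.equals(signature, signatureOf(record))); // true
    }
}
```

The standard Base64 alphabet contains neither - nor #, so the encoded signature itself can never be mistaken for a separator.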
Original post:
Java's String.getBytes() uses the platform default encoding for the string. Encoding is the way string characters are mapped to bytes values. So, depending on the platform the resulting bytes may be different.
Fix the encoding to UTF-8 and read it with the same encoding, and your problems will go away.
byte[] signature = data[3].getBytes("UTF-8");
String sigdata = new String(signature, "UTF-8");
0-???����T�?��K���(
�?x�d??B��0�e98?}�υI`L�Q3E
Your example represents some garbled mess of characters (is it encrypted or something?), but the bytes you highlighted show the problem:
You start with a byte value of -115. The minus indicates it is a byte value above 0x7F, whose character representation highly depends on the encoding used. Let's assume extended US-ASCII, then your byte represents (according to this table) the character ì (with an accent). Now when you decode it the decoder (depending on the encoding you use) might not understand the byte value 0x8D and instead represents it with a question mark ?. Note that the question mark is US-ASCII character 63, and that's where your 63 came from.
So make sure you use your encodings consistently and don't rely on the system's default.
Also, never use string encoding to decode byte arrays that do not represent strings (e.g. hashes or other cryptographic content).
According to your comment you are trying to read encrypted data (which are bytes) and converting them to a string using a decoder? That will never work in any way you expect it to. After you've encrypted something you have an array of bytes which you should store as-is. When you read them back, you have to put the bytes through a decrypter to regain the unencrypted bytes. Only if those decrypted bytes represent a string, then you can use an encoding to decode the string.
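The effect is easy to reproduce. In the sketch below (hypothetical class LossyRoundTrip), US-ASCII is used as the charset only because its behavior is simple to reason about; the question's platform default was probably a different charset. Any byte the charset cannot decode becomes a replacement character, which is then encoded as ? (63).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyRoundTrip {
    // Decodes bytes to a String and encodes them again with the same charset.
    static byte[] throughString(byte[] raw) {
        String text = new String(raw, StandardCharsets.US_ASCII);
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = {48, 45, -115, 84}; // -115 (0x8D) is not a US-ASCII character
        System.out.println(Arrays.toString(throughString(raw))); // [48, 45, 63, 84]
    }
}
```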
You're making extra work for yourself by converting these bytes into Strings by hand. Why aren't you doing it using the classes intended for this?
// get the file /logs/access.log
Path path = FileSystems.getDefault().getPath("/logs", "access.log");
// open it, decoding UTF-8
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
// read a line of text, properly decoded
String line = reader.readLine();
Or, if you're in Java 6:
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("/logs/access.log"), "UTF-8"));
String line = reader.readLine();
Links:
Files.newBufferedReader
InputStreamReader
Sounds like an encoding issue to me.
First you need to know what encoding your file is using, and use that when reading the file.
Secondly, you say your signature is a byte array, but Java strings are always Unicode. If you want a different encoding (I'm guessing you want ASCII), you need to do getBytes("US-ASCII").
Of course, if your input was ascii, it would be strange that this could cause encoding issues.
