I have a string, which is returned by the Jericho HTML parser and contains some Russian text. According to source.getEncoding() and the header of the respective HTML file, the encoding is Windows-1251.
How can I convert this string to something readable?
I tried this:
import java.io.UnsupportedEncodingException;
public class Program {
public void run() throws UnsupportedEncodingException {
final String windows1251String = getWindows1251String();
System.out.println("String (Windows-1251): " + windows1251String);
final String readableString = convertString(windows1251String);
System.out.println("String (converted): " + readableString);
}
private String convertString(String windows1251String) throws UnsupportedEncodingException {
return new String(windows1251String.getBytes(), "UTF-8");
}
private String getWindows1251String() {
final byte[] bytes = new byte[] {32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32};
return new String(bytes);
}
public static void main(final String[] args) throws UnsupportedEncodingException {
final Program program = new Program();
program.run();
}
}
The variable bytes contains the data shown in my debugger, it's the result of net.htmlparser.jericho.Element.getContent().toString().getBytes(). I just copy and pasted that array here.
This doesn't work - readableString contains garbage.
How can I fix it, i. e. make sure that the Windows-1251 string is decoded properly?
Update 1 (30.07.2015 12:45 MSK): When change the encoding in the call in convertString to Windows-1251, nothing changes. See the screenshot below.
Update 2: Another attempt:
Update 3 (30.07.2015 14:38): The texts that I need to decode correspond to the texts in the drop-down list shown below.
Update 4 (30.07.2015 14:41): The encoding detector (code see below) says that the encoding is not Windows-1251, but UTF-8.
public static String guessEncoding(byte[] bytes) {
String DEFAULT_ENCODING = "UTF-8";
org.mozilla.universalchardet.UniversalDetector detector =
new org.mozilla.universalchardet.UniversalDetector(null);
detector.handleData(bytes, 0, bytes.length);
detector.dataEnd();
String encoding = detector.getDetectedCharset();
System.out.println("Detected encoding: " + encoding);
detector.reset();
if (encoding == null) {
encoding = DEFAULT_ENCODING;
}
return encoding;
}
I fixed this problem by modifying the piece of code, which read the text from the web site.
private String readContent(final String urlAsString) {
final StringBuilder content = new StringBuilder();
BufferedReader reader = null;
InputStream inputStream = null;
try {
final URL url = new URL(urlAsString);
inputStream = url.openStream();
reader =
new BufferedReader(new InputStreamReader(inputStream);
String inputLine;
while ((inputLine = reader.readLine()) != null) {
content.append(inputLine);
}
} catch (final IOException exception) {
exception.printStackTrace();
} finally {
IOUtils.closeQuietly(reader);
IOUtils.closeQuietly(inputStream);
}
return content.toString();
}
I changed the line
new BufferedReader(new InputStreamReader(inputStream);
to
new BufferedReader(new InputStreamReader(inputStream, "Windows-1251"));
and then it worked.
(In the light of updates I deleted my original answer and started again)
The text which appears
пїЅпїЅпїЅпїЅпїЅпїЅ
is an accurate decoding of these byte values
-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67
(Padded at either end with 32, which is space.)
So either
1) The text is garbage or
2) The text is supposed to look like that or
3) The encoding is not Windows-1215
This line is notably wrong
return new String(windows1251String.getBytes(), "UTF-8");
Extracting the bytes out of a string and constructing a new string from that is not a way of "converting" between encodings. Both the input String and the output String use UTF-16 encoding internally (and you don't normally even need to know or care about that). The only times other encodings come into play are when text data is stored OUTSIDE of a string object - ie in your initial byte array. Conversion occurs when the String is constructed and then it is done. There is no conversion from one String type to another - they are all the same.
The fact that this
return new String(bytes);
does the same as this
return new String(bytes, "Windows-1251");
suggests that Windows-1251 is the platforms default encoding. (Which is further supported by your timezone being MSK)
Just to make sure you understand 100% how java deals with char and byte.
byte[] input = new byte[1];
// values > 127 become negative when you put them in an array.
input[0] = (byte)239; // the array contains value -17 now.
// but all 255 values are preserved.
// But if you cast them to integers, you should use their unsigned value.
// (casting alone isn't enough).
int output = input[0] & 0xFF; // output is 239 again
// you shouldn't cast directly from a single-byte to a char.
// because: char is 16-bit ; but you only want to use 1 byte ; unfortunately your negative values will be applied in the 2nd byte, and break it.
char corrupted = (char) input[0]; // char-code: 65519 (2 bytes are used)
char corrupted = (char) ((int)input[0]); // char-code: 66519 (2 bytes are used)
// just casting to an integer/character is ok for values < 0x7F though
// values < 0x7F are always positive, even when casted to byte
// AND the first 7-bits in any ascii-encodings (e.g. windows-1251) are identical.
byte simple = (byte) 'a';
char chr = (char) ascii_LT_7F; // will result in 'a' again
// But it's still more reliable to use the & 0xFF conversion.
// Because it ensures that your character can never be greater than char code 255 (a single byte), even when the byte is unexpectedly negative (> 0x7F).
char chr = (char) ((byte)simple & 0xFF); // also results in 'a'
// for value 239 (which is 0xEF) it's impossible though.
// a java char is 16-bit encoded internally, following the unicode character set.
// characters 0x00 to 0x7F are identical in most encodings.
// but e.g. 0xEF in windows-1251 does not match 0xEF in UTF-16.
// so, this is a bad idea.
char corrupted = (char) (input[0] & 0xFF);
// And that's something you can only fix by using encodings.
// It's good practice to use encodings really just ALWAYS.
// the encoding indicates what your bytes[] are encoded in NOW.
// your bytes will be converted to 16-bit characters.
String text = new String(bytes, "from-encoding");
// if you want to change that text back to bytes, use an encoding !!
// this time the encoding specifies is the TARGET-ENCODING.
byte[] bytes = text.getBytes("to-encoding");
I hope this helps.
As for the displayed values:
I can confirm that the byte[] is displayed correctly. I checked them in the Windows-1251 code page. (byte -17 = int 239 = 0xEF = char 'п')
In other words, your byte values are incorrect, or it's a different source-encoding.
Related
Since today I'm fronting a really weird error related to byte[] to String conversion.
Here is the code:
private static final byte[] test_key = {-112, -57, -45, 125, 91, 126, -118, 13, 83, -60, -119, 57, 38, 118, -115, -52, -92, 39, -24, 75, 59, -21, 88, 84, 66, -125};
public static void main(String[] args) {
byte[] encryptedArray = xor("ciao".getBytes(), test_key);
System.out.println("Encrypted arrray: " + Arrays.toString(encryptedArray));
final String encrypted = new String(encryptedArray);
System.out.println("Length: " + new String(encryptedArray).length());
System.out.println(Arrays.toString(encrypted.getBytes()));
System.out.println("Encrypted value: " + encrypted);
System.out.println("Decrypted value: " + new String(xor(encrypted.getBytes(), test_key)));
}
private static byte[] xor(byte[] data, byte[] key) {
byte[] result = new byte[data.length];
for (int i = 0; i < data.length; i++) {
result[i] = (byte) (data[i] ^ key[i % key.length]);
}
return result;
}
My output is:
Encrypted arrray: [-13, -82, -78, 18]
Length: 2
[-17, -65, -67, 18]
Encrypted value: �
Decrypted value: xno
Why does length() return 2? What am I missing?
There is no 1-to-1 mapping between byte and char, rather it depends on the charset you use. Strings are logically chars sequences. So if you want to convert between chars and bytes, you need a character encoding, which specifies the mapping from chars to bytes, and vice versa. Your bytes in encryptedArray are first converted to Unicode string, which attempts to create UTF-8 char sequence from these bytes.
If you want to use String and revert back the exact bytes, you need to do a Base64 of the encryptedArray and then do a new String() of it:
String encoded = new String(Base64.getEncoder().encode(encryptedArray));
To retreive, just decode:
Base64.getDecoder().decode(encoded);
I just thought of a good way of showing what happens by simply replacing the new String(byte[]) method by another one, which is why I will answer the question. This one performs the same basic action as the constructor, with one change: it throws an exception if any invalid characters are found.
private static final byte[] test_key = {-112, -57, -45, 125, 91, 126, -118, 13, 83, -60, -119, 57, 38, 118, -115, -52, -92, 39, -24, 75, 59, -21, 88, 84, 66, -125};
public static void main(String[] args) throws Exception {
byte[] encryptedArray = xor("ciao".getBytes(), test_key);
System.out.println("Encrypted arrray: " + Arrays.toString(encryptedArray));
final String encrypted = new String(encryptedArray);
// original
System.out.println("Length: " + new String(encryptedArray).length());
// replacement
System.out.println("Length: " + decode(encryptedArray).length());
System.out.println(Arrays.toString(encrypted.getBytes()));
System.out.println("Encrypted value: " + encrypted);
System.out.println("Decrypted value: " + new String(xor(encrypted.getBytes(), test_key)));
}
private static String decode(byte[] encryptedArray) throws CharacterCodingException {
var decoder = Charset.defaultCharset().newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
var decoded = decoder.decode(ByteBuffer.wrap(encryptedArray));
return decoded.toString();
}
private static byte[] xor(byte[] data, byte[] key) {
byte[] result = new byte[data.length];
for (int i = 0; i < data.length; i++) {
result[i] = (byte) (data[i] ^ key[i % key.length]);
}
return result;
}
The method is called decode because that's what you are actually doing: you are decoding the bytes to a text. A character encoding is the encoding of characters as bytes, which means that the opposite must be decoding after all.
As you will see, the above will first print out 2 if your platform uses the default UTF-8 encoding (Linux, Android, MacOS). You can get the same result by replacing Charset.defaultCharset() with StandardCharsets.UTF_8 on Windows which uses the Windows-1252 charset instead (a single byte encoding which is an expansion of Latin-1, which itself is an expansion of ASCII). However, it will generate the following exception if you use the decode method:
java.nio.charset.MalformedInputException: Input length = 3
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:815)
at StackExchange/com.stackexchange.so.ShowBadEncoding.decode(ShowBadEncoding.java:36)
at StackExchange/com.stackexchange.so.ShowBadEncoding.main(ShowBadEncoding.java:24)
Now maybe you'd expect 4 here, the size of the byte array. But note that UTF-8 characters may be encoded over multiple bytes. The error occurs not on the entire string, but on the last character it is trying to read. Obviously it is expecting a longer encoding based on the previous byte values.
If you replace REPORT with the default decoding action REPLACE (heh) you will see that the result is identical to the constructor, and length() will now return the value 2 again.
Of course, Topaco is correct when he says you need to use base 64 encoding. This encodes bytes to characters instead so that all of the meaning of the bytes is maintained, and the reverse is of course the decoding of text back to bytes.
The elements of a String are not bytes, they are chars. A char is not a byte.
There are many ways of converting a char to a sequence of bytes (i.e., many character-set encodings).
Not every sequence of chars can be converted to a sequence of bytes; there is not always a mapping for every char. It depends on your chosen character-set encoding.
Not every sequence of bytes can be converted to a String; the bytes have to be syntactically valid for the specified character set.
I have to convert a byte array to string in Android, but my byte array contains negative values.
If I convert that string again to byte array, values I am getting are different from original byte array values.
What can I do to get proper conversion? Code I am using to do the conversion is as follows:
// Code to convert byte arr to str:
byte[] by_original = {0,1,-2,3,-4,-5,6};
String str1 = new String(by_original);
System.out.println("str1 >> "+str1);
// Code to convert str to byte arr:
byte[] by_new = str1.getBytes();
for(int i=0;i<by_new.length;i++)
System.out.println("by1["+i+"] >> "+str1);
I am stuck in this problem.
Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:
byte[] bytes = {...}
String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding
There are a bunch of encodings you can use, look at the supported encodings in the Oracle javadocs.
The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
If you really must use a String to hold binary data then the safest way is to use Base64 encoding.
The root problem is (I think) that you are unwittingly using a character set for which:
bytes != encode(decode(bytes))
in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.
The solution is:
Be explicit about the character encoding you are using; i.e. use a String constructor and String.toByteArray method with an explicit charset.
Use the right character set for your byte data ... or alternatively one (such as "Latin-1" where all byte sequences map to valid Unicode characters.
If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding ... which is designed for this purpose.
For Java, the most common character sets are in java.nio.charset.StandardCharsets. If you are encoding a string that can contain any Unicode character value then UTF-8 encoding (UTF_8) is recommended.
If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1 - more commonly just called "Latin 1" or simply "Latin" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1 which assigns characters to all possible 256 values including control blocks C0 and C1. These are not printable: you won't see them in any output.
From Java 8 onwards Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to to use Base64.getUrlEncoder instead of the standard encoder. This class is also present in Android since Android Oreo (8), API level 26.
We just need to construct a new String with the array: http://www.mkyong.com/java/how-do-convert-byte-array-to-string-in-java/
String s = new String(bytes);
The bytes of the resulting string differs depending on what charset you use. new String(bytes) and new String(bytes, Charset.forName("utf-8")) and new String(bytes, Charset.forName("utf-16")) will all have different byte arrays when you call String#getBytes() (depending on the default charset)
Using new String(byOriginal) and converting back to byte[] using getBytes() doesn't guarantee two byte[] with equal values. This is due to a call to StringCoding.encode(..) which will encode the String to Charset.defaultCharset(). During this encoding, the encoder might choose to replace unknown characters and do other changes. Hence, using String.getBytes() might not return an equal array as you've originally passed to the constructor.
Why was the problem: As someone already specified:
If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
I was observing this problem when I was trying to create byte[] from a pdf file and then converting it to String and then taking the String as input and converting back to file.
So make sure your encoding and decoding logic is same as I did. I explicitly encoded the byte[] to Base64 and decoded it to create the file again.
Use-case:
Due to some limitation I was trying to sent byte[] in request(POST) and the process was as follows:
PDF File >> Base64.encodeBase64(byte[]) >> String >> Send in request(POST) >> receive String >> Base64.decodeBase64(byte[]) >> create binary
Try this and this worked for me..
File file = new File("filePath");
byte[] byteArray = new byte[(int) file.length()];
try {
FileInputStream fileInputStream = new FileInputStream(file);
fileInputStream.read(byteArray);
String byteArrayStr= new String(Base64.encodeBase64(byteArray));
FileOutputStream fos = new FileOutputStream("newFilePath");
fos.write(Base64.decodeBase64(byteArrayStr.getBytes()));
fos.close();
}
catch (FileNotFoundException e) {
System.out.println("File Not Found.");
e.printStackTrace();
}
catch (IOException e1) {
System.out.println("Error Reading The File.");
e1.printStackTrace();
}
Even though
new String(bytes, "UTF-8")
is correct it throws a UnsupportedEncodingException which forces you to deal with a checked exception. You can use as an alternative another constructor since Java 1.6 to convert a byte array into a String:
new String(bytes, StandardCharsets.UTF_8)
This one does not throw any exception.
Converting back should be also done with StandardCharsets.UTF_8:
"test".getBytes(StandardCharsets.UTF_8)
Again you avoid having to deal with checked exceptions.
private static String toHexadecimal(byte[] digest){
String hash = "";
for(byte aux : digest) {
int b = aux & 0xff;
if (Integer.toHexString(b).length() == 1) hash += "0";
hash += Integer.toHexString(b);
}
return hash;
}
I did notice something that is not in any of the answers. You can cast each of the bytes in the byte array to characters, and put them in a char array. Then the string is new String(cbuf) where cbuf is the char array. To convert back, loop through the string casting each of the chars to bytes to put into a byte array, and this byte array will be the same as the first.
public class StringByteArrTest {
public static void main(String[] args) {
// put whatever byte array here
byte[] arr = new byte[] {-12, -100, -49, 100, -63, 0, -90};
for (byte b: arr) System.out.println(b);
// put data into this char array
char[] cbuf = new char[arr.length];
for (int i = 0; i < arr.length; i++) {
cbuf[i] = (char) arr[i];
}
// this is the string
String s = new String(cbuf);
System.out.println(s);
// converting back
byte[] out = new byte[s.length()];
for (int i = 0; i < s.length(); i++) {
out[i] = (byte) s.charAt(i);
}
for (byte b: out) System.out.println(b);
}
}
This works fine for me:
String cd = "Holding some value";
Converting from string to byte[]:
byte[] cookie = new sun.misc.BASE64Decoder().decodeBuffer(cd);
Converting from byte[] to string:
cd = new sun.misc.BASE64Encoder().encode(cookie);
Following is the sample code safely converts byte array to String and String to byte array back.
byte bytesArray[] = { 1, -2, 4, -5, 10};
String encoded = java.util.Base64.getEncoder().encodeToString(bytesArray);
byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
System.out.println("input: "+Arrays.toString(bytesArray));
System.out.println("encoded: "+encoded);
System.out.println("decoded: "+Arrays.toString(decoded));
Output:
input: [1, -2, 4, -5, 10]
encoded: Af4E+wo=
decoded: [1, -2, 4, -5, 10]
javax.xml.bind.DatatypeConverter should do it:
byte [] b = javax.xml.bind.DatatypeConverter.parseHexBinary("E62DB");
String s = javax.xml.bind.DatatypeConverter.printHexBinary(b);
byte[] bytes = "Techie Delight".getBytes();
// System.out.println(Arrays.toString(bytes));
// Create a string from the byte array without specifying
// character encoding
String string = new String(bytes);
System.out.println(string);
Heres a few methods that convert an array of bytes to a string. I've tested them they work well.
public String getStringFromByteArray(byte[] settingsData) {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(settingsData);
Reader reader = new BufferedReader(new InputStreamReader(byteArrayInputStream));
StringBuilder sb = new StringBuilder();
int byteChar;
try {
while((byteChar = reader.read()) != -1) {
sb.append((char) byteChar);
}
}
catch(IOException e) {
e.printStackTrace();
}
return sb.toString();
}
public String getStringFromByteArray(byte[] settingsData) {
StringBuilder sb = new StringBuilder();
for(byte willBeChar: settingsData) {
sb.append((char) willBeChar);
}
return sb.toString();
}
While base64 encoding is safe and one could argue "the right answer", I arrived here looking for a way to convert a Java byte array to/from a Java String as-is. That is, where each member of the byte array remains intact in its String counterpart, with no extra space required for encoding/transport.
This answer describing 8bit transparent encodings was very helpful for me. I used ISO-8859-1 on terabytes of binary data to convert back and forth successfully (binary <-> String) without the inflated space requirements needed for a base64 encoding, so is safe for my use-case - YMMV.
This was also helpful in explaining when/if you should experiment.
Using Kotlin on Android I found out it is handy to create some simple extension functions for that purpose.
Solution based on Base64 encoding/decoding to be able to pass via JSON, XML, etc:
import android.util.Base64
fun ByteArray.encodeToString() = String(Base64.encode(this, Base64.NO_WRAP), Charsets.UTF_8)
fun String.decodeToBytes(): ByteArray = Base64.decode(toByteArray(Charsets.UTF_8), Base64.NO_WRAP)
So you can use it
val byteArray = byteArrayOf(0, 1, 2, -1, -2, -3)
val string = byteArray.encodeToString()
val restoredArray = string.decodeToBytes()
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
private static String base64Encode(byte[] bytes)
{
return new BASE64Encoder().encode(bytes);
}
private static byte[] base64Decode(String s) throws IOException
{
return new BASE64Decoder().decodeBuffer(s);
}
I succeeded converting byte array to a string with this method:
public static String byteArrayToString(byte[] data){
String response = Arrays.toString(data);
String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];
for (int i=0, len=bytes.length; i<len; i++) {
bytes[i] = Byte.parseByte(byteValues[i].trim());
}
String str = new String(bytes);
return str.toLowerCase();
}
This one works for me up to android Q:
You can use the following method to convert o hex string to string
public static String hexToString(String hex) {
StringBuilder sb = new StringBuilder();
char[] hexData = hex.toCharArray();
for (int count = 0; count < hexData.length - 1; count += 2) {
int firstDigit = Character.digit(hexData[count], 16);
int lastDigit = Character.digit(hexData[count + 1], 16);
int decimal = firstDigit * 16 + lastDigit;
sb.append((char)decimal);
}
return sb.toString();
}
with the following to convert a byte array to a hex string
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for (int j = 0; j < bytes.length; j++) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Here the working code.
// Encode byte array into string . TemplateBuffer1 is my bytearry variable.
String finger_buffer = Base64.encodeToString(templateBuffer1, Base64.DEFAULT);
Log.d(TAG, "Captured biometric device->" + finger_buffer);
// Decode String into Byte Array. decodedString is my bytearray[]
decodedString = Base64.decode(finger_buffer, Base64.DEFAULT);
You can use simple for loop for conversion:
public void byteArrToString(){
byte[] b = {'a','b','$'};
String str = "";
for(int i=0; i<b.length; i++){
char c = (char) b[i];
str+=c;
}
System.out.println(str);
}
byte[] image = {...};
String imageString = Base64.encodeToString(image, Base64.NO_WRAP);
Try to specify an 8-bit charset in both conversions. ISO-8859-1 for instance.
Read the bytes from String using ByteArrayInputStream and wrap it with BufferedReader which is Char Stream instead of Byte Stream which converts the byte data to String.
package com.cs.sajal;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
public class TestCls {
public static void main(String[] args) {
String s=new String("Sajal is a good boy");
try
{
ByteArrayInputStream bis;
bis=new ByteArrayInputStream(s.getBytes("UTF-8"));
BufferedReader br=new BufferedReader(new InputStreamReader(bis));
System.out.println(br.readLine());
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Output is:
Sajal is a good boy
You can do the following to convert byte array to string and then convert that string to byte array:
// 1. convert byte array to string and then string to byte array
// convert byte array to string
byte[] by_original = {0, 1, -2, 3, -4, -5, 6};
String str1 = Arrays.toString(by_original);
System.out.println(str1); // output: [0, 1, -2, 3, -4, -5, 6]
// convert string to byte array
String newString = str1.substring(1, str1.length()-1);
String[] stringArray = newString.split(", ");
byte[] by_new = new byte[stringArray.length];
for(int i=0; i<stringArray.length; i++) {
by_new[i] = (byte) Integer.parseInt(stringArray[i]);
}
System.out.println(Arrays.toString(by_new)); // output: [0, 1, -2, 3, -4, -5, 6]
But to convert the string to byte array and then convert that byte array to string, below approach can be used:
// 2. convert string to byte array and then byte array to string
// convert string to byte array
String str2 = "[0, 1, -2, 3, -4, -5, 6]";
byte[] byteStr2 = str2.getBytes(StandardCharsets.UTF_8);
// Now byteStr2 is [91, 48, 44, 32, 49, 44, 32, 45, 50, 44, 32, 51, 44, 32, 45, 52, 44, 32, 45, 53, 44, 32, 54, 93]
// convert byte array to string
System.out.println(new String(byteStr2, StandardCharsets.UTF_8)); // output: [0, 1, -2, 3, -4, -5, 6]
A string is a collection of char's (16bit unsigned). So if you are going to convert negative numbers into a string, they'll be lost in translation.
public class byteString {
/**
* #param args
*/
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String msg = "Hello";
byte[] buff = new byte[1024];
buff = msg.getBytes("UTF-8");
System.out.println(buff);
String m = new String(buff);
System.out.println(m);
}
}
I have to convert a byte array to string in Android, but my byte array contains negative values.
If I convert that string again to byte array, values I am getting are different from original byte array values.
What can I do to get proper conversion? Code I am using to do the conversion is as follows:
// Code to convert byte arr to str:
byte[] by_original = {0,1,-2,3,-4,-5,6};
String str1 = new String(by_original);
System.out.println("str1 >> "+str1);
// Code to convert str to byte arr:
byte[] by_new = str1.getBytes();
for(int i=0;i<by_new.length;i++)
System.out.println("by1["+i+"] >> "+str1);
I am stuck in this problem.
Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:
byte[] bytes = {...}
String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding
There are a bunch of encodings you can use, look at the supported encodings in the Oracle javadocs.
The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
If you really must use a String to hold binary data then the safest way is to use Base64 encoding.
The root problem is (I think) that you are unwittingly using a character set for which:
bytes != encode(decode(bytes))
in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.
The solution is:
Be explicit about the character encoding you are using; i.e. use a String constructor and String.toByteArray method with an explicit charset.
Use the right character set for your byte data ... or alternatively one (such as "Latin-1" where all byte sequences map to valid Unicode characters.
If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding ... which is designed for this purpose.
For Java, the most common character sets are in java.nio.charset.StandardCharsets. If you are encoding a string that can contain any Unicode character value then UTF-8 encoding (UTF_8) is recommended.
If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1 - more commonly just called "Latin 1" or simply "Latin" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1 which assigns characters to all possible 256 values including control blocks C0 and C1. These are not printable: you won't see them in any output.
From Java 8 onwards Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to to use Base64.getUrlEncoder instead of the standard encoder. This class is also present in Android since Android Oreo (8), API level 26.
We just need to construct a new String with the array: http://www.mkyong.com/java/how-do-convert-byte-array-to-string-in-java/
String s = new String(bytes);
The bytes of the resulting string differs depending on what charset you use. new String(bytes) and new String(bytes, Charset.forName("utf-8")) and new String(bytes, Charset.forName("utf-16")) will all have different byte arrays when you call String#getBytes() (depending on the default charset)
Using new String(byOriginal) and converting back to byte[] using getBytes() doesn't guarantee two byte[] with equal values. This is due to a call to StringCoding.encode(..) which will encode the String to Charset.defaultCharset(). During this encoding, the encoder might choose to replace unknown characters and do other changes. Hence, using String.getBytes() might not return an equal array as you've originally passed to the constructor.
Why was the problem: As someone already specified:
If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
I was observing this problem when I was trying to create byte[] from a pdf file and then converting it to String and then taking the String as input and converting back to file.
So make sure your encoding and decoding logic is same as I did. I explicitly encoded the byte[] to Base64 and decoded it to create the file again.
Use-case:
Due to some limitation I was trying to sent byte[] in request(POST) and the process was as follows:
PDF File >> Base64.encodeBase64(byte[]) >> String >> Send in request(POST) >> receive String >> Base64.decodeBase64(byte[]) >> create binary
Try this and this worked for me..
File file = new File("filePath");
byte[] byteArray = new byte[(int) file.length()];
try {
FileInputStream fileInputStream = new FileInputStream(file);
fileInputStream.read(byteArray);
String byteArrayStr= new String(Base64.encodeBase64(byteArray));
FileOutputStream fos = new FileOutputStream("newFilePath");
fos.write(Base64.decodeBase64(byteArrayStr.getBytes()));
fos.close();
}
catch (FileNotFoundException e) {
System.out.println("File Not Found.");
e.printStackTrace();
}
catch (IOException e1) {
System.out.println("Error Reading The File.");
e1.printStackTrace();
}
Even though
new String(bytes, "UTF-8")
is correct it throws a UnsupportedEncodingException which forces you to deal with a checked exception. You can use as an alternative another constructor since Java 1.6 to convert a byte array into a String:
new String(bytes, StandardCharsets.UTF_8)
This one does not throw any exception.
Converting back should be also done with StandardCharsets.UTF_8:
"test".getBytes(StandardCharsets.UTF_8)
Again you avoid having to deal with checked exceptions.
private static String toHexadecimal(byte[] digest){
String hash = "";
for(byte aux : digest) {
int b = aux & 0xff;
if (Integer.toHexString(b).length() == 1) hash += "0";
hash += Integer.toHexString(b);
}
return hash;
}
I did notice something that is not in any of the answers. You can cast each of the bytes in the byte array to characters, and put them in a char array. Then the string is new String(cbuf) where cbuf is the char array. To convert back, loop through the string casting each of the chars to bytes to put into a byte array, and this byte array will be the same as the first.
public class StringByteArrTest {
public static void main(String[] args) {
// put whatever byte array here
byte[] arr = new byte[] {-12, -100, -49, 100, -63, 0, -90};
for (byte b: arr) System.out.println(b);
// put data into this char array
char[] cbuf = new char[arr.length];
for (int i = 0; i < arr.length; i++) {
cbuf[i] = (char) arr[i];
}
// this is the string
String s = new String(cbuf);
System.out.println(s);
// converting back
byte[] out = new byte[s.length()];
for (int i = 0; i < s.length(); i++) {
out[i] = (byte) s.charAt(i);
}
for (byte b: out) System.out.println(b);
}
}
This works fine for me:
String cd = "Holding some value";
Converting from string to byte[]:
byte[] cookie = new sun.misc.BASE64Decoder().decodeBuffer(cd);
Converting from byte[] to string:
cd = new sun.misc.BASE64Encoder().encode(cookie);
Following is the sample code safely converts byte array to String and String to byte array back.
byte bytesArray[] = { 1, -2, 4, -5, 10};
String encoded = java.util.Base64.getEncoder().encodeToString(bytesArray);
byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
System.out.println("input: "+Arrays.toString(bytesArray));
System.out.println("encoded: "+encoded);
System.out.println("decoded: "+Arrays.toString(decoded));
Output:
input: [1, -2, 4, -5, 10]
encoded: Af4E+wo=
decoded: [1, -2, 4, -5, 10]
javax.xml.bind.DatatypeConverter should do it:
byte [] b = javax.xml.bind.DatatypeConverter.parseHexBinary("E62DB");
String s = javax.xml.bind.DatatypeConverter.printHexBinary(b);
byte[] bytes = "Techie Delight".getBytes();
// System.out.println(Arrays.toString(bytes));
// Create a string from the byte array without specifying
// character encoding
String string = new String(bytes);
System.out.println(string);
Heres a few methods that convert an array of bytes to a string. I've tested them they work well.
public String getStringFromByteArray(byte[] settingsData) {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(settingsData);
Reader reader = new BufferedReader(new InputStreamReader(byteArrayInputStream));
StringBuilder sb = new StringBuilder();
int byteChar;
try {
while((byteChar = reader.read()) != -1) {
sb.append((char) byteChar);
}
}
catch(IOException e) {
e.printStackTrace();
}
return sb.toString();
}
public String getStringFromByteArray(byte[] settingsData) {
StringBuilder sb = new StringBuilder();
for(byte willBeChar: settingsData) {
sb.append((char) willBeChar);
}
return sb.toString();
}
While base64 encoding is safe and one could argue "the right answer", I arrived here looking for a way to convert a Java byte array to/from a Java String as-is. That is, where each member of the byte array remains intact in its String counterpart, with no extra space required for encoding/transport.
This answer describing 8bit transparent encodings was very helpful for me. I used ISO-8859-1 on terabytes of binary data to convert back and forth successfully (binary <-> String) without the inflated space requirements needed for a base64 encoding, so is safe for my use-case - YMMV.
This was also helpful in explaining when/if you should experiment.
Using Kotlin on Android I found out it is handy to create some simple extension functions for that purpose.
Solution based on Base64 encoding/decoding to be able to pass via JSON, XML, etc:
import android.util.Base64
fun ByteArray.encodeToString() = String(Base64.encode(this, Base64.NO_WRAP), Charsets.UTF_8)
fun String.decodeToBytes(): ByteArray = Base64.decode(toByteArray(Charsets.UTF_8), Base64.NO_WRAP)
So you can use it
val byteArray = byteArrayOf(0, 1, 2, -1, -2, -3)
val string = byteArray.encodeToString()
val restoredArray = string.decodeToBytes()
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
private static String base64Encode(byte[] bytes)
{
return new BASE64Encoder().encode(bytes);
}
private static byte[] base64Decode(String s) throws IOException
{
return new BASE64Decoder().decodeBuffer(s);
}
I succeeded converting byte array to a string with this method:
public static String byteArrayToString(byte[] data){
String response = Arrays.toString(data);
String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];
for (int i=0, len=bytes.length; i<len; i++) {
bytes[i] = Byte.parseByte(byteValues[i].trim());
}
String str = new String(bytes);
return str.toLowerCase();
}
This one works for me up to android Q:
You can use the following method to convert o hex string to string
public static String hexToString(String hex) {
StringBuilder sb = new StringBuilder();
char[] hexData = hex.toCharArray();
for (int count = 0; count < hexData.length - 1; count += 2) {
int firstDigit = Character.digit(hexData[count], 16);
int lastDigit = Character.digit(hexData[count + 1], 16);
int decimal = firstDigit * 16 + lastDigit;
sb.append((char)decimal);
}
return sb.toString();
}
with the following to convert a byte array to a hex string
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for (int j = 0; j < bytes.length; j++) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Here the working code.
// Encode byte array into string . TemplateBuffer1 is my bytearry variable.
String finger_buffer = Base64.encodeToString(templateBuffer1, Base64.DEFAULT);
Log.d(TAG, "Captured biometric device->" + finger_buffer);
// Decode String into Byte Array. decodedString is my bytearray[]
decodedString = Base64.decode(finger_buffer, Base64.DEFAULT);
You can use simple for loop for conversion:
public void byteArrToString(){
byte[] b = {'a','b','$'};
String str = "";
for(int i=0; i<b.length; i++){
char c = (char) b[i];
str+=c;
}
System.out.println(str);
}
byte[] image = {...};
String imageString = Base64.encodeToString(image, Base64.NO_WRAP);
Try to specify an 8-bit charset in both conversions. ISO-8859-1 for instance.
Read the bytes from String using ByteArrayInputStream and wrap it with BufferedReader which is Char Stream instead of Byte Stream which converts the byte data to String.
package com.cs.sajal;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
public class TestCls {
public static void main(String[] args) {
String s=new String("Sajal is a good boy");
try
{
ByteArrayInputStream bis;
bis=new ByteArrayInputStream(s.getBytes("UTF-8"));
BufferedReader br=new BufferedReader(new InputStreamReader(bis));
System.out.println(br.readLine());
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Output is:
Sajal is a good boy
You can do the following to convert byte array to string and then convert that string to byte array:
// 1. convert byte array to string and then string to byte array
// convert byte array to string
byte[] by_original = {0, 1, -2, 3, -4, -5, 6};
String str1 = Arrays.toString(by_original);
System.out.println(str1); // output: [0, 1, -2, 3, -4, -5, 6]
// convert string to byte array
String newString = str1.substring(1, str1.length()-1);
String[] stringArray = newString.split(", ");
byte[] by_new = new byte[stringArray.length];
for(int i=0; i<stringArray.length; i++) {
by_new[i] = (byte) Integer.parseInt(stringArray[i]);
}
System.out.println(Arrays.toString(by_new)); // output: [0, 1, -2, 3, -4, -5, 6]
But to convert the string to byte array and then convert that byte array to string, below approach can be used:
// 2. convert string to byte array and then byte array to string
// convert string to byte array
String str2 = "[0, 1, -2, 3, -4, -5, 6]";
byte[] byteStr2 = str2.getBytes(StandardCharsets.UTF_8);
// Now byteStr2 is [91, 48, 44, 32, 49, 44, 32, 45, 50, 44, 32, 51, 44, 32, 45, 52, 44, 32, 45, 53, 44, 32, 54, 93]
// convert byte array to string
System.out.println(new String(byteStr2, StandardCharsets.UTF_8)); // output: [0, 1, -2, 3, -4, -5, 6]
A string is a collection of char's (16bit unsigned). So if you are going to convert negative numbers into a string, they'll be lost in translation.
public class byteString {
/**
* #param args
*/
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String msg = "Hello";
byte[] buff = new byte[1024];
buff = msg.getBytes("UTF-8");
System.out.println(buff);
String m = new String(buff);
System.out.println(m);
}
}
Why does this junit test fail?
import org.junit.Assert;
import org.junit.Test;
import java.io.UnsupportedEncodingException;
public class TestBytes {
#Test
public void testBytes() throws UnsupportedEncodingException {
byte[] bytes = new byte[]{0, -121, -80, 116, -62};
String string = new String(bytes, "UTF-8");
byte[] bytes2 = string.getBytes("UTF-8");
System.out.print("bytes2: [");
for (byte b : bytes2) System.out.print(b + ", ");
System.out.print("]\n");
Assert.assertArrayEquals(bytes, bytes2);
}
}
I would assume that the incoming byte array equaled the outcome, but somehow, probably due to the fact that UTF-8 characters take two bytes, the outcome array differs from the incoming array in both content and length.
Please enlighten me.
The reason is 0, -121, -80, 116, -62 is not a valid UTF-8 byte sequence. new String(bytes, "UTF-8") does not throw any exception in such situations but the result is difficult to predict. Read http://en.wikipedia.org/wiki/UTF-8 Invalid byte sequences section.
The array bytes contains negative noted vales, these have the 8th bit (bit7) set and are converted into UTF-8 as multibyte sequences. bytes2 will be identical to bytes if you use only bytes with values in range 0..127. To make a copy of bytes as given one may use for example the arraycopy method:
byte[] bytes3 = new byte[bytes.length];
System.arraycopy(bytes, 0, bytes3, 0, bytes.length);
I am currently facing an error called Bad Base64Coder input character at ...
Here is my code in java.
String nonce2 = strNONCE;
byte[] nonceBytes1 = Base64Coder.decode(nonce2);
System.out.println("nonceByte1 value : " + nonceBytes1);
The problem now is I get Bad Base64Coder input character error and the nonceBytes1 value is printed as null. I am trying to decode the nonce2 from Base64Coder. My strNONCE value is 16
/** Generating nonce value */
public static String generateNonce() {
try {
byte[] nonce = new byte[16];
Random rand;
rand = SecureRandom.getInstance ("SHA1PRNG");
rand.nextBytes(nonce);
//convert byte array to string.
strNONCE = new String(nonce);
}catch (NoSuchAlgorithmException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return strNONCE;
}
//convert byte array to string.
strNONCE = new String(nonce);
That is not going to work. You need to base64 encode it.
strNONCE = Base64Coder.encode(nonce);
It simply look like you're confusing some independent concepts and are pretty new to Java as well. Base64 is a type of encoding which converts "human unreadable" byte arrays into "human readable" strings (encoding) and the other way round (decoding). It is usually used to transfer or store binary data as characters there where it is strictly been required (due to the protocol or the storage type).
The SecureRandom thing is not an encoder or decoder. It returns a random value which is in no way to be corelated with a certain cipher or encoder. Here are some extracts from the before given links:
ran·dom
adj.
1. Having no specific pattern, purpose, or objective
Cipher
In cryptography, a cipher (or cypher)
is an algorithm for performing
encryption or decryption — a series
of well-defined steps that can be
followed as a procedure.
Encoding
Encoding is the process of
transforming information from one
format into another. The opposite
operation is called decoding.
I'd strongly recommend you to align those concepts out for yourself (click the links to learn more about them) and not to throw them in one big and same hole. Here's at least an SSCCE which shows how you can properly encode/decode a (random) byte array using base64 (and how to show arrays as string (a human readable format)):
package com.stackoverflow.q2535542;
import java.security.SecureRandom;
import java.util.Arrays;
import org.apache.commons.codec.binary.Base64;
public class Test {
public static void main(String[] args) throws Exception {
// Generate random bytes and show them.
byte[] bytes = new byte[16];
SecureRandom.getInstance("SHA1PRNG").nextBytes(bytes);
System.out.println(Arrays.toString(bytes));
// Base64-encode bytes and show them.
String base64String = Base64.encodeBase64String(bytes);
System.out.println(base64String);
// Base64-decode string and show bytes.
byte[] decoded = Base64.decodeBase64(base64String);
System.out.println(Arrays.toString(decoded));
}
}
(using Commons Codec Base64 by the way)
Here's an example of the output:
[14, 52, -34, -74, -6, 72, -127, 62, -37, 45, 55, -38, -72, -3, 123, 23]
DjTetvpIgT7bLTfauP17Fw==
[14, 52, -34, -74, -6, 72, -127, 62, -37, 45, 55, -38, -72, -3, 123, 23]
A base64 encoded string would only have printable characters in it. You're generating strNONCE directly from random bytes, so it will have non-printable characters in it.
What exactly is it you're trying to do?