How to convert String into Byte and Back - java

For converting a string, I am converting it into a byte array as follows:
byte[] nameByteArray = cityName.getBytes();
To convert back, I did: String retrievedString = new String(nameByteArray); which obviously doesn't work. How would I convert it back?

What characters are in your original city name? Try the UTF-8 version, like this:
byte[] nameByteArray = cityName.getBytes("UTF-8");
String retrievedString = new String(nameByteArray, "UTF-8");

which obviously doesn't work.
Actually that's exactly how you do it. The only thing that can go wrong is that you're implicitly using the platform default encoding, which could differ between systems, and might not be able to represent all characters in the string.
The solution is to explicitly use an encoding that can represent all characters, such as UTF-8:
byte[] nameByteArray = cityName.getBytes("UTF-8");
String retrievedString = new String(nameByteArray, "UTF-8");
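A minimal self-contained sketch of the same round trip, assuming Java 7+ so that `java.nio.charset.StandardCharsets` is available (the `Charset` overloads avoid the checked `UnsupportedEncodingException` that the `"UTF-8"` string-name overloads declare). The class name `RoundTrip`, its helper methods, and the sample city name are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // String -> bytes, with an explicit charset instead of the platform default
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String, with the same charset that was used to encode
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cityName = "S\u00e3o Paulo"; // hypothetical sample containing a non-ASCII letter
        byte[] nameByteArray = encode(cityName);
        String retrievedString = decode(nameByteArray);
        System.out.println(retrievedString.equals(cityName)); // true
        System.out.println(nameByteArray.length);            // 10: the accented letter takes two bytes
    }
}
```

The key point is symmetry: whatever charset encodes the bytes must also decode them.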

Related

how to remove special characters from string

I have a String called
String s = "ConstituciÃ³n GarantÃ­a";
I want to convert it to Constitución Garantía.
This is a Spanish keyword. How can I convert it?
What you have described is an XY problem. It's an encoding issue, and more characters like these may show up. Instead of replacing them one by one, re-decode the whole String: take its ISO-8859-1 bytes and decode them as UTF-8.
String s = "Constituci\u00c3\u00b3n Garant\u00c3\u00ada"; // the garbled text from the question
byte[] ptext = s.getBytes(StandardCharsets.ISO_8859_1);
String string = new String(ptext, StandardCharsets.UTF_8);
System.out.println(string); // Constitución Garantía
Better still, fix the encoding at the source the string comes from, before you start working with it.
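To see why the fix works, here is a small sketch that first reproduces the damage and then reverses it. It assumes the text was garbled in exactly one way, UTF-8 bytes decoded as ISO-8859-1; if Windows-1252 was involved instead, bytes 0x80 to 0x9F map to different characters and this round trip may not restore them. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeFix {
    // Reproduces the bug: UTF-8 bytes wrongly decoded as ISO-8859-1
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: ISO-8859-1 maps each char in U+0000..U+00FF to exactly one byte,
    // so this recovers the original UTF-8 bytes, which are then decoded correctly
    static String fixMojibake(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "Constituci\u00f3n Garant\u00eda";
        String garbled = garble(correct);
        System.out.println(fixMojibake(garbled).equals(correct)); // true
    }
}
```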

decode base64 utf-8 string java

I have this string
"=?UTF-8?B?VGLNBGNDQA==?="
to decode into a standard Java String.
I wrote this quick and dirty main to get the String, but I'm having trouble:
String s = "=?UTF-8?B?VGLNBGNDQA==?=";
s = s.split("=\\?UTF-8\\?B\\?")[1].split("\\?=")[0];
System.out.println(s);
byte[] decoded = Base64.getDecoder().decode(s);
String x = new String(decoded, "UTF8");
System.out.println(decoded);
System.out.println(x);
It is actually printing a strange string
"Tb�cC#"
I don't know what text is behind the encoded string, but I assume my program works, since I can convert any other encoded string without problems, for example
"=?UTF-8?B?SGlfR3V5cyE="
That is "Hi_Guys!".
Should I assume that string is malformed?
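One way to check whether the payload itself is malformed, rather than the decoding code, is a strict decoder. Decoding "VGLNBGNDQA==" by hand gives the bytes 54 62 CD 04 63 43 40; 0xCD starts a two-byte UTF-8 sequence, but 0x04 is not a continuation byte, so the bytes appear not to be valid UTF-8. The lenient `new String(bytes, charset)` silently substitutes U+FFFD (the "�" in the output), whereas a `CharsetDecoder` set to `CodingErrorAction.REPORT` throws. The class and method names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StrictDecode {
    // Returns the decoded text, or null if the bytes are not valid UTF-8
    static String decodeStrict(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // malformed input: the bytes are not UTF-8
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeStrict("SGlfR3V5cyE=")); // Hi_Guys!
        System.out.println(decodeStrict("VGLNBGNDQA==")); // null
    }
}
```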

What encoding does Java use to create a String from given Unicode data?

I am quite perplexed about why I should not encode Unicode text with UTF-8 for comparison when the other text (to compare against) has been encoded with UTF-8.
I wanted to compare a text (アクセス拒否, meaning "Access denied") stored in an external file encoded as UTF-8 with a constant string stored in a .java file:
public static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426"; // means Access denied
The Java file was encoded as Cp1252.
I read the file as an input stream using the code below. Note that I am using UTF-8 as the encoding.
InputStream in = new FileInputStream("F:\\sample.txt");
int b1;
byte[] bytes = new byte[4096];
int i = 0;
while (true) {
b1 = in.read();
if (b1 == -1)
break;
bytes[i++] = (byte) b1;
}
String japTextFromFile = new String(bytes, 0, i, Charset.forName("UTF-8"));
Now when I compare as
System.out.println(ACCESS_DENIED_IN_JAPANESE.equals(japTextFromFile)); // result is `true` , and works fine
But when I encode ACCESS_DENIED_IN_JAPANESE with UTF-8 and compare it with japTextFromFile, the result is false. The code is:
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(),Charset.forName("UTF-8"));
System.out.println(encodedAccessDenied .equals(japTextFromFile)); // result is `false`
So my question is: why does the comparison above fail when both strings are the same and have been encoded with UTF-8? The result should be true.
However, in the first case, when comparing strings with different encodings (one in UTF-16, Java's default way of encoding strings, and the other in UTF-8), the result is true, which I think should be false: the encodings differ, even though the text we read is the same.
Where am I wrong in my understanding? Any clarification is greatly appreciated.
ACCESS_DENIED_IN_JAPANESE.getBytes() does not use UTF-8. It uses your platform's default charset. But then you use UTF-8 to turn those bytes back into a String. This gets you a different String to the one you started with.
Try this:
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
System.out.println(encodedAccessDenied.equals(japTextFromFile)); // result is `true`
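A short sketch of both cases. A Java String holds UTF-16 code units, so `equals` compares characters, not the bytes that were on disk; the `\u` escapes in the constant are plain ASCII and yield the same String whether the .java file is Cp1252 or UTF-8. The mismatch only appears when encoding and decoding use different charsets. Since Cp1252 is not guaranteed to exist on every JVM, the sketch assumes ISO-8859-1 as a stand-in for a platform default that cannot represent Japanese; `getBytes(Charset)` replaces each unmappable character with that charset's replacement byte, '?'. The helper `roundTrip` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Same escapes as in the question; plain ASCII, so any ASCII-compatible source encoding works
    static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426";

    // Encode with one charset, decode with another
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Stand-in for a default charset without Japanese: unmappable chars become '?'
        String broken = roundTrip(ACCESS_DENIED_IN_JAPANESE, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(broken);                                   // ??????
        System.out.println(broken.equals(ACCESS_DENIED_IN_JAPANESE)); // false

        // Same charset on both sides: the round trip is lossless
        String ok = roundTrip(ACCESS_DENIED_IN_JAPANESE, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(ACCESS_DENIED_IN_JAPANESE));     // true
    }
}
```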
The best way I know is to put all static texts into a text file encoded with UTF-8 and then read those resources with an InputStreamReader (or, since Java 11, a FileReader) constructed with the UTF-8 charset.
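A minimal sketch of reading such a resource with an explicit charset, assuming Java 8+ and using `Files.newBufferedReader(Path, Charset)` rather than the byte-by-byte loop in the question (which also overflows past 4096 bytes). The temporary file and the class and method names are hypothetical, chosen so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadUtf8 {
    // Reads the whole file, decoding its bytes explicitly as UTF-8
    static String readUtf8(Path path) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Writes text to a temporary UTF-8 file, reads it back, and checks it is unchanged
    static boolean writeAndReadBack(String text) {
        try {
            Path tmp = Files.createTempFile("sample", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp).equals(text);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack("\u30a2\u30af\u30bb\u30b9\u62d2\u5426")); // true
    }
}
```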

Why do these calls to Base64 classes return different results?

My code:
private static String convertToBase64(String string)
{
final byte[] encodeBase64 =
org.apache.commons.codec.binary.Base64.encodeBase64(string
.getBytes());
System.out.println(Hex.encodeHexString(encodeBase64));
final byte[] data = string.getBytes();
final String encoded =
javax.xml.bind.DatatypeConverter.printBase64Binary(data);
System.out.println(encoded);
return encoded;
}
Now I'm calling it: convertToBase64("stackoverflow"); and get the following result:
6333526859327476646d56795a6d787664773d3d
c3RhY2tvdmVyZmxvdw==
Why do I get different results?
I think Hex.encodeHexString encodes the Base64 bytes as hexadecimal, while the second call gives you the Base64 text itself.
From the API doc of Base64.encodeBase64():
byte[] containing Base64 characters in their UTF-8 representation.
So instead of
System.out.println(Hex.encodeHexString(encodeBase64));
you should write
System.out.println(new String(encodeBase64, "UTF-8"));
BTW: You should never use the String.getBytes() version without an explicit encoding, because the result depends on the platform default encoding (on Windows this was usually "Cp1252" and on Linux "UTF-8"; since Java 18, per JEP 400, the default is UTF-8 on all platforms).
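A sketch showing that both outputs carry the same data. It swaps in `java.util.Base64` (Java 8+) for both commons-codec and `javax.xml.bind.DatatypeConverter`, which was removed from the JDK in Java 11; the helper `toHex` is hypothetical and only mimics the lowercase output of `Hex.encodeHexString`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Hex {
    // Base64-encodes the UTF-8 bytes of a string; returns the raw Base64 bytes
    static byte[] base64Bytes(String s) {
        return Base64.getEncoder().encode(s.getBytes(StandardCharsets.UTF_8));
    }

    // Renders bytes as lowercase hex, similar in output to Hex.encodeHexString
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = base64Bytes("stackoverflow");
        System.out.println(new String(encoded, StandardCharsets.US_ASCII)); // c3RhY2tvdmVyZmxvdw==
        System.out.println(toHex(encoded)); // 6333526859327476646d56795a6d787664773d3d
    }
}
```

Each pair of hex digits in the second line is one Base64 character from the first (63 = 'c', 33 = '3', 3d = '='), so the two calls agree.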

converting byte[] to string

I have a byte[] array of length 17. I want to convert it to a String and use that string in another comparison, but the output I get is not in a format I can validate. I am using the method below to convert it. I want the output as a string that is easy to validate and that I can pass on for the comparison.
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = new String(byteArray);
System.out.println(value);
Output : ���{nY
What encoding is it? You should define it explicitly:
new String(byteArray, Charset.forName("UTF-32")); //or whichever you use
Otherwise the result is unpredictable (from String.String(byte[]) constructor JavaDoc):
Constructs a new String by decoding the specified array of bytes using the platform's default charset
BTW, I just tried it with UTF-8, UTF-16 and UTF-32, and all produce bogus results. The long run of zeros makes me believe this isn't actually text. Where do you get this data from?
UPDATE: I have tried it with all character sets available on my machine:
for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet())
{
final String value = new String(byteArray, entry.getValue());
System.out.println(entry.getKey() + ": " + value);
}
and no encoding produces anything close to human-readable text... Your input is not text.
Use as follows:
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = Arrays.toString(byteArray);
System.out.println(value);
Your output will be
[0, 127, -1, -2, -54, 123, 12, 110, 89, 0, 0, 0, 0, 0, 0, 0, 0]
Is it actually encoded text? If so, specify the encoding.
However, the data you've got doesn't look like it's actually meant to be text. It just looks like arbitrary binary data to me. If it isn't really text, I'd recommend converting it to hex or base64, depending on requirements. There's a good public domain base64 encoder you can use.
String text = Base64.encodeBytes(byteArray);
And decoding:
byte[] data = Base64.decode(text);
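The snippet above uses the API of a third-party public-domain Base64 class. A minimal equivalent with the JDK's built-in `java.util.Base64` (Java 8+) is sketched below; the class and method names are hypothetical. The resulting String is lossless, so it can be stored, printed and compared with `equals`, and decoded back to the original bytes.

```java
import java.util.Arrays;
import java.util.Base64;

public class BinaryAsText {
    // Binary data -> Base64 text that is safe to store, print and compare with equals()
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Base64 text -> the original bytes
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
        String text = toText(byteArray);
        System.out.println(text);
        System.out.println(Arrays.equals(byteArray, fromText(text))); // true
    }
}
```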
I'm not 100% sure I understand you correctly. Is this what you want?
StringBuilder buf = new StringBuilder();
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
for (byte b : byteArray) {
buf.append(b).append(',');
}
String value = buf.toString();
System.out.println(value);
Maybe you should specify a charset:
String value = new String(byteArray, "UTF-8");
