I have this string
"=?UTF-8?B?VGLNBGNDQA==?="
that I need to decode into a standard Java String.
I wrote this quick and dirty main to get the String, but I'm having trouble:
String s = "=?UTF-8?B?VGLNBGNDQA==?=";
s = s.split("=\\?UTF-8\\?B\\?")[1].split("\\?=")[0];
System.out.println(s);
byte[] decoded = Base64.getDecoder().decode(s);
String x = new String(decoded, "UTF8");
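// print the byte values; println(decoded) would only print the array's identity
System.out.println(java.util.Arrays.toString(decoded));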
System.out.println(x);
It actually prints a strange string:
"Tb�cC#"
I do not know what text is behind the encoded string, but I assume my program works, since it converts any other encoded string without problems, for example
"=?UTF-8?B?SGlfR3V5cyE="
That is "Hi_Guys!".
Should I assume that string is malformed?
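One way to check whether the payload really is malformed is to decode it strictly, so that invalid UTF-8 is reported instead of being replaced with U+FFFD. The following is a minimal sketch, assuming the input always has the "=?UTF-8?B?...?=" shape shown above; the class name StrictEncodedWordCheck and the method tryDecode are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

class StrictEncodedWordCheck {

    // Hypothetical helper: returns the decoded text, or Optional.empty() if the
    // payload is not valid Base64 or the decoded bytes are not valid UTF-8.
    static Optional<String> tryDecode(String encodedWord) {
        // Same extraction as in the question: strip the "=?UTF-8?B?" prefix and "?=" suffix.
        String payload = encodedWord.split("=\\?UTF-8\\?B\\?")[1].split("\\?=")[0];
        try {
            byte[] raw = Base64.getDecoder().decode(payload);
            String text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of emitting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
            return Optional.of(text);
        } catch (CharacterCodingException | IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("=?UTF-8?B?SGlfR3V5cyE=?="));
        System.out.println(tryDecode("=?UTF-8?B?VGLNBGNDQA==?="));
    }
}
```

For the first string in the question, the Base64 payload decodes to bytes that include 0xCD followed by 0x04, which is not a valid UTF-8 sequence, so the strict decoder rejects it while the "Hi_Guys!" example passes.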
Background: I'm working with the java.util.Base64 class that's new in Java 8 (1.8).
The documentation specifies that encodeToString takes a byte array (there are some other options, but byte[] is the one I'm using). However, the doc doesn't specify how the byte array needs to be encoded. Here's my functional code:
import java.util.Base64;
import java.util.Base64.Encoder;
public class Test64 {
public static void main(String[] args){
try{
System.out.println(print64("This should be base64"));
} catch(Exception e) {
e.printStackTrace();
}
}
public static String print64(String test) throws Exception {
String test64 = "";
byte[] testBytes = test.getBytes("US-ASCII");
Base64.Encoder encoder64 = Base64.getUrlEncoder();
test64 = encoder64.encodeToString(testBytes);
return test64;
}
}
The question I have is whether the Base64 encodeToString will accept a byte[] with ANY encoding. I've tried US-ASCII and UTF-8, and those both work, but I'm hoping for a general conclusion.
Link to Javadoc for Base64.Encoder
The documentation does not specify an encoding because none is involved: any byte[] data will work. Base64 conversion operates on raw byte values, not on characters, so whoever decodes the Base64 text has to know what the resulting bytes mean. As long as your documentation is clear about how to interpret the bytes, you can use a Base64 string to serialize any data.
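To illustrate the point, here is a small sketch that encodes the same text with several charsets and passes each byte[] through Base64. The class name Base64CharsetDemo and its helper methods are made up for this example; the only assumption is that the same charset is used on both ends.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64CharsetDemo {

    // Text -> bytes (in the given charset) -> Base64 text.
    static String toBase64(String text, Charset charset) {
        return Base64.getEncoder().encodeToString(text.getBytes(charset));
    }

    // Base64 text -> bytes -> text (interpreted in the given charset).
    static String fromBase64(String base64, Charset charset) {
        return new String(Base64.getDecoder().decode(base64), charset);
    }

    public static void main(String[] args) {
        String text = "d\u00e9j\u00e0 vu"; // "deja vu" with accents, written with escapes
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            String base64 = toBase64(text, cs);
            // The Base64 text differs per charset, but each round trip restores the input.
            System.out.println(cs + ": " + base64 + " -> " + fromBase64(base64, cs));
        }
    }
}
```

The Base64 step never looks at the charset; only getBytes and the String constructor do, which is why the decoding side must be told which charset was used.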
The following test fails for the Latin-1 conversion, because characters that cannot be encoded are silently replaced with a byte of value 63 (a question mark). The problem is that such characters should rather cause an exception ...
@Test
public void testEncoding() throws UnsupportedEncodingException {
final String czech = "Řízeček a šampáňo a žízeň";
// okay
final byte[] bytesInLatin2 = czech.getBytes("ISO8859-2");
// different bytes, but okay
final byte[] bytesInWin1250 = czech.getBytes("Windows-1250");
// different bytes, but okay
final byte[] bytesInUtf8 = czech.getBytes("UTF-8");
// nonsense; Ř,č,... are not in Latin1 code set!!!
final byte[] bytesInLatin1 = czech.getBytes("ISO8859-1");
System.out.println(Arrays.toString(bytesInLatin2));
System.out.println(Arrays.toString(bytesInWin1250));
System.out.println(Arrays.toString(bytesInUtf8));
System.out.println(Arrays.toString(bytesInLatin1));
System.out.flush();
final String latin2 = new String(bytesInLatin2, "ISO8859-2");
final String win1250 = new String(bytesInWin1250, "Windows-1250");
final String utf8 = new String(bytesInUtf8, "UTF-8");
final String latin1 = new String(bytesInLatin1, "ISO8859-1");
Assert.assertEquals("latin2", czech, latin2);
Assert.assertEquals("win1250", czech, win1250);
Assert.assertEquals("utf8", czech, utf8);
Assert.assertEquals("latin1", czech, latin1); // this test will fail!
}
There are many situations where data ends up corrupted because of this behaviour of Java. Is there any library that can check whether a String is encodable in a given encoding?
I suspect you're looking for CharsetEncoder.canEncode(CharSequence).
Charset latin2 = Charset.forName("ISO8859-2");
boolean validInLatin2 = latin2.newEncoder().canEncode(czech);
...
As an alternative to Jon Skeet's suggestion, you can also use the CharsetEncoder class to do the encoding directly (with the encode method), but first call the onMalformedInput and onUnmappableCharacter methods to specify what the encoder should do when it encounters bad input.
That way most of the time you're just doing a simple encode call, but if anything goes wrong you'll get an exception.
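A minimal sketch of that approach follows. The class name StrictEncode and the methods encodeStrict and encodesCleanly are hypothetical helpers, not part of the JDK; the sample word is taken from the Czech test string in the question, written with Unicode escapes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncode {

    // Encodes text in the given charset and throws instead of substituting '?'.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        ByteBuffer buffer = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Convenience wrapper that turns the exception into a boolean.
    static boolean encodesCleanly(String text, Charset charset) {
        try {
            encodeStrict(text, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String czech = "\u0158\u00edze\u010dek"; // first word of the test string above
        System.out.println("ISO-8859-2: " + encodesCleanly(czech, Charset.forName("ISO-8859-2")));
        System.out.println("UTF-8:      " + encodesCleanly(czech, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + encodesCleanly(czech, StandardCharsets.ISO_8859_1));
    }
}
```

Unlike String.getBytes, which substitutes a replacement byte, this fails at the point where the data would otherwise be corrupted.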
My code:
private static String convertToBase64(String string)
{
final byte[] encodeBase64 =
org.apache.commons.codec.binary.Base64.encodeBase64(string
.getBytes());
System.out.println(Hex.encodeHexString(encodeBase64));
final byte[] data = string.getBytes();
final String encoded =
javax.xml.bind.DatatypeConverter.printBase64Binary(data);
System.out.println(encoded);
return encoded;
}
Now I'm calling it: convertToBase64("stackoverflow"); and get the following result:
6333526859327476646d56795a6d787664773d3d
c3RhY2tvdmVyZmxvdw==
Why do I get different results?
Hex.encodeHexString encodes the Base64 bytes as hexadecimal, so the first line is the hex representation of the same Base64 text that the second line prints as a normal String.
From the API doc of Base64.encodeBase64():
byte[] containing Base64 characters in their UTF-8 representation.
So instead of
System.out.println(Hex.encodeHexString(encodeBase64));
you should write
System.out.println(new String(encodeBase64, "UTF-8"));
BTW: You should never use String.getBytes() without an explicit encoding, because the result depends on the platform's default encoding (on Windows this is usually "Cp1252", on Linux "UTF-8").
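The relationship between the two printed lines can be reproduced with the JDK alone. The sketch below swaps in java.util.Base64 for commons-codec and a hand-written hex helper for Hex.encodeHexString, so it is an illustration of the effect rather than the original code; the class name Base64VersusHex is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64VersusHex {

    // Base64-encodes the UTF-8 bytes of the text; the result is itself a byte[]
    // holding the ASCII codes of the Base64 characters.
    static byte[] base64Bytes(String text) {
        return Base64.getEncoder().encode(text.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for Hex.encodeHexString: two lowercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = base64Bytes("stackoverflow");
        // Hex of the Base64 characters: 63 = 'c', 33 = '3', 52 = 'R', ...
        System.out.println(toHex(encoded));
        // The same bytes read as text give the Base64 string itself.
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
    }
}
```

Both lines describe the same bytes; only the way they are rendered as text differs.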
I have a byte array (byte[]) with a length of 17 bytes. I want to convert it to a String and pass that String on for a later comparison, but the output I get is not in a format I can validate. I am using the method below for the conversion. I want the output as a String that is easy to validate and that I can use for the comparison.
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = new String(byteArray);
System.out.println(value);
Output : ���{nY
What encoding is it? You should define it explicitly:
new String(byteArray, Charset.forName("UTF-32")); //or whichever you use
Otherwise the result is unpredictable (from String.String(byte[]) constructor JavaDoc):
Constructs a new String by decoding the specified array of bytes using the platform's default charset
BTW I have just tried it with UTF-8, UTF-16 and UTF-32, and all produce bogus results. The long run of zeros makes me believe that this isn't actually text. Where did you get this data from?
UPDATE: I have tried it with all character sets available on my machine:
for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet())
{
final String value = new String(byteArray, entry.getValue());
System.out.println(entry.getKey() + ": " + value);
}
and no encoding produces anything close to human-readable text... Your input is not text.
Use as follows:
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = Arrays.toString(byteArray);
System.out.println(value);
Your output will be
[0, 127, -1, -2, -54, 123, 12, 110, 89, 0, 0, 0, 0, 0, 0, 0, 0]
Is it actually encoded text? If so, specify the encoding.
However, the data you've got doesn't look like it's actually meant to be text. It just looks like arbitrary binary data to me. If it isn't really text, I'd recommend converting it to hex or base64, depending on requirements. There's a good public domain base64 encoder you can use.
String text = Base64.encodeBytes(byteArray);
And decoding:
byte[] data = Base64.decode(text);
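If adding a third-party encoder is not an option, the same idea can be sketched with java.util.Base64 from Java 8 plus a small hex helper. The class name BinaryToText and its methods are hypothetical names for this example.

```java
import java.util.Arrays;
import java.util.Base64;

class BinaryToText {

    // Two lowercase hex digits per byte, e.g. -1 becomes "ff".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] byteArray = {0, 127, -1, -2, -54, 123, 12, 110, 89, 0, 0, 0, 0, 0, 0, 0, 0};
        String hex = toHex(byteArray);
        String base64 = toBase64(byteArray);
        System.out.println(hex);
        System.out.println(base64);
        // Both forms are plain ASCII, so they can be compared with equals();
        // Base64 can also be turned back into the original bytes.
        System.out.println(Arrays.equals(byteArray, fromBase64(base64)));
    }
}
```

Either representation is stable across platforms, unlike new String(byteArray), so it is safe to store and compare.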
Not 100% sure I understand you correctly. Is this what you want?
String s = null;
StringBuffer buf = new StringBuffer("");
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
for(byte b : byteArray) {
s = String.valueOf(b);
buf.append(s + ",");
}
String value = new String(buf);
System.out.println(value);
Maybe you should specify a charset:
String value = new String(byteArray, "UTF-8");