decode base64 utf-8 string java


I have this string
"=?UTF-8?B?VGLNBGNDQA==?="
that I need to decode into a standard Java String.
I wrote this quick-and-dirty main to get the String, but I'm having trouble:
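import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

String s = "=?UTF-8?B?VGLNBGNDQA==?=";
// Strip the "=?UTF-8?B?" prefix and the "?=" suffix of the encoded word
s = s.split("=\\?UTF-8\\?B\\?")[1].split("\\?=")[0];
System.out.println(s);
byte[] decoded = Base64.getDecoder().decode(s);
String x = new String(decoded, StandardCharsets.UTF_8);
System.out.println(Arrays.toString(decoded)); // print the bytes, not the array reference
System.out.println(x);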
It actually prints a strange string:
"Tb�cC#"
I don't know what text is behind the encoded string, but I assume my program works, since it converts other encoded strings without problems, for example
"=?UTF-8?B?SGlfR3V5cyE="
That is "Hi_Guys!".
Should I assume that the string is malformed?
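One way to check is to decode the bytes with a CharsetDecoder that reports malformed input instead of silently replacing it (which is what new String(bytes, UTF_8) does). The sketch below is only an illustration (the class name StrictUtf8Check is hypothetical); it prints each payload's bytes in hex and whether they form well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StrictUtf8Check {

    // Returns true only if the bytes are well-formed UTF-8
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] payloads = {"VGLNBGNDQA==", "SGlfR3V5cyE="};
        for (String payload : payloads) {
            byte[] decoded = Base64.getDecoder().decode(payload);
            // Print every byte as two hex digits so invalid sequences are visible
            StringBuilder hex = new StringBuilder();
            for (byte b : decoded) {
                hex.append(String.format("%02x ", b & 0xff));
            }
            System.out.println(payload + " -> " + hex.toString().trim()
                    + ", valid UTF-8: " + isValidUtf8(decoded));
        }
    }
}
```

For "VGLNBGNDQA==" the bytes are 54 62 cd 04 63 43 40: 0xCD starts a two-byte UTF-8 sequence, but 0x04 is not a continuation byte, so strict decoding fails and the lenient String constructor shows a replacement character instead. That suggests the payload was not produced from valid UTF-8 text, while "SGlfR3V5cyE=" passes the check.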

Related

MimeUtility.decode() doesn't work for every encoded text

I am working on a mail application and I have some trouble decoding MIME-encoded text. I am using MimeUtility.decodeText(), but it doesn't work for every encoded text: some texts are decoded properly, but others aren't.
The texts that can't be decoded are mostly in the UTF-8 and ISO-8859-9 encodings.
How can I solve this issue?
This is the code I used for decoding
MimeUtility.decodeText(text);
These are examples of failing text:

Solution (thanks to #user_xtech007):
I solved this problem by using a regex to split the text into its individual encoded parts and decoding each part separately.
Here is the method I am using:
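import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.mail.internet.MimeUtility;

// Matches one MIME encoded word: =?charset?encoding?encoded-text?=
private static final Pattern ENCODED_PART_PATTERN =
        Pattern.compile("=\\?([^?]+)\\?([^?]+)\\?([^?]+)\\?=");

private String decode(String s)
{
    Matcher m = ENCODED_PART_PATTERN.matcher(s);
    List<String> encodedParts = new ArrayList<>();
    while (m.find())
    {
        encodedParts.add(m.group(0));
    }
    try
    {
        // Decode each encoded word on its own and splice the result back in
        for (String encoded : encodedParts)
        {
            s = s.replace(encoded, MimeUtility.decodeText(encoded));
        }
    }
    catch (UnsupportedEncodingException ex)
    {
        // Unknown charset: return the text decoded so far
    }
    return s;
}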
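To see what happens to each encoded part, or to experiment without the JavaMail dependency, the same regex-per-encoded-word idea can be sketched with the JDK alone. This is a simplified illustration, not a replacement for MimeUtility: the class EncodedWordDecoder is hypothetical, it handles only the B (Base64) and Q encodings defined in RFC 2047, and it does not implement details such as dropping whitespace between adjacent encoded words.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordDecoder {

    // Same shape as ENCODED_PART_REGEX_PATTERN: =?charset?encoding?text?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([^?]+)\\?([^?]+)\\?=");

    public static String decode(String s) {
        Matcher m = ENCODED_WORD.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Charset.forName throws an unchecked exception for unknown charset names
            Charset charset = Charset.forName(m.group(1));
            String encoding = m.group(2);
            byte[] bytes;
            if ("B".equalsIgnoreCase(encoding)) {
                bytes = Base64.getDecoder().decode(m.group(3));
            } else if ("Q".equalsIgnoreCase(encoding)) {
                bytes = decodeQ(m.group(3));
            } else {
                // Unknown encoding: keep the encoded word unchanged
                m.appendReplacement(out, Matcher.quoteReplacement(m.group(0)));
                continue;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Q encoding: "_" stands for a space and "=XX" is one hex-encoded byte
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("Subject: =?UTF-8?B?SGlfR3V5cyE=?="));
        System.out.println(decode("=?ISO-8859-9?Q?G=FCle_g=FCle?="));
    }
}
```

Because every encoded word carries its own charset name, text that mixes UTF-8 and ISO-8859-9 parts is decoded part by part, which is the same reason the regex-splitting solution helps.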

Convert the string you receive into a byte array, then use this to decode UTF-8 text:
String s2 = new String(bytes, "UTF-8");
For ISO-8859-1 text, first convert the text into a byte array, then convert that into a string:
byte[] b2 = s.getBytes("ISO-8859-1");
To extract the encoded string from the URI, you can use a regex.
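The two conversions in this answer are often combined to repair text whose UTF-8 bytes were mistakenly decoded as ISO-8859-1. Getting the bytes back with ISO-8859-1 is lossless, because that charset maps each byte value 0 to 255 to exactly one character; the recovered bytes can then be decoded again as UTF-8. A minimal sketch, with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;

public class CharsetRepair {

    // Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8
    public static String reinterpretAsUtf8(String wronglyDecoded) {
        byte[] originalBytes = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "G\u00fcle g\u00fcle"; // "Güle güle"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Simulate the bug: UTF-8 bytes decoded with the wrong charset
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled);                    // each "ü" shows up as "Ã¼"
        System.out.println(reinterpretAsUtf8(garbled)); // the original text again
    }
}
```

This only works when the wrong charset was ISO-8859-1 (or another charset that maps every byte); a lossy intermediate decode cannot be undone this way.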

You can also decode this string by putting
System.setProperty("mail.mime.decodetext.strict", "false");
before you call MimeUtility.decodeText(text).
This ensures that "inner words" also get decoded:
The mail.mime.decodetext.strict property controls decoding of MIME
encoded words. The MIME spec requires that encoded words start at the
beginning of a whitespace separated word. Some mailers incorrectly
include encoded words in the middle of a word. If the
mail.mime.decodetext.strict System property is set to "false", an
attempt will be made to decode these illegal encoded words. The
default is true.
https://docs.oracle.com/javaee/7/api/javax/mail/internet/MimeUtility.html

What's the difference between new String(byte[]) and DatatypeConverter.printBase64Binary(byte[])?

I need to pass Base64-encoded data into XML as a string value. I noticed that the code below prints different string representations. Which one is correct, and why?
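import java.nio.charset.StandardCharsets;
import javax.xml.bind.DatatypeConverter;
import org.apache.commons.codec.binary.Base64;

String example = "Hello universe!";
byte[] base64data = Base64.encodeBase64(example.getBytes(StandardCharsets.UTF_8));
System.out.println(new String(base64data, StandardCharsets.US_ASCII));
System.out.println(DatatypeConverter.printBase64Binary(base64data));
System.out.println(new String(Base64.decodeBase64(base64data), StandardCharsets.UTF_8));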
And what I get as a result:
SGVsbG8gdW5pdmVyc2Uh
U0dWc2JHOGdkVzVwZG1WeWMyVWg=
Hello universe!

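Decoding U0dWc2JHOGdkVzVwZG1WeWMyVWg= gives SGVsbG8gdW5pdmVyc2Uh, which is "Hello universe!" encoded. So you encoded the data twice.
There is no difference between the two encoders: printBase64Binary(base64data) encodes bytes that are already Base64. You are using the API the wrong way; pass the raw bytes instead, and don't encode already-encoded data again.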
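A minimal sketch with the JDK's own java.util.Base64 (swapped in here for commons-codec and DatatypeConverter; the class name is illustrative) shows the difference: encoding the text once gives the value to put into the XML, and encoding that result again reproduces the second line of the question's output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodeOnce {

    // Encode the raw UTF-8 bytes of the text exactly once
    public static String encodeOnce(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String once = encodeOnce("Hello universe!");
        // Encoding the already-encoded string repeats the mistake from the question
        String twice = encodeOnce(once);
        System.out.println(once);  // SGVsbG8gdW5pdmVyc2Uh
        System.out.println(twice); // U0dWc2JHOGdkVzVwZG1WeWMyVWg=
        String decoded = new String(Base64.getDecoder().decode(once), StandardCharsets.UTF_8);
        System.out.println(decoded); // Hello universe!
    }
}
```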

Why those calls to base64 classes return different results?

My code:
import org.apache.commons.codec.binary.Hex;

private static String convertToBase64(String string)
{
    final byte[] encodeBase64 =
            org.apache.commons.codec.binary.Base64.encodeBase64(string.getBytes());
    System.out.println(Hex.encodeHexString(encodeBase64));
    final byte[] data = string.getBytes();
    final String encoded =
            javax.xml.bind.DatatypeConverter.printBase64Binary(data);
    System.out.println(encoded);
    return encoded;
}
Now I call it with convertToBase64("stackoverflow"); and get the following result:
6333526859327476646d56795a6d787664773d3d
c3RhY2tvdmVyZmxvdw==
Why do I get different results?

I think Hex.encodeHexString encodes your Base64 bytes as a hex string, whereas the second one is a normal Base64 string.

From the API doc of Base64.encodeBase64():
byte[] containing Base64 characters in their UTF-8 representation.
So instead of
System.out.println(Hex.encodeHexString(encodeBase64));
you should write
System.out.println(new String(encodeBase64, "UTF-8"));
BTW: you should never use String.getBytes() without an explicit encoding, because the result depends on the platform's default encoding (usually "Cp1252" on Windows and "UTF-8" on Linux; since Java 18 the default is UTF-8 on all platforms).
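A short JDK-only sketch (java.util.Base64 stands in for commons-codec; the class and helper names are hypothetical) confirms that both calls produce the same Base64 text: the first line of output in the question is just the hex of the ASCII bytes of "c3RhY2tvdmVyZmxvdw==".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class HexVsBase64 {

    // Hex-encode bytes with two lowercase digits per byte, like Hex.encodeHexString
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] base64Bytes = Base64.getEncoder()
                .encode("stackoverflow".getBytes(StandardCharsets.UTF_8));
        // Base64 output is plain ASCII, so decoding it as US-ASCII is safe
        System.out.println(new String(base64Bytes, StandardCharsets.US_ASCII)); // c3RhY2tvdmVyZmxvdw==
        System.out.println(toHex(base64Bytes)); // 6333526859327476646d56795a6d787664773d3d
    }
}
```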

converting byte[] to string

I have a byte array (byte[]) of length 17 that I want to convert to a string so I can use it in another comparison, but the output I get is not in a format I can validate. I am using the method below to convert it. I want the output as a string that is easy to validate, so that I can pass the same string to the comparison.
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = new String(byteArray);
System.out.println(value);
Output : ���{nY

What encoding is it? You should define it explicitly:
new String(byteArray, Charset.forName("UTF-32")); //or whichever you use
Otherwise the result is unpredictable (from the String(byte[]) constructor Javadoc):
Constructs a new String by decoding the specified array of bytes using the platform's default charset
BTW, I have just tried it with UTF-8, UTF-16 and UTF-32, and all of them produce bogus results. The long run of zeros makes me believe that this isn't actually text. Where do you get this data from?
UPDATE: I have tried it with all character sets available on my machine:
for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet())
{
final String value = new String(byteArray, entry.getValue());
System.out.println(entry.getKey() + ": " + value);
}
and no encoding produces anything close to human-readable text... Your input is not text.

Use as follows:
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = Arrays.toString(byteArray);
System.out.println(value);
Your output will be:
[0, 127, -1, -2, -54, 123, 12, 110, 89, 0, 0, 0, 0, 0, 0, 0, 0]

Is it actually encoded text? If so, specify the encoding.
However, the data you've got doesn't look like it's actually meant to be text. It just looks like arbitrary binary data to me. If it isn't really text, I'd recommend converting it to hex or base64, depending on requirements. There's a good public domain base64 encoder you can use.
String text = Base64.encodeBytes(byteArray);
And decoding:
byte[] data = Base64.decode(text);
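Since Java 8 the JDK ships its own java.util.Base64, so the same round trip can also be sketched without the third-party encoder (its method names differ from Base64.encodeBytes and Base64.decode used above). The class name is illustrative; the point is that the string decodes back to exactly the original bytes, so it is safe to store and compare.

```java
import java.util.Arrays;
import java.util.Base64;

public class BinaryAsText {

    public static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
        String text = toBase64(byteArray);
        System.out.println(text);
        // Decoding restores exactly the original bytes, so nothing is lost
        System.out.println(Arrays.equals(byteArray, fromBase64(text))); // true
    }
}
```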

Not 100% sure I understand you correctly. Is this what you want?
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
// Append each byte's decimal value followed by a comma
StringBuilder buf = new StringBuilder();
for (byte b : byteArray) {
    buf.append(b).append(',');
}
String value = buf.toString();
System.out.println(value);

Maybe you should specify a charset:
String value = new String(byteArray, "UTF-8");

How to convert String into Byte and Back

To convert a string, I turn it into a byte array as follows:
byte[] nameByteArray = cityName.getBytes();
To convert back, I did: String retrievedString = new String(nameByteArray); which obviously doesn't work. How would I convert it back?

What characters are there in your original city name? Try the UTF-8 version, like this:
byte[] nameByteArray = cityName.getBytes("UTF-8");
String retrievedString = new String(nameByteArray, "UTF-8");

which obviously doesn't work.
Actually that's exactly how you do it. The only thing that can go wrong is that you're implicitly using the platform default encoding, which could differ between systems, and might not be able to represent all characters in the string.
The solution is to explicitly use an encoding that can represent all characters, such as UTF-8:
byte[] nameByteArray = cityName.getBytes("UTF-8");
String retrievedString = new String(nameByteArray, "UTF-8");
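A small sketch of why the explicit charset matters: with UTF-8 on both sides the round trip is lossless, while a charset that cannot represent a character (US-ASCII here, standing in for a limited platform default) replaces it with '?'. The class name and the sample city name are only examples.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CityNameRoundTrip {

    // Encode and decode with the same, explicitly chosen charset
    public static String roundTrip(String text, Charset charset) {
        byte[] nameByteArray = text.getBytes(charset);
        return new String(nameByteArray, charset);
    }

    public static void main(String[] args) {
        String cityName = "Z\u00fcrich"; // "Zürich"
        System.out.println(roundTrip(cityName, StandardCharsets.UTF_8));    // Zürich
        System.out.println(roundTrip(cityName, StandardCharsets.US_ASCII)); // Z?rich
    }
}
```

String.getBytes(Charset) silently substitutes the charset's replacement byte for characters it cannot encode, which is why a narrow default encoding can corrupt the text without any exception.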
