Decoding String (from header) encoded by Base64 and RFC2047 in Java - java

I'm working on a function to decode a string (from a header) that is encoded in both Base64 and RFC2047 in Java.
Given this header:
SGVhZGVyOiBoZWFkZXJ2YWx1ZQ0KQmFkOiBOYW1lOiBiYWRuYW1ldmFsdWUNClVuaWNvZGU6ID0/VVRGLTg/Qj81YmV4NXF5eTU2dUw2SUNNNTZ1TDVMcTY3N3lNNWJleDVxeXk2WUdVNklDTTZZR1U/PSA9P1VURi04P0I/NUxxNjc3eU01YmV4NW9tQTVMaU41cXl5Nzd5TTVZdS81cGE5NXBhODVMcTY0NENDPz0NCg0K
My expected output is:
Header: headervalue Bad: Name: badnamevalue Unicode:
己欲立而立人,己欲達而達人,己所不欲,勿施於人。
The only relevant function that I have found and tried was Base64.decodeBase64(headers), which produces this when printed out:
Header: headervalue Bad: Name: badnamevalue Unicode:
=?UTF-8?B?5bex5qyy56uL6ICM56uL5Lq677yM5bex5qyy6YGU6ICM6YGU?= =?UTF-8?B?5Lq677yM5bex5omA5LiN5qyy77yM5Yu/5pa95pa85Lq644CC?=
To solve this, I've been trying MimeUtility.decode() by converting the byte array returned from Base64.decodeBase64(headers) to InputStream, but the result was identical as above.
InputStream headerStream = new ByteArrayInputStream(Base64.decodeBase64(headers));
InputStream result = MimeUtility.decode(headerStream, "quoted-printable");
Have been searching around the internet but have yet found a solution, wondering if anyone knows ways to decode MIME headers from resulted byte arrays?
Any help is appreciated! It's also my first stack overflow post, apologies if I'm missing anything but please let me know if there's more information that I can provide!

The base64 you have there actually is what you pasted. Including the bizarre =?UTF-8?B? weirdness.
The stuff that follows is again base64.
There's base64-encoded data inside your base-64 encoded data. As Xzibit would say: I put some Base64 in your base64 so you can base64 while you base64. Why do I feel old all of a sudden?
In other words, the base64 input you get is a crazy, extremely inefficient format invented by a crazy person.
My advice is that you tell them to come up with something less insane.
Failing that:
Search the resulting string for the regex pattern and then again apply base64 decode to the stuff in the middle.
Also, you're using some third party base64 decoder, probably apache. Apache libraries tend to suck. Base64 is baked into java, there is no reason to use worse libraries here. I've fixed that; the Base64 in this snippet is java.util.Base64. Its API is slightly different.
String sourceB64 = "SGV..."; // that input base64 you have.
byte[] sourceBytes = Base64.decodeBase64(sourceB64);
String source = new String(sourceBytes, StandardCharsets.UTF_8);
Pattern p = Pattern.compile("=\\?UTF-8\\?B\\?(.*?)\\?=");
Matcher m = p.matcher(source);
StringBuilder out = new StringBuilder();
int curPos = 0;
while (m.find()) {
out.append(source.substring(curPos, m.start()));
curPos = m.end();
String content = new String(Base64.getDecoder().decode(m.group(1)), StandardCharsets.UTF_8);
out.append(content);
}
out.append(source.substring(curPos));
System.out.println(out.toString());
If I run that, I get:
Header: headervalue
Bad: Name: badnamevalue
Unicode: 己欲立而立人,己欲達而達 人,己所不欲,勿施於人。
Which looks exactly like what you want.
Explanation of that code:
It first base64-decodes the input, and turns that into a string. (Your idea of using InputStream is a red herring. That doesn't help at all here. You just want to turn bytes into a string, you do it as per line 3 of that snippet. Pass the byte array and the encoding those bytes are in, that's all you need to do).
It then goes on the hunt for =?UTF-8?B?--base64here--?= inside your base64. The base64-in-the-base64.
It then decoder that base64, turns it into a string in the same fashion, and replaces it.
It just adds everything besides those =?UTF-8?B?...?= segments verbatim.

Related

How to get rid of incorrect symbols during Java NIO decoding?

I need to read a text from file and, for instance, print it in console. The file is in UTF-8. It seems that I'm doing something wrong because some russian symbols are printed incorrectly. What's wrong with my code?
StringBuilder content = new StringBuilder();
try (FileChannel fChan = (FileChannel) Files.newByteChannel(Paths.get("D:/test.txt")) ) {
ByteBuffer byteBuf = ByteBuffer.allocate(16);
Charset charset = Charset.forName("UTF-8");
while(fChan.read(byteBuf) != -1) {
byteBuf.flip();
content.append(new String(byteBuf.array(), charset));
byteBuf.clear();
}
System.out.println(content);
}
The result:
Здравствуйте, как поживае��е?
Это п��имер текста на русском яз��ке.ом яз�
The actual text:
Здравствуйте, как поживаете?
Это пример текста на русском языке.
UTF-8 uses a variable number of bytes per character. This gives you a boundary error: You have mixed buffer-based code with byte-array based code and you can't do that here; it is possible for you to read enough bytes to be stuck halfway into a character, you then turn your input into a byte array, and convert it, which will fail, because you can't convert half a character.
What you really want is either to first read ALL the data and then convert the entire input, or, to keep any half-characters in the bytebuffer when you flip back, or better yet, ditch all this stuff and use code that is written to read actual characters. In general, using the channel API complicates matters a ton; it's flexible, but complicated - that's how it goes.
Unless you can explain why you need it, don't use it. Do this instead:
Path target = Paths.get("D:/test.txt");
try (var reader = Files.newBufferedReader(target)) {
// read a line at a time here. Yes, it will be UTF-8 decoded.
}
or better yet, as you apparently want to read the whole thing in one go:
Path target = Paths.get("D:/test.txt");
var content = Files.readString(target);
NB: Unlike most java methods that convert bytes to chars or vice versa, the Files API defaults to UTF-8 (instead of the useless and dangerous, untestable-bug-causing 'platform default encoding' that most java API does). That's why this last incredibly simple code is nevertheless correct.

Ruby base 64 decoding for Java base 64 encoding

I having a string that is encoded in java using
data = new String(Base64.getEncoder().encode(encVal), StandardCharsets.UTF_8);
I am receiving this encoded data as an API response. I want to base64 decode this in ruby. I am using
Base64.strict_decode64(data)
for this. but this is not working. Can anyone help me with this?
Your Java code is correct:
byte[] encVal = "Hello World".getBytes();
String data = new String(Base64.getEncoder().encode(encVal), StandardCharsets.UTF_8);
System.out.println(data); // SGVsbG8gV29ybGQ=
The SGVsbG8gV29ybGQ= decodes correctly using multiple tools, e.g. https://www.base64decode.org/.
You are observing garbage characters decoding your value most likely due to an error in creating byte[]. Possibly you have to specify the correct encoding when creating byte[].

converting String to ASCII--> BASE64

I'm writing a REST client from a C# usage example. Now i need to convert a string in the proper format but can't find the equivalent method on Java.
original:
string Credentials = Convert.ToBase64String(ASCIIEncoding.ASCII.GetBytes(string);
At this point I've done this:
String Credentials = new String(DatatypeConverter.parseBase64Binary(String));
but i still need the ASCII conversion and I'm not sure that the things I've fount will work fine, like: Convert character to ASCII numeric value in java
any clues?
Thank you.
If you're using java 8 you should take a look at its new Base64 class. It will provide you with a Base64.Encoder whose encodeToString(byte[] src) method accepts a byte array and return a base64 encoded String.
String base64 = Base64.getEncoder().encodeToString("I'm a String".getBytes());
System.out.println(base64); // prints SSdtIGEgU3RyaW5n

Input byte array has incorrect ending byte at 40

I have a string that is base64 encoded. It looks like this:
eyJibGExIjoiYmxhMSIsImJsYTIiOiJibGEyIn0=
Any online tool can decode this to the proper string which is {"bla1":"bla1","bla2":"bla2"}. However, my Java implementation fails:
import java.util.Base64;
System.out.println("payload = " + payload);
String json = new String(Base64.getDecoder().decode(payload));
I'm getting the following error:
payload = eyJibGExIjoiYmxhMSIsImJsYTIiOiJibGEyIn0=
java.lang.IllegalArgumentException: Input byte array has incorrect ending byte at 40
What is wrong with my code?
Okay, I found out. The original String is encoded on an Android device using android.util.Base64 by Base64.encodeToString(json.getBytes("UTF-8"), Base64.DEFAULT);. It uses android.util.Base64.DEFAULT encoding scheme.
Then on the server side when using java.util.Base64 this has to be decoded with Base64.getMimeDecoder().decode(payload) not with Base64.getDecoder().decode(payload)
I was trying to use the strings from the args. I found that if I use arg[0].trim() that it made it work. eg
Base64.getDecoder().decode(arg[0].trim());
I guess there's some sort of whitespace that gets it messed up.
Maybe too late, but I also had this problem.
By default, the Android Base64 util adds a newline character to the end of the encoded string.
The Base64.NO_WRAP flag tells the util to create the encoded string without the newline character.
Your android app should encode src something like this:
String encode = Base64.encodeToString(src.getBytes(), Base64.NO_WRAP);

Base64 InputStream to String

I have been trying to get an input stream reading a file, which isa plain text and has embeded some images and another files in base64 and write it again in a String. But keeping the encoding, I mean, I want to have in the String something like:
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwgHBgoICAgLCgoLDhgQDg0NDh0VFhEYIx8lJCIf
IiEmKzcvJik0KSEiMEExNDk7Pj4+JS5ESUM8SDc9Pjv/2wBDAQoLCw4NDhwQEBw7KCIoOzs7Ozs7
I have been trying with the classes Base64InputStream and more from packages as org.apache.commons.codec but I just can not fiugure it out. Any kind of help would be really appreciated. Thanks in advance!
Edit
Piece of code using a reader:
BufferedReader br= new BufferedReader(new InputStreamReader(bodyPart.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
Getting as a result something like: .DIC;ÿÛC;("(;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;ÿÀ##"ÿÄ
Have you tried doing this:
final byte[] bytes64bytes = Base64.encodeBase64(IOUtils.toByteArray(is));
final String content = new String(bytes64bytes);
A text file containing some base64 data can be read with the charset of the rest of the file.
Base64 encoding is a mean to encode bytes in a limited set of characters that are unchanged with almost all char encodings, for example ASCII or UTF-8.
Base64 isn't a charset encoding, you don't have to specify you have some base64 encoded data when reading a file into a string.
So if your text file is generally UTF-8 (that's probable), you can read it without problem even if it contains a base64 encoded stream. Simply use a basic reader and don't use a Base64InputStream if you don't want to decode it.
When opening a file with a reader, you have to specify the encoding. If you don't know it, I suggest you test with the probable ones, like UTF-8, US-ASCII or ISO-8859-1.
If you have a normal InputStream object than You can directly get Base64 encoded stream from it using apache common library class Base64InputStream constructor
I found the solution, inspired by this post getting base64 content string of an image from a mimepart in Java
I think it is kind of stupid decode and encode again the base64 code, but it is the only way I found to manage this issue. If someone could give a better solution, it would be also really appreciated.
Thanks

Categories