How to implement rawurldecode in Java? - java

I'd like to port some PHP code to Java. It decodes a string that is stored in URI-encoded form.
That is, change
This%20is%20a%20%2Burl%2B%21
into
This is a +url+!
I've looked at java.net.URI, but there are no suitable examples, and it seems that anything it decodes needs to be a properly formed URI. I'd like to convert a string that isn't a proper URI but contains percent-encoded characters.

java.net.URLDecoder.decode("This%20is%20a%20%2Burl%2B%21", "UTF-8");
UTF-8 is of course just an example. Use whatever your input encoding is.

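You could use URLDecoder (doc here). It decodes an application/x-www-form-urlencoded String, so unlike PHP's rawurldecode it also turns a literal + into a space.
String decodedString = URLDecoder.decode("This%20is%20a%20%2Burl%2B%21", "UTF-8"); // the one-argument overload is deprecated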
System.out.println(decodedString);
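Because URLDecoder follows form-encoding rules, an input containing an unescaped + would not come out the way rawurldecode produces it. A minimal sketch of one way around that, assuming Java 10 or later (for the decode overload that takes a Charset): escape every literal + as %2B before decoding. The class and method names here are made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class RawUrlDecode {
    // Decodes %XX escapes but leaves a literal '+' untouched,
    // which is how PHP's rawurldecode behaves.
    static String rawUrlDecode(String s) {
        // URLDecoder would read '+' as an encoded space, so protect it first.
        String protectedPlus = s.replace("+", "%2B");
        return URLDecoder.decode(protectedPlus, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(rawUrlDecode("This%20is%20a%20%2Burl%2B%21")); // This is a +url+!
        System.out.println(rawUrlDecode("a+b%20c"));                      // a+b c
    }
}
```

If your input never contains a bare +, plain URLDecoder.decode gives the same result.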

Related

Decoding String (from header) encoded by Base64 and RFC2047 in Java

I'm working on a function to decode a string (from a header) that is encoded in both Base64 and RFC2047 in Java.
Given this header:
SGVhZGVyOiBoZWFkZXJ2YWx1ZQ0KQmFkOiBOYW1lOiBiYWRuYW1ldmFsdWUNClVuaWNvZGU6ID0/VVRGLTg/Qj81YmV4NXF5eTU2dUw2SUNNNTZ1TDVMcTY3N3lNNWJleDVxeXk2WUdVNklDTTZZR1U/PSA9P1VURi04P0I/NUxxNjc3eU01YmV4NW9tQTVMaU41cXl5Nzd5TTVZdS81cGE5NXBhODVMcTY0NENDPz0NCg0K
My expected output is:
Header: headervalue Bad: Name: badnamevalue Unicode:
己欲立而立人,己欲達而達人,己所不欲,勿施於人。
The only relevant function that I have found and tried was Base64.decodeBase64(headers), which produces this when printed out:
Header: headervalue Bad: Name: badnamevalue Unicode:
=?UTF-8?B?5bex5qyy56uL6ICM56uL5Lq677yM5bex5qyy6YGU6ICM6YGU?= =?UTF-8?B?5Lq677yM5bex5omA5LiN5qyy77yM5Yu/5pa95pa85Lq644CC?=
To solve this, I've been trying MimeUtility.decode() by converting the byte array returned from Base64.decodeBase64(headers) to an InputStream, but the result was identical to the above.
InputStream headerStream = new ByteArrayInputStream(Base64.decodeBase64(headers));
InputStream result = MimeUtility.decode(headerStream, "quoted-printable");
I have been searching around the internet but have yet to find a solution. Does anyone know a way to decode MIME headers from the resulting byte array?
Any help is appreciated! It's also my first stack overflow post, apologies if I'm missing anything but please let me know if there's more information that I can provide!
The base64 you have there actually is what you pasted. Including the bizarre =?UTF-8?B? weirdness.
The stuff that follows is again base64.
There's base64-encoded data inside your base-64 encoded data. As Xzibit would say: I put some Base64 in your base64 so you can base64 while you base64. Why do I feel old all of a sudden?
In other words, the base64 input you get is a crazy, extremely inefficient format invented by a crazy person.
My advice is that you tell them to come up with something less insane.
Failing that:
Search the resulting string for the regex pattern and then again apply base64 decode to the stuff in the middle.
Also, you're using some third party base64 decoder, probably apache. Apache libraries tend to suck. Base64 is baked into java, there is no reason to use worse libraries here. I've fixed that; the Base64 in this snippet is java.util.Base64. Its API is slightly different.
String sourceB64 = "SGV..."; // that input base64 you have.
byte[] sourceBytes = Base64.getDecoder().decode(sourceB64); // java.util.Base64
String source = new String(sourceBytes, StandardCharsets.UTF_8);
Pattern p = Pattern.compile("=\\?UTF-8\\?B\\?(.*?)\\?=");
Matcher m = p.matcher(source);
StringBuilder out = new StringBuilder();
int curPos = 0;
while (m.find()) {
out.append(source.substring(curPos, m.start()));
curPos = m.end();
String content = new String(Base64.getDecoder().decode(m.group(1)), StandardCharsets.UTF_8);
out.append(content);
}
out.append(source.substring(curPos));
System.out.println(out.toString());
If I run that, I get:
Header: headervalue
Bad: Name: badnamevalue
Unicode: 己欲立而立人,己欲達而達 人,己所不欲,勿施於人。
Which looks exactly like what you want.
Explanation of that code:
It first base64-decodes the input, and turns that into a string. (Your idea of using InputStream is a red herring. That doesn't help at all here. You just want to turn bytes into a string, you do it as per line 3 of that snippet. Pass the byte array and the encoding those bytes are in, that's all you need to do).
It then goes on the hunt for =?UTF-8?B?--base64here--?= inside your base64. The base64-in-the-base64.
It then decodes that base64, turns it into a string in the same fashion, and replaces it.
It just adds everything besides those =?UTF-8?B?...?= segments verbatim.
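The snippet above hardcodes UTF-8. If the charset inside the =?...?B?...?= markers might vary, the same loop can read it from the marker. This is only a sketch under stated assumptions: it handles the "B" (Base64) form only, not the "Q" form, and like the code above it keeps whatever sits between two encoded segments verbatim. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodedWordDecoder {
    // Matches =?charset?B?payload?= ; group 1 is the charset, group 2 the Base64 payload.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([^?]*)\\?=");

    static String decodeEncodedWords(String source) {
        Matcher m = ENCODED_WORD.matcher(source);
        StringBuilder out = new StringBuilder();
        int curPos = 0;
        while (m.find()) {
            // Copy the text before this segment unchanged.
            out.append(source, curPos, m.start());
            // Charset.forName throws if the name is unknown or malformed.
            Charset cs = Charset.forName(m.group(1));
            out.append(new String(Base64.getDecoder().decode(m.group(2)), cs));
            curPos = m.end();
        }
        // Copy whatever follows the last segment.
        out.append(source, curPos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "Unicode: =?UTF-8?B?5bex5qyy56uL6ICM56uL5Lq677yM5bex5qyy6YGU6ICM6YGU?=";
        System.out.println(decodeEncodedWords(header));
    }
}
```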

converting String to ASCII--> BASE64

I'm writing a REST client from a C# usage example. Now I need to convert a string into the proper format but can't find the equivalent method in Java.
original:
string Credentials = Convert.ToBase64String(ASCIIEncoding.ASCII.GetBytes(string));
At this point I've done this:
String Credentials = new String(DatatypeConverter.parseBase64Binary(String));
but I still need the ASCII conversion and I'm not sure that the things I've found will work, like: Convert character to ASCII numeric value in java
any clues?
Thank you.
If you're using Java 8 you should take a look at its new Base64 class. It provides a Base64.Encoder whose encodeToString(byte[] src) method accepts a byte array and returns a Base64-encoded String. Pass an explicit charset to getBytes so the result doesn't depend on the platform default.
String base64 = Base64.getEncoder().encodeToString("I'm a String".getBytes(StandardCharsets.US_ASCII));
System.out.println(base64); // prints SSdtIGEgU3RyaW5n
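Applied to the credentials case from the question, that could look like the sketch below. The "user:secret" value and the class name are placeholders, not something from the original C# example. As far as I know, String.getBytes with US_ASCII substitutes ? for characters outside ASCII, which is also what ASCIIEncoding does in C#.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class AsciiBase64 {
    // Java counterpart of Convert.ToBase64String(ASCIIEncoding.ASCII.GetBytes(text)).
    static String toBase64(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(ascii);
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        String credentials = toBase64("user:secret");
        System.out.println(credentials);
    }
}
```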

Converting decode utf-8 string to file

I am trying to save an image that I receive from an Android device. The device sends a UTF-8 encoded string, and below is the code I am using to save it.
String test = java.net.URLDecoder.decode(image_base64, "UTF-8");
byte[] data = Base64.decodeBase64(test.getBytes());
FileOutputStream stream = null;
try {
stream = new FileOutputStream("/var/lib/easy-tomcat7/webapps/test/test1.bmp");
stream.write(data);
stream.flush();
test1 += "success";
}
catch (IOException e)
{
test1 = "failuare";
e.getMessage();
}
finally
{
test1 += "finally";
stream.close();
}
The file is created, but it is corrupted. I did a lot of research on this but can't see why it is happening. Please help me solve this issue.
I assume you are using Base64 from Apache Commons Codec.
Note that you are dealing with multiple different kinds of encodings:
URL encoding
Base64 encoding
UTF-8 character encoding
Those are three totally different things, and you should understand all of them to understand what is happening exactly.
Check how exactly the image is encoded that you get from the Android device. Your code is assuming that you are getting it as URL-encoded Base64 data, using the UTF-8 character set. Is that indeed how the Android device is sending the data? You will have to check that with whoever wrote the Android application.
What does the string image_base64 contain? Is it valid, URL-encoded Base64 data?
You shouldn't call getBytes() on the string before you pass it to Base64.decodeBase64 - that will convert the string into a byte array using the default character encoding of the system you're running it on. Just do this instead:
byte[] data = Base64.decodeBase64(test);
To make matters worse, there are several variants of Base64 encoding (as you can see on the Wikipedia page about Base64). It may be the case that whatever variant the Android app used is different from what the Base64 class is using.
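One rough way to find out which variant you are receiving is to try the decoders that ship with java.util.Base64 in turn. This is a diagnostic sketch rather than a fix: the class and method names are made up, and a short string can be valid in more than one variant, so confirming the format with whoever wrote the sender is still the reliable route.

```java
import java.util.Base64;

final class Base64Variants {
    // Returns the bytes from the first java.util.Base64 decoder that accepts the input.
    static byte[] decodeAnyVariant(String text) {
        Base64.Decoder[] decoders = {
            Base64.getDecoder(),     // standard alphabet: '+' and '/'
            Base64.getUrlDecoder(),  // URL-safe alphabet: '-' and '_'
            Base64.getMimeDecoder()  // standard alphabet, tolerates line breaks
        };
        for (Base64.Decoder decoder : decoders) {
            try {
                return decoder.decode(text);
            } catch (IllegalArgumentException e) {
                // This variant rejected the input; try the next one.
            }
        }
        throw new IllegalArgumentException("input is not Base64 in any supported variant");
    }

    public static void main(String[] args) {
        byte[] data = decodeAnyVariant("-_8=");
        System.out.println(data.length + " bytes decoded");
    }
}
```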
Use the encoding also for getBytes()
Base64.decodeBase64(test.getBytes("utf-8"));

Base64 InputStream to String

I have been trying to read a file through an input stream. The file is plain text with some images and other files embedded in base64, and I want to write it back out as a String while keeping the encoding. I mean, I want the String to contain something like:
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwgHBgoICAgLCgoLDhgQDg0NDh0VFhEYIx8lJCIf
IiEmKzcvJik0KSEiMEExNDk7Pj4+JS5ESUM8SDc9Pjv/2wBDAQoLCw4NDhwQEBw7KCIoOzs7Ozs7
I have been trying with Base64InputStream and other classes from packages such as org.apache.commons.codec, but I just cannot figure it out. Any kind of help would be really appreciated. Thanks in advance!
Edit
Piece of code using a reader:
BufferedReader br= new BufferedReader(new InputStreamReader(bodyPart.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
Getting as a result something like: .DIC;ÿÛC;("(;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;ÿÀ##"ÿÄ
Have you tried doing this:
final byte[] bytes64bytes = Base64.encodeBase64(IOUtils.toByteArray(is));
final String content = new String(bytes64bytes);
A text file containing some base64 data can be read with the charset of the rest of the file.
Base64 encoding is a means of encoding bytes using a limited set of characters that are unchanged in almost all character encodings, for example ASCII or UTF-8.
Base64 isn't a charset encoding, you don't have to specify you have some base64 encoded data when reading a file into a string.
So if your text file is generally UTF-8 (that's probable), you can read it without problem even if it contains a base64 encoded stream. Simply use a basic reader and don't use a Base64InputStream if you don't want to decode it.
When opening a file with a reader, you have to specify the encoding. If you don't know it, I suggest you test with the probable ones, like UTF-8, US-ASCII or ISO-8859-1.
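As a sketch of that advice, the reader loop from the question only needs an explicit charset on the InputStreamReader. UTF-8 is an assumption here, and the class and method names are invented for the example. The base64 lines come through untouched because nothing decodes them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class TextWithBase64 {
    // Reads the whole stream as text in the given charset, keeping line breaks.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine strips the terminator, so add one back.
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Calling readAll(bodyPart.getInputStream(), StandardCharsets.UTF_8) would be the equivalent of the loop in the question, with bodyPart being the variable from that code. Note that if the stream has already been Base64-decoded upstream, as the garbled output in the question suggests, this returns the decoded bytes as text rather than the original base64.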
If you have a normal InputStream object, then you can get a Base64-encoded stream directly from it using the Base64InputStream constructor from the Apache Commons Codec library.
I found the solution, inspired by this post getting base64 content string of an image from a mimepart in Java
I think it is kind of stupid to decode and then re-encode the base64, but it is the only way I found to handle this issue. If someone could offer a better solution, it would be really appreciated.
Thanks

Java servlet json object containing XML, encoding problems

I have a servlet which should reply to requests in Json {obj:XML} (meaning a Json containing an xml object inside).
The XML is encoded in UTF-8 and has several chars like => पोलैंड.
The XML is in an org.w3c.dom.Document and I am using the JSON.org library to parse JSON. When I try to print it on the ServletOutputStream, the characters are not encoded correctly. I have tested this by printing the response to a file, but the encoding is not UTF-8.
Parser.printTheDom(documentFromInputStream,byteArrayOutputStream);
OutputStreamWriter oS=new OutputStreamWriter(servletOutputStream, "UTF-8");
oS.write((jsonCallBack+"("));
oS.write(byteArrayOutputStream);
oS.write(");");
I have even tried, locally (without deploying the servlet), the previous code and the following:
oS.write("पोलैंड");
and the result is the same.
By contrast, when I print the document directly, the file is well-formed XML.
oS.write((jsonCallBack+"("));
Parser.printTheDom(documentFromInputStream,oS);
oS.write(");");
Any help?
Typically, if binary data needs to be part of an xml doc, it's base64 encoded. See this question for more details. I suggest you base64 encode the fields that can have exotic UTF-8 chars and base64 decode them on the client side.
See this question for 2 good options for base64 encoding/decoding in java.
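For what that suggestion would look like with the java.util.Base64 class built into Java 8 and later, here is a minimal sketch. The class and method names are hypothetical, and the sample text is the word from the question written as Unicode escapes so the source file's own encoding doesn't matter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Utf8Base64 {
    // Encodes text as Base64 over its UTF-8 bytes, giving pure ASCII output.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(): Base64 back to bytes, then bytes back to text as UTF-8.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Devanagari word from the question, as escapes.
        String original = "\u092a\u094b\u0932\u0948\u0902\u0921";
        String wire = encode(original);
        System.out.println(wire);
        System.out.println(decode(wire).equals(original));
    }
}
```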
