Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
URLEncoder is the way to go. Just keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, and certainly not the parameter separator character & or the name-value separator character =.
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
If you are not yet on Java 10 or newer, use StandardCharsets.UTF_8.toString() as the charset argument; if you are not yet on Java 7 or newer, use "UTF-8".
Note that URLEncoder represents spaces in query parameters as +, not %20, which is perfectly valid. %20 is normally used for spaces in the URI itself (the part before the query string separator character ?), not in the query string (the part after ?).
Also note that there are three encode() methods: one with only the String argument, one with a String charset name as second argument (which throws the checked UnsupportedEncodingException), and, since Java 10, one with a Charset as second argument. The one without a charset argument is deprecated; never use it and always specify the charset. The javadoc even explicitly recommends UTF-8, as mandated by RFC 3986 and the W3C:
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
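A minimal sketch of the advice above, assuming Java 10+ for the Charset overload; the class and method names (QueryEncoding, encodeQueryValue, toPercent20) are hypothetical. Only the parameter value is encoded, the ? and = separators are written literally, and the %20 variant is safe because URLEncoder turns every literal + in the input into %2B before the replacement runs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryEncoding {

    // Form-encodes one query parameter value as UTF-8; spaces become '+'.
    static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Same, but with spaces written as %20. Replacing '+' afterwards is safe
    // because a literal '+' in the input has already become "%2B".
    static String toPercent20(String value) {
        return encodeQueryValue(value).replace("+", "%20");
    }

    public static void main(String[] args) {
        String q = "random word \u00A3500 bank $"; // the question's input, pound sign as a Unicode escape
        // Prints http://example.com/query?q=random+word+%C2%A3500+bank+%24
        System.out.println("http://example.com/query?q=" + encodeQueryValue(q));
        // Prints http://example.com/query?q=random%20word%20%C2%A3500%20bank%20%24
        System.out.println("http://example.com/query?q=" + toPercent20(q));
    }
}
```

Note that £ comes out as %C2%A3 (its two UTF-8 bytes), not as the %A3 shown in the question, which is the single ISO-8859-1 byte.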
See also:
What every web developer must know about URL encoding
I would not use URLEncoder. Besides being misnamed (it performs HTML form encoding, application/x-www-form-urlencoded, rather than URL encoding) and inefficient (it uses a StringBuffer instead of a StringBuilder and does a couple of other slow things), it is also far too easy to get wrong.
Instead I would use URIBuilder, Spring's org.springframework.web.util.UriUtils.encodeQuery, or Apache Commons HttpClient.
The reason is that the query parameter name (q in BalusC's answer) has to be escaped differently from the parameter value.
The only downside to the above (which I found out painfully) is that URLs are not a true subset of URIs.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query"); // throws java.net.URISyntaxException
ub.addParameter("q", "random word £500 bank $");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24
You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
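Now your URL string is completely encoded. The multi-argument URI constructor quoted the characters that are illegal in each component (such as spaces), and toASCIIString() then percent-encoded every remaining non-US-ASCII character as UTF-8, so no character outside US-ASCII remains in the string. This is essentially how browsers do it.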
Guava 15 added a set of straightforward URL escapers (com.google.common.net.UrlEscapers).
The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split the URL into its structural parts, using java.net.URL.
2. Encode each structural part properly.
3. Use IDN.toASCII(putDomainNameHere) to Punycode-encode the hostname.
4. Use java.net.URI.toASCIIString() to percent-encode the NFC-normalized Unicode characters as UTF-8 (NFKC would be better). For more information, see: How to encode properly this URL
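The four steps can be wrapped in a small reusable helper. This is a sketch built on the same JDK calls as the code above; the class and method names (UrlNormalizer, toEncodedUrl) are hypothetical, and the checked exceptions are rethrown as IllegalArgumentException for convenience. One caveat: the multi-argument URI constructors always quote the percent character, so input that is already percent-encoded would be double-encoded, which is why it is worth checking for that first.

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class UrlNormalizer {

    // Splits the raw URL with java.net.URL, lets the multi-argument URI constructor
    // quote illegal characters per component, Punycode-encodes the host, and finally
    // percent-encodes the remaining non-ASCII characters as UTF-8.
    static String toEncodedUrl(String raw) {
        try {
            URL url = new URL(raw); // deprecated since Java 20, but still works
            URI uri = new URI(
                    url.getProtocol(),
                    url.getUserInfo(),
                    IDN.toASCII(url.getHost()),
                    url.getPort(),
                    url.getPath(),
                    url.getQuery(),
                    url.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(toEncodedUrl("http://search.barnesandnoble.com/booksearch/first book.pdf"));
    }
}
```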
In some cases it is advisable to check whether the URL is already encoded. Also, replace '+'-encoded spaces with '%20'-encoded spaces.
Here are some more examples that work correctly:
{
"in" : "http://نامهای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
Using Spring's UriComponentsBuilder:
URI uri = UriComponentsBuilder
        .fromUriString(url)
        .build()
        .encode()
        .toUri();
The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil
Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
    if (parameters == null) {
        return url;
    }
    for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
        // Encode the name and the value separately so that '?', '&' and '=' stay literal
        final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
        final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
        if (!url.contains("?")) {
            url += "?" + encodedKey + "=" + encodedValue;   // first parameter
        } else {
            url += "&" + encodedKey + "=" + encodedValue;   // subsequent parameters
        }
    }
    return url;
}
In Android, I would use this code:
Uri myUri = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word £500 bank $").build();
where Uri is android.net.Uri.
In my case I needed to pass the whole URL and encode only the value of each parameter.
I didn't find common code to do that, so I created this small method to do the job:
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils;

public static String encodeUrl(String url) throws Exception {
    if (url == null || !url.contains("?")) {
        return url;
    }
    int queryStart = url.indexOf('?') + 1;
    String rootUrl = url.substring(0, queryStart);   // everything up to and including '?'
    String paramsUrl = url.substring(queryStart);    // the raw query string
    List<String> list = new ArrayList<>();
    for (String param : paramsUrl.split("&")) {
        int eq = param.indexOf('=');
        if (eq >= 0) {
            // Keep the key and the first '=' as-is; encode only the value
            String key = param.substring(0, eq);
            String value = param.substring(eq + 1);
            list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
        } else {
            list.add(param);
        }
    }
    return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
    return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.
Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // Spring: encodes only characters that are illegal in a URI query (spaces become %20)
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Form encoding: also encodes reserved characters such as & = $ (spaces become +)
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Same result as encodedUrl2
System.out.println("url1=" + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);
Given the following implementation, I face the problem that on another system the XML file is missing the umlauts (ä, ü, ö) that the original XML file contains. Instead of the umlauts, the replacement character (0xEF 0xBF 0xBD, efbfbd) is inserted into the XML file.
Get a zip file containing an XML file with umlauts
Decompress the zip file
Encode the XML content as a Base64 payload and save it to the DB
Query the entity
Get the Base64 payload
Decode the Base64 content
The decoded Base64 content is an XML file which should contain the original umlauts
What's driving me crazy is that the decoded Base64 content is missing the umlauts on another system. Instead of the umlauts I get the replacement character. On my system the same implementation works without the replacement.
The following code is just an MCVE to explain the problem; it works fine on my system, but on another system (Windows Server 2013) it loses the umlauts after decoding.
String requestUrl = "https://myserver/mypath/Message_166741.zip";
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
String content = null;
content = new String(decompressedInputStream, StandardCharsets.UTF_8);
String originFileName = new SimpleDateFormat("yyyyMMddHHmm'_origin.xml'").format(new Date());
String originFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", originFileName);
// File contains the expected umlauts
FileUtils.writeStringToFile(new File(originFileNameWithPath), content);
String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(content.getBytes("UTF-8")));
String payload = Base64.encodeBase64String(ZipUtils.compress(content.getBytes()));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(ZipUtils.compress(content.getBytes())));
String xmlMessageJavaBase64;
byte[] compressedBinaryJavaBase64 = java.util.Base64.getDecoder().decode(payloadJavaBase64);
byte[] decompressedBinaryJavaBase64= ZipUtils.decompress(compressedBinaryJavaBase64);
xmlMessageJavaBase64 = new String(decompressedBinaryJavaBase64, "UTF-8");
String xmlMessageUTF8;
byte[] compressedBinaryUTF8 = java.util.Base64.getDecoder().decode(payloadUTF8);
byte[] decompressedBinaryUTF8 = ZipUtils.decompress(compressedBinaryUTF8);
xmlMessageUTF8 = new String(decompressedBinaryUTF8, "UTF-8");
String xmlMessage;
byte[] compressedBinary = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
xmlMessage = new String(decompressedBinary, "UTF-8");
String processedFileName = new SimpleDateFormat("yyyyMMddHHmm'_processed.xml'").format(new Date());
String processedFileNameUTF8 = new SimpleDateFormat("yyyyMMddHHmm'_processedUTF8.xml'").format(new Date());
String processedFileNameJavaBase64 = new SimpleDateFormat("yyyyMMddHHmm'_processedJavaBase64.xml'").format(new Date());
// These files do not contain the umlauts anymore.
// Instead of the umlauts a replacement character is inserted (0xEF 0xBF 0xBD (efbfbd))
String processedFileNameWithPath = String.format("C:\\temp\\Tests\\%1$s", processedFileName);
String processedFileNameWithPathUTF8 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameUTF8);
String processedFileNameWithPathJavaBase64 = String.format("C:\\temp\\Tests\\%1$s", processedFileNameJavaBase64);
FileUtils.writeStringToFile(new File(processedFileNameWithPath), xmlMessage);
FileUtils.writeStringToFile(new File(processedFileNameWithPathUTF8), xmlMessageUTF8);
FileUtils.writeStringToFile(new File(processedFileNameWithPathJavaBase64), xmlMessageJavaBase64);
The three files are just for testing purposes, but I hope you get the problem.
Edit
Both variants create an XML file with ü, ö, ä on my machine.
On another system, only the WITHOUT variant creates an XML file with ü, ö, ä. The "content" string of the WITH UTF-8 variant contains for ü =>
// WITHOUT UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci);
byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary);
// WITH UTF-8 IN BYTE[] => STRING CTOR
byte[] dci = decompress(inputStream);
content = new String(dci, StandardCharsets.UTF_8);
byte[] compressedBinary = java.util.Base64.getDecoder().decode(content);
byte[] decompressedBinary = ZipUtils.decompress(compressedBinary);
String xml = new String(decompressedBinary, "UTF-8");
Edit #2
There also seems to be a difference between running the code inside and outside of IntelliJ on my machine. I did not know that this makes such a huge difference. If I run the code outside of IntelliJ (java.exe -jar myjarfile), the WITH UTF-8 part replaces the Ü with ... I don't know what: Notepad++ shows xFC. Funny: my Raspberry Pi shows both files with Ü, whereas Notepad++ on Windows shows xFC.
That whole thing confuses me and I would like to understand what the problem is, especially because the XML file declares UTF-8 as its encoding in the header.
Edit #3 Final Solution
// ## SERVER
// Get ZIP from request URL
HttpGet httpget = new HttpGet(requestUrl);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
// Produces a XML string which SHOULD contain ü, ö, ä
String xmlOfZipFileContent = new String(decompressedInputStream, StandardCharsets.UTF_8);
// Just for testing write to file
String xmlOfZipFileSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_original.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfZipFileSavePath), xmlOfZipFileContent, StandardCharsets.UTF_8);
// The payload gets stored in the DB
String payload = java.util.Base64.getEncoder().encodeToString(ZipUtils.compress(xmlOfZipFileContent.getBytes(StandardCharsets.UTF_8)));
// Store payload to db
// Client queries database and gets the payload
// payload = dbEntity.get().payload
// The following three lines run on the client
byte[] compressedBinaryPayload = java.util.Base64.getDecoder().decode(payload);
byte[] decompressedBinaryPayload = ZipUtils.decompress(compressedBinaryPayload);
String xmlMessageOutOfPayload = new String(decompressedBinaryPayload, StandardCharsets.UTF_8);
String xmlOfPayloadSavePath = String.format("C:\\temp\\Tests\\%1$s", new SimpleDateFormat("yyyyMMddHHmm'_payload.xml'").format(new Date()));
FileUtils.writeStringToFile(new File(xmlOfPayloadSavePath), xmlMessageOutOfPayload, StandardCharsets.UTF_8);
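The fix in the final solution comes down to naming UTF-8 explicitly at every bytes/String boundary. A minimal round-trip sketch of that pipeline: since the question's ZipUtils class is not shown, java.util.zip GZIP streams stand in for it here, and the class and method names (PayloadCodec, toPayload, fromPayload) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PayloadCodec {

    // Server side: String -> UTF-8 bytes -> gzip -> Base64 text for the database.
    static String toPayload(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8)); // never the platform default getBytes()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed (and finished) at this point
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Client side: Base64 text -> gunzip -> bytes decoded explicitly as UTF-8.
    static String fromPayload(String payload) {
        byte[] compressed = Base64.getDecoder().decode(payload);
        try (GZIPInputStream gunzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gunzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

fromPayload(toPayload(xml)) returns the original string whatever the platform default charset is, because neither direction ever falls back to it.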
If I understood correctly, your situation seems to be the following:
// Decompress data from the server, it's in ISO-8859-1 or similar 1 byte encoding
byte[] dci = decompress(inputStream);
// Data gets corrupted because of wrong charset
// This is where ü gets converted to unicode replacement character
content = new String(dci, StandardCharsets.UTF_8);
The rest of the code uses UTF-8 explicitly, but that doesn't matter, as the data has already been corrupted at this point. In the end you expect a UTF-8 encoded file.
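A short sketch of that failure, assuming (as the comment in the snippet above does) that the decompressed bytes use a single-byte encoding. It reproduces the EF BF BD bytes from the question; the class name ReplacementCharDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {

    // Formats bytes as lowercase hex, e.g. {0xEF, 0xBF, 0xBD} -> "efbfbd".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A single-byte charset such as ISO-8859-1 (or windows-1252) stores 'ü' as 0xFC.
        byte[] singleByte = "\u00FC".getBytes(StandardCharsets.ISO_8859_1);
        // 0xFC is not valid UTF-8, so this decode silently inserts U+FFFD instead of 'ü'.
        String corrupted = new String(singleByte, StandardCharsets.UTF_8);
        // Writing the result back as UTF-8 produces the bytes seen in the question.
        System.out.println(hex(corrupted.getBytes(StandardCharsets.UTF_8))); // efbfbd
        // Decoding with the charset the bytes were really written in keeps the 'ü'.
        String intact = new String(singleByte, StandardCharsets.ISO_8859_1);
        System.out.println(intact.equals("\u00FC")); // true
    }
}
```

Once U+FFFD has been produced the original byte is gone, so the fix has to happen at the first new String(...) call, not later in the pipeline.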
Also because the XML file contains the UTF8 as encode in header.
That doesn't prove anything. The XML declaration is just text: you can write the file out in as many encodings as you want, and it would still claim to be UTF-8.
InputStream inputStream = entity.getContent();
byte[] decompressedInputStream = decompress(inputStream);
Fine, and it is assumed that the bytes are in UTF-8, as:
String content = new String(decompressedInputStream, StandardCharsets.UTF_8);
Should the bytes not be in UTF-8, you could try Windows Latin-1:
Charset.forName("Windows-1252")
Otherwise, decompressedInputStream can be used wherever content is converted to bytes in UTF-8.
...
FileUtils.writeStringToFile without an explicit encoding uses the platform's default encoding.
// File contains the expected umlauts
//FileUtils.writeStringToFile(new File(originFileNameWithPath), content);
Better is to ensure that UTF-8 is written. Either add the encoding to convert the Unicode String to bytes in UTF-8, or simply write the original bytes:
Files.write(Paths.get(originFileNameWithPath), decompressedInputStream);
Also the Base64 encoded UTF-8 bytes of the String should be used:
String payloadUTF8 = Base64.encodeBase64String(ZipUtils.compress(
content.getBytes(StandardCharsets.UTF_8)));
String payloadJavaBase64 = new String(java.util.Base64.getEncoder().encode(
ZipUtils.compress(content.getBytes(StandardCharsets.UTF_8))));
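The standard Java SE java.util.Base64 will do. Its encodeToString and decode(String) methods convert between Base64 text and bytes using ISO-8859-1 (Latin-1), but that is harmless because Base64 text is pure ASCII; the charset that matters is the one used for content.getBytes(...) and new String(bytes, ...).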
I am working on a mail application and I have some trouble decoding MIME-encoded text. I am using MimeUtility.decodeText(), but it doesn't work for every encoded text: some texts are decoded properly, but others aren't.
The texts that can't be decoded are mostly in the UTF-8 and ISO-8859-9 encodings.
How can I solve this issue?
This is the code I use for decoding:
MimeUtility.decodeText(text);
These are examples of failing text:
Solution (thanks to #user_xtech007)
I solved the problem by extracting the individual encoded parts with a regex and decoding each of them separately.
Here is the code of the method I am using:
private final String ENCODED_PART_REGEX_PATTERN="=\\?([^?]+)\\?([^?]+)\\?([^?]+)\\?=";
private String decode(String s)
{
Pattern pattern=Pattern.compile(ENCODED_PART_REGEX_PATTERN);
Matcher m=pattern.matcher(s);
ArrayList<String> encodedParts=new ArrayList<String>();
while(m.find())
{
encodedParts.add(m.group(0));
}
if(encodedParts.size()>0)
{
try
{
for(String encoded:encodedParts)
{
s=s.replace(encoded, MimeUtility.decodeText(encoded));
}
return s;
} catch(Exception ex)
{
return s;
}
}
else
return s;
}
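For comparison, here is a JDK-only sketch of the same regex idea that decodes the encoded words itself instead of calling MimeUtility. It is a hand-rolled decoder for the RFC 2047 "B" (Base64) and "Q" encodings, not the JavaMail implementation; the class name EncodedWordDecoder is hypothetical, unknown charset names throw UnsupportedCharsetException, and it deliberately ignores details such as RFC 2231 language tags (e.g. UTF-8*tr) and the rule that whitespace between adjacent encoded words is dropped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodedWordDecoder {

    // =?charset?encoding?text?=  -> group 1: charset, group 2: B or Q, group 3: encoded text
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");

    // Decodes every encoded word found anywhere in s (even mid-word); other text is kept as-is.
    static String decode(String s) {
        Matcher m = ENCODED_WORD.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3)) // padding is accepted but not required
                    : decodeQ(m.group(3));
            m.appendReplacement(out, Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // "Q" encoding: '_' is a space, "=XX" is a hex-encoded byte, anything else is a literal ASCII byte.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }
}
```

For example, decode("=?UTF-8?B?w7w=?=") returns "ü", and decode("Re: =?ISO-8859-1?Q?G=FCn_ter?=") returns "Re: Gün ter".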
If the text was decoded as ISO-8859-1 but is really UTF-8, first convert the ISO-8859-1 text back into a byte array, then decode that byte array as UTF-8:
byte[] bytes = s.getBytes("ISO-8859-1");
String s2 = new String(bytes, "UTF-8");
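This re-decoding trick works only because ISO-8859-1 maps each of the 256 byte values to exactly one character, so getBytes("ISO-8859-1") recovers the original bytes unchanged; it cannot undo a UTF-8 decode that has already produced U+FFFD. A minimal sketch (the class name MojibakeRepair is hypothetical) that simulates the damage and reverses it:

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    // Undoes a wrong decode: text that was really UTF-8 but was read as ISO-8859-1.
    // ISO-8859-1 maps all 256 byte values 1:1 to characters, so getBytes recovers
    // the original bytes exactly; those are then decoded as UTF-8.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the damage: the UTF-8 bytes of "ü" (C3 BC) read as ISO-8859-1 give "Ã¼".
        String damaged = new String("\u00FC".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(damaged.equals("\u00C3\u00BC")); // true
        System.out.println(repair(damaged).equals("\u00FC")); // true
    }
}
```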
To extract the encoded parts from the string, you can use a regex.
You can also decode this string by setting
System.setProperty("mail.mime.decodetext.strict", "false");
before calling MimeUtility.decodeText(text);
This ensures that encoded "inner words" get decoded too:
The mail.mime.decodetext.strict property controls decoding of MIME
encoded words. The MIME spec requires that encoded words start at the
beginning of a whitespace separated word. Some mailers incorrectly
include encoded words in the middle of a word. If the
mail.mime.decodetext.strict System property is set to "false", an
attempt will be made to decode these illegal encoded words. The
default is true.
https://docs.oracle.com/javaee/7/api/javax/mail/internet/MimeUtility.html
I'm trying to parse a Facebook signed_request inside a Java servlet's doPost, decoding the signed request with commons-codec 1.3's Base64.
Here is the code I use inside the servlet's doPost:
String signedRequest = (String) req.getParameter("signed_request");
String payload = signedRequest.split("[.]", 2)[1];
payload = payload.replace("-", "+").replace("_", "/").trim();
String jsonString = new String(Base64.decodeBase64(payload.getBytes()));
When I print jsonString with System.out, it's malformed: sometimes it is missing the closing } of the JSON, and sometimes it is missing "} at the end of the string.
How can I get the proper JSON response from Facebook?
Facebook is using Base64 for URLs, and you are probably trying to decode the text with the standard Base64 algorithm.
Among other things, the URL variant doesn't require padding with "=".
You could add the required characters (padding, etc.) in code, or
you can use commons-codec 1.5 (new Base64(true)), where support for this encoding was added.
Facebook is sending you "unpadded" Base64 values (the URL "standard"), and this is problematic for Java decoders that don't expect it. You can tell you have the problem when the Base64-encoded data you want to decode has a length that is not a multiple of 4.
I used this function to fix the values:
public static String padBase64(String b64) {
    // If you are a Java developer, *this* is the critical bit: Facebook expects the
    // Base64 decoder to add the missing padding for you (as the PHP one apparently does).
    switch (b64.length() % 4) {
        case 0:
            return b64;          // already a multiple of 4
        case 2:
            return b64 + "==";   // last group encodes one byte
        case 3:
            return b64 + "=";    // last group encodes two bytes
        default:
            // length % 4 == 1 can never be produced by a Base64 encoder
            throw new IllegalArgumentException("Invalid Base64 length: " + b64.length());
    }
}
I have never done this in Java so I don't have a full answer, but the fact that you are sometimes losing one and sometimes two characters from the end of the string suggests it may be an issue with Base64 padding. You might want to output the value of payload and see if when it ends with '=' then jsonString is missing '}' and when payload ends with '==' then jsonString is missing '"}'. If that seems to be the case then something is going wrong with the interpretation of the equals signs at the end of payload which are supposed to represent empty bits.
Edit: On further reflection I believe this is because Facebook is using Base64 URL encoding (which does not add = as pad chars) instead of regular Base64, whereas your decoding function is expecting regular Base64 with the trailing = chars.
I've upgraded to commons-codec 1.5 using code very similar to this and am not experiencing this issue. Have you confirmed with an online decoder that the payload really is malformed?
Hello in the year 2021.
The other answers are obsolete, because with Java 8 and newer you can decode the base64url scheme by using the new Base64.getUrlDecoder() (instead of getDecoder).
The base64url scheme is a URL- and filename-safe dialect of the main base64 scheme and uses "-" instead of "+" and "_" instead of "/" (because the plus and slash characters have special meanings in URLs). Facebook also omits the "=" padding characters (0 to 2 of them) at the end of the string; Base64.getUrlDecoder() accepts input with or without them.
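A small sketch comparing the Java 8 URL decoder with the manual workaround from the older answers (translate the alphabet back, restore the padding, then use the standard decoder). The class and method names are hypothetical; the example bytes 0xFB 0xFF are chosen because they encode to "-_8" in unpadded base64url, which exercises both substituted characters and the missing padding.

```java
import java.util.Base64;

final class Base64UrlDemo {

    // Java 8+: the URL decoder understands '-' and '_' and does not require '=' padding.
    static byte[] decodeUrlSafe(String b64url) {
        return Base64.getUrlDecoder().decode(b64url);
    }

    // Older workaround: map the alphabet back to standard Base64 and restore the padding.
    static byte[] decodeManually(String b64url) {
        String b64 = b64url.replace('-', '+').replace('_', '/');
        switch (b64.length() % 4) {
            case 2:
                b64 += "==";
                break;
            case 3:
                b64 += "=";
                break;
            case 1:
                throw new IllegalArgumentException("Invalid Base64 length: " + b64.length());
            default:
                break;
        }
        return Base64.getDecoder().decode(b64);
    }
}
```

Both methods return the same bytes; the manual one is only needed with decoders that insist on the standard alphabet and padding, such as the commons-codec 1.3 version from the question.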
Here is how you can parse the Facebook signed_request parameter in Java into a Map object:
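import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import org.eclipse.jetty.util.ajax.JSON;
// LOG is a logger field of the enclosing class
@SuppressWarnings("unchecked")
public static Map<String, String> parseSignedRequest(HttpServletRequest httpReq, String facebookSecret) throws ServletException {
    String signedRequest = httpReq.getParameter("signed_request");
    // signed_request = <base64url signature>.<base64url JSON payload>
    String[] splitArray = signedRequest.split("\\.", 2);
    String sigBase64 = splitArray[0];
    String payloadBase64 = splitArray[1];
    String payload = new String(Base64.getUrlDecoder().decode(payloadBase64), StandardCharsets.UTF_8);
    try {
        Mac sha256_HMAC = Mac.getInstance("HmacSHA256");
        SecretKeySpec secretKey = new SecretKeySpec(facebookSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256");
        sha256_HMAC.init(secretKey);
        // The HMAC is computed over the still-encoded payload part
        String sigExpected = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(sha256_HMAC.doFinal(payloadBase64.getBytes(StandardCharsets.US_ASCII)));
        // MessageDigest.isEqual compares in constant time, unlike String.equals
        if (!MessageDigest.isEqual(sigBase64.getBytes(StandardCharsets.US_ASCII),
                sigExpected.getBytes(StandardCharsets.US_ASCII))) {
            LOG.warn("sigBase64 = {}", sigBase64);
            LOG.warn("sigExpected = {}", sigExpected);
            throw new ServletException("Invalid sig = " + sigBase64);
        }
    } catch (IllegalStateException | InvalidKeyException | NoSuchAlgorithmException ex) {
        throw new ServletException("parseSignedRequest", ex);
    }
    // use Jetty JSON parsing or some other library
    return (Map<String, String>) JSON.parse(payload);
}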
I have used the Jetty JSON parser:
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
<version>9.4.43.v20210629</version>
</dependency>
but there are more libraries available in Java for parsing JSON.
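For unit tests it can help to pull the verification step out of the servlet. A sketch under the same assumptions as the code above (HMAC-SHA256 over the still-encoded payload, base64url signature without padding); the class and method names (SignedRequestVerifier, sign, verify) are hypothetical, and JSON parsing is left to whichever library you pick.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SignedRequestVerifier {

    // Base64url (no padding) HMAC-SHA256 of the still-encoded payload part.
    static String sign(String payloadBase64, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] sig = mac.doFinal(payloadBase64.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the decoded JSON payload if the signature matches, otherwise throws.
    static String verify(String signedRequest, String secret) {
        String[] parts = signedRequest.split("\\.", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <signature>.<payload>");
        }
        String expected = sign(parts[1], secret);
        // Constant-time comparison of the two signatures
        if (!MessageDigest.isEqual(parts[0].getBytes(StandardCharsets.US_ASCII),
                expected.getBytes(StandardCharsets.US_ASCII))) {
            throw new SecurityException("Invalid signature");
        }
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }
}
```

A test can build a request with sign(payload, secret) + "." + payload and check that verify returns the JSON, and that a different secret is rejected.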