Is it possible to retain the Euro symbol after URL encoding? For example:
HttpClient httpClient = getHttpClient();
// set POST method details
PostMethod post = new PostMethod(url_p);
post.setRequestHeader(
"Content-Type", PostMethod.FORM_URL_ENCODED_CONTENT_TYPE);
String beforeEncoding = "Price is €100";
String afterEncoding = java.net.URLEncoder.encode(beforeEncoding,UTF-8);
post.setRequestBody(afterEncoding);
httpClient.executeMethod(post);
It displays Price+is+%80100.
Is it possible to display Price+is+€100 instead?
If the communication is using UTF-8 (which is sensible for the € symbol), you should do:
String afterEncoding = java.net.URLEncoder.encode(beforeEncoding, "UTF-8");
or, on Java 10 and newer:
String afterEncoding = java.net.URLEncoder.encode(beforeEncoding, StandardCharsets.UTF_8);
The overload of encode without an encoding argument is deprecated anyway.
Mind that System.out uses the platform's encoding: System.getProperty("file.encoding") or Charset.defaultCharset().
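As a minimal sketch of the difference (the class and method names are illustrative, not from the original post): with UTF-8 the euro sign becomes three percent-encoded bytes, whereas windows-1252 produces the single byte %80 seen in the question. This assumes the JRE provides the windows-1252 charset, which is common but not guaranteed by the specification.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EuroEncodingSketch {
    // U+20AC is the euro sign; the escape avoids depending on the source file's encoding.
    static final String PRICE = "Price is \u20AC100";

    // Wraps URLEncoder.encode(String, String) and turns the checked exception
    // into an unchecked one so the sketch stays short.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(PRICE, "UTF-8"));        // Price+is+%E2%82%AC100
        System.out.println(encode(PRICE, "windows-1252")); // Price+is+%80100
    }
}
```

Either way the euro sign itself never appears in the encoded output; only the bytes chosen to represent it differ.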
After comment
Or do not encode at all, and set the encoding of the body.
PostMethod post = new PostMethod(url);
post.getParams().setContentCharset("UTF-8");
No, not with URLEncoder.encode. If your example is that trivial, you may stick to a simple String.replace:
String beforeEncoding = "Price is €100";
String afterEncoding = beforeEncoding.replace(' ', '+');
System.out.println(afterEncoding);
Related
Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
URLEncoder is the way to go. Just keep in mind to encode only the individual query string parameter names and/or values, not the entire URL, and certainly not the parameter separator character & or the name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
If you're not yet on Java 10 or newer, use StandardCharsets.UTF_8.toString() as the charset argument; if you're not yet on Java 7 or newer, use "UTF-8".
Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually to be used to represent spaces in URI itself (the part before the URI-query string separator character ?), not in query string (the part after ?).
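A small sketch of the + versus %20 behaviour, with hypothetical helper names. If a consumer insists on %20, replacing + after encoding is safe, because URLEncoder has already turned any literal plus sign into %2B.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SpaceEncodingSketch {
    // Form-style encoding: spaces become '+'.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Same encoding, but with spaces written as %20 instead of '+'.
    static String percentEncodeSpaces(String value) {
        return formEncode(value).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b+c"));          // a+b%2Bc
        System.out.println(percentEncodeSpaces("a b+c")); // a%20b%2Bc
    }
}
```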
Also note that there are three encode() methods: one without a charset argument, one with a Charset as second argument, and one with a String as second argument, which throws a checked exception. The one without a charset argument is deprecated. Never use it and always specify the charset argument. The javadoc even explicitly recommends using the UTF-8 encoding, as mandated by RFC3986 and W3C.
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
See also:
What every web developer must know about URL encoding
I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs) and inefficient (it uses a StringBuffer instead of a StringBuilder and does a couple of other things that are slow), it's also way too easy to get wrong.
Instead I would use URIBuilder, Spring's org.springframework.web.util.UriUtils.encodeQuery, or Apache Commons HttpClient.
The reason is that you have to escape the query parameter's name (i.e. q in BalusC's answer) differently from the parameter value.
The only downside to the above (that I found out painfully) is that URLs are not a true subset of URIs.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query");
ub.addParameter("q", "random word £500 bank $");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24
You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded. First we did simple URL encoding and then we converted it to an ASCII string to make sure no character outside US-ASCII remained in the string. This is exactly how browsers do it.
Guava 15 has now added a set of straightforward URL escapers.
The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the hostname!
4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded Unicode - (better would be NFKC!). For more information, see: How to encode properly this URL
In some cases it is advisable to check if the URL is already encoded. Also replace '+' encoded spaces with '%20' encoded spaces.
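Why the "already encoded" check matters can be shown with a short sketch (class and method names are made up for illustration). The multi-argument java.net.URI constructors always quote the percent character, so feeding them an already-encoded path encodes it a second time.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DoubleEncodingSketch {
    // Builds an http URI from a host and a raw (unencoded) path.
    static String encodePath(String host, String rawPath) {
        try {
            // URI(scheme, host, path, fragment) quotes illegal characters in the path.
            return new URI("http", host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw input is encoded once.
        System.out.println(encodePath("example.com", "/first book.pdf"));
        // http://example.com/first%20book.pdf

        // Already-encoded input is encoded again: '%' becomes %25.
        System.out.println(encodePath("example.com", "/first%20book.pdf"));
        // http://example.com/first%2520book.pdf
    }
}
```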
Here are some examples that will also work properly
{
"in" : "http://نامهای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
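The hostname step on its own can be sketched as follows. The domain used here is an illustrative assumption, not one of the test cases above; only IDN.toASCII and the URI constructor come from the answer.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

class HostnameSketch {
    // Converts a Unicode hostname to its Punycode (ASCII) form and builds a root URL.
    static String rootUrl(String unicodeHost) {
        try {
            return new URI("http", IDN.toASCII(unicodeHost), "/", null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // U+00FC is the letter u with diaeresis.
        System.out.println(rootUrl("b\u00FCcher.example")); // http://xn--bcher-kva.example/
        // Plain ASCII hostnames pass through unchanged.
        System.out.println(rootUrl("example.com"));         // http://example.com/
    }
}
```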
Using Spring's UriComponentsBuilder:
UriComponentsBuilder
.fromUriString(url)
.build()
.encode()
.toUri()
The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil
Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
if (parameters == null) {
return url;
}
for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
if (!url.contains("?")) {
url += "?" + encodedKey + "=" + encodedValue;
} else {
url += "&" + encodedKey + "=" + encodedValue;
}
}
return url;
}
In Android, I would use this code:
Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word A3500 bank 24").build();
Where Uri is an android.net.Uri.
In my case I just needed to pass the whole URL and encode only the value of each parameter.
I didn't find common code to do that, so I created this small method to do the job:
public static String encodeUrl(String url) throws Exception {
if (url == null || !url.contains("?")) {
return url;
}
List<String> list = new ArrayList<>();
String rootUrl = url.split("\\?")[0] + "?";
String paramsUrl = url.replace(rootUrl, "");
List<String> paramsUrlList = Arrays.asList(paramsUrl.split("&"));
for (String param : paramsUrlList) {
if (param.contains("=")) {
String key = param.split("=")[0];
String value = param.replace(key + "=", "");
list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
}
else {
list.add(param);
}
}
return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.
Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // No change
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Changed
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Changed
System.out.println("url1 " + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);
I am using the Google Translate API to generate an Arabic property file from an English property file.
I open a URL connection and make a GET request to the URL, passing the original language, the target language, and the value to be translated:
URLConnection urlCon = null;
String urlStr = "https://www.googleapis.com/language/translate/v2";
URL url = new URL(urlStr + "?key=" + apikey + "&source=" + origlang + "&target=" + translateToLang + "&q=" + value);
urlCon = url.openConnection();
urlCon.setConnectTimeout(1000 * 60 * 5);
urlCon.setReadTimeout(1000 * 60 * 5);
urlCon.setDoInput(true);
urlCon.setDoOutput(true);
urlCon.setUseCaches(false);
((HttpURLConnection) urlCon).setRequestMethod("GET");
urlCon.setRequestProperty("Accept-Charset", "UTF-8");
Reading the response from the URL connection through an InputStreamReader, passing UTF-8 as the encoding parameter:
BufferedReader br = new BufferedReader(new InputStreamReader(((URLConnection) urlCon).getInputStream(), "UTF-8"));
/* Reading the response line by line */
StringBuffer responseString = new StringBuffer();
String nextLine = null;
while ((nextLine = br.readLine()) != null) {
responseString.append(nextLine);
}
// if response is null or empty, throw exception
String response = responseString.toString();
Parsing the JSON received through GSON parser
JsonElement jelement = new JsonParser().parse(response);
JsonObject jobject = jelement.getAsJsonObject();
jobject = jobject.getAsJsonObject("data");
JsonArray jarray = jobject.getAsJsonArray("translations");
jobject = jarray.get(0).getAsJsonObject();
String result = jobject.get("translatedText").toString();
Writing the translated value to a new property file through a FileOutputStream:
FileOutputStream foutStream = new FileOutputStream(outFile);
foutStream.write(key.getBytes());
foutStream.write("=".getBytes());
foutStream.write(transByte.getBytes());
foutStream.write("\n".getBytes());
The issue is that I am getting garbled text (?????) written to the new property file for the Arabic language.
When you call transByte.getBytes(), the Arabic translation is encoded with your platform default encoding, which will only handle Arabic if your machine is configured for UTF-8 or Arabic. Otherwise, characters will be replaced by '�' or '?' .
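The effect can be reproduced without changing the platform default by passing a charset explicitly. This is a sketch with made-up names; ISO-8859-1 stands in for a default encoding that cannot represent Arabic.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetSketch {
    // Five Arabic letters, written as escapes so the source encoding does not matter.
    static final String ARABIC = "\u0645\u0631\u062D\u0628\u0627";

    // Encodes the text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Unmappable characters are replaced by '?' when encoding.
        System.out.println(roundTrip(ARABIC, StandardCharsets.ISO_8859_1)); // ?????
        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(ARABIC, StandardCharsets.UTF_8).equals(ARABIC)); // true
    }
}
```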
Create a new Properties instance, and populate it using setProperty() calls. Then when you store it, the proper escaping will be applied to your Arabic text, which is necessary because property files are encoded with ISO-8859-1 (an encoding for Western Latin characters).
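A minimal sketch of that approach, using an in-memory stream instead of a file so it runs anywhere. The key name and helper methods are illustrative assumptions. Properties.store(OutputStream, ...) writes non-Latin characters as \uXXXX escapes, and Properties.load(InputStream) turns them back into text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEscapeSketch {
    // Stores a single entry and returns the resulting file content as text.
    static String storeEscaped(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null); // null means no header comment
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stored bytes are ISO-8859-1; everything else has been escaped.
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Reads the value back; load() undoes the escaping.
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String content = storeEscaped("greeting", "\u0645\u0631\u062D\u0628\u0627");
        System.out.println(content);                        // a date comment, then the escaped entry
        System.out.println(loadValue(content, "greeting")); // the original Arabic text
    }
}
```

In the question's code, the FileOutputStream could be passed to store() directly in place of the manual write() calls.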
Alternatively, you can store the Properties using a Writer instance that is configured with whatever encoding you choose, but the encoding isn't stored in the file itself, so you will need meta-data or a convention to set the correct encoding when reading the file again.
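The Writer variant might look like the sketch below, again in memory and with hypothetical helper names. UTF-8 is chosen here as the convention; the reader has to use the same charset, since nothing in the file records it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesWriterSketch {
    // Stores one entry through a UTF-8 Writer; characters are written unescaped.
    static byte[] storeUtf8(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            props.store(writer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Loads the entry back through a Reader configured with the same charset.
    static String loadUtf8(byte[] data, String key) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        byte[] data = storeUtf8("greeting", "\u0645\u0631\u062D\u0628\u0627");
        System.out.println(loadUtf8(data, "greeting"));
    }
}
```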
Finally, you can store the Properties in an XML format, which will use UTF-8 by default, or you can specify another encoding. The file itself will specify the encoding, so it's easier to use an optimal encoding for each language.
Trying to emit a file format using custom string concatenation, as you are doing, is an oft-repeated recipe for disaster. Whether it's XML, JSON, or a simple properties file, it's far too easy to overlook special cases that require escape sequences, etc. Use a library designed to emit the format instead.
I have a DB in which all columns are set to "hebrew_general_ci".
When I insert Hebrew values into my DB manually, or through Postman, I can see that the values in the DB are indeed in Hebrew.
But when I try to insert the values from my app (an Android app, coded in Java), the values become question marks: ????
I tried to encode my text as UTF-8 in the app itself, but it didn't work.
Here is the code that is supposed to do this:
private String POST(String url, String jsonParamsAsString) {
String result = "";
String fixedUrl = url.replace(" ","%20");
try {
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost(fixedUrl);
byte ptext[] = jsonParamsAsString.getBytes();
jsonParamsAsString = new String(ptext, "UTF-8");
StringEntity input = new StringEntity(jsonParamsAsString);
input.setContentType("application/json; charset=utf-8" );
//input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response = httpClient.execute(postRequest);
result = convertInputStreamToString(response.getEntity().getContent());
/*byte ptext[] = result.getBytes();
result = new String(ptext, "UTF-8");*/
} catch (Exception e) {
Log.d("InputStream", e.getLocalizedMessage());
}
return result;
}
You need to set the database encoding to utf8 / utf8_general_ci, or to utf8mb4 / utf8mb4_general_ci if you are running a recent version of MySQL and need to handle emoji. Here is the documentation. Basically, you don't need to set your table to a character set for one particular language. I've used the above settings and they handle Arabic, Russian, Chinese, English, etc. out of the box; they are language-agnostic and just work. Good luck.
Edit: You also need to make sure your query connection has the following two parameters: useUnicode=yes and characterEncoding=UTF-8
This seems like an extremely easy problem but alas I cannot figure it out nor find a solution anywhere else. I'm concatenating a string that has a % within and for some reason it adds the number 25 after the %. Anyone know of a solution to this easy problem?
String buttonCheck = "%26" + DATABASE.getValue("buttonCheck") + "%26";
Comes out to
"%2526value%2526"
EDIT: It has become apparent that the issue is actually within URL encoding and I will add more relevant data to the issue.
I am developing an Android App that parses HTML from a site and allows the user to interact with it through the Android UI. I am having an issue with encoding % into a parameter for a form.
public class CLASS extends Activity
{
DefaultHttpClient client = new DefaultHttpClient();
String url = "http://www.url.com";
HttpRequestBase method = new HttpPost(url);
List<NameValuePair> nvps = new ArrayList<NameValuePair>();
nvps.add(new BasicNameValuePair("buttoncheck", "%26" + DATABASE.getValue("buttonCheck") + "%26"));
//DATABASE is simply a class that handles a HashMap
HttpPost methodPost = (HttpPost) method;
methodPost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8));
execute(method);
}
Rather than sending the form value of
buttoncheck=%26value%26
I get
buttoncheck=%2526value%2526
It won't come out as "%2526value%2526" in buttonCheck. I strongly suspect you're looking at a value later on - for example, after URI encoding. Work out what's doing that encoding, and what you actually want to be encoded.
Just to be clear, the 25 is completely distinct from the 26. You'll see the same thing if you get rid of the 26 completely, with
String buttonCheck = "%" + DATABASE.getValue("buttonCheck") + "%";
At that point I suspect you'll get
%25value%25
Basically something is just encoding the % as %25. For example, this would do it:
import java.net.*;
public class Test {
public static void main(String[] args) throws Exception {
String input = "%value%";
String encoded = URLEncoder.encode(input, "utf-8");
System.out.println(encoded); // Prints %25value%25
}
}
Apparently the string went through percent-encoding when it became part of a URI.
If that's the case, you should not percent-encode so early. Instead, use:
String buttonCheck = "&" + DATABASE.getValue("buttonCheck") + "&";
which will end up in the URI as
"%26value%26"
Based on the answer to this question, I would guess that the literal % is being encoded to %25 in your string. Which explains the added 25. Without seeing relevant code, we won't be able to know why it gets there in the first place.
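The double encoding discussed in this thread can be reproduced with URLEncoder alone. This is a sketch with an illustrative class name; in the question the same encoding step is presumably performed by UrlEncodedFormEntity rather than by an explicit call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PercentEncodingSketch {
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Passing the raw ampersand lets the encoder produce %26 exactly once.
        System.out.println(encode("&value&"));         // %26value%26
        // Passing a pre-encoded value encodes the '%' again as %25.
        System.out.println(encode("%26value%26"));     // %2526value%2526
    }
}
```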