I've tried to use java.net.URI to manipulate query strings but I failed to even on very simple task like getting the query string from one url and placing it in another.
Do you know how to make this code below work
URI sample = new URI("test?param1=x%3D1");
URI uri2 = new URI(
"http",
"domain",
"/a-path",
sample.getRawQuery(),
sample.getFragment());
Call to uri2.toASCIIString() should return: http://domain/a-path?param1=x%3D1
but it returns: http://domain/a-path?param1=x%253D1 (double encoding)
if I use getQuery() instead of getRawQuery() the query string is not encoded at all and the url looks like this: http://domain/a-path?param1=x=1
The problem is that the second constructor will encode the query and fragment using URL encoding. But = is a legal URI character, so it will not encode that for you; and % is not a legal URI character, so it will encode it. That's exactly the opposite of what you want, in this case.
So, you can't use the second constructor. Use the first one, by concatenating the parts of the string together yourself.
Could you wrap the call to getQuery() with a call to java.net.URLEncoder.encode(String)?
URI sample = new URI("test?param1=x%3D1");
URI uri2 = new URI(
"http",
"domain",
"/a-path",
URLEncoder.encode(sample.getQuery(), "UTF-8"),
sample.getFragment());
Related
Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
URLEncoder is the way to go. You only need to keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, for sure not the query string parameter separator character & nor the parameter name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
When you're still not on Java 10 or newer, then use StandardCharsets.UTF_8.toString() as charset argument, or when you're still not on Java 7 or newer, then use "UTF-8".
Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually to be used to represent spaces in URI itself (the part before the URI-query string separator character ?), not in query string (the part after ?).
Also note that there are three encode() methods. One without Charset as second argument and another with String as second argument which throws a checked exception. The one without Charset argument is deprecated. Never use it and always specify the Charset argument. The javadoc even explicitly recommends to use the UTF-8 encoding, as mandated by RFC3986 and W3C.
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
See also:
What every web developer must know about URL encoding
I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs), inefficient (it uses a StringBuffer instead of Builder and does a couple of other things that are slow) Its also way too easy to screw it up.
Instead I would use URIBuilder or Spring's org.springframework.web.util.UriUtils.encodeQuery or Commons Apache HttpClient.
The reason being you have to escape the query parameters name (ie BalusC's answer q) differently than the parameter value.
The only downside to the above (that I found out painfully) is that URL's are not a true subset of URI's.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query");
ub.addParameter("q", "random word £500 bank \$");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24
You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf"
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded. First we did simple URL encoding and then we converted it to an ASCII string to make sure no character outside US-ASCII remained in the string. This is exactly how browsers do it.
Guava 15 has now added a set of straightforward URL escapers.
The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the hostname!
4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded Unicode - (better would be NFKC!). For more information, see: How to encode properly this URL
In some cases it is advisable to check if the URL is already encoded. Also replace '+' encoded spaces with '%20' encoded spaces.
Here are some examples that will also work properly
{
"in" : "http://نامهای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
Using Spring's UriComponentsBuilder:
UriComponentsBuilder
.fromUriString(url)
.build()
.encode()
.toUri()
The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil
Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
if (parameters == null) {
return url;
}
for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
if (!url.contains("?")) {
url += "?" + encodedKey + "=" + encodedValue;
} else {
url += "&" + encodedKey + "=" + encodedValue;
}
}
return url;
}
In Android, I would use this code:
Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word A3500 bank 24").build();
Where Uri is a android.net.Uri
In my case I just needed to pass the whole URL and encode only the value of each parameters.
I didn't find common code to do that, so (!!) so I created this small method to do the job:
public static String encodeUrl(String url) throws Exception {
if (url == null || !url.contains("?")) {
return url;
}
List<String> list = new ArrayList<>();
String rootUrl = url.split("\\?")[0] + "?";
String paramsUrl = url.replace(rootUrl, "");
List<String> paramsUrlList = Arrays.asList(paramsUrl.split("&"));
for (String param : paramsUrlList) {
if (param.contains("=")) {
String key = param.split("=")[0];
String value = param.replace(key + "=", "");
list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
}
else {
list.add(param);
}
}
return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.
Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // No change
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Changed
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Changed
System.out.println("url1 " + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);
Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
URLEncoder is the way to go. You only need to keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, for sure not the query string parameter separator character & nor the parameter name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
When you're still not on Java 10 or newer, then use StandardCharsets.UTF_8.toString() as charset argument, or when you're still not on Java 7 or newer, then use "UTF-8".
Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually to be used to represent spaces in URI itself (the part before the URI-query string separator character ?), not in query string (the part after ?).
Also note that there are three encode() methods. One without Charset as second argument and another with String as second argument which throws a checked exception. The one without Charset argument is deprecated. Never use it and always specify the Charset argument. The javadoc even explicitly recommends to use the UTF-8 encoding, as mandated by RFC3986 and W3C.
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
See also:
What every web developer must know about URL encoding
I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs), inefficient (it uses a StringBuffer instead of Builder and does a couple of other things that are slow) Its also way too easy to screw it up.
Instead I would use URIBuilder or Spring's org.springframework.web.util.UriUtils.encodeQuery or Commons Apache HttpClient.
The reason being you have to escape the query parameters name (ie BalusC's answer q) differently than the parameter value.
The only downside to the above (that I found out painfully) is that URL's are not a true subset of URI's.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query");
ub.addParameter("q", "random word £500 bank \$");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24
You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf"
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded. First we did simple URL encoding and then we converted it to an ASCII string to make sure no character outside US-ASCII remained in the string. This is exactly how browsers do it.
Guava 15 has now added a set of straightforward URL escapers.
The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the hostname!
4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded Unicode - (better would be NFKC!). For more information, see: How to encode properly this URL
In some cases it is advisable to check if the URL is already encoded. Also replace '+' encoded spaces with '%20' encoded spaces.
Here are some examples that will also work properly
{
"in" : "http://نامهای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
Using Spring's UriComponentsBuilder:
UriComponentsBuilder
.fromUriString(url)
.build()
.encode()
.toUri()
The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil
Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
if (parameters == null) {
return url;
}
for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
if (!url.contains("?")) {
url += "?" + encodedKey + "=" + encodedValue;
} else {
url += "&" + encodedKey + "=" + encodedValue;
}
}
return url;
}
In Android, I would use this code:
Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word A3500 bank 24").build();
Where Uri is a android.net.Uri
In my case I just needed to pass the whole URL and encode only the value of each parameters.
I didn't find common code to do that, so (!!) so I created this small method to do the job:
public static String encodeUrl(String url) throws Exception {
if (url == null || !url.contains("?")) {
return url;
}
List<String> list = new ArrayList<>();
String rootUrl = url.split("\\?")[0] + "?";
String paramsUrl = url.replace(rootUrl, "");
List<String> paramsUrlList = Arrays.asList(paramsUrl.split("&"));
for (String param : paramsUrlList) {
if (param.contains("=")) {
String key = param.split("=")[0];
String value = param.replace(key + "=", "");
list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
}
else {
list.add(param);
}
}
return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.
Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // No change
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Changed
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Changed
System.out.println("url1 " + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);
I have an application that is calling a rest service. I need to pass it a URL and right now I'm creating the URL by concatenating a string.
I'm doing it this way:
String urlBase = "http:/api/controller/";
String apiMethod = "buy";
String url = urlBase + apiMethod;
The above is fake obviously, but the point is I'm using simple string concats.
Is this the best practice? I'm relatively new to Java. Should I be building a URL object instead?
Thanks
if you have a base path which needs some additional string to be added to it you have 2 options:
First is, using String.format():
String baseUrl = "http:/api/controller/%s"; // note the %s at the end
String apiMethod = "buy";
String url = String.format(baseUrl, apiMethod);
Or using String.replace():
String baseUrl = "http:/api/controller/{apiMethod}";
String apiMethod = "buy";
String url = baseUrl.replace("\\{apiMethod}", apiMethod);
The nice thing about both answers is, that the string that needs to be inserted, doesn't have to be at the end.
If you are using jersey-client. The following would be the best practice to access the subresources without making the code ugly
Resource: /someApp
Sub-Resource: /someApp/getData
Client client = ClientBuilder.newClient();
WebTarget webTarget = client.target("https://localhost:7777/someApp/").path("getData");
Response response = webTarget.request().header("key", "value").get();
If you are using plain Java, it's better to use dedicated class for URL building which throws exception if provided data are invalid in semantic way.
It has various constructors, you can read about it here.
Example
URL url = new URL(
"http",
"stackoverflow.com",
"/questions/50989746/creating-a-url-using-java-whats-the-best-practive"
);
System.out.println(url);
Currently there is final URL url = new URL(urlString); but I run into server not supporting non-ASCII in path.
Using Java (Android) I need to encode URL from
http://acmeserver.com/download/agc/fcms/儿子去哪儿/儿子去哪儿.png
to
http://acmeserver.com/download/agc/fcms/%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF/%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF.png
just like browsers do.
I checked URLEncoder.encode(s, "UTF-8"); but it also encodes / slashes
http%3A%2F%2acmeserver.com%2Fdownload%2Fagc%2Ffcms%2F%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF%2F%E5%84%BF%E5%AD%90%E5%8E%BB%E5%93%AA%E5%84%BF.png
Is there way to do it simply without parsing string that the method gets?
from http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars
B.2.1 Non-ASCII characters in URI attribute values Although URIs do
not contain non-ASCII values (see [URI], section 2.1) authors
sometimes specify them in attribute values expecting URIs (i.e.,
defined with %URI; in the DTD). For instance, the following href value
is illegal:
...
We recommend that user agents adopt the following convention for
handling non-ASCII characters in such cases:
Represent each character in UTF-8 (see [RFC2279]) as one or more
bytes.
Escape these bytes with the URI escaping mechanism (i.e., by
converting each byte to %HH, where HH is the hexadecimal notation of
the byte value).
You should just encode the special characters and the parse them together. If you tried to encode the entire URI then you'd run into problems.
Stick with:
String query = URLEncoder.encode("apples oranges", "utf-8");
String url = "http://stackoverflow.com/search?q=" + query;
Check out this great guide on URL encoding.
That being said, a little bit of searching suggests that there may be other ways to do what you want:
Give this a try:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
(You will need to have those spaces encoded so you can use it for a request.)
This takes advantage of a couple features available to you in Android
classes. First, the URL class can break a url into its proper
components so there is no need for you to do any string search/replace
work. Secondly, this approach takes advantage of the URI class
feature of properly escaping components when you construct a URI via
components rather than from a single string.
The beauty of this approach is that you can take any valid url string
and have it work without needing any special knowledge of it yourself.
final URL url = new URL( new URI(urlString).toASCIIString() );
worked for me.
I did it as below, which is cumbersome
//was: final URL url = new URL(urlString);
String asciiString;
try {
asciiString = new URL(urlString).toURI().toASCIIString();
} catch (URISyntaxException e1) {
Log.e(TAG, "Error new URL(urlString).toURI().toASCIIString() " + urlString + " : " + e1);
return null;
}
Log.v(TAG, urlString+" -> "+ asciiString );
final URL url = new URL(asciiString);
url is later used in
connection = (HttpURLConnection) url.openConnection();
String arg="http://www.example.com/user.php?id=<URLRequest Method='GetByUID' />";
java.net.URI uri = new java.net.URI( arg );
java.awt.Desktop desktop = java.awt.Desktop.getDesktop();
desktop.browse( uri );
I want to open the given link in default browser with the above code but it says the url is invalid...i tried escaping characters like ' also but its not working.
If i replace String arg="www.google.com"; then there is no problem and I am able to open google.com.
Please help.
Your string contains characters that aren't valid in a URI, per RFC 2396. You need to properly encode the query parameters. Many utilities support that, like the standard URLEncoder (lower level), JAX-RS UriBuilder, Spring UriUtils, Apache HttpClient URLEncodedUtils and so on.
Edit: Oh, and the URI class can handle it, too, but you have to use a different constructor:
URI uri = new URI("http", "foo.com", null, "a=<some garbage>&b= |{$0m3 m0r3 garbage}| &c=imokay", null);
System.out.println(uri);
Outputs:
http://foo.com?a=%3Csome%20garbage%3E&b=%20%7C%7B$0m3%20m0r3%20garbage%7D%7C%20&c=imokay
which, while ugly, is the correct representation.
Thats because it is invalid. <URLRequest Method='GetByUID' /> should be replaced by the value of the id, or an expression that returns the id which you can concatenate with the arg string. Something like
String arg="http://www.example.com/user.php?id="+getByUID(someUid);
import java.net.URL;
import java.net.URLEncoder;
class ARealURL {
public static void main(String[] args) throws Exception {
String s1 = "http://www.example.com/user.php?id=";
String param = "<URLRequest Method='GetByUID' />";
String encodedParam = URLEncoder.encode(param,"UTF-8");
URL url = new URL(s1+encodedParam);
System.out.println(url);
}
}
Output
http://www.example.com/user.php?id=%3CURLRequest+Method%3D%27GetByUID%27+%2F%3E