'Long time reader, first time poster' here.
I'm in the process of making a bot for a Spanish wiki I administer. I wanted to make it from scratch, since one of my purposes in making it is to practice Java. However, I ran into some trouble when trying to make GET requests with HttpClient to URIs that contain non-ASCII characters such as á, é, í, ó or ú.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras de las Botas";
method = new GetMethod(url);
client.executeMethod(method);
When I do the above, GetMethod complains about the URI:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid uri 'http://es.pruebaloca.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras%20de%20las%20Botas&cmlimit=500&format=xml': Invalid query
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:69)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:120)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Note that in the URI shown in the stack trace, spaces are encoded as %20 while the í is left as is. That exact same URI works perfectly in a browser, but I can't get GetMethod to accept it.
I've also tried doing the following:
URI uri = new URI(url, false);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
This way, URI escaped the í, but double-escaped the spaces (%2520):
http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categor%C3%ADa:Mejoras%2520de%2520las%2520Botas&cmlimit=500&format=xml
Now, if I don't use any spaces in the query, there's no double escaping and I get the desired output. So if there wasn't any possibility of non-ASCII characters, I wouldn't need to use the URI class and wouldn't get the double escaping. In an attempt to avoid the first escaping of the spaces, I tried this:
URI uri = new URI(url, true);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
But the URI class didn't like it:
org.apache.commons.httpclient.URIException: Invalid query
at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2049)
at org.apache.commons.httpclient.URI.<init>(URI.java:167)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:66)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:121)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:39)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Any input on how to avoid this double escaping would be greatly appreciated. I've lurked all around with absolutely no luck.
Thanks!
Edit: The solution that works best for me is parsifal's one, but, as an addition, I'd like to say that setting the path with method.setPath(url) made HttpMethod reject a cookie I needed to save:
Aug 26, 2011 4:07:08 PM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "wikicities_session=900beded4191ff880e09944c7c0aaf5a". Illegal path attribute "/". Path of origin: "http://es.metroid.wikia.com/api.php"
However, if I send the URI to the constructor and forget about the setPath(url), the cookie gets saved without problem.
String url = "http://es.metroid.wikia.com/api.php";
NameValuePair[] query = { new NameValuePair("action", "query"), new NameValuePair("list", "categorymembers"),
new NameValuePair("cmtitle", "Categoría:Mejoras de las Botas"), new NameValuePair("cmlimit", "500"),
new NameValuePair("format", "xml") };
HttpMethod method = null;
...
method = new GetMethod(url); // Or PostMethod(url)
method.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY); // It had been like this the whole time
method.setQueryString(query);
client.executeMethod(method);
I would recommend using URLEncoder to encode your query string values (not the whole query string).
URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
Looking at the documentation of HttpMethodBase, it appears that all String parameters have to be pre-encoded. The simplest solution is to construct your URL in stages, with setPath() and the variant of setQueryString() that takes an array of name-value parameters.
Why don't you try adding the params as NameValuePair objects? The problem here is that when you escape the whole URL, everything in it is escaped, including things like http://, and that is why the system is complaining.
You can also escape just the argument values using URLEncoder.encode(): pass each GET parameter value to it and append the return value to the URL.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=" + URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
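As a rough sketch of the "encode each value, not the whole query string" advice, the helper below joins name/value pairs with unencoded `&` and `=` separators. It uses only the JDK; the class name `QueryStringBuilder` and the method `buildQuery` are made up for illustration and are not part of HttpClient. It also assumes Java 10 or later for the `URLEncoder.encode(String, Charset)` overload; on older versions, pass the charset name "UTF-8" instead and handle UnsupportedEncodingException.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringBuilder {

    /** Joins name/value pairs into a query string, encoding each name and value separately. */
    static String buildQuery(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&'); // separators are appended unencoded
            }
            query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8))
                 .append('=')
                 .append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("list", "categorymembers");
        params.put("cmtitle", "Categor\u00EDa:Mejoras de las Botas");
        params.put("format", "xml");
        // Only the values are escaped; "http://", "?", "&" and "=" stay intact.
        System.out.println("http://es.metroid.wikia.com/api.php?" + buildQuery(params));
    }
}
```

Note that URLEncoder applies form encoding, so spaces come out as `+` rather than `%20`; both forms are normally accepted in a query string.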
I created the following simple test to query iTunes:
@Test
fun loadArtist()
{
val restTemplate = RestTemplate()
val builder = UriComponentsBuilder.fromHttpUrl("https://itunes.apple.com/search")
builder.queryParam("term", "howling wolf")
builder.queryParam("entity", "allArtist")
builder.queryParam("limit", 1)
println("\n\nURL ${builder.toUriString()}")
val result = restTemplate.getForObject(builder.toUriString(), String::class.java);
println("Got artist: $result")
}
And the output was unexpected:
URL https://itunes.apple.com/search?term=howling%20wolf&entity=allArtist&limit=1
Got artist:
{
"resultCount":0,
"results": []
}
Pasting the generated URL into a browser does give expected results - artist returned.
https://itunes.apple.com/search?term=howling%20wolf&entity=allArtist&limit=1
Also, hard-coding the query works:
val result = restTemplate.getForObject("https://itunes.apple.com/search?term=howling%20wolf&entity=allArtist&limit=1", String::class.java);
The problem only seems to occur for term queries that include spaces.
What went wrong? Other than assembling the URL by hand, how can I fix it?
Seems like a case of double encoding the whitespace. From the RestTemplate Javadoc:
For each HTTP method there are three variants: two accept a URI
template string and URI variables (array or map) while a third accepts
a URI. Note that for URI templates it is assumed encoding is
necessary, e.g. restTemplate.getForObject("http://example.com/hotel
list") becomes "http://example.com/hotel%20list". This also means if
the URI template or URI variables are already encoded, double encoding
will occur, e.g. http://example.com/hotel%20list becomes
http://example.com/hotel%2520list). To avoid that use a URI method
variant to provide (or re-use) a previously encoded URI. To prepare
such an URI with full control over encoding, consider using
UriComponentsBuilder.
So it looks like getForObject will actually query for https://itunes.apple.com/search?term=howling%2520wolf&entity=allArtist&limit=1 and thus result in an empty result. You can always just replace whitespaces with a "+" in your term or try to make one of those classes skip the encoding process.
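The double-encoding effect can be reproduced without Spring. The sketch below is only a stand-in for RestTemplate: it uses the multi-argument `java.net.URI` constructor, which, like a URI template, assumes its input still needs encoding and therefore always quotes a literal `%`. The class name `DoubleEncodingDemo` and the method `build` are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DoubleEncodingDemo {

    /** Builds a URI with the multi-argument constructor, which percent-encodes the query itself. */
    static String build(String query) {
        try {
            return new URI("https", "itunes.apple.com", "/search", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw input: the space is encoded exactly once (howling%20wolf).
        System.out.println(build("term=howling wolf"));
        // Pre-encoded input: the '%' is encoded again (howling%2520wolf).
        System.out.println(build("term=howling%20wolf"));
    }
}
```

The takeaway is the same as in the Javadoc quote: hand an encoding component either the raw value or an already-built URI object, never an already-encoded string.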
I need to replace the spaces inside a string with the % symbol but I'm having some issues, what I tried is:
imageUrl = imageUrl.replace(' ', "%20");
But it gives me an error in the replace function.
Then:
imageUrl = imageUrl.replace(' ', "%%20");
But it still gives me an error in the replace function.
Then I tried with the Unicode symbol:
imageUrl = imageUrl.replace(' ', (char) U+0025 + "20");
But it still gives error.
Is there an easy way to do it?
String.replace(String, String) is the method you want.
Replace
imageUrl.replace(' ', "%");
with
imageUrl.replace(" ", "%");
System.out.println("This is working".replace(" ", "%"));
I suggest using URLEncoder to encode strings in Java.
String searchQuery = "list of banks in the world";
String url = "http://mypage.com/pages?q=" + URLEncoder.encode(searchQuery, "UTF-8");
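One caveat when combining the two suggestions above: URLEncoder produces `+` for a space, not `%20`. If the server needs `%20`, a possible approach is to encode first and then swap the `+` signs, which is safe because a literal `+` in the input has already become `%2B`. This is a minimal sketch; the class `SpaceEncoding`, the method `encodeWithPercent20`, and the file name are made up for illustration, and the `Charset` overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SpaceEncoding {

    /** Form-encodes a single value, then rewrites the '+' that stands for a space as %20. */
    static String encodeWithPercent20(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // The space becomes %20 and the literal plus sign becomes %2B.
        System.out.println(encodeWithPercent20("my image+1.png")); // my%20image%2B1.png
    }
}
```

As with the other answers, this should be applied to one path segment or parameter value at a time, not to a complete URL.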
I've run into issues like this in the past with certain frameworks. I don't have enough of your code to know for sure, but what might be happening is that whatever HTTP framework you are using (in my case it was Spring) is encoding the URL again. I spent a few days trying to solve a similar problem where I thought that string replace and Uri.Builder were broken. What ended up being the problem was that my HTTP framework had taken my encoded URL and encoded it again. That means that any place it saw a "%20", it would see the '%' character and swap it for the percent-encoded form of '%', "%25", resulting in "%2520". The request would then fail because %2520 didn't translate into the space my server was expecting. While the issue appeared to be one of my encoding not working, it was really an issue of encoding too many times. I have an example from some working code in one of my projects below.
//the Url of the server
String fullUrl = "http://myapiserver.com/path/";
//The parameter to append. contains a space that will need to be encoded
String param1 = "parameter 1";
//Use Uri.Builder to append parameter
Uri.Builder uriBuilder = Uri.parse(fullUrl).buildUpon();
uriBuilder.appendQueryParameter("parameter1",param1);
/* Below is where it is important to understand how your
http framework handles unencoded url. In my case, which is Spring
framework, the urls are encoded when performing requests.
The result is that a url that is already encoded will be
encoded twice. For instance, if your url is
"http://myapiserver.com/path?parameter1=param 1"
and it needs to be read by the server as
"http://myapiserver.com/path?parameter1=param%201"
it makes sense to encode the url using URI.builder().append, or any valid
solutions listed in other posts. However, If the framework is already
encoding your url, then it is likely to run into the issue where you
accidentally encode the url twice: Once when you are preparing the URL to be
sent, and once again when you are sending the message through the framework.
this results in sending a url that looks like
"http://myapiserver.com/path?parameter1=param%25201"
where the '%' in "%20" was replaced with "%25", http's representation of '%'
when what you wanted was
"http://myapiserver.com/path?parameter1=param%201"
this can be a difficult bug to squash because you can copy the url in the
debugger prior to it being sent and paste it into a tool like fiddler and
have the fiddler request work but the program request fail.
since my http framework was already encoding the urls, I had to unencode the
urls after appending the parameters so they would only be encoded once.
I'm not saying it's the most graceful solution, but the code works.
*/
String finalUrl = uriBuilder.build().toString().replace("%2F","/")
.replace("%3A", ":").replace("%20", " ");
//Call the server and ask for the menu. the Menu is saved to a string
//rest.GET() uses the Spring framework. The url is encoded again as
//part of the framework.
menuStringFromIoms = rest.GET(finalUrl);
There is likely a more graceful way to keep a URL from being encoded twice. I hope this example helps point you in the right direction or eliminate a possibility. Good luck.
Try this:
imageUrl = imageUrl.replaceAll(" ", "%20");
Replacing spaces is not enough; try this:
url = java.net.URLEncoder.encode(url, "UTF-8");
I'm trying to use the MapQuest API. The API is a little funny, requiring a JSON string as input. When this code executes, I've verified that the URL that gets strung together is correct, but I never get to the Log.v statement after calling new HttpGet(url.toString()). I've done some research and see that this can be caused by missing certificates, but I'm only using an HTTP connection, not HTTPS. Of course more work is done after the HttpGet call, but I've only posted the relevant code. No error is ever thrown; the code simply stops executing beyond that point. I've used essentially the same code, with only slightly different URLs, for parsing other RESTful APIs. Any thoughts?
private JSONObject callMapQuestGeoCoder(Location location)
{
String APIkey=decryptKey(MapQuestEncryptedKey);
StringBuilder url=new StringBuilder();
url.append("http://open.mapquestapi.com/geocoding/v1/reverse?key="+APIkey);
url.append("&callback=renderReverse");
url.append("&json={location:{latLng:{lat:"+location.getLatitude());
url.append(",lng:"+location.getLongitude());
url.append("}}}");
HttpGet httpGet = new HttpGet(url.toString());
Log.v(TAG,""+httpGet);
EDIT: Per advice, I stuck the code in a try catch, and got this stack trace (Modified only to remove my API Key, and change the location slightly). The character that isn't valid is the { character.
10-26 17:42:58.733: E/GeoLoc(19767): Unknown Exception foundjava.lang.IllegalArgumentException: Illegal character in query at index 117: http://open.mapquestapi.com/geocoding/v1/reverse?key=API_KEY&callback=renderReverse&json={location:{latLng:{lat:33.0207687439397,lng:-74.50922234728932}}}
According to the URI specification (RFC 3986), the curly bracket characters are neither "reserved characters" nor "unreserved characters". That means that they can only be used in a URL (or any other kind of URI) if they are "percent encoded".
Your URL contains plain (unencoded) curly bracket characters. That is invalid according to the spec ... and it is why the HttpGet constructor is throwing an exception.
Pearson's answer gives one possible way to create a legal URL. Another would be to assemble the URL using a URI object; e.g.
url = new URI("http", "open.mapquestapi.com", "/geocoding/v1/reverse",
("key=" + APIkey + "&callback=renderReverse" +
"&json={location:{latLng:{lat:" + location.getLatitude() +
",lng:" + location.getLongitude() + "}}}"),
null).toString(); // null fragment: an empty string would leave a trailing '#'
The multi-argument URI constructors take care of any required encoding of the components ... as per the specific details in the respective javadocs. (Read them carefully!)
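To see what the multi-argument constructor actually produces for the curly brackets, here is a small self-contained sketch. The class `MapQuestUrl`, the method `reverseGeocodeUrl`, the placeholder key `API_KEY`, and the coordinates are all hypothetical; only the host and path come from the question. It assumes the key contains no `%`, `&` or `#`, since those would not be escaped the way a parameter value needs.

```java
import java.net.URI;
import java.net.URISyntaxException;

class MapQuestUrl {

    /** Assembles the reverse-geocoding URL; the URI constructor quotes the illegal '{' and '}'. */
    static String reverseGeocodeUrl(String apiKey, double lat, double lng) {
        String query = "key=" + apiKey + "&callback=renderReverse"
                + "&json={location:{latLng:{lat:" + lat + ",lng:" + lng + "}}}";
        try {
            // A null fragment avoids a trailing '#'.
            return new URI("http", "open.mapquestapi.com", "/geocoding/v1/reverse", query, null)
                    .toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '{' becomes %7B and '}' becomes %7D; ':' , ',' , '&' and '=' are legal and left alone.
        System.out.println(reverseGeocodeUrl("API_KEY", 33.5, -74.25));
    }
}
```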
The issue is that the use of { is illegal in an HTTP get. The solution is to run the URL through a "Safe URL Encoder". The trick, per this question, is to ensure that you only run it through the part of the URL that needs it, and don't include things like &, http://, etc.
url.append("http://open.mapquestapi.com/geocoding/v1/reverse?key="+APIkey);
url.append("&callback=renderReverse");
// Encode only the JSON value; the "&json=" separator must stay unencoded.
url.append("&json=");
url.append(URLEncoder.encode("{location:{latLng:{lat:"+location.getLatitude()
+",lng:"+location.getLongitude()+"}}}","UTF-8"));
An even better solution: use the non-JSON input API for MapQuest. The output is still JSON.
url.append("http://open.mapquestapi.com/geocoding/v1/reverse?key="+APIkey);
url.append("&lat="+location.getLatitude());
url.append("&lng="+location.getLongitude());
I need to test my server for several URLs daily, since these URLs are updated by my users, and this will be done in Java. However, these URLs contain unusual characters (like the German umlaut). Basically, what I am doing is:
for every URL in the list to check
URL u = new URL(the_url);
u.openConnection(..);
// read the content and handle it
Now, what I've found is that while org.apache.commons.codec.net.URLCodec is fine for encoding strings to paste into the query string, it is not as suitable for encoding unusual URLs into their hex counterparts. Here are some examples of URLs:
http:// www.example com/u/überraum-03/
http:// www.example com/u/são-paulo-dude/
http:// www.example com/u/håkon-hellström/
The desired result for the first would be:
http:// www.example com/u/%c3%bcberraum-03/
Is there any library in Apache Commons, or in Java itself, that converts special characters in the ACTUAL URL (not the query string, and therefore without replacing the same kind of characters)?
Thank you for your time.
Edited
Firefox translates "yr.no/place/Norway/Nordland/Moskenes/Å/data.html" into "yr.no/place/Norway/Nordland/Moskenes/%C3%85/data.html" (try this by entering the first URL, pressing enter, then copying the URL into a document). It is this effect that I am looking for, since this is the actual translation. What is most likely happening is that either Firefox knows Å is a bad thing, or it tries multiple versions, or it accepts the server's "Location" header; either way, there is a transformation from "Å" to "%C3%85" on only a subset of the URL. This is the function we need.
Edited
I just verified that the code given by the commenter sadly does not work. As an example, try this:
try{
String urlStr = "http://www.yr.no/place/Norway/Nordland/Moskenes/Å/data.html";
URL u=new URL(urlStr);
URI uri = new URI(u.getProtocol(),
u.getUserInfo(), u.getHost(), u.getPort(),
u.getPath(), u.getQuery(),
null); // removing ref
URL urlObj = uri.toURL();
HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
connection.setInstanceFollowRedirects(false);
connection.connect();
for (int i=0;i<connection.getHeaderFields().size();i++)
System.out.println(connection.getHeaderFieldKey(i)+": "+connection.getHeaderField(i));
System.exit(0);
}catch(Exception e){e.printStackTrace();};
This yields a 404 error. Strangely enough, the encoded version does not work either.
If you need a URL that is a valid URI (RFC 2396 compliant) you can create one like this in Java
String urlString = "http://www.example.com/u/håkon-hellström/";
URL url = new URL(urlString);
URI uri = new URI(url.getProtocol(),url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
url = new URL(uri.toASCIIString());
That being said all three sample strings you provided are RFC 2396 compliant and do not need to be encoded. I am assuming the spaces in the authority part of the URLs you provided are typos.
EDIT:
I updated the code block above. By using URI.toASCIIString() you can limit the resulting URI to only US-ASCII characters (other characters are encoded). The resulting string can then be used to create a new, valid URL.
http://www.example.com/u/håkon-hellström/
changes to
http://www.example.com/u/h%C3%A5kon-hellstr%C3%B6m/
How can I encode URL query parameter values? I need to replace spaces with %20, accents, non-ASCII characters etc.
I tried to use URLEncoder but it also encodes / character and if I give a string encoded with URLEncoder to the URL constructor I get a MalformedURLException (no protocol).
URLEncoder has a very misleading name. According to the Javadocs, it is used to encode form parameters using the MIME type application/x-www-form-urlencoded.
With this said it can be used to encode e.g., query parameters. For instance if a parameter looks like &/?# its encoded equivalent can be used as:
String url = "http://host.com/?key=" + URLEncoder.encode("&/?#", "UTF-8");
Unless you have those special needs, the URL javadocs suggest using new URI(..).toURL(), which performs URI encoding according to RFC 2396.
The recommended way to manage the encoding and decoding of URLs is to use URI.
The following sample
new URI("http", "host.com", "/path/", "key=| ?/#ä", "fragment").toURL();
produces the result http://host.com/path/?key=%7C%20?/%23ä#fragment. Note how characters such as ?&/ are not encoded.
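To make the difference between the two approaches concrete, the following sketch runs the same raw value through both. The class `EncoderComparison` and its method names are made up for illustration; host, path and fragment are the ones from the sample above. The `URLEncoder.encode(String, Charset)` overload assumes Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncoderComparison {

    /** URI's multi-argument constructor quotes only characters that are illegal in the query. */
    static URI viaUri(String query) {
        try {
            return new URI("http", "host.com", "/path/", query, "fragment");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** URLEncoder applies form encoding and escapes every reserved character. */
    static String viaUrlEncoder(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "| ?/#\u00E4"; // the sample value, ending in a-umlaut
        // toString() keeps the non-ASCII character; '?' and '/' are left alone.
        System.out.println(viaUri("key=" + raw));
        // toASCIIString() additionally percent-encodes the non-ASCII character as UTF-8.
        System.out.println(viaUri("key=" + raw).toASCIIString());
        // URLEncoder escapes '?', '/' and '#' as well and turns the space into '+'.
        System.out.println("key=" + viaUrlEncoder(raw));
    }
}
```

In short, URLEncoder is the tool for a single parameter value, while the URI constructors are suited to assembling a whole URL from unencoded components.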
For further information, see the posts HTTP URL Address Encoding in Java or how to encode URL to avoid special characters in java.
EDIT
Since your input is a string URL, using one of the parameterized constructors of URI will not help you. Neither can you use new URI(strUrl) directly, since it doesn't quote URL parameters.
So at this stage we must use a trick to get what you want:
public URL parseUrl(String s) throws Exception {
URL u = new URL(s);
return new URI(
u.getProtocol(),
u.getAuthority(),
u.getPath(),
u.getQuery(),
u.getRef()).
toURL();
}
Before you can use this routine you have to sanitize your string to ensure it represents an absolute URL. I see two approaches to this:
Guessing. Prepend http:// to the string unless it's already present.
Construct the URI from a context using new URL(URL context, String spec)
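The first approach (guessing the scheme) combined with the parseUrl trick might look roughly like the sketch below. The class `UrlNormalizer`, the method `normalize`, and the choice of http as the default scheme are assumptions for illustration. Note that the `new URL(String)` constructor is deprecated in recent JDKs (20 and later), though it still works.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlNormalizer {

    /** Prepends http:// when no scheme is present, then re-encodes the URL component by component. */
    static String normalize(String raw) {
        String s = raw.trim();
        // Guess: treat the input as http when it does not start with "scheme://".
        if (!s.matches("(?i)^[a-z][a-z0-9+.-]*://.*")) {
            s = "http://" + s;
        }
        try {
            URL u = new URL(s); // used only to split the string into components
            return new URI(u.getProtocol(), u.getAuthority(), u.getPath(),
                    u.getQuery(), u.getRef()).toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // The scheme is added and the space is percent-encoded in both path and query.
        System.out.println(normalize("host.com/a b/?q=x y")); // http://host.com/a%20b/?q=x%20y
    }
}
```

Because the multi-argument URI constructor always quotes `%`, this only suits input that is not yet encoded; an already-encoded URL would be double-encoded.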
So what you're saying is that you want to encode part of your URL but not the whole thing. Sounds to me like you'll have to break it up into parts, pass the ones that you want encoded through the encoder, and re-assemble it to get your whole URL.