java.net.URI chokes on special characters in host part

I have a URI string like the following:
http://www.christlichepartei%F6sterreichs.at/steiermark/
I'm creating a java.net.URI instance with this string, and it succeeds, but when I try to retrieve the host it returns null. Opera and Firefox also choke on this URL if I enter it exactly as shown above. But shouldn't the URI class throw a URISyntaxException if it is invalid? How can I detect that the URI is illegal, then?
It behaves the same way when I decode the string using URLDecoder, which yields
http://www.christlicheparteiösterreichs.at/steiermark/
This one is accepted by Opera and Firefox, but java.net.URI still doesn't like it. How can I deal with such a URL?
Thanks

Java 6 has the java.net.IDN class for working with internationalized domain names, so the following produces a URI with an encoded hostname:
URI u = new URI("http://" + IDN.toASCII("www.christlicheparteiösterreichs.at") + "/steiermark/");

The correct way to encode non-ASCII characters in hostnames is known as "Punycode".
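A minimal sketch of that conversion with java.net.IDN (Java 6+): IDN.toASCII works label by label, so pure-ASCII labels such as www and at pass through unchanged and only the label containing ö gets an xn-- Punycode form; IDN.toUnicode reverses it. IDN.toASCII throws IllegalArgumentException for input that does not conform to RFC 3490. The class and method names are illustrative only.

```java
import java.net.IDN;

class IdnDemo {
    // Convert a Unicode hostname to its ASCII-compatible (xn--) form.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        String host = "www.christlichepartei\u00f6sterreichs.at"; // \u00f6 = ö
        String ascii = toAsciiHost(host);
        System.out.println(ascii);                // ASCII-only form, usable with java.net.URI
        System.out.println(IDN.toUnicode(ascii)); // back to the original Unicode form
    }
}
```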

URI does throw a URISyntaxException if you choose the appropriate constructor:
URI someUri=new URI("http","www.christlicheparteiösterreichs.at","/steiermark",null);
java.net.URISyntaxException: Illegal character in hostname at index 28: http://www.christlicheparteiösterreichs.at/steiermark
You can use IDN to fix this:
URI someUri=new URI("http",IDN.toASCII("www.christlicheparteiösterreichs.at"),"/steiermark",null);
System.out.println(someUri);
System.out.println("host: "+someUri.getHost());
Output:
http://www.xn--christlicheparteisterreichs-5yc.at/steiermark
host: www.xn--christlicheparteisterreichs-5yc.at
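UPDATE regarding the chicken-and-egg problem (you need the host before you can convert it with IDN, but URI won't give it to you):
You can let java.net.URL do the parsing, since it does not reject the non-ASCII host: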
public static URI createSafeURI(final URL someURL) throws URISyntaxException
{
return new URI(someURL.getProtocol(),someURL.getUserInfo(),IDN.toASCII(someURL.getHost()),someURL.getPort(),someURL.getPath(),someURL.getQuery(),someURL.getRef());
}
URI raoul=createSafeURI(new URL("http://www.christlicheparteiösterreichs.at/steiermark/readme.html#important"));
This is just a quick shot; it does not address every issue involved in converting a URL to a URI. Use it as a starting point.
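As for detecting the illegal host in the first place: per its Javadoc, java.net.URI(String) does not throw here. An authority that cannot be parsed as user-info, host and port is silently treated as "registry-based", which is why getHost() returns null. URI.parseServerAuthority() re-runs the stricter parse and throws URISyntaxException instead. A minimal sketch (the helper name hasValidHost is illustrative):

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostCheck {
    // Returns true only if the authority parses as a server-based host[:port].
    // parseServerAuthority() throws when URI(String) had fallen back to a
    // registry-based authority (the case where getHost() returns null).
    static boolean hasValidHost(URI uri) {
        try {
            return uri.parseServerAuthority().getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        URI bad = new URI("http://www.christlichepartei%F6sterreichs.at/steiermark/");
        System.out.println(bad.getHost());      // null: constructor succeeded anyway
        System.out.println(hasValidHost(bad));  // false
        URI good = new URI("http://www.example.at/steiermark/");
        System.out.println(hasValidHost(good)); // true
    }
}
```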


How to handle "#" in password of java URI?

I have a URL "ssh://root:zstackqwe:!@#@172.16.36.184" whose password part contains ":", "@" and "#". I use java.net.URI to wrap the string like this:
URI u1 = new URI("ssh://root:zstackqwe:!@#@172.16.36.184/");
System.out.println(u1.getAuthority());
System.out.println(u1.getHost());
The output is:
root:zstackqwe:!@
null
The authority part is returned, but the host part is null. How should I handle those special characters correctly?
UPDATE:
The string "ssh://root:zstackqwe:!@#@172.16.36.184" is a raw string without any encoding, passed from the API to my application. I am unable to use constructors other than URI(String), so I am looking for a way to handle the raw string so that it works with java.net.URI.
If it's meant to be a URI, use percent-encoding for the reserved characters in the password:
URI u1 = new URI("ssh://root:zstackqwe%3a!%40%23@172.16.36.184/");
If that string comes from someone else, it's malformed. Get them to submit a correctly formed URL instead.
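If the raw string can't be fixed at the source and only URI(String) is available, one workaround is to percent-encode the user-info yourself before parsing, for a raw string such as ssh://root:zstackqwe:!@#@172.16.36.184/ (password zstackqwe:!@#). This is a hypothetical helper, not a general URL repair tool: it assumes the last '@' separates the user-info from the host, the first ':' separates user from password, and nothing after the host contains '@'. It uses the URLEncoder.encode(String, Charset) overload, which needs Java 10+.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RawUserInfoFix {
    // Hypothetical helper. Assumptions: the LAST '@' separates user-info from
    // host, the FIRST ':' in the user-info separates user from password,
    // and no '@' appears after the host.
    static URI fromRaw(String raw) throws URISyntaxException {
        int schemeEnd = raw.indexOf("://");
        int at = raw.lastIndexOf('@');
        if (schemeEnd < 0 || at < schemeEnd) {
            return new URI(raw); // no user-info to repair
        }
        String scheme = raw.substring(0, schemeEnd);
        String userInfo = raw.substring(schemeEnd + 3, at);
        String hostAndRest = raw.substring(at + 1);

        int colon = userInfo.indexOf(':');
        String encoded = colon < 0
                ? encode(userInfo)
                : encode(userInfo.substring(0, colon)) + ":" + encode(userInfo.substring(colon + 1));
        return new URI(scheme + "://" + encoded + "@" + hostAndRest);
    }

    // URLEncoder does form encoding (space becomes '+'; a literal '+' becomes %2B),
    // so any remaining '+' stands for a space and is rewritten as %20.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) throws URISyntaxException {
        URI u = fromRaw("ssh://root:zstackqwe:!@#@172.16.36.184/");
        System.out.println(u.getHost());     // 172.16.36.184
        System.out.println(u.getUserInfo()); // decoded user-info: root:zstackqwe:!@#
    }
}
```

URLEncoder also escapes ':' and '!' inside the password (as %3A and %21); that is harmless here because getUserInfo() decodes escaped octets again.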

android get URL path

I've got a string:
public://imageifarm/3600.jpg
How can I extract the imageifarm/3600.jpg part using Android?
What I've tried so far:
URL drupalQuestionNodeImageURI = new URL("public://imageifarm/3600.jpg");
Log.d("TAG", drupalQuestionNodeImageURI.getPath());
but it throws this exception:
09-16 17:24:39.992: W/System.err(3763): java.net.MalformedURLException: Unknown protocol: public
How can I solve this?
I know I can use regular expressions but that seems to defeat the purpose of URL(URI) in this case.
You should use android.net.Uri:
Uri mUri = Uri.parse("public://imageifarm/3600.jpg");
String extract = mUri.getEncodedSchemeSpecificPart(); // "//imageifarm/3600.jpg": strip the leading "//"
Use java.net.URI, not java.net.URL.
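A sketch of the java.net.URI route: URI, unlike URL, needs no registered protocol handler, so the unknown scheme "public" parses fine. For this string the parser treats imageifarm as the authority and /3600.jpg as the path, so the helper (a hypothetical name) concatenates the two; getSchemeSpecificPart() would instead return //imageifarm/3600.jpg with the leading slashes.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CustomSchemePath {
    // Assumes input of the form "scheme://first/rest"; the part before the
    // first '/' after "//" is parsed as the authority, the rest as the path.
    static String pathAfterScheme(String s) throws URISyntaxException {
        URI uri = new URI(s);
        String authority = uri.getAuthority(); // "imageifarm"
        String path = uri.getPath();           // "/3600.jpg" (null for opaque URIs)
        if (authority == null || path == null) {
            throw new URISyntaxException(s, "expected scheme://authority/path");
        }
        return authority + path;
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(pathAfterScheme("public://imageifarm/3600.jpg")); // imageifarm/3600.jpg
    }
}
```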
If you have to use the URL class (when your image is on the Internet), you must provide a valid URL that starts with a known protocol prefix such as http:// or https://. In your case you should use the Uri class. A Uri object can also point to files in your local file system. For example:
Uri.fromFile(new File("public://imageifarm/3600.jpg"));

URL encoding + sign

I have an application with a + sign in its name (e.g. DB+JSP.jws).
I get an error when trying to create a connection, because Java treats the + in the URL as an encoded space and therefore cannot add the connection to DB JSP/../META-INF/connection.xml (FileNotFoundException).
Is there any way to work around this using only the URLEncoder.encode() and URLDecoder.decode() methods?
You need to encode the URL correctly, since '+' is a reserved character in a URL and can only be used in the correct context; otherwise it needs to be encoded as %2B.
Your URL string would be encoded as "DB%2BJSP.jws".
So if you defined the following:
String url = URLEncoder.encode("DB+JSP.jws", "UTF-8");
System.out.println(url);
The output would be:
DB%2BJSP.jws
You can prepend "http://localhost/" to the encoded URL as you need to.
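A small round trip illustrating the answer: decoding the raw name turns + into a space (the reported bug), while encoding first yields %2B, which decodes back to +. This uses the Charset overloads of URLEncoder and URLDecoder (Java 10+; on older versions pass "UTF-8" and handle UnsupportedEncodingException). The class and method names are illustrative.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PlusSignDemo {
    static String encodeName(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8); // '+' becomes %2B
    }

    static String decodeName(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);    // '+' becomes ' ', %2B becomes '+'
    }

    public static void main(String[] args) {
        String name = "DB+JSP.jws";
        System.out.println(decodeName(name));             // DB JSP.jws (the bug)
        System.out.println(encodeName(name));             // DB%2BJSP.jws
        System.out.println(decodeName(encodeName(name))); // DB+JSP.jws
    }
}
```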

HttpClient and non-ASCII URL characters (á,é,í,ó,ú)

'Long time reader, first time poster' here.
I'm in the process of making a bot for a Spanish wiki I administer. I wanted to write it from scratch, since one of my reasons for making it is to practice Java. However, I ran into trouble when trying to make GET requests with HttpClient to URIs that contain non-ASCII characters such as á, é, í, ó or ú.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras de las Botas";
method = new GetMethod(url);
client.executeMethod(method);
When I do the above, GetMethod complains about the URI:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid uri 'http://es.pruebaloca.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras%20de%20las%20Botas&cmlimit=500&format=xml': Invalid query
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:69)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:120)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Note that in the URI shown in the stack trace, spaces are encoded as %20 and the í characters are left as is. That exact same URI works perfectly in a browser, but I can't get GetMethod to accept it.
I've also tried doing the following:
URI uri = new URI(url, false);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
This way, URI escaped the í, but double-escaped the spaces (%2520):
http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categor%C3%ADa:Mejoras%2520de%2520las%2520Botas&cmlimit=500&format=xml
Now, if I don't use any spaces in the query, there's no double escaping and I get the desired output. So if there wasn't any possibility of non-ASCII characters, I wouldn't need to use the URI class and wouldn't get the double escaping. In an attempt to avoid the first escaping of the spaces, I tried this:
URI uri = new URI(url, true);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
But the URI class didn't like it:
org.apache.commons.httpclient.URIException: Invalid query
at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2049)
at org.apache.commons.httpclient.URI.<init>(URI.java:167)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:66)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:121)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:39)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Any input on how to avoid this double escaping would be greatly appreciated. I've lurked all around with absolutely no luck.
Thanks!
Edit: The solution that works best for me is parsifal's, but I'd like to add that setting the path with method.setPath(url) made HttpMethod reject a cookie I needed to save:
Aug 26, 2011 4:07:08 PM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "wikicities_session=900beded4191ff880e09944c7c0aaf5a". Illegal path attribute "/". Path of origin: "http://es.metroid.wikia.com/api.php"
However, if I send the URI to the constructor and forget about the setPath(url), the cookie gets saved without problem.
String url = "http://es.metroid.wikia.com/api.php";
NameValuePair[] query = { new NameValuePair("action", "query"), new NameValuePair("list", "categorymembers"),
new NameValuePair("cmtitle", "Categoría:Mejoras de las Botas"), new NameValuePair("cmlimit", "500"),
new NameValuePair("format", "xml") };
HttpMethod method = null;
...
method = new GetMethod(url); // Or PostMethod(url)
method.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY); // It had been like this the whole time
method.setQueryString(query);
client.executeMethod(method);
I would recommend using URLEncoder to encode your query string values (not the whole query string):
URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
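Building the whole query that way, one encoded name=value pair at a time, keeps & and = as separators while escaping spaces, ':' and non-ASCII characters. This is a plain-JDK sketch (Java 10+ for the Charset overload), not HttpClient API; the helper name buildQuery is hypothetical. URLEncoder writes spaces as +, which servers that decode query strings as application/x-www-form-urlencoded (PHP does) read back as spaces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryBuilder {
    // Encode each name and value separately so '&' and '=' stay as separators.
    static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("action", "query");
        params.put("list", "categorymembers");
        params.put("cmtitle", "Categor\u00eda:Mejoras de las Botas"); // \u00ed = í
        params.put("cmlimit", "500");
        params.put("format", "xml");
        System.out.println("http://es.metroid.wikia.com/api.php?" + buildQuery(params));
    }
}
```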
Looking at the documentation of HttpMethodBase, it appears that all String parameters have to be pre-encoded. The simplest solution is to construct your URL in stages, with setPath() and the variant of setQueryString() that takes an array of name-value pairs.
Why don't you try adding the parameters as NameValuePairs? The problem here is that when you escape the URL, everything in it is escaped, including things like http://, which is why the system is complaining.
You can also escape just the argument values using URLEncoder.encode(): pass each GET parameter value to it and append the return value to the URL.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=" + URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");

Encode URL query parameters

How can I encode URL query parameter values? I need to replace spaces with %20, accents, non-ASCII characters etc.
I tried to use URLEncoder but it also encodes / character and if I give a string encoded with URLEncoder to the URL constructor I get a MalformedURLException (no protocol).
URLEncoder has a very misleading name. According to the Javadocs, it is used to encode form parameters using the MIME type application/x-www-form-urlencoded.
That said, it can be used to encode, e.g., query parameters. For instance, if a parameter looks like &/?#, its encoded equivalent can be used as:
String url = "http://host.com/?key=" + URLEncoder.encode("&/?#", "UTF-8");
Unless you have those special needs, the URL Javadocs suggest using new URI(..).toURL(), which performs URI encoding according to RFC 2396:
The recommended way to manage the encoding and decoding of URLs is to use URI
The following sample
new URI("http", "host.com", "/path/", "key=| ?/#ä", "fragment").toURL();
produces the result http://host.com/path/?key=%7C%20?/%23ä#fragment. Note how characters such as ?&/ are not encoded.
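A runnable version of that sample, also printing toASCIIString(). The multi-argument constructor leaves "other" (non-ASCII) characters such as ä unquoted, while toASCIIString() encodes them as UTF-8 escape sequences, which is usually what an HTTP request needs. Note that the second argument of this five-argument constructor is the authority, not just the host. The class name is illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriQuoting {
    // Quotes characters that are illegal in each component ('|', ' ', '#'
    // in the query) but leaves legal ones such as '?' and '/'.
    static URI build() throws URISyntaxException {
        return new URI("http", "host.com", "/path/", "key=| ?/#\u00e4", "fragment"); // \u00e4 = ä
    }

    public static void main(String[] args) throws Exception {
        URI uri = build();
        System.out.println(uri);                 // http://host.com/path/?key=%7C%20?/%23ä#fragment
        System.out.println(uri.toASCIIString()); // http://host.com/path/?key=%7C%20?/%23%C3%A4#fragment
        System.out.println(uri.toURL());
    }
}
```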
For further information, see the posts HTTP URL Address Encoding in Java or how to encode URL to avoid special characters in java.
EDIT
Since your input is a string URL, using one of the parameterized constructors of URI will not help you. Neither can you use new URI(strUrl) directly, since it doesn't quote URL parameters.
So at this stage we must use a trick to get what you want:
public URL parseUrl(String s) throws Exception {
URL u = new URL(s);
return new URI(
u.getProtocol(),
u.getAuthority(),
u.getPath(),
u.getQuery(),
u.getRef()).
toURL();
}
Before you can use this routine, you have to sanitize your string to ensure it represents an absolute URL. I see two approaches to this:
Guessing: prepend http:// to the string unless it's already present.
Construct the URL from a context using new URL(URL context, String spec).
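Putting the URL-to-URI trick and the "guessing" approach together gives something like the sketch below. The helper names are hypothetical, the http:// default is only a heuristic, and new URL(String) is deprecated since Java 20 (it still works). toASCIIString() is used so that non-ASCII characters are escaped as well.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlSanitizer {
    // "Guessing" heuristic: assume http:// when the string has no "://".
    static String ensureAbsolute(String s) {
        return s.contains("://") ? s : "http://" + s;
    }

    // URL splits the string without validating it; the multi-argument URI
    // constructor then quotes illegal characters component by component.
    static String encodeUrl(String s) throws MalformedURLException, URISyntaxException {
        URL u = new URL(ensureAbsolute(s));
        URI uri = new URI(u.getProtocol(), u.getAuthority(), u.getPath(), u.getQuery(), u.getRef());
        return uri.toASCIIString(); // also escapes non-ASCII characters as UTF-8
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeUrl("host.com/path with space/?q=a b&x=\u00e4")); // \u00e4 = ä
    }
}
```

One caveat: the multi-argument URI constructors always quote '%', so input that is already partly percent-encoded (for example containing %20) will come out double-encoded as %2520, the same symptom described in the HttpClient question.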
So what you're saying is that you want to encode part of your URL but not the whole thing. Sounds to me like you'll have to break it up into parts, pass the ones that you want encoded through the encoder, and re-assemble it to get your whole URL.
