How to encode an URL with Java - java

Given a URL such as this one,
http://www.example.com/some directory/some file
how do you encode it? Browsers encode it automatically, but I couldn't find a ready-made function in Java, although I suspect one exists because this is a common need.
When I pass the string to the single-String constructor of the URI class in order to parse the components of the URL (authority, path, and so on), it throws an error because it expects an already encoded URL.
Do you know a ready-made function that will produce, for example in this case:
http://www.example.com/some%20directory/some%20file

Try this:
final URL url = new URL("http://www.example.com/some directory/some file");
final URI uri = new URI(url.getProtocol(), url.getHost(), url.getPath(), null);
System.out.println(uri.toASCIIString());

Related

URL encode/decode on file name replace spaces with +, need alternative.

My product is a web application.
I have files that I upload and download later on, to/from my server.
I am using java.net.URLDecoder.decode() when uploading files with Unicode characters and java.net.URLEncoder.encode() when downloading them, so that the file name is saved and finally returned to the client as expected, without question marks (?????).
The problem is that if the file name contains spaces, encode/decode replaces them with the + character. That is perfectly normal for these classes, but it does not fit my purpose.
What alternatives do I have to overcome this?
Is there a built-in method for that, or a third-party package?
You could also convert a space to %20.
See: URL encoding the space character: + or %20?
There are also various other Java libraries that do URL encoding with %20. Here are two examples:
Guava:
UrlEscapers.urlPathSegmentEscaper().escape(urlToEscape);
Spring Framework:
UriUtils.encodePath(urlToEscape, Charsets.UTF_8.toString());
You don't say where this filename is used. The characters to encode differ depending on whether, for instance, it is in a URI query string or in the fragment part.
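To illustrate that the set of characters to escape depends on the URL part, here is a minimal sketch using only java.net.URI from the JDK. The class name, host, and file names are made up for the example. It relies on the multi-argument URI constructor, which quotes each component by its own rules: a '?' is escaped in the path but left alone in the query.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PartSpecificQuoting {
    // Builds a URI from raw (unencoded) parts; java.net.URI quotes
    // the path and the query according to different character sets.
    static String build(String rawPath, String rawQuery) {
        try {
            return new URI("http", "www.example.com", rawPath, rawQuery, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '?' inside the path is percent-encoded
        System.out.println(build("/what?.txt", null));
        // the same '?' inside the query is left as-is
        System.out.println(build("/download", "name=what?.txt"));
    }
}
```

So a file name that is safe in one position may still need escaping in another, which is why the intended position matters.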
You probably want to have a look at Guava's (15.0+) Escapers; and, in particular here, UnicodeEscaper implementations and its derived class PercentEscaper. Guava already provides a few of them usable in various parts of URLs.
EDIT: here is how to do with Guava:
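import com.google.common.net.PercentEscaper;

// PercentEscaper is a final class in Guava, so it cannot be subclassed;
// this wrapper holds an instance instead.
public final class FilenameEscaper
{
    // "" = no extra safe characters beyond the default alphanumerics;
    // false = escape a space as %20 rather than '+'
    private static final PercentEscaper ESCAPER = new PercentEscaper("", false);

    public static String escape(String filename)
    {
        return ESCAPER.escape(filename);
    }
}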
Done! See here. Of course, you may want to declare that some more characters than the default ones are safe.
Also have a look at RFC 5987 to make a better encoder.
This worked for me:
URLEncoder.encode(someString, "UTF-8").replace("+", "%20");
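As a self-contained sketch of this approach (the class and method names are illustrative, not from any library): URLEncoder produces form encoding, so a space becomes '+', while a literal '+' in the input is already emitted as %2B. That makes the subsequent replace unambiguous.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FileNameEncoder {
    // Form-encodes the name as UTF-8, then turns the '+' that stands for a space into %20.
    static String encodeFileName(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00fc is the letter u with an umlaut
        System.out.println(encodeFileName("my file+v2 \u00fc.txt"));
    }
}
```

Note that this still escapes characters such as '/' and ':', so it suits a single file name or path segment rather than a whole URL.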
I found the cure!
I just needed to use java.net.URI for that:
public static String encode(String urlString) throws URISyntaxException
{
    // The single-argument constructor parses the string as-is and throws
    // URISyntaxException on illegal characters such as spaces.
    URI uri = new URI(urlString);
    // toASCIIString() percent-encodes non-ASCII characters as UTF-8
    return uri.toASCIIString();
}
toASCIIString() escapes the special characters, so when the string arrives at the browser it is shown correctly.
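A small sketch of what this approach does and does not cover, assuming only the JDK (the class name and URLs are illustrative): the single-argument URI constructor accepts non-ASCII letters, which toASCIIString() then percent-encodes, but it rejects a space outright.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SingleArgUriDemo {
    // Parses the string as a URI and returns its ASCII-only form.
    // Throws IllegalArgumentException if the string is not a legal URI.
    static String toAscii(String urlString) {
        try {
            return new URI(urlString).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII letter (\u00e5 is a with ring above): accepted and encoded
        System.out.println(toAscii("http://www.example.com/u/h\u00e5kon/"));
        try {
            // A raw space is illegal for the single-argument constructor
            toAscii("http://www.example.com/some file");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getCause().getMessage());
        }
    }
}
```

For input that may contain spaces, the multi-argument URI constructors are the safer choice because they quote illegal characters instead of rejecting them.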
Had the same problem with spaces. Combination of URL and URI solved it:
URL url = new URL("file:/E:/Program Files/IBM/SDP/runtimes/base");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
* Please note that URLEncoder is used for web forms application/x-www-form-urlencoded mime-type - not http network addresses.
* Source: https://stackoverflow.com/a/749829/435605

Java API/util to find and replace unsafe characters with their percent encoded forms?

Does anyone know of a decent Java API/util to find and replace unsafe characters in a URL with their percent-encoded forms?
"http://google.com?" + URLEncoder.encode("...", "UTF-8");
See javadocs.
One should add the expected character encoding.
See Java - Convert String to valid URI object for pure Java style:
String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();
Beware that it handles protocols available to Java natively. Otherwise you may need to register your own one like "s3" for Amazon S3 URIs.
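If registering a protocol handler is not an option, one possible workaround is to skip java.net.URL and pass the already separated parts straight to the URI constructor, which does not need a handler for the scheme. A minimal sketch, where the class name, the bucket name, and the path are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

class CustomSchemeUri {
    // Builds an encoded URI string from raw parts without going through java.net.URL,
    // so no protocol handler for the scheme is required.
    static String encode(String scheme, String host, String rawPath) {
        try {
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "my-bucket" and the path are made-up values for illustration
        System.out.println(encode("s3", "my-bucket", "/reports/2024 summary.csv"));
    }
}
```

The trade-off is that the caller has to split the string into scheme, host, and path, which is the work URL was doing in the snippet above.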

Returned URL as String is not valid in JSF

I'm trying to use the Google API for text-to-speech. I build a String and pass it as a URL to a component to obtain an MP3 with the spoken words.
So, this is my code:
URI uri = new URI("http://translate.google.com/translate_tts?tl=es&q="+ URLEncoder.encode((String)this.text.getValue(), "UTF-8"));
When I call uri.toString() it returns a well-formed URL. If I copy and paste this output into the browser, it works perfectly.
But if I assign the returned String to the source property of an ice:outputMedia, it does not work. When I inspect the HTML generated in the page, the String in the src property is:
http://translate.google.com/translate_tts?tl=es&amp;q=Bobby+need+peanuts
The & symbol has been replaced by &amp;.
How can I avoid this and get a valid URL?
You need to decode the url on the client side using Javascript.
var decoded = decodeURI(URI)

Encoding an URL sent to a server (not in query)

I need to test my server against several URLs daily, since these URLs are updated by my users, and this will be done in Java. However, these URLs contain unusual characters (like the German umlaut). Basically, what I am doing is:
for every URL in the list to check
URL u = new URL(the_url);
u.openConnection(..);
// read the content and handle it
Now, what I've found is that while org.apache.commons.codec.net.URLCodec is fine for encoding a string to paste into the query string, it is not suitable for encoding unusual URLs into their hex counterparts. Here are some examples of URLs:
http:// www.example com/u/überraum-03/
http:// www.example com/u/são-paulo-dude/
http:// www.example com/u/håkon-hellström/
The desired result for the first would be:
http:// www.example com/u/%c3%bcberraum-03/
Is there any library in Apache Commons, or in Java itself, that converts special characters in the actual URL (not the query string, and therefore without replacing the same kinds of characters)?
Thank you for your time.
Edited
Firefox translates "yr.no/place/Norway/Nordland/Moskenes/Å/data.html" into "yr.no/place/Norway/Nordland/Moskenes/%C3%85/data.html" (try this by entering the first URL, pressing Enter, then copying the URL into a document). This is the effect I am looking for, since this is the actual translation. Most likely either Firefox knows that Å must be encoded, or it tries multiple versions, or it accepts the server's "Location" header; either way, there is a transformation from "Å" to "%C3%85" on only a subset of the URL. This is the function we need.
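The transformation described here, where only the non-ASCII characters are replaced by the percent-encoded form of their UTF-8 bytes, can be sketched by hand as follows. This is an illustration of the effect, not what Firefox actually runs, and the class and method names are made up. It deliberately leaves every ASCII character untouched, so spaces and reserved characters are not handled.

```java
import java.nio.charset.StandardCharsets;

class NonAsciiEscaper {
    // Percent-encodes every byte >= 0x80 of the UTF-8 form; ASCII bytes pass through unchanged.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00c5 is the letter A with ring above
        System.out.println(escapeNonAscii("yr.no/place/Norway/Nordland/Moskenes/\u00c5/data.html"));
    }
}
```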
Edited
I just verified that the code given by the commenter sadly does not work. As an example, try this:
try{
String urlStr = "http://www.yr.no/place/Norway/Nordland/Moskenes/Å/data.html";
URL u=new URL(urlStr);
URI uri = new URI(u.getProtocol(),
u.getUserInfo(), u.getHost(), u.getPort(),
u.getPath(), u.getQuery(),
null); // removing ref
URL urlObj = uri.toURL();
HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
connection.setInstanceFollowRedirects(false);
connection.connect();
for (int i=0;i<connection.getHeaderFields().size();i++)
System.out.println(connection.getHeaderFieldKey(i)+": "+connection.getHeaderField(i));
System.exit(0);
}catch(Exception e){e.printStackTrace();};
This yields a 404 error; strangely enough, the encoded part does not work either.
If you need a URL that is a valid URI (RFC 2396 compliant) you can create one like this in Java
String urlString = "http://www.example.com/u/håkon-hellström/";
URL url = new URL(urlString);
URI uri = new URI(url.getProtocol(),url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
url = new URL(uri.toASCIIString());
That being said all three sample strings you provided are RFC 2396 compliant and do not need to be encoded. I am assuming the spaces in the authority part of the URLs you provided are typos.
EDIT:
I updated the code block above. By using URI.toASCIIString() you can limit the resulting URI to only US-ASCII characters (other characters are encoded). The resulting string can then be used to create a new, valid URL.
http://www.example.com/u/håkon-hellström/
changes to
http://www.example.com/u/h%C3%A5kon-hellstr%C3%B6m/

Escape non english characters in a url

How can I escape non-english characters like "ö" from my url since it causes 404 response error. I am using Java. Please help me.
E.g. by using URL-Encoding as specified in RFC3986 (http://tools.ietf.org/html/rfc3986). Please also have a look at: http://en.wikipedia.org/wiki/Percent-encoding
Java provides some methods to do this:
http://download.oracle.com/javase/1.4.2/docs/api/java/net/URLEncoder.html
Be aware of different encodings such as ISO-8859-1/15 and UTF-8. Depending on the encoding, an 'ö' is encoded as %F6 (ISO-8859-1) or as %C3%B6 (UTF-8).
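A short sketch of how the chosen charset changes the result, using java.net.URLEncoder (the class and method names are illustrative):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CharsetComparison {
    // Encodes the text with the named charset; unknown charset names are reported as IllegalArgumentException.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00f6"; // the letter o with an umlaut
        System.out.println(encode(oUmlaut, "ISO-8859-1")); // one byte, one escape
        System.out.println(encode(oUmlaut, "UTF-8"));      // two bytes, two escapes
    }
}
```

The server has to decode with the same charset the client used to encode, otherwise the path will not match and a 404 is a likely outcome.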
Use URLEncoder/URLDecoder in the java.net package.
Try the java.net.URLEncoder
I had a similar problem, there was a 'ü' in URL path. After a few hours of experimenting with various SO posts I got this (from here):
URL url = new URL(urlString);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = new URL(uri.toASCIIString());
The trick is in converting the URI to a URL. Most answers ended with a URI.toURL() method call. While this approach correctly encodes whitespace and non-letter characters, it doesn't encode non-ASCII letters. URI.toASCIIString() is the answer to that problem.
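The difference can be seen without opening a connection by comparing URI.toString() with URI.toASCIIString() on the same object. A minimal sketch, assuming a made-up host and path:

```java
import java.net.URI;
import java.net.URISyntaxException;

class ToAsciiVsToString {
    // Builds a URI from a raw path; the multi-argument constructor quotes illegal characters such as spaces.
    static URI build(String rawPath) {
        try {
            return new URI("http", "www.example.com", rawPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("/u/h\u00e5kon dir/"); // \u00e5 is a with ring above
        // The space is already %20 here, but the non-ASCII letter is still raw
        System.out.println(uri.toString());
        // Here the non-ASCII letter is percent-encoded as UTF-8 as well
        System.out.println(uri.toASCIIString());
    }
}
```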
