I see that java.net.URLDecoder.decode(String) is deprecated in 6.
I have the following String:
String url = "http://172.20.4.60/jsfweb/cat/%D7%9C%D7%97%D7%9E%D7%99%D7%9D_%D7%A8%D7%92%D7%99%D7%9C%D7%99%D7%9";
How should I decode it in Java 6?
You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data).
Now you need to specify the character encoding of your string. Based on the information on the URLDecoder page:
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
The following should work for you:
java.net.URLDecoder.decode(url, "UTF-8");
Please see Draemon's answer below.
As the documentation mentions, decode(String) is deprecated because it always uses the platform default encoding, which is often wrong.
Use the two-argument version instead. You will need to specify the encoding used in the escaped parts.
Only the decode(String) method is deprecated. You should use the decode(String, String) method to explicitly set a character encoding for decoding.
As noted by previous posters, you should use java.net.URI class to do it:
System.out.println(String.format("Decoded URI: '%s'", new URI(url).getPath()));
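To make the difference between the two approaches concrete, here is a minimal sketch. The class and method names (DecodeComparison, decodePath, decodeForm) and the example.com URL are made up for illustration; only java.net.URI and java.net.URLDecoder come from the answers above. It assumes the escaped bytes are UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

// Hypothetical helper class, for illustration only.
class DecodeComparison {

    // Parses a complete URI and returns its decoded path; a literal '+' is left as-is.
    static String decodePath(String uri) {
        return URI.create(uri).getPath();
    }

    // Decodes application/x-www-form-urlencoded data; here '+' becomes a space.
    static String decodeForm(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodePath("http://example.com/cat/%D7%9C_a%20b+c"));
        System.out.println(decodeForm("%D7%9C_a%20b+c"));
    }
}
```

The only visible difference on this input is the '+': the URI route keeps it, the form decoder turns it into a space.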
What I want to note additionally is that if you have a path fragment of a URI and want to decode it separately, the same approach with the one-argument constructor works, but if you try to use the four-argument constructor it does not:
String fileName = "Map%20of%20All%20projects.pdf";
URI uri = new URI(null, null, fileName, null);
System.out.println(String.format("Not decoded URI *WTF?!?*: '%s'", uri.getPath()));
This was tested in Oracle JDK 7. The fact that this does not work is counter-intuitive, although the URI Javadoc does state that the multi-argument constructors always quote the percent character.
It could trip up people who are trying to use an approach symmetrical to encoding. As noted for example in this post: "how to encode URL to avoid special characters in java", in order to encode a URI, it's a good idea to construct it by passing the different URI parts separately, since different encoding rules apply to different parts:
String fileName2 = "Map of All projects.pdf";
URI uri2 = new URI(null, null, fileName2, null);
System.out.println(String.format("Encoded URI: '%s'", uri2.toASCIIString()));
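The asymmetry can be seen side by side in the following sketch. The class and method names (UriConstructorQuoting, parse, fromPath) are invented for this example. It relies on the behaviour described in the URI Javadoc: the single-argument constructor parses an already-escaped string, while the multi-argument constructors quote illegal characters and always quote '%'.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class UriConstructorQuoting {

    // Single-argument form: the input is treated as already escaped.
    static URI parse(String escaped) {
        return URI.create(escaped);
    }

    // Four-argument form (scheme, host, path, fragment): the path is treated
    // as unescaped text, so '%' itself gets quoted as %25.
    static URI fromPath(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = "Map%20of%20All%20projects.pdf";
        System.out.println(parse(escaped).getPath());
        System.out.println(fromPath(escaped).getRawPath());
        System.out.println(fromPath("Map of All projects.pdf").toASCIIString());
    }
}
```

So the multi-argument constructors are the encoding direction, and the single-argument constructor plus getPath() is the decoding direction.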
I have a request as follows:
localhost:8000/location/:01
My code takes as input an HttpContext request.
func(HttpExchange r) {
String area_path = r.getRequestURI(); // Equals string "/location/"
}
How do I parse an HttpExchange correctly so I can pull out the "01" from this path and store it as a variable?
That (localhost:8000/location/:01) is not a valid URL or URI
A plain colon character is not legal in the path of a URL or URI. If you want to put a colon in the path, it must be percent-encoded. Furthermore, if this was a URL, it would start with a protocol; e.g. http:.
Now ... it is unclear what the HTTP stack you are using will do with a syntactically incorrect URL / URI, but it could simply be ignoring the colon and the characters after it.
Your code looks a bit odd too. You have tagged the question as [java], but the code does not look like Java: func is not a Java keyword, and the method has no return type. Yet you also appear to be using the com.sun.net.httpserver.HttpExchange Java class. I don't know what to make of that ...
My advice:
Don't use a colon character in the URL path.
If you must do it, then percent-encode the colon.
If you cannot encode it properly, then you may need to find and use a different framework for your HTTP request handling. One that will accept and handle a malformed URL / URI in the way that you want. (Good luck finding one!)
Unfortunately, the details in your question are too sketchy to give more detailed advice.
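For what it's worth, if the client does percent-encode the colon as advised, pulling the value out could look roughly like the sketch below. PathSuffix and idAfter are hypothetical names, and the "/location/" prefix is taken from the question. In a handler, the URI argument would come from HttpExchange.getRequestURI(), which returns a java.net.URI.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class PathSuffix {

    // Returns whatever follows the prefix in the decoded path,
    // minus one leading ':' if there is one.
    static String idAfter(URI requestUri, String prefix) {
        String path = requestUri.getPath(); // decoded, e.g. "/location/:01"
        if (path == null || !path.startsWith(prefix)) {
            throw new IllegalArgumentException("Unexpected path: " + path);
        }
        String rest = path.substring(prefix.length());
        return rest.startsWith(":") ? rest.substring(1) : rest;
    }

    public static void main(String[] args) {
        // Assumed request: the colon is sent as %3A.
        URI uri = URI.create("http://localhost:8000/location/%3A01");
        System.out.println(idAfter(uri, "/location/"));
    }
}
```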
My product is a web application.
I have files that I upload and download later on, to/from my server.
I am using java.net.URLDecoder.decode() when uploading files with unicode characters and java.net.URLEncoder.encode() when downloading files, in order to save the file name and finally return it to the client as expected, with no question marks and stuff (?????).
The problem is that if the file name contains spaces, then encode/decode replaces them with the + character. That is perfectly normal because that is how they are implemented, but clearly it does not fit my purpose.
The question is what alternative do I have to overcome this situation?
Is there a built-in method for that, or a 3rd party package?
You could also convert a space to %20.
See: URL encoding the space character: + or %20?
There are also various other Java libraries that do URL encoding with %20. Here are two examples:
Guava:
UrlEscapers.urlPathSegmentEscaper().escape(urlToEscape);
Spring Framework:
UriUtils.encodePath(urlToEscape, Charsets.UTF_8.toString());
You don't say where this filename is used. The characters to encode will be different depending on whether, for instance, it is in a URI query string or fragment part.
You probably want to have a look at Guava's (15.0+) Escapers; and, in particular here, UnicodeEscaper implementations and its derived class PercentEscaper. Guava already provides a few of them usable in various parts of URLs.
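EDIT: here is how to do it with Guava:
public final class FilenameEscaper
{
    // Wrap a PercentEscaper instance (composition) rather than subclassing it.
    private static final PercentEscaper ESCAPER = new PercentEscaper("", false);

    public static String escape(String fileName)
    {
        return ESCAPER.escape(fileName);
    }
}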
Done! See here. Of course, you may want to declare that some more characters than the default ones are safe.
Also have a look at RFC 5987 to make a better encoder.
This worked for me:
URLEncoder.encode(someString, "UTF-8").replace("+", "%20");
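Spelled out as a round trip, the replace trick might look like this. FileNameEncoder and its two methods are names made up for the sketch, and UTF-8 is assumed on both sides. It works because URLEncoder turns a literal '+' into %2B before the replace runs, so the only '+' left in its output are encoded spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class FileNameEncoder {

    // Form-encodes the name, then rewrites the '+' that stands for a space as %20.
    static String encode(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // URLDecoder understands %20 as well as '+', so no reverse replace is needed.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String name = "Map of All+projects.pdf";
        System.out.println(encode(name));
        System.out.println(decode(encode(name)));
    }
}
```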
I found the cure!
I just needed to use java.net.URI for that:
public static String encode(String urlString)
{
    try
    {
        URI uri = new URI(urlString);
        return uri.toASCIIString();
    }
    catch (URISyntaxException e)
    {
        // Fail loudly rather than fall off the end of the method without a return value.
        throw new IllegalArgumentException("Invalid URI: " + urlString, e);
    }
}
toASCIIString() escapes the non-ASCII characters, so when the string arrives at the browser it is shown correctly.
Had the same problem with spaces. Combination of URL and URI solved it:
URL url = new URL("file:/E:/Program Files/IBM/SDP/runtimes/base");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
* Please note that URLEncoder is used for web forms application/x-www-form-urlencoded mime-type - not http network addresses.
* Source: https://stackoverflow.com/a/749829/435605
The request parameter is like decrypt?param=5FHjiSJ6NOTmi7/+2tnnkQ==.
In the servlet, when I try to print the parameter with String param = request.getParameter("param"); I get 5FHjiSJ6NOTmi7/ 2tnnkQ==. It turns the character + into a space. How can I keep the original parameter, or how can I properly handle the character +?
Besides, what other characters should I handle?
You have two choices:
1. URL-encode the parameter. If you have control over the generation of the URL, you should choose this. If not...
2. Manually retrieve the parameter.
If you can't change how the URL is generated (above) then you can manually retrieve the raw URL. Certain methods decode parameters for you. getParameter is one of them. On the other hand, getQueryString does not decode the String. If you have only a few parameters it shouldn't be difficult to parse the value yourself.
request.getQueryString();
// param=5FHjiSJ6NOTmi7/+2tnnkQ== (the query string is returned without the leading '?')
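Parsing the value yourself can be as small as the sketch below. RawQueryParam and rawValue are hypothetical names. The method takes the string you would get from request.getQueryString() and returns the value exactly as sent, with no decoding at all, so a '+' stays a '+' and any percent-escapes also stay escaped.

```java
// Hypothetical helper class, for illustration only.
class RawQueryParam {

    // Returns the raw (undecoded) value of the first parameter called `name`,
    // an empty string if it has no '=', or null if it is absent.
    static String rawValue(String queryString, String name) {
        if (queryString == null) {
            return null;
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('='); // first '=' separates key from value
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                return eq < 0 ? "" : pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(rawValue("param=5FHjiSJ6NOTmi7/+2tnnkQ==", "param"));
    }
}
```

Splitting on the first '=' only is what keeps the trailing "==" of the Base64 value intact.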
If you want to use the '+' character in a URL you need to encode it when it is generated. For '+' the correct encoding is %2b
Use the static methods of URLEncoder and URLDecoder for encoding and decoding URL parameters.
For example:
Encode the URL param using
URLEncoder.encode(url,"UTF-8")
Back on the server side, decode this parameter using
URLDecoder.decode(url,"UTF-8")
decode method returns a String type of the decoded URL.
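Applied to the value from the question, the round trip would look roughly like this. ParamRoundTrip and its methods are names invented for the sketch, and UTF-8 is assumed on both ends.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class ParamRoundTrip {

    // Client side: encode only the parameter value, not the whole URL.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Server side: only needed when working from the raw query string.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("5FHjiSJ6NOTmi7/+2tnnkQ==");
        System.out.println("decrypt?param=" + encoded);
        System.out.println(decode(encoded));
    }
}
```

If the value is read through getParameter, the container has normally decoded it once already, so a second decode there would turn any remaining '+' into a space again.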
Although the question is some years old, I'd like to write down how I fixed the problem in my case: the download link to a file is created in a GWT page where
com.google.gwt.http.client.URL.encode(finalurl)
is used to encode the URL.
The problem was that the "+" sign one of our customers had in the filename wasn't encoded/escaped. So I had to remove the URL.encode(finalurl) and encode each parameter in the URL with
URL.encodePathSegment(fileName)
I know my answer is bound to GWT, but it seems URLEncoder.encode(string, encoding) should be applied to the parameter only as well.
The documentation for java.net.URI specifies that
For any URI u that ... and that does not encode characters except those that must be quoted, the following identities also hold...
But what about URIs that do encode characters that don't need to be quoted?
URI test1 = new URI("http://foo.bar.baz/%E2%82%AC123");
URI test2 = new URI(test1.getScheme(), test1.getUserInfo(), test1.getHost(), test1.getPort(), test1.getPath(), test1.getQuery(), test1.getFragment());
assert test1.equals(test2); // blows up
This fails, because what test2 comes out as, is http://foo.bar.baz/€123 -- with the escaped characters un-escaped.
My question, then, is: how can I construct a URI equal to test1 -- preserving the escaped characters -- out of its components? It's no good using getRawPath() instead of getPath(), because then the escaping characters themselves get escaped, and you end up with http://foo.bar.baz/%25E2%2582%25AC123.
Additional notes:
Don't ask why I need to preserve escaped characters that in theory don't need to be escaped -- trust me, you don't want to know.
In reality I don't want to preserve all of the original URL, just most of it -- possibly replacing the host, port, protocol, even parts of the path, so new URI(test1.toString()) is not the answer. Maybe the answer is to do everything with strings and replicate the URI class's ability to parse and construct URIs in my own code, but that seems daft.
Updated to add:
Note that the same issue exists with query parameters etc. -- it's not just the path.
I think this hack will work for you:
URI test1 = new URI("http://foo.bar.baz/example%E2%82%AC123");
URI test2 = new URI(test1.getScheme(),
                    test1.getUserInfo(),
                    test1.getHost(),
                    test1.getPort(),
                    test1.getPath(),
                    test1.getQuery(),
                    test1.getFragment());
test2 = new URI(test2.toASCIIString());
assert test1.equals(test2);
System.out.println(test1);
System.out.println(test2);
I use an additional step using toASCIIString()
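The string-based route mentioned in the question is another option. Below is a minimal sketch that swaps the host while keeping every escape exactly as written. RawUriRebuilder and withHost are hypothetical names, and the sketch assumes a hierarchical URI with a scheme and a server-based authority. It reassembles the raw components and parses the result with the single-argument form, which does not re-escape '%'.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class RawUriRebuilder {

    // Rebuilds `original` with a different host from its raw (still-escaped) parts.
    static URI withHost(URI original, String newHost) {
        StringBuilder sb = new StringBuilder();
        sb.append(original.getScheme()).append("://");
        if (original.getRawUserInfo() != null) {
            sb.append(original.getRawUserInfo()).append('@');
        }
        sb.append(newHost);
        if (original.getPort() != -1) {
            sb.append(':').append(original.getPort());
        }
        if (original.getRawPath() != null) {
            sb.append(original.getRawPath());
        }
        if (original.getRawQuery() != null) {
            sb.append('?').append(original.getRawQuery());
        }
        if (original.getRawFragment() != null) {
            sb.append('#').append(original.getRawFragment());
        }
        // Parsing a string keeps existing escapes instead of quoting '%' again.
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI test1 = URI.create("http://foo.bar.baz/%E2%82%AC123?q=%41");
        System.out.println(withHost(test1, "example.org"));
    }
}
```

Unlike the toASCIIString() step, this also preserves escapes of plain ASCII characters such as %41.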
How can I encode URL query parameter values? I need to replace spaces with %20, accents, non-ASCII characters etc.
I tried to use URLEncoder but it also encodes / character and if I give a string encoded with URLEncoder to the URL constructor I get a MalformedURLException (no protocol).
URLEncoder has a very misleading name. According to the Javadocs, it is used to encode form parameters using the MIME type application/x-www-form-urlencoded.
That said, it can be used to encode e.g. query parameter values. For instance, if a parameter value looks like &/?#, its encoded equivalent can be used as:
String url = "http://host.com/?key=" + URLEncoder.encode("&/?#", "UTF-8");
Unless you have those special needs, the URL javadocs suggest using new URI(..).toURL(), which performs URI encoding according to RFC 2396.
The recommended way to manage the encoding and decoding of URLs is to use URI.
The following sample
new URI("http", "host.com", "/path/", "key=| ?/#ä", "fragment").toURL();
produces the result http://host.com/path/?key=%7C%20?/%23ä#fragment. Note how characters such as ?&/ are not encoded.
For further information, see the posts HTTP URL Address Encoding in Java or how to encode URL to avoid special characters in java.
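A small sketch of what the five-argument constructor does to a query string follows. QueryQuoting and build are names made up here, and host.com, /path/ and fragment are just the values from the sample above. The query is quoted as a whole, so characters that are legal in a query, including '&' and '=', pass through untouched.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class QueryQuoting {

    // Builds a URI from unescaped parts; only illegal characters in the query are quoted.
    static URI build(String query) {
        try {
            return new URI("http", "host.com", "/path/", query, "fragment");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("key=| ?/#\u00e4");
        System.out.println(uri);                 // non-ASCII characters kept
        System.out.println(uri.toASCIIString()); // non-ASCII characters escaped as UTF-8
        System.out.println(build("a=1&b=2"));    // '&' and '=' are not quoted
    }
}
```

This is why a value that itself contains '&' still needs URLEncoder (or similar) before it is placed in the query.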
EDIT
Since your input is a string URL, using one of the parameterized constructors of URI will not help you. Neither can you use new URI(strUrl) directly, since it doesn't quote URL parameters.
So at this stage we must use a trick to get what you want:
public URL parseUrl(String s) throws Exception {
    URL u = new URL(s);
    return new URI(
            u.getProtocol(),
            u.getAuthority(),
            u.getPath(),
            u.getQuery(),
            u.getRef()).
            toURL();
}
Before you can use this routine you have to sanitize your string to ensure it represents an absolute URL. I see two approaches to this:
Guessing. Prepend http:// to the string unless it's already present.
Construct the URL from a context using new URL(URL context, String spec)
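The guessing approach could be sketched like this. UrlNormalizer and normalize are hypothetical names, and the scheme check is a deliberately simple regular expression. Note that the URL constructors are deprecated in recent JDKs (Java 20 and later), so expect a compiler warning there.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, for illustration only.
class UrlNormalizer {

    // Prepends http:// when no scheme is present, then lets URI quote the parts.
    static String normalize(String s) {
        boolean hasScheme = s.matches("^[a-zA-Z][a-zA-Z0-9+.-]*://.*");
        String absolute = hasScheme ? s : "http://" + s;
        try {
            URL u = new URL(absolute); // lenient parser: accepts unescaped spaces
            return new URI(u.getProtocol(), u.getAuthority(), u.getPath(),
                           u.getQuery(), u.getRef()).toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot normalize: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("host.com/a b?q=x y"));
    }
}
```

Input that is already escaped would be escaped a second time ('%' becomes %25), so this only suits raw, unescaped URLs.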
So what you're saying is that you want to encode part of your URL but not the whole thing. Sounds to me like you'll have to break it up into parts, pass the ones that you want encoded through the encoder, and re-assemble it to get your whole URL.