My product is a web application.
I have files that I upload and download later on, to/from my server.
I am using java.net.URLDecoder.decode() when uploading files with unicode characters and java.net.URLDecoder.encode() when downloading files in order to save the file name and finally return it to the client as expected with no question marks and stuff (?????) .
The problem is that if the file name consists spaces then the encode/decode replace them with + character which is perfectly normal because that's their business implementation, but clearly as you can understand it does not fit to my purpose.
The question is what alternative do I have to overcome this situation?
Is there build-in method for that or 3rd party package?
You could also convert a space to %20.
See: URL encoding the space character: + or %20?
There are also various other Java libraries that do URL encoding, with %20. Here are a two examples:
Guava:
UrlEscapers.urlPathSegmentEscaper().escape(urlToEscape);
Spring Framework:
UriUtils.encodePath(urlToEscape, Charsets.UTF_8.toString());
You don't tell where this filename is used. The characters to encode will be different whether, for instance, it is in a URI query string or fragment part.
You probably want to have a look at Guava's (15.0+) Escapers; and, in particular here, UnicodeEscaper implementations and its derived class PercentEscaper. Guava already provides a few of them usable in various parts of URLs.
EDIT: here is how to do with Guava:
public final class FilenameEscaper
extends PercentEscaper
{
public PercentEscaper()
{
super("", false);
}
}
Done! See here. Of course, you may want to declare that some more characters than the default ones are safe.
Also have a look at RFC 5987 to make a better encoder.
This worked for me:
URLEncoder.encode(someString, "UTF-8").replace("+", "%20");
I found the cure!
I was just needed to use java.net.URI for that:
public static String encode(String urlString) throws UnsupportedEncodingException
{
try
{
URI uri = new URI(urlString);
return uri.toASCIIString();
}
catch (URISyntaxException e)
{
e.printStackTrace();
}
}
The toASCIIString() escapes the special characters so when the string arrives to the browser it is shown correctly.
Had the same problem with spaces. Combination of URL and URI solved it:
URL url = new URL("file:/E:/Program Files/IBM/SDP/runtimes/base");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
* Please note that URLEncoder is used for web forms application/x-www-form-urlencoded mime-type - not http network addresses.
* Source: https://stackoverflow.com/a/749829/435605
Related
Now I know about FilenameUtils.getExtension() from apache.
But in my case I'm processing extensions from http(s) urls, so in case I have something like
https://your_url/logo.svg?position=5
this method is gonna return svg?position=5
Is there the best way to handle this situation? I mean without writing this logic by myself.
You can use the URL library from JAVA. It has a lot of utility in this cases. You should do something like this:
String url = "https://your_url/logo.svg?position=5";
URL fileIneed = new URL(url);
Then, you have a lot of getter methods for the "fileIneed" variable. In your case the "getPath()" will retrieve this:
fileIneed.getPath() ---> "/logo.svg"
And then use the Apache library that you are using, and you will have the "svg" String.
FilenameUtils.getExtension(fileIneed.getPath()) ---> "svg"
JAVA URL library docs >>>
https://docs.oracle.com/javase/7/docs/api/java/net/URL.html
If you want a brandname® solution, then consider using the Apache method after stripping off the query string, if it exists:
String url = "https://your_url/logo.svg?position=5";
url = url.replaceAll("\\?.*$", "");
String ext = FilenameUtils.getExtension(url);
System.out.println(ext);
If you want a one-liner which does not even require an external library, then consider this option using String#replaceAll:
String url = "https://your_url/logo.svg?position=5";
String ext = url.replaceAll(".*/[^.]+\\.([^?]+)\\??.*", "$1");
System.out.println(ext);
svg
Here is an explanation of the regex pattern used above:
.*/ match everything up to, and including, the LAST path separator
[^.]+ then match any number of non dots, i.e. match the filename
\. match a dot
([^?]+) match AND capture any non ? character, which is the extension
\??.* match an optional ? followed by the rest of the query string, if present
I am trying to construct a URL in Java, with the query parameters encoded.The encoder should escape the character '+'. All the methods that I've come across encode it. Is there any way I can accomplish this or do I need to write a custom encoder?
Thanks!
Try "UriComponentsBuilder" if your using Spring.This package also contains other options that may fullfill your requirement.
UriComponentsBuilder.fromPath("URL String");
If your not using Spring can use "UrlValidator" given by "apache.commons"
String[] schemes = {"http","https"}; // DEFAULT schemes = "http", "https", "ftp"
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("urlString")) {
System.out.println("URL is valid");
} else {
System.out.println("URL is invalid");
}
The URI class should be able to do this for you. Specifically, use
URI(String scheme, String userInfo, String host,
int port, String path, String query, String fragment)
or
URI(String scheme, String authority, String path, String query,
String fragment)
to create the URI object, where the components that you provided are not encoded. Then use URI.toString() or URI.toASCIIString() to produce the properly encoded URL.
The encoder should escape the character '+'. All the methods that I've come across encode it.
Encoding the '+' as %2B% is the correct behavior. The URL / URI specifications do not support escaping. Any unencoded '+' in a URL will decode as a space character, no matter how you try to escape it. That's what the spec says.
I need to be testing my server for several URLs daily since these URLs are updated by my users - and this will be dine in Java. However, these URLs contains strange characters (like the german umlaut). Basicly what I am doing is:
for every URL in the list to check
URL u = new URL(the_url);
u.openConnection(..);
// read the content and handle it
Now, what Ive found is that org.apache.commons.codec.net.URLCodec is fine for encoding string to paste into the QueryString, it is not as suitable to encode strange URLs into their hex counterparts. Here are some examples of URLs:
http:// www.example com/u/überraum-03/
http:// www.example com/u/são-paulo-dude/
http:// www.example com/u/håkon-hellström/
The desired result for the first would be;
http:// www.example com/u/%c3%9berraum-03/
Are there any library in the Apache Commons or java itself, to convert special character in the ACTUAL url (not querystring - and therefore not replace the same kind of characters) ?
Thank you for your time.
Edited
Firefox translates "yr.no/place/Norway/Nordland/Moskenes/Å/data.html"; into "yr.no/place/Norway/Nordland/Moskenes/%C3%85/data.html" (try this by entering the first URL, press enter, then copy the url into a document). It is this effect that I am looking for - since this is the actual translation. What is most likely happening is either FF knows Å is a bad thing, it tries multiple versions or it accepts the servers "Location" header; either way - there is a tranformation from "Å" to "%C3%85" on only a subset of the URL. This is the function we need.
Edited
I just verified that the code given by commentor does not work sadly. As an example, try this:
try{
String urlStr = "http://www.yr.no/place/Norway/Nordland/Moskenes/Å/data.html";
URL u=new URL(urlStr);
URI uri = new URI(u.getProtocol(),
u.getUserInfo(), u.getHost(), u.getPort(),
u.getPath(), u.getQuery(),
null); // removing ref
URL urlObj = uri.toURL();
HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
connection.setInstanceFollowRedirects(false);
connection.connect();
for (int i=0;i<connection.getHeaderFields().size();i++)
System.out.println(connection.getHeaderFieldKey(i)+": "+connection.getHeaderField(i));
System.exit(0);
}catch(Exception e){e.printStackTrace();};
Will yield a 404 error - strangely enough the encoded part does also not work.
If you need a URL that is a valid URI (RFC 2396 compliant) you can create one like this in Java
String urlString = "http://www.example.com/u/håkon-hellström/";
URL url = new URL(urlString);
URI uri = new URI(url.getProtocol(),url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());
url = new URL(uri.toASCIIString());
That being said all three sample strings you provided are RFC 2396 compliant and do not need to be encoded. I am assuming the spaces in the authority part of the URLs you provided are typos.
EDIT:
I updated the code block above. By using URI.toASCIIString() you can limit the resulting URI to only US-ASCII characters (other characters are encoded). The resulting string can then be used to create a new, valid URL.
http://www.example.com/u/håkon-hellström/
changes to
http://www.example.com/u/h%C3%A5kon-hellstr%C3%B6m/
This question already has answers here:
How to do URL decoding in Java?
(11 answers)
Closed 8 years ago.
I see that java.net.URLDecoder.decode(String) is deprecated in 6.
I have the following String:
String url ="http://172.20.4.60/jsfweb/cat/%D7%9C%D7%97%D7%9E%D7%99%D7%9D_%D7%A8%D7%92%D7%99%D7%9C%D7%99%D7%9"
How should I decode it in Java 6?
You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data).
Now you need to specify the character encoding of your string. Based off the information on the URLDecoder page:
Note: The World Wide Web Consortium
Recommendation states that UTF-8
should be used. Not doing so may
introduce incompatibilites.
The following should work for you:
java.net.URLDecoder.decode(url, "UTF-8");
Please see Draemon's answer below.
As the documentation mentions, decode(String) is deprecated because it always uses the platform default encoding, which is often wrong.
Use the two-argument version instead. You will need to specify the encoding used n the escaped parts.
Only the decode(String) method is deprecated. You should use the decode(String, String) method to explicitly set a character encoding for decoding.
As noted by previous posters, you should use java.net.URI class to do it:
System.out.println(String.format("Decoded URI: '%s'", new URI(url).getPath()));
What I want to note additionally is that if you have a path fragment of a URI and want to decode it separately, the same approach with one-argument constructor works, but if you try to use four-argument constructor it does not:
String fileName = "Map%20of%20All%20projects.pdf";
URI uri = new URI(null, null, fileName, null);
System.out.println(String.format("Not decoded URI *WTF?!?*: '%s'", uri.getPath()));
This was tested in Oracle JDK 7. The fact that this does not work is counter-intuitive, runs contrary to JavaDocs and it should be probably considered a bug.
It could trip people who are trying to use an approach symmetrical to encoding. As noted for example in this post: "how to encode URL to avoid special characters in java", in order to encode URI, it's a good idea to construct a URI by passing different URI parts separately since different encoding rules apply to different parts:
String fileName2 = "Map of All projects.pdf";
URI uri2 = new URI(null, null, fileName2, null);
System.out.println(String.format("Encoded URI: '%s'", uri2.toASCIIString()));
How can I encode URL query parameter values? I need to replace spaces with %20, accents, non-ASCII characters etc.
I tried to use URLEncoder but it also encodes / character and if I give a string encoded with URLEncoder to the URL constructor I get a MalformedURLException (no protocol).
URLEncoder has a very misleading name. It is according to the Javadocs used encode form parameters using MIME type application/x-www-form-urlencoded.
With this said it can be used to encode e.g., query parameters. For instance if a parameter looks like &/?# its encoded equivalent can be used as:
String url = "http://host.com/?key=" + URLEncoder.encode("&/?#");
Unless you have those special needs the URL javadocs suggests using new URI(..).toURL which performs URI encoding according to RFC2396.
The recommended way to manage the encoding and decoding of URLs is to use URI
The following sample
new URI("http", "host.com", "/path/", "key=| ?/#ä", "fragment").toURL();
produces the result http://host.com/path/?key=%7C%20?/%23ä#fragment. Note how characters such as ?&/ are not encoded.
For further information, see the posts HTTP URL Address Encoding in Java or how to encode URL to avoid special characters in java.
EDIT
Since your input is a string URL, using one of the parameterized constructor of URI will not help you. Neither can you use new URI(strUrl) directly since it doesn't quote URL parameters.
So at this stage we must use a trick to get what you want:
public URL parseUrl(String s) throws Exception {
URL u = new URL(s);
return new URI(
u.getProtocol(),
u.getAuthority(),
u.getPath(),
u.getQuery(),
u.getRef()).
toURL();
}
Before you can use this routine you have to sanitize your string to ensure it represents an absolute URL. I see two approaches to this:
Guessing. Prepend http:// to the string unless it's already present.
Construct the URI from a context using new URL(URL context, String spec)
So what you're saying is that you want to encode part of your URL but not the whole thing. Sounds to me like you'll have to break it up into parts, pass the ones that you want encoded through the encoder, and re-assemble it to get your whole URL.