Java URL encoding: URLEncoder vs. URI


The W3Schools URL encoding page says that @ should be encoded as %40 and that a space should be encoded as %20.
I've tried both URLEncoder and URI, but neither does the above properly:
import java.net.URI;
import java.net.URLEncoder;

public class Test {
    public static void main(String[] args) throws Exception {
        // Prints me%40home.com (CORRECT)
        System.out.println(URLEncoder.encode("me@home.com", "UTF-8"));
        // Prints Email+Address (WRONG: should be Email%20Address)
        System.out.println(URLEncoder.encode("Email Address", "UTF-8"));
        // Prints http://www.home.com/test?Email%20Address=me@home.com
        // (WRONG: it has not encoded the @ in the email address)
        URI uri = new URI("http", "www.home.com", "/test", "Email Address=me@home.com", null);
        System.out.println(uri.toString());
    }
}
For some reason, URLEncoder handles the email address correctly but not spaces, and URI handles spaces correctly but not email addresses.
How should I encode these two parameters to be consistent with what W3Schools says is correct? Or is W3Schools wrong?

Although I think the answer from @fge is the right one, I was using a third-party web service that relied on the encoding outlined in the W3Schools article, so I followed the answer to "Java equivalent to JavaScript's encodeURIComponent that produces identical output?":
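import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Emulates JavaScript's encodeURIComponent: form-encode first, then undo the
// differences (spaces as %20; ! ' ( ) ~ left unescaped)
public static String encodeURIComponent(String s) {
    String result;
    try {
        result = URLEncoder.encode(s, "UTF-8")
                .replaceAll("\\+", "%20")
                .replaceAll("\\%21", "!")
                .replaceAll("\\%27", "'")
                .replaceAll("\\%28", "(")
                .replaceAll("\\%29", ")")
                .replaceAll("\\%7E", "~");
    } catch (UnsupportedEncodingException e) {
        // Not expected in practice: every Java platform must support UTF-8
        result = s;
    }
    return result;
}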
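On Java 10 or later, URLEncoder also has an encode(String, Charset) overload that throws no checked exception, so the same idea can be written without the try/catch. A minimal sketch (the class name is just for illustration); it uses String.replace, which matches literally, instead of replaceAll:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UriComponentEncoder {
    // Java 10+ only: the Charset overload declares no checked exception.
    static String encodeURIComponent(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")   // a '+' in the output only ever stands for a space
                .replace("%21", "!")
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        System.out.println(encodeURIComponent("Email Address")); // Email%20Address
        System.out.println(encodeURIComponent("me@home.com"));   // me%40home.com
    }
}
```

A literal + in the input is encoded as %2B by URLEncoder before the replacements run, so it is never mistaken for a space.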

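URI syntax is defined by RFC 3986 (the permissible content of a query string is defined in section 3.4). Java's URI class actually implements the older RFC 2396 (as amended by RFC 2732), with a few deviations described in its Javadoc, but both RFCs allow an unescaped @ in a query string.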
You will notice that the pchar grammar rule is defined by:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
Which means an @ is legal in a query string.
Trust URI. It will do the correct, "legal" stuff.
Finally, if you have a look at the Javadoc of URLEncoder, you see that it states:
This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.
Which is not the same thing as a query string as defined by the URI specification.
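A small sketch contrasting the two encoders on the inputs from the question; the class and method names are illustrative. URLEncoder produces form encoding (space as +, @ as %40), while the multi-argument URI constructor quotes only characters that are not legal in a query, so the space becomes %20 and the @ stays literal:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class EncodingComparison {
    // application/x-www-form-urlencoded: space -> '+', '@' -> "%40"
    static String formEncode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Query quoting by the URI constructor: space -> "%20",
    // while '@' and '=' are legal in a query and are left as they are
    static String uriWithQuery(String query) throws URISyntaxException {
        return new URI("http", "www.home.com", "/test", query, null).toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formEncode("Email Address")); // Email+Address
        System.out.println(formEncode("me@home.com"));   // me%40home.com
        System.out.println(uriWithQuery("Email Address=me@home.com"));
        // http://www.home.com/test?Email%20Address=me@home.com
    }
}
```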

Related

URL change encoding + instead of %20

URL encoding normally replaces a space with a plus (+) sign or with %20.
In Spring MVC it is replaced with %20. My controller is:
@GetMapping(path = "/post/{id}/{title}")
public String postView(@PathVariable("id") Long id, @PathVariable("title") String title, Model model) {
    Post post = postService.findById(id);
    model.addAttribute("post", post);
    return "singlePost";
}
I need to replace the %20 with a + or a -.

You can use the decode method of the URLDecoder class. For example, if title contains URL-encoded values:
String urlDecodedTitle = URLDecoder.decode(title, StandardCharsets.UTF_8.toString());
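A minimal sketch of what URLDecoder does with both spellings of a space (the class and method names are illustrative). Because it decodes form encoding, it turns both %20 and + into a space; one caveat is that if title has already been decoded once, decoding it again turns any literal + into a space:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class TitleDecoding {
    // URLDecoder decodes application/x-www-form-urlencoded text,
    // so "%20" and "+" both become a space.
    static String decodeTitle(String raw) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(decodeTitle("My%20title")); // My title
        System.out.println(decodeTitle("My+title"));   // My title
        // Double decoding pitfall: an already decoded "C++" loses its plus signs
        System.out.println("[" + decodeTitle("C++") + "]"); // [C  ]
    }
}
```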

In the path of a URL, spaces are encoded as %20 ([RFC3986][1]), while URL query parameters follow application/x-www-form-urlencoded, which encodes spaces as +.
If you need to encode a query string parameter, you can use java.net.URLEncoder.
But as you are using @PathVariable, your parameters are part of the path, so they must be encoded with spaces as %20. Spring provides UriUtils.encodePath for this task (and UriUtils.encodePathSegment for a single path segment).
For example, to build a path for your /post/{id}/{title} mapping:
Long id = 1L;
String title = "My title";
String path = "/post/" + id + "/" + UriUtils.encodePathSegment(title, "UTF-8");
In your postView method you don't need to do any decoding, as Spring already does it.
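If Spring's UriUtils is not on the classpath, a JDK-only approximation is to form-encode the segment and then write spaces as %20. This is a sketch with hypothetical helper names, not Spring's implementation: URLEncoder also escapes some characters that are legal in a path segment (for example ~ and !), so its output can differ from UriUtils.encodePathSegment while still being valid.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PostPaths {
    // Hypothetical JDK-only stand-in for a path-segment encoder:
    // form-encode, then write spaces as %20 instead of '+'.
    // URLEncoder also escapes '/', '#' and '&', so the title stays one segment.
    static String encodePathSegment(String segment) throws UnsupportedEncodingException {
        return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
    }

    // Builds a path for the /post/{id}/{title} mapping from the question
    static String postPath(long id, String title) throws UnsupportedEncodingException {
        return "/post/" + id + "/" + encodePathSegment(title);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(postPath(1L, "My title"));    // /post/1/My%20title
        System.out.println(postPath(2L, "C# & Java"));   // /post/2/C%23%20%26%20Java
    }
}
```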
[1]: https://www.rfc-editor.org/rfc/rfc3986

URLEncoder - what character set to use for empty space instead of %20 or +

I am trying to open new email from my Java app:
String str=String.valueOf(email);
String body="This is body";
String subject="Hello worlds";
String newStr="mailto:"+str.trim()+"?subject="+URLEncoder.encode(subject,"UTF-8")+"&body="+URLEncoder.encode(body, "UTF-8")+"";
Desktop.getDesktop().mail(new URI(newStr));
Here is my URL encoding. As I cannot use the body or subject string in the URL without encoding them, my output has "+" instead of whitespace, which is normal, I understand that. Is there a way to display the subject and body normally in my message? I tried .replace("+", " "), but it does not work, as it gives an error.
I think there might be a different character set, but I am not sure.

That's the way URLEncoder works.
One possible approach would be to replace all + with %20 after URLEncoder.encode(...).
Or you could rely on the URI constructor to encode your parameters correctly:
String scheme = "mailto";
String recipient = "recipient@snakeoil.com";
String subject = "The Meaning of Life";
String content = "..., the universe and all the rest is 42.\n Rly? Just kidding. Special characters: äöü";
String path = "";
String query = "subject=" + subject + "&body=" + content;
Desktop.getDesktop().mail(new URI(scheme, recipient, path, query, null));
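Both solutions have caveats:
In the first approach, the replacement must happen after encoding: URLEncoder turns a literal + into %2B, so any + left in its output stands for a space. With the second, a & inside the subject or body is not escaped, so it splits the query into extra parameters.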
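A minimal sketch of the first approach, assuming the recipient address needs no encoding; the class and method names are illustrative. The + to %20 replacement runs on URLEncoder's output, so a literal + in the body survives as %2B and an & in the subject is escaped as %26:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class MailtoBuilder {
    // Form-encode, then replace '+' with %20. Safe because URLEncoder
    // encodes a literal '+' as %2B, so any '+' left over was a space.
    static String encodeComponent(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }

    static URI mailto(String recipient, String subject, String body)
            throws UnsupportedEncodingException, URISyntaxException {
        return new URI("mailto:" + recipient.trim()
                + "?subject=" + encodeComponent(subject)
                + "&body=" + encodeComponent(body));
    }

    public static void main(String[] args) throws Exception {
        URI uri = mailto("recipient@snakeoil.com", "Tom & Jerry", "1+1 = 2");
        System.out.println(uri);
        // mailto:recipient@snakeoil.com?subject=Tom%20%26%20Jerry&body=1%2B1%20%3D%202
    }
}
```

The resulting URI can then be passed to Desktop.getDesktop().mail(uri) as in the question.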

How to pass caret symbol in URL?

I need to pass ^ like a value of parameter in URL. For example:
http://localhost:8080/myapp/books?filter=^
But I get an error: java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986. I've read that I need to encode it. I have something like this, but it still doesn't work. I also tried adding
System.setProperty("tomcat.util.http.parser.HttpParser.requestTargetAllow", "^");
but for ^ it doesn't help.
I have a controller:
@RequestMapping("/books")
public String getBooks(@RequestParam(value = "filter") String filter, Model model)
        throws UnsupportedEncodingException {
    String par = URLEncoder.encode(filter, "UTF-8");
    List<Books> books = (List<Books>) booksService.findAll(filter);
    model.addAttribute("books", books);
    return "getBooks";
}

Try encoding the URI before sending the request to your REST API.
For instance, when you're using JS, read this:
https://www.w3schools.com/jsref/jsref_encodeURI.asp
In Java: Java URL encoding: URLEncoder vs. URI
Good luck!

Try to follow this, it will help:
https://secure.n-able.com/webhelp/NC_9-1-0_SO_en/Content/SA_docs/API_Level_Integration/API_Integration_URLEncoding.html
@Mark’s comment is also correct.
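On the client side, encoding the value before sending the request turns ^ into %5E, which is a valid percent-escape in a request target. A sketch that reuses the URL from the question (the helper name is hypothetical); Spring should then hand the decoded ^ to the @RequestParam, so the controller itself should not need URLEncoder:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CaretFilter {
    // Encode the parameter value on the client before the request is sent,
    // so the server never sees a raw '^' in the request target.
    static String booksUrl(String filter) throws UnsupportedEncodingException {
        return "http://localhost:8080/myapp/books?filter=" + URLEncoder.encode(filter, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(booksUrl("^")); // http://localhost:8080/myapp/books?filter=%5E
    }
}
```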

Java URL Class getPath(), getQuery() and getFile() inconsistent with RFC3986 URI Syntax

I am writing a utility class that semi-wraps Java's URL class, and I have written a bunch of test cases to verify the methods I have wrapped with a customized implementation. I don't understand the output of some of Java's getters for certain URL strings.
According to the RFC 3986 specification, a path component is defined as follows:
The path is terminated by the first question mark ("?") or number sign
("#") character, or by the end of the URI.
A query component is defined as follows:
The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
I have a couple test cases which are treated by Java as valid URLs, but getters for path, file and query don't return the values I had expected:
URL url = new URL("https://www.somesite.com/?param1=val1");
System.out.print(url.getPath());
System.out.println(url.getFile());
System.out.println(url.getQuery());
The above results in the following output:
//?param1=val1
param1=val1
<empty string>
My other test case:
URL url = new URL("https://www.somesite.com?param1=val1");
System.out.print(url.getPath());
System.out.println(url.getFile());
System.out.println(url.getQuery());
The above results in the following output:
?param1=val1
param1=val1
<empty string>
According to the documentation for Java URL:
public String getFile()
Gets the file name of this URL. The returned file portion will be the
same as getPath(), plus the concatenation of the value of getQuery(), if
any. If there is no query portion, this method and getPath() will return
identical results.
Returns:
the file name of this URL, or an empty string if one does not exist
So my test cases result in an empty string when getQuery() is invoked, in which case I would expect getFile() to return the same value as getPath(). This is not the case.
I had expected the following output for both test cases:
<empty string>
?param1=val1
param1=val1
Maybe my interpretation of RFC 3986 is not correct, but the output I see does not line up with the documentation for the URL class either. Can anyone explain what I am seeing?

Here is some executable code based on your fragments:
import java.net.MalformedURLException;
import java.net.URL;
public class URLExample {
public static void main(String[] args) throws MalformedURLException {
printURLInformation(new URL("https://www.somesite.com/?param1=val1"));
printURLInformation(new URL("https://www.somesite.com?param1=val1"));
}
private static void printURLInformation(URL url) {
System.out.println(url);
System.out.println("Path:\t" + url.getPath());
System.out.println("File:\t" + url.getFile());
System.out.println("Query:\t" + url.getQuery() + "\n");
}
}
It works fine; here is the output, which is what you expected. The only difference is that you used System.out.print (not println) for the path, so the path and the file were printed on the same line.
https://www.somesite.com/?param1=val1
Path: /
File: /?param1=val1
Query: param1=val1
https://www.somesite.com?param1=val1
Path:
File: ?param1=val1
Query: param1=val1
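For comparison, java.net.URI has no getFile(); its getPath() and getQuery() correspond to the RFC 3986 components quoted in the question. A sketch (describe is a hypothetical helper; the brackets make an empty path visible):

```java
import java.net.URI;

class UriGetters {
    // Reports the path and query components as URI parses them
    static String describe(String s) {
        URI uri = URI.create(s);
        return "Path: [" + uri.getPath() + "] Query: [" + uri.getQuery() + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe("https://www.somesite.com/?param1=val1"));
        // Path: [/] Query: [param1=val1]
        System.out.println(describe("https://www.somesite.com?param1=val1"));
        // Path: [] Query: [param1=val1]
    }
}
```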

Get last part of url using a regex

How do I get the last part of a URL using a regex? Here is my URL; I want the segment between the last forward slash and the #:
http://mycompany.com/test/id/1234#this
So I only want to get 1234.
I have the following, but it is not removing the '#this':
".*/(.*)(#|$)",
I need this while indexing data so don't want to use the URL class.

Just use URI:
final URI uri = URI.create(yourInput);
final String path = uri.getPath();
path.substring(path.lastIndexOf('/') + 1); // will return what you want
Will also take care of URIs with query strings etc. In any event, when having to extract any part from a URL (which is a URI), using a regex is not what you want: URI can handle it all for you, at a much lower cost -- since it has a dedicated parser.
Demo code using, in addition, Guava's Optional to detect the case where the URI has no path component:
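import com.google.common.base.Optional;
import java.net.URI;

public class LastSegment {
    public static void main(final String... args) {
        final String url = "http://mycompany.com/test/id/1234#this";
        final URI uri = URI.create(url);
        // getPath() returns null for opaque URIs (e.g. "mailto:..."), hence the fallback
        final String path = Optional.fromNullable(uri.getPath()).or("/");
        System.out.println(path.substring(path.lastIndexOf('/') + 1)); // 1234
    }
}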
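The same approach without Guava, using java.util.Optional (Java 8+); the class and method names are illustrative:

```java
import java.net.URI;
import java.util.Optional;

class LastPathSegment {
    // The fragment (#this) and the query (?x=1) are not part of getPath(),
    // so only the path is searched for the last '/'.
    // getPath() is null for opaque URIs such as "mailto:...", hence the fallback.
    static String lastSegment(String url) {
        String path = Optional.ofNullable(URI.create(url).getPath()).orElse("/");
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("http://mycompany.com/test/id/1234#this"));     // 1234
        System.out.println(lastSegment("http://mycompany.com/test/id/1234?x=1#this")); // 1234
    }
}
```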

How about:
".*/([^/#]*)(#.*|$)"

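In addition to @jtahlborn's answer, to also handle a query string (inside [...] a | is a literal character, so it does not belong there, and the segment can end at either ? or #):
".*/([^/#?]*)([?#].*|$)"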
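A regex alone stays fragile: with a greedy .*/ prefix, a / inside the query string (for example ?next=/home) is taken as the last path separator. One way around that, sketched here as an alternative rather than as either answer's method, is to stop the prefix at the first ? or #; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegmentRegex {
    // [^?#]* cannot run past the first '?' or '#', so the '/' it backtracks to
    // is the last one in the path; group 1 is the segment after it.
    private static final Pattern LAST_SEGMENT = Pattern.compile("^[^?#]*/([^/?#]*)");

    static String lastSegment(String url) {
        Matcher m = LAST_SEGMENT.matcher(url);
        return m.lookingAt() ? m.group(1) : null; // null when there is no '/'
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("http://mycompany.com/test/id/1234#this"));       // 1234
        System.out.println(lastSegment("http://mycompany.com/test/id/1234?next=/home")); // 1234
    }
}
```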
