java.net.URI and percent in query parameter value - java

System.out.println(
new URI("http", "example.com", "/servlet", "a=x%20y", null));
The result is http://example.com/servlet?a=x%2520y, where the query parameter value differs from the supplied one. Strange, but this does follow the Javadoc:
"The percent character ('%') is always quoted by these constructors."
We can pass the decoded string, a=x y and then we get a reasonable(?) result a=x%20y.
But what if the query parameter value contains an "&" character? This happens, for example, if the value is itself a URL with query parameters. Look at this (wrong) query string:
a=b&c. The ampersand must be escaped here (a=b%26c), otherwise this can be read as a query parameter a=b followed by some garbage (c). If I pass the escaped form to a URI constructor, it encodes it again and returns a wrong URL: ...?a=b%2526c
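To make the behaviour described above easy to reproduce, here is a minimal sketch that runs the four query strings from this question through the five-argument constructor. The class name UriQuotingDemo and the helper build are made up for illustration; only java.net.URI itself comes from the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriQuotingDemo {
    // Builds a URI with the five-argument constructor and returns its string form.
    static String build(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("a=x%20y")); // http://example.com/servlet?a=x%2520y
        System.out.println(build("a=x y"));   // http://example.com/servlet?a=x%20y
        System.out.println(build("a=b%26c")); // http://example.com/servlet?a=b%2526c
        System.out.println(build("a=b&c"));   // http://example.com/servlet?a=b&c
    }
}
```

The percent sign is quoted every time, while "&" and "=" are legal query characters and pass through untouched, so there is no input that makes this constructor emit a=b%26c.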
This issue seems to render java.net.URI useless. Am I missing something here?
Summary of answers
java.net.URI does know about the existence of the query part of a URI, but it does not understand the internals of the query part, which can differ for each scheme. For example, java.net.URI does not understand the internal structure of the HTTP query part. This would not be a problem if java.net.URI treated the query as an opaque string and did not alter it. But it tries to apply a generic percent-encoding algorithm, which breaks HTTP URLs.
Therefore I cannot use the URI class to reliably assemble a URL from its parts, even though it has constructors for that purpose. I would also mention that, as of Java 7, the implementation of the relativize operation is quite limited: it only works if one URL is a prefix of the other. These two features (and the class's leaner interface for these purposes) were the reason I was interested in java.net.URI, but neither of them works for me.
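The relativize limitation can be seen with two small calls. This is only a sketch; the class name RelativizeDemo and the example URLs are invented for illustration.

```java
import java.net.URI;

final class RelativizeDemo {
    // Relativizes target against base and returns the result as a string.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // The base path is a prefix of the target path: a relative reference comes back.
        System.out.println(relativize("http://a/b/", "http://a/b/c/d")); // c/d
        // Sibling paths: no "../d" is produced, the target is returned unchanged.
        System.out.println(relativize("http://a/b/c", "http://a/b/d"));  // http://a/b/d
    }
}
```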
In the end I used java.net.URL for parsing, and wrote code to assemble a URL from parts and to relativize two URLs. I also checked the Apache HttpClient URIBuilder class. Although it does understand the internals of an HTTP query string, as of 4.3 it has the same encoding problem as java.net.URI when dealing with the query part as a whole.
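The assembling code itself is not shown here, so the following is only a guess at one possible shape, not the code that was actually written. It assumes form-style encoding of each name and value with URLEncoder and then hands the finished string to the single-argument URI factory, which does not re-quote existing escapes. The names HttpUrlAssembler, encode and withQuery are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

final class HttpUrlAssembler {
    // Form-encodes one name or value: "&" becomes "%26", a space becomes "+".
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // nameValuePairs is name1, value1, name2, value2, ...
    static URI withQuery(String baseWithoutQuery, String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringBuilder url = new StringBuilder(baseWithoutQuery);
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            url.append(i == 0 ? '?' : '&')
               .append(encode(nameValuePairs[i]))
               .append('=')
               .append(encode(nameValuePairs[i + 1]));
        }
        // The single-argument form parses the string and keeps "%26" as it is.
        return URI.create(url.toString());
    }

    public static void main(String[] args) {
        System.out.println(withQuery("http://example.com/servlet", "a", "b&c"));
        // http://example.com/servlet?a=b%26c
    }
}
```

Encoding each value separately is what lets a value that is itself a URL with query parameters survive as one parameter.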

The query string
a=b&c
is not wrong in a URI. The RFC on URI Generic Syntax states
The query component is a string of information to be interpreted by
the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "#",
"&", "=", "+", ",", and "$" are reserved.
The character & in the query string is very much valid (uric represents reserved, mark, and alphanumeric characters). The RFC also states
Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.
Because the & is valid but reserved, it is up to the user to determine if it is meant to be encoded or not.
What you call a query parameter is not a feature of a URI and therefore the URI class has no reason to (and shouldn't) support it.
Related:
Which characters make a URL invalid?

The only workaround I found was to use the single-argument constructors and methods. Note that you must use URI#getRawQuery() to avoid decoding %26. For example:
URI uri = new URI("http://a/?b=c%26d&e");
// uri.getRawQuery() equals "b=c%26d&e"
uri = new URI(new URI(uri.getScheme(), uri.getAuthority(),
uri.getPath(), null, null) + "?f=g%26h&i");
// uri.getRawQuery() equals "f=g%26h&i"
uri = uri.resolve("?j=k%26l&m");
// uri.getRawQuery() equals "j=k%26l&m"
// uri.toString() equals "http://a/?j=k%26l&m"
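To show why getRawQuery() matters here, this small sketch compares it with getQuery() on the first URI from the example above. The class and method names are made up.

```java
import java.net.URI;

final class RawQueryDemo {
    // getQuery() decodes escaped octets, which merges "%26" into the "&" separators.
    static String decodedQuery(String uri) {
        return URI.create(uri).getQuery();
    }

    // getRawQuery() returns the query exactly as it appears in the URI.
    static String rawQuery(String uri) {
        return URI.create(uri).getRawQuery();
    }

    public static void main(String[] args) {
        System.out.println(decodedQuery("http://a/?b=c%26d&e")); // b=c&d&e
        System.out.println(rawQuery("http://a/?b=c%26d&e"));     // b=c%26d&e
    }
}
```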

The only working solution known to me is reflection (see https://blog.stackhunter.com/2014/03/31/encode-special-characters-java-net-uri/)
URI uri = new URI("http", null, "example.com", -1, "/accounts", null, null);
Field field = URI.class.getDeclaredField("query");
field.setAccessible(true);
field.set(uri, encodedQueryString);
//clear cached string representation
field = URI.class.getDeclaredField("string");
field.setAccessible(true);
field.set(uri, null);

Use the URLEncoder.encode() method; in your case, for example:
URLEncoder.encode("a=x%20y", "ISO-8859-1");
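For reference, a short sketch of what URLEncoder.encode produces for the strings discussed in this thread. The class name is invented, and UTF-8 is used instead of ISO-8859-1 as an assumption; for these ASCII inputs the result is the same.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncoderDemo {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Encoding the whole pair also escapes "=" and the existing "%".
        System.out.println(encode("a=x%20y"));      // a%3Dx%2520y
        // Encoding only the decoded value keeps the "name=value" structure.
        System.out.println("a=" + encode("x y"));   // a=x+y
        System.out.println("a=" + encode("b&c"));   // a=b%26c
    }
}
```

So the method is probably best applied to each decoded name and value separately, not to an already assembled or already escaped query string.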

Related

Rest controller request multiple path variables with multiple query parameters

How can we create a Rest API (Spring controller) which allows multiple path variables to have query parameters?
Where
1) function is a path variable and id=functionname is query parameter
2) subfunction is a path variable and id=subfuntionname is query parameter
Request URL : /content/v1/clients/clientname/function?id=functionname&subfunction?id=subfunctionname
Update: I am using matrix variables suggested by
/content/v1/clients/clientname/function;id=functionname/subfunction;id=subfunctionname
The method shown below is not working as expected.
What should the method definition look like?
public HashMap<String, List<Model>> getContent(
    @PathVariable String clientname,
    @MatrixVariable(name="id", pathVar="function") List<String> capabilitiesId,
    @MatrixVariable(name="id", pathVar="subfunction") List<String> subcapabilitiesId) {
}
Error : Missing matrix variable 'id' for method parameter of type List
It's not possible.
In a REST controller you have two types of parameters:
Path parameter: a parameter used to select a resource (it maps to a method of your class).
Query parameter: a parameter used to send other information.
In your case I think it is a good idea to send all this information inside the payload, using the POST or PUT HTTP method.
If you can't use a payload, you can use the following solution:
Request URL : /content/v1/clients/clientname/function1/function2?id1=functionnamec&id2=subfunctionaname
In this way you can create your controller with 2 path parameters and 2 query parameters:
@GET
@Path("/basePath/{funct1}/{funct2}")
public Response <methodName>(@PathParam("funct1") String funct1, @PathParam("funct2") String funct2, @QueryParam("id1") String id1, @QueryParam("id2") String id2)
/content/v1/clients/clientname/function?id=functionnamec&subfunction?id=subfunctionaname
The parsing of URIs is defined by RFC 3986. In particular, U+003F QUESTION MARK is a reserved character, the first instance of which serves as the delimiter between the relative-part and the query.
So your example would parse as
path: /content/v1/clients/clientname/function
query: id=functionnamec&subfunction?id=subfunctionaname
And if we were to parse the query, as though it were an application/x-www-form-urlencoded value....
>>> import urllib.parse
>>> urllib.parse.parse_qs("id=functionnamec&subfunction?id=subfunctionaname")
{'id': ['functionnamec'], 'subfunction?id': ['subfunctionaname']}
We see that the second question mark becomes part of the parameter name.
In short, it's a perfectly valid URI, but it isn't likely to produce the results that you are hoping for.
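Since the question is about a Java controller, the same path/query split can also be checked with java.net.URI. This is only an illustrative sketch; the class and method names are made up.

```java
import java.net.URI;

final class QuerySplitDemo {
    // Returns { path, query } for a URI reference; the first "?" is the delimiter.
    static String[] pathAndQuery(String reference) {
        URI uri = URI.create(reference);
        return new String[] { uri.getPath(), uri.getQuery() };
    }

    public static void main(String[] args) {
        String[] parts = pathAndQuery(
            "/content/v1/clients/clientname/function?id=functionnamec&subfunction?id=subfunctionaname");
        System.out.println(parts[0]); // /content/v1/clients/clientname/function
        System.out.println(parts[1]); // id=functionnamec&subfunction?id=subfunctionaname
    }
}
```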
/content/v1/clients/clientname/function/subfunction?id=functionnamec&id=subfunctionaname
This might be usable, but there's likely to be some confusion about the duplicate id query parameters
>>> urllib.parse.parse_qs("id=functionnamec&id=subfunctionaname")
{'id': ['functionnamec', 'subfunctionaname']}
/content/v1/clients/clientname/function/subfunction?function.id=functionnamec&subfunction.id=subfunctionaname
>>> urllib.parse.parse_qs("function.id=functionnamec&subfunction.id=subfunctionaname")
{'function.id': ['functionnamec'], 'subfunction.id': ['subfunctionaname']}
That might be easier.
I think it would be common to take the data out of the query and put it on the path instead
/content/v1/clients/clientname/function/functionname/subfunction/subfunctionaname
And then extract the path parameters you need.

How to pass date(dd/MM/yyyy HH:mm) as a parameter in REST API

I am trying to write a rest api in which I am passing date as a URL parameter.
The date format is dd/MM/yyyy HH:mm.
REST API URL Is
public static final String GET_TestDate = "/stay/datecheck?dateCheckIn={dateCheckIn}";
and Rest Method is
@RequestMapping(value = HotelRestURIConstants.GET_TestDate, method = RequestMethod.GET)
public @ResponseBody String getDate(@PathVariable("dateCheckIn") @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) String dateCheckIn) {
    logger.info("passing date as a param");
    String str = "date" + dateCheckIn;
    return str;
}
But when I call this API using a REST client, I get a 404 error.
Here is REST URL
http://localhost:8089/stay/datecheck?dateCheckIn="28/01/2016 19:00"
Instead of a space, use %20; instead of a slash, use %2F; instead of a colon, use %3A. You then have to decode the value (transform %20 back to a space, %2F back to a slash, and so on) after you receive it. There is a URL encoding table here: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
The last hint: don't use quotes.
Try something like:
http://localhost:8089/stay/datecheck?dateCheckIn=28%2F01%2F2016%2019%3A00
Remember to decode it.
Something like: String result = java.net.URLDecoder.decode(url, "UTF-8");
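Putting the hints above together, a round trip could look roughly like the sketch below. The class and method names are invented, and parsing into a LocalDateTime with the dd/MM/yyyy HH:mm pattern is an assumption about what the value is used for afterwards.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DateParamDemo {
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm");

    // Client side: encode the raw value before appending it to the URL.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Server side: decode the value and parse it with the expected pattern.
    static LocalDateTime decodeAndParse(String encoded) {
        try {
            return LocalDateTime.parse(URLDecoder.decode(encoded, "UTF-8"), FORMAT);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("28/01/2016 19:00"));                 // 28%2F01%2F2016+19%3A00
        System.out.println(decodeAndParse("28%2F01%2F2016%2019%3A00")); // 2016-01-28T19:00
    }
}
```

URLEncoder writes the space as "+" and URLDecoder accepts both "+" and "%20", so either form decodes to the same value. Depending on the framework, the parameter may already arrive decoded, in which case only the parsing step is needed.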
The main problem is here: @PathVariable("dateCheckIn") @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) String dateCheckIn
dateCheckIn should not be a @PathVariable but a @RequestParam.
Let's see the difference:
http://localhost:8089/stay/{path_var}/datecheck?{query_param}=some_value
A path variable is part of the path; it must be there for the path to map correctly to your method. In the actual call, you never specify a name for the variable. A query parameter (or request parameter) is a parameter that occurs after the "?" that follows the path. There you always write the name of the parameter followed by the "=" sign and the value. It may or may not be required. See the following example:
Path String:
String GET_TestDate = "/stay/{path_var}/datecheck";
Parameter annotations:
@PathVariable("path_var") Integer var1, @RequestParam("query_param") String var2
Actual call:
http://localhost:8089/stay/1/datecheck?query_param=abc
Values populated:
var1 = 1
var2 = "abc"
There might be other problems, such as the date format used in your URL: you shouldn't use quotes and spaces, and you should either URL-encode the value, change the format (to use dashes, for example), or send the date and time in epoch (Unix time) format. But I believe the 404 is caused by the wrong path string and annotations on your method.
More on the topic: http://docs.spring.io/spring/docs/current/spring-framework-reference/html/mvc.html#mvc-ann-requestparam
http://docs.spring.io/spring/docs/current/spring-framework-reference/html/mvc.html#mvc-ann-requestmapping-uri-templates
You actually have 2 problems.
Your 404 is because your URL doesn't match any patterns. This is almost certainly because you didn't URL-encode your date parameter. An actual browser will do this for you, but code/REST clients probably won't, as they wisely should never mess with your input.
Your next problem is that your date is a @QueryParam and not a @PathParam. Once you fix the encoding issue, you would then discover that your date is null, since there is no PathParam by that name.

How to pass a URL as a path param in JAX-RS @Path?

http://localhost:8181/RESTfulExample/entityid/https://www.youtube.com
@GET
@Path("/entityid/{entityid : [a-zA-Z][a-zA-Z_0-9]}")
public Response getUserByentityid(@PathParam("entityid") String entityid) {
    return Response.status(200)
        .entity("getUserByentityid is called, username : " + entityid)
        .build();
}
How can I modify the regular expression to accept a URL? Or is there an alternative way to fetch the entityid when it is a URL?
You shouldn't pass a URL as a path parameter. Either the entity id is a regular identifier (integer, GUID, ...), in which case it can be in the path like you have it, or it is a URL, in which case the URL would be https://example.com/myapp/entityid/123 and you're back to the entity id part of the entity id URL being just a regular identifier.
Now, technically, you can pass a URL as a path parameter by encoding all special characters using percent encoding, however I would not recommend it.
Let's say your app is at https://example.net/otherapp/, then the combined url would be:
https://example.net/otherapp/entityid/https%3A%2F%2Fexample.com%2Fmyapp%2Fentityid%2F123
The regular expression would match against the unencoded value, so this might work:
{entityid : https?://.*}
NOTE: Encoding a path segment must be done for all values, not just for URL value. Integer numbers are safe, but pretty much all other values must be encoded.
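As a rough sketch of the encoding step on the client side, assuming plain JDK classes are acceptable: URLEncoder is meant for form data, so its "+" for a space is replaced with "%20" to make the result usable as a path segment. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PathSegmentEncoder {
    // Percent-encodes a value so that it can be used as a single path segment.
    static String encodeSegment(String value) {
        try {
            // A literal "+" in the input is already written as "%2B",
            // so every "+" left in the output stands for a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String base = "https://example.net/otherapp/entityid/";
        System.out.println(base + encodeSegment("https://example.com/myapp/entityid/123"));
        // https://example.net/otherapp/entityid/https%3A%2F%2Fexample.com%2Fmyapp%2Fentityid%2F123
    }
}
```

Some servers reject or normalize %2F inside a path, so it is worth checking how the target container handles such requests before relying on this.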

Java: Search in a wrong encoded String without modifying it

I have to find a user-defined String in a Document (using Java), which is stored in a database in a BLOB. When I search for a String with special characters ("Umlaute", äöü etc.), it fails, meaning it does not return any positions at all. And I am not allowed to convert the document's content into UTF-8 (which would have fixed this problem but raised a new, even bigger one).
Some additional information:
The document's content is returned as String in "ISO-8859-1" (Latin1).
Here is an example of what a String could look like:
Die Erkenntnis, daà der Künstler Schutz braucht, ...
This is what it should look like:
Die Erkenntnis, daß der Künstler Schutz braucht, ...
If I am searching for Künstler it would fail to find it, because it looks for ü but only finds ü.
Is it possible to convert Künstler into Künstler so I can search for the wrong encoded version instead?
Note:
We are using the Hibernate Framework for Database access. The original Getter for the Document's Content returns a byte[]. The String is then returned by calling
new String(getContent(), "ISO-8859-1")
The problem here is, that I cannot change this to UTF-8, because it would then mess up the rest of our application which is based on a third party application that delivers data this way.
Okay, looks like I've found a way to mess up the encoding on purpose.
new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")
By getting the Bytes of the String Künstler in UTF-8 and then creating a new String, telling Java that this is Latin1, it converts to Künstler. It's a hell of a hack but seems to work well.
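Wrapped into a small helper, the hack could look like the sketch below. The names MisencodedSearch, misencode and find are made up, and Unicode escapes are used in place of literal umlauts so that the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

final class MisencodedSearch {
    // Re-reads the UTF-8 bytes of the key as Latin-1, which reproduces
    // the way the document content was decoded.
    static String misencode(String key) {
        return new String(key.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Returns the position of the key in the Latin-1 decoded content, or -1.
    static int find(String latin1Content, String key) {
        return latin1Content.indexOf(misencode(key));
    }

    public static void main(String[] args) {
        byte[] stored = "Die Erkenntnis, da\u00df der K\u00fcnstler Schutz braucht"
                .getBytes(StandardCharsets.UTF_8);
        String content = new String(stored, StandardCharsets.ISO_8859_1);
        // The position counts characters of the mis-decoded String, where every
        // non-ASCII character of the original text takes up two or more positions.
        System.out.println(find(content, "K\u00fcnstler"));
    }
}
```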
You have already answered this yourself.
An altogether different approach:
If you can search the blob, you could search using
"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{ASCII}+", "%") + "%'"
This replaces non-ASCII sequences by the % wildcard: UTF-8 multibyte sequences are non-ASCII by design.
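A small sketch of just the pattern-building step, with a made-up class and method name. It assumes the resulting pattern would be passed as a bound parameter of a PreparedStatement instead of being concatenated into the SQL text, and it does not escape "%" or "_" that already occur in the key.

```java
final class LikePatternDemo {
    // Replaces every run of non-ASCII characters with the SQL "%" wildcard
    // and wraps the result so that it matches anywhere in the content.
    static String toLikePattern(String key) {
        return "%" + key.replaceAll("\\P{ASCII}+", "%") + "%";
    }

    public static void main(String[] args) {
        System.out.println(toLikePattern("K\u00fcnstler")); // %K%nstler%
    }
}
```

The wildcard matches more than the original key would, so the hits probably still need to be checked in Java afterwards.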

Any RFC 2397 Data URI Parser for Java?

dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype := [ type "/" subtype ] *( ";" parameter )
data := *urlchar
parameter := attribute "=" value
value := token / quoted-string
According to this BNF from the RFCs, the comma that separates the data from the MIME type can actually appear in both the MIME type and the data, so there's no simple way (e.g. a regex) to break the URI into parts. Thus a full parser is needed.
I am wondering whether anyone knows of any data URI libraries for Java. My Google search didn't yield anything.
There is a Java data URI parser implementation available on GitHub called jDataUri.
Disclaimer: I am the author
I ended up having to implement my own parser. The RFCs provide BNFs, so it's possible to implement full lexers and syntax analysers. However, for this simple case, I just used a simple scanning + stack mechanism to track the quoted strings and locate the separating comma. javax.activation's MimeType is used for the actual MIME parsing.
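The parser itself is not shown above, so the following is only a simplified sketch of the scanning idea, not the actual implementation: it skips over quoted strings to find the separating comma and uses a boolean flag instead of a stack. The class name DataUriSplitter and its members are hypothetical, and the media type is left unparsed.

```java
import java.util.Base64;

final class DataUriSplitter {
    final String mediaType; // text before the comma, without ";base64"; may be empty
    final boolean base64;
    final String data;      // raw data part, still base64 or percent-encoded

    private DataUriSplitter(String mediaType, boolean base64, String data) {
        this.mediaType = mediaType;
        this.base64 = base64;
        this.data = data;
    }

    static DataUriSplitter parse(String uri) {
        final String scheme = "data:";
        if (!uri.regionMatches(true, 0, scheme, 0, scheme.length())) {
            throw new IllegalArgumentException("not a data URI");
        }
        int comma = -1;
        boolean inQuotes = false;
        for (int i = scheme.length(); i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (inQuotes && c == '\\') {
                i++;                  // skip the escaped character of a quoted-pair
            } else if (c == '"') {
                inQuotes = !inQuotes; // enter or leave a quoted-string
            } else if (c == ',' && !inQuotes) {
                comma = i;            // first comma outside quotes separates the data
                break;
            }
        }
        if (comma < 0) {
            throw new IllegalArgumentException("missing ',' separator");
        }
        String header = uri.substring(scheme.length(), comma);
        final String marker = ";base64";
        boolean isBase64 = header.regionMatches(
                true, header.length() - marker.length(), marker, 0, marker.length());
        if (isBase64) {
            header = header.substring(0, header.length() - marker.length());
        }
        return new DataUriSplitter(header, isBase64, uri.substring(comma + 1));
    }

    // Only meaningful when base64 is true.
    byte[] decodedBase64() {
        return Base64.getDecoder().decode(data);
    }
}
```

For example, parsing data:text/plain;charset="a,b";base64,SGVsbG8= with this sketch keeps the quoted comma inside the media type and leaves SGVsbG8= as the data.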
