Any RFC 2397 Data URI Parser for Java?


dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype := [ type "/" subtype ] *( ";" parameter )
data := *urlchar
parameter := attribute "=" value
value := token / quoted-string
According to this BNF from the RFCs, the comma that separates the data from the MIME type can also appear inside both the MIME type and the data, so there is no simple way (i.e. a regular expression) to break the URI into its parts. Thus a full parser is needed.
Does anyone know of any data URI libraries for Java? My Google search didn't yield anything.

There is a Java data URI parser implementation available on GitHub called jDataUri.
Disclaimer: I am the author

I ended up having to implement my own parser. The RFCs provide BNFs, so it's possible to implement full lexers and syntax analysers. However, for this simple case, I just used a simple scanning + stack mechanism to track the quoted strings and locate the separating comma. javax.activation's MimeType is used for the actual MIME parsing.
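The scanning idea described above can be sketched roughly as follows. This is not the answerer's actual code: the class name DataUriSketch and its methods are made up for illustration, and it assumes that quoted strings use a backslash as the escape character. It only locates the separating comma and leaves the media type itself unparsed.

```java
final class DataUriSketch {
    private static final String PREFIX = "data:";

    // Returns the index of the first comma that is not inside a quoted string, or -1.
    static int findSeparator(String uri) {
        boolean inQuotes = false;
        for (int i = PREFIX.length(); i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (inQuotes && c == '\\') {
                i++; // skip the escaped character inside a quoted string
            } else if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                return i;
            }
        }
        return -1;
    }

    // Splits a data URI into { header, data }; the header still contains ";base64" if present.
    static String[] split(String uri) {
        if (!uri.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            throw new IllegalArgumentException("not a data URI");
        }
        int comma = findSeparator(uri);
        if (comma < 0) {
            throw new IllegalArgumentException("no separating comma found");
        }
        return new String[] { uri.substring(PREFIX.length(), comma), uri.substring(comma + 1) };
    }
}
```

The header part could then be handed to a MIME type parser, and the data part decoded depending on whether the header ends with ";base64".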

Related

Encoding Parameter passed as URL- Alternate options?

I am passing a String value in a URL
eg: http://localhost:8080/webservice/useradmin/a%bghijlk123/0978+gh
The String "ab%ghijlk123/0978+gh" breaks the URL.
What options are available to overcome this?
Is encoding the string the only option? The code change must be minimal. Can any server-side configuration be used to achieve this?
Kindly provide suggestions.

Is encoding the string the only option?
It is the only correct option.
Use URLEncoder.encode("ab%ghijlk123/0978+gh", "UTF-8"),
which will give you ab%25ghijlk123%2F0978%2Bgh, for a full URL of:
http://localhost:8080/webservice/useradmin/ab%25ghijlk123%2F0978%2Bgh
The URL http://localhost:8080/webservice/useradmin/a%bghijlk123/0978+gh is invalid.
The URL specification (RFC 3986) says that path segments (the values separated by a /) may only consist of:
ALPHA: "a"-"z", "A"-"Z"
DIGIT: "0"-"9"
Special chars: - . _ ~ ! $ & ' ( ) * + , ; = : @
pct-encoded: "%" HEXDIG HEXDIG
Characters that are disallowed in a segment because they have other meanings are: / (path separator), ? (start of query), # (start of fragment), and % (start of 2-digit hex encoded char).
As you can see, the % sign is only allowed as a percent-encoded character, so %bg makes the URL invalid.
If the part after the useradmin/ is supposed to be the value ab%ghijlk123/0978+gh, then it must be encoded as shown above.
If the server rejects that as "400:Bad request", then the server is in error.
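As a minimal sketch of the encoding step, assuming the value is used as a single path segment: the helper name encodeSegment is hypothetical. URLEncoder implements form encoding, so it turns a space into "+"; since a literal "+" is itself encoded as %2B, replacing the remaining "+" with %20 afterwards is safe.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PathSegmentSketch {
    // Percent-encodes a raw value so that it can be used as one path segment.
    static String encodeSegment(String raw) {
        try {
            // URLEncoder emits '+' for a space; a path segment needs %20 instead.
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }
}
```

With this, "http://localhost:8080/webservice/useradmin/" + PathSegmentSketch.encodeSegment("ab%ghijlk123/0978+gh") yields the full URL shown above.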

How to express advanced expressions between query parameters in a REST API?

The problem (or missing feature) is that there is no way to express relationships between different query parameters. As I see it, you can only specify and between parameters, but what do you do if you want not equal, or, or xor?
I would like to be able to express things like:
All users with age 22 or the name Bosse
/users?age=22|name=Bosse
All users except David and Lennart
/users?name!=David&name!=Lennart
My first idea is to use a query parameter called _filter and take a String with my expression like this:
All users with age 22 or a name that is not Bosse
/users?_filter=age eq 22 or name neq Bosse
What is the best solution for this problem?
I am writing my API with Java and Jersey, so if there is any special solution for Jersey, let me know.

I can see two solutions to achieve that:
Use a special query parameter containing the expression when executing a GET method. This is how OData does it with its $filter parameter (see this link: https://msdn.microsoft.com/fr-fr/library/gg309461.aspx#BKMK_filter). Here is a sample:
/AccountSet?$filter=AccountCategoryCode/Value eq 2 or AccountRatingCode/Value eq 1
Parse.com also uses this approach with its where parameter, but the query is described using a JSON structure (see this link: https://parse.com/docs/rest/guide#queries). Here is a sample:
curl -X GET \
-H "X-Parse-Application-Id: ${APPLICATION_ID}" \
-H "X-Parse-REST-API-Key: ${REST_API_KEY}" \
-G \
--data-urlencode 'where={"score":{"$gte":1000,"$lte":3000}}' \
https://api.parse.com/1/classes/GameScore
If the query is too difficult to describe this way, you could also use a POST method and specify the query in the request payload. ElasticSearch uses this approach for its query support (see this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html). Here is a sample:
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?routing=kimchy' -d '{
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "some query string here"
}
},
"filter" : {
"term" : { "user" : "kimchy" }
}
}
}
}
'
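On the server side, a filter string like the _filter idea from the question could be turned into a predicate along these lines. This is only a rough sketch under several assumptions: the class FilterSketch is hypothetical, only the operators eq and neq are supported, and binds tighter than or, there are no parentheses, and values must not contain the words " and " or " or ". Rows are modelled as plain string maps; in a Jersey resource the string would typically arrive through a query parameter.

```java
import java.util.Map;
import java.util.function.Predicate;

final class FilterSketch {
    // Parses e.g. "age eq 22 or name neq Bosse" into a predicate over a row of string fields.
    static Predicate<Map<String, String>> parse(String filter) {
        Predicate<Map<String, String>> result = row -> false;
        for (String clause : filter.trim().split("\\s+or\\s+")) {
            result = result.or(parseAnd(clause));
        }
        return result;
    }

    // A clause is one or more terms joined by "and".
    private static Predicate<Map<String, String>> parseAnd(String clause) {
        Predicate<Map<String, String>> result = row -> true;
        for (String term : clause.split("\\s+and\\s+")) {
            result = result.and(parseTerm(term));
        }
        return result;
    }

    // A term has the form "<field> <op> <value>".
    private static Predicate<Map<String, String>> parseTerm(String term) {
        String[] parts = term.trim().split("\\s+", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed term: " + term);
        }
        String field = parts[0];
        String op = parts[1];
        String value = parts[2];
        switch (op) {
            case "eq":
                return row -> value.equals(row.get(field));
            case "neq":
                return row -> !value.equals(row.get(field));
            default:
                throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

A real implementation would more likely use a proper grammar (or an existing OData library) so that parentheses, quoting and typed comparisons are handled.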
Hope it helps you,
Thierry

OK so here it is
You could prefix values with + or - to include or exclude them, and add an inclusive keyword to choose between AND and OR.
For excluding
GET /users?name=-David,-Lennart
For including
GET /users?name=+Bossee
For OR
GET /users?name=+Bossee&age=22&inclusive=false
For AND
GET /users?name=+Bossee&age=22&inclusive=true
This way the API is intuitive and readable, and it does the work you want it to do.
EDIT - a very difficult question; however, I would do it this way
GET /users?name=+Bossee&age=22&place=NewYork&inclusive=false,true
Which means the first relation is not inclusive, in other words it is OR;
the second relation is inclusive, in other words it is AND.
This solution assumes that evaluation is from left to right.

It seems impossible if you go for query parameters.
If you need advanced expressions, go for path parameters, so that you can use regular expressions to filter.
But to allow only a particular name such as "Bosse", you need to write a stringent regex.
Instead of creating a REST endpoint just for the sake of one condition, allow any name value and write the logic to check it manually within the program.

java.net.URI and percent in query parameter value

System.out.println(
new URI("http", "example.com", "/servlet", "a=x%20y", null));
The result is http://example.com/servlet?a=x%2520y, where the query parameter value differs from the supplied one. Strange, but this does follow the Javadoc:
"The percent character ('%') is always quoted by these constructors."
We can pass the decoded string, a=x y and then we get a reasonable(?) result a=x%20y.
But what if the query parameter value contains an "&" character? This happens, for example, if the value is itself a URL with query parameters. Look at this (wrong) query string:
a=b&c. The ampersand must be escaped here (a=b%26c), otherwise this can be read as a query parameter a=b followed by some garbage (c). If I pass the escaped form to a URI constructor, it encodes it again and returns a wrong URL: ...?a=b%2526c
This issue seems to render java.net.URI useless. Am I missing something here?
Summary of answers
java.net.URI does know about the existence of the query part of a URI, but it does not understand the internals of the query part, which can differ for each scheme. For example, java.net.URI does not understand the internal structure of the HTTP query part. This would not be a problem if java.net.URI treated the query as an opaque string and did not alter it. But it tries to apply a generic percent-encoding algorithm, which breaks HTTP URLs.
Therefore I cannot use the URI class to reliably assemble a URL from its parts, even though there are constructors for it. I would also mention that as of Java 7, the implementation of the relativize operation is quite limited: it only works if one URL is the prefix of the other. These two features (and the leaner interface for these purposes) were the reason why I was interested in java.net.URI, but neither of them works for me.
In the end I used java.net.URL for parsing, and wrote code to assemble a URL from parts and to relativize two URLs. I also checked the Apache HttpClient URIBuilder class; although it does understand the internals of an HTTP query string, as of 4.3 it has the same encoding problem as java.net.URI when dealing with the query part as a whole.
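One way to assemble such a URL by hand, sketched here under the assumption that the query uses the usual name=value pairs joined by "&": each name and value is encoded separately, and the result is parsed with the single-argument factory, which does not re-encode percent signs. The class QueryAssemblySketch and the method withQuery are hypothetical names, not the code mentioned above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

final class QueryAssemblySketch {
    // Appends form-encoded name/value pairs to a base URL that has no query part yet.
    static URI withQuery(String base, String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < namesAndValues.length; i += 2) {
            sb.append(i == 0 ? '?' : '&')
              .append(encode(namesAndValues[i]))
              .append('=')
              .append(encode(namesAndValues[i + 1]));
        }
        // URI.create parses the finished string and leaves existing escapes untouched.
        return URI.create(sb.toString());
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // form encoding: a space becomes '+'
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }
}
```

For example, QueryAssemblySketch.withQuery("http://example.com/servlet", "a", "b&c") keeps the ampersand escaped as a=b%26c instead of double-encoding it.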

The query string
a=b&c
is not wrong in a URI. The RFC on URI Generic Syntax states
The query component is a string of information to be interpreted by
the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "#",
"&", "=", "+", ",", and "$" are reserved.
The character & in the query string is very much valid (uric represents reserved, mark, and alphanumeric characters). The RFC also states
Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.
Because the & is valid but reserved, it is up to the user to determine if it is meant to be encoded or not.
What you call a query parameter is not a feature of a URI and therefore the URI class has no reason to (and shouldn't) support it.
Related:
Which characters make a URL invalid?

The only workaround I found was to use the single-argument constructors and methods. Note that you must use URI#getRawQuery() to avoid decoding %26. For example:
URI uri = new URI("http://a/?b=c%26d&e");
// uri.getRawQuery() equals "b=c%26d&e"
uri = new URI(new URI(uri.getScheme(), uri.getAuthority(),
uri.getPath(), null, null) + "?f=g%26h&i");
// uri.getRawQuery() equals "f=g%26h&i"
uri = uri.resolve("?j=k%26l&m");
// uri.getRawQuery() equals "j=k%26l&m"
// uri.toString() equals "http://a/?j=k%26l&m"

The only working solution I know of is reflection (see https://blog.stackhunter.com/2014/03/31/encode-special-characters-java-net-uri/)
URI uri = new URI("http", null, "example.com", -1, "/accounts", null, null);
Field field = URI.class.getDeclaredField("query");
field.setAccessible(true);
field.set(uri, encodedQueryString);
//clear cached string representation
field = URI.class.getDeclaredField("string");
field.setAccessible(true);
field.set(uri, null);

Use URLEncoder.encode() method, in your case for example:
URLEncoder.encode("a=x%20y", "ISO-8859-1");

how do I create a servlet with parameters and a "/" at the ending?

My servlet needs to receive 2 parameters to respond.
My favorite solution (but it doesn't work in my context):
http://domain.com/?param1=something&param2=anything
because: I have another application which requires that a URL ends with "/". But I can't create a servlet which accepts URLs like "http://domain.com/?param1=something&param2=anything/" (note the / at the end).
My second solution is:
http://domain.com/param1/param2/
I could split the requested URL by "/" and I would have my 2 parameters. But it's not that nice.
Is there a better way to pass 2 parameters and still have a URL which ends with a "/"?

I think it is not possible. As it is defined in the HTTP RFC
"http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
After the first "?" there is the query part. So in your example
http://domain.com/?param1=something&param2=anything/
That means the value of param2 is anything/ (with the slash at the end).
Of course you can bind your servlet to the /* url-pattern and process the parameters in the servlet using ServletRequest.getParameter(). But don't forget that your param2 will end with a /

According to RFC 3986, section 3.3, it is possible to assign a set of parameters to each path segment like so:
http://domain.com/path;param1=value1;param2=value2/subpath/subsubpath/
So you can have parameters without the query part.
But the downside is:
What you want to achieve is maybe not the intended use case for that feature.
Unlike for query parameters, there is no API support for segment parameters. So you have to parse the parameters on your own.
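Parsing such segment parameters by hand could look roughly like this. It is a sketch only: the class SegmentParamsSketch is a hypothetical name, and it assumes that the segment has already been cut out of the path and that names and values contain no percent-encoded characters needing decoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SegmentParamsSketch {
    // Parses "path;param1=value1;param2=value2" into its parameters, skipping the segment name.
    static Map<String, String> parse(String segment) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = segment.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the segment name itself
            if (parts[i].isEmpty()) {
                continue;
            }
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                params.put(parts[i], ""); // parameter without a value
            } else {
                params.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return params;
    }
}
```

In a servlet, the segment would have to be taken from the raw request URI, since the container may strip or interpret the part after the ";" elsewhere.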

parsing a string by regular expression

I have a string of
"name"=>"3B Ae", "note"=>"Test fddd \"33 Ae\" FIXME", "is_on"=>"keke, baba"
and I want to parse it with a Java program into segments of
name
3B Ae
note
Test fddd \"33 Ae\" FIXME
is_on
keke, baba
Note that the contents of the string, e.g. name and 3B Ae, are not fixed.
Any suggestions?

If you:
replace => with :
Wrap the full string with {}
The result will look like this, which is valid JSON. You can then use a JSON parser (GSON or Jackson, for example) to parse those values into a java object.
{
"name": "3B Ae",
"note": "Test fddd \"33 Ae\" FIXME",
"is_on": "keke, baba"
}
If you have control over the process that produces this string, I highly recommend that you use a standard format like JSON or XML that can be parsed more easily on the other end.

Because of the quoting rules, I'm not certain that a regular expression (even a PCRE with negative lookbehinds) can parse this consistently. What you probably want is to use a pushdown automaton, or some other parser capable of handling a context-free language.

If you can make sure your data (key or value) does not contain a => or a , (or you can find some other delimiters that will not occur), the solution is pretty simple:
Split the string by , to get the key => value pairs
Split each key => value pair by => to get what you want
If inputString holds
"name"=>"3B Ae", "note"=>"Test fddd \"33 Ae\" FIXME", "is_on"=>"keke; baba"
(from a file for instance)
(I have changed the , between keke and baba to ;)
String[] keyValuePairs = inputString.split(",");
for (String oneKeyValue : keyValuePairs) {
    // keyAndValue[0] is the quoted key, keyAndValue[1] is the quoted value
    String[] keyAndValue = oneKeyValue.trim().split("=>");
}
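If keys or values may contain commas or escaped quotes, as in the question's original input, a pattern that treats a backslash followed by any character as part of the quoted string may be enough for this particular format. The following is a sketch under that assumption; the class PairParserSketch is a hypothetical name, and escape sequences are kept as-is rather than unescaped, which matches the output asked for in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PairParserSketch {
    // A double-quoted string whose content is made of non-quote, non-backslash
    // characters or backslash escapes such as \"
    private static final String QUOTED = "\"((?:[^\"\\\\]|\\\\.)*)\"";
    private static final Pattern PAIR = Pattern.compile(QUOTED + "\\s*=>\\s*" + QUOTED);

    // Collects every "key"=>"value" pair in the order it appears in the input.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }
}
```

Anything between the pairs (such as the ", " separators) is simply skipped, so malformed input is not reported; a stricter parser would need to check that as well.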
