Definitive guide to valid Cookie values - java

I know there are other questions but they seem to have answers which are assumptions rather than being definitive.
My limited understanding is that cookie values are:
semicolons are already used to separate attributes within a single cookie;
equals signs are used to separate cookie names from values;
commas are used to separate multiple cookies within a header.
Are there any other "special" characters?
Some other Q&As suggest base64-encoding the value, but that may of course include equals signs, which are not valid.
I have also seen suggestions that values may be quoted; this, however, leads to other questions:
Do the special characters need to be quoted?
Do quoted values support the usual backslash escaping mechanisms?
RFC
I read a few RFCs, including some of the many cookie RFCs, but I am still unsure, as each cross-references another RFC with no definitive, simple explanation or sample that "answers" my query.
Hopefully no one will say "read the RFC", because the question then becomes: which RFC?
I think I have also read that different browsers have slightly different rules, so please note this in your answers if it matters.

The latest RFC is 6265, and it states that previous Cookie RFCs are obsoleted.
Here's what the syntax rules in the RFC say:
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
Thus:
The special characters are white-space characters, double quote, comma, semicolon and backslash. Equals is not a special character.
The special characters cannot be used at all, with the exception that double quotes may surround the value.
Special characters cannot be quoted.
Backslash does not act as an escape.
It follows that base-64 encoding can be used, because equals is not special.
Finally, from what I can tell, the RFC 6265 cookie values are defined so that they will work with any browser that implements any of the Cookie RFCs. However, if you try to use cookie values that don't conform to RFC 6265 (but arguably do conform to earlier RFCs), you may find that cookie behavior varies across browsers.
In short, conform to the letter of RFC 6265 and you should be fine.
If you need to pass cookie values that include any of the forbidden characters, your application needs to do its own encoding and decoding of the values, e.g. using base64.
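Since this question is tagged java: one way to sketch that application-level encoding is with java.util.Base64 (Java 8+). The URL-safe alphabet with padding stripped emits only [A-Za-z0-9_-], which stays well inside the RFC 6265 cookie-octet set. The class name here is made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CookieSafeValue {
    // Encode an arbitrary string into a cookie-safe value using the
    // URL-safe base64 alphabet ([A-Za-z0-9_-]) with '=' padding stripped.
    static String encode(String raw) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse the encoding; java.util.Base64 accepts unpadded input.
    static String decode(String cookieValue) {
        return new String(Base64.getUrlDecoder().decode(cookieValue),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "session; data=\"with, forbidden chars\"";
        String safe = encode(value);
        System.out.println(safe);
        System.out.println(decode(safe).equals(value)); // round-trips
    }
}
```

Plain base64 would also satisfy RFC 6265 (since +, / and = are all allowed cookie-octets), but the URL-safe variant avoids surprises if the same value ever ends up in a URL.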

Since base64 was mentioned, here is a cooked cookie solution using it. The functions use a modified version of base64 that only emits [0-9a-zA-Z_-].
You can use it for both the name and the value part of cookies, and it is binary safe, as they say.
The gzdeflate/gzinflate recovers the 30% or so of extra space created by base64; I could not resist using it. Note that PHP's gzdeflate/gzinflate is available at most hosting companies, but not all.
//write
setcookie(
    'mycookie',
    code_base64_FROM_bytes_cookiesafe(gzdeflate($mystring)),
    time() + 365*24*3600
);

//read
$mystring = gzinflate(code_bytes_FROM_base64_cookiesafe($_COOKIE['mycookie']));

function code_base64_FROM_bytes_cookiesafe($bytes)
{
    //safe for both the name and value part of a cookie: [0-9a-zA-Z_-]
    return strtr(base64_encode($bytes), array(
        '/'  => '_',
        '+'  => '-',
        '='  => '',
        ' '  => '',
        "\n" => '',
        "\r" => '',
    ));
}

function code_bytes_FROM_base64_cookiesafe($enc)
{
    //add back the stripped '=' padding: pad the length up to a multiple of 4
    $pad = (4 - strlen($enc) % 4) % 4;
    $enc = str_pad($enc, strlen($enc) + $pad, '=', STR_PAD_RIGHT);
    return base64_decode(strtr($enc, array(
        '_' => '/',
        '-' => '+',
    )));
}

Related

Does opentsdb accept special characters like degree symbol (°), % or ²

I am posting the temperature value from my Java code to opentsdb. In one of the tags I wanted to record the measurement type, i.e. whether the reading is in °C or °F. So I tried to post the Unicode escape "\u00b0" from Java; in System.out.println I can see the degree symbol, but when I post it, opentsdb does not accept the value.
I also read the article that defines the characters accepted by opentsdb (in the Metrics and Tags section), and it says that Unicode letters are accepted, but when I try to send the Unicode for the degree sign it doesn't work.
So does it accept the Unicode for these characters? How can I send them?
http://opentsdb.net/docs/build/html/user_guide/writing.html
The following rules apply to metric and tag values:
Strings are case sensitive, i.e. "Sys.Cpu.User" will be stored separately from "sys.cpu.user"
Spaces are not allowed.
Only the following characters are allowed: a to z, A to Z, 0 to 9, -, _, ., / or Unicode letters (as per the specification)
But in fact, no characters other than those mentioned above are supported by opentsdb.
As of opentsdb version 2.3 there is support for specifying additional allowed characters via a config variable (cross-posting from "OpenTsdb: Is Space character allowed in Metric and tag information"):
tsd.core.tag.allow_specialchars = !##$%^&*()_+{}|: <>?~`-=[]\;',./°
http://opentsdb.net/docs/build/html/user_guide/configuration.html gives more details

Safe sending String argument to JavaScript function from Java

My Java project based on WebView component.
Now, I want to call some JS function with single String argument.
To do this, I'm using simple code:
webEngine.executeScript("myFunc('" + str + "');");
*str is the text taken from the textarea.
This solution works, but is not safe enough.
Sometimes we can get netscape.javascript.JSException: SyntaxError: Unexpected EOF
So, how to handle str to avoid Exception?
Letfar's answer will work in most cases, but not all, and if you're doing this for security reasons, it's not sufficient. First, backslashes need to be escaped as well. Second, the line.separator property is the server side's EOL, which will only coincidentally be the same as the client side's, and you're already escaping the two possibilities, so the second line isn't necessary.
That all being said, there's no guarantee that some other control or non-ASCII character won't give some browser problems (for example, see the current Chrome nul in a URL bug), and browsers that don't recognize JavaScript (think things like screenreaders and other accessibility tools) might try to interpret HTML special characters as well, so I normally escape [^ -~] and [\'"&<>] (those are regular expression character ranges meaning all characters not between space and tilde inclusive; and backslash, single quote, double quote, ampersand, less than, greater than). Paranoid? A bit, but if str is a user entered string (or is calculated from a user entered string), you need to be a bit paranoid to avoid a security vulnerability.
Of course the real answer is to use some open source package to do the escaping, written by someone who knows security, or to use a framework that does it for you.
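As a rough illustration of the escaping described above (everything outside [ -~], plus backslash, quotes, ampersand and angle brackets, turned into \uXXXX escapes): this is a sketch, not a vetted security library, and the class name is made up.

```java
public class JsEscape {
    // Escape a string for embedding inside a quoted JavaScript string
    // literal. Every character outside the printable ASCII range [ -~],
    // and every character in \ ' " & < >, becomes a \uXXXX escape.
    static String escapeJs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7e || "\\'\"&<>".indexOf(c) >= 0) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints a\u0027b\u000ac
        System.out.println(escapeJs("a'b\nc"));
    }
}
```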
I have found this quick fix:
str = str.replace("'", "\\'");
str = str.replace(System.getProperty("line.separator"), "\\n");
str = str.replace("\n", "\\n");
str = str.replace("\r", "\\n");

Create cookie containing ";" character

I want to create a cookie whose value contains many ";" characters. ";" is usually used to separate multiple cookies in Java, which is why my code isn't doing its job.
Can someone help me create this "special" cookie and make my code work?
Thanks.
You can use URL encoding to escape any special characters (+, %, =, ;).
The URL encoded value of ; is %3B.
For a better reference, check out the Java JSON API.
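A minimal sketch of that URL-encoding round trip in Java (the class name and sample value are illustrative):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CookieValueEncoding {
    public static void main(String[] args) throws Exception {
        String raw = "a;b;c=d";
        // URL-encode the value before storing it in a cookie:
        // ';' becomes %3B and '=' becomes %3D.
        String encoded = URLEncoder.encode(raw, "UTF-8");
        System.out.println(encoded); // a%3Bb%3Bc%3Dd
        // ...and decode it after reading the cookie back.
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println(decoded.equals(raw)); // true
    }
}
```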
Semicolon is not allowed in cookies. Your best shot is to use some other separator. Read the linked answer to figure out what character can be used.

RFC3986 - which pchars need to be percent-encoded?

I need to generate a href to a URI. All is easy except when it comes to reserved characters that need percent-encoding; e.g. a link to /some/path;element should appear as <a href="/some/path%3Belement"> (I know that path;element represents a single entity).
Initially I was looking for a Java library that does this but I ended up writing something myself (look below for what failed with Java, as this question isn't Java-specific).
So, RFC 3986 does suggest when NOT to encode. This should happen, as I read it, when character falls under unreserved (ALPHA / DIGIT / "-" / "." / "_" / "~") class. So far so good. But what about the opposite case? RFC only mentions that percent (%) always needs encoding. But what about the others?
Question: is it correct to assume that everything that is not unreserved, can/should be percent-encoded? For example, opening bracket ( does not necessarily need encoding but semicolon ; does. If I don't encode it I end up looking for /first* when following <a href="/first;second">. But following <a href="/first(second"> I always end up looking for /first(second, as expected. What confuses me is that both ( and ; are in the same sub-delims class as far as RFC goes. As I imagine, encoding everything non-unreserved is a safe bet, but what about SEOability, user friendliness when it comes to localized URIs?
Now, what failed with Java libs. I have tried doing it like
new java.net.URI("http", "site", "/pa;th", null).toASCIIString()
but this gives http://site/pa;th which is no good. Similar results observed with:
javax.ws.rs.core.UriBuilder
Spring's UriUtils - I have tried both encodePath(String, String) and encodePathSegment(String, String)
[*] /first is a result of call to HttpServletRequest.getServletPath() in the server side when clicking on <a href="/first;second">
EDIT: I probably need to mention that this behaviour was observed under Tomcat, and I have checked both Tomcat 6 and 7 behave the same way.
Is it correct to assume that everything that is not unreserved, can/should be percent-encoded?
No. RFC 3986 says this:
"Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. "
The implication is that you decide which of the delimiters (i.e. the <delimiter> characters) need to be encoded depending on the context. Those which don't need to be encoded shouldn't be encoded.
For instance, you should not percent-encode a / if it appears in a path component, but you should percent-encode it when it appears in a query or fragment.
So, in fact, a ; character (which is a member of <reserved>) should not be automatically percent-encoded. And indeed the Java URL and URI classes won't do this; see the URI(...) javadoc, specifically step 7, for how the <path> component is handled.
This is reinforced by this paragraph:
"The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent- encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications. Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI."
So this says that a URL containing a percent-encoded ; is not the same as a URL that contains a raw ;. And the last sentence implies that they should NOT be percent encoded or decoded automatically.
Which leaves us with the question - why do you want ; to be percent encoded?
Let's say you have a CMS where people can create arbitrary pages having arbitrary paths. Later on, I need to generate href links to all pages in, for example, site map component. Therefore I need an algorithm to know which characters to escape. Semicolon has to be treated literally in this case and should be escaped.
Sorry, but it does not follow that semicolon should be escaped.
As far as the URL / URI spec is concerned, the ; has no special meaning. It might have special meaning to a particular web server / web site, but in general (i.e. without specific knowledge of the site) you have no way of knowing this.
If the ; does have special meaning in a particular URI, then percent-escaping it breaks that meaning. For instance, if the site uses ; to allow a session token to be appended to the path, then percent-encoding will stop it from recognizing the session token ...
If the ; is simply a data character provided by some client, then if you percent encode it, you are potentially changing the meaning of URI. Whether this matters depends on what the server does; i.e. whether is decodes or not as part of the application logic.
What this means is that knowing the "right thing to do" requires intimate knowledge of what the URI means to the end user and/or the site. That would require advanced mind-reading technology to implement. My recommendation would be to get the CMS to solve it, by suitably escaping any delimiters in the URI paths before it delivers them to your software. The algorithm is necessarily going to be specific to the CMS and content delivery platform. It/they will be responding to requests for documents identified by the URLs and will need to know how to interpret them.
(Supporting arbitrary people using arbitrary paths is a bit crazy. There have to be some limits. For instance, not even Windows allows you to use the file separator character in a filename component. So you are going to have to have some boundaries somewhere. It is just a matter of deciding where they should be.)
The ABNF for an absolute path part:
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment = *pchar
segment-nz = 1*pchar
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
pchar includes sub-delims, so you would not have to encode any of these in the path part: :@-._~!$&'()*+,;=
I wrote my own URL builder which includes an encoder for the path - as always, caveat emptor.
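For illustration, here is one shape such a path-segment encoder could take in Java, following the pchar rule quoted above; if, as in the question, you want ';' treated as data rather than a delimiter, remove it from the allowed set. The class name is made up, and this is a sketch, not a vetted library:

```java
import java.nio.charset.StandardCharsets;

public class PathSegmentEncoder {
    // Characters allowed raw in a path segment per RFC 3986 pchar:
    // unreserved / sub-delims / ":" / "@"
    private static final String ALLOWED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-._~!$&'()*+,;=:@";

    static String encodeSegment(String segment) {
        StringBuilder sb = new StringBuilder();
        // Work on UTF-8 bytes so non-ASCII characters become %XX%XX... runs.
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xff);
            if (ALLOWED.indexOf(c) >= 0) {
                sb.append(c);
            } else {
                sb.append(String.format("%%%02X", b & 0xff));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("pa;th")); // ';' is pchar: stays raw
        System.out.println(encodeSegment("a b/c")); // space and '/' are encoded
    }
}
```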

HttpClient 2.0. Params "codified"

I have to use HttpClient 2.0 (cannot use anything newer), and I am running into the following issue. When I use the method (post, in this case), it "codifies" the parameters into hexadecimal ASCII escapes, and spaces are turned into "+" (something that the receiver doesn't want).
Does anyone know a way to avoid it?
Thanks a lot.
Even your browser does that, converting the space character into +. See here: http://download.oracle.com/javase/1.5.0/docs/api/java/net/URLEncoder.html
URLEncoder converts a string into the URL-encoded form, using a character encoding such as UTF-8.
When encoding a String, the following rules apply:
The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
Also, see here http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
Control names and values are escaped. Space characters are replaced by "+", and then reserved characters are escaped as described in [RFC1738], section 2.2: non-alphanumeric characters are replaced by "%HH", a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., "%0D%0A").
The control names/values are listed in the order they appear in the document. The name is separated from the value by "=", and name/value pairs are separated from each other by "&".
To answer your question: if you do not want the encoding, I guess URLDecoder.decode will help you undo the encoded string.
You could in theory avoid this by constructing the query string or request body containing parameters by hand.
But this would be a bad thing to do, because the HTML, HTTP, URL and URI specs all mandate that reserved characters in request parameters are encoded. And if you violate this, you may find that server-side HTTP stacks, proxies and so on reject your requests as invalid, or misbehave in other ways.
The correct way to deal with this issue is to do one of the following:
If the server is implemented in Java EE technology, use the relevant servlet API methods (e.g. ServletRequest.getParameter(...)) to fetch the request parameters. These will take care of any decoding for you.
If the parameters are part of a URL query string, you can instantiate a Java URL or URI object and use the getter to return you the query with the encoding removed.
If your server is implemented some other way (or if you need to unpick the request URL's query string or POST data yourself), then use URLDecoder.decode or equivalent to remove the % encoding and replace the +'s ... after you have figured out where the query and parameter boundaries, etc. are.
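As a small illustration of that form encoding and its reversal with URLDecoder.decode (class name and sample string are illustrative):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormDecoding {
    public static void main(String[] args) throws Exception {
        // A form-encoded value as HttpClient/URLEncoder would produce it:
        // spaces become '+', '&' becomes %26.
        String encoded = URLEncoder.encode("hello world & more", "UTF-8");
        System.out.println(encoded); // hello+world+%26+more

        // The receiving side undoes it with URLDecoder:
        System.out.println(URLDecoder.decode(encoded, "UTF-8"));
    }
}
```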