Is content-encoding being set to UTF-8 invalid?

Hi, I have a client that is trying to POST to us with the following HTTP headers:
content-type: application/x-www-form-urlencoded
content-encoding: UTF-8
But our web application firewall keeps intercepting these requests and throwing this error:
Message: [file "/etc/httpd/modsecurity.d/10_asl_rules.conf"] [line "45"] [id "340362"] [msg "Atomicorp.com WAF Rules: ModSecurity does not support content encodings and can not detect attacks using it, therefore it must be blocked."] [severity "WARNING"] Access denied with code 501 (phase 2). Match of "rx ^Identity$" against "REQUEST_HEADERS:Content-Encoding" required.
Action: Intercepted (phase 2)
Could anyone shed some light on this?

It is invalid. The Content-Encoding header names a transformation (typically compression) that the sender applied to the message body. UTF-8 is not a content encoding; it is a character set. The character set is specified in the Content-Type header:
content-type: text/plain; charset=utf-8
Valid Content-Encoding values include, for instance, gzip and deflate. An HTTP client declares which content encodings it supports with the Accept-Encoding header; the HTTP server then names the encoding it actually used in the Content-Encoding header of its reply.
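As a rough sketch of what the client's request could look like instead, the snippet below builds (without sending) a form POST using the JDK's java.net.http API, available from Java 11. The class name, URL and form field are made up for illustration. Appending charset=UTF-8 to application/x-www-form-urlencoded is a common convention, but whether the receiving server honors it is an assumption. No Content-Encoding header is set because the body is not compressed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class FormPostHeaders {

    // Builds (but does not send) a form POST whose charset is declared on
    // Content-Type. The URL and field name passed in are hypothetical.
    static HttpRequest build(String url, String fieldName, String fieldValue) {
        String body = URLEncoder.encode(fieldName, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(fieldValue, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
                // No Content-Encoding header: the body is sent uncompressed.
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }
}
```

Calling FormPostHeaders.build("https://example.com/submit", "name", "value") yields a request with no Content-Encoding header at all, which is what the firewall rule quoted in the question appears to expect.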

Related

org.glassfish.jersey.message.internal.HeaderValueException: Unable to parse "Content-Type" header value: "multipart/byteranges"

I am trying to request multiple byte ranges from AWS CloudFront by using invocationBuilder.header, like this:
invocationBuilder.header("Range", "bytes=100-200,300-400,500-600\r");
but I got this:
Exception in thread "Name of the thread" org.glassfish.jersey.message.internal.HeaderValueException: Unable to parse "Content-Type" header value: "multipart/byteranges; boundary=CloudFront:*number of file*"
at org.glassfish.jersey.message.internal.InboundMessageContext.exception(InboundMessageContext.java:335)
at org.glassfish.jersey.message.internal.InboundMessageContext.singleHeader(InboundMessageContext.java:330)
at org.glassfish.jersey.message.internal.InboundMessageContext.getMediaType(InboundMessageContext.java:444)
at org.glassfish.jersey.message.internal.InboundMessageContext.readEntity(InboundMessageContext.java:847)
at org.glassfish.jersey.message.internal.InboundMessageContext.readEntity(InboundMessageContext.java:785)
at org.glassfish.jersey.client.ClientResponse.readEntity(ClientResponse.java:326)
I tried the same request with curl from cmd and it works well. Can anyone help? Thanks.

CloudFront returns an invalid Content-Type header in its responses to multipart range requests.
Specifically, the boundary parameter value contains a ":", so the value MUST be quoted.
I've just filed a bug with AWS, but we'll see what happens. That your original question is 5 years old does not fill me with confidence 😬.
Example MultiPart request:
$ curl -vv https://foo.cloudfront.net/bar.bin -H "Range: bytes=1-100,150-200"
<snip>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 206
< content-type: multipart/byteranges; boundary=CloudFront:32FA314F82E87793C3B34E2D1FCCE8ED
<snip>
Because the boundary parameter includes a colon, its value must be quoted like this:
< content-type: multipart/byteranges; boundary="CloudFront:32FA314F82E87793C3B34E2D1FCCE8ED"
This is according to RFC 1521:
WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-type field is such that it is often necessary to enclose the boundaries in quotes on the Content-type line. This is not always necessary, but never hurts. Implementors should be sure to study the grammar carefully in order to avoid producing illegal Content-type fields. Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed;
boundary=gc0p4Jq0M2Yt08jU534c0p
But the following is illegal:
Content-Type: multipart/mixed;
boundary=gc0p4Jq0M:2Yt08jU534c0p
(because of the colon) and must instead be represented as
Content-Type: multipart/mixed;
boundary="gc0p4Jq0M:2Yt08jU534c0p"
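The quoting rule from the RFC excerpt can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and the character list is the "tspecials" set as I read it in RFC 1521, not code taken from CloudFront or Jersey.

```java
final class BoundaryParam {

    // Characters that RFC 1521 calls "tspecials"; a parameter value that
    // contains any of them (or a space or control character) must be quoted.
    private static final String TSPECIALS = "()<>@,;:\\\"/[]?=";

    static boolean needsQuoting(String value) {
        if (value.isEmpty()) {
            return true;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c <= ' ' || c >= 127 || TSPECIALS.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Renders a boundary parameter, quoting the value only when required.
    static String render(String boundary) {
        if (!needsQuoting(boundary)) {
            return "boundary=" + boundary;
        }
        // Inside a quoted string, backslash and double quote are escaped.
        String escaped = boundary.replace("\\", "\\\\").replace("\"", "\\\"");
        return "boundary=\"" + escaped + "\"";
    }
}
```

With this helper, the first RFC example stays unquoted, while a CloudFront-style value containing a colon comes back wrapped in double quotes.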

HTTP: What header to add in order to set the charset?

I have this Java code in my RESTful web service:
Response response = Response.ok().
entity(method(paramether)).
header("Access-Control-Allow-Origin", "*").build();
I want to add a header that sets the charset to UTF-8 at the HTTP level (I'm running into problems when I set it only in the document). Thanks in advance.

Content-Type, e.g.:
Content-Type: text/html;charset=UTF-8

The Content-Type header can be used to specify the type of the content you return, along with its character set.
As an example:
Content-Type: text/html; charset=utf-8
Taken from the Wikipedia article on HTTP headers: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
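One detail worth illustrating: the header only declares the charset, so the body bytes have to be produced with that same charset. The following is a minimal JDK-only sketch with hypothetical class and method names. In the JAX-RS code from the question, the resulting string would presumably be passed to the same header(...) call already used there for Access-Control-Allow-Origin.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetHeader {

    // Formats a Content-Type value such as "text/html; charset=UTF-8".
    static String contentType(String mediaType, Charset charset) {
        return mediaType + "; charset=" + charset.name();
    }

    // Encodes the body with the same charset that the header declares.
    static byte[] encodeBody(String body, Charset charset) {
        return body.getBytes(charset);
    }
}
```

For example, the single character "é" becomes two bytes in UTF-8 but one byte in ISO-8859-1, which is why a header that disagrees with the actual bytes leads to garbled text on the receiving side.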

parsing form parameters from raw http requests in java

I'm trying to extract some information from raw HTTP Request messages (like the one below) and store them into instances of the org.apache.http.message.BasicHttpRequest (https://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/index.html) class.
I was able to employ org.apache.http.message.BasicLineParser class and its method parseHeader(String value, LineParser parser) to process (parse + store) the headers, but I don't know how to deal with the parameters passed with the form.
POST https://gist.github.com:443/gists HTTP/1.1
Host: gist.github.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en,fr;q=0.8,it-it;q=0.6,it;q=0.4,en-us;q=0.2
Accept-Encoding: gzip, deflate
DNT: 1
Referer: https://gist.github.com/
Connection: keep-alive
Content-Type: multipart/form-data; boundary=-------------------------10314229756379422411656878663
Content-Length: 1242
-----------------------------10314229756379422411656878663
Content-Disposition: form-data; name="parameter_name"
parameter_value
Do you know of a utility that can parse the content that follows the headers above (i.e. after the first empty line)? I am interested in collecting all the <"parameter_name","parameter_value"> pairs present in the request body.
I have already read similar questions and answers, such as "Parsing raw HTTP Request", but I did not find anything helpful on the form-data component.
Thanks in advance for your help and time.

What you are seeing is a MIME-encoded content body. HttpClient is content-agnostic and therefore does not provide a means of parsing such content. One can, however, use Apache Mime4J to do so.
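To show what such parsing involves, here is a deliberately simplified, JDK-only sketch (it does not use Mime4J, and the class name is made up). It assumes the body is available as a String with CRLF line endings and contains only plain text fields; binary uploads, nested multiparts and header folding are out of scope, so a real MIME parser remains the better choice for production use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleFormDataParser {

    // \b keeps this from matching the tail of filename="...".
    private static final Pattern NAME =
            Pattern.compile("\\bname=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Parses simple text fields from a multipart/form-data body.
    // boundary is the value of the boundary parameter from Content-Type.
    static Map<String, String> parse(String body, String boundary) {
        Map<String, String> fields = new LinkedHashMap<>();
        String delimiter = "--" + boundary;
        for (String part : body.split(Pattern.quote(delimiter))) {
            // Skip the preamble and the closing "--" marker.
            if (part.isEmpty() || part.startsWith("--")) {
                continue;
            }
            // Part headers and content are separated by an empty line.
            int split = part.indexOf("\r\n\r\n");
            if (split < 0) {
                continue;
            }
            String headers = part.substring(0, split);
            String value = part.substring(split + 4);
            // Drop the CRLF that precedes the next delimiter.
            if (value.endsWith("\r\n")) {
                value = value.substring(0, value.length() - 2);
            }
            Matcher m = NAME.matcher(headers);
            if (m.find()) {
                fields.put(m.group(1), value);
            }
        }
        return fields;
    }
}
```

The returned map preserves the order of the parts, so iterating over it gives the name/value pairs in the order they appear in the request body.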

Tumblr API Photo Post Returns 401 (Not Authorized)

I'm attempting to use the Tumblr API in an Android app to authorize users and make text and photo posts. I'm using the Scribe library. So far, I can successfully obtain an access token and use it to get user info. I can also make text posts without any issues. This tells me that I'm signing requests correctly.
However, I've spent the last week and a half attempting to make photo posts without success. I continuously receive 401 errors (Not Authorized). I've read through many posts on the Tumblr support forum as well as here on Stack Overflow, but was unable to find a solution.
I'm reluctant to include the Jumblr library because I'm trying to keep my app as lean as possible. That said, I reviewed the Jumblr code and decided to mimic how photo posts are sent (https://github.com/tumblr/jumblr/blob/master/src/main/java/com/tumblr/jumblr/request/MultipartConverter.java). I'm still receiving the exact same error.
Below is an example of my multipart POST request and the response I receive. I've replaced the blog name and the OAuth signature, consumer key, and token variables, and have removed the binary image data for brevity's sake. Everything else is untouched. I have a few questions...
Are there any other variables that should be included in the multipart section? A Stack Overflow user stated that placing the "oauth_" signature variables in there fixed his problem. I didn't have success with this, but maybe there was something I was missing.
The Jumblr app doesn't appear to do any encoding of the image data, although the Tumblr documentation states that it should be URL encoded. Right now I'm sending it as the Jumblr app appears to (raw binary). Is this correct?
Does anything else in my request look incorrect?
REQUEST:
NOTE: I learned that the OAuth signature should be generated WITHOUT the multipart form. My code takes that into account when building this request!
POST http://api.tumblr.com/v2/blog/**REMOVED**.tumblr.com/post HTTP/1.1
Content-Type: multipart/form-data, boundary=cbe6b79db1b3cbe6b79e104e
Authorization: OAuth oauth_signature="**REMOVED**", oauth_version="1.0", oauth_nonce="3181201716", oauth_signature_method="HMAC-SHA1", oauth_consumer_key="**REMOVED**", oauth_timestamp="1388791537", oauth_token="**REMOVED**"
Content-Length: 1001
User-Agent: Dalvik/1.6.0 (Linux; U; Android 4.3; SM-N900T Build/JSS15J)
Host: api.tumblr.com
Connection: Keep-Alive
Accept-Encoding: gzip
--cbe6b79db1b3cbe6b79e104e
Content-Disposition: form-data; name="type"
photo
--cbe6b79db1b3cbe6b79e104e
Content-Disposition: form-data; name="caption"
Another pic test...
--cbe6b79db1b3cbe6b79e104e
Content-Disposition: form-data; name="data[0]"; filename="postr_media_file_1388791537-1709648435.jpg"
Content-Type: image/jpeg
---- BINARY DATA REMOVED FOR BREVITY ----
RESPONSE:
HTTP/1.1 401 Not Authorized
Server: nginx
Date: Fri, 03 Jan 2014 23:25:39 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: tmgioct=52c746f34266840643527780; expires=Mon, 01-Jan-2024 23:25:39 GMT; path=/; httponly
P3P: CP="ALL ADM DEV PSAi COM OUR OTRo STP IND ONL"
3c
{"meta":{"status":401,"msg":"Not Authorized"},"response":[]}

I posted the answer in the "Tumblr API Discussion" Google Group. This is what I did:
The key to doing it correctly is NOT just signing without the multipart form!!! Here are the steps...
Add all fields EXCEPT the data field as regular URL-encoded POST body variables
Sign the request
Remove ALL of the POST variables you just added from the request
Add the multipart form, including the data field this time
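The signing part of these steps can be sketched with the JDK alone. To be clear, this is not the Scribe code the author used: it is a generic OAuth 1.0 HMAC-SHA1 sketch with hypothetical class and method names. Treating every field whose name starts with "data" as the binary part is my own simplification, and repeated parameter names are not handled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class PhotoPostSigner {

    // Percent-encodes with the RFC 3986 unreserved set, as OAuth 1.0 expects.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // Builds the signature base string from the OAuth parameters plus every
    // form field except the binary data field(s).
    static String baseString(String method, String url,
                             Map<String, String> oauthParams,
                             Map<String, String> formFields) {
        TreeMap<String, String> sorted = new TreeMap<>();
        oauthParams.forEach((k, v) -> sorted.put(enc(k), enc(v)));
        formFields.forEach((k, v) -> {
            // Assumption: data, data[0], data[1], ... are the file parts.
            if (!k.startsWith("data")) {
                sorted.put(enc(k), enc(v));
            }
        });
        StringJoiner joined = new StringJoiner("&");
        sorted.forEach((k, v) -> joined.add(k + "=" + v));
        return method.toUpperCase(Locale.ROOT) + "&" + enc(url) + "&" + enc(joined.toString());
    }

    // Signs the base string with HMAC-SHA1 and returns it Base64-encoded.
    static String sign(String baseString, String consumerSecret, String tokenSecret) {
        try {
            String key = enc(consumerSecret) + "&" + enc(tokenSecret);
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(baseString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA1 unavailable", e);
        }
    }
}
```

The resulting signature would go into the oauth_signature value of the Authorization header, while the type, caption and data fields themselves travel only in the multipart body.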
Some things to consider...
The Content-Type in the header should be "multipart/form-data"
The Content-Disposition of all form parts should be "form-data" and, of course, include a valid "name" attribute (e.g. type, caption, etc.)
The Content-Disposition of the data part should also include a "filename" attribute
The only form part that should contain a Content-Type is data, and it should be set to the MIME type of the file you are uploading (e.g. "image/jpeg")
I used "data[0]" as the name of the data field. I haven't tested this with just "data", but according to everything I've read it should work that way as well. If you are creating a photo set, I believe you simply add additional parts (e.g. data[1], data[2], etc.). Again, I haven't tested anything except "data[0]", so do your due diligence!!!
I did NOT encode the binary image data!!! I saw people spending considerable amount of time on this in other posts when adding the image as a POST body variable. If doing this as a multipart form, you can skip the encoding and send raw binary data! ;-)
I hope this helps someone! I've spent two weeks banging my head on random solid objects trying to figure this out. The implementation is very easy to do, but there is zero documentation available on how exactly to build POST requests for photos properly. The official docs really should include that. If I had known what I just posted above, I could have completed this in minutes instead of weeks!!!
The last request I posted earlier is still valid, but here it is again. Just remember what I mentioned about the signature!!!
REQUEST:
POST http://api.tumblr.com/v2/blog/REMOVED.tumblr.com/post HTTP/1.1
Content-Type: multipart/form-data, boundary=c60f7c041c02c60f7c046e9b
Authorization: OAuth oauth_signature="***REMOVED***", oauth_version="1.0", oauth_nonce="315351812", oauth_signature_method="HMAC-SHA1", oauth_consumer_key="***REMOVED***", oauth_timestamp="1388785116", oauth_token="***REMOVED***"
Content-Length: 1001
User-Agent: Dalvik/1.6.0 (Linux; U; Android 4.3; SM-N900T Build/JSS15J)
Host: api.tumblr.com
Connection: Keep-Alive
Accept-Encoding: gzip
--c60f7c041c02c60f7c046e9b
Content-Disposition: form-data; name="type"
photo
--c60f7c041c02c60f7c046e9b
Content-Disposition: form-data; name="caption"
Another pic test...
--c60f7c041c02c60f7c046e9b
Content-Disposition: form-data; name="data[0]"; filename="postr_media_file_1388785116-1709648435.jpg"
Content-Type: image/jpeg
***** BINARY DATA REMOVED FOR BREVITY *****
--c60f7c041c02c60f7c046e9b--

windows-1252 character 146 is stopping POST data reaching servlet in glassfish v2

An HTTP POST request is made to my servlet. My servlet code retrieves a posted form parameter named "payload" from the request for further processing. When the value of the payload includes the windows-1252 character "’" (byte value 146), the HttpServletRequest method getParameter("payload") returns null. There is nothing in the server.log related to the problem. We think the character encoding used to produce this character is windows-1252. The character encoding glassfish defaults to for http requests appears to be ISO-8859-1. Byte value 146 is a control character in ISO-8859-1.
Does anyone have any suggestions as to how I could solve this problem?
The http request headers in the post that showed the problem are:
POST /dbxchange/TechAnywhere HTTP/1.1
CONTENT_LENGTH: 13117
Content-type: application/x-www-form-urlencoded
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/4.0 (Windows Vista 6.0) Java/1.6.0_16
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 13117

Java doesn't care about the differences between Cp1252 and Latin-1. Since there are no invalid byte sequences in either encoding, you wouldn't get null with either one. I think your server is using UTF-8 and the browser is using Cp1252 or Latin-1.
Try adding the following attribute to the form to see if it helps:
<form action="..." method="post" accept-charset="UTF-8"...>

We think the character encoding used to produce this character is windows-1252.
Yes, very probably. Even when browsers claim to be using iso-8859-1, they are usually actually using windows-1252.
The character encoding glassfish defaults to for http requests appears to be ISO-8859-1
Most likely it is defaulting to your system's Java ‘default encoding’. This is rarely what you want, as it makes your application break when you redeploy it.
For reading POST request bodies, you should be able to fix the encoding by calling setCharacterEncoding on the request object, as long as you can do it early enough so that no-one has already caused it to read the body by calling methods such as getParameter. Try setting the encoding to "Cp1252". Although really you ought to be aiming for UTF-8 for everything in the long run.
Unfortunately there is not a standard J2EE way to specify what encoding your application expects for all requests (including query string parameters, which are not affected by setCharacterEncoding). Each server has its own way, which creates annoying deployment issues. But for Glassfish, set a <parameter-encoding> in your sun-web.xml.
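The difference between the encodings discussed in this thread is easy to check with the JDK alone. This small sketch (class name made up) decodes the single byte 146 (0x92) under three charsets. It assumes the JVM ships the windows-1252 charset, which mainstream JDKs do although the Java specification does not require it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Byte146Demo {

    // Decodes the single byte 146 (0x92) with the given charset.
    static String decode(Charset charset) {
        return new String(new byte[] {(byte) 0x92}, charset);
    }

    // Formats the first decoded character as a code point, e.g. "U+2019".
    static String codePoint(Charset charset) {
        return String.format("U+%04X", (int) decode(charset).charAt(0));
    }
}
```

codePoint(Charset.forName("windows-1252")) gives U+2019 (the right single quotation mark), codePoint(StandardCharsets.ISO_8859_1) gives U+0092 (a C1 control character), and codePoint(StandardCharsets.UTF_8) gives U+FFFD because a lone 0x92 is malformed UTF-8 and is replaced.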

We have found that the problem is in the JavaScript code that sends the POST request. The JavaScript code was URL-encoding the value of the payload before sending the request, using the built-in function escape(). This encoded the character in the non-standard form %u2019. It appears that Glassfish does not support this non-standard form of encoding.
See http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations
The fix was to use the built-in JavaScript function encodeURI(), which returns "%E2%80%99" for ’.
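The same distinction can be reproduced on the Java side with the JDK's own URLEncoder and URLDecoder. This is only an illustration with a made-up class name: whether Glassfish decodes parameters the same way is an assumption, and URLEncoder implements form encoding (a space becomes "+"), so it is not an exact equivalent of encodeURI().

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PercentEncodingDemo {

    // Standard percent-encoding of the UTF-8 bytes of the input.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Returns the decoded text, or null if the input is not valid
    // percent-encoding (for example the non-standard %uXXXX form).
    static String decodeOrNull(String s) {
        try {
            return URLDecoder.decode(s, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

Here encode("\u2019") produces %E2%80%99, which decodes back to the original character, while decodeOrNull("%u2019") returns null because "u2" is not a pair of hexadecimal digits.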
