UTF-8 works from browser not application

This is the sample url
http://abc.com/ABCServlet/abc?cmd=1&id=123&content=%E8%AE%8A
From the browser I receive 變, which is correct.
But from an application that does an HTTP POST using the same URL, I get è®�. It looks like double encoding or something similar. Does anyone have any ideas?

Since you get three characters, my guess is that you read the input stream without specifying an encoding, so each of the three UTF-8 bytes is decoded as a separate character.
Wrap the stream in InputStreamReader( stream, "UTF-8" ) or, even better, get the encoding from the HTTP header (see the docs of your HTTP framework for how to do that).
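As a minimal sketch of the difference, assuming the request body contains exactly the three bytes behind %E8%AE%8A: the class name StreamDecoding and the helper readAll are hypothetical and only serve to illustrate the point. Decoding with UTF-8 yields one character, while a single-byte charset such as ISO-8859-1 yields three.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamDecoding {

    // Hypothetical helper: reads the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three bytes that %E8%AE%8A in the sample URL stands for.
        byte[] body = {(byte) 0xE8, (byte) 0xAE, (byte) 0x8A};

        String asUtf8 = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String asLatin1 = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // number of characters when decoded as UTF-8
        System.out.println(asLatin1.length()); // number of characters when decoded one byte at a time
    }
}
```

If the charset is omitted, InputStreamReader falls back to the platform default, which is why the same code can behave differently on different machines.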

Related

Fix for SpotBugs bug - HRS_REQUEST_PARAMETER_TO_HTTP_HEADER

I am executing the code below inside a servlet and getting this SpotBugs warning: HRS_REQUEST_PARAMETER_TO_HTTP_HEADER
Bug: HTTP parameter directly written to HTTP header output in SSOIdpLogoutRedirect.doPost(HttpServletRequest, HttpServletResponse)
String relayState = request.getParameter("RELAY_STATE");
if(relayState != null)
{
response.sendRedirect(relayState);
}
To fix this bug I added the code below.
relayState = URLEncoder.encode(relayState,StandardCharsets.UTF_8);
But the URL no longer redirects correctly, because the relayState URL is changed by the encoding.
original relayState = https://sad.ezhdj.net/system/web/apps/dfgh/
and after encoding it is
relayState = https%3A%2F%2Fsad.ezhdj.net%2Fsystem%2Fweb%2Fapps%2Fdfgh%2F

You should use HttpServletResponse.encodeRedirectURL() to encode redirect URLs:
String encodeRedirectURL(String url)
Encodes the specified URL for use in the sendRedirect method or, if
encoding is not needed, returns the URL unchanged. The implementation
of this method includes the logic to determine whether the session ID
needs to be encoded in the URL.
...
All URLs sent to the HttpServletResponse.sendRedirect method should be
run through this method...
This should work:
response.sendRedirect(response.encodeRedirectURL(relayState));
Since your URL doesn't actually need encoding, the output from encodeRedirectURL() will be:
https://sad.ezhdj.net/system/web/apps/dfgh/
and the redirect will work just fine.
Edit:
Apparently the proposed solution still triggers the HRS_REQUEST_PARAMETER_TO_HTTP_HEADER SpotBugs error.
After a little more research I found out that the error is meant to prevent the HTTP response splitting vulnerability (i.e. when unwanted \r\n sequences are written into the header section of the HTTP response).
So we should sanitize relayState against this type of vulnerability.
A simple relayState.replace("\r\n", "") is enough to make the error go away:
response.sendRedirect(response.encodeRedirectURL(relayState.replace("\r\n", "")));
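Note that replace("\r\n", "") only removes the two-character CR LF sequence, so a lone \r or a lone \n would survive. A slightly stricter sketch is shown below; the class RedirectSanitizer and the method stripLineBreaks are hypothetical names, and whether SpotBugs accepts this variant as a fix has not been verified here.

```java
final class RedirectSanitizer {

    // Hypothetical helper: removes every carriage return and line feed,
    // whether they appear as a pair or on their own.
    static String stripLineBreaks(String value) {
        return value.replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        String clean = "https://sad.ezhdj.net/system/web/apps/dfgh/";
        String tainted = "https://sad.ezhdj.net/\r\nSet-Cookie: x=1";

        // A URL without line breaks passes through unchanged.
        System.out.println(stripLineBreaks(clean));
        // The injected header can no longer start on its own line.
        System.out.println(stripLineBreaks(tainted));
    }
}
```

In the servlet this would be used as response.sendRedirect(response.encodeRedirectURL(RedirectSanitizer.stripLineBreaks(relayState))).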

HttpServletRequest.getInputStream() does not unwrap chunked HTTP request

I am in the process of sending an HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunked encoding.
My process was basically to change the Transfer-Encoding header to be chunked and I've removed the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data I can see that the chunk encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTPConnector documentation and can't find anything that alludes to the chunked encoding w.r.t how a chunk encoded message should be handled within a HttpServlet. I can't see other StackOverflow questions querying this so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?

This is solved. As expected it was basic in my case.
The legacy system I was using provided hand-rolled methods to simplify the process of opening an HTTP connection, sending headers and then using an OutputStream to send the content via a POST. I didn't realise, and it was in a rather obscure location, but the behind-the-scenes helpers were detecting that I was not specifying a Content-Length, and so they added the Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double encoding the contents, hence my endpoint's (seeming) inability to decode it.
Case closed.
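To illustrate why double encoding looks like "Tomcat did not decode the stream", here is a minimal, self-contained sketch of chunk framing. The class ChunkedDemo and its encode/decode methods are hypothetical stand-ins for the in-house ChunkedOutputStream and for the container's decoding; they ignore chunk extensions and trailers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedDemo {

    // Frames data as HTTP/1.1 chunks: "<hex size>\r\n<data>\r\n" ... "0\r\n\r\n".
    static byte[] encode(byte[] data, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            writeAscii(out, Integer.toHexString(len) + "\r\n");
            out.write(data, off, len);
            writeAscii(out, "\r\n");
        }
        writeAscii(out, "0\r\n\r\n");
        return out.toByteArray();
    }

    // Removes exactly one layer of chunk framing, as a container would do once.
    static byte[] decode(byte[] framed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = pos;
            while (framed[lineEnd] != '\r') {
                lineEnd++;
            }
            String hex = new String(framed, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int len = Integer.parseInt(hex, 16);
            pos = lineEnd + 2;          // skip the CRLF after the size line
            if (len == 0) {
                break;                  // last chunk
            }
            out.write(framed, pos, len);
            pos += len + 2;             // skip the data and its trailing CRLF
        }
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        byte[] payload = "hello world".getBytes(StandardCharsets.US_ASCII);
        byte[] twice = encode(encode(payload, 4), 8);   // framing applied by caller and by helper
        byte[] seenByServlet = decode(twice);           // container strips one layer only
        // Still contains size lines and CRLFs from the inner layer.
        System.out.println(new String(seenByServlet, StandardCharsets.US_ASCII));
    }
}
```

The container removes one layer of framing, so whatever the servlet reads still carries the inner layer when the sender applied chunking twice.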

OutputStreamWriter encoding vs response content-type

I have an OutputStreamWriter in my Servlet that uses a particular encoding scheme, i.e. I have to use this constructor
OutputStreamWriter(OutputStream out, String charsetName)
Also, I have used the following line of code to set the encoding scheme of the response
response.setContentType("text/html;charset=UTF-8")
I am sending the response to the client through this writer.
Which scheme will the browser use for decoding: UTF-8 or charsetName?
Can someone explain why?

The line
OutputStreamWriter(OutputStream out, String charsetName)
tells the writer which charset to use for encoding.
The line
response.setContentType("text/html;charset=UTF-8")
sets the Content-Type header in the HTTP response and tells the browser which encoding to use for decoding the content.

The browser will handle the content based on the Content-Type header. The charset you use for the OutputStreamWriter only affects how characters written to it are encoded into bytes.
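In other words, the two charsets must agree. The following sketch simulates the situation without a servlet container: the class CharsetMismatchDemo and the method roundTrip are hypothetical, with a ByteArrayOutputStream standing in for the response stream and new String(bytes, charset) standing in for the browser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    // Encodes text the way the servlet's OutputStreamWriter would (writerCharset),
    // then decodes the bytes the way a browser would, based on the charset
    // declared in the Content-Type header (declaredCharset).
    static String roundTrip(String text, Charset writerCharset, Charset declaredCharset) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(wire, writerCharset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(wire.toByteArray(), declaredCharset);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "café", written with an escape to avoid source-encoding issues

        // Writer and header agree: the text survives.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // Writer uses ISO-8859-1 but the header says UTF-8: the text is damaged.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));
    }
}
```

In a servlet, one way to keep the two in sync is to pass the same charset name to both setContentType and the OutputStreamWriter constructor.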

Encoding of Response is incorrect using Apache HttpClient

I am calling a restful service that returns JSON using the Apache HttpClient.
The problem is I am getting different results in the encoding of the response when I run the code on different platforms.
Here is my code:
GetMethod get = new GetMethod("http://urltomyrestservice");
get.addRequestHeader("Content-Type", "text/html; charset=UTF-8");
...
HttpResponse response = httpexecutor.execute(request, conn, context);
response.setParams(params);
httpexecutor.postProcess(response, httpproc, context);
StringWriter writer = new StringWriter();
IOUtils.copy(response.getEntity().getContent(), writer);
When I run this on OS X, Asian characters come back fine, e.g. 張惠妹 appears correctly in the response. But when I run the same code on a Linux server, the characters are displayed as ???
The Linux server is an Amazon EC2 instance running Java 1.6.0_26-b03.
My local OS X machine is running 1.6.0_29-b11.
Any ideas are really appreciated!

If you look at the javadoc of org.apache.commons.io.IOUtils.copy(InputStream, Writer):
Copy bytes from an InputStream to chars on a Writer using the default
character encoding of the platform.
So that will give different results depending on the platform the client runs on (which is what you're seeing).
Also, Content-Type is usually a response header (unless you're using POST or PUT). The server is likely to ignore it (though you might have more luck with the Accept-Charset request header).
You need to parse the charset parameter of the response's Content-Type header, and use that to convert the response into a String (if it's a String you're actually after). I expect Commons HTTP has code that will do that automatically for you. If it doesn't, Spring's RestTemplate definitely does.
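A hand-rolled version of that parsing step could look like the sketch below. The class ContentTypeCharset and its methods are hypothetical and use only the JDK. Falling back to UTF-8 when no charset is declared is an assumption that suits JSON, not a general rule.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value,
    // e.g. "application/json; charset=UTF-8". Returns fallback if it is
    // absent or names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    // Decodes the response body with the declared charset (assumed fallback: UTF-8)
    // instead of the platform default.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String sample = "\u5F35\u60E0\u59B9"; // sample CJK text
        byte[] body = sample.getBytes(StandardCharsets.UTF_8);
        String decoded = decode(body, "application/json; charset=UTF-8");
        System.out.println(sample.equals(decoded));
    }
}
```

The important part is that the charset is chosen explicitly, so the result no longer depends on the default encoding of the machine running the client.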

I believe that the problem is not in the HTTP encoding but elsewhere (e.g. while reading or forming the answer). Where do you get the content from and how? Is this stored in a DB or file?

Jersey REST WS - request body UTF-8

I have a simple Jersey REST web service:
@POST
@Path("/label")
@Consumes(MediaType.TEXT_HTML)
public Response setLabels(String requestBody) {
System.out.println(requestBody);
......
}
The request passes some text with "special" non-English characters:
[{"За обекта"}]
I can see in Firebug that the request was sent with the correct UTF-8 content and charset:
Content-Type text/plain; charset=UTF-8
But on the server the output is not in the expected charset:
[{"?? ??????"}]
Any idea what went wrong, and where? How can I capture the text in the correct charset on the server side?

System.out is a PrintStream. It uses the platform default encoding, which is typically not UTF-8. So you are getting the correct data in, it's just getting mangled when you print it to the console.
I had the exact same problem a few weeks ago - drove me nuts until I figured it out. What made it worse is that I actually had an encoding-related bug in another part of the code.
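A small sketch of the effect, assuming nothing beyond the JDK: the class ConsoleEncodingDemo and the method printedBytes are hypothetical, and US-ASCII stands in for a platform default that cannot represent Cyrillic.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class ConsoleEncodingDemo {

    // Prints text through a PrintStream configured with the given charset name
    // and returns the bytes that would reach the console.
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(sink, true, charsetName)) {
            ps.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u0417\u0430"; // the first word of the sample request body

        // Characters the charset cannot represent are replaced with '?'.
        byte[] ascii = printedBytes(text, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // Wrapping System.out with an explicit charset avoids the platform default.
        // The terminal itself must also be set to UTF-8 for this to display correctly.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

The string inside the JVM is intact in both cases; only the bytes written to the console differ.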
