OutputStreamWriter encoding vs response content-type


I have an OutputStreamWriter in my servlet that must use a particular encoding, i.e. I have to use this constructor:
OutputStreamWriter(OutputStream out, String charsetName)
I also use the following line to set the encoding of the response:
response.setContentType("text/html;charset=UTF-8")
I send the response to the client through this writer.
Which encoding will the browser use to decode the response: UTF-8 or charsetName?
Can someone explain why?

The line
OutputStreamWriter(OutputStream out, String charsetName)
tells the writer which charset to use for encoding.
The line
response.setContentType("text/html;charset=UTF-8")
sets the Content-Type header of the HTTP response and tells the browser which encoding to use when decoding the content.

The browser will handle the content based on the Content-Type header. The charset you use for the OutputStreamWriter only affects how characters written to it are encoded into bytes.
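The difference between the two settings can be seen without a servlet container. The following is a minimal sketch, not servlet code: a ByteArrayOutputStream stands in for the response stream, and the class and method names (WriterCharsetDemo, encode, decodeAs) are made up for illustration. It encodes text with one charset and then decodes it the way a browser would, using whatever charset the header announces.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterCharsetDemo {

    // Encodes text the way a servlet does when it wraps the response
    // stream in an OutputStreamWriter with an explicit charset.
    static byte[] encode(String text, Charset writerCharset) {
        // Stand-in for response.getOutputStream() (assumption for this sketch).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, writerCharset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes the body the way a client does: with the charset named
    // in the Content-Type header, regardless of how the bytes were produced.
    static String decodeAs(byte[] body, Charset headerCharset) {
        return new String(body, headerCharset);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with acute accent
        byte[] body = encode(text, StandardCharsets.ISO_8859_1);

        // Header matches the writer: the text survives. Prints true.
        System.out.println(decodeAs(body, StandardCharsets.ISO_8859_1).equals(text));
        // Header says UTF-8 but the writer used ISO-8859-1: the text is corrupted. Prints false.
        System.out.println(decodeAs(body, StandardCharsets.UTF_8).equals(text));
    }
}
```

So the two charsets have to agree: the writer decides which bytes go on the wire, and the header decides how the browser reads them.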

Related

How to read HTTP response headers with specified charset encoding when using HttpClient

There is a Chinese word in the response header. I must read the headers with the UTF-8 character encoding. But I don't know how to set this encoding in httpclient. How do I tell it which character encoding to use for headers?

If you're using Apache HttpClient, take care when converting the response body to a String with getResponseBodyAsString:
If the response is known to be a String, you can use the
getResponseBodyAsString method which will automatically use the
encoding specified in the Content-Type header or ISO-8859-1 if no
charset is specified.

RFC 7230, the standard for HTTP, notes:
Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset, supporting other charsets only
through use of RFC2047 encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset.
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.
So, how do you know the header field is encoded with UTF-8? I'm guessing that the server has not encoded the header value using RFC 2047. In that case, your client program should not try to interpret the header value as UTF-8 text, but should instead treat it as opaque data.
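If you do know out of band that the server puts raw UTF-8 octets in the header, one common workaround is to recover the original octets and decode them yourself. The sketch below assumes the client library handed you the header value decoded as ISO-8859-1 (check your HTTP client's documentation, this is an assumption). ISO-8859-1 maps every byte to exactly one character, so the round trip is lossless. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class HeaderBytesDemo {

    // Recovers the octets that were on the wire, assuming the client
    // decoded the header value as ISO-8859-1 (one byte per char).
    static byte[] rawOctets(String headerValue) {
        return headerValue.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reinterprets those octets as UTF-8. Only valid if the server
    // really sent UTF-8, which HTTP itself does not guarantee.
    static String reinterpretAsUtf8(String headerValue) {
        return new String(rawOctets(headerValue), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The UTF-8 bytes E4 B8 AD (U+4E2D) as they look after ISO-8859-1 decoding.
        String asSeenByClient = "\u00e4\u00b8\u00ad";
        // Prints true.
        System.out.println(reinterpretAsUtf8(asSeenByClient).equals("\u4e2d"));
    }
}
```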

How do I send an HTTP response without Transfer Encoding: chunked?

I have a Java Servlet that responds to the Twilio API. It appears that Twilio does not support the chunked transfer that my responses are using. How can I avoid using Transfer-Encoding: chunked?
Here is my code:
// response is HttpServletResponse
// xml is a String with XML in it
response.getWriter().write(xml);
response.getWriter().flush();
I am using Jetty as the Servlet container.

I believe that Jetty will use chunked responses when it doesn't know the response content length and/or it is using persistent connections. To avoid chunking you either need to set the response content length or to avoid persistent connections by setting "Connection":"close" header on the response.

Try setting the Content-Length before writing to the stream. Remember to count the bytes in the actual response encoding, not the characters of the string, e.g.:
final byte[] content = xml.getBytes("UTF-8");
response.setContentLength(content.length);
response.setContentType("text/xml"); // or "text/xml; charset=UTF-8"
response.setCharacterEncoding("UTF-8");
final OutputStream out = response.getOutputStream();
out.write(content);
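Why the byte count matters can be shown in isolation. This is a small sketch with a made-up helper name (contentLength) and a made-up sample document. It compares the character count of a string with its encoded size.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentLengthDemo {

    // The value that belongs in Content-Length: the number of encoded
    // bytes, which depends on the charset used for the response body.
    static int contentLength(String body, Charset charset) {
        return body.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String xml = "<name>\u56e3</name>"; // one non-ASCII character inside the element

        System.out.println(xml.length());                                // 14 characters
        System.out.println(contentLength(xml, StandardCharsets.UTF_8));  // 16 bytes
    }
}
```

Passing xml.length() to setContentLength() would understate the size by two bytes here, and the client would see a truncated body.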

The container decides for itself whether to use Content-Length or Transfer-Encoding, based on the size of the data written through the Writer or OutputStream. If the data is larger than HttpServletResponse.getBufferSize(), the response will be chunked. If not, Content-Length will be used.
In your case, simply removing the second line, the explicit flush(), should solve your problem.

Charset filter causing issue in parsing UTF-8 characters

I am using Spring MVC's charset filter. This is the URL that I use to invoke my servlet from my applet
http://192.168.0.67/MyServlet?p1=団
As you can see, the parameter has a unicode character 団. So I use
URLEncoder.encode("団", "UTF-8");
and now my URL becomes
http://192.168.0.67/MyServlet?p1=%E5%9B%A3
However, from the servlet, calling
request.getParameter("p1");
already returns gibberish that cannot be decoded with URLDecoder. BTW, invoking
URLDecoder.decode("%E5%9B%A3", "UTF-8");
does give me the original Unicode character. It's just that the servlet has garbled the parameter before it can even be decoded. Does anyone know why? Doesn't request.getParameter() decode parameters as UTF-8?

Spring MVC's charset filter only sets the request body encoding, not the request URI encoding. You need to set the charset for URI decoding in the servlet container configuration. Many servlet containers default to ISO-8859-1 when decoding the URI. It's unclear which servlet container you're using, so here's just an example for Tomcat: edit the <Connector> entry of /conf/server.xml to add URIEncoding="UTF-8":
<Connector ... URIEncoding="UTF-8">
If you can't edit the server's configuration for some reason (e.g. third-party hosting), then you should consider using POST instead of GET:
String query = "p1=" + URLEncoder.encode("団", "UTF-8");
URLConnection connection = new URL(getCodeBase(), "MyServlet").openConnection();
connection.setDoOutput(true); // This sets request method to POST.
connection.getOutputStream().write(query.getBytes("UTF-8"));
// ...
This way, in doPost() you can use ServletRequest#setCharacterEncoding() to tell the Servlet API which charset to use to parse the request body (or just rely on Spring MVC's charset filter to do this job):
request.setCharacterEncoding("UTF-8");
String p1 = request.getParameter("p1"); // You don't need to decode yourself!
// ...
See also:
Unicode - How to get the characters right?
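What the container does to the query string can be reproduced with URLDecoder alone. The sketch below is only an illustration of the charset mismatch, not of any container's internals. The class and method names are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UriDecodingDemo {

    // Percent-encodes a parameter value as the applet does.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Percent-decodes a parameter value with the given charset,
    // mimicking a container configured with that URI encoding.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("\u56e3", "UTF-8");
        System.out.println(encoded); // %E5%9B%A3

        // Decoded with the charset the client used: one character, the original.
        System.out.println(decode(encoded, "UTF-8").equals("\u56e3")); // true
        // Decoded as ISO-8859-1: each byte becomes its own character.
        System.out.println(decode(encoded, "ISO-8859-1").length());    // 3
    }
}
```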

UTF-8 works from browser not application

This is the sample URL:
http://abc.com/ABCServlet/abc?cmd=1&id=123&content=%E8%AE%8A
From the browser I receive 變, which is correct.
But from an application that does an HTTP POST using the same URL, I get è®�. It looks like double encoding or something similar. Does anyone have any ideas?

Since you get three characters, my guess is that you read the input stream without specifying an encoding.
Wrap the stream in InputStreamReader( stream, "UTF-8" ) or, even better, get the encoding from the HTTP header (see the docs of your HTTP framework how to do that).
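A minimal sketch of that advice, using a ByteArrayInputStream in place of the real HTTP stream. The helper name readAll is made up for illustration, and the three bytes are the UTF-8 encoding of 變 taken from the URL in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamDecodingDemo {

    // Reads the whole stream as text, decoding with an explicit charset
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xE8, (byte) 0xAE, (byte) 0x8A}; // UTF-8 bytes of U+8B8A

        // Correct charset: a single character. Prints 1.
        System.out.println(readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8).length());
        // Single-byte charset: three unrelated characters. Prints 3.
        System.out.println(readAll(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1).length());
    }
}
```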

Bad encoding of streamed CSV with Stripes / Tomcat

I'm trying to stream a CSV file. I set the encoding to windows-1252, but it still seems to be streamed as a UTF-8 file.
final String encoding = "windows-1252";
exportResolution = new StreamingResolution(builder.getContentType() + ";charset=" + encoding.toLowerCase()) {
@Override
public void stream(HttpServletResponse response) throws Exception {
// Set response headers
response.setHeader("Cache-control", "private, max-age=0");
response.setCharacterEncoding(encoding);
OutputStream os = response.getOutputStream();
writeExportStream(os,builder);
}
}.setFilename(filename);
writeExportStream just streams the content to the output stream (with pagination and DB calls, so it takes some time).
It works neither locally (Jetty plugin) nor on dev (Tomcat), and neither with Firefox nor with Chrome.
I haven't tested it myself, but people at work told me that it works better when we don't stream the content and instead write the file in one go, after loading all the objects we want from the DB.
Does anybody know what is happening? Thanks.
Btw my headers:
HTTP/1.1 200 OK
Content-Language: fr-FR
Content-Type: text/csv;charset=windows-1252
Content-Disposition: attachment;filename="export_rshop_01-02-11.csv"
Cache-Control: private, max-age=0
Transfer-Encoding: chunked
Server: Jetty(6.1.14)
I want the file to be importable into Excel as windows-1252, but I can't manage it: it just opens as UTF-8 even though my header says windows-1252.

The problem lies in the writeExportStream(os,builder); method. We can't see what encoding operations it is performing, but I'm guessing it is writing UTF-8 data.
The output operation needs to perform two encoding tasks:
1. Tell the client what encoding the response text is in (via the headers)
2. Encode the data written to the client in a matching encoding (e.g. via a writer)
Step 1 is being done correctly. Step 2 is probably the source of the error.
If you use the provided writer, it will encode character data in the appropriate response encoding.
If pre-encoded data is written via the raw byte stream (getOutputStream()), you need to make sure this process uses the same encoding.
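For the raw byte stream case, one way to keep the two in step is to wrap the stream in a writer that uses the same charset as the header. The sketch below is not the asker's writeExportStream; the method name writeCsv and the sample row are invented, a ByteArrayOutputStream stands in for response.getOutputStream(), and it assumes the JVM provides the windows-1252 charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CsvEncodingDemo {

    // Writes CSV text to a raw byte stream, encoding it with the
    // charset that is also announced in the Content-Type header.
    static void writeCsv(OutputStream os, String csv, Charset charset) {
        try {
            Writer writer = new OutputStreamWriter(os, charset);
            writer.write(csv);
            writer.flush(); // flush only; closing the response stream is left to the container
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed to be available in this JVM
        String row = "caf\u00e9;12,50 \u20ac"; // contains e-acute and the euro sign

        ByteArrayOutputStream asCp1252 = new ByteArrayOutputStream();
        writeCsv(asCp1252, row, cp1252);
        ByteArrayOutputStream asUtf8 = new ByteArrayOutputStream();
        writeCsv(asUtf8, row, StandardCharsets.UTF_8);

        System.out.println(asCp1252.size()); // 12: one byte per character
        System.out.println(asUtf8.size());   // 15: e-acute takes 2 bytes, the euro sign 3
    }
}
```

If the export code builds strings and calls getBytes() or new OutputStreamWriter(os) without a charset argument, it uses the platform default (or UTF-8 on recent JVMs), which would explain a UTF-8 body under a windows-1252 header.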
