My Servlet just won't use UTF-8 for JSON responses.
MyServlet.java:
public class MyServlet extends HttpServlet {
protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException {
PrintWriter writer = res.getWriter();
res.setCharacterEncoding("UTF-8");
res.setContentType("application/json; charset=UTF-8");
writer.print(getSomeJson());
}
}
But special characters aren't showing up, and when I check the headers that I'm getting back in Firebug, I see Content-Type: application/json;charset=ISO-8859-1.
I did a grep -ri iso . in my Servlet directory, and came up with nothing, so nowhere am I explicitly setting the type to ISO-8859-1.
I should also specify that I'm running this on Tomcat 7 in Eclipse with a J2EE target as a development environment, with Solaris 10 and whatever they call their web server environment (somebody else admins this) as the production environment, and the behavior is the same.
I've also confirmed that the request submitted is UTF-8, and only the response is ISO-8859-1.
Update
I have amended the code to show that I was calling res.getWriter() before setting the character encoding. I omitted this from my original example, and I now realize it was the source of my problem. I read here that you have to set the character encoding before you call HttpServletResponse.getWriter(), or getWriter() will set it to ISO-8859-1 for you.
This was my problem. So the above example should be adjusted to
public class MyServlet extends HttpServlet {
protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException {
res.setCharacterEncoding("UTF-8");
res.setContentType("application/json");
PrintWriter writer = res.getWriter();
writer.print(getSomeJson());
}
}
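The ordering rule can be mimicked outside a container with plain JDK classes. The sketch below is only an analogy, not the servlet API: like the PrintWriter returned by getWriter(), an OutputStreamWriter is bound to one charset at the moment it is created, and characters that charset cannot represent are replaced on the way out. WriterCharsetDemo and encode are made-up names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates the "charset is fixed when the writer is created" rule.
class WriterCharsetDemo {

    // Encodes text through a Writer whose charset is chosen at construction time.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // The city name contains U+0141 and U+017A, which ISO-8859-1 cannot represent.
        String json = "{\"city\":\"\u0141\u00f3d\u017a\"}";

        byte[] latin1 = encode(json, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(json, StandardCharsets.UTF_8);

        // The Latin-1 writer replaced the unmappable characters with '?'.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(json));
        // The UTF-8 writer preserved the text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(json));
    }
}
```

In the servlet the same thing happens one level up: by the time setCharacterEncoding("UTF-8") runs, the writer already exists with the default charset.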
Once the encoding is set for a response, it cannot be changed.
The easiest way to force UTF-8 is to create your own filter which is the first to peek at the response and set the encoding.
Take a look at how Spring 3.0 does this. Even if you can't use Spring in your project, maybe you can get some inspiration (make sure your company policy allows you to get inspiration from open source licenses).
The code looks fine. Either you're not running the code you think you're running, or there's some Filter or proxy somewhere in the request-response chain which modifies the content type like that.
Aside from the specific problem, you really should consider getting the output stream and using a JSON library to write the contents directly as UTF-8 encoded JSON; there is no benefit to using writers.
Some JSON packages only work with strings, which is unfortunate, but most allow using more efficient streams (safer and more efficient as parser/generator can handle escaping and encoding aspects together).
Related
After a lot of trial and error I still can't figure out the problem. The JSP, servlet, and database are all set to accept UTF-8 encoding, but whenever I use request.getParameter on anything that has multi-byte characters like the em dash, they come back scrambled as broken characters.
I've made manual submissions to the database and it's able to accept these characters, no problem. And if I pull the text from the database in a servlet and print it in my jsp page's form it displays no problem.
The only time I've found that it comes back as broken characters is when I try and display it elsewhere after retrieving it using request.getParameter.
Has anyone else had this problem? How can I fix it?
That can happen if request and/or response encoding isn't properly set at all.
For GET requests, you need to configure it at the servlet container level. It's unclear which one you're using, but in for example Tomcat that's done with the URIEncoding attribute of the <Connector> element in its /conf/server.xml.
<Connector ... URIEncoding="UTF-8">
For POST requests, you need to create a filter which is mapped on the desired URL pattern covering all those POST requests. E.g. *.jsp or even /*. Do the following job in doFilter():
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
For HTML responses and client side encoding of submitted HTML form input values, you need to set the JSP page encoding. Add this to the top of the JSP (you've probably already done it properly, given that displaying UTF-8 straight from the DB works fine).
<%@page pageEncoding="UTF-8" %>
Or to prevent copypasting this over every single JSP, configure it once in web.xml:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
For source code files and stdout (IDE console), you need to set the IDE workspace encoding. It's unclear which one you're using, but in for example Eclipse that's done by setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.
Do note that HTML <meta http-equiv> tags are ignored when the page is served over HTTP. They are only considered when the page is opened from the local disk file system via file://. Also, specifying <form accept-charset> is unnecessary, as it already defaults to the response encoding used when serving the HTML page with the form. See also the W3 HTML specification.
See also:
Unicode - How to get the characters right?
Why does POST not honor charset, but an AJAX request does? tomcat 6
HTML : Form does not send UTF-8 format inputs
Unicode characters in servlet application are shown as question marks
Bad UTF-8 encoding when writing to database (reading is OK)
BalusC's answer is correct but I just want to add it is important (for POST method of course) that
request.setCharacterEncoding("UTF-8");
is called before you read any parameter. This is how reading parameter is implemented:
@Override
public String getParameter(String name) {
if (!parametersParsed) {
parseParameters();
}
return coyoteRequest.getParameters().getParameter(name);
}
As you can see, there is a flag parametersParsed that is set when you read any parameter for the first time; the parseParameters() method will parse all of the request's parameters and set the encoding.
Calling:
request.setCharacterEncoding("UTF-8");
after the parameters were parsed will have no effect! That is why some people are complaining that setting the request's encoding is not working.
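To make the "parse once" behaviour concrete, here is a toy model using only the JDK. It is not Tomcat's code: ToyRequest, its fields, and the ISO-8859-1 default are assumptions chosen to mirror the description above, where parameters are decoded lazily on first access and then cached.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a container request; NOT Tomcat's implementation.
class ToyRequest {
    private final String body;               // raw form body, e.g. "foo=%C3%B1"
    private String encoding = "ISO-8859-1";  // assumed default when nothing is set
    private Map<String, String> parameters;  // stays null until the first getParameter call

    ToyRequest(String body) {
        this.body = body;
    }

    void setCharacterEncoding(String encoding) {
        // Only affects parsing that has not happened yet.
        this.encoding = encoding;
    }

    String getParameter(String name) {
        if (parameters == null) {
            // All parameters are decoded here, once, with whatever encoding is set right now.
            parameters = parse(body, encoding);
        }
        return parameters.get(name);
    }

    private static Map<String, String> parse(String body, String encoding) {
        Map<String, String> result = new HashMap<>();
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip malformed pairs in this sketch
            }
            try {
                result.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return result;
    }
}
```

With the body "foo=%C3%B1&bar=1", calling setCharacterEncoding("UTF-8") first yields "ñ" for foo. Reading bar first and setting the encoding afterwards leaves foo decoded with the default, which gives the two-character garbage "Ã±".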
Most answers here suggest using a servlet filter and setting the character encoding there. This is correct, but be aware that some security libraries can read request parameters before your filter runs (this was my case). If your filter is executed after that, the character encoding of the request parameters is already set, and setting UTF-8 or any other encoding will have no effect.
The Tomcat FAQ covers this topic pretty well. Particularly:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
and http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q4
The test JSP given in the FAQ is essentially the one I used when going through Tomcat years ago fixing various encoding issues.
Just want to add a point in case anyone else makes the same mistake I did, where I overlooked the POST method.
I read all these solutions and applied them to my code, but it still didn't work because I forgot to add method="POST" to my <form> tag.
Use a Filter as stated here: https://www.baeldung.com/tomcat-utf-8
P.S. If you are compiling against a Servlet API older than 4.0 (where Filter.init and Filter.destroy are not yet default methods), you can easily work around it by defining empty "init" and "destroy" methods:
package sample;
import javax.servlet.*;
import java.io.IOException;
public class CharacterSetFilter implements Filter {
public void doFilter(ServletRequest request, ServletResponse response,
FilterChain chain) throws IOException, ServletException {
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
public void init(FilterConfig filterConfig) throws ServletException {
}
public void destroy() {
}
}
then, in web.xml:
<filter>
<filter-name>CharacterSetFilter</filter-name>
<filter-class>sample.CharacterSetFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>CharacterSetFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
I have a default installation of Tomcat 8.5.6. It seems like UTF-8 encoded requests are not being interpreted correctly, even though the docs say the default (if not in strict mode) should be UTF-8 everywhere these days. My Java POST requests look like:
HttpPost post = new HttpPost(url);
post.setEntity(new UrlEncodedFormEntity(nameValuePairs, HTTP.UTF_8));
...
Testing, I see that the character ñ (n with a tilde) is not decoded correctly in my servlet handler:
public class MyServlet extends HttpServlet {
protected void doPost(HttpServletRequest request, ...) {
String tildeTest = request.getParameter("foo"); // no good.
}
}
If I explicitly set the encoding on the request before access, it decodes properly:
protected void doPost(HttpServletRequest request, ...) {
request.setCharacterEncoding("UTF-8");
String tildeTest = request.getParameter("foo"); // works!
...
}
so I'm not sure if:
Tomcat 8.5.6 is not really using UTF-8 everywhere, and I need to set that manually in the config files somewhere.
My http request is missing some header which tells Tomcat which encoding to use - perhaps the http post is defaulting to some other encoding which Tomcat is just honoring.
Anyone know which one?
Thanks
https://wiki.apache.org/tomcat/FAQ/CharacterEncoding
POST requests should specify the encoding of the parameters and values
they send. Since many clients fail to set an explicit encoding, the
default is used (ISO-8859-1).
What can you recommend to just make everything work? (How to use UTF-8 everywhere).
There are 6 ways listed to ensure this; for servlet requests, 1 and 2 should be relevant:
Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
Use a character encoding filter with the default encoding set to UTF-8
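The FAQ's point about the ISO-8859-1 default can be reproduced without Tomcat. The following sketch uses only java.net.URLEncoder and URLDecoder to imitate a client that form-encodes in one charset and a server that decodes in another; FormEncodingRoundTrip and roundTrip are hypothetical names, and this is not how Tomcat is implemented internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo of a client/server charset mismatch on form parameters.
class FormEncodingRoundTrip {

    // Percent-encodes with the client's charset, then decodes with the server's.
    static String roundTrip(String value, String clientCharset, String serverCharset) {
        try {
            String onTheWire = URLEncoder.encode(value, clientCharset);
            return URLDecoder.decode(onTheWire, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "a\u00f1o"; // contains the n-with-tilde from the question

        // Both sides agree on UTF-8: the value survives.
        System.out.println(roundTrip(original, "UTF-8", "UTF-8").equals(original));      // prints true

        // Client sends UTF-8, server falls back to ISO-8859-1: the value is corrupted.
        System.out.println(roundTrip(original, "UTF-8", "ISO-8859-1").equals(original)); // prints false
    }
}
```

This matches the symptom in the question: the request itself is fine, and the corruption only appears when the receiving side decodes with a different default.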
When I run the code below, the output I receive is "in", but response.sendRedirect() does not run. The two Java servlet files "Class1" and "Servlet1" are in the same folder.
public class Class1 extends HttpServlet {
public void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException, ServletException {
response.setContentType("text/html; charset=ISO-8859-7");
PrintWriter out = new PrintWriter(response.getWriter(), true);
ArrayList list2 = (ArrayList)request.getAttribute("list_lo");
if (list2 == null || list2.isEmpty() ) {
out.println("in");
response.sendRedirect("Servlet1");
return;
}
}
}
Try using the file name with its extension instead of the class name. This has happened to me before, and doing just that made it work.
You cannot send redirect once you start returning any output.
You need to put the logic handling any possible redirects before any output is started (e.g. out.println() in your example). Putting the redirect logic at the very beginning of the method is a sensible thing to do anyways, since it should be the first thing you decide.
The reason redirecting fails once output has started lies in the HTTP protocol itself: a redirect is transmitted using response headers, which are separated from the response body by a blank line. Once you start writing the response body, there is no way to transmit any more headers (unless you are lucky and the output written so far is still sitting in the buffer).
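The headers-before-body rule can be sketched with a toy response object. This is not the servlet API: ToyCommittableResponse is a made-up class with no buffering, so the first write commits immediately. The real HttpServletResponse.sendRedirect likewise throws IllegalStateException once the response has been committed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of an HTTP response; NOT the servlet API.
class ToyCommittableResponse {
    private final Map<String, String> headers = new LinkedHashMap<>();
    private final StringBuilder body = new StringBuilder();
    private int status = 200;
    private boolean committed = false;

    void write(String text) {
        // No buffering in this toy: the first write "sends" the status line and headers.
        committed = true;
        body.append(text);
    }

    void sendRedirect(String location) {
        if (committed) {
            // Headers are already on the wire, so a Location header can no longer be added.
            throw new IllegalStateException("response already committed");
        }
        status = 302;
        headers.put("Location", location);
        committed = true;
    }

    int getStatus() {
        return status;
    }

    String getHeader(String name) {
        return headers.get(name);
    }
}
```

Calling sendRedirect("Servlet1") on a fresh instance sets status 302 and the Location header; calling write("in") first makes the same call throw, which is the situation in the question's code.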
I have a small servlet returning several html pages. The content of one of these pages is pretty complex, but changes only every hour or so. However, it is requested often by users. I want to avoid recomputing it at each request.
I was wondering whether it is possible to prepare a gzip-ed version in memory (byte array), and set it as the response to all HTML requests for this page. I would also recompute a new cached gzip-ed version every hour.
If this is possible, how can I do this? Should I use a filter? For the sake of this question, we can assume that all browsers can handle gzip-ed responses. I am looking for a code example.
After quite some googling, this seems to be the solution:
public class MyFilter implements Filter {
private byte[] my_gzipped_page = ....
public void doFilter(ServletRequest req, ServletResponse res,
FilterChain chain) throws IOException, ServletException {
if (req instanceof HttpServletRequest) {
HttpServletRequest request = (HttpServletRequest) req;
HttpServletResponse response = (HttpServletResponse) res;
String ae = request.getHeader("accept-encoding");
if (ae != null && ae.indexOf("gzip") != -1) {
response.addHeader("Content-Length",
Integer.toString(my_gzipped_page.length));
response.addHeader("Content-Encoding", "gzip");
OutputStream output = response.getOutputStream();
output.write(my_gzipped_page);
output.flush();
output.close();
return;
} else ...
}
}
...
}
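The filter above leaves open how the byte array is produced and refreshed. One possible way, using only java.util.zip, is sketched here. GzippedPageCache, its renderer callback, and the explicit nowMillis argument are all assumptions for illustration (the clock is passed in to keep the sketch easy to test); a real filter would call get(System.currentTimeMillis()) and write the result as shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical in-memory cache of one gzip-compressed page.
class GzippedPageCache {
    private final Supplier<String> renderer; // produces the expensive HTML
    private final long maxAgeMillis;         // e.g. one hour
    private byte[] gzipped;                  // cached compressed bytes, null until first use
    private long builtAtMillis;

    GzippedPageCache(Supplier<String> renderer, long maxAgeMillis) {
        this.renderer = renderer;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the cached bytes, re-rendering and re-compressing only when they are too old.
    synchronized byte[] get(long nowMillis) {
        if (gzipped == null || nowMillis - builtAtMillis >= maxAgeMillis) {
            gzipped = gzip(renderer.get());
            builtAtMillis = nowMillis;
        }
        return gzipped;
    }

    static byte[] gzip(String html) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(html.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the gzip stream is closed, so the trailer has been written
    }

    // Only used to check the round trip; a browser does this step itself.
    static String gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Refreshing lazily on request avoids a background thread, at the cost of one slow request per hour; a scheduled task would be the alternative.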
Why do it the hard way?
Open Tomcat's /conf/server.xml, lookup the <Connector> for your HTTP port and edit it as follows to add a new attribute:
<Connector ... compression="on">
Tomcat will then GZIP all responses matching compressableMimeType automagically when the client supports it. All other self-respecting web servers have a similar configuration setting.
It's not really clear from your question but I'm guessing you're looking for info on how to cache data, rather than how to serve compressed data. Most web servers will automatically compress data if configured to do so, and if the client supplies the necessary headers in the request. In other words, you don't need to compress the page before transmitting it, the server will compress it automatically if possible.
For caching, you can store a processed version of the page either on disk or in memory using for example memcache.
If you know that you'll only need to update the page, say, every hour, you can run a script, for example with crontab, to generate the page every hour and just serve the generated page. That should be fairly straightforward, as you don't really need to make special considerations for caching on the server side.
On the other hand, if you need to check whether the page is stale before deciding whether to use the cached version or a fresh one, it gets a little more complex. For example, it's possible that checking whether the data is stale is almost as costly as generating the page.
Can't really give a more specific answer without more details.
I recently had a problem with the encoding of websites generated by servlets, which occurred if the servlets were deployed under Tomcat, but not under Jetty. I did a little research and simplified the problem to the following servlet:
public class TestServlet extends HttpServlet implements Servlet {
@Override
public void service(HttpServletRequest request, HttpServletResponse response) throws IOException {
response.setContentType("text/plain");
Writer output = response.getWriter();
output.write("öäüÖÄÜß");
output.flush();
output.close();
}
}
If I deploy this under Jetty and direct the browser to it, it returns the expected result. The data is returned as ISO-8859-1 and if I take a look into the headers, then Jetty returns:
Content-Type: text/plain; charset=iso-8859-1
The browser detects the encoding from this header. If I deploy the same servlet in Tomcat, the browser shows strange characters. Tomcat also returns the data as ISO-8859-1; the difference is that no header says so. So the browser has to guess the encoding, and it guesses wrong.
My question is: is this behaviour of Tomcat correct, or is it a bug? And if it is correct, how can I avoid this problem? Sure, I can always add response.setCharacterEncoding("UTF-8"); to the servlet, but that means I set a fixed encoding that the browser might or might not understand. The problem is more relevant if it is not a browser but another service that accesses the servlet. So how should I deal with the problem in the most flexible way?
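What a wrong guess does to the bytes can be shown with the JDK alone. In this sketch (CharsetGuessDemo and misread are hypothetical names) the test string from the servlet is encoded as ISO-8859-1, as both containers do, and then decoded as UTF-8, which stands in for a client that guessed wrong because no charset was declared.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: ISO-8859-1 bytes read by a client that assumes UTF-8.
class CharsetGuessDemo {

    // Encodes as ISO-8859-1 (what the server sent) and decodes as UTF-8 (the wrong guess).
    static String misread(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Encodes and decodes with the same charset (what a declared charset makes possible).
    static String readAsDeclared(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The umlaut test string from the servlet, written with escapes to keep this source ASCII.
        String text = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";

        System.out.println(readAsDeclared(text).equals(text)); // prints true
        System.out.println(misread(text).equals(text));        // prints false
    }
}
```

The single bytes that ISO-8859-1 uses for these letters are not valid UTF-8 sequences on their own, so the decoder substitutes replacement characters, which is the "strange characters" effect described above.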
If you don't specify an encoding, the Servlet specification requires ISO-8859-1. However, AFAIK it does not require the container to set the encoding in the content type, at least not if you set it to "text/plain". This is what the spec says:
Calls to setContentType set the
character encoding only if the given
content type string provides a value
for the charset attribute.
In other words, only if you set the content type like this
response.setContentType("text/plain; charset=XXXX")
Tomcat is required to set the charset. I haven't tried whether this works though.
In general, I would recommend to always set the encoding to UTF-8 (as it causes the least amount of trouble, at least in browsers) and then, for text/plain, state the encoding explicitly, to prevent browsers from using a system default.
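The quoted rule can be expressed as a small toy model. This is not a container implementation: ToyContentTypeResponse is a hypothetical class that assumes the ISO-8859-1 default mentioned above, and it only imitates the behaviour described for Tomcat, where the charset appears in the header only if it was stated explicitly.

```java
import java.util.Locale;

// Toy model of the quoted setContentType rule; NOT a container implementation.
class ToyContentTypeResponse {
    private String characterEncoding = "ISO-8859-1"; // assumed default when nothing is set
    private boolean charsetExplicit = false;
    private String mimeType;

    void setContentType(String contentType) {
        String[] parts = contentType.split(";");
        mimeType = parts[0].trim();
        // The encoding changes only if a charset parameter is present.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                characterEncoding = param.substring("charset=".length()).trim();
                charsetExplicit = true;
            }
        }
    }

    String getCharacterEncoding() {
        return characterEncoding;
    }

    // Header as this toy would send it: the charset is included only when it was stated.
    String contentTypeHeader() {
        return charsetExplicit ? mimeType + ";charset=" + characterEncoding : mimeType;
    }
}
```

With "text/plain" the toy still encodes as ISO-8859-1 but advertises only "text/plain"; with "text/plain; charset=UTF-8" both the encoding and the header carry UTF-8.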
In support of Jesse Barnum's answer, the Apache wiki suggests that a filter can be used to control the character encoding of the request and the response. However, Tomcat 5.5 and up come bundled with a SetCharacterEncodingFilter, so it may be better to use Apache's implementation than Jesse's (no offense, Jesse). The Tomcat implementations only set the character encoding on the request, so modification may be necessary to use the filter as a means of setting the character set on the response of all servlets.
Specifically, Tomcat has example implementations here:
5.x
webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
6.x
webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
7.x
Since 7.0.20 the filter has been a first-class citizen: it was moved from the examples into core Tomcat and is available to any web application without the need to compile and bundle it separately. See the documentation for the list of filters provided by Tomcat. The class name is:
org.apache.catalina.filters.SetCharacterEncodingFilter
This page tells more: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q3
Here's a filter that I wrote to force UTF-8 encoding:
public class CharacterEncodingFilter implements Filter {
private static final Logger log = Logger.getLogger( CharacterEncodingFilter.class.getName() );
boolean isConnectorConfigured = false;
public void init( FilterConfig filterConfig ) throws ServletException {}
public void doFilter( ServletRequest request, ServletResponse response, FilterChain chain ) throws IOException, ServletException {
request.setCharacterEncoding( "utf-8" );
response.setCharacterEncoding( "utf-8" );
if( ! isConnectorConfigured ) {
isConnectorConfigured = true;
try { // I need to do all of this with reflection, because I get NoClassDefFoundErrors otherwise. --jsb
Field f = request.getClass().getDeclaredField( "request" ); //Tomcat wraps the real request in a facade, need to get it
f.setAccessible( true );
Object req = f.get( request );
Object connector = req.getClass().getMethod( "getConnector", new Class[0] ).invoke( req ); //Now get the connector
connector.getClass().getMethod( "setUseBodyEncodingForURI", new Class[] {boolean.class} ).invoke( connector, Boolean.TRUE );
} catch( NoSuchFieldException e ) {
log.log( Level.WARNING, "Servlet container does not seem to be Tomcat, cannot programmatically alter character encoding. Do this in the server.xml <Connector> attribute instead." );
} catch( Exception e ) {
log.log( Level.WARNING, "Could not setUseBodyEncodingForURI to true on connector" );
}
}
chain.doFilter( request, response );
}
public void destroy() {}
}
If you don't specify the encoding, Tomcat is free to encode your characters however it feels, and the browser is free to guess what encoding Tomcat picked. You are correct in that the way to solve the problem is response.setCharacterEncoding("UTF-8").
You shouldn't worry about the chance that the browser won't understand the encoding, as virtually all browsers released in the past 10 years support UTF-8. Though if you're really worried, you can inspect the "Accept-Charset" header provided by the user agent (Accept-Encoding, by contrast, is about compression such as gzip, not character sets).