I'm having a problem with the Java servlet's getParameter(), which does not decode the parameter even though I've set Tomcat's encoding properly in server.xml.
<Connector port.. URIEncoding="UTF-8"/>
If I decode the raw query string myself I get the decoded query, but getParameter() does not decode it by itself!
protected void service(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
System.out.println("CharacterEncoding: "+ request.getCharacterEncoding());
System.out.println("Query String: "+ URLDecoder.decode(request.getQueryString(), "UTF-8"));
System.out.println("Query param name: "+request.getParameter("name"));
...
The result I get is as follows:
CharacterEncoding: UTF-8
Query String: name=日本語一番ぜソFOX_&'">•«Ç€Ö™»_αß_iİıI_Администратор_cœur d´Ouchy_𠀀𠄂𪛖_عربي
Query param name: æ¥æ¬èªä¸çªãã½ï¼¦ï¼¯ï¼¸_&'">â¢Â«Ãâ¬Ãâ¢Â»_αÃ_iİıI_ÐдминиÑÑÑаÑоÑ_cÅur d´Ouchy_ð ð ðª_عربÙ
You can clearly see that the query string and the value of name are not the same!
In my JSP page I'm using <%@page contentType="text/html; charset=UTF-8" %>
I understand that this concerns a GET request. Setting <Connector URIEncoding="UTF-8"> should do it. That it doesn't work can only mean that you're running Tomcat from inside an IDE like Eclipse, and that the IDE hasn't been configured to take over Tomcat's own configuration while you've edited Tomcat's own /conf/server.xml.
It's unclear which IDE you're using, but if it's Eclipse, you need to either edit the server.xml file in the workspace's Servers project instead of Tomcat's own /conf/server.xml file,
or configure Eclipse to take control of Tomcat's installation by double-clicking the Tomcat server entry in the Servers view and changing the Server Locations section accordingly.
Back to your investigation/fixing attempts: request.getCharacterEncoding() isn't used to decode GET query strings (that's beyond the control of the Servlet API); it's only used to decode POST request bodies. The <%@page pageEncoding="UTF-8"%> directive only sets the character encoding of the response and of subsequent form submits.
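The garbled value in the question looks like what you get when UTF-8 bytes are decoded as ISO-8859-1. Here is a minimal sketch that reproduces the effect outside a container. The class MojibakeDemo and its methods are made up for illustration; this is not Tomcat's code, just the same byte-level mistake in isolation.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Simulates a decoder that wrongly treats UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes the mistake: ISO-8859-1 maps every byte to exactly one char,
    // so the original bytes can be recovered and decoded as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "c\u0153ur";           // "cœur", written with an escape
        String garbled = misdecode(name);    // the 2-byte œ becomes two chars
        System.out.println(name.length() + " chars before, "
                + garbled.length() + " chars after misdecoding");
        System.out.println("repaired equals original: "
                + repair(garbled).equals(name));
    }
}
```

The repair step only works as long as the wrong charset was ISO-8859-1; fixing the container configuration is still the proper solution.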
See also
Unicode - How to get the characters right?
I have a JSP page that shows data in a formatted way. The browser can call Spring's showInfo.do, which forwards to that JSP.
i.e.
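public void showInfo(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Paths passed to ServletContext#getRequestDispatcher() must start with "/".
    RequestDispatcher rd = getServletContext().getRequestDispatcher("/info.jsp");
    rd.forward(request, response);
}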
The output of the JSP is html.
Now I want to save this JSP output manually from my java server side code (not in a servlet context), something like this:
void saveInfo() {
params.setParameter("info1", "data");
String responseStr = Invoke("info.jsp", params);
//save responseStr to disk
}
I want to be able to save the HTML page to disk from a service and make it look the same as what a user sees in a browser, so that if the server is offline a user can double-click the saved HTML file and see the last info in their browser.
Any idea how this can be done?
Oops. The servlet specification requires the servlet container to be able to execute a JSP file. This is commonly done by converting the JSP to plain Java and then generating a servlet class file.
If you are outside of a servlet container you must:
* either fully implement a JSP execution environment, for example by using sources from a servlet container like Tomcat
* or rely on a servlet container to convert the JSP file to a .java or .class servlet and then use the Servlet interface methods on it
Alternatively, you could try to use a headless browser to capture the output of the application.
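In the same spirit as the headless browser idea, a plain HTTP fetch against the running container may be enough when the page does not depend on JavaScript. The following is a rough sketch of that swapped-in technique, not a way to run the JSP outside a container. The class PageSnapshot and its methods are hypothetical names, and the URL and target file are passed in by the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PageSnapshot {

    // Asks the running container for the rendered page, as a browser would.
    static byte[] fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes(); // requires Java 9 or newer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the bytes unchanged so the file keeps the page's own encoding.
    static void save(byte[] html, Path target) {
        try {
            Files.write(target, html);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: PageSnapshot <url> <target-file>");
            return;
        }
        save(fetch(args[0]), Paths.get(args[1]));
    }
}
```

It could be called with something like the showInfo.do URL of your own application (host, port, and context path are yours to fill in). Relative links to CSS or images in the saved file will not resolve offline unless those resources are saved as well.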
After a lot of trial and error I still can't figure out the problem. The JSP, servlet, and database are all set to accept UTF-8 encoding, but whenever I use request.getParameter on anything that has multi-byte characters like the em dash, they still come back scrambled as broken characters.
I've made manual submissions to the database and it's able to accept these characters, no problem. And if I pull the text from the database in a servlet and print it in my jsp page's form it displays no problem.
The only time I've found that it comes back as broken characters is when I try and display it elsewhere after retrieving it using request.getParameter.
Has anyone else had this problem? How can I fix it?
That can happen if the request and/or response encoding isn't properly set.
For GET requests, you need to configure it at the servlet container level. It's unclear which one you're using, but in Tomcat, for example, that's done with the URIEncoding attribute of the <Connector> element in its /conf/server.xml.
<Connector ... URIEncoding="UTF-8">
For POST requests, you need to create a filter that is mapped to a URL pattern covering all those POST requests, e.g. *.jsp or even /*. Do the following in doFilter():
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
For HTML responses and client-side encoding of submitted HTML form input values, you need to set the JSP page encoding. Add this to the top of the JSP (you've probably already done it properly, given that displaying UTF-8 straight from the DB works fine).
<%@page pageEncoding="UTF-8" %>
Or, to avoid copy-pasting this into every single JSP, configure it once in web.xml:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
For source code files and stdout (IDE console), you need to set the IDE workspace encoding. It's unclear which one you're using, but in Eclipse, for example, that's done by setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.
Do note that HTML <meta http-equiv> tags are ignored when the page is served over HTTP. They're only considered when the page is opened from the local disk file system via file://. Also, specifying <form accept-charset> is unnecessary, as it already defaults to the response encoding used when serving the HTML page with the form. See also the W3 HTML specification.
See also:
Unicode - How to get the characters right?
Why does POST not honor charset, but an AJAX request does? tomcat 6
HTML : Form does not send UTF-8 format inputs
Unicode characters in servlet application are shown as question marks
Bad UTF-8 encoding when writing to database (reading is OK)
BalusC's answer is correct, but I just want to add that it is important (for the POST method, of course) that
request.setCharacterEncoding("UTF-8");
is called before you read any parameter. This is how reading a parameter is implemented:
@Override
public String getParameter(String name) {
if (!parametersParsed) {
parseParameters();
}
return coyoteRequest.getParameters().getParameter(name);
}
As you can see, there is a flag parametersParsed that is set when you read any parameter for the first time. The parseParameters() method will parse all the request's parameters and set the encoding.
Calling:
request.setCharacterEncoding("UTF-8");
after the parameters were parsed will have no effect! That is why some people are complaining that setting the request's encoding is not working.
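To make the ordering problem concrete, here is a toy model of that lazy parsing. It is only a sketch: LazyParamRequest is an invented class that holds a single raw parameter value, not Tomcat's Request, and the ISO-8859-1 default is an assumption matching the default mentioned for POST bodies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LazyParamRequest {
    private final byte[] rawValue;  // parameter bytes as received on the wire
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed default
    private boolean parametersParsed = false;
    private String parsedValue;

    LazyParamRequest(byte[] rawValue) {
        this.rawValue = rawValue.clone();
    }

    void setCharacterEncoding(Charset charset) {
        this.encoding = charset;
    }

    // Decodes on first access only; later calls return the cached result,
    // so an encoding set afterwards is never applied.
    String getParameter() {
        if (!parametersParsed) {
            parsedValue = new String(rawValue, encoding);
            parametersParsed = true;
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        byte[] raw = "\u00F1".getBytes(StandardCharsets.UTF_8); // "ñ" as UTF-8

        LazyParamRequest early = new LazyParamRequest(raw);
        early.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(early.getParameter().equals("\u00F1")); // true

        LazyParamRequest late = new LazyParamRequest(raw);
        late.getParameter();                                // parsed too early
        late.setCharacterEncoding(StandardCharsets.UTF_8);  // no effect now
        System.out.println(late.getParameter().equals("\u00F1")); // false
    }
}
```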
Most answers here suggest using a servlet filter and setting the character encoding there. This is correct, but be aware that some security libraries can read request parameters before your filter runs (this was my case). If your filter is executed after that, the character encoding of the request parameters is already fixed, and setting UTF-8 or any other encoding will have no effect.
The Tomcat FAQ covers this topic pretty well. Particularly:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
and http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q4
The test JSP given in the FAQ is essentially the one I used when going through Tomcat years ago fixing various encoding issues.
Just want to add a point in case anyone else makes the same mistake as me, where I overlooked the POST method.
I read all these solutions and applied them to my code, but it still didn't work because I forgot to add method="POST" to my <form> tag.
Use a Filter as stated here: https://www.baeldung.com/tomcat-utf-8
P.S. If you are using a Servlet API older than 4.0 (where Filter's init and destroy are not yet default methods), you can easily work around it by defining empty "init" and "destroy" methods:
package sample;
import javax.servlet.*;
import java.io.IOException;
public class CharacterSetFilter implements Filter {
public void doFilter(ServletRequest request, ServletResponse response,
FilterChain chain) throws IOException, ServletException {
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
public void init(FilterConfig filterConfig) throws ServletException {
}
public void destroy() {
}
}
then, in web.xml:
<filter>
<filter-name>CharacterSetFilter</filter-name>
<filter-class>sample.CharacterSetFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>CharacterSetFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
I have a default installation of Tomcat 8.5.6. It seems like UTF-8 encoded requests are not being interpreted correctly, even though the docs say the default (if not in strict mode) should be UTF-8 everywhere these days. My java POST requests look like:
HttpPost post = new HttpPost(url);
post.setEntity(new UrlEncodedFormEntity(nameValuePairs, HTTP.UTF_8));
...
Testing, I see the tilde character ñ is not decoded correctly in my servlet handler:
public class MyServlet extends HttpServlet {
protected void doPost(HttpServletRequest request, ...) {
String tildeTest = request.getParameter("foo"); // no good.
}
}
if I explicitly set the encoding on the request before access, it decodes properly:
protected void doPost(HttpServletRequest request, ...) {
request.setCharacterEncoding("UTF-8");
String tildeTest = request.getParameter("foo"); // works!
...
}
so I'm not sure if:
* Tomcat 8.5.6 is not really using UTF-8 everywhere, and I need to set that manually in the config files somewhere.
* My http request is missing some header which tells Tomcat which encoding to use - perhaps the http post is defaulting to some other encoding which Tomcat is just honoring.
Anyone know which one?
Thanks
https://wiki.apache.org/tomcat/FAQ/CharacterEncoding
POST requests should specify the encoding of the parameters and values
they send. Since many clients fail to set an explicit encoding, the
default is used (ISO-8859-1).
What can you recommend to just make everything work? (How to use UTF-8 everywhere).
Six ways are listed to ensure this; for servlet requests, the first two should be relevant:
1. Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
2. Use a character encoding filter with the default encoding set to UTF-8.
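To see why the server-side default matters, here is a small sketch using only the JDK's URLEncoder and URLDecoder. The class FormDecodingDemo is invented for illustration; it mimics a client that encodes a form value as UTF-8 and a server that decodes it with either charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormDecodingDemo {

    // Percent-encodes a form value using the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Percent-decodes a form value, interpreting the bytes in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String wire = encode("\u00F1", "UTF-8"); // "ñ" as sent by a UTF-8 client
        System.out.println(wire);                // prints %C3%B1
        // Decoding with the matching charset restores the character:
        System.out.println(decode(wire, "UTF-8").equals("\u00F1"));      // true
        // Decoding with the ISO-8859-1 default yields two wrong characters:
        System.out.println(decode(wire, "ISO-8859-1").equals("\u00F1")); // false
    }
}
```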
I am having a problem with a redirect in JSP: the page just stays where it is and doesn't throw any error.
I am able to redirect when I write the script directly in my login.jsp, like this:
<%
String redirectURL = "/client/index.jsp";
response.sendRedirect(redirectURL);
%>
<t:login title="Client Login">
..........
</t:login>
But I am unable to redirect when I split the file into three and include them. Below is my implementation.
login.jsp
<%@include file="/include/checkhandler.jsp"%>
checkhandler.jsp - this is a script that checks for a file in the handler folder and includes it when it exists.
......
request.getRequestDispatcher(handler).include(request, response);
......
login_handler.jsp - this is the file the dispatcher will include
String redirectURL = "/client/index.jsp";
response.sendRedirect(redirectURL);
out.println("hello world");
After I execute this script, hello world is displayed, but the browser stays on the same page without any error.
You need to use RequestDispatcher#forward() instead. Change your checkhandler.jsp to
request.getRequestDispatcher(handler).forward(request, response);
A server-side include is not allowed to change the response status code, which is what happens when you use sendRedirect(). Any such attempt is simply ignored by the container.
From the RequestDispatcher#include() docs:
The ServletResponse object has its path elements and parameters remain
unchanged from the caller's. The included servlet cannot change the
response status code or set headers; any attempt to make a change is
ignored.
This limitation is by design. The spec treats the included web component as a guest: it cannot direct the flow, and any such attempt is ignored rather than raising an exception, so that any servlet you may have can be included.
Only the hosting web component (the one doing an include) would be in complete control of the flow as well as what response headers are sent over to the client.
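The contract quoted above can be pictured with a toy model. This is a sketch only: ToyResponse is an invented class that mimics the documented behaviour (redirects are silently dropped while an include is running), not the way any real container implements it.

```java
class ToyResponse {
    private int status = 200;
    private String location;   // value of the Location header, if any
    private boolean included;  // true while an included component is running

    void sendRedirect(String url) {
        if (included) {
            return; // silently ignored, mirroring the include() contract
        }
        status = 302;
        location = url;
    }

    // Runs a component as a "guest": it may not change status or headers.
    void include(Runnable component) {
        boolean previous = included;
        included = true;
        try {
            component.run();
        } finally {
            included = previous;
        }
    }

    // Runs a component that takes over the response.
    void forward(Runnable component) {
        component.run();
    }

    int getStatus() { return status; }

    String getLocation() { return location; }

    public static void main(String[] args) {
        ToyResponse viaInclude = new ToyResponse();
        viaInclude.include(() -> viaInclude.sendRedirect("/client/index.jsp"));
        System.out.println(viaInclude.getStatus()); // prints 200

        ToyResponse viaForward = new ToyResponse();
        viaForward.forward(() -> viaForward.sendRedirect("/client/index.jsp"));
        System.out.println(viaForward.getStatus()); // prints 302
    }
}
```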
You have this in your code
out.println("hello world");
String redirectURL = "/client/index.jsp";
response.sendRedirect(redirectURL);
which will not work because you cannot redirect after writing to the response stream. The redirect is sent in the response header. The response body should not contain any html.
I have a link like this in a JSP page encoded in Big5:
http://hello/world?name=婀ㄉ
When I enter it in the browser's URL bar, it is changed to something like
http://hello/world?name=%23%24%23
When we read this parameter in the JSP page, all the characters are corrupted.
We have set this:
request.setCharacterEncoding("UTF-8"), so all requests should be converted to UTF-8.
But why doesn't it work in this case?
Thanks in advance!
When you enter the URL in the browser's address bar, the browser may convert the character encoding before URL-encoding. However, this behavior is not well defined; see my question,
Handling Character Encoding in URI on Tomcat
We mostly get UTF-8 and Latin-1 on newer browsers, but we get all kinds of encodings (including Big5) on old ones. So it's best to avoid non-ASCII characters in URLs entered directly by the user.
If the URL is embedded in JSP, you can force it into UTF-8 by generating it like this,
String link = "http://hello/world?name=" + URLEncoder.encode(name, "UTF-8");
On Tomcat, the encoding needs to be specified on Connector like this,
<Connector port="8080" URIEncoding="UTF-8"/>
You also need to call request.setCharacterEncoding("UTF-8") for the body encoding, but it's not safe to do this in the servlet: it only works while the parameters have not been processed yet, and another filter or valve may trigger that processing first. So you should do it in a filter. Tomcat comes with such a filter in the source distribution.
To avoid fiddling with server.xml, use:
protected static final String CHARSET_FOR_URL_ENCODING = "UTF-8";
protected String encodeString(String baseLink, String parameter)
throws UnsupportedEncodingException {
return String.format(baseLink + "%s",
URLEncoder.encode(parameter, CHARSET_FOR_URL_ENCODING));
}
// Used in the servlet code to generate GET requests
response.sendRedirect(encodeString("userlist?name=", name));
To actually get those parameters on Tomcat, you need to do something like:
final String name =
new String(request.getParameter("name").getBytes("iso-8859-1"), "UTF-8");
Apparently (?) request.getParameter() URL-decodes the string and interprets it as ISO-8859-1, or whatever URIEncoding is set to in server.xml. For an example of how to get the URIEncoding charset from server.xml for Tomcat 7 see here
You cannot have non-ASCII characters in a URL; you always need to percent-encode them. When doing so, browsers have difficulties rendering them. Rendering works best if you encode the URL in UTF-8, and then percent-encode it. For your specific URL, this would give http://hello/world?name=%E5%A9%80%E3%84%89 (check your browser what it gives for this specific link). When you get the parameter in JSP, you need to explicitly unquote it, and then decode it from UTF-8, as the browser will send it as-is.
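The percent-encoded form quoted above can be reproduced with the JDK alone. A minimal sketch, assuming the two characters from the question; the class LinkEncodingDemo and its method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class LinkEncodingDemo {

    // Appends the UTF-8 percent-encoded parameter value to the base link.
    static String buildLink(String base, String name) {
        try {
            return base + URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverses the encoding of a single parameter value.
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u5A40\u3109"; // the two characters from the question
        String link = buildLink("http://hello/world?name=", name);
        System.out.println(link); // http://hello/world?name=%E5%A9%80%E3%84%89
        System.out.println(decodeValue("%E5%A9%80%E3%84%89").equals(name)); // true
    }
}
```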
I had a problem with JBoss 7.0, and I think this filter solution also works with Tomcat:
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
HttpServletRequest httpRequest = (HttpServletRequest) request;
HttpServletResponse httpResponse = (HttpServletResponse) response;
try {
httpRequest.setCharacterEncoding(MyAppConfig.getAppSetting("System.Character.Encoding"));
String appServer = MyAppConfig.getAppSetting("System.AppServer");
if(appServer.equalsIgnoreCase("JBOSS7")) {
Field requestField = httpRequest.getClass().getDeclaredField("request");
requestField.setAccessible(true);
Object requestValue = requestField.get(httpRequest);
Field coyoteRequestField = requestValue.getClass().getDeclaredField("coyoteRequest");
coyoteRequestField.setAccessible(true);
Object coyoteRequestValue = coyoteRequestField.get(requestValue);
Method getParameters = coyoteRequestValue.getClass().getMethod("getParameters");
Object parameters = getParameters.invoke(coyoteRequestValue);
Method setQueryStringEncoding = parameters.getClass().getMethod("setQueryStringEncoding", String.class);
setQueryStringEncoding.invoke(parameters, MyAppConfig.getAppSetting("System.Character.Encoding"));
Method setEncoding = parameters.getClass().getMethod("setEncoding", String.class);
setEncoding.invoke(parameters, MyAppConfig.getAppSetting("System.Character.Encoding"));
}
} catch (NoSuchMethodException nsme) {
System.err.println(nsme.getLocalizedMessage());
nsme.printStackTrace();
MyLogger.logException(nsme);
} catch (InvocationTargetException ite) {
System.err.println(ite.getLocalizedMessage());
ite.printStackTrace();
MyLogger.logException(ite);
} catch (IllegalAccessException iae) {
System.err.println(iae.getLocalizedMessage());
iae.printStackTrace();
MyLogger.logException(iae);
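} catch(Exception e) {
MyLogger.logException(e);
}
try {
httpResponse.setCharacterEncoding(MyAppConfig.getAppSetting("System.Character.Encoding"));
} catch(Exception e) {
MyLogger.logException(e);
}
// Pass the request on; without this call it never reaches the servlet.
chain.doFilter(request, response);
}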
I did quite a bit of searching on this issue, so this might help others who are experiencing the same problem on Tomcat. This is taken from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding.
(How to use UTF-8 everywhere).
1. Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
2. Use a character encoding filter with the default encoding set to UTF-8.
3. Change all your JSPs to include charset name in their contentType. For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents).
4. Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8. Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").
5. Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
6. Disable any valves or filters that may read request parameters before your character encoding filter or jsp page has a chance to set the encoding to UTF-8.