Character encoding issue with Tomcat - java

There is some strange character encoding going on. I am using JSP (JSTL) and Struts with Tomcat 6.
I have my JSP page encoding as such:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
The issue is when I try to pass the url using encodeURI as such:
<script type="text/javascript">
$('#mailer_filter').change(function(){
var val = $(this).val();
console.log(val);
console.log(escape(val));
console.log(encodeURI(val));
location.href = 'mailList.a?' + encodeURI($(this).val());
});
</script>
the parameter on the action (java end) comes out as:
Gaz MÃ©tro
however on the front end it is displayed as:
Gaz Métro
which is the correct way. What can I do about this?

Do the following:
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In the Connector tag, add the "URIEncoding" attribute:
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - add the following (Windows syntax shown):
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) The form should be submitted as "application/x-www-form-urlencoded" (the default enctype for HTML forms). Note that this is the content type of the request, not of the response.

Have you followed these steps?
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
Copied below:
Using UTF-8 as your character encoding for everything is a safe bet. This should work for pretty much every situation.
In order to completely switch to using UTF-8, you need to make the following changes:
Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
Use a character encoding filter with the default encoding set to UTF-8
Change all your JSPs to include charset name in their contentType.
For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents).
Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8.
Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").
Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
Disable any valves or filters that may read request parameters before your character encoding filter or jsp page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.

Try setting the URIEncoding parameter of your tomcat connector (in the server.xml) to UTF-8:
E.g.:
<Connector port="8080" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
URIEncoding="UTF-8"/>
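To see why the parameter arrives garbled when the connector does not decode the URI as UTF-8, here is a minimal sketch using only the JDK. The class and method names (MojibakeDemo, misdecode) are made up for illustration, and it assumes that the mismatch is UTF-8 bytes being read as ISO-8859-1, which matches the symptom in the question.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8 (what encodeURI sends, before percent-escaping),
    // then wrongly decode those same bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00E9" is the letter e with an acute accent; UTF-8 stores it as the two bytes C3 A9.
        String original = "Gaz M\u00E9tro";
        String garbled = misdecode(original);
        // Each of the two bytes becomes its own character: U+00C3 followed by U+00A9.
        System.out.println(original.length() + " chars -> " + garbled.length() + " chars");
        System.out.println(garbled);
    }
}
```

With URIEncoding="UTF-8" on the connector, the two bytes are read back as a single character, so the mismatch does not occur for query-string parameters.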

Related

Spring mvc checkbox selected value not readable as a string (values are in korean language)

I'm developing a REST API with Spring MVC. I have checkboxes that are created automatically. The value of each checkbox is displayed on the web page in Korean, but when I select a checkbox, the submitted value arrives as an unreadable string. I have added this directive to my JSP page:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
I have annotated the handler method in my controller like this:
@RequestMapping(value = "/filterSubmit", method = RequestMethod.POST, produces = "application/json;charset=UTF-8")
I have also added URIEncoding="UTF-8" to the Tomcat server.xml:
<Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8081" protocol="HTTP/1.1" redirectPort="8443"/>
Still, I cannot read the selected value.
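As far as I know, CJK (Chinese/Japanese/Korean) content is often handled in legacy encodings (for example EUC-KR for Korean) rather than UTF-8, so your problem may be an encoding mismatch somewhere in the chain. Try a different pageEncoding and see whether the value becomes readable.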
References: https://en.m.wikipedia.org/wiki/Korean_language_and_computers

JSP not showing correct UTF-8 contents for HTML form POST

I'm using Java 11 with Tomcat 9 with the latest JSP/JSTL. I'm testing in Chrome 71 and Firefox 64.0 on Windows 10. I have the following test document:
<%@ page contentType="text/html; charset=UTF-8" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8"/>
<title>Hello</title>
</head>
<body>
<c:if test="${not empty param.fullName}">
<p>Hello, ${param.fullName}.</p>
</c:if>
<form>
<div>
<label>Full name: <input name="fullName" /></label>
</div>
<button>Say Hello</button>
</form>
</body>
</html>
This is perhaps the simplest form possible. As you know the form method defaults to get, the form action defaults to "" (submitting to the same page), and the form enctype defaults to application/x-www-form-urlencoded.
If I enter the name "Flávio José" (a famous Brazilian forró singer and musician) in the field and submit, the form is submitted via HTTP GET to the same page using hello.jsp?fullName=Fl%C3%A1vio+Jos%C3%A9. This is correct, and the page says:
Hello, Flávio José.
If I change the form method to post and enter the same name "Flávio José", the form contents are instead submitted via POST, with HTTP request contents:
fullName=Fl%C3%A1vio+Jos%C3%A9
This also appears correct. But this time the page says:
Hello, FlÃ¡vio JosÃ©.
Rather than seeing %C3%A as a sequence of UTF-8 octets, JSP seems to think that these are a series of ISO-8859-1 octets (or code page 1252 octets), and is therefore decoding them to the wrong character sequence.
But where is it getting ISO-8859-1? What is my JSP page lacking to indicate the correct encoding?
I'll note also that WHATWG specification says that application/x-www-form-urlencoded octets should be parsed as UTF-8 by default. Is the Java servlet specification simply broken? How do I work around this?
This is caused by Tomcat, but the root problem is the Java Servlet 4 specification, which is incorrect and outdated.
Originally HTML 4.01 said that application/x-www-form-urlencoded encoded octets should be decoded as US-ASCII. The servlet specification changed this to say that, if the request encoding is not specified, the octets should be decoded as ISO-8859-1. Tomcat is simply following the servlet specification.
There are two problems with the Java servlet specification. The first is that the modern interpretation of application/x-www-form-urlencoded is that encoded octets should be decoded using UTF-8. The second problem is that tying the octet decoding to the resource charset confuses two levels of decoding.
Take another look at this POST content:
fullName=Fl%C3%A1vio+Jos%C3%A9
You'll notice that it is ASCII! It doesn't matter if you consider the POST HTTP request charset to be ISO-8859-1, UTF-8, or US-ASCII; you'll still wind up with exactly the same Unicode characters before decoding the octets! Which encoding is used to decode the percent-encoded octets is a completely separate question.
As a further example, let's say I download a text file instructions.txt that is clearly marked as ISO-8859-1, and it contains the URI https://example.com/example.jsp?fullName=Fl%C3%A1vio+Jos%C3%A9. Just because the text file has a charset of ISO-8859-1, does that mean I need to decode %C3%A using ISO-8859-1? Of course not! The charset used for decoding URI characters is a separate level of decoding on top of the resource content type charset! Similarly the octets of values encoded in application/x-www-form-urlencoded should be decoded using UTF-8, regardless of the underlying charset of the resource.
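The two levels of decoding can be sketched with plain JDK classes. This is only an illustration: the class and method names are hypothetical, and java.net.URLDecoder stands in for whatever the servlet container does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingLevels {
    // Level 1: turn the raw request body bytes into text using the request charset.
    static String bodyText(byte[] raw, Charset requestCharset) {
        return new String(raw, requestCharset);
    }

    // Level 2: percent-decode the text, interpreting the escaped octets in the given charset.
    static String percentDecode(String encoded, String octetCharset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, octetCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] raw = "Fl%C3%A1vio+Jos%C3%A9".getBytes(StandardCharsets.US_ASCII);

        // Level 1 gives the same text whichever of these charsets is assumed,
        // because the body contains only ASCII characters.
        String asAscii = bodyText(raw, StandardCharsets.US_ASCII);
        String asLatin1 = bodyText(raw, StandardCharsets.ISO_8859_1);
        String asUtf8 = bodyText(raw, StandardCharsets.UTF_8);
        System.out.println(asAscii.equals(asLatin1) && asLatin1.equals(asUtf8));

        // Level 2 is where the result differs.
        System.out.println(percentDecode(asUtf8, "UTF-8"));      // the intended name
        System.out.println(percentDecode(asUtf8, "ISO-8859-1")); // the garbled name
    }
}
```

Only the second argument of percentDecode changes the outcome, which is the point made above: the charset of the enclosing resource has no bearing on how the escaped octets should be read.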
There are several workarounds, some of them found in the Tomcat character encoding FAQ under its advice to "use UTF-8 everywhere".
Set the request character encoding in your web.xml file.
Add the following to your WEB-INF/web.xml file:
<request-character-encoding>UTF-8</request-character-encoding>
This setting is agnostic of the servlet container implementation, and is set forth in the servlet specification. (You should alternatively be able to put it in Tomcat's conf/web.xml file, if you want a global setting and don't mind changing the Tomcat configuration.)
Set the SetCharacterEncodingFilter in your web.xml file.
Tomcat has a proprietary equivalent: use the org.apache.catalina.filters.SetCharacterEncodingFilter in the WEB-INF/web.xml file, as the Tomcat FAQ above mentions, and as illustrated by https://stackoverflow.com/a/37833977/421049, excerpted below:
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This will make your web application only work on Tomcat, so it's better to put this in the Tomcat installation conf/web.xml file instead, as the post above mentions. In fact Tomcat's conf/web.xml installations have these two sections, but commented out; simply uncomment them and things should work.
Force the request character encoding to UTF-8 in the JSP or servlet.
You can force the character encoding of the servlet request to UTF-8, somewhere early in the JSP:
<% request.setCharacterEncoding("UTF-8"); %>
But that is ugly, unwieldy, error-prone, and goes against modern best practices—JSP scriptlets shouldn't be used anymore.
Hopefully we can get a newer Java servlet specification to remove any relationship between the resource charset and the decoding of application/x-www-form-urlencoded octets, and simply state that application/x-www-form-urlencoded octets must be decoded as UTF-8, as is modern practice as clarified by the latest W3C and WHATWG specifications.
Update: I've updated the Tomcat FAQ on Character Encoding Issues with this information.

utf8 html input tag is not respected after refreshing the page

I'm developing a search form with Spring MVC, in which I have an input tag and a submit button.
If I write in the input tag:
"Cherché"
and I submit, the same data in the input tag then becomes
"CherchÃ©"
Any help please?
Note that:
- I already have this header in the html page:
<META http-equiv="Content-Type" content="text/html;charset=UTF-8">
- I have specified the encoding in Spring MVC:
@RequestMapping(value = "/project/data", produces = "text/plain;charset=UTF-8")
If you're submitting form data using the GET method, you must ensure the servlet engine is using UTF-8 to decode the URLs.
With tomcat, it is in the Connector tag of server.xml:
<Connector port="8080" URIEncoding="UTF-8"/>

construct URL query string: character set encode/decode

I'm trying to construct a URL with a query string containing non-ASCII characters (Hebrew in my case).
However, when my webapp receives the request, the extracted request parameters are gibberish...
How can I resolve this?
new URL("http://localhost:8080/SRV/page.jsp?param=" + URLEncoder.encode("heb text", "UTF-8")).openConnection();
target page:
<%
System.out.println("Receive: " + request.getParameter("param"));
%>
I'm using tomcat6, jdk6, windows7 x64
edit: this is my page declaration:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
Take a look at the $TOMCAT_HOME/conf/server.xml file and check the encoding setting:
<Connector port="8080" ... URIEncoding="UTF-8" />
It appears that Tomcat needs this setting for UTF-8 to work for HTTP request values, such as request parameters.
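For reference, this is roughly what happens on both ends, sketched with JDK classes only. The Hebrew word used here (written with \u escapes) is just a sample value, and QueryStringDemo, buildUrl and decodeValue are hypothetical names; the second method imitates what the server does with and without URIEncoding="UTF-8".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryStringDemo {
    // Client side: percent-encode the parameter name and value as UTF-8.
    static String buildUrl(String base, String name, String value)
            throws UnsupportedEncodingException {
        return base + "?" + URLEncoder.encode(name, "UTF-8")
                + "=" + URLEncoder.encode(value, "UTF-8");
    }

    // Server side (imitation): decode the escaped octets in the given charset.
    static String decodeValue(String encodedValue, String charsetName)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encodedValue, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // sample Hebrew text, 4 letters
        String url = buildUrl("http://localhost:8080/SRV/page.jsp", "param", hebrew);
        System.out.println(url);

        String encodedValue = url.substring(url.indexOf('=') + 1);
        // Decoding as UTF-8 restores the 4 letters; ISO-8859-1 yields 8 unrelated characters.
        System.out.println(decodeValue(encodedValue, "UTF-8").equals(hebrew));
        System.out.println(decodeValue(encodedValue, "ISO-8859-1").length());
    }
}
```

The client code in the question is already doing the right thing; the mismatch is on the decoding side, which is what the connector attribute controls.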

How to solve UTF-8 in java

I currently use
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
in my jsp page.
When I get data from a textbox using request.getParameter("..."), it returns data like öÉ?É?É?öİ. I see this problem whenever I use non-English characters. I added URIEncoding="UTF-8" to server.xml in Tomcat, but it still returned the same thing (öÉ?É?É?öİ). How can I solve it?
Thank you
EDIT
Thanks for your answers. I tried a few things, but nothing has fixed the problem.
Here's what I've done:
I added <Connector URIEncoding="UTF-8" .../> in server.xml.
<meta ... charset=utf-8> tag is ok and I tried request.setCharacterEncoding("UTF-8");
I also tried <filter> tag in web.xml
None of these actions fixes the problem. I'm wondering if there's something else wrong with this...(remembering: I used <form method='post'>. I click submit button and when I get data using request.getParameter("..") the format of this data is not the correct format. )
You can try this code in your Servlet
if(request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}
Maybe the actual character encoding is not UTF-8? If the characters themselves are encoded in some other format, then we can't simply label them as UTF-8.
Try decoding them with various charsets and see which one gives the proper result. I think the input character encoding is Latin-1 (ISO-8859-1). If so, use the code below:
String param1 = request.getParameter("...");
if(param1!=null)
{
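// Recover the raw bytes, then decode them as UTF-8 (with no second argument the platform default charset would be used).
param1 = new String(param1.getBytes("ISO-8859-1"), "UTF-8");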
}
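Here is a standalone sketch of that re-decoding workaround, assuming the container decoded UTF-8 bytes as ISO-8859-1. MojibakeRepair and repair are illustrative names, not part of any API.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Undo a wrong ISO-8859-1 decode: get the original bytes back, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the server sees when the UTF-8 bytes C3 A9 are read as ISO-8859-1.
        String garbled = "Cherch\u00C3\u00A9";
        String fixed = repair(garbled);
        System.out.println(fixed.equals("Cherch\u00E9")); // true

        // Caution: repairing a value that was already decoded correctly damages it.
        String alreadyCorrect = "Cherch\u00E9";
        System.out.println(repair(alreadyCorrect).equals(alreadyCorrect)); // false
    }
}
```

Because it breaks values that were decoded correctly, this is best treated as a diagnostic or last resort; setting the request encoding before the parameters are first read avoids the need for it.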
UTF 8 should be set at all the layers of the application.
Do the following:
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In the Connector tag, add the "URIEncoding" attribute:
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - add the following (Windows syntax shown):
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) The form should be submitted as "application/x-www-form-urlencoded" (the default enctype for HTML forms). Note that this is the content type of the request, not of the response.
There is another place you can check. Did you include following declaration in your JSP file?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I think the problem is that the browser still sends requests using the default ISO-8859-1, which is the standard charset if none is declared.
You can also check the HTTP headers received from server to make sure the correct charset is sent back.
Essentially, the cleanest way to do it is to use Unicode escapes (\uXXXX) in your property files and/or in your code if need be (the latter is not advised).
This way you avoid encoding issues in those files, since your program only has to deal with ASCII; the proper representation is then handled entirely by the client side, and you do not have to worry about the default OS or environment encoding.
You can also try adding the following filter to web.xml (it needs a filter-mapping to take effect):
<filter>
<filter-name>Character Encoding Filter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>Character Encoding Filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Hope this helps.
You should try this:
String content= request.getParameter("content");
if(content!=null)
content = new String(content.getBytes("ISO-8859-1"), "UTF-8"); // decode the recovered bytes as UTF-8, not the platform default
