How to solve UTF-8 in java - java

I currently use
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
in my jsp page.
When I read data from a text box using request.getParameter("..."), I get garbled text such as öÉ?É?É?öİ. The problem appears whenever I enter non-English characters. I added URIEncoding="UTF-8" to server.xml in Tomcat, but the result was the same (öÉ?É?É?öİ). How can I solve it?
Thank you
EDIT
Thanks for your answers. I tried a few things, but nothing has fixed the problem.
Here's what I've done:
I added <Connector URIEncoding="UTF-8" .../> in server.xml.
<meta ... charset=utf-8> tag is ok and I tried request.setCharacterEncoding("UTF-8");
I also tried <filter> tag in web.xml
None of these actions fixes the problem. I'm wondering if there's something else wrong with this...(remembering: I used <form method='post'>. I click submit button and when I get data using request.getParameter("..") the format of this data is not the correct format. )

You can try this code in your Servlet
if(request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}

Maybe the actual character encoding is not UTF-8? If the characters themselves are encoded in some other format, we can't simply label them as UTF-8.
Try decoding them with various charsets and see which one gives the proper result. I think the input character encoding is Latin-1 (ISO-8859-1). If so, use the code below:
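String param1 = request.getParameter("...");
if (param1 != null)
{
// Recover the raw bytes, then decode them explicitly as UTF-8
// (without the second argument the platform default charset is used)
param1 = new String(param1.getBytes("ISO-8859-1"), "UTF-8");
}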
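To see why re-decoding like this can work, here is a minimal, self-contained sketch. The class and method names are made up for illustration, and it assumes the browser really sent UTF-8 and the container decoded those bytes as ISO-8859-1. It simulates the damage and then reverses it:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the servlet API.
final class MojibakeRepair {

    // Simulates what happens when UTF-8 bytes are decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes and decodes them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three non-ASCII letters, written as escapes to keep the source ASCII-only.
        String typed = "\u00f6\u011f\u00fc";
        String garbled = misdecode(typed);
        System.out.println(garbled.equals(typed));         // false
        System.out.println(repair(garbled).equals(typed)); // true
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character. If the request was already decoded correctly, applying repair would corrupt it, so this is a workaround rather than a fix for the configuration.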

UTF 8 should be set at all the layers of the application.
Do following
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In Connector tag added "URIEncoding" attribute as
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - added following
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) MIME type of the form request should be "application/x-www-form-urlencoded" (the default enctype of an HTML form)

There is another place you can check. Did you include following declaration in your JSP file?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I think the problem is that browser still sends requests using default ISO-8859-1, which is the standard charset if not declared.
You can also check the HTTP headers received from server to make sure the correct charset is sent back.

Essentially, the cleanest way to do it is to use Unicode escapes in your property files and/or in code if need be (the latter is not advised).
This way you avoid all encoding issues, since your program only has to deal with ASCII; the proper representation is then handled entirely by the client side, and you do not have to worry about the default OS encoding or environment encoding.

You can also try adding the following filter at web.xml:
<filter>
<filter-name>Character Encoding Filter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
Hope this helps.

You should try this:
String content = request.getParameter("content");
if (content != null)
// Recover the raw bytes, then decode them explicitly as UTF-8
content = new String(content.getBytes("ISO-8859-1"), "UTF-8");

Related

JSP not showing correct UTF-8 contents for HTML form POST

I'm using Java 11 with Tomcat 9 with the latest JSP/JSTL. I'm testing in Chrome 71 and Firefox 64.0 on Windows 10. I have the following test document:
<%@ page contentType="text/html; charset=UTF-8" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8"/>
<title>Hello</title>
</head>
<body>
<c:if test="${not empty param.fullName}">
<p>Hello, ${param.fullName}.</p>
</c:if>
<form>
<div>
<label>Full name: <input name="fullName" /></label>
</div>
<button>Say Hello</button>
</form>
</body>
</html>
This is perhaps the simplest form possible. As you know the form method defaults to get, the form action defaults to "" (submitting to the same page), and the form enctype defaults to application/x-www-form-urlencoded.
If I enter the name "Flávio José" (a famous Brazilian forró singer and musician) in the field and submit, the form is submitted via HTTP GET to the same page using hello.jsp?fullName=Fl%C3%A1vio+Jos%C3%A9. This is correct, and the page says:
Hello, Flávio José.
If I change the form method to post and enter the same name "Flávio José", the form contents are instead submitted via POST, with HTTP request contents:
fullName=Fl%C3%A1vio+Jos%C3%A9
This also appears correct. But this time the page says:
Hello, FlÃ¡vio JosÃ©.
Rather than seeing %C3%A as a sequence of UTF-8 octets, JSP seems to think that these are a series of ISO-8859-1 octets (or code page 1252 octets), and is therefore decoding them to the wrong character sequence.
But where is it getting ISO-8859-1? What is my JSP page lacking to indicate the correct encoding?
I'll note also that WHATWG specification says that application/x-www-form-urlencoded octets should be parsed as UTF-8 by default. Is the Java servlet specification simply broken? How do I work around this?
This is caused by Tomcat, but the root problem is the Java Servlet 4 specification, which is incorrect and outdated.
Originally HTML 4.01 said that application/x-www-form-urlencoded encoded octets should be decoded as US-ASCII. The servlet specification changed this to say that, if the request encoding is not specified, the octets should be decoded as ISO-8859-1. Tomcat is simply following the servlet specification.
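The fallback rule described above can be modelled in a few lines. This is a simplified sketch, not Tomcat's actual code: the class and method names are hypothetical, and it only looks at the charset parameter of a Content-Type header value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical model of the rule; not taken from any container.
final class RequestCharsetModel {

    // Returns the charset named in a Content-Type header value,
    // or ISO-8859-1 when the header carries no charset parameter.
    static Charset resolve(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1; // the fallback described above
    }

    public static void main(String[] args) {
        // Browsers typically send form posts without a charset parameter.
        System.out.println(resolve("application/x-www-form-urlencoded"));
        System.out.println(resolve("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

Because the first case is what a plain HTML form produces, the fallback is what decides how the posted octets are interpreted unless the application or container configuration overrides it.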
There are two problems with the Java servlet specification. The first is that the modern interpretation of application/x-www-form-urlencoded is that encoded octets should be decoded using UTF-8. The second problem is that tying the octet decoding to the resource charset confuses two levels of decoding.
Take another look at this POST content:
fullName=Fl%C3%A1vio+Jos%C3%A9
You'll notice that it is ASCII!! It doesn't matter if you consider the POST HTTP request charset to be ISO-8859-1, UTF-8, or US-ASCII—you'll still wind up with exactly the same Unicode characters before decoding the octets! What encoding is used to decode the encoding octets is completely separate.
As a further example, let's say I download a text file instructions.txt that is clearly marked as ISO-8859-1, and it contains the URI https://example.com/example.jsp?fullName=Fl%C3%A1vio+Jos%C3%A9. Just because the text file has a charset of ISO-8859-1, does that mean I need to decode %C3%A using ISO-8859-1? Of course not! The charset used for decoding URI characters is a separate level of decoding on top of the resource content type charset! Similarly the octets of values encoded in application/x-www-form-urlencoded should be decoded using UTF-8, regardless of the underlying charset of the resource.
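The two levels of decoding can be checked with plain JDK classes. The following is a small sketch (the class name is hypothetical, and it needs Java 10 or later for the URLDecoder overload that takes a Charset). It first shows that the percent-encoded text has the same bytes in all three charsets, then that only the charset applied to the %XX octets changes the result:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class illustrating the two decoding levels.
final class FormDecodingDemo {

    static final String BODY_VALUE = "Fl%C3%A1vio+Jos%C3%A9";

    public static void main(String[] args) {
        // Level 1: the percent-encoded text is pure ASCII, so its bytes are
        // identical whichever of these charsets the request body is read with.
        byte[] ascii = BODY_VALUE.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = BODY_VALUE.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = BODY_VALUE.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ascii, latin1) && Arrays.equals(latin1, utf8)); // true

        // Level 2: only the charset used for the %XX octets changes the result.
        String right = URLDecoder.decode(BODY_VALUE, StandardCharsets.UTF_8);
        String wrong = URLDecoder.decode(BODY_VALUE, StandardCharsets.ISO_8859_1);
        System.out.println(right.equals("Fl\u00e1vio Jos\u00e9"));             // true
        System.out.println(wrong.equals("Fl\u00c3\u00a1vio Jos\u00c3\u00a9")); // true
    }
}
```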
There are several workarounds, some of them found by looking at the Tomcat character encoding FAQ to "use UTF-8 everywhere".
Set the request character encoding in your web.xml file.
Add the following to your WEB-INF/web.xml file:
<request-character-encoding>UTF-8</request-character-encoding>
This setting is agnostic of the servlet container implementation and is set forth in the servlet specification. (You should alternatively be able to put it in Tomcat's conf/web.xml file, if you want a global setting and don't mind changing the Tomcat configuration.)
Set the SetCharacterEncodingFilter in your web.xml file.
Tomcat has a proprietary equivalent: use the org.apache.catalina.filters.SetCharacterEncodingFilter in the WEB-INF/web.xml file, as the Tomcat FAQ above mentions, and as illustrated by https://stackoverflow.com/a/37833977/421049, excerpted below:
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This will make your web application only work on Tomcat, so it's better to put this in the Tomcat installation conf/web.xml file instead, as the post above mentions. In fact Tomcat's conf/web.xml installations have these two sections, but commented out; simply uncomment them and things should work.
Force the request character encoding to UTF-8 in the JSP or servlet.
You can force the character encoding of the servlet request to UTF-8, somewhere early in the JSP:
<% request.setCharacterEncoding("UTF-8"); %>
But that is ugly, unwieldy, error-prone, and goes against modern best practices—JSP scriptlets shouldn't be used anymore.
Hopefully we can get a newer Java servlet specification to remove any relationship between the resource charset and the decoding of application/x-www-form-urlencoded octets, and simply state that application/x-www-form-urlencoded octets must be decoded as UTF-8, as is modern practice as clarified by the latest W3C and WHATWG specifications.
Update: I've updated the Tomcat FAQ on Character Encoding Issues with this information.

Apache struts internationalization and localization issue

I am working on a Struts 1 project which supports two languages, English and Turkish. To display messages we use the internationalization feature of Struts 1, so we have two property files (ApplicationResources_en.properties and ApplicationResources_tr.properties) that store the messages shown to the user.
For the English version, the key and value in ApplicationResources_en.properties are:
farequoteautomatic.entry-area.gen.emd.fareamount=Fare Amount
For the Turkish version, the key and value in ApplicationResources_tr.properties are:
farequoteautomatic.entry-area.gen.emd.fareamount=Ücret Miktarı
Everything works fine when the locale is English; the output for that key is Fare Amount, as expected.
But when the locale is changed to Turkish, the output is wrong. It displays special characters rather than the actual characters written in the property file.
In the property file the message is Ücret Miktarı, but the output in the browser is �cret Miktar�.
Note: I have checked that my Firefox browser defaults to Unicode (UTF-8) encoding, and we have a header.jsp, included in each page, that contains a META tag like <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I don't understand what I am doing wrong here. Please help me.
Check your browser encoding and set it to UTF-8.
Try this
in web.xml:
<filter>
<filter-name>CharacterEncodingFilter</filter-name>
<filter-class>bt.gov.g2c.framework.common.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>requestEncoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
I followed the mkyong tutorial, which says:
For UTF-8 or non-English characters, for example Chinese, you should encode them with the native2ascii tool.
With the help of the native2ascii tool,
farequoteautomatic.entry-area.gen.emd.fareamount=Ücret Miktarı
Converted to
farequoteautomatic.entry-area.gen.emd.fareamount=\ufeff\u00dccret Miktar\u0131
And in the browser I got the desired output, Ücret Miktarı.
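For readers without the native2ascii tool at hand, the escaping it performs can be approximated in a few lines of Java. This is a rough sketch with hypothetical names, not the tool's implementation: it escapes every char above ASCII as \uXXXX and then checks that java.util.Properties turns the escapes back into the original text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper that mimics the escaping described above.
final class UnicodeEscaper {

    // Replaces every non-ASCII char with a backslash-u escape of its code unit.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Parses one escaped key=value line the way a .properties file is parsed.
    static String roundTrip(String key, String value) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(key + "=" + escape(value)));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The Turkish message from the question, written with escapes.
        String turkish = "\u00dccret Miktar\u0131";
        System.out.println(escape(turkish)); // prints the ASCII-only escaped form
        System.out.println(roundTrip("fareamount", turkish).equals(turkish)); // true
    }
}
```

The escaped file contains only ASCII, so it reads the same regardless of which charset the property loader assumes.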

UTF-8 encoding in JSP page [duplicate]

I have a JSP page whose page encoding is ISO-8859-1. This JSP page is part of a question-and-answer blog. I want to allow special characters in Q/A posts.
The problem is that the JSP does not support UTF-8 even though I have changed the encoding from ISO-8859-1 to UTF-8. These characters (~, %, &, +) cause problems. When I post any of them, individually or combined with other characters, null is stored in the database; when I remove them from the post, the application works fine.
Can anyone suggest a solution?
You should use the same encoding on all layers of your application to avoid this problem. It is useful to add a filter to set the encoding:
public void doFilter(ServletRequest request,
ServletResponse response,
FilterChain chain) throws IOException, ServletException {
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
To only set the encoding on your JSP pages, add this line to them:
<%@ page contentType="text/html; charset=UTF-8" %>
Configure your database to use the same char encoding as well.
If you need to convert the encoding of a string see:
Encoding conversion in java
I would not recommend storing HTML-encoded text in your database. For example, if you need to generate a PDF (or anything other than HTML), you need to convert the HTML encoding first.
The full JSP tag should be something like this; mind the pageEncoding too:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
Some old browsers mess up the encoding too. You can use the HTML tag
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Also, the file itself should be saved in UTF-8 format. If you are using Eclipse, right-click the file -> Properties -> check Text file encoding.
I also had an issue displaying characters like "Ṁ Ů". I added the following to my web.xml.
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
This solved the issue on all pages except the header. I tried many ways to solve this and nothing worked in my case. The issue was that the header JSP page is included from another JSP, so I specified the encoding on the import, and that solved my problem.
<c:import url="/Header1.jsp" charEncoding="UTF-8"/>
Thanks
The default JSP file encoding is specified by JSR315 as ISO-8859-1. This is the encoding that the JSP engine uses to read the JSP file and it is unrelated to the servlet request or response encoding.
If you have non-latin characters in your JSP files, save the JSP file as UTF-8 with BOM or set pageEncoding in the beginning of the JSP page:
<%@page pageEncoding="UTF-8" %>
However, you might want to change the default to UTF-8 globally for all JSP pages. That can be done via web.xml:
<jsp-config>
<jsp-property-group>
<url-pattern>/*</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
Or, when using Spring Boot with an (embedded) Tomcat, via a TomcatContextCustomizer:
@Component
public class JspConfig implements TomcatContextCustomizer {
@Override
public void customize(Context context) {
JspPropertyGroup pg = new JspPropertyGroup();
pg.addUrlPattern("/*");
pg.setPageEncoding("UTF-8");
pg.setTrimWhitespace("true"); // optional, but nice to have
ArrayList<JspPropertyGroupDescriptor> pgs = new ArrayList<>();
pgs.add(new JspPropertyGroupDescriptorImpl(pg));
context.setJspConfigDescriptor(new JspConfigDescriptorImpl(pgs, new ArrayList<TaglibDescriptor>()));
}
}
For JSP to work with Spring Boot, don't forget to include these dependencies:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.tomcat.embed</groupId>
<artifactId>tomcat-embed-jasper</artifactId>
<scope>provided</scope>
</dependency>
And to make a "runnable" .war file, repackage it:
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>repackage</goal>
</goals>
</execution>
</executions>
</plugin>
. . .
You have to make sure the file has been saved with UTF-8 encoding.
You can do it with several plain text editors. With Notepad++, for example, you can choose Encoding-->Encode in UTF-8 in the menu. You can even do it with Windows Notepad (Save As --> Encoding UTF-8).
If you are using Eclipse, you can set it in the file's Properties.
Also, check whether the problem is that you have to escape those characters. It wouldn't be surprising if that were your problem, as one of the characters is &.
I used an encoding filter, which solved all my encoding problems:
package com.dina.filter;
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
/**
*
* @author DINANATH
*/
public class EncodingFilter implements Filter {
private String encoding = "utf-8";
public void doFilter(ServletRequest request,ServletResponse response, FilterChain filterChain) throws IOException, ServletException {
request.setCharacterEncoding(encoding);
// response.setContentType("text/html;charset=UTF-8");
response.setCharacterEncoding(encoding);
filterChain.doFilter(request, response);
}
public void init(FilterConfig filterConfig) throws ServletException {
String encodingParam = filterConfig.getInitParameter("encoding");
if (encodingParam != null) {
encoding = encodingParam;
}
}
public void destroy() {
// nothing todo
}
}
in web.xml
<filter>
<filter-name>EncodingFilter</filter-name>
<filter-class>
com.dina.filter.EncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>EncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This thread can help you:
Passing request parameters as UTF-8 encoded strings
Basically:
request.setCharacterEncoding("UTF-8");
String login = request.getParameter("login");
String password = request.getParameter("password");
Or use JavaScript in the JSP file:
var userInput = $("#myInput").val();
var encodedUserInput = encodeURIComponent(userInput);
$("#hiddenImput").val(encodedUserInput);
and then recover the value in the Java class:
String parameter = URLDecoder.decode(request.getParameter("hiddenImput"), "UTF-8");
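Why the pre-encoding trick is insensitive to the container's charset can be shown with a small simulation. This is only a sketch: the class name is hypothetical, the container is modelled by a plain URLDecoder call, and Java 10 or later is assumed for the Charset overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation; the servlet container is modelled by a plain decode call.
final class PreEncodedParamDemo {

    // What the application does with the value returned by getParameter().
    static String recover(String parameterValue) {
        return URLDecoder.decode(parameterValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What encodeURIComponent produces for "Flavio Jose" written with accents: ASCII only.
        String hiddenField = "Fl%C3%A1vio%20Jos%C3%A9";

        // Form submission percent-encodes the '%' signs once more.
        String onTheWire = URLEncoder.encode(hiddenField, StandardCharsets.UTF_8);

        // Even a container that decodes with ISO-8859-1 returns the ASCII text intact.
        String fromContainer = URLDecoder.decode(onTheWire, StandardCharsets.ISO_8859_1);

        System.out.println(fromContainer.equals(hiddenField));                      // true
        System.out.println(recover(fromContainer).equals("Fl\u00e1vio Jos\u00e9")); // true
    }
}
```

The cost of this approach is that every such parameter is encoded twice and must be decoded by hand on the server, which is why setting the request encoding is usually the simpler route.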
This is a common issue.
One of the easiest ways to solve it is to check whether the special character reaches the action layer intact and then handle it in the Java code.
If you can see the character in the action or any other Java layer of your choice (such as the business layer), replace it with the corresponding HTML entity using StringEscapeUtils.escapeHtml.
After escaping, use the new string to save to the DB.
This will help you.
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
These are special characters in HTML. Why don't you encode them?
Check it out: http://www.degraeve.com/reference/specialcharacters.php
Thanks for all the hints. Using Tomcat 8, I also added a filter like @Jasper de Vries wrote. But newer Tomcat versions already ship a filter that can be used; it just needs to be uncommented in Tomcat's web.xml:
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<async-supported>true</async-supported>
</filter>
...
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
And like everyone else posted, I added URIEncoding="UTF-8" to the Tomcat Connector. That also helped.
It is important to note that Eclipse (if you use it) keeps its own copy of web.xml and overrides the Tomcat settings, as explained here: Broken UTF-8 URI Encoding in JSPs
I had the same problem using special characters as delimiters on JSP. When the special characters got posted to the servlet, they all got messed up. I solved the issue by using the following conversion:
String str = new String (request.getParameter("string").getBytes ("iso-8859-1"), "UTF-8");
Page encoding and the like do not matter much here. Only the ASCII range of ISO-8859-1 is byte-for-byte identical to UTF-8; characters from 0x80 to 0xFF are single bytes in ISO-8859-1 but two bytes in UTF-8, so text outside ASCII does need converting.
Plus, none of that means a thing if you have double encoding somewhere.
This is my "cure all" recipe for all things encoding and charset related:
String myString = "heartbroken ð";
//String is double encoded, fix that first.
myString = new String(myString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
String cleanedText = StringEscapeUtils.unescapeJava(myString);
byte[] bytes = cleanedText.getBytes(StandardCharsets.UTF_8);
String text = new String(bytes, StandardCharsets.UTF_8);
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
decoder.onMalformedInput(CodingErrorAction.IGNORE);
decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
CharsetEncoder encoder = charset.newEncoder();
encoder.onMalformedInput(CodingErrorAction.IGNORE);
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
try {
// The new ByteBuffer is ready to be read.
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(text));
// The new CharBuffer holds the decoded characters.
CharBuffer cbuf = decoder.decode(bbuf);
String str = cbuf.toString();
} catch (CharacterCodingException e) {
logger.error("Error Message if you want to");
}
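A more cautious variant of the double-encoding repair is sketched below. It is a heuristic with hypothetical names, not a guaranteed detector: instead of ignoring malformed input, it uses a strict UTF-8 decoder and leaves the text untouched when the ISO-8859-1 bytes do not form valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a heuristic, not a guaranteed detector.
final class SafeMojibakeFix {

    static String fixIfDoubleEncoded(String text) {
        // Characters beyond U+00FF cannot come from an ISO-8859-1 misdecode.
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0xFF) {
                return text;
            }
        }
        byte[] raw = text.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return text; // bytes are not valid UTF-8, so leave the text alone
        }
    }

    public static void main(String[] args) {
        // Garbled input is repaired; correctly decoded input passes through unchanged.
        System.out.println(fixIfDoubleEncoded("Fl\u00c3\u00a1vio").equals("Fl\u00e1vio")); // true
        System.out.println(fixIfDoubleEncoded("Fl\u00e1vio").equals("Fl\u00e1vio"));       // true
    }
}
```

Reporting errors rather than ignoring them means no characters are silently dropped, at the price of occasionally skipping a string that a more aggressive repair would have changed.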
I use this shell script to convert JSP files from ISO-8859-1 to UTF-8:
#!/bin/sh
###############################################
## this script file must be placed in the parent
## folder of the to folders "in" and "out"
## in contain the input jsp files
## out will containt the generated jsp files
##
###############################################
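# quote the pattern so the shell does not expand it before find sees it
find in/ -name '*.jsp' |
while read -r line; do
# replace only the leading "in" directory with "out"
outpath=`echo "$line" | sed -e 's/^in/out/'` ;
parentdir=`echo "$outpath" | sed -e 's/[^\/]*\.jsp$//'` ;
mkdir -p "$parentdir"
echo "$outpath" ;
iconv -t UTF-8 -f ISO-8859-1 -o "$outpath" "$line" ;
done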
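Where iconv is not available, the same conversion of a single file can be sketched in Java. The class and method names are hypothetical, and the demo writes to a temporary directory rather than to real in and out folders:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper mirroring the conversion of one file.
final class JspTranscoder {

    // Reads source as ISO-8859-1 and writes the same text to target as UTF-8.
    static void latin1ToUtf8(Path source, Path target) throws IOException {
        String text = new String(Files.readAllBytes(source), StandardCharsets.ISO_8859_1);
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent()); // like mkdir -p
        }
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jsp-demo");
        Path in = dir.resolve("in").resolve("page.jsp");
        Path out = dir.resolve("out").resolve("page.jsp");
        Files.createDirectories(in.getParent());
        // Nine bytes in ISO-8859-1; the accented letter is the single byte 0xE9.
        Files.write(in, "Gaz M\u00e9tro".getBytes(StandardCharsets.ISO_8859_1));
        latin1ToUtf8(in, out);
        System.out.println(Files.readAllBytes(out).length); // 10: the 0xE9 byte became two bytes
    }
}
```

After converting, the page directive inside each JSP still has to declare UTF-8, otherwise the container will read the new bytes with the old encoding.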

Character encoding issue with Tomcat

There is a strange character encoding issue going on. I am using JSP (JSTL) and Struts with Tomcat 6.
I have my JSP page encoding set as follows:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
The issue is when I try to pass the url using encodeURI as such:
<script type="text/javascript">
$('#mailer_filter').change(function(){
var val = $(this).val();
console.log(val);
console.log(escape(val));
console.log(encodeURI(val));
location.href = 'mailList.a?' + encodeURI($(this).val());
});
</script>
the parameter on the action (java end) comes out as:
Gaz MÃ©tro
however on the front end it is displayed as:
Gaz Métro
which is the correct way. What can I do about this?
Do following
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In Connector tag added "URIEncoding" attribute as
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - added following
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) MIME type of the form request should be "application/x-www-form-urlencoded" (the default enctype of an HTML form)
Have you followed these steps?
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
Copied below:
Using UTF-8 as your character encoding for everything is a safe bet. This should work for pretty much every situation.
In order to completely switch to using UTF-8, you need to make the following changes:
Set URIEncoding="UTF-8" on your <Connector> in server.xml. References: HTTP Connector, AJP Connector.
Use a character encoding filter with the default encoding set to UTF-8
Change all your JSPs to include charset name in their contentType.
For example, use <%@page contentType="text/html; charset=UTF-8" %> for the usual JSP pages and <jsp:directive.page contentType="text/html; charset=UTF-8" /> for the pages in XML syntax (aka JSP Documents).
Change all your servlets to set the content type for responses and to include charset name in the content type to be UTF-8.
Use response.setContentType("text/html; charset=UTF-8") or response.setCharacterEncoding("UTF-8").
Change any content-generation libraries you use (Velocity, Freemarker, etc.) to use UTF-8 and to specify UTF-8 in the content type of the responses that they generate.
Disable any valves or filters that may read request parameters before your character encoding filter or jsp page has a chance to set the encoding to UTF-8. For more information see http://www.mail-archive.com/users@tomcat.apache.org/msg21117.html.
Try setting the URIEncoding parameter of your tomcat connector (in the server.xml) to UTF-8:
E.g.:
<Connector port="8080" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
URIEncoding="UTF-8"/>

transfer UTF8 input from a JSP form to a Spring controller breaks umlauts [duplicate]

Possible Duplicate:
UTF-8 encoding and http parameters
I have a UTF8 encoded JSP with a pure UTF8 header (and the text file is also encoded as UTF-8) and a form inside that page:
<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> </head>
<body>
This is a funny German character: ß
<form action="utf.do" method="post">
<input type="text" name="p" value="${p}" />
<input type="submit" value="OK"/>
</form>
Then I have a nice Spring-backed @Controller on the backend:
@Controller
public class UTFCtl {
@RequestMapping("/utf.do")
public ModelAndView handleUTF(@RequestParam(value="p", required=false) String anUTFString) {
ModelAndView ret = new ModelAndView("utf");
ret.addObject("p", anUTFString);
return ret;
}
}
As you can see, the form transports its data via POST. Typing some German umlauts into the form field yields a bunch of garbled characters at the backend: submitting hähöhü in the form field yields hÃ¤hÃ¶hÃ¼ as the value after submitting. I used the debugger, and the variable's value is already scrambled, meaning that Spring/Tomcat/the servlet hasn't detected the encoding correctly or the browser didn't encode my input correctly. My colleagues' usual response is: encode in ISO for Germany, or encode using JavaScript before transmitting. This shouldn't be necessary, should it? I mean, this is 2011 and that's what UTF-8 is for!
[EDIT] I think this is proving that the input is incoming as ISO even though I tell him to use UTF8:
byte[] in = anUTFString.getBytes("iso-8859-1");
String out = new String(in,"UTF-8");
out is then displayed correctly in the JSP!
I'm using Spring 2.5 on Tomcat 5.5 with Firefox 4 beta 11 on a Windows XP SP3 box. I already told Tomcat in its <Connector> to use URIEncoding="utf-8", but that doesn't change the game. I analysed the Firefox transmissions using Firebug and it seems to transmit UTF-8. I also checked the current Spring WebMVC setup and IMO there are no further encoding changers anywhere, neither in the config nor in the web.xml (no listeners, nothing). I read and understood most of the UTF-8 related docs, and I worked like that in a PHP environment without any problems (simply switching PHP to utf-8, done)...
So, indeed it's a matter of the server settings, too. Please note the duplication comment beneath the question. You have to tell your server as well as your deployment to use utf-8 and then everything's fine (pretty much like in PHP). Please note, that I'm duplicating the answer here (http://ibnaziz.wordpress.com/2008/06/10/spring-utf-8-conversion-using-characterencodingfilter/).
This works in a Tomcat environment:
edit your Tomcat's server.xml Connectors to deliver UTF-8:
<Connector URIEncoding="utf-8" port="8080" blabla="blabla" ... >
Then add to your web.xml:
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
this will tell the Spring framework to apply the UTF-8 filter for all kinds of requests (/*). After applying this you can even have links in the format ?q=äöüß which will be transported correctly. Though it's better to encode parameters for request transport:
URLEncoder.encode(aParameterWithUmlaut,"UTF-8")
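As a small illustration of encoding a parameter before building a link, here is a sketch using the JDK's URLEncoder. The helper name and the q parameter are made up for the example, and the Charset overload needs Java 10 or later; on older versions the two-argument form with the "UTF-8" string shown above is the equivalent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building a query string; names are illustrative.
final class QueryParam {

    static String link(String base, String name, String value) {
        return base + "?" + name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value is a-umlaut, o-umlaut, u-umlaut, sharp s, written as escapes.
        System.out.println(link("utf.do", "q", "\u00e4\u00f6\u00fc\u00df"));
        // prints utf.do?q=%C3%A4%C3%B6%C3%BC%C3%9F
    }
}
```

The resulting link is pure ASCII, so it survives any intermediate step that would otherwise guess at the charset of the raw characters.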
