Java program unable to print Hindi, Gujarati from MySQL on Ubuntu

I am having trouble printing Gujarati or Hindi text using Java (Tomcat server) with MySQL on Ubuntu. I have to generate HTML from a MySQL database using Java, which is displayed in a browser. The same HTML is also converted to PDF using wkhtmltopdf. Although I can enter Gujarati data into the table through MySQL Workbench, Java prints it as ?????.
I have done the following:
1) Altered the text column of the corresponding MySQL table, adding
CHARACTER SET utf8 COLLATE utf8_unicode_ci;
so that it can store Gujarati / Hindi text properly.
2) In the JDBC URL, I have added
useUnicode=true&characterEncoding=utf8
At the MySQL level I have applied
SET character_set_server=utf8mb4;
3) In the Java code I have applied
System.setProperty("file.encoding", "UTF-8");
It still returns ?????. Please let me know what else is required to fetch Gujarati characters from a MySQL database using Java on Ubuntu and display them in a browser.
Thanks in advance for your help.

useUnicode=true&characterEncoding=utf8
-->
useUnicode=yes&characterEncoding=UTF-8
You say the column is now set to "CHARACTER SET utf8 COLLATE utf8_unicode_ci;". Was the INSERT done after the ALTER? If it was before, then nothing can fix the question marks.

The problem was finally solved. I placed a simple test.html file containing Gujarati characters in the jsp folder of the Tomcat server. Even that was not displayed properly in the browser. The same HTML file saved as test.jsp did not display the characters either. This hinted that the Java-MySQL combination was not the issue, as I had thought earlier.
On the same Ubuntu server we also run a PHP server. From sites hosted on that PHP server, this simple HTML page was displayed properly in the same browser. This was the clue that no change was required at the Ubuntu level, but some configuration was needed at the Tomcat level.
It was resolved as described below.
1) At the servlet level I added the following two lines:
response.setContentType("text/html; charset=UTF-8");
response.setCharacterEncoding("UTF-8");
2) For the JSP page I added:
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
In the program-generated HTML page I added the following tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
3) In Tomcat's server.xml I put URIEncoding="UTF-8" in the Connector element:
<Connector port="8082" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8444"
URIEncoding="UTF-8"/>
4) In web.xml I put the following for JSP page
<jsp-config>
<jsp-property-group>
<url-pattern>*.*</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
This way, whatever is put in the jsp folder (JSP or HTML pages) can display Unicode characters. After this change, the aforesaid test.html and test.jsp displayed the characters properly. However, the servlet was still not able to display them, so the steps below were applied.
5) As advised on a discussion page, I added a Java filter and the corresponding tags in web.xml, as shown below.
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
public class CharsetFilter implements Filter{
private String encoding;
public void init(FilterConfig config) throws ServletException{
encoding = config.getInitParameter("encoding"); // matches the param-name declared in web.xml
if( encoding==null ) encoding="UTF-8";
}
public void doFilter(ServletRequest request, ServletResponse response
, FilterChain next) throws IOException, ServletException{
if(null == request.getCharacterEncoding())
request.setCharacterEncoding(encoding);
response.setContentType("text/html; charset=UTF-8");
response.setCharacterEncoding("UTF-8");
next.doFilter(request, response);
}
public void destroy(){}
}
Then I added the following tags in web.xml:
<filter>
<filter-name>CharsetFilter</filter-name>
<filter-class>CharsetFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>CharsetFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
After applying this, the servlet (which sends the HTML generated from MySQL by the Java code) can now display Gujarati / Hindi characters in the browser. I believe the same technique applies to any such language.
The following discussion links helped me resolve the issue:
https://wiki.duraspace.org/pages/viewpage.action?pageId=34638116
How to get UTF-8 working in Java webapps?
UtF-8 format not working in servlet for tomcat server
https://dertompson.com/2007/01/29/encoding-filter-for-java-web-applications/
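To illustrate why the servlet printed question marks until a UTF-8 response encoding was set, here is a minimal sketch that uses only the JDK. It assumes the response writer behaves like an OutputStreamWriter bound to the response charset, with ISO-8859-1 standing in for the servlet default. The class name ResponseCharsetDemo and the sample word are illustrative and not taken from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResponseCharsetDemo {
    // The word "Gujarati" in Gujarati script, written as Unicode escapes
    // so the encoding of this source file does not matter.
    static final String TEXT = "\u0A97\u0AC1\u0A9C\u0AB0\u0ABE\u0AA4\u0AC0";

    // Writes the text through a Writer bound to the given charset (a stand-in
    // for the response writer), then decodes the bytes with the same charset
    // (a stand-in for a browser that trusts the declared charset).
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(body, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(body.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 cannot represent Gujarati, so every character is replaced by '?'.
        System.out.println(roundTrip(TEXT, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println(roundTrip(TEXT, StandardCharsets.UTF_8).equals(TEXT));
    }
}
```

The replacement happens while encoding, before any bytes leave the server, which is consistent with the observation that browser and OS settings made no difference.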

Related

Servlet can not fetch the String with proper content type

My simple question is: why can't I pass a non-English parameter through a URL like this:
http://my-project-name:8080/something?word=علی
I can send the parameter using a form with the POST method, but I don't want to do that; I want to figure out why I can't do it with the GET method!
Here is my configuration:
In my web.xml I have:
<filter>
<filter-name>EncodingFilter</filter-name>
<filter-class>sys.system.EncodingFilter</filter-class>
<init-param>
<param-name>encodings</param-name>
<param-value>US-ASCII, UTF-8, EUC-KR, ISO-8859-15, ISO-8859-1</param-value>
</init-param>
<init-param>
<param-name>inputEncodingParameterName</param-name>
<param-value>ie</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>EncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Here is my filter:
public class EncodingFilter implements Filter {
@Override
public void destroy() {
}
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
if (request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}
if (response.getCharacterEncoding() == null) {
response.setCharacterEncoding("UTF-8");
}
chain.doFilter(request, response);
}
@Override
public void init(FilterConfig arg0) throws ServletException {
}
}
My JSP header is set properly:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> ...
When I fetch the parameter word in my controller, I have a character encoding problem. Here is my controller:
@RequestMapping(value = "something")
public String stopJob(@RequestParam("word") String word) {
... do something
}
The interesting thing is that everything seems to be set properly; even when I print
request.getCharacterEncoding();
it returns "UTF-8", but "word" is still corrupted.
Does anyone know about this issue?
Thanks!
Have you set URIEncoding in your Tomcat configuration? See: tomcat uriencoding
Also see: Can not send special characters (UTF-8) from JSP to Servlet: question marks displayed
This problem took me a lot of time, and I finally figured out what the mistake was:
the server.xml file inside the Eclipse workspace was not the same as the one in the "Runtime Environment".
That caused the trouble, because Eclipse loads the configuration from the "Runtime Environment" and copies it into the workspace -> Servers folder, so the two files must match.
Once I figured that out, I did the following:
1) Delete the server under "Eclipse -> Window -> Preference -> Runtime Environment".
2) Delete everything inside the workspace -> Servers folder.
3) Add the server again from "Eclipse -> Window -> Preference -> Runtime Environment".
4) Configure the server to run my project.
5) Edit the server.xml file inside the "workspace/Servers/TomcatServerFolderName" directory and add URIEncoding="UTF-8" useBodyEncodingForURI="true" to the connector:
<Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080" protocol="HTTP/1.1" redirectPort="8443" useBodyEncodingForURI="true"/>
After doing all of this, character encoding works fine with the GET method!
To transport arbitrary data via URL parameters, you have to use the format generated by URLEncoder, with its % signs and hex digits. The encoding in this context defines how the hex digits are interpreted.
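As a minimal sketch of that percent-encoding, the example below builds the query string for the URL in the question using java.net.URLEncoder and decodes it again. The class name QueryParamEncoding is illustrative, and the sample word assumes that the last letter of the parameter is U+06CC (Farsi yeh), which cannot be verified from the post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryParamEncoding {
    // Percent-encodes a parameter value using UTF-8 bytes.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Reverses encode(): interprets the hex digits as UTF-8 bytes.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Assumed spelling of the word from the question: U+0639 U+0644 U+06CC.
        String word = "\u0639\u0644\u06CC";
        // Prints http://my-project-name:8080/something?word=%D8%B9%D9%84%DB%8C
        System.out.println("http://my-project-name:8080/something?word=" + encode(word));
        System.out.println(decode(encode(word)).equals(word)); // true
    }
}
```

The server must decode the query string with the same charset, which is what URIEncoding="UTF-8" on the Tomcat connector configures.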

Apache struts internationalization and localization issue

I am working on a Struts 1 project which supports two languages, English and Turkish. To display messages we use the internationalization feature of Struts 1, so we have two property files (ApplicationResources_en.properties and ApplicationResources_tr.properties) storing the messages that need to be displayed to the user.
For the English version, the key and value in ApplicationResources_en.properties are
farequoteautomatic.entry-area.gen.emd.fareamount=Fare Amount
For the Turkish version, the key and value in ApplicationResources_tr.properties are
farequoteautomatic.entry-area.gen.emd.fareamount=Ücret Miktarı
Everything works fine when the locale is English. The output for that key is correct and as expected: Fare Amount.
But when the locale is changed to Turkish, the output is not correct. It displays special characters rather than the actual characters written in the property file.
In the property file the message is Ücret Miktarı, but the output in the browser is �cret Miktar�.
Note: I have checked that my Firefox browser is set to Unicode (UTF-8) encoding by default, and we have a header.jsp included in each page, with a META tag like <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I don't understand what I am doing wrong here. Please help me.
Check your browser encoding and set it to UTF-8.
Try this
in web.xml:
<filter>
<filter-name>CharacterEncodingFilter</filter-name>
<filter-class>bt.gov.g2c.framework.common.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>requestEncoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
I followed the mkyong tutorial, which says:
For UTF-8 or non-English characters, for example Chinese, you should encode it with the native2ascii tool.
With the help of the native2ascii tool,
farequoteautomatic.entry-area.gen.emd.fareamount=Ücret Miktarı
was converted to
farequoteautomatic.entry-area.gen.emd.fareamount=\ufeff\u00dccret Miktar\u0131
and in the browser I got the desired output, Ücret Miktarı.
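The following sketch shows roughly what such a conversion does and why it works: every non-ASCII character becomes a \uXXXX escape, and java.util.Properties turns the escapes back into characters when loading. toAsciiEscapes is a simplified, hypothetical stand-in for native2ascii, not the tool itself. The leading \ufeff in the converted line above is most likely a byte order mark carried over from the source file, so the sketch leaves it out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapes {
    // Replaces every non-ASCII character with a backslash-u escape
    // (simplified; assumes the input already has the correct characters).
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Loads a single "key=value" line and returns the decoded value.
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "farequoteautomatic.entry-area.gen.emd.fareamount";
        // The Turkish message, written with escapes so this file stays ASCII.
        String escaped = toAsciiEscapes("\u00DCcret Miktar\u0131");
        System.out.println(key + "=" + escaped);
        // Properties decodes the escapes back into the original characters.
        System.out.println(loadValue(key + "=" + escaped, key).equals("\u00DCcret Miktar\u0131"));
    }
}
```

Because the escaped file contains only ASCII, it reads the same no matter which encoding the loader assumes for the .properties file.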

How to solve UTF-8 in java

I currently use
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
in my JSP page.
When I get data from a text box using request.getParameter("..."); it returns data like öÉ?É?É?öİ. I see this problem when I use non-English characters. I added URIEncoding="UTF-8" to server.xml in Tomcat, but it returned the same result (öÉ?É?É?öİ). How can I solve it?
Thank you
EDIT
Thanks for your answers. I tried a few things, but nothing has fixed the problem.
Here's what I've done:
I added <Connector URIEncoding="UTF-8" .../> in server.xml.
The <meta ... charset=utf-8> tag is OK, and I tried request.setCharacterEncoding("UTF-8");
I also tried a <filter> tag in web.xml.
None of these actions fixes the problem. I'm wondering if there's something else wrong here. (Remember: I use <form method='post'>. I click the submit button, and when I get the data using request.getParameter("..") it is not in the correct format.)
You can try this code in your Servlet
if(request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}
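To see why the request encoding matters for a POSTed form, the sketch below decodes the same percent-encoded form value with two different charsets, using only java.net.URLDecoder. It assumes the browser submitted the form as UTF-8 and that the container would otherwise fall back to ISO-8859-1 for the request body. FormBodyDecoding is an illustrative name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class FormBodyDecoding {
    // Decodes an application/x-www-form-urlencoded value with the given charset,
    // which is the choice that request.setCharacterEncoding(...) controls.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "%C3%B6" is the letter o-umlaut (U+00F6) as a UTF-8 form value.
        String posted = "%C3%B6";
        // Decoded as UTF-8: one character, U+00F6.
        System.out.println(decode(posted, "UTF-8").length());      // 1
        // Decoded as ISO-8859-1: two characters, U+00C3 and U+00B6 (mojibake).
        System.out.println(decode(posted, "ISO-8859-1").length()); // 2
    }
}
```

The call to setCharacterEncoding has to happen before the first getParameter call, since the parameters are parsed once and then cached.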
Maybe the actual character encoding is not UTF-8? If the characters themselves are encoded in some other format, we can't just label them as UTF-8.
Try decoding them with various charsets and see which one gives the proper result. I think the input was decoded as Latin-1 (ISO-8859-1). If so, use the code below:
String param1 = request.getParameter("...");
if(param1!=null)
{
// take the wrongly decoded characters back to bytes, then decode those bytes as UTF-8
param1 = new String(param1.getBytes("ISO-8859-1"), "UTF-8");
}
UTF-8 should be set at all layers of the application.
Do the following:
1) HTML code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In Connector tag added "URIEncoding" attribute as
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - added the following:
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) The content type of the form POST request should be "application/x-www-form-urlencoded"
There is another place you can check. Did you include following declaration in your JSP file?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I think the problem is that browser still sends requests using default ISO-8859-1, which is the standard charset if not declared.
You can also check the HTTP headers received from server to make sure the correct charset is sent back.
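When inspecting the response headers, the part to look at is the charset parameter of Content-Type. As a small illustration, the hypothetical helper below (charsetOf is not part of any servlet or JDK API) extracts that parameter from a header value.

```java
import java.util.Locale;

final class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type header value,
    // or null when the header does not declare one.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/html"));                // null
    }
}
```

A null result means the server declared no charset at all, in which case the browser has to guess or fall back to its default.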
Essentially, the cleanest way is to use Unicode escapes (\uXXXX) in your property files and/or code if need be (the latter is not advised).
This way you avoid all encoding issues, since your program only has to deal with ASCII; the proper representation is then handled entirely by the client side, and you do not have to worry about the default OS or environment encoding.
You can also try adding the following filter at web.xml:
<filter>
<filter-name>Character Encoding Filter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
Hope this helps.
You should try this:
String content= request.getParameter("content");
if(content!=null)
content = new String(content.getBytes("ISO-8859-1"), "UTF-8");

UTF-8 encoding in JSP page [duplicate]

This question already has answers here:
How to pass Unicode characters as JSP/Servlet request.getParameter?
I have a JSP page whose page encoding is ISO-8859-1. This JSP page is part of a question-and-answer blog. I want to allow special characters when posting questions and answers.
The problem is that the JSP does not handle UTF-8 even though I have changed the encoding from ISO-8859-1 to UTF-8. These characters (~,%,&,+) cause the problem. When I post any of them, either individually or in combination with other characters, null is stored in the database; when I remove them before posting, the application works fine.
Can anyone suggest a solution?
You should use the same encoding on all layers of your application to avoid this problem. It is useful to add a filter to set the encoding:
public void doFilter(ServletRequest request,
ServletResponse response,
FilterChain chain) throws IOException, ServletException {
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
To only set the encoding on your JSP pages, add this line to them:
<%@ page contentType="text/html; charset=UTF-8" %>
Configure your database to use the same char encoding as well.
If you need to convert the encoding of a string see:
Encoding conversion in java
I would not recommend storing HTML-encoded text in your database. For example, if you need to generate a PDF (or anything other than HTML), you have to convert the HTML encoding first.
The full JSP tag should be something like this; mind the pageEncoding too:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
Some old browsers get the encoding wrong too. You can use the HTML tag
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Also, the file itself should be saved in UTF-8 format. If you are using Eclipse, right-click on the file -> Properties -> check the "Text file encoding" setting.
I also had an issue displaying characters like "Ṁ Ů". I added the following to my web.xml:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
This solved the issue in all pages except the header. I tried many ways to solve this and nothing worked in my case. The issue with the header was that the header JSP page is included from another JSP. So I specified the encoding on the import, and that solved my problem.
<c:import url="/Header1.jsp" charEncoding="UTF-8"/>
Thanks
The default JSP file encoding is specified by the JSP specification (JSR 245) as ISO-8859-1. This is the encoding that the JSP engine uses to read the JSP file, and it is unrelated to the servlet request or response encoding.
If you have non-latin characters in your JSP files, save the JSP file as UTF-8 with BOM or set pageEncoding in the beginning of the JSP page:
<%@page pageEncoding="UTF-8" %>
However, you might want to change the default to UTF-8 globally for all JSP pages. That can be done via web.xml:
<jsp-config>
<jsp-property-group>
<url-pattern>/*</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
Or, when using Spring Boot with an (embedded) Tomcat, via a TomcatContextCustomizer:
@Component
public class JspConfig implements TomcatContextCustomizer {
@Override
public void customize(Context context) {
JspPropertyGroup pg = new JspPropertyGroup();
pg.addUrlPattern("/*");
pg.setPageEncoding("UTF-8");
pg.setTrimWhitespace("true"); // optional, but nice to have
ArrayList<JspPropertyGroupDescriptor> pgs = new ArrayList<>();
pgs.add(new JspPropertyGroupDescriptorImpl(pg));
context.setJspConfigDescriptor(new JspConfigDescriptorImpl(pgs, new ArrayList<TaglibDescriptor>()));
}
}
For JSP to work with Spring Boot, don't forget to include these dependencies:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.tomcat.embed</groupId>
<artifactId>tomcat-embed-jasper</artifactId>
<scope>provided</scope>
</dependency>
And to make a "runnable" .war file, repackage it:
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>repackage</goal>
</goals>
</execution>
</executions>
</plugin>
. . .
You have to make sure the file is saved with UTF-8 encoding.
You can do it with several plain text editors. With Notepad++, for example, you can choose Encoding --> Encode in UTF-8 in the menu. You can even do it with Windows Notepad (Save As --> Encoding UTF-8).
If you are using Eclipse, you can set it in the file's Properties.
Also, check whether the problem is that you have to escape those characters. It would not be surprising if that were your problem, as one of the characters is &.
I used an encoding filter, which solved all my encoding problems:
package com.dina.filter;
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
/**
*
* @author DINANATH
*/
public class EncodingFilter implements Filter {
private String encoding = "utf-8";
public void doFilter(ServletRequest request,ServletResponse response, FilterChain filterChain) throws IOException, ServletException {
request.setCharacterEncoding(encoding);
// response.setContentType("text/html;charset=UTF-8");
response.setCharacterEncoding(encoding);
filterChain.doFilter(request, response);
}
public void init(FilterConfig filterConfig) throws ServletException {
String encodingParam = filterConfig.getInitParameter("encoding");
if (encodingParam != null) {
encoding = encodingParam;
}
}
public void destroy() {
// nothing todo
}
}
in web.xml
<filter>
<filter-name>EncodingFilter</filter-name>
<filter-class>
com.dina.filter.EncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>EncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This thread can help you:
Passing request parameters as UTF-8 encoded strings
Basically:
request.setCharacterEncoding("UTF-8");
String login = request.getParameter("login");
String password = request.getParameter("password");
Or you can use JavaScript in the JSP file:
var userInput = $("#myInput").val();
var encodedUserInput = encodeURIComponent(userInput);
$("#hiddenImput").val(encodedUserInput);
and then recover the value in the Java class:
String parameter = URLDecoder.decode(request.getParameter("hiddenImput"), "UTF-8");
This is a common issue.
One of the easiest ways to solve it is to check whether the special character reaches the action layer, and then modify the special character in the Java code.
If you can see this character in the Action or any other Java layer of your choice (such as the business layer), just replace the character with the corresponding HTML entity using StringEscapeUtils.html#escapeHtml
After escaping, use the new string to save to the DB.
This will help you.
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
These are special characters in HTML. Why don't you encode them?
Check it out: http://www.degraeve.com/reference/specialcharacters.php
Thanks for all the hints. Using Tomcat 8, I also added a filter like @Jasper de Vries wrote. But newer Tomcat versions already ship a filter that can be used; it just needs to be uncommented in Tomcat's web.xml:
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<async-supported>true</async-supported>
</filter>
...
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
And, like all the others posted, I added URIEncoding="UTF-8" to the Apache Tomcat Connector. That also helped.
It is important to note that Eclipse (if you use it) keeps its own copy of the web.xml and overrides the Tomcat settings, as explained here: Broken UTF-8 URI Encoding in JSPs
I had the same problem using special characters as delimiters in a JSP. When the special characters were posted to the servlet, they all got messed up. I solved the issue with the following conversion:
String str = new String (request.getParameter("string").getBytes ("iso-8859-1"), "UTF-8");
Page encoding and the other settings matter less than you might think for plain ASCII text: ASCII is a subset of both ISO-8859-1 and UTF-8, so ASCII-only content never has to be converted. Characters above 0x7F, however, are encoded differently in ISO-8859-1 and UTF-8, so those do need the conversion.
Also, none of that helps if you have double encoding somewhere.
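How ISO-8859-1 and UTF-8 relate can be checked directly by comparing the bytes each produces. The sketch below is only an illustration with made-up names (Latin1VersusUtf8, sameBytes): ASCII text yields identical bytes in both encodings, while a character above 0x7F does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1VersusUtf8 {
    // True when the text is encoded to exactly the same bytes in both charsets.
    static boolean sameBytes(String text) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.ISO_8859_1),
                text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("plain ASCII")); // true
        // U+00E9 (e with acute accent): one byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println(sameBytes("\u00E9"));      // false
        System.out.println("\u00E9".getBytes(StandardCharsets.ISO_8859_1).length); // 1
        System.out.println("\u00E9".getBytes(StandardCharsets.UTF_8).length);      // 2
    }
}
```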
This is my "cure all" recipe for all things encoding and charset related:
String myString = "heartbroken ð";
//String is double encoded, fix that first.
myString = new String(myString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
String cleanedText = StringEscapeUtils.unescapeJava(myString);
byte[] bytes = cleanedText.getBytes(StandardCharsets.UTF_8);
String text = new String(bytes, StandardCharsets.UTF_8);
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
decoder.onMalformedInput(CodingErrorAction.IGNORE);
decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
CharsetEncoder encoder = charset.newEncoder();
encoder.onMalformedInput(CodingErrorAction.IGNORE);
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
try {
// The new ByteBuffer is ready to be read.
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(text));
// The new CharBuffer is ready to be read.
CharBuffer cbuf = decoder.decode(bbuf);
String str = cbuf.toString();
} catch (CharacterCodingException e) {
logger.error("Error Message if you want to");
}
I use this shell script to convert JSP files from ISO-8859-1 to UTF-8:
#!/bin/sh
###############################################
## this script file must be placed in the parent
## folder of the two folders "in" and "out"
## "in" contains the input jsp files
## "out" will contain the generated jsp files
##
###############################################
find in/ -name '*.jsp' |
while read -r line; do
outpath=`echo "$line" | sed -e 's/^in/out/'` ;
parentdir=`echo "$outpath" | sed -e 's/[^\/]*\.jsp$//'` ;
mkdir -p "$parentdir"
echo "$outpath" ;
iconv -f ISO-8859-1 -t UTF-8 -o "$outpath" "$line" ;
done
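For environments without iconv, the same per-file conversion can be sketched in Java. This is an illustrative equivalent of the iconv step only, not a port of the whole script. Latin1ToUtf8, convert and demoHex are made-up names, and the demo runs on temporary files rather than the "in" and "out" folders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Latin1ToUtf8 {
    // Reads a file as ISO-8859-1 and writes the same text as UTF-8.
    static void convert(Path in, Path out) {
        try {
            String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
            Files.write(out, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Converts a one-byte sample file and returns the output bytes as hex.
    static String demoHex() {
        try {
            Path in = Files.createTempFile("latin1-", ".jsp");
            Path out = Files.createTempFile("utf8-", ".jsp");
            Files.write(in, new byte[] {(byte) 0xE4}); // a-umlaut in ISO-8859-1
            convert(in, out);
            StringBuilder hex = new StringBuilder();
            for (byte b : Files.readAllBytes(out)) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            Files.delete(in);
            Files.delete(out);
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoHex()); // c3a4, the UTF-8 bytes of a-umlaut
    }
}
```

As with the script, this only makes sense for files that really are ISO-8859-1; running it on a file that is already UTF-8 would double-encode it.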

transfer UTF8 input from a JSP form to a Spring controller breaks umlauts [duplicate]

Possible Duplicate:
UTF-8 encoding and http parameters
I have a UTF-8 encoded JSP with a pure UTF-8 header (the text file itself is also saved as UTF-8) and a form inside that page:
<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> </head>
<body>
This is a funny German character: ß
<form action="utf.do" method="post">
<input type="text" name="p" value="${p}" />
<input type="submit" value="OK"/>
</form>
Then I have a nice Spring-backed @Controller on the backend:
@Controller
public class UTFCtl {
@RequestMapping("/utf.do")
public ModelAndView handleUTF(@RequestParam(value="p", required=false) String anUTFString) {
ModelAndView ret = new ModelAndView("utf");
ret.addObject("p", anUTFString);
return ret;
}
}
As you can see, the form transports its data via POST. Typing some German umlauts into the form field yields a bunch of garbled characters at the backend: submitting hähöhü in the form field comes back as mojibake (of the hÃ¤hÃ¶hÃ¼ kind) after submitting. I used the debugger, and the variable's value is already scrambled, meaning that Spring/Tomcat/Servlet hasn't detected the encoding correctly or the browser didn't encode my input correctly. My colleagues' usual response is: encode in ISO for Germany, or encode using JavaScript before transmitting. This shouldn't be necessary, should it? I mean, this is 2011, and that's what UTF-8 is good for!
[EDIT] I think this proves that the input is coming in as ISO-8859-1 even though I tell it to use UTF-8:
byte[] in = anUTFString.getBytes("iso-8859-1");
String out = new String(in,"UTF-8");
out is then displayed correctly in the JSP!
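The effect of those two lines can be reproduced outside the container. The sketch below first simulates the faulty step (UTF-8 bytes decoded as ISO-8859-1) and then applies the same repair. MojibakeRepair, misdecode and repair are illustrative names, not part of the original application.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Simulates the bug: UTF-8 bytes from the browser decoded as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The repair from the edit above: recover the raw bytes, decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "haehoehue" spelled with umlauts, as Unicode escapes.
        String typed = "h\u00E4h\u00F6h\u00FC";
        String received = misdecode(typed);
        System.out.println(received.length());             // 9: each umlaut became two characters
        System.out.println(repair(received).equals(typed)); // true
    }
}
```

The repair is lossless only because ISO-8859-1 maps every byte to a character. It is a workaround; configuring the request encoding correctly makes it unnecessary.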
I'm using Spring 2.5 on Tomcat 5.5 with Firefox 4 beta 11 on a Windows XP SP3 box. I already told Tomcat in its server.xml to use URIEncoding="utf-8", but that doesn't change anything. I analysed the Firefox transmissions using Firebug, and it seems to transmit UTF-8. I also checked the current Spring WebMVC setup, and as far as I can tell there are no further encoding changers anywhere, neither in the config nor in the web.xml (no listeners, nothing). I read and understood most of the UTF-8 related docs, and I have worked like that in a PHP environment without any problems (simply switching PHP to utf-8, done)...
So, indeed, it's a matter of the server settings too. Please note the duplication comment beneath the question. You have to tell your server as well as your deployment to use UTF-8, and then everything is fine (pretty much like in PHP). Please note that I'm duplicating the answer here (http://ibnaziz.wordpress.com/2008/06/10/spring-utf-8-conversion-using-characterencodingfilter/).
This works in a Tomcat environment:
Edit the Connectors in your Tomcat's server.xml to deliver UTF-8:
<Connector URIEncoding="utf-8" port="8080" blabla="blabla" ... >
Then add to your web.xml:
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This tells the Spring framework to apply the UTF-8 filter to all kinds of requests (/*). After applying this you can even have links in the format ?q=äöüß, which will be transported correctly. Still, it's better to encode parameters for request transport:
URLEncoder.encode(aParameterWithUmlaut,"UTF-8")
