Apache struts internationalization and localization issue - java

I am working on a Struts 1 project that supports two languages, English and Turkish. To display messages we use the internationalization feature of Struts 1, so we have two property files (ApplicationResources_en.properties and ApplicationResources_tr.properties) that store the messages shown to the user.
For the English version, ApplicationResources_en.properties, the key and value are
farequoteautomatic.entry-area.gen.emd.fareamount=Fare Amount
For the Turkish version, ApplicationResources_tr.properties, the key and value are
farequoteautomatic.entry-area.gen.emd.fareamount=Ücret Miktarı
Everything works fine when the locale is English. The output for that key is the expected Fare Amount.
But when the locale is changed to Turkish, the output is wrong. It displays special characters rather than the actual characters written in the property file.
In the property file the message is Ücret Miktarı, but the output in the browser is �cret Miktar�.
Note: I have checked that my Firefox browser defaults to Unicode (UTF-8) encoding, and we have a header.jsp, included in each page, with a META tag like <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I don't understand what I am doing wrong here. Please help me.

Check your browser encoding and set it to UTF-8.
Then try this in web.xml:
<filter>
<filter-name>CharacterEncodingFilter</filter-name>
<filter-class>bt.gov.g2c.framework.common.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>requestEncoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>CharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>

I followed the mkyong tutorial, which says:
For UTF-8 or non-English characters, for example Chinese, you should encode it with the native2ascii tool.
With the help of the native2ascii tool,
farequoteautomatic.entry-area.gen.emd.fareamount=Ücret Miktarı
was converted to
farequoteautomatic.entry-area.gen.emd.fareamount=\ufeff\u00dccret Miktar\u0131
And in the browser I got the desired output, which is Ücret Miktarı
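A likely reason the escapes help: java.util.Properties.load(InputStream) is documented to read its input as ISO-8859-1, so raw UTF-8 bytes in a .properties file come out garbled unless they are written as Unicode escapes. The sketch below assumes Struts 1 loads its message resources this way (not verified in this thread); the class name, the helper names, and the shortened key "fareamount" are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; the key "fareamount" is shortened for illustration.
class PropertiesEscapeDemo {

    // Replaces every non-ASCII char with a backslash-u escape, roughly what native2ascii does.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Loads the file content through Properties.load(InputStream),
    // which always decodes the bytes as ISO-8859-1.
    static String loadThroughProperties(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("fareamount");
    }

    public static void main(String[] args) {
        String turkish = "\u00dccret Miktar\u0131"; // the Turkish message from the question

        // Same message stored two ways: raw UTF-8 bytes versus ASCII-only escapes.
        byte[] rawUtf8 = ("fareamount=" + turkish).getBytes(StandardCharsets.UTF_8);
        byte[] escaped = ("fareamount=" + toAsciiEscapes(turkish)).getBytes(StandardCharsets.US_ASCII);

        System.out.println(turkish.equals(loadThroughProperties(rawUtf8)));
        System.out.println(turkish.equals(loadThroughProperties(escaped)));
    }
}
```

Run as written, this prints false for the raw UTF-8 file and true for the escaped one. The leading \ufeff in the converted value above looks like a byte order mark picked up from the source file rather than part of the message, so it may be worth saving the file without a BOM before converting.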


JSP not showing correct UTF-8 contents for HTML form POST

I'm using Java 11 with Tomcat 9 with the latest JSP/JSTL. I'm testing in Chrome 71 and Firefox 64.0 on Windows 10. I have the following test document:
<%@ page contentType="text/html; charset=UTF-8" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8"/>
<title>Hello</title>
</head>
<body>
<c:if test="${not empty param.fullName}">
<p>Hello, ${param.fullName}.</p>
</c:if>
<form>
<div>
<label>Full name: <input name="fullName" /></label>
</div>
<button>Say Hello</button>
</form>
</body>
</html>
This is perhaps the simplest form possible. As you know the form method defaults to get, the form action defaults to "" (submitting to the same page), and the form enctype defaults to application/x-www-form-urlencoded.
If I enter the name "Flávio José" (a famous Brazilian forró singer and musician) in the field and submit, the form is submitted via HTTP GET to the same page using hello.jsp?fullName=Fl%C3%A1vio+Jos%C3%A9. This is correct, and the page says:
Hello, Flávio José.
If I change the form method to post and enter the same name "Flávio José", the form contents are instead submitted via POST, with HTTP request contents:
fullName=Fl%C3%A1vio+Jos%C3%A9
This also appears correct. But this time the page says:
Hello, FlÃ¡vio JosÃ©.
Rather than seeing %C3%A1 as a sequence of UTF-8 octets, JSP seems to think that these are a series of ISO-8859-1 octets (or code page 1252 octets), and is therefore decoding them to the wrong character sequence.
But where is it getting ISO-8859-1? What is my JSP page lacking to indicate the correct encoding?
I'll note also that WHATWG specification says that application/x-www-form-urlencoded octets should be parsed as UTF-8 by default. Is the Java servlet specification simply broken? How do I work around this?
This is caused by Tomcat, but the root problem is the Java Servlet 4 specification, which is incorrect and outdated.
Originally HTML 4.0.1 said that application/x-www-form-urlencoded encoded octets should be decoded as US-ASCII. The servlet specification changed this to say that, if the request encoding is not specified, the octets should be decoded as ISO-8859-1. Tomcat is simply following the servlet specification.
There are two problems with the Java servlet specification. The first is that the modern interpretation of application/x-www-form-urlencoded is that encoded octets should be decoded using UTF-8. The second problem is that tying the octet decoding to the resource charset confuses two levels of decoding.
Take another look at this POST content:
fullName=Fl%C3%A1vio+Jos%C3%A9
You'll notice that it is ASCII! It doesn't matter if you consider the POST HTTP request charset to be ISO-8859-1, UTF-8, or US-ASCII; you'll still wind up with exactly the same Unicode characters before decoding the octets. Which encoding is used to decode the encoded octets is a completely separate question.
As a further example, let's say I download a text file instructions.txt that is clearly marked as ISO-8859-1, and it contains the URI https://example.com/example.jsp?fullName=Fl%C3%A1vio+Jos%C3%A9. Just because the text file has a charset of ISO-8859-1, does that mean I need to decode %C3%A1 using ISO-8859-1? Of course not! The charset used for decoding URI characters is a separate level of decoding on top of the resource content type charset. Similarly, the octets of values encoded in application/x-www-form-urlencoded should be decoded using UTF-8, regardless of the underlying charset of the resource.
There are several workarounds, some of them found by looking at the Tomcat character encoding FAQ on how to "use UTF-8 everywhere".
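The two levels of decoding can be seen in isolation with java.net.URLDecoder, which takes the octet charset as an explicit argument. This is only a sketch of the principle, not of what Tomcat does internally; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: the same ASCII form body decoded with two octet charsets.
class FormDecodingDemo {

    // Percent-decodes a form-urlencoded value, interpreting the octets in the given charset.
    static String decodeForm(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String body = "Fl%C3%A1vio+Jos%C3%A9"; // pure ASCII on the wire

        // Octets read as UTF-8: two bytes become one character, as the browser intended.
        System.out.println(decodeForm(body, "UTF-8"));

        // Octets read as ISO-8859-1: every byte becomes its own character (mojibake).
        System.out.println(decodeForm(body, "ISO-8859-1"));
    }
}
```

The input string is identical in both calls; only the charset applied to the percent-decoded octets differs, which is the choice the container has to make when no request encoding is declared.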
Set the request character encoding in your web.xml file.
Add the following to your WEB-INF/web.xml file:
<request-character-encoding>UTF-8</request-character-encoding>
This setting is agnostic of the servlet container implementation, and is set forth in the servlet specification. (You should alternatively be able to put it in Tomcat's conf/web.xml file, if you want a global setting and don't mind changing the Tomcat configuration.)
Set the SetCharacterEncodingFilter in your web.xml file.
Tomcat has a proprietary equivalent: use the org.apache.catalina.filters.SetCharacterEncodingFilter in the WEB-INF/web.xml file, as the Tomcat FAQ above mentions, and as illustrated by https://stackoverflow.com/a/37833977/421049, excerpted below:
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This will make your web application only work on Tomcat, so it's better to put this in the Tomcat installation conf/web.xml file instead, as the post above mentions. In fact Tomcat's conf/web.xml installations have these two sections, but commented out; simply uncomment them and things should work.
Force the request character encoding to UTF-8 in the JSP or servlet.
You can force the character encoding of the servlet request to UTF-8, somewhere early in the JSP:
<% request.setCharacterEncoding("UTF-8"); %>
But that is ugly, unwieldy, error-prone, and goes against modern best practices—JSP scriptlets shouldn't be used anymore.
Hopefully we can get a newer Java servlet specification to remove any relationship between the resource charset and the decoding of application/x-www-form-urlencoded octets, and simply state that application/x-www-form-urlencoded octets must be decoded as UTF-8, as is modern practice as clarified by the latest W3C and WHATWG specifications.
Update: I've updated the Tomcat FAQ on Character Encoding Issues with this information.

How can I fix the '�' symbols on my page? It seems to be something happening between the servlet and the Javascript.

My servlet is giving me a CSV file with chars like 'é', 'á' or 'õ'. When I open the servlet via the browser, it works fine. But when the information gets to my page, it's all �.
I tried changing the coding in the actual database, but it didn't work.
console.log(csv), where csv is the information from the servlet, and it's also full of '�'s.
I'm not even sure where the problem is, because the actual page has several latin characters which are presented properly.
I'm using Microsoft SQL '08, tried both nvarchar and varchar fields.
The server has to be told to use UTF-8 for the JSP, both to read the page source and to encode its output. This can be done on a per-JSP basis with
<%@ page pageEncoding="UTF-8" %>
or on an application-wide basis with
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
refer here for more.
What Ankur posted seems to fix most problems, but I would still have issues from time-to-time with encoding. I really hope you don't have to do this, but I was forced to make a little decode function to replace problematic characters (by using their '\u' code) with web-safe ones.
It seems that the encoding for the page is actually ISO-8859-1. While searching around Stack Overflow for the terms mentioned by users I found this:
jQuery $.get() charset of reply when no header is set?
The error happens because there's no charset header on whatever the Ajax call is fetching. Setting this up before the $.get call solves it:
$.ajaxSetup({
'beforeSend' : function(xhr) {
xhr.overrideMimeType('text/html; charset=ISO-8859-1');
},
});
Now, why is the server not configured for a more practical charset, I don't know.
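One plausible reading of the symptom, assuming the servlet writes ISO-8859-1 bytes without declaring a charset and the Ajax side then reads them as UTF-8: a lone ISO-8859-1 byte such as 0xE9 ('é') is not a valid UTF-8 sequence, so the decoder substitutes U+FFFD, which is the '�' symbol. The sketch below reproduces that in plain Java; the class name, method name, and sample CSV are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: what a reader sees when it assumes the wrong charset.
class ReplacementCharDemo {

    // Decodes the bytes received "on the wire" using the charset the reader assumes.
    static String readAs(byte[] wireBytes, Charset assumed) {
        return new String(wireBytes, assumed);
    }

    public static void main(String[] args) {
        String csv = "caf\u00e9;jos\u00e9"; // sample data containing e-acute

        // Server side: bytes written as ISO-8859-1, no charset declared.
        byte[] wireBytes = csv.getBytes(StandardCharsets.ISO_8859_1);

        // Reader assumes UTF-8: invalid sequences are replaced with U+FFFD.
        String wrong = readAs(wireBytes, StandardCharsets.UTF_8);

        // Reader is told ISO-8859-1 (what overrideMimeType achieves): text survives.
        String right = readAs(wireBytes, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.contains("\uFFFD"));
        System.out.println(right.equals(csv));
    }
}
```

Declaring the charset in the servlet's Content-Type response header would address the same mismatch from the server side, without overriding anything in the client.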

How to solve UTF-8 in java

I currently use
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
in my JSP page.
When I get data from a textbox using request.getParameter("..."); it returns data like öÉ?É?É?öİ. I see this problem whenever I use characters that are not English. I added URIEncoding="UTF-8" to server.xml in Tomcat, but it returned the same thing (öÉ?É?É?öİ). How do I solve it?
Thank you
EDIT
Thanks for your answers. I tried a few things, but nothing has fixed the problem.
Here's what I've done:
I added <Connector URIEncoding="UTF-8" .../> in server.xml.
<meta ... charset=utf-8> tag is ok and I tried request.setCharacterEncoding("UTF-8");
I also tried <filter> tag in web.xml
None of these actions fixes the problem. I'm wondering if there's something else wrong with this...(remembering: I used <form method='post'>. I click submit button and when I get data using request.getParameter("..") the format of this data is not the correct format. )
You can try this code in your Servlet
if(request.getCharacterEncoding() == null) {
request.setCharacterEncoding("UTF-8");
}
Maybe it's because the actual character encoding is not UTF-8? If the characters themselves are encoded in some other format, we can't just label them as UTF-8.
Try decoding them with various charsets and see which one gives the proper result. I think the input is being decoded as Latin-1 (ISO-8859-1). If so, use the code below:
String param1 = request.getParameter("...");
if (param1 != null) {
    // Recover the raw bytes, then decode them as UTF-8 explicitly
    // instead of relying on the platform default charset.
    param1 = new String(param1.getBytes("ISO-8859-1"), "UTF-8");
}
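The re-decoding trick can be checked outside a servlet. The sketch below simulates UTF-8 bytes that were wrongly decoded as ISO-8859-1 and then repairs them; it names the target charset explicitly, because new String(bytes) with no charset falls back to the platform default and may or may not be UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the getBytes/new String repair.
class MojibakeRepairDemo {

    // Undoes a wrong ISO-8859-1 decode of UTF-8 bytes.
    // Only valid for strings that really were mis-decoded this way.
    static String repair(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e4h\u00f6h\u00fc"; // text with three umlauts

        // Simulate the container decoding UTF-8 request bytes as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(repair(misdecoded).equals(original));

        // Applying the repair to text that was already correct damages it.
        System.out.println(repair(original).equals(original));
    }
}
```

This prints true and then false, which is why the repair is a stopgap: once the request encoding is configured correctly, the same line starts corrupting good input.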
UTF-8 should be set at all the layers of the application.
Do the following:
1) HTML Code
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
2) Browser Setting for IE
View -- Encoding -- Unicode (UTF-8)
3) Tomcat Server
server.xml - In Connector tag added "URIEncoding" attribute as
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8"/>
catalina.sh/catalina.bat - added following
set JAVA_OPTS=-Xms256m -Xmx1024m -Xss268k -server -XX:MaxPermSize=256m -XX:-UseGCOverheadLimit -Djava.awt.headless=true -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
4) The content type of the form submission should be "application/x-www-form-urlencoded" (the default for HTML forms)
There is another place you can check. Did you include following declaration in your JSP file?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I think the problem is that browser still sends requests using default ISO-8859-1, which is the standard charset if not declared.
You can also check the HTTP headers received from server to make sure the correct charset is sent back.
Essentially, the cleanest way to do it is to use Unicode escapes in your property files and/or code if need be (not advised).
This way you avoid all encoding issues, since your program only has to deal with ASCII. The proper representation is then handled entirely by the client side, and you do not have to worry about the default OS or environment encoding.
You can also try adding the following filter to web.xml:
<filter>
<filter-name>Character Encoding Filter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>Character Encoding Filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Hope this helps.
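You should try this:
String content = request.getParameter("content");
if (content != null) {
    // Decode the recovered bytes as UTF-8 explicitly, not with the platform default.
    content = new String(content.getBytes("ISO-8859-1"), "UTF-8");
}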

xtag issue with special characters

I'm using the x tag to parse through an XML document that has special characters such as é. Here is my XML:
<stack>
<data title="thé"/>
</stack>
Here is the x tag that prints out the output:
<x:out select="@title" />
The view source of the page displays this output:
thé
and visually this is displayed by the browser:
thÃ©
What am I doing wrong and how do I fix this issue?
Since the source view shows the character correctly, the problem is probably not with your JSTL XML tag expression. Instead, it might have to do with the content-type that the page is labeled with.
Single non-ASCII characters getting rendered as two characters (the first is typically an A with some sort of accent) is a pretty sure sign that UTF-8 content is getting treated as ISO-8859-1, or something similar. I'm not an expert in this area, but the browser needs to be told that the content you're serving is in UTF-8. So check the meta content-type of your output. It should specify UTF-8:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >

transfer UTF8 input from a JSP form to a Spring controller breaks umlauts [duplicate]

Possible Duplicate:
UTF-8 encoding and http parameters
I have a UTF8 encoded JSP with a pure UTF8 header (and the text file is also encoded as UTF-8) and a form inside that page:
<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> </head>
<body>
This is a funny German character: ß
<form action="utf.do" method="post">
<input type="text" name="p" value="${p}" />
<input type="submit" value="OK"/>
</form>
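Then I have a nice Spring-backed @Controller on the backend:
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.servlet.ModelAndView;

@Controller
public class UTFCtl {
    // Echoes the submitted "p" parameter back to the "utf" view.
    @RequestMapping("/utf.do")
    public ModelAndView handleUTF(@RequestParam(value="p", required=false) String anUTFString) {
        ModelAndView ret = new ModelAndView("utf");
        ret.addObject("p", anUTFString);
        return ret;
    }
}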
As you can see, the form sends its data via POST. Typing some German umlauts into the form field yields a bunch of garbled characters at the backend: submitting hähöhü in the form field yields hÃ¤hÃ¶hÃ¼ as the value after submitting. I used the debugger and the variable's value is already scrambled, meaning that Spring/Tomcat/Servlet hasn't detected the encoding correctly or the browser didn't encode my input correctly. My colleagues' usual response to that is: encode in ISO for Germany, or encode using JavaScript before transmitting. This shouldn't be necessary, should it? I mean, this is 2011 and that's what UTF-8 is good for!
[EDIT] I think this proves that the input is arriving as ISO even though I tell it to use UTF-8:
byte[] in = anUTFString.getBytes("iso-8859-1");
String out = new String(in,"UTF-8");
out is then displayed correctly in the JSP!
I'm using Spring 2.5 on Tomcat 5.5 with Firefox 4 beta 11 on a Windows XP SP3 box. I already told Tomcat in its server.xml to use URIEncoding="utf-8", but that doesn't change the game. I analysed the Firefox transmissions using Firebug and it seems to transmit UTF-8. I also checked the current Spring WebMVC setup and, as far as I can tell, there are no further encoding changers anywhere, neither in the config nor in web.xml (no listeners, nothing). I read and understood most of the UTF-8 related docs, and I worked like that in a PHP environment without any problems (simply switching PHP to utf-8, done)...
So, indeed, it's a matter of the server settings too. Please note the duplication comment beneath the question. You have to tell your server as well as your deployment to use UTF-8, and then everything's fine (pretty much like in PHP). Please note that I'm duplicating the answer from here (http://ibnaziz.wordpress.com/2008/06/10/spring-utf-8-conversion-using-characterencodingfilter/).
This works in a Tomcat environment:
Edit the Connectors in your Tomcat's server.xml to deliver UTF-8:
<Connector URIEncoding="utf-8" port="8080" blabla="blabla" ... >
Then add to your web.xml:
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This tells the Spring framework to apply the UTF-8 filter to all kinds of requests (/*). After applying this you can even have links in the format ?q=äöüß, which will be transported correctly. It's still better to encode parameters for request transport:
URLEncoder.encode(aParameterWithUmlaut,"UTF-8")
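A minimal sketch of that last call in context, using java.net.URLEncoder with UTF-8. The "search" path, the parameter name q, and the class and method names are placeholders, not anything from the application above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class: building a link with a percent-encoded query parameter.
class QueryParamEncodingDemo {

    // Percent-encodes a single query parameter value as UTF-8 octets.
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc\u00df"; // a-umlaut, o-umlaut, u-umlaut, sharp s

        // Encode only the value, never the whole URL.
        String link = "search?q=" + encodeParam(umlauts);

        System.out.println(link); // search?q=%C3%A4%C3%B6%C3%BC%C3%9F
    }
}
```

The resulting link is pure ASCII, so it survives whatever charset the page or an intermediate proxy happens to assume; the server still needs URIEncoding="utf-8" on the Connector to decode it back.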
