Umlaut characters are not displayed properly in my JSP - java

I have a JSP with an option to upload a file. I uploaded a file whose name combines English and umlaut characters, and that name is shown on the next JSP. Locally it displays properly, for example üß_file.xls, but the same code displays it as ?_file.xls in a higher environment, i.e. the test environment. I have tried three options:
Set the encoding to UTF-8 in the encoding option on the first line of my JSP.
Changed the accept character set attribute of html:form to UTF-8.
Added only a SetCharacterEncoding servlet filter, which sets the response content type to UTF-8 and calls request.setCharacterEncoding with UTF-8. This includes a change in web.xml with a parameter that forces UTF-8 encoding for the JSP URL patterns.
Please suggest how to solve this in the test environment (it works fine in the DEV and local environments).

Have you checked the encoding of the servlet container? For example, Tomcat might use the platform (OS) encoding, which might not be UTF-8.
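One way to check this is to run a small diagnostic on each environment and compare the output. The sketch below is only an illustration: the class name PlatformEncodingCheck and the helper roundTrip are hypothetical, and the file name is written with Unicode escapes so the result does not depend on the source file encoding. It prints the JVM default charset and shows that a charset without umlauts (US-ASCII here, as a stand-in for a non-UTF-8 platform encoding) turns them into question marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PlatformEncodingCheck {

    // Encodes and decodes with the same charset. Characters the charset
    // cannot represent are replaced by its substitution byte, usually '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "\u00fc\u00df" is the u-umlaut and sharp s from the question's file name.
        String name = "\u00fc\u00df_file.xls";

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("via default:     " + roundTrip(name, Charset.defaultCharset()));
        System.out.println("via US-ASCII:    " + roundTrip(name, StandardCharsets.US_ASCII));
        System.out.println("via UTF-8:       " + roundTrip(name, StandardCharsets.UTF_8));
    }
}
```

If the "via default" line differs between DEV and the test environment, the JVM or OS locale settings of the container are a likely place to look.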

Related

Umlaut chars not well handled in JSP input form

After submitting an input form (JSP), I can see umlaut characters correctly on one server, while on the other they are replaced with strange characters. The web.xml correctly sets the encoding filter for UTF-8. What can be the issue? Are JVM variables maybe in play here? The app is running on Pivotal Cloud Foundry.

How do I set the charset portion of the Content-Type HTTP Header on an IBM HTTPD Server?

I have an application which is a set of Java Web Services and some static content (HTML, XML, JavaScript, etc.). I know that JavaScript supports only a limited set of character encodings, but HTML and XML can use various character encodings. I happen to know that all of these files are UTF-8 encoded. The WebSphere application server that I am using properly sets the Content-Type to 'text/html; charset=utf-8' for the HTML, but not for JavaScript or XML. They get the Content-Type header set to 'application/javascript' and 'text/xml' respectively. My security folks are telling me that not specifying the charset for the XML files is a vulnerability. Remember these are static files.
On an IBM HTTPD web server (in front of the WebSphere application server) is there a directive that I can use to add the character encoding to the content type of 'text' types? On WebSphere is there a directive I can use to set the default character encoding for text types? I assume that after I "fix" this for the XML files that I will then be asked to fix it for CSS files, JavaScript files, etc. I would rather fix it once and be done.
If this question has been asked before, please provide the URL. I did find this question, but it is not the same. I am looking into the feasibility of this answer, but there are many folders and I would rather not have to remember to add a .htaccess file with this directive to each one.
You can just append AddDefaultCharset utf-8 to httpd.conf and everything will go out with that charset appended to it, even content generated by the application server. An .htaccess file is not necessary and not useful for appserver content.
If you find you need to blacklist context roots, extensions, or anything else, use <LocationMatch> with AddDefaultCharset off.
Unfortunately Header edit Content-Type... will not work in IBM HTTP Server prior to V9. In V9 this allows you to easily cherry pick the current Content-Type:
Header always edit Content-Type ^(text/html)$ "$1 ; charset=utf8"
Header always edit Content-Type ^(application/javascript)$ "$1 ; charset=utf8"
This is the same approach covener described:
Add the following lines into the conf/httpd.conf file:
AddDefaultCharset utf-8
AddCharset utf-8 .html .js .css
<Location />
Header always edit Content-Type ^(text/html)$ "$1; charset=utf8"
Header always edit Content-Type ^(application/javascript)$ "$1; charset=utf8"
RewriteRule ^(.*)$ $1 [R=200,L]
</Location>
and it should work.

JSP page is giving error in IE browser only

HTML1114: Codepage iso-8859-1 from (HTTP header) overrides conflicting codepage utf-8 from (META tag)
getQuotes?zip=20190&county=FAIRFAX&eff=01%2F13%2F2012&fam_income=30000.0&a0_dob=11%2F11%2F1981&a0_g=M&a0_t=true&a0_rel=self&appId=30&planId=4&changedSubsidy=%24100.98
SEC7111: HTTPS security is compromised by http://www.startssl.com/img/secured.gif
getQuotes?zip=20190&county=FAIRFAX&eff=01%2F13%2F2012&fam_income=30000.0&a0_dob=11%2F11%2F1981&a0_g=M&a0_t=true&a0_rel=self&appId=30&planId=4&changedSubsidy=%24100.98
What do these errors mean?
There are two errors here.
The HTTP header says that the encoding is iso-8859-1 whereas the meta-tag in the HTML page says that it's UTF-8. Both should say the same, and should say the actual character encoding used.
You have an HTTPS page which contains an image downloaded over HTTP, so IE does not consider the whole page secure.
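To illustrate the first point, here is a minimal sketch of how the two declarations could be compared. The class DeclaredCharsets and its methods are hypothetical helpers, not part of any servlet or browser API; the inputs are the value of the HTTP Content-Type header and the content attribute of the META tag.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeclaredCharsets {

    // Matches charset=..., with or without quotes, in any letter case.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Extracts the charset parameter from a Content-Type value, if present.
    // Charset.forName throws for names the JVM does not recognize.
    static Optional<Charset> charsetOf(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? Optional.of(Charset.forName(m.group(1))) : Optional.empty();
    }

    // True only when both values declare a charset and it is the same one.
    static boolean agree(String httpHeader, String metaContent) {
        Optional<Charset> fromHeader = charsetOf(httpHeader);
        return fromHeader.isPresent() && fromHeader.equals(charsetOf(metaContent));
    }

    public static void main(String[] args) {
        // The situation IE reports in HTML1114: header and META tag disagree.
        System.out.println(agree("text/html; charset=iso-8859-1", "text/html; charset=utf-8"));
        // After the fix both declare the encoding the page is actually written in.
        System.out.println(agree("text/html; charset=UTF-8", "text/html; charset=utf-8"));
    }
}
```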

Character encoding issue with linux and mysql

I am getting a string that contains the substring gratuit.AFLĂ MAI MULTEDe from a web service. Saving it to the database works fine on my local machine (Windows), but when the application is deployed on the Linux server I get the following error:
Incorrect string value: '\xC4\x82 MAI...' for column 'description' at row 1
I am using Hibernate 3.3 with MySQL 5.5 (on both Windows and Linux), and the database uses the default encoding (latin1).
I have tried setting -Dfile.encoding=UTF8 in JAVA_OPT, but that did not work. I think this is an OS-related problem.
Any suggestions?
(In general, nowadays I would do everything in UTF-8.) There is a long pipeline of points where the encoding can be set. From the web service you probably get XML in UTF-8. That is automatically read correctly, as XML handles the encoding strictly.
At the database level, the database, the table, and the field each have a default or explicit encoding. Furthermore, the connection URL should be parametrised with the correct encoding.
The error message shows the UTF-8 bytes for that accented A (Ă), and I guess it is not available in Latin1.
For MySQL the connection string could look like:
jdbc:mysql://localhost/MYDB?useUnicode=true&characterEncoding=UTF-8
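The guess about Latin1 can be checked outside the database. The following sketch uses only the JDK; the class name Latin1Check and its helpers are hypothetical, and Java's ISO-8859-1 is used as an approximation of MySQL's latin1 (which is an assumption, since MySQL's latin1 is not byte-for-byte identical to ISO-8859-1). It prints the UTF-8 bytes of the character in the same form as the error message and tests whether the character can be encoded at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Latin1Check {

    // True if every character of the text can be encoded in the given charset.
    static boolean fitsIn(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Hex dump of the UTF-8 encoding, formatted like MySQL's error message.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aBreve = "\u0102"; // the A with breve from the question's string

        System.out.println(utf8Hex(aBreve));                             // prints \xC4\x82
        System.out.println(fitsIn(aBreve, StandardCharsets.ISO_8859_1)); // prints false
        System.out.println(fitsIn(aBreve, StandardCharsets.UTF_8));      // prints true
    }
}
```

The bytes match the ones quoted in the error, which is consistent with the column rejecting a character that its character set cannot store.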

Handling Character Encoding in URI on Tomcat

On the web site I am trying to help with, a user can type a URL into the browser, for example one containing Chinese characters:
http://localhost:8080?a=测试
On server, we get
GET /a=%E6%B5%8B%E8%AF%95 HTTP/1.1
As you can see, it's UTF-8 encoded, then URL encoded. We can handle this correctly by setting encoding to UTF-8 in Tomcat.
However, sometimes we get Latin1 encoding on certain browsers,
http://localhost:8080?a=ß
turns into
GET /a=%DF HTTP/1.1
Is there any way to handle this correctly in Tomcat? It looks like the server has to do some intelligent guessing. We don't expect to handle Latin1 correctly 100% of the time, but anything is better than what we are doing now, which is assuming everything is UTF-8.
The server is Tomcat 5.5. The supported browsers are IE 6+, Firefox 2+ and Safari on iPhone.
Unfortunately, UTF-8 encoding is a "should" in the URI specification, which seems to assume that the origin server will generate all URLs in such a way that they will be meaningful to the destination server.
There are a couple of techniques that I would consider; all involve parsing the query string yourself (although you may know better than I whether setting the request encoding affects the query string to parameter mapping or just the body).
First, examine the query string for single "high bytes": in valid UTF-8, any byte above 0x7F is part of a sequence of two or more bytes, so an isolated high byte means the value is not UTF-8 (the Wikipedia entry has a nice table of valid and invalid bytes).
Less reliable would be to look at the "Accept-Charset" header in the request. I don't think this header is required (I haven't looked at the HTTP spec to verify), and I know that Firefox, at least, will send a whole list of acceptable values. Picking the first value in the list might work, or it might not.
Finally, have you done any analysis on the logs, to see if a particular user-agent will consistently use this encoding?
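The first technique can be sketched in plain Java, independent of Tomcat. This is a rough illustration under stated assumptions: QueryValueDecoder, percentDecode, and decode are hypothetical names, the input is the raw (still percent-encoded) parameter value taken from the query string, and Latin1 is the only fallback considered. The value is decoded strictly as UTF-8 first, and only if that fails is it reinterpreted as ISO-8859-1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class QueryValueDecoder {

    // Turns "%E6%B5%8B" into raw bytes. Assumes the input is ASCII, as a raw
    // query string should be; a malformed escape throws NumberFormatException.
    static byte[] percentDecode(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2;
            } else if (c == '+') {
                out.write(' '); // form encoding uses '+' for a space
            } else {
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    // Strict UTF-8 first; any malformed sequence (such as a lone high byte)
    // makes the decoder throw, and the bytes are then read as ISO-8859-1.
    static String decode(String raw) {
        byte[] bytes = percentDecode(raw);
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%E6%B5%8B%E8%AF%95")); // valid UTF-8 from the question
        System.out.println(decode("%DF"));                // lone high byte, read as Latin1
    }
}
```

The guess can still be wrong: some Latin1 byte pairs happen to form valid UTF-8, so this reduces misreads rather than eliminating them.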
