I'm just using:
NumberFormat cfLocal = NumberFormat.getCurrencyInstance(Locale.JAPAN);
And it works fine on most devices/browsers/currencies, but with IE and Yen I'm getting a few extra characters. Could it be a weird encoding being sent, or browser-specific settings mishandling the ¥ symbol?
The output looks like this:
ï¿¥15,180
Would appreciate any leads or tips.
Edit:
I am outputting the values with JSP. JSP file is defined with this preamble:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
I'm no encoding expert but your XML appears to say one thing and your content-type another - try setting both to UTF-8.
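For example, a preamble where both declarations agree:
<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>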
If your data is coming from outside the application (e.g. a database, file, etc.), what is the encoding of the source? For example, a MySQL database may have a different character set specified.
If you are using a web server like Apache, is that changing the encoding? For example you can have a httpd.conf directive to set the default character set:
AddDefaultCharset utf-8
It would be worth checking the HTTP Headers in the browser to see what is actually being sent to the browser, and work back from there.
EDIT
Thinking about it more, I'm not sure the XML declaration is necessarily the problem. It would probably be best to check the headers first and compare them to the HTML being produced.
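As a sanity check, the exact garbage from the question can be reproduced by encoding the formatted string as UTF-8 and then decoding it as ISO-8859-1. A minimal sketch, assuming the locale data uses the fullwidth yen sign U+FFE5 (which matches the bytes seen):
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.Locale;

public class YenGarbleDemo {
    public static void main(String[] args) {
        // Format 15180 as Japanese currency; the symbol is U+FFE5 (fullwidth yen).
        String yen = NumberFormat.getCurrencyInstance(Locale.JAPAN).format(15180);
        // U+FFE5 encodes to EF BF A5 in UTF-8; decoding those bytes as
        // ISO-8859-1 yields the three characters ï ¿ ¥.
        String garbled = new String(yen.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // prints: ï¿¥15,180
    }
}
So the page is being generated as UTF-8 but read as a single-byte encoding somewhere along the way.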
Related
I have a text file with WINDOWS-1252 characters like ø and ß. The file is uploaded via form submit to a servlet, where it is parsed with opencsv and returned as a List object to a JSP page, where it is displayed.
The non-ASCII chars are displayed as ? and I'm trying to figure out where along the way the encoding might have gone wrong.
I've tried a bunch of stuff:
My page has the tag <%@ page contentType="text/html" pageEncoding="WINDOWS-1252" %>
The file input is wrapped with an explicit charset: new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
Every string is re-encoded: s = new String(s.getBytes("WINDOWS-1252"));
Where else can the encoding fail? Any ideas?
Some troubleshooting suggestions:
Debug print or otherwise examine the text as hex at various phases, and verify that the encoding really is what you expect it to be (see the sketch after these suggestions).
Make sure there is no BOM (Byte Order Mark), and see this question and the links in it if there is one and you don't have an easy way to get rid of it: Reading UTF-8 - BOM marker
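For the hex inspection suggested above, here is a minimal helper (hypothetical; swap in whatever charsets you expect at each phase):
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Render the bytes of s, encoded with the given charset, as two-digit hex.
    static String toHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHex("ø", StandardCharsets.UTF_8));          // c3 b8
        System.out.println(toHex("ø", Charset.forName("windows-1252"))); // f8
    }
}
If a phase shows f8 where you expected c3 b8 (or vice versa), you have found the point where the encodings disagree.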
OK, the problem is fixed.
The first problem was that it wasn't a UTF-8 file at all but a WINDOWS-1252 one. I determined that using the juniversalchardet lib (very helpful and easy to use).
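For reference, the detection step looks roughly like this (a sketch; the 4 KB buffer size is an arbitrary choice):
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.mozilla.universalchardet.UniversalDetector;

// Feed the file's bytes to the detector and read back its best guess.
static String detectCharset(File file) throws IOException {
    UniversalDetector detector = new UniversalDetector(null);
    try (FileInputStream fis = new FileInputStream(file)) {
        byte[] buf = new byte[4096];
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
    }
    detector.dataEnd();
    return detector.getDetectedCharset(); // e.g. "WINDOWS-1252", or null if unknown
}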
Then I had to make sure I was reading the file with the right charset, by wrapping the FileInputStream in a reader:
new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
Then I just had to make sure I was displaying it with the right charset in the JSP file, using the tag <%@ page contentType="text/html" pageEncoding="WINDOWS-1252" %>
That's pretty much it:
(1) determine charset
(2) make sure you're reading the file right
(3) make sure you display it right
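Putting (2) and (3) together on the read side, a sketch (the package name assumes a recent opencsv; old versions used au.com.bytecode.opencsv):
import com.opencsv.CSVReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// 'file' is the uploaded file. The charset is fixed here, at the Reader;
// opencsv only ever sees already-decoded characters.
Reader in = new InputStreamReader(new FileInputStream(file), "WINDOWS-1252");
CSVReader csv = new CSVReader(in);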
So I'm redirecting my user using GET in a naive way:
response.sendRedirect("/path/index.jsp?type="+ e.getType()
+"&message="+ e.getMessage());
And this was working fine until I had to send messages: actual text to be shown to users. The problem is when the message has non-ASCII characters in it. My .jsp files are encoded in UTF-8:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
So all non-ASCII characters in 'message' get garbled. I don't want to set my JVM default encoding to UTF-8, so how do I solve this? I tried to use
response.setCharacterEncoding("UTF-8");
on the Servlet before redirecting, but it doesn't work. When I try to execute:
out.print(request.getCharacterEncoding());
on my .jsp file it prints 'null'.
The sendRedirect() method doesn't encode the query string for you. You have to do it yourself.
response.sendRedirect("/path/index.jsp?type=" + URLEncoder.encode(e.getType(), "UTF-8")
+ "&message=" + URLEncoder.encode(e.getMessage(), "UTF-8"));
You might want to refactor the boilerplate into a utility method taking a Map of parameters, for example:
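A hypothetical utility along those lines (the method name is made up):
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Build a UTF-8 URL-encoded query string from parameter names and values.
public static String toQueryString(Map<String, String> params) throws UnsupportedEncodingException {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> entry : params.entrySet()) {
        sb.append(sb.length() == 0 ? '?' : '&')
          .append(URLEncoder.encode(entry.getKey(), "UTF-8"))
          .append('=')
          .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
    }
    return sb.toString();
}
The redirect then becomes response.sendRedirect("/path/index.jsp" + toQueryString(params)).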
Note that I assume the server is configured to decode the GET request URI using UTF-8 as well. You didn't say which one you're using, but in the case of, for example, Tomcat it's a matter of adding the URIEncoding="UTF-8" attribute to the <Connector> element in server.xml.
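In Tomcat that means editing the HTTP connector in /conf/server.xml, for example:
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />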
See also:
Unicode - How to get the characters right?
Unrelated to the concrete problem: the language="java" is the default already, so just omit it. The contentType="text/html; charset=UTF-8" is also already the default when using JSP with pageEncoding="UTF-8", so omit that too. All you really need is <%@ page pageEncoding="UTF-8" %>. Note that this does effectively the same as response.setCharacterEncoding("UTF-8"), which explains why that call had no effect. The request.getCharacterEncoding() only concerns the POST request body, not the GET request URI, so it is irrelevant in the case of GET requests.
Thank you! When I use response.sendRedirect("/path/index.jsp?type=" + URLEncoder.encode(e.getType(), "UTF-8")), my problem is fixed.
When using response.sendRedirect(), we should encode the URL with the URLEncoder.encode() function; only then is it encoded correctly.
Thanks again!
I am using PDFBox to create PDFs from my web application. The web application is built in Java and uses JSF. It takes the content from a web-based form and puts it into a PDF document.
Example: a user fills in an inputTextArea (JSF tag) in the form, and that content is converted to a PDF. I am unable to handle non-ASCII characters.
How should I handle the non-ASCII characters, or at least strip them out before putting them in the PDF? Please help me with any suggestions or point me to any resources. Thanks!
Since you're using JSF on JSP instead of Facelets (which already implicitly uses UTF-8), do the following steps to avoid the platform default charset being used (often ISO-8859-1, which is the wrong choice for handling the majority of "non-ASCII" characters):
Add the following line to top of all JSPs:
<%@ page pageEncoding="UTF-8" %>
This sets the response encoding to UTF-8 and sets the charset of the HTTP response content type header to UTF-8. The latter will instruct the client (web browser) to display and submit the page with the form using UTF-8.
Create a Filter which does the following in its doFilter() method (a complete sketch follows after these steps):
request.setCharacterEncoding("UTF-8");
Map it on the FacesServlet as follows:
<filter-mapping>
<filter-name>nameOfYourCharacterEncodingFilter</filter-name>
<servlet-name>nameOfYourFacesServlet</servlet-name>
</filter-mapping>
This sets the request encoding of all JSF POST requests to UTF-8.
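For completeness, a minimal such filter could look like this (the class name is arbitrary; it also needs a matching <filter> declaration in web.xml):
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {
    public void init(FilterConfig config) throws ServletException {
        // Nothing to configure in this sketch.
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Set the request encoding before any request parameter is read.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {
        // Nothing to clean up.
    }
}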
This should fix the Unicode problem on the JSF side. I have never used PDFBox, but it should be able to handle Unicode text as long as the font you use actually contains the needed glyphs, so I think that part is solvable. Let me know if it still doesn't work after applying the above fixes.
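As a starting point on the PDF side, this is roughly what writing non-ASCII text looks like with PDFBox 2.x; embedding a TrueType font that contains the glyphs is the key step (the font path is an assumption):
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class UnicodePdfSketch {
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);
            // Embed a font containing the non-ASCII glyphs; the built-in
            // standard fonts won't cover arbitrary Unicode text.
            PDFont font = PDType0Font.load(doc, new File("/path/to/DejaVuSans.ttf"));
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(font, 12);
                cs.newLineAtOffset(50, 700);
                cs.showText("Grüße æ ø ß from the form");
                cs.endText();
            }
            doc.save("out.pdf");
        }
    }
}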
See also:
Unicode - How to get the characters right?
When I add a Filter to a particular JSP file, the Arabic characters in the output appear as ???, even though the page encoding has been set to UTF-8 by <%@ page pageEncoding="UTF-8" %> and <% response.setCharacterEncoding("UTF-8"); %>.
The strange thing is that before I added the Filter, the output of all Arabic pages appeared with the correct encoding. Can someone tell me how this problem is caused and how I can solve it?
The filter is either directly or indirectly committing the response and/or accessing the Writer or OutputStream of the HttpServletResponse, which means the encoding can no longer be changed in the JSP. Fix the code in the filter accordingly. The filter should not be writing anything to the response body at all; that is what the JSP (for HTML) or a servlet (for other content) is for.
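A typical way this happens, as a hypothetical fragment inside the filter's doFilter():
// Wrong: obtaining the writer locks in the response's character encoding
// (often the ISO-8859-1 default), so the JSP's pageEncoding /
// setCharacterEncoding no longer has any effect.
response.getWriter().write("<!-- debug -->");
chain.doFilter(request, response);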
By the way, you don't need to call <% response.setCharacterEncoding("UTF-8"); %>. The <%@ page pageEncoding="UTF-8" %> already implicitly does that.
On the web pages in our app, the trademark symbol (TM) is appearing as a question mark. The registered trademark (R) works, though. We are displaying the value using the c:out tag from the JSP standard tag library. If I put &trade; or ™ on the page to test this, those show up as they are supposed to.
<td><c:out value="${item.description}"/></td> <!-- does not work -->
<td>yada yada yada Spiros™ yada yada yada</td> <!-- works -->
To add to this, we're also using YUI, and before we display these pages, they show up in a YUI data table as the results of a query (the user clicks on a row to go to the page described above). The (TM) shows up properly in that table. That tells me that we are fetching the value from our database correctly, and that the server code generating the XML to send back to the YUI data table also works.
So why is the same String displayed properly in the YUI data table, but not in a normal JSP, unless we hardcode the symbol onto the page?
You probably have an encoding issue. If you do not have an explicit encoding in your JSP:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
then it's time to add one. Try UTF-8 and if that doesn't work try ISO-8859-1 ... or if you know the correct encoding, use that.
When a char appears as ? in a browser, it means that the page encoding (as detected by the browser) does not cover that char. A good test is View->Character Encoding->UTF-8 in Firefox: if the char then appears correctly, the (TM) char is encoded using UTF-8, and instructing your page to set the response encoding header to UTF-8 should fix it.
If that does not work, you should first find out how the character is actually encoded (look at what encoding comes from the database, for example) and set the page encoding header to that encoding.
The second format works because the (TM) char is written as a known HTML entity, which the browser interprets regardless of the page encoding.