Display Chinese characters in AJAX and also in Excel - Java

I am currently developing a Java Struts application, and I have been wondering how to display Chinese characters in AJAX responses and also when generating an Excel file through the servlet response.
Can anyone share some pointers?

As long as you use UTF-8 as your character encoding, I think you should be fine. If UTF-8 is not an option and you are going to display only Chinese, you can even try one of the Chinese-specific character encodings (for example GBK or GB2312).
I'll try to get you more details, but at the very least, to set the character encoding you'll have to do:
response.setCharacterEncoding("UTF-8");
or
response.setContentType("text/plain; charset=UTF-8");
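Refer to the Java EE API documentation (http://java.sun.com/j2ee/1.4/docs/api/index.html) for exactly what these methods do. Note that setCharacterEncoding() has no effect if it is called after getWriter() or after the response has been committed, so set the encoding first.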
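As a rough illustration of why the declared charset and the bytes actually written must agree, here is a stdlib-only sketch. The class and method names are hypothetical, not part of the Servlet API. In a real servlet you would pass the header value to response.setContentType(...) and let response.getWriter() do the encoding; here the encoding is done by hand so you can see what a client gets when it decodes with the right or the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseCharsetDemo {

    // Builds a Content-Type value such as "text/plain; charset=UTF-8".
    static String contentType(String mimeType, Charset charset) {
        return mimeType + "; charset=" + charset.name();
    }

    // Encodes the body with the same charset that the header declares.
    static byte[] encodeBody(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        String text = "\u4e2d\u6587"; // U+4E2D U+6587, the word for "Chinese"
        byte[] body = encodeBody(text, utf8);

        System.out.println(contentType("text/plain", utf8));     // text/plain; charset=UTF-8
        // A client that honours the declared charset gets the original text back.
        System.out.println(new String(body, utf8).equals(text)); // true
        // A client that assumes ISO-8859-1 instead sees mojibake.
        System.out.println(new String(body, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```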

Related

Java changes Cyrillic to Unicode escapes like \uXXXX

I am making a Java application that logs into my school diary through its web API, so that I can build my own UI. As the title says, at some point Java turns the Cyrillic text into \uXXXX escape sequences. My code is in my question on Russian Stack Overflow: https://ru.stackoverflow.com/questions/1452959/%d0%a1%d0%b5%d1%80%d0%b2%d0%b5%d1%80-%d0%be%d1%82%d0%b2%d0%b5%d1%80%d0%b3%d0%b0%d0%b5%d1%82-%d0%b7%d0%b0%d0%bf%d1%80%d0%be%d1%81 (translate it for more details). When I send my request with Cyrillic symbols to https://httpbin.org/post instead of my LOGIN_URL, they come back transformed; if I send the request with ASCII symbols, I get them back unchanged. In the linked post I also mention a Python project that does exactly what I want, and when I modify it to send its request to httpbin, the Cyrillic symbols come back intact! How do I fix my Java code? P.S. I have currently switched from Apache HttpClient to okhttp3 (same problem), but I can go back.
Well, I solved my problem. The cause was not the character encoding but the absence of two HTTP headers, namely:
httpPost.setHeader("X-Requested-With", "XMLHttpRequest");
httpPost.setHeader("Referer", Constants.BASE_URL);
(added to the login request)
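For reference, here is roughly how the same two headers could be set with the JDK's own java.net.http client (Java 11+), which is a swap from the Apache HttpClient/okhttp3 used in the question. This is a sketch only: BASE_URL, LOGIN_URL and the form field names are hypothetical placeholders, and the request is built but not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class LoginRequestDemo {

    // Hypothetical stand-ins for Constants.BASE_URL and LOGIN_URL from the question.
    static final String BASE_URL = "https://example.com/";
    static final String LOGIN_URL = BASE_URL + "login";

    // Percent-encodes one form field as UTF-8 (e.g. a Cyrillic login name).
    static String formField(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds, but does not send, a login POST carrying the two headers from the answer.
    // The field names "login" and "password" are hypothetical.
    static HttpRequest buildLoginRequest(String login, String password) {
        String body = formField("login", login) + "&" + formField("password", password);
        return HttpRequest.newBuilder(URI.create(LOGIN_URL))
                .header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
                .header("X-Requested-With", "XMLHttpRequest")
                .header("Referer", BASE_URL)
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Privet" in Cyrillic
        HttpRequest request = buildLoginRequest(cyrillic, "secret");
        System.out.println(request.method());                                   // POST
        System.out.println(request.headers().firstValue("Referer").orElse("")); // https://example.com/
        System.out.println(formField("login", cyrillic));
        // login=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
    }
}
```

Sending it would then be a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).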

Display special characters using entity or hex values

I am trying to display ŵ on my JSF page but am unable to do so. The text with special characters is read from a properties file, but on my application screen it becomes something else. I tried using entity values but did not succeed. For example, if the original text is:
ŵyhsne klqdw dwql
then after replacing it with the entity or hex value:
&wcirc ;yhsne klqdw dwql
but my page displays that text as-is.
I can only guess at your question; please edit it to improve it.
If you are displaying it on the web, you should use &wcirc; (note: without the space), but this also requires a font on the client side that supports such a character.
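If you build the markup yourself, named entities are not the only option; numeric character references work for any character. A minimal sketch (the helper name is hypothetical, and it assumes the input is plain text with no other characters that need HTML escaping) that turns every non-ASCII character into a hex reference:

```java
public class HtmlReferenceDemo {

    // Replaces every non-ASCII code point with a hexadecimal numeric character
    // reference such as "&#x175;" (U+0175, w with circumflex). ASCII characters,
    // including &, < and >, are copied unchanged, so the input is assumed to be
    // plain text that needs no other HTML escaping.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The first character is U+0175, written as a Java Unicode escape.
        System.out.println(toNumericReferences("\u0175yhsne klqdw dwql"));
        // prints: &#x175;yhsne klqdw dwql
    }
}
```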
If the string is in your Java code, replace the character with the escape \u0175.
But probably the best way is to use just ŵ itself, whether in code, on the web, or in any file, and make sure that such files (or sources) are interpreted as UTF-8 and that the pages you deliver are UTF-8. If you are not using UTF-8, check in the same way that you consistently use the correct encoding.
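Since the text in the question comes from a properties file, one place where the encoding can silently go wrong is loading that file: Properties.load(InputStream) always decodes ISO-8859-1. A minimal sketch, assuming you load the file yourself, that reads it through a UTF-8 Reader instead (JSF resource bundles are loaded through ResourceBundle, which since Java 9 reads .properties bundles as UTF-8 by default). The key name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8PropertiesDemo {

    // Properties.load(InputStream) decodes the stream as ISO-8859-1, so a UTF-8
    // file containing U+0175 would come back as two wrong characters. Wrapping
    // the stream in a UTF-8 Reader makes the file be interpreted as UTF-8.
    static Properties loadUtf8(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a UTF-8 properties file; "greeting" is a hypothetical key.
        String expected = "\u0175yhsne klqdw dwql";
        byte[] file = ("greeting=" + expected + "\n").getBytes(StandardCharsets.UTF_8);

        Properties ok = loadUtf8(new ByteArrayInputStream(file));
        Properties wrong = new Properties();
        wrong.load(new ByteArrayInputStream(file)); // ISO-8859-1 decoding

        System.out.println(ok.getProperty("greeting").equals(expected));    // true
        System.out.println(wrong.getProperty("greeting").equals(expected)); // false
    }
}
```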
And sending a character doesn't mean it can be displayed: there is always the possibility that a font does not contain all "special" characters.

How to convert a UTF-8 string to Japanese or any other language in an iText PDF?

I am parsing XML with UTF-8 encoding that contains some Chinese, Japanese, or Kannada text. I can display these languages on the screen (an HTML page) successfully, but when I generate the PDF I see only English...
I came across fonts, but I am worried that I would need a font for each and every language (sorry if I am wrong).
In debug mode I can see the Chinese and other text in the variable, but after it is converted to UTF-8 I see ?????????.
new String(myString.getBytes(Charset.forName("ISO-8859-1")),
Charset.forName("UTF-8"))
Please help me display any language in an iText PDF.
NOTE: I am parsing the XML using UTF-8. When I fetch the data from the DB without any UTF conversion, I am able to print it in Excel... For the PDF I think I need to use fonts.
UTF-8 is able to represent text in all languages. ISO-8859-1 is only able to represent text in English and most text in a handful of European languages.
If you are converting text to ISO-8859-1 and then storing it as UTF-8, you are breaking support for every language that the limited character set of ISO-8859-1 cannot represent. Keep the text in a Unicode form (e.g. UTF-8).
As has been mentioned in the comments, Java strings are internally Unicode-compatible (they use UTF-16 internally) and so there is no need for any conversion, even to UTF-8, to fully support all languages. You would only need to convert if you need to do so for whatever you are using to export to PDF, but it doesn't seem like you've specified what that is.
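To make this concrete, here is a small sketch that applies exactly the ISO-8859-1 then UTF-8 round trip from the question (the method name is hypothetical). The Java String holds the Japanese text fine; it is the getBytes(ISO-8859-1) step that replaces every character it cannot represent with '?', after which the original text is gone.

```java
import java.nio.charset.StandardCharsets;

public class LossyConversionDemo {

    // The conversion from the question: encode as ISO-8859-1, then decode as UTF-8.
    // Characters ISO-8859-1 cannot represent become '?' during encoding,
    // so the original text cannot be recovered afterwards.
    static String isoThenUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // "Japanese language" written in Japanese
        System.out.println(japanese.length());     // 3: the String itself is fine
        System.out.println(isoThenUtf8(japanese)); // ???
    }
}
```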

How to encode special characters for a POST with Spring/Roo

I'm using Spring/Roo for an app server and need to be able to post some special characters, specifically characters like the yen symbol or the euro symbol. When I receive these characters on my server and display them in the console, they appear as "?". How can they be properly encoded and received?
Try configuring src/main/resources/META-INF/spring/database.properties like this:
database.url=jdbc:mysql://[YOUR_DB_SERVER]:3306/[YOUR_DB_NAME]?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8
There are a couple of possible failure points here.
First, I'd check to see if the console supports the characters in question:
if the default encoding used by the JVM does not support the characters, they will be turned into question marks by System.out
if the console font does not support the characters, they will not be rendered properly
if the console is decoding the bytes using a different encoding to the one System.out is encoding them to, the characters will not display correctly
Instead of trying to print the characters as literals, cast each one to int and print its hex value, then check the value against the Unicode charts.
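The hex check suggested above can be done with a few lines like the following. The helper name is hypothetical; it iterates over code points rather than casting each char, so characters outside the Basic Multilingual Plane are reported correctly too.

```java
public class CodePointDump {

    // Describes each code point as "U+XXXX" so it can be looked up in the
    // Unicode charts, independent of whether the console can render it.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Yen sign and euro sign, as received correctly.
        System.out.println(describe("\u00a5\u20ac")); // U+00A5 U+20AC
        // A character already turned into '?' shows up as U+003F instead.
        System.out.println(describe("?"));            // U+003F
    }
}
```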
Lossy or incorrect conversions can also happen between the browser and the server. Ideally, the server should use UTF-8 for encoding and decoding. If the encoding used by the browser when it encodes the data does not support the characters, they will be lossily encoded; the browser usually picks an encoding based on the encoding sent by the server for the GET request (or more rarely from a form attribute). Inspect the Accept-Charset header being sent with your data (you can do this with something like Firebug or Fiddler). I don't know anything about Roo, but there's bound to be some mechanism to configure encodings.
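To see what a mismatched decode looks like on the server side, this sketch decodes one percent-encoded form value with an explicit charset (the method name is hypothetical; URLDecoder.decode(String, Charset) needs Java 10+). In a servlet-based app the equivalent step is making sure the request's character encoding is set to UTF-8 before any parameters are read; in Spring this is commonly done with CharacterEncodingFilter.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {

    // Decodes one percent-encoded form value with an explicit charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // "%E2%82%AC" is the euro sign as a browser sends it from a UTF-8 page.
        String fromBrowser = "%E2%82%AC";
        System.out.println(decode(fromBrowser, StandardCharsets.UTF_8).equals("\u20ac")); // true
        // Decoding the same bytes as ISO-8859-1 yields three unrelated characters.
        System.out.println(decode(fromBrowser, StandardCharsets.ISO_8859_1).length());     // 3
    }
}
```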

A question related to URLs

Dear all, I have this question about my Java program. I think it should be classified as a URL problem, but I am not 100% sure. If you think I am wrong, feel free to recategorize it. Thanks.
I will state my problem as simply as possible.
I searched the famous Chinese search engine baidu.com for the Chinese keyword "奥巴马" (Obama in English), and the way I do that is to pass a URL (in a Java program) to the browser, like:
http://news.baidu.com/ns?word=奥巴马
and it works perfectly, just as if I had typed the keyword "奥巴马" into the text field on baidu.com.
However, my advisor now wants something else. Since he cannot read Chinese webpages but wants to make sure the pages I got from baidu.com are related to "Obama", he asked me to translate them with Google Translate, i.e., use Google Translate to turn the Chinese webpage into an English one.
This sounds straightforward. However, this is where I ran into my problem.
If I simply pass the URL "http://news.baidu.com/ns?word=奥巴马" into Google Translate and choose the "Chinese to English" option, the result looks awful. (I don't know the cause; maybe it is related to Chinese character encoding.)
Alternatively, if my browser opens the "http://news.baidu.com/ns?word=奥巴马" webpage and I then click the "百度一下" button (which simply means "search"), you will notice that the URL changes. If I pass this new URL into Google Translate and do the same thing, the result is much better.
I hope I am not making this problem sound too complicated, and I apologize for the Chinese words involved, but I really need your help here. Because I do all this in a Java program, I couldn't figure out how to perform that "百度一下" step (pressing the search button) and then get the new URL. If I could get that new URL, things would be easy: I could just call Google Translate from my Java code and pop up a new window to show my advisor.
Please share any ideas or thoughts you have. Thanks a lot.
Robert
You could use
URLEncoder.encode("http://news.baidu.com/ns?word=奥巴马", "utf-8")
then pass the resulting URL to Google Translate like:
http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&sl=zh-CN&tl=en&u=YOUR_URL
Cheers
When you press the search button, the browser encodes the search term into %E5%A5%A5%E5%B7%B4%E9%A9%AC, which is the UTF-8 encoding for 奥巴马. It does this because UTF-8 is the default encoding for HTML forms.
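The byte sequence quoted above can be reproduced in Java by passing the charset to URLEncoder explicitly rather than relying on the platform default (this uses the Charset overload available since Java 10; the helper name is hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8QueryDemo {

    // Percent-encodes a query value as UTF-8, as an HTML form on a UTF-8 page would.
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String obama = "\u5965\u5df4\u9a6c"; // the search term from the question
        System.out.println("http://news.baidu.com/ns?word=" + encodeUtf8(obama));
        // http://news.baidu.com/ns?word=%E5%A5%A5%E5%B7%B4%E9%A9%AC
    }
}
```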
Java uses a UTF-16 encoding internally, so it’s possible that the URL library builds a request in that encoding if you do not specify anything.
However, I could not reproduce your problem with Google Translate: pasting that URL appeared to work correctly no matter how I did it.
Try calling
URLEncoder.encode("http://news.baidu.com/ns?word=奥巴马", "utf-8")
(or utf-16; I'm not very familiar with how Chinese characters are represented)
URLs can contain only ASCII characters. All other characters must be converted to bytes and then %-encoded in ASCII. However, there is no mandate on which charset is used to convert characters to bytes. UTF-8 is recommended, but not required. As long as the server expresses a charset preference, the client should respect it and use the same charset for encoding.
You can see from the page info that baidu uses the gb2312 encoding. The characters 奥巴马 in a form on its page are converted to bytes in gb2312 (B0C2 B0CD C2ED), then %-encoded to %B0%C2%B0%CD%C2%ED. That is what is actually sent to the baidu server: http://www.baidu.com/s?wd=%B0%C2%B0%CD%C2%ED
Your OS happens to be configured to use gb2312 by default, so when you paste http://news.baidu.com/ns?word=奥巴马 into the browser, the browser does the same thing and baidu gets the correct characters. When I paste that URL into my browser, it screws up, because my OS uses UTF-8 and the browser encodes these Chinese characters in UTF-8, which is not what baidu expects. (When you enter a URL directly in a browser, the browser may not yet have communicated with the server and so does not know which charset the server prefers; therefore it uses the platform default charset.)
Now, Google uses UTF-8. That's why, if you paste the URL into the Google form, it screws up just as it does on my OS: the characters are encoded in UTF-8, baidu tries to parse them as gb2312, and it gets completely wrong words.
Solution is easy. Just encode the parameter in the way that the server expects:
"http://news.baidu.com/ns?word=" + URLEncoder.encode("奥巴马", "gb2312")
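Putting this together with the Google Translate step, a sketch of building both URLs might look like the following. The helper names are hypothetical, the translate parameters are simply the ones quoted earlier in this thread (the service may have changed since), and it assumes the JDK includes the GB2312 charset, as standard JDK builds do.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BaiduTranslateUrl {

    // Google Translate URL format quoted earlier in this thread; may be outdated.
    static final String TRANSLATE_PREFIX =
            "http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8"
            + "&layout=1&eotf=1&sl=zh-CN&tl=en&u=";

    // Encodes the search term in GB2312, the charset the answer says Baidu expects.
    static String baiduSearchUrl(String term) {
        return "http://news.baidu.com/ns?word="
                + URLEncoder.encode(term, Charset.forName("GB2312"));
    }

    // Nests the Baidu URL as the u= parameter. Encoding it again turns each '%'
    // into "%25", so Google Translate receives the Baidu URL exactly as built.
    static String translateUrl(String pageUrl) {
        return TRANSLATE_PREFIX + URLEncoder.encode(pageUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String baidu = baiduSearchUrl("\u5965\u5df4\u9a6c"); // the search term from the question
        System.out.println(baidu); // http://news.baidu.com/ns?word=%B0%C2%B0%CD%C2%ED
        System.out.println(translateUrl(baidu));
    }
}
```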
