Java: How to store Gujarati (non-English) text in a String?

I have a Java application.
It contains one text box and one button.
When the button is clicked, I want to save Gujarati (non-English) text to the database.
How can I do this? I have tried it, but my string comes back in some other format.
So I don't know how to store Gujarati text in a String.

This works in Java as long as the source file is saved in a Unicode encoding, such as UTF-8 or UTF-16, and the compiler is told which one (for example with javac -encoding UTF-8):
String ગઉજ = "ઋઊઘ";
With that part solved, you need to specify exactly where your problem lies.

Java strings are Unicode. Gujarati characters have Unicode code points in the block U+0A80 to U+0AFF.
You can store them directly in a String. However, if you can't type Gujarati input directly, you can build characters from their code points with the Character class, like this:
int c = 0x0A82;
String s = Character.toString((char)c);
// s is ં (U+0A82)
and so on for the other characters.
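To make the point above concrete, here is a minimal sketch of building a Gujarati string from code points and checking that it survives an encode/decode round trip through UTF-8. The class and method names are my own, chosen for illustration, and the characters are written as \u escapes so the example does not depend on how the source file is saved.

```java
import java.nio.charset.StandardCharsets;

public class GujaratiStringDemo {
    // Builds a String from Unicode code points (works for any script).
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Encodes to UTF-8 bytes and decodes them again, which is roughly what
    // happens when text is written to storage and read back.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+0AA8 U+0ABE U+0AAE is the Gujarati word used in the properties example.
        String name = fromCodePoints(0x0AA8, 0x0ABE, 0x0AAE);
        System.out.println("UTF-16 code units: " + name.length());
        System.out.println("UTF-8 bytes: " + name.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("Round trip intact: " + name.equals(roundTripUtf8(name)));
    }
}
```

If the round trip is intact in Java but the text is still wrong after a database read, the problem most likely lies in the connection or column encoding rather than in the String itself.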

There are some changes you should make in your web application.
In the JSP file, change the page directive to:
<%@ page language="java" contentType="text/html; charset=UTF-16" pageEncoding="UTF-16"%>
In the .properties file, save your text as Unicode escapes.
For example (the escaped characters here are Gujarati):
global.Name = \u0AA8\u0ABE\u0AAE
This worked for me in Struts 2.
The output will be:
નામ
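As a rough check that the escaped form really yields Gujarati text, the sketch below loads the same line with java.util.Properties from an in-memory string. The class and method names are made up for the example; in a real Struts 2 application the framework loads the resource bundle for you.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EscapedPropertiesDemo {
    // Parses properties-file text and returns the value stored under the given key.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in the Java string,
        // so Properties sees exactly the text that would be in the file.
        String fileContent = "global.Name = \\u0AA8\\u0ABE\\u0AAE\n";
        String value = loadValue(fileContent, "global.Name");
        System.out.println("Characters in value: " + value.length());
        System.out.println("Matches expected word: " + value.equals("\u0AA8\u0ABE\u0AAE"));
    }
}
```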

Related

JAXB: can not read Japanese characters properly

I have a program that supports internationalisation, and some entries contain Japanese characters. When I export such an entry to XML using JAXB, the Japanese characters look fine in the file. The problem appears when I unmarshal that XML file to get the data back as a Java object: I do not get the correct value for the Japanese characters.
Here is my marshalling code:
OutputStreamWriter outputWriter = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
JAXB.marshal(xmlobj, outputWriter);
Unmarshalling code:
InputStreamReader inputReader = new InputStreamReader(xml, "UTF-8");
xmlobj = JAXB.unmarshal(inputReader, <JAVA_CLASS_TO_UNMARSHAL>);
The text I am marshalling-unmarshalling is: 説明_1
The text displays correctly when I fetch this record and show it in the browser, but after JAXB unmarshalling an incorrect value is displayed. After converting it to HTML-compatible code I get the value &#35500;&#26126;_1, which is the correct conversion of the Japanese characters and should appear as the proper characters in the browser. Instead, the browser displays the HTML codes &#35500;&#26126;_1 literally.
Any idea what I am doing wrong?

If the HTML contains
<html>
<body>
&#35500;&#26126;_1<br>
</body>
</html>
then a good browser such as Firefox (I have 31.0) should display 説明_1. Can you add the HTML section to your question?
If your browser isn't fit to display these characters, you should see something like .
You report that you see &#35500;&#26126;_1, which is possible if your HTML text contains
&amp;#35500;&amp;#26126;_1<br>
which would mean that the conversion to HTML hasn't worked correctly.
Once more: check your HTML code and how it is produced from the XML.

Try using UTF-8 in your HTML header. Note that just changing the charset in the header won't convert the content; you need to make sure that the content is actually encoded as UTF-8 as well.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

The comment by Wundwin Born solved the issue: I forgot to unescape the string.
Here is the code snippet:
org.apache.commons.lang.StringEscapeUtils.unescapeHtml(xmlString);
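For readers without Commons Lang on the classpath, here is a minimal JDK-only sketch of the same idea. It is not the library's implementation and only handles decimal references of the form &#NNNN;, not named entities or hex references; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Matches decimal numeric character references such as &#35500;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal reference with the character it denotes.
    // References that are not valid code points are left untouched.
    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#35500;&#26126;_1");
        // U+8AAC and U+660E are the two Japanese characters from the question.
        System.out.println(decoded.equals("\u8AAC\u660E_1"));
    }
}
```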

displaying WINDOWS-1252 encoded text from file as html

I have a text file with WINDOWS-1252 characters like ø and ß. The file is uploaded via form submit to a servlet, where it is parsed with opencsv and returned as a List object to a JSP page, where it is displayed.
These characters are displayed as ?, and I'm trying to figure out where along the way the encoding might have gone wrong.
I've tried a bunch of things:
my page has the tag <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>
the file input is decoded with a charset - new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
every string is re-encoded - s = new String(s.getBytes("WINDOWS-1252"));
Where else can the encoding fail? Any ideas?

Some troubleshooting suggestions:
Debug print or otherwise examine the text as hex at various phases, and verify that encoding really is what you expect it to be.
Make sure there is no BOM (byte order mark). If there is one and you don't have an easy way to get rid of it, see this question and the links in it: Reading UTF-8 - BOM marker
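A small helper along these lines can do the hex inspection and the BOM check suggested above. This is only a sketch with names of my own choosing; the sample characters are written as \u escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingHexDebug {
    // Formats bytes as space-separated upper-case hex pairs, e.g. "C3 B8".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Shows the bytes a string would have in the given charset.
    static String encodedHex(String text, Charset charset) {
        return toHex(text.getBytes(charset));
    }

    // True if the data begins with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        String sample = "\u00F8\u00DF"; // o with stroke, sharp s
        System.out.println(encodedHex(sample, Charset.forName("windows-1252"))); // F8 DF
        System.out.println(encodedHex(sample, StandardCharsets.UTF_8));          // C3 B8 C3 9F
    }
}
```

Single bytes such as F8 or DF in the raw file point to a one-byte encoding like WINDOWS-1252, while pairs such as C3 B8 point to UTF-8.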

OK, the problem is fixed.
The first problem was that it wasn't a UTF-8 file at all but a WINDOWS-1252 one. I determined that using the juniversalchardet library (very helpful and easy to use).
Then I had to make sure that I was reading the file with the right charset, by wrapping the FileInputStream in a reader:
new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
Then I just had to make sure that I was displaying it with the right charset in the JSP file, using the tag <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>
That's pretty much it:
(1) determine the charset
(2) make sure you're reading the file with it
(3) make sure you display it with it
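Step (2) can be sketched as follows. I use an in-memory stream instead of the uploaded file so the example is self-contained, and the class and method names are invented for illustration; with a real file you would pass a FileInputStream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252ReadSketch {
    // Reads the whole stream, decoding its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // F8 and DF are o-with-stroke and sharp s in WINDOWS-1252.
        byte[] fileBytes = {(byte) 0xF8, (byte) 0xDF};
        String right = readAll(new ByteArrayInputStream(fileBytes), Charset.forName("windows-1252"));
        String wrong = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);
        System.out.println("Decoded as WINDOWS-1252 correctly: " + right.equals("\u00F8\u00DF"));
        System.out.println("Decoded as UTF-8 correctly: " + wrong.equals("\u00F8\u00DF"));
    }
}
```

Once the text is a correct Java String, re-encoding it with getBytes and new String (as tried in the question) is unnecessary and can only damage it.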

Display content in the same format I entered into the text area in Java or jQuery

I have a text area and enter some content in it. I want to display that content on a JSP page in the same format in which I entered it. So the flow is: I enter the text in the text area, save it in a MySQL database, then retrieve it from the database and display it on a JSP page. I googled it but couldn't find a solution.
I am using Spring MVC with mysql database.

Enclose the text in a pre tag:
<pre><c:out value="${theTextFromDatabase}"/></pre>
The c:out JSTL tag is necessary to HTML-escape all the HTML special characters: < becomes &lt;, > becomes &gt;, & becomes &amp;, etc.
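If the markup is built in Java code rather than in a JSP, the same escaping has to be done by hand. The following is a minimal sketch of that idea, not the actual c:out implementation, and the class and method names are hypothetical.

```java
public class HtmlEscapeSketch {
    // Replaces the HTML special characters with entities so that user text
    // is displayed literally instead of being interpreted as markup.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps escaped text in a pre element so line breaks and spacing are kept.
    static String toPreformatted(String text) {
        return "<pre>" + escapeHtml(text) + "</pre>";
    }

    public static void main(String[] args) {
        System.out.println(toPreformatted("line one\n  <b>line two</b> & more"));
    }
}
```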

I found a way to solve the textarea problem; there is no need to add a tag in the textarea itself.
Just wrap the value that comes from the textarea in a pre tag, like this. I did it and it works well.
String ques=request.getParameter("ques");
StringBuffer sb=new StringBuffer();
sb.append("<pre>"+ques+"</pre>");
String ss=sb.toString();

When you say "in the same format", do you mean bold, italic, line breaks, and so on? Is the text saved in your database with the formatting that you entered? I think you need some kind of WYSIWYG (what you see is what you get) plugin such as TinyMCE (http://www.tinymce.com/), or you need to save the formatting in your database and then process the retrieved text to format it accordingly.

Jsoup Whitelist: Parsing non-english character

I am trying to clean HTML text and extract plain text from it using Jsoup. The HTML might contain non-English characters.
For example the HTML text is:
String html = "<p>Á <a href='http://example.com/'><b>example</b></a> link.</p>";
Now if I use Jsoup#parse(String html):
String text = Jsoup.parse(html).text();
It is printing:
Á example link.
And if I clean the text using Jsoup#clean(String bodyHtml, Whitelist whitelist):
String text = Jsoup.clean(html, Whitelist.none());
It is printing:
&Aacute; example link.
My question is, how can I get the text
Á example link.
using Whitelist and the clean() method? I want to use Whitelist since I might need to use Whitelist#addTags(String... tags).
Any information will be very helpful to me.
Thanks.

This is not possible in the current version (1.6.1). jsoup prints Á as &Aacute; because of its entity escaping feature, and there is currently no "don't escape" mode (check Entities.EscapeMode).
You can either (1) unescape these HTML entities afterwards, or (2) extend jsoup's source code by adding a new escape mode with an empty map.

JSPs and trademark symbol

On the web pages in our app, the trademark symbol (TM) appears as a question mark. The registered trademark symbol (R) works, though. We display the value using the c:out tag from the JSP Standard Tag Library. If I put &trade; or &#8482; on the page to test this, those show up as they are supposed to.
<td><c:out value="${item.description}"/></td> <!-- does not work -->
<td>yada yada yada Spiros&trade; yada yada yada</td> <!-- works -->
To add to this, we're also using YUI, and before we display these pages, they show up in a YUI data table as the results of a query (the user clicks on a row to go to the page described above). The (TM) shows up properly in that table. That tells me that we are properly fetching the value from our database, and as well the server code generating the XML to send back to the YUI data table also works.
So why is the same String displayed properly in the YUI data table, but not in a normal JSP, unless we hardcode the symbol onto the page?

You probably have an encoding issue. If you do not have an explicit encoding in your JSP:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
then it's time to add one. Try UTF-8, and if that doesn't work try ISO-8859-1, or if you know the correct encoding, use that.

When a character appears as ? inside a browser (usually Firefox), it means that the page encoding, as detected by the browser, does not recognize the character. A good test is to choose View->Character Encoding->UTF-8 in Firefox. If the character then appears correctly, the (TM) character is encoded as UTF-8, and you have to make your page set the response encoding header to UTF-8. That should fix it for you.
If that does not work, first find out how the character is encoded (look at what encoding is read from the database, for example) and try setting the page encoding header to that encoding.
The second format works because the (TM) character is written as a known HTML entity, which the browser interprets regardless of the page encoding.
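One explanation that fits these symptoms, though it cannot be confirmed from the question alone, is that the response is being written as ISO-8859-1, which is the default for JSP pages without an explicit encoding. That charset contains the registered sign (U+00AE) but not the trademark sign (U+2122), so Java substitutes a question mark when encoding. The sketch below demonstrates this; the class and method names are my own.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TrademarkEncodingDemo {
    // True if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encodes and decodes with the same charset; characters the charset
    // cannot represent come back as a replacement, typically '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String registered = "\u00AE"; // registered sign
        String trademark = "\u2122";  // trademark sign
        System.out.println(canEncode(registered, StandardCharsets.ISO_8859_1)); // true
        System.out.println(canEncode(trademark, StandardCharsets.ISO_8859_1));  // false
        System.out.println(roundTrip(trademark, StandardCharsets.ISO_8859_1));  // ?
        System.out.println(roundTrip(trademark, StandardCharsets.UTF_8).equals(trademark)); // true
    }
}
```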
