How to set special character (UTF-8) in java String and xml? - java

I want to save special characters (UTF-8) in Java.
In my JSF page I am setting a string value in the model:
<h:inputTextarea id="que" value="#{dataModel.question}"/>
When I fetch that String in my Java controller, it contains different characters. For example, I have a sentence like ΔLMN ≠ΔXYZ, but when I print the value
in the controller, it prints garbled characters instead.
In my project I fetch the value from an XML file and write the same value back to XML like this:
option.addContent(new CDATA(new String(this.launchModel.getQuestionList().get(i).getOptionList().get(k).getOption().getBytes("UTF-8"), "UTF-8")));
How do I solve this problem? I am still trying on my side.
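A Java String is already Unicode (UTF-16) in memory, so the getBytes("UTF-8") / new String(..., "UTF-8") round trip in the snippet above returns an equal string and cannot repair anything. Encoding only matters where bytes cross a boundary, such as the HTTP request or the XML file. The minimal sketch below uses only the standard library, not the JDOM classes from the question. It shows that the round trip is a no-op and writes XML whose declared encoding matches the bytes actually written. The class name, helper names, and element name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8RoundTrip {

    // Encoding to UTF-8 and decoding the same bytes as UTF-8 yields an equal String,
    // so this "conversion" cannot change or repair anything.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: builds a tiny XML document whose declared encoding
    // matches the charset actually used to produce the bytes.
    // Assumes the text contains no "]]>", which would end the CDATA section early.
    static byte[] optionXml(String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<option><![CDATA[" + text + "]]></option>\n";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The sentence from the question, written with Unicode escapes.
        String question = "\u0394LMN \u2260\u0394XYZ";
        System.out.println(roundTrip(question).equals(question)); // true

        Path file = Files.createTempFile("option", ".xml");
        Files.write(file, optionXml(question));
        // Decoding the file with the charset it was written in recovers the text.
        String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(back.contains(question)); // true
        Files.delete(file);
    }
}
```

If the text is already garbled when the controller receives it, the damage happened earlier, when the request bytes were decoded, and no String-to-String conversion at this point will undo it reliably.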

Related

Parsing text file based on a pre-defined template

I have to parse several text files in a pre-defined format and get the parsed data into Java code. For example, a file would look something like this:
12345 5abcd 18864 12585
24584 4frrf 44855 84745
98745 2rgr4 25584 36546
where the first 2 characters in each line are the area code (12, 24, 98, etc.), characters 3 to 5 are the district code, and similarly each position has its own meaning.
I am looking for a Java library that would allow me to create a template something like this:
$for(int i=1;i<=endOfLine();i++){
$(area_code)[2]$(district_code)[3] $(address_line1)[5] $(address_line2)[5] $(address_line3)[5]
}
and read the file content like
Integer areaCode = template.getNumber("area_code");
Integer districtCode = template.getNumber("district_code");
String addressLine1 = template.getString("address_line1");
...
...
Any suggestion on which library can be used or how to do this?
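Without committing to any particular library, the layout in the question can be described as a list of named fixed-width fields and cut out with String.substring. The minimal sketch below takes the field names and widths from the example template and assumes one record per line, with a single space before each address field as in the sample data. The Field class and FixedWidthParser names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FixedWidthParser {

    // Hypothetical field description: name, width in characters,
    // and how many separator characters follow it.
    static final class Field {
        final String name;
        final int width;
        final int gapAfter;

        Field(String name, int width, int gapAfter) {
            this.name = name;
            this.width = width;
            this.gapAfter = gapAfter;
        }
    }

    // Layout assumed from the template: $(area_code)[2]$(district_code)[3]
    // followed by three 5-character address fields, each preceded by one space.
    static final List<Field> LAYOUT = Arrays.asList(
            new Field("area_code", 2, 0),
            new Field("district_code", 3, 1),
            new Field("address_line1", 5, 1),
            new Field("address_line2", 5, 1),
            new Field("address_line3", 5, 0));

    // Cuts one line into named values; throws StringIndexOutOfBoundsException
    // if the line is shorter than the layout requires.
    static Map<String, String> parseLine(String line, List<Field> layout) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = 0;
        for (Field f : layout) {
            values.put(f.name, line.substring(pos, pos + f.width));
            pos += f.width + f.gapAfter;
        }
        return values;
    }

    public static void main(String[] args) {
        String[] lines = {
            "12345 5abcd 18864 12585",
            "24584 4frrf 44855 84745",
            "98745 2rgr4 25584 36546"
        };
        for (String line : lines) {
            Map<String, String> rec = parseLine(line, LAYOUT);
            int areaCode = Integer.parseInt(rec.get("area_code"));
            int districtCode = Integer.parseInt(rec.get("district_code"));
            String addressLine1 = rec.get("address_line1");
            System.out.println(areaCode + " " + districtCode + " " + addressLine1);
        }
    }
}
```

Keeping the layout as data, not code, makes it easy to load different templates for different file formats. A loop like the $for in the template then becomes a plain loop over the file's lines.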

Why is my UTF-8 encoded data not staying "UTF-8" encoded?

The problem I'm trying to fix is this:
Users of our application copy and paste characters from Windows documents, such as Word files, and our application does not recognize single and double quotes or bullets.
These are the steps I've taken so far to get this data into UTF-8:
Inside server.xml, in the Connector tag, I added the attribute URIEncoding="UTF-8".
In the bean responsible for storing the input, I converted the String holding the inputNote text to a byte[] using UTF-8, then built a new String from those bytes as UTF-8 and assigned it back to inputNoteText. Condensed code:
byte[] bytesInUTF8inputNoteText = inputNoteText.getBytes("UTF-8");
inputNoteText = new String(bytesInUTF8inputNoteText, "UTF-8");
this.var = inputNoteText;
In the setter that holds the result from the database query, setNoteText(noteText), I convert the note data coming from the query into UTF-8 bytes, then convert it back into a String and assign it to the noteText property:
public void setNoteText(String noteText) throws UnsupportedEncodingException {
    byte[] bytesInUTF8inputNoteText = noteText.getBytes("UTF-8");
    String noteTextUTF8 = new String(bytesInUTF8inputNoteText, "UTF-8");
    this.noteText = noteTextUTF8;
}
In SQL Server I changed the data type from text to nvarchar(MAX) to store the data as Unicode, even though SQL Server stores nvarchar as UTF-16 rather than UTF-8.
What I see when I copy and paste from an MS Word document into our JSF input textbox:
In Eclipse, if I set a watch on the bean property, all characters display correctly once the String has been converted. When I post to SQL Server, the data stored in the nvarchar(MAX) column also shows every character correctly. When the ResultSet comes back and the holding property is populated with the String from the query, it still shows correctly.
But somewhere between the property bound to the JSF page and the rendered page (JSF 1.2, by the way), the value gets mangled: I see question marks where there should be single/double quotes and bullet points. I hope someone has run into this issue before and can shed some light on how to fix it. It seems like a JSF bug. Thanks in advance for your input!
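Literal question marks usually come from converting text into a charset that cannot represent the character. String.getBytes(Charset) substitutes that charset's replacement byte, which is ? for ISO-8859-1. The question does not say where this happens in the JSF 1.2 pipeline. A response written without charset=UTF-8 is one plausible place, but that is an assumption. The minimal sketch below only reproduces the symptom with the standard library. throughCharset is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {

    // Encodes with the given charset and decodes again with the same one,
    // which is what happens when text passes through a byte stream in that charset.
    static String throughCharset(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Curly double quotes (U+201C, U+201D) and a bullet (U+2022), as pasted from Word.
        String pasted = "\u201CQuoted\u201D \u2022 item";

        // ISO-8859-1 has no curly quotes or bullets, so each one becomes '?'.
        System.out.println(throughCharset(pasted, StandardCharsets.ISO_8859_1)); // ?Quoted? ? item

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(throughCharset(pasted, StandardCharsets.UTF_8).equals(pasted)); // true
    }
}
```

Because the data looks correct in the bean and in the database, the conversion that loses it is most likely on the way out to the browser, not on the way in.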
Try this:
noteText = new String(noteText.getBytes("ISO-8859-1"), "UTF-8");
When you copy and paste from Windows documents, the text is encoded as Windows-1252 (http://en.wikipedia.org/wiki/Windows-1252), not UTF-8. Note the cells marked with thick green borders in that article's table: these characters (curly quotes, bullets, and others in the 0x80 to 0x9F range) differ from ISO-8859-1, so you have to use the Windows-1252 encoding when reading them.
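To see the difference the Windows-1252 answer refers to, decode the same bytes with two charsets. In Windows-1252, bytes 0x93, 0x94 and 0x95 are the curly quotes and the bullet. In ISO-8859-1, the same byte values are invisible C1 control characters. This minimal sketch assumes the pasted text actually arrives as Windows-1252 bytes; a browser submitting a form from a UTF-8 page would normally send UTF-8 instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Cp1252Demo {

    // windows-1252 is not one of the charsets every JVM is required to support,
    // but standard JDK builds include it under this name.
    static final Charset CP1252 = Charset.forName("windows-1252");

    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Bytes as Windows-1252 stores them: "Hi" in curly quotes, a space, a bullet.
        byte[] raw = {(byte) 0x93, 'H', 'i', (byte) 0x94, ' ', (byte) 0x95};

        String asCp1252 = decode(raw, CP1252);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // Windows-1252 maps 0x93, 0x94, 0x95 to U+201C, U+201D, U+2022.
        System.out.println(asCp1252.equals("\u201CHi\u201D \u2022")); // true
        // ISO-8859-1 maps the same bytes to control characters U+0093, U+0094, U+0095.
        System.out.println(asLatin1.equals("\u0093Hi\u0094 \u0095")); // true
    }
}
```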

Assign Special Characters inside a String Array

Android 2.3.3 / Java
I have a String[] assigned static values containing special characters like √x²+y² and so on. When I try to save the file, Eclipse shows an error because these characters cannot be stored in the file's current encoding. How do I get past this? Assigning these values statically is a must. Is there a way to change the encoding of the Java file and keep everything else the same?
Here is how I am assigning the values, for example:
commandNames = new String[] {"√x²+y²","√x²+y²","√x²+y²"};
Your file contains non-ASCII characters. There is no problem if you save it as UTF-8:
click the "Save as UTF-8" button in the error dialog,
or
change your settings:
Window > Preferences > General > Content Types, and set UTF-8 as the
default encoding for all content types.
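If changing the file encoding is not an option, Java also accepts Unicode escapes in source code. The answer above does not mention this, but it is standard Java. Escapes keep the .java file pure ASCII, so it compiles the same way whatever encoding the editor or compiler assumes. √ is U+221A and ² is U+00B2. The class name below is hypothetical.

```java
final class CommandNames {

    // The values from the question written with Unicode escapes:
    // \u221A is the square root sign and \u00B2 is superscript two.
    // The compiler turns each escape into the real character.
    static final String[] COMMAND_NAMES = {
        "\u221Ax\u00B2+y\u00B2",
        "\u221Ax\u00B2+y\u00B2",
        "\u221Ax\u00B2+y\u00B2"
    };

    public static void main(String[] args) {
        System.out.println(COMMAND_NAMES[0].length()); // 6
        System.out.println(COMMAND_NAMES[0].codePointAt(0) == 0x221A); // true
    }
}
```

The strings are identical at runtime to ones typed with the literal characters. Only the source file's bytes differ.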
If you are using this in a JSP page, include a meta tag in the head of the page
that specifies the content type and charset,
and if you are using it in a servlet, set the response content type:
response.setContentType("text/html; charset=UTF-8");

Text encoding converts junk character in Play! 1.2.4 framework

Issue: text saved through the Play! 1.2.4 framework turns into junk characters.
Context: We are trying to store the text "《我叫MT繁體版》台港澳專屬伺服器上線!" from input text field to mysql using Play! 1.2.4 framework.
Steps that we followed:
1) A UI gets the input from the user. Text in any language should work; we tried Chinese (traditional) characters. Note: the page is set to UTF-8 character encoding.
2) The form posts to a Play! controller, which reads the input and stores it using a Play! model. Snippet below:
public static void text_create() throws UnsupportedEncodingException,
        ParseException {
    System.out.println("params :: text string value :: " + params.get("text"));
    String oldString = params.get("text");
    // Converting the input string (which is UTF-8) and parsing it as Windows-1252
    String newString = new String(oldString.getBytes(), "WINDOWS-1252");
    // 1. Passing the encoded text to MySQL.
    // 2. The TextCheck table and its 'text' column use UTF-8 encoding and collation.
    // 3. The TextCheck 'text' column is mapped as a String in the model.
    TextCheck a = new TextCheck(newString);
    List<Object> text = TextCheck.TextList();
    render(a, text);
}
It stores the TEXT value as "《我�MT�體版》�港澳專屬伺�器上線�".
The problem is that there are � characters in the value. When I read this
raw data from MySQL using other platforms such as Java, Ruby, or another
language, it converts the text, but those � characters remain junk.
Note: interestingly, when I read it back through the same Play! application, everything looks fine; even the junk characters are read correctly.
Question: why do those junk characters appear?
The problem is the following line:
String newString = new String(oldString.getBytes(), "WINDOWS-1252");
This looks like nonsense to me. Java stores all strings internally using UTF-16, so you can't adjust the encoding of a Java string in the manner you've attempted here.
The getBytes() method returns the bytes of the string using the default platform encoding. You then convert these bytes into a new string using a (probably) different charset. The result is almost certain to be broken.
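The effect of that line can be reproduced without Play! or MySQL. The sketch assumes the platform default encoding is UTF-8, so it passes UTF_8 explicitly instead of calling getBytes() with no argument. Windows-1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), and Java decodes them as U+FFFD, the replacement character. 叫 (U+53EB) is E5 8F AB in UTF-8, so its 0x8F byte is lost for good. The same applies to 繁 (E7 B9 81) and 服 (E6 9C 8D), which is why those characters become � in the stored text. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PlayMojibakeDemo {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // What the controller effectively did: take the UTF-8 bytes of a String
    // and decode them as if they were Windows-1252.
    static String brokenConversion(String input) {
        return new String(input.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Attempting to undo it: re-encode as Windows-1252, then decode as UTF-8.
    static String attemptedUndo(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ok = "\u6211";  // U+6211: UTF-8 E6 88 91, all defined in Windows-1252
        String bad = "\u53EB"; // U+53EB: UTF-8 E5 8F AB, 0x8F is undefined in Windows-1252

        System.out.println(attemptedUndo(brokenConversion(ok)).equals(ok));   // true
        System.out.println(brokenConversion(bad).indexOf('\uFFFD') >= 0);     // true
        System.out.println(attemptedUndo(brokenConversion(bad)).equals(bad)); // false
    }
}
```

This may also explain why the same Play! application seemed to read the data back "correctly": most characters survive the reverse conversion, but the undefined bytes cannot be recovered. The fix is simply not to convert: pass oldString to the model unchanged.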

Handle French Characters in Java

I have a page where I search for a term, and it displays perfectly whatever the characters are.
Now I have a few checkboxes in a JSP that I check and submit. One of the checkboxes has a name like ABC Farmacéutica Corporation.
When I click the submit button, I call a function that sets all the parameters on a form and submits that form. (I tested with an alert before submitting, and the special character displays correctly.)
On the Java side I use the Spring Framework. When I print the term in the controller, it is displayed as ABC FarmacÃ©utica Corporation.
Please help...
Thanks in advance.
EDIT:
Please try this sample example:
import java.net.*;
class sample {
    public static void main(String[] args) {
        try {
            String aaa = "ABC Farmacéutica Corporation";
            String bbb = "ABC Farmacéutica Corporation";
            aaa = URLEncoder.encode(aaa, "UTF-8");
            bbb = URLDecoder.decode(bbb, "UTF-8");
            System.out.println("aaa " + aaa);
            System.out.println("bbb " + bbb);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
I get this output:
aaa PiSA+Farmac%C3%A9utica+Mexicana+Corporativo
bbb PiSA Farmacéutica Mexicana Corporativo
Try to print the string aaa as it is.
You get "ABC FarmacÃ©utica Corporation" because the string you receive from the client was decoded as ISO-8859-1. You need to convert it back to UTF-8 before you URL-decode it, like this:
bbb = URLDecoder.decode(new String(bbb.getBytes("ISO-8859-1"), "UTF-8"), "UTF-8");
NOTE: some encodings cannot be converted to others without risking data loss. For example, Thai characters (TIS-620) cannot be converted to ISO-8859-1, which has no Thai letters. For this reason, avoid converting from one encoding to another unless absolutely necessary (i.e. the data comes from an external, third-party, or proprietary source). This approach only works when you know the source encoding.
This is an encoding problem, and the Ã clearly identifies this as UTF-8 text interpreted as ISO-Latin-1 (or one of its relatives).
Make sure the page directive at the top of your JSP declares UTF-8 encoding.
I suspect the problem is the character encoding of the page. Make sure the page you submit from and the page you display use the same character set, and set it explicitly.
For instance, if your server runs on Linux the default encoding is typically UTF-8, but a browser on Windows may assume ISO-8859-1 if no encoding is specified.
Also, when the server receives the submitted text, it will assume its default character set when building the string, whereas your user's browser might have used a different encoding if you didn't specify one.
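The repair in that answer works because ISO-8859-1 maps every byte from 0x00 to 0xFF to exactly one character. Re-encoding the garbled string as ISO-8859-1 therefore recovers the original UTF-8 bytes. The minimal sketch below uses StandardCharsets, which avoids the checked UnsupportedEncodingException of the overloads that take a charset name. fixMojibake is a hypothetical helper name. It only helps when you know the text was UTF-8 that got decoded as ISO-8859-1; configuring the request encoding correctly remains the better fix.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    // Undo "UTF-8 bytes decoded as ISO-8859-1": get the original bytes back
    // (ISO-8859-1 is a one-to-one byte/char mapping) and decode them properly.
    static String fixMojibake(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "ABC Farmac\u00E9utica Corporation";
        // Simulate the container decoding the UTF-8 request bytes as ISO-8859-1.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals("ABC Farmac\u00C3\u00A9utica Corporation")); // true
        System.out.println(fixMojibake(garbled).equals(correct));                       // true
    }
}
```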
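The same mix-up can be shown with the percent-encoded form value itself. The browser sends é from a UTF-8 page as %C3%A9, and the result depends entirely on which charset the server uses to turn those bytes back into characters. The minimal sketch below uses java.net.URLDecoder. In a servlet container, the equivalent step is calling request.setCharacterEncoding("UTF-8") before reading any parameter, which is not shown here. The class and helper names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class FormDecodingDemo {

    // Decodes a percent-encoded form value with an explicitly chosen charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String submitted = "ABC+Farmac%C3%A9utica+Corporation";

        // Right charset: the two bytes C3 A9 become the single character U+00E9.
        System.out.println(decode(submitted, "UTF-8")
                .equals("ABC Farmac\u00E9utica Corporation")); // true

        // Wrong charset: each byte becomes its own Latin-1 character, U+00C3 U+00A9.
        System.out.println(decode(submitted, "ISO-8859-1")
                .equals("ABC Farmac\u00C3\u00A9utica Corporation")); // true
    }
}
```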
As I understand it, the text is hardcoded in controller code like this:
ModelAndView mav = new ModelAndView("hello");
mav.addObject("message", "ABC Farmacéutica Corporation");
return mav;
I expect this would work:
ModelAndView mav = new ModelAndView("hello");
mav.addObject("message", "ABC Farmac\u00e9utica Corporation");
return mav;
If so, the problem is due to a mismatch between the character encoding your Java editor is using and the encoding your compiler uses to read the source code.
For example, if your editor saves the Java file as UTF-8 and you compile on a system where UTF-8 is not the default encoding, then you would need to tell your compiler to use that encoding:
javac -cp foo.jar -encoding UTF-8 Bar.java
Your build scripts and IDE settings need to be consistent when handling character data.
If your text editor saved your file as UTF-8, then in a hex editor é would be the byte sequence C3 A9; in many other encodings, including ISO-8859-1 and windows-1252, it would be the single byte E9. ISO-8859-1 and windows-1252 would decode C3 A9 as Ã©. You can read about character encoding in Java source files here.
Change the encoding of the JSP page to UTF-8 in File > Properties, then add this line at the top of your JSP page: <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
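The byte values in that answer can be checked directly. The minimal sketch below prints the bytes of é in a few encodings and shows how C3 A9 reads when decoded as windows-1252. The hex helper and class name are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EAcuteBytes {

    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // e with acute accent, written as an escape
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_8)));      // C3 A9
        System.out.println(hex(eAcute.getBytes(StandardCharsets.ISO_8859_1))); // E9
        System.out.println(hex(eAcute.getBytes(cp1252)));                      // E9

        // UTF-8 bytes read back as windows-1252 give the familiar two-character mess.
        String misread = new String(eAcute.getBytes(StandardCharsets.UTF_8), cp1252);
        System.out.println(misread.equals("\u00C3\u00A9")); // true
    }
}
```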
