character encoding issue - GB2312

I am displaying Simplified Chinese characters retrieved from a database using the code snippet below, but it displays junk characters:
String text="×°ÏäÊ±ÇëÅÄÕÕ"; // retrieved from database
String result=new String(text.getBytes("utf-8"),"GB2312");
Actual output is : �掳�盲�卤�毛����
Expected Output is : 装箱时请拍照
Please help.

A String should always hold the correct characters; an encoding only comes into play when converting to or from a byte stream.
So if text is exactly what you got from the DB, the problem already occurred while fetching the string from the DB.
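
If the stored data is fine and only the fetch decoded it with the wrong charset, the damage can sometimes be reversed after the fact. The sketch below assumes the GB2312 bytes were decoded as ISO-8859-1 somewhere on the way out of the database (an inference from the sample string, not something the question confirms) and that the JDK in use ships the GB2312 charset. The class and method names are made up for illustration. Configuring the database connection so the string arrives correct in the first place remains the cleaner fix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Gb2312Repair {

    // Hypothetical helper: undo a wrong ISO-8859-1 decode of GB2312 bytes.
    static String repair(String mojibake) {
        // ISO-8859-1 maps each char 0x00-0xFF back to the identical byte,
        // so this recovers the bytes as they were stored.
        byte[] raw = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset they were written in (assumed GB2312).
        return new String(raw, Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        String text = "×°ÏäÊ±ÇëÅÄÕÕ"; // value as it arrived from the database
        System.out.println(repair(text));
    }
}
```

By contrast, text.getBytes("utf-8") in the original snippet encodes each of those characters as two UTF-8 bytes, so the GB2312 decoder is handed bytes that were never in the database.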

Related

read unique char: 'あ' from json file in java

I am reading a JSON file in Java using this code:
String data = Files.readFile(jsonFile)
.trim()
.replaceAll("[^\\x00-\\x7F]", "")
.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "")
.replaceAll("\\p{C}", "");
In my JSON file, there is a unique char: 'あ' (12354) that is interpreted to: "" (nothing) when reading the file.
How can I make this char show up in my variable "data"?
From the answers I've got, I understand that the data is stripped of non-ASCII characters by replaceAll("[^\\x00-\\x7F]", ""). But what can I do if I want all non-ASCII characters removed except this one, 'あ'?

The character you want is the Unicode character HIRAGANA LETTER A, code point U+3042.
You can simply add it to the list of valid characters:
...
.replaceAll("[^\\x00-\\x7F\\u3042]", "")
...
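
A minimal sketch of that filter in isolation, assuming only the first replaceAll from the question matters here. The class name, method name, and sample input are made up for illustration.

```java
public class KeepHiragana {

    // Drop every non-ASCII character except U+3042 (HIRAGANA LETTER A).
    static String clean(String data) {
        return data.replaceAll("[^\\x00-\\x7F\\u3042]", "");
    }

    public static void main(String[] args) {
        // Sample input: 'a', U+3042, 'b', U+00E9 (e with acute accent), 'c'.
        System.out.println(clean("a\u3042b\u00E9c"));
    }
}
```

The later \\p{Cntrl} and \\p{C} replacements in the question's chain should leave U+3042 alone, since it is classified as a letter rather than a control or format character.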

Filter Special Characters in Spring / Java

I'm using jsoup to get all text from websites.
Document doc = Jsoup.connect("URL").get();
String allText = doc.text().toLowerCase();
Then I'm using Hibernate to persist the object that holds all text to a MySQL DB:
...
@Column(name="all_text")
@Lob
private String allText = null;
...
Everything is good so far. Only that sometimes I get a MySQL error when I try to save the object with allText:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8A s...' for column 'all_text' at row 1
Already looked this up and it's an encoding error. Probably have some special characters on their websites. I found a way to fix this by changing the encoding in the DB.
But my actual question is: what's the best way to filter and remove the special characters from the allText string and not persist them at all?
EDIT: To clarify, by special characters I mean Emoticons and all that stuff. Definitely anything that doesn't fit into UTF-8 encoding. I'm not concerned about ~ ^ etc...
Thanks in advance!

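Just use a regex, and assign the result back, since Java strings are immutable:
allText = allText.replaceAll("\\p{C}", "");
No import is needed for this: replaceAll is a method of String (the regex classes themselves live in java.util.regex). Note that \p{C} covers control, format, private-use and unassigned characters; an emoji such as U+1F60A is normally classified as a symbol, so this pattern alone may not remove it.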
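
The bytes \xF0\x9F\x98\x8A in the error message are the 4-byte UTF-8 form of U+1F60A, a code point above U+FFFF. Assuming the column uses MySQL's 3-byte utf8 character set (the question does not say), one possible filter is to drop every supplementary code point before persisting. This is a sketch with made-up class and method names, not the approach from the answer above.

```java
public class StripSupplementary {

    // Remove every code point above U+FFFF (emoji and other supplementary characters).
    static String stripNonBmp(String s) {
        return s.replaceAll("[\\x{10000}-\\x{10FFFF}]", "");
    }

    public static void main(String[] args) {
        // U+1F60A is the code point behind the bytes F0 9F 98 8A in the error message.
        String smiley = new String(Character.toChars(0x1F60A));
        System.out.println(stripNonBmp("nice " + smiley + " site"));
    }
}
```

Characters inside the Basic Multilingual Plane, such as accented letters or CJK text, pass through unchanged, so only what the 3-byte encoding cannot store is lost.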

Croatian character in Java standard output

I have a database with some Croatian characters in it, like Đ. The character is stored correctly in the database, and a PrimeFaces datatable also shows it on the web page just fine.
The problem is that when I send it to out.println(), the character Đ in the name is missing.
for (People p : people) {
System.out.println(p.getName());
}
I tried using String name2 = p.getName().getBytes("ISO-8859-2"); but it is still not working.

I assume you are using UTF-8 as the default encoding for the database and for PrimeFaces.
Also have a look at this:
Display special characters using System.out.println
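
One common cause is that System.out encodes text with the platform default charset, which may not be able to represent Đ. A minimal sketch, assuming the terminal or log viewer itself displays UTF-8: wrap the stream in a PrintStream with an explicit encoding. The class name, helper name, and sample word are made up for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wrap a stream in an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // The sample word starts with U+0110, LATIN CAPITAL LETTER D WITH STROKE.
        out.println("\u0110akovo");
    }
}
```

If the console still shows a placeholder, the output bytes are probably correct and the console's own encoding or font is the remaining suspect.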

Java: Search in a wrong encoded String without modifying it

I have to find a user-defined String in a Document (using Java), which is stored in a database in a BLOB. When I search for a String with special characters ("Umlaute", äöü etc.), it fails, meaning it does not return any positions at all. And I am not allowed to convert the document's content into UTF-8 (which would have fixed this problem but raised a new, even bigger one).
Some additional information:
The document's content is returned as String in "ISO-8859-1" (Latin1).
Here is an example, what a String could look like:
Die Erkenntnis, daÃ der KÃ¼nstler Schutz braucht, ...
This is how it should look:
Die Erkenntnis, daß der Künstler Schutz braucht, ...
If I am searching for Künstler it would fail to find it, because it looks for ü but only finds Ã¼.
Is it possible to convert Künstler into KÃ¼nstler so I can search for the wrong encoded version instead?
Note:
We are using the Hibernate Framework for database access. The original getter for the Document's content returns a byte[]. The String is then returned by calling
new String(getContent(), "ISO-8859-1")
The problem here is, that I cannot change this to UTF-8, because it would then mess up the rest of our application which is based on a third party application that delivers data this way.

Okay, looks like I've found a way to mess up the encoding on purpose.
new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")
By getting the Bytes of the String Künstler in UTF-8 and then creating a new String, telling Java that this is Latin1, it converts to KÃ¼nstler. It's a hell of a hack but seems to work well.
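
Wrapped into a small search helper, the hack might look like the sketch below. It assumes, as described in the question, that the stored bytes are UTF-8 and that the getter decodes them as ISO-8859-1; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeSearch {

    // Encode the key as UTF-8, then decode it as ISO-8859-1,
    // mirroring how the document text was (mis)decoded.
    static String misencode(String key) {
        return new String(key.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Returns the index of the key inside the wrongly decoded document, or -1.
    static int find(String document, String key) {
        return document.indexOf(misencode(key));
    }

    public static void main(String[] args) {
        byte[] content = "Die Erkenntnis, daß der Künstler Schutz braucht"
                .getBytes(StandardCharsets.UTF_8);
        // Same decoding step as the getter in the question.
        String document = new String(content, StandardCharsets.ISO_8859_1);
        System.out.println(find(document, "Künstler"));
    }
}
```

The returned position refers to the wrongly decoded text, where each non-ASCII character occupies two or more chars, so it lines up with the rest of the application that works on that same representation.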

You have already answered it yourself.
An altogether different approach:
If you can search the blob, you could search using
"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{ASCII}+", "%") + "%'"
This replaces non-ASCII sequences with the % wildcard: UTF-8 multibyte sequences are non-ASCII by design.
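
A sketch of just the pattern-building step, with made-up class and method names. In real code the resulting pattern would more safely be bound as a PreparedStatement parameter than concatenated into the SQL text, and any % or _ already present in the key would act as wildcards too.

```java
public class LikePattern {

    // Replace each run of non-ASCII characters with the SQL wildcard,
    // then wrap the whole key in wildcards for a "contains" match.
    static String toLikePattern(String key) {
        return "%" + key.replaceAll("\\P{ASCII}+", "%") + "%";
    }

    public static void main(String[] args) {
        // The key is "Kunstler" spelled with U+00FC (u with diaeresis).
        System.out.println(toLikePattern("K\u00FCnstler"));
    }
}
```

The match is deliberately loose: the wildcard accepts any bytes in place of the umlaut, so unrelated words that share the ASCII parts can match as well.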

xmlhttp.setRequestHeader not working

This is my code:
// assume var data has japanese characters
xmlhttp.open("POST","adminUpdate?&value="+data,true); // tried GET as well
xmlhttp.setRequestHeader("Content-Type", "text/plain;charset=UTF-8");
xmlhttp.send();
If I insert alert(data), I can see the Japanese characters perfectly fine.
But on the server side (servlet class), when I add this code:
String query = request.getParameter("value");
System.out.println(query);
I now see a garbage value: ??????
So I added this line on the server side:
System.out.println("content type : "+ request.getContentType());
and I got this: text/plain;charset=UTF-8
So my question is: if the encoding is set correctly, why can't I see the Japanese characters?

One option is to send the query parameters as part of the request body and have the content type set to application/x-www-form-urlencoded.
Then, before getting the parameter, set the request's character encoding:
request.setCharacterEncoding("UTF-8");
String query = request.getParameter("value");
Note that wherever you're printing the query value has to be able to display UTF-8 encoded characters.
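
To see why the decoding charset matters, the sketch below imitates a form-encoded body with plain JDK classes instead of the servlet API. It is an illustration under assumptions: the class and helper names are made up, and the ISO-8859-1 decode stands in for a server that was not told to use UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormEncodingDemo {

    // Percent-encode a value the way a form-urlencoded body carries it.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value with the given charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // three Japanese characters
        String encoded = encode(original, "UTF-8");
        System.out.println("value=" + encoded);
        // Decoding with the same charset restores the text.
        System.out.println(decode(encoded, "UTF-8").equals(original));
        // Decoding with a different charset yields garbage instead.
        System.out.println(decode(encoded, "ISO-8859-1").equals(original));
    }
}
```

On the browser side, the equivalent step would be to pass the value through encodeURIComponent before placing it in the request body.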
