Retrieving values with non-English characters inserted in a MySQL database - java

I am inserting non-English characters into a database that is UTF-8 enabled.
Inserting works: when I insert the URL below, it passes through the JSP response, and because UTF-8 encoding is set, it is converted to UTF-8 characters:
"1"
http://heenaparekh.com/category/%E0%AA%AF%E0%AA%BE%E0%AA%A4%E0%AB%8D%E0%AA%B0%E0%AA%BE%E0%AA%AA%E0%AB%8D%E0%AA%B0%E0%AA%B5%E0%AA%BE%E0%AA%B8/
It appears correctly in the database table as:
"2"
http://heenaparekh.com/category/યાતરાપરવાસ
But when I try to select the value using the same string "1" that I used for the insert (without passing it through a JSP request and response), no rows are returned.
When the SELECT statement is run with URL "1" passed as a link, the JSP request converts it to URL "2".
Here, however, I cannot convert URL "1" to URL "2" inside my JSP application, and I don't want to navigate to a different page, so I have tried the following methods to convert it to the URL "2" format:
I have tried converting my string to UTF-8, but it doesn't help; the string prints the same. When I convert it to UTF-8 with the Entities class of Nutch, the query still fetches no results, perhaps because the UTF-8 conversion done by the JSP response and by the Entities class differ.
Please help me bring both URLs into the same format so that I can extract the row from the database.

You need to URL-decode rather than "UTF-decode" the string. Internally, Java stores strings as UTF-16, but that doesn't matter for your purposes. Try the java.net.URLDecoder.decode(String, String) method, passing "UTF-8" as the second argument.
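A minimal sketch of that suggestion, assuming the percent-encoded URL "1" from the question is available as a plain Java string before it is used in the SELECT. The class and method names (UrlDecodeExample, decodeUtf8) are illustrative, not from the original answer, and the table/column in the comment are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeExample {

    // Percent-decodes a string whose escapes are UTF-8 byte sequences,
    // e.g. "%E0%AA%AF" -> the Gujarati letter U+0AAF.
    // Note: URLDecoder follows form-encoding rules, so '+' becomes a space.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url1 = "http://heenaparekh.com/category/"
                + "%E0%AA%AF%E0%AA%BE%E0%AA%A4%E0%AB%8D%E0%AA%B0%E0%AA%BE"
                + "%E0%AA%AA%E0%AB%8D%E0%AA%B0%E0%AA%B5%E0%AA%BE%E0%AA%B8/";
        String url2 = decodeUtf8(url1);
        System.out.println(url2);
        // url2 can then be bound to a PreparedStatement parameter, e.g.
        // "SELECT ... WHERE url = ?" (hypothetical table and column names).
    }
}
```

Binding the decoded value as a parameter, rather than concatenating it into the SQL, also keeps the driver in charge of sending it in the connection's character encoding.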

Related

Reading Arabic text from Oracle database encoded in WE8ISO8859P1 using java

I connect to an Oracle database whose NLS_CHARACTERSET is WE8ISO8859P1, which as far as I know cannot store Arabic text.
But Toad for Oracle can read Arabic from this database.
However, I cannot read it using Java code.
I even tried to get the raw bytes of the rows using UTL_RAW.CAST_TO_RAW.
The result was "218,227,237,225,228,199,32,199,225,218,210,237,210,161,225,222,207,32,199,211,202,229,225,223,202,32,32,56,48,37,32,227,228,32,230,205,207,199,202,32,221,225,237,223,211,32,32,32"
In a test Java class, I tried to create new String(new char[]{}) from the bytes above, but it did not display Arabic characters.
Any help? Thank you.
This could be caused by quite a few things:
1. Check the column type in the database; it should be NVARCHAR (NVARCHAR2 in Oracle), not VARCHAR (notice the "N" at the beginning of the word).
2. Try putting charset=utf8 in the connection string.
3. Convert the byte[] to a String using UTF-8 encoding, like this:
String arabicText = new String(byteArray, "UTF-8");
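The byte values posted in the question are not valid UTF-8 (218 starts a two-byte sequence, but the next byte, 227, is not a continuation byte). They do, however, appear to line up with the Windows-1256 Arabic code page, where 199 is ALEF and 218 is AIN; this is an assumption based on those values, not part of the original answer. A minimal sketch that decodes them with that charset explicitly (class and method names are illustrative):

```java
import java.nio.charset.Charset;

public class Cp1256Example {

    // Byte values copied from the question (output of UTL_RAW.CAST_TO_RAW),
    // kept as ints because Java's byte type is signed.
    static final int[] RAW = {
        218, 227, 237, 225, 228, 199, 32, 199, 225, 218, 210, 237, 210, 161,
        225, 222, 207, 32, 199, 211, 202, 229, 225, 223, 202, 32, 32, 56, 48,
        37, 32, 227, 228, 32, 230, 205, 207, 199, 202, 32, 221, 225, 237, 223,
        211, 32, 32, 32
    };

    // Decodes the bytes with an explicitly named charset.
    // Assumption: the data was stored as Windows-1256 (single-byte Arabic).
    static String decode(int[] raw, String charsetName) {
        byte[] bytes = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) {
            bytes[i] = (byte) raw[i]; // e.g. 218 becomes -38
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(decode(RAW, "windows-1256"));
    }
}
```

Decoded this way, the bytes yield readable Arabic words followed by "80%", which supports the code-page guess; decoding the same bytes as UTF-8 would only produce replacement characters.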

Getting rid of binary code in a string before inserting a row into a MySQL database

I am fetching tweets from Twitter and storing them in a database for future use. I am using UTF-8 encoding in my driver, the utf8mb4_bin collation on my VARCHAR fields, and utf8mb4_general_ci as the server collation. The problem is that when I insert a value into a VARCHAR field and the text contains any binary code, an exception is thrown, since a utf8 VARCHAR does not accept binary.
For example, when I fetch a tweet's text and try to insert it into my database, I get this error:
Incorrect string value: '\xF0\x9F\x98\xB1\xF0\x9F...' for column 'fullTweet' at row 1
My guess is that the two emoticons are causing this. How do I get rid of them before inserting the tweet text in my database?
Update:
It looks like I can enter the emoticons manually. I ran this query:
INSERT INTO `tweets`(`id`, `createdAt`, `screenName`, `fullTweet`, `editedTweet`) VALUES (450,"1994-12-19","john",_utf8mb4 x'F09F98B1',_utf8mb4 x'F09F98B1')
and the row shows up in the table with the emoticons.
You can remove non-ASCII characters from the tweet string before inserting it:
tweetStr = tweetStr.replaceAll("[^\\p{ASCII}]", "");
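If the goal is only to drop the emoticons, a narrower alternative (not from the original answer) is to remove just the code points outside the Basic Multilingual Plane. Those are the 4-byte UTF-8 sequences, such as \xF0\x9F\x98\xB1 in the error above, that MySQL's 3-byte utf8 charset cannot store, while accented letters survive. A minimal sketch; the class and method names are illustrative:

```java
public class StripEmoji {

    // Keeps every code point in U+0000..U+FFFF (at most 3 UTF-8 bytes)
    // and drops supplementary ones such as U+1F631 (bytes F0 9F 98 B1).
    static String stripSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !Character.isSupplementaryCodePoint(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String tweet = "Caf\u00E9 \uD83D\uDE31 time"; // "Café 😱 time"
        // Result is "Café  time": the emoji is gone, the é is kept.
        System.out.println(stripSupplementary(tweet));
    }
}
```

Unlike the ASCII-only regex, this keeps names such as "Năstase" intact; it is still lossy, so configuring utf8mb4 end to end remains the way to keep the emoticons.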
It looks like utf8mb4 support is still not configured correctly.
In order to use utf8mb4 in your fields you need to do the following:
1. Set character-set-server=utf8mb4 in your my.ini or my.cnf. Only character-set-server really matters here; the other settings don't.
2. Add characterEncoding=UTF-8 to the connection URL:
jdbc:mysql://localhost:3306/db?characterEncoding=UTF-8
3. Configure the collation of the field (a utf8mb4 collation, such as utf8mb4_bin).

Storing Special Character in MySQL

I am using MySQL Query Browser to store the following names in the Person table, which contains the fields personNumber and personName. The character set of personName is utf-8, and if I insert a name via Query Browser the query runs correctly, but when I do it via JDBC or JPA, the special characters in the name become '?'. What is the problem here?
The names are:
1. Năstase
2. Hrustanović
3. Ogris-Martič, and some similar names.
Have you set your connection string correctly?
jdbc:mysql://localhost:3306/administer?characterEncoding=utf8
Try this connection URL:
jdbc:mysql://localhost:3306/MY_DB?useUnicode=yes&characterEncoding=UTF8
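One common way special characters end up as '?' is that somewhere between the application and the column the text is encoded into a charset that cannot represent them; Java's String.getBytes(Charset) substitutes '?' for unmappable characters in that case. Whether a latin1 connection charset is the cause here is an assumption; the sketch only shows the mechanism, using ISO-8859-1 as a stand-in for latin1 (class and method names are illustrative).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encodes and decodes with the same charset; characters the charset
    // cannot represent come back as its replacement character '?'.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String[] names = {"N\u0103stase", "Hrustanovi\u0107", "Ogris-Marti\u010D"};
        for (String name : names) {
            // ISO-8859-1 loses ă, ć and č; UTF-8 round-trips them unchanged.
            System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)
                    + " vs " + roundTrip(name, StandardCharsets.UTF_8));
        }
    }
}
```

This is why the characterEncoding parameter in the connection URLs above matters: it controls which charset the driver uses when turning Java strings into bytes.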

Insert query with latin1_swedish_ci charset

I made a program that generates INSERT queries for a MySQL database. The database has two fields that use the latin1 charset (latin1_swedish_ci collation). If I run a query from phpMyAdmin, then when I preview the content with my PHP script, some special characters (like "ó", "é"...) are not shown correctly.
I think the problem is that Java encodes the String in the utf8 charset, so when I copy and paste the query into phpMyAdmin and run it, the inserted record is generated with the wrong charset. How can I generate the correct query (with the correct charset) in Java?
Thanks in advance for your help.
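To check that hypothesis, it helps to see where the charset actually matters: a Java String is UTF-16 in memory, and a charset only comes into play when the text is turned into bytes. The sketch below (illustrative names, not from the original thread) shows that "ó" is one byte in ISO-8859-1 but two in UTF-8, and that reading UTF-8 bytes as Latin-1 produces the familiar "Ã³" garbling. ISO-8859-1 is used as an approximation of MySQL's latin1, which is actually closer to Windows-1252.

```java
import java.nio.charset.StandardCharsets;

public class Latin1VsUtf8 {

    // Encodes as UTF-8, then decodes those bytes as ISO-8859-1 (Latin-1),
    // reproducing the typical garbling of a UTF-8 "ó" read as Latin-1.
    static String misreadUtf8AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String oAcute = "\u00F3"; // "ó"
        System.out.println("UTF-8 bytes:   " + oAcute.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("Latin-1 bytes: " + oAcute.getBytes(StandardCharsets.ISO_8859_1).length);
        // Two characters, U+00C3 and U+00B3, i.e. "Ã³".
        System.out.println(misreadUtf8AsLatin1(oAcute));
    }
}
```

If the stored or displayed text shows this pattern, the bytes were written as UTF-8 somewhere along the way and read back as latin1, which points at the encoding used when the query text is converted to bytes rather than at the Java String itself.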

MS SQL Server 2000 Arabic problem

I have a table in MS SQL Server 2000 with a column defined as nvarchar.
When I query this table in Java, I get data for this column like this:
يا هلا بالشباب الحلوين يا شباب ا٠شلونكو؟.
When I try PHP with ADOdb, I get the data as it should be, in Arabic.
But I need to use Java, not PHP. Can anyone please help me?
I use a normal SQL statement: "select * from news".
I use the latest Microsoft JDBC driver (sqljdbc4.jar).
I have no direct access to the SQL Server.
That looks to me like an encoding issue. Make sure you're using the proper encoding in Java to get the text back; obviously, some variant of Unicode.
At every character processing step (getting data, modifying data, saving data, displaying data, etcetera) ensure that you're using UTF-8 character encoding.
If it is a client application, you usually only have to worry about it in the database table and if necessary also the JDBC connection string.
If it is a web application, then you need to take more into account: request and response encoding. For GET requests this is an application server setting, and for POST requests and all responses you can set it on the appropriate request/response objects.
