Character encoding problem using ScrollableResults and MySql - java

I'm doing the following:
private void doSomething(ScrollableResults scrollableResults) {
    while (scrollableResults.next()) {
        Object[] result = scrollableResults.get();
        String columnValue = (String) result[0];
    }
}
I tried this on two computers:
On the first it works fine. It runs Windows 7, and System.getProperty("file.encoding") returns Cp1252.
On the second, when the word in the database has accents, columnValue gets strange values. It runs CentOS, and System.getProperty("file.encoding") returns UTF-8.
Both databases are MySQL, Charset: latin1, Collation: latin1_swedish_ci.
What should I do to correct this?

My suggestion would be to use UTF-8 everywhere:
at the database/tables level (the following ALTER will change the character set not only for the table itself, but also for all existing textual columns)
ALTER TABLE <some table> CONVERT TO CHARACTER SET utf8
in the connection string (which is required with MySQL's JDBC driver or it will use the client's encoding)
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
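A minimal sketch of how the connection-string part might look in application code, assuming MySQL Connector/J is on the classpath. The class name, the helper names, and the user/password arguments are hypothetical; only the useUnicode and characterEncoding settings come from the suggestion above, passed here as connection properties instead of URL parameters.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper that keeps the encoding-related settings in one place.
final class Utf8ConnectionSketch {

    // Same host and database placeholder as the URL above, without the query string.
    static final String URL = "jdbc:mysql://localhost:3306/db_name";

    // Builds the connection properties; user and password are placeholders.
    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    // Opens the connection; requires a reachable MySQL server and the driver jar.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(URL, utf8Properties(user, password));
    }
}
```

Keeping the settings in one helper makes it harder for one code path to open a connection without them.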
References
MySQL 5.0 Reference Manual
9.1.3.2. Database Character Set and Collation
9.1.3.3. Table Character Set and Collation
Connector/J (JDBC) Reference
20.3.4.4. Using Character Sets and Unicode

Related

Retrieving data using JDBC in utf-8 from a mysql server where charset is set to latin1

The schema and tables all have charset and collation of Latin1, but if I try to retrieve Chinese characters from table, it only gives ?????. I don't have access to change the schema or tables properties. How can I convert the charset to actual characters in JDBC?

Inserting spanish text in mysql

I make an HTTP GET call in Java to fetch content that may contain Spanish characters, for example: Ñañez
But what I get as a response from Mysql - Ñañez
So far I have searched online and done the following:
Appended UTF-8 as the encoding in the connection string (using Java)
jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
Updated the table's encoding
ALTER TABLE test CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
The problem is still there.
Is there anything I am missing?
The server is Tomcat 6.
Try altering the table column:
ALTER TABLE `test` CHANGE `columnname` `columnname` VARCHAR(200)
CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;
You must run this query before your insert query in MySQL:
SET NAMES 'utf8'
Mojibake is usually caused by the following combination:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the table may or may not have been CHARACTER SET utf8, but it should have been.
The fix on the Java side is to include characterEncoding=utf-8 in the connection string.
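The mechanism described above can be reproduced without a database. This is only an illustrative sketch: the class and method names are hypothetical, and Java's ISO-8859-1 is used as a stand-in for MySQL's latin1, which is an assumption (the two differ for a few byte values).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: UTF-8 bytes misread as a single-byte Latin charset.
final class MojibakeDemo {

    // Encode as UTF-8, then decode the same bytes as if they were ISO-8859-1.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse step; only valid when the text was garbled exactly as in garble().
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ñañez" written with escapes so the source file encoding cannot interfere.
        String original = "\u00D1a\u00F1ez";
        String garbled = garble(original);
        // Each accented letter turns into two characters, so 5 characters become 7.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The repair step is shown only to make the cause visible. For new data, the connection settings should be corrected so the text never gets garbled in the first place.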

Getting rid of binary code from string before inserting row in MySQL database

I am fetching tweets from Twitter and storing them in a database for future use. I am using UTF-8 encoding in my driver, utf8_mb4_bin in my VARCHAR fields and utf8mb4_general_ci as the server collation. The problem is that when inserting a value in a VARCHAR field, if the text has any binary code then it throws an exception, since VARCHAR utf8 does not accept binary.
Here is an example, I am fetching the text from here and try inserting it in my database and I get the error:
Incorrect string value: '\xF0\x9F\x98\xB1\xF0\x9F...' for column 'fullTweet' at row 1
My guess is that the two emoticons are causing this. How do I get rid of them before inserting the tweet text in my database?
Update:
Looks like I can manually enter the emoticons. I run this query:
INSERT INTO `tweets`(`id`, `createdAt`, `screenName`, `fullTweet`, `editedTweet`) VALUES (450,"1994-12-19","john",_utf8mb4 x'F09F98B1',_utf8mb4 x'F09F98B1')
and this is what the row in the table looks like:
You can remove non-ASCII characters from the tweet string before inserting:
tweetStr = tweetStr.replaceAll("[^\\p{ASCII}]", "");
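Note that the replaceAll above also removes accented letters and every non-Latin script. If the goal is only to drop the characters that MySQL's 3-byte utf8 cannot store (those outside the Basic Multilingual Plane, which need 4 bytes in UTF-8, such as the emoticon bytes \xF0\x9F\x98\xB1 in the error message), a narrower filter is possible. The following is a sketch of that alternative, not the answerer's method; the class and method names are hypothetical.

```java
// Hypothetical helper: removes only code points above U+FFFF.
final class SupplementaryStripper {

    // Keeps every BMP character (including accents) and drops supplementary ones.
    static String stripSupplementary(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isSupplementaryCodePoint(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE31" is the surrogate pair for U+1F631, the emoticon from the error message.
        String tweet = "caf\u00E9 \uD83D\uDE31 ok";
        System.out.println(stripSupplementary(tweet).length());
    }
}
```

Iterating over code points instead of chars avoids cutting a surrogate pair in half.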
It looks like utf8mb4 support is still not configured correctly.
In order to use utf8mb4 in your fields you need to do the following:
Set character-set-server=utf8mb4 in your my.ini or my.cnf. Only character-set-server really matters here, other settings don't.
Add characterEncoding=UTF-8 to the connection URL:
jdbc:mysql://localhost:3306/db?characterEncoding=UTF-8
Configure collation of the field

UTF-8 Queries with JDBC

I want to send MySQL a UTF-8 query, but it does not work. When I try the following query, the result comes back correctly:
String query = "select * from Terms where Term = 'lol'";
but the following query returns nothing:
String query = "select * from Terms where Term = 'خدابخش'";
where the 'خدابخش' part is in Persian and UTF-8.
Note that the connection to the database is fine.
Chances are that you may need to set your character encoding in your JDBC connection. If you are using MySQL JDBC Connector you do it using the property characterEncoding. Somewhat like this:
jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8
You may want to read the reference on encoding and character sets in your connector JDBC documentation.
This is the one that mentions the use of characterEncoding for the MySQL JDBC Connector:
Connector JDBC: Using Character Sets and Unicode
One or more of the following is true:
The Java compiler, compiling your code, is set to read the source file with a different encoding than the one in which the source file was actually stored. In other words, there is a discrepancy between the encoding that your editor uses, the encoding in which the file is actually saved, and the encoding with which the Java compiler is reading your source code.
Your database isn't set correctly to accept/store Unicode characters. Ensure that your database is set correctly. Looks like you're using MySQL. You may want to create a dump of the database using mysqldump and witness how the database was created with respect to character sets.
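One way to test the first possibility is to take the compiler's source encoding out of the picture and look at the bytes of the literal. The sketch below is illustrative only: the class and method names are hypothetical, and the search term is written with \u escapes so it compiles identically under any source encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: dumps the UTF-8 bytes of a string literal as hex.
final class LiteralCheck {

    // The Persian term from the question, written with Unicode escapes.
    static final String TERM = "\u062E\u062F\u0627\u0628\u062E\u0634";

    // Returns the UTF-8 encoding of s as uppercase hex, two digits per byte.
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // A correctly read literal prints D8AED8AFD8A7D8A8D8AED8B4.
        System.out.println(utf8Hex(TERM));
    }
}
```

If the same dump of the literal typed directly into the source file gives different bytes, the compiler is reading the file with the wrong encoding. Passing the term as a PreparedStatement parameter instead of concatenating it into the SQL text is a separate, generally safer habit.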

Inserting Arabic characters into mySQL DB using java (SE)

I have three text fields (ID, Name, Address) which map to columns in a DB table. I'm trying to insert Arabic characters into that table, but they appear as "?????".
I am connecting to the DB using JDBC.
My database info: MySQL Server 5.5.14 Community
Server characterset: latin1
Db characterset: utf8
Client characterset: latin1
Conn. characterset: latin1
I try to encode strings using the following code:
private String ArEncode(String text){
    String txt="";
    try {
        Charset cset = Charset.forName("utf8");
        CharsetEncoder encoder = cset.newEncoder();
        CharsetDecoder decoder = cset.newDecoder();
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        txt=buffer.asCharBuffer().toString();
    } catch (Exception ex) {
        Logger.getLogger(UserView.class.getName()).log(Level.SEVERE, null, ex);
    }
    return txt;
}
Then the returned string "txt" is inserted into the database.
Note: when I insert values directly into the DB from NetBeans, they are inserted correctly and the Arabic characters appear correctly.
Why would you do this? It's the job of the MySQL JDBC driver to encode strings correctly in the declared database character set. Set your character set in the database to UTF-8, set the JDBC options correctly, and just do a plain insert of a plain old String.
Your encoding logic, if it worked correctly, would return exactly the same value in txt as in text, so it is not needed.
And: set the connection character set to utf8, and the JDBC driver will (or should) take care of the rest.
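To illustrate why the manual encoding step does not help, here is a small sketch comparing the two paths. The class and method names are hypothetical. ByteBuffer.asCharBuffer() does not decode UTF-8: it views each pair of bytes as one 16-bit char, so the result is a different string, while a real encode and decode round trip returns the input unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of the question's approach with a true round trip.
final class EncodeRoundTrip {

    // Mirrors the question: encode to UTF-8, then reinterpret byte pairs as chars.
    static String viaAsCharBuffer(String text) {
        ByteBuffer utf8 = StandardCharsets.UTF_8.encode(text);
        return utf8.asCharBuffer().toString();
    }

    // Encode to UTF-8 and decode with the same charset; yields the original text.
    static String viaDecode(String text) {
        ByteBuffer utf8 = StandardCharsets.UTF_8.encode(text);
        return StandardCharsets.UTF_8.decode(utf8).toString();
    }

    public static void main(String[] args) {
        // An Arabic sample word, written with escapes to avoid source-encoding issues.
        String text = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println(viaDecode(text).equals(text));
        System.out.println(viaAsCharBuffer(text).equals(text));
    }
}
```

Since the correct round trip is a no-op, passing the original String straight to the driver is equivalent and simpler.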
