Inserting Spanish text in MySQL - Java

I make an HTTP GET call in Java to fetch content that may contain Spanish characters, for example: Ñañez
But what I get back from MySQL is: Ñañez
So far I have searched online and done the following:
Appended UTF-8 as the encoding in the connection string (in Java):
jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8
Updated the table's encoding:
ALTER TABLE test CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
The problem is still there.
Is there anything I am missing?
The server is Tomcat 6.

Try altering the table column:
ALTER TABLE `test` CHANGE `columnname` `columnname` VARCHAR(200)
CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;

You must run this query before your INSERT query in MySQL:
SET NAMES 'utf8'
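A minimal JDBC sketch of the above, assuming a Connector/J connection; the table and column names here are the hypothetical ones from the question:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class SpanishInsert {
    // Connection URL from the question; forces UTF-8 on the wire.
    static final String URL =
        "jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8";

    static void insertName(Connection conn, String name) throws SQLException {
        // Force the session character set before inserting.
        try (Statement st = conn.createStatement()) {
            st.execute("SET NAMES 'utf8'");
        }
        // Use a parameterized statement so the driver handles the encoding.
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO test (name) VALUES (?)")) {
            ps.setString(1, name); // e.g. "Ñañez"
            ps.executeUpdate();
        }
    }
}
```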

Mojibake is usually caused by the following:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1'), or similar), probably by default. (It should have been utf8.)
The column in the table may or may not have been CHARACTER SET utf8, but it should have been.
To fix this, include characterEncoding=UTF-8 in the connection string.
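The exact symptom from the question can be reproduced in plain Java: correct UTF-8 bytes decoded through a latin1/cp1252 connection come out as that garbled form. A minimal sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "Ñañez";
        // The correct UTF-8 bytes for the string (what the client sends).
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Misread through a latin1/cp1252 connection:
        String mojibake = new String(utf8, Charset.forName("windows-1252"));
        System.out.println(mojibake); // prints "Ã‘aÃ±ez"
    }
}
```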

Related

How to deal with foreign characters using MySQL within a Java EE environment

I want to read data from a CSV file and then write it to MySQL. The data contains foreign languages.
I got this error when I tried to insert a record containing Japanese characters into MySQL:
"1366 Incorrect string value: '\xE6\xB0\xB4\xE7\x9D\x80...' for column 'name' at row 1"
The SQL statement looks like this:
INSERT INTO `MerchandiseMaster` (id,name) VALUES ('20000101','JANIE AND JACK水着 鶯茶系 大胆花柄')
My CSV file uses UTF-8 encoding and the collation of the MySQL database schema is utf8_general_ci.
I have set these parameters when connecting to the database through JDBC (mysql-connector-java-5.1.34-bin.jar):
connect = DriverManager.getConnection("jdbc:mysql://localhost/mydata?"
+ "useUnicode=yes&characterEncoding=UTF-8&user=user123&password=user123.");
My question is:
Is there anything else that I am missing to deal with foreign characters correctly?
I found this on a website, so caveat emptor, but apparently MySQL's utf8 support is incomplete. In 2010 MySQL added a new character set, utf8mb4, that supports the entire UTF-8 encoding scheme.
Add to your MySQL configuration file:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
Here's a link to the full article. I haven't tried this out, so test everything carefully first, and make a back-up of your database before doing anything.
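After restarting MySQL, you can check that the settings took effect from any client (a diagnostic sketch; the backslashes escape the LIKE wildcard `_`):

```sql
SHOW VARIABLES LIKE 'character\_set\_%';
SHOW VARIABLES LIKE 'collation%';
```

Each character_set_* and collation* variable should report utf8mb4 (or a utf8mb4_* collation) once the configuration above is active.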

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8F' for column 'tweetcontent' at row 1

I am trying to save Twitter feeds in a MySQL database in the following table:
CREATE TABLE `tweets` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tweetcontent` varchar(255) CHARACTER SET utf8mb4 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
but the following error appeared:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\x8F'
for column 'tweetcontent' at row 1
Can anyone help me please?
Already answered here
MySQL's utf8 permits only the Unicode characters that can be represented with 3 bytes in UTF-8. Here you have a character that needs 4 bytes: \xF0\x9F\x98\x8F (U+1F60F SMIRKING FACE).
If you have MySQL 5.5 or later you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8.
You may also have to set the server property character_set_server to utf8mb4 in the MySQL configuration file. It seems that Connector/J defaults to 3-byte Unicode otherwise:
For example, to use 4-byte UTF-8 character sets with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.

Getting rid of binary code from string before inserting row in MySQL database

I am fetching tweets from Twitter and storing them in a database for future use. I am using UTF-8 encoding in my driver, utf8mb4_bin in my VARCHAR fields, and utf8mb4_general_ci as the server collation. The problem is that when inserting a value into a VARCHAR field, if the text has any binary code it will throw an exception, since VARCHAR utf8 does not accept binary.
Here is an example: I am fetching the text from here and trying to insert it into my database, and I get the error:
Incorrect string value: '\xF0\x9F\x98\xB1\xF0\x9F...' for column 'fullTweet' at row 1
My guess is that the two emoticons are causing this. How do I get rid of them before inserting the tweet text in my database?
Update:
It looks like I can manually enter the emoticons. I ran this query:
INSERT INTO `tweets`(`id`, `createdAt`, `screenName`, `fullTweet`, `editedTweet`) VALUES (450,"1994-12-19","john",_utf8mb4 x'F09F98B1',_utf8mb4 x'F09F98B1')
and the emoji is then stored correctly in the row.
You can remove non-ASCII characters from the tweet string before inserting:
tweetStr = tweetStr.replaceAll("[^\\p{ASCII}]", "");
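Note that stripping everything non-ASCII also destroys legitimate accented text. An alternative (a sketch, not from the original answer) is to drop only the supplementary-plane characters, i.e. the 4-byte sequences that a 3-byte utf8 column rejects, and keep the rest:

```java
public class StripAstral {
    // Removes only characters above U+FFFF (emoji and other
    // supplementary-plane code points), keeping accented text intact.
    static String stripAstral(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> cp <= 0xFFFF)   // keep only 1-3 byte UTF-8 characters
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café 😏!" -> the emoji is removed, the accent survives
        System.out.println(stripAstral("caf\u00E9 \uD83D\uDE0F!")); // prints "café !"
    }
}
```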
It looks like utf8mb4 support is still not configured correctly.
In order to use utf8mb4 in your fields you need to do the following:
Set character-set-server=utf8mb4 in your my.ini or my.cnf. Only character-set-server really matters here; the other settings don't.
Add characterEncoding=UTF-8 to the connection URL:
jdbc:mysql://localhost:3306/db?characterEncoding=UTF-8
Configure the collation of the field.
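For the last step, a sketch of the column change, using the `tweets`/`fullTweet` names from the question:

```sql
ALTER TABLE `tweets`
  MODIFY `fullTweet` VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```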

Character encoding: java.sql.SQLException: Incorrect string value: '\xF5fi S\xE1...' for column

I'm writing a Spring MVC JPA application against a legacy database. The character set of the database tables is ISO-8859-2 (latin2). This cannot change, for legacy reasons.
The database is MySQL 5.5. The JPA implementation is Hibernate 4.
The fields in the database can contain accented characters, like "áéíöőüű" etc.
When I try to merge the object, updating the database, I get an
SQLException: Incorrect string value: '\xF5fi S\xE1...' for column
for the value "Petőfi Sándor".
The views are JSPs, and I set the encoding to ISO-8859-2:
<meta charset="ISO-8859-2">
The characters display correctly (in the input fields they are written with HTML entities, for example &Aacute; instead of the literal Á, but they display fine).
I've looked at similar problems here but did not find a solution. I would really appreciate it if someone knew the answer. I'm in a pickle right now.
Edit:
Relevant parts from the table definition:
CREATE TABLE `cimek` (
...
`name` varchar(30) COLLATE latin2_hungarian_ci NOT NULL,
...
) ENGINE=InnoDB AUTO_INCREMENT=4040
DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci
Only the accented characters ő and ű are causing trouble.
Try adding
characterEncoding=ISO-8859-2
to your JDBC driver init:
jdbc:mysql://location/dbname?characterEncoding=ISO-8859-2

Character encoding problem using ScrollableResults and MySql

I'm doing
private void doSomething(ScrollableResults scrollableResults) {
    while (scrollableResults.next()) {
        Object[] result = scrollableResults.get();
        String columnValue = (String) result[0];
    }
}
I tried this on two computers:
On the first it works fine. It is a Windows 7 machine; System.getProperty("file.encoding") returns Cp1252.
On the second, when a word in the database has accents, columnValue gets strange values. It is a CentOS machine; System.getProperty("file.encoding") returns UTF-8.
Both databases are MySQL, charset latin1, collation latin1_swedish_ci.
What should I do to correct this?
My suggestion would be to use UTF-8 everywhere:
at the database/table level (the following ALTER will change the character set not only for the table itself, but also for all existing textual columns):
ALTER TABLE <some table> CONVERT TO CHARACTER SET utf8
in the connection string (which is required with MySQL's JDBC driver, or it will use the client's encoding):
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
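The difference between the two machines can be reproduced locally: without characterEncoding in the URL, the driver falls back to the platform default (file.encoding), so the same latin1 bytes decode differently on Windows (Cp1252) and CentOS (UTF-8). A minimal sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // latin1 bytes for "café", as stored in a latin1 MySQL column
        byte[] latin1Bytes = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        // Decoded with the Windows default (Cp1252): the accent survives
        String onWindows = new String(latin1Bytes, Charset.forName("windows-1252"));
        // Decoded as UTF-8 (the CentOS default): 0xE9 is not valid UTF-8
        // on its own, so it becomes the U+FFFD replacement character
        String onLinux = new String(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println(onWindows); // prints "café"
        System.out.println(onLinux);   // prints "caf" + U+FFFD
    }
}
```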
References
MySQL 5.0 Reference Manual
9.1.3.2. Database Character Set and Collation
9.1.3.3. Table Character Set and Collation
Connector/J (JDBC) Reference
20.3.4.4. Using Character Sets and Unicode