JDBC and MySQL work just fine in my project except when it comes down to accented characters.
This is the URL I use to access the database:
jdbc:mysql://localhost:3306/dbname?useUnicode=yes&characterEncoding=UTF-8
Suppose a resultSet = preparedStatement.executeQuery(), followed by System.out.println(resultSet.getString("text_with_accents")). What's stored in the database is àèìòù (note that I've already set the right encoding in the database and all its tables), but what I get is ?????.
Is there a way to fix this?
Try changing your URL like this:
url="jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8"
The & must be represented as &amp; when the URL appears in an XML configuration file (as in the url="…" attribute above).
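A minimal pure-Java sketch of the distinction: in Java source the URL carries a literal &, while the XML form escapes it as &amp;. The class and helper names below are illustrative, not from the original.

```java
public class JdbcUrl {
    // As the URL appears in Java source code:
    static final String JAVA_URL =
        "jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8";

    // As the same URL must be written inside an XML configuration file:
    static final String XML_URL =
        "jdbc:mysql://localhost:3306/dbname?useUnicode=true&amp;characterEncoding=UTF-8";

    // Un-escaping the XML entity yields the plain URL the driver actually sees.
    static String unescapeXml(String s) {
        return s.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeXml(XML_URL).equals(JAVA_URL)); // prints true
    }
}
```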
Probably...
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
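The ????? symptom means characters were irreversibly replaced at some conversion step that could not represent them. This is not the exact MySQL pipeline, but a pure-Java illustration of the mechanism, assuming a round-trip through a charset (US-ASCII here) that cannot hold the accented characters:

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarks {
    // Encoding text into a charset that cannot represent it replaces each
    // unmappable character with '?'. The replacement is irreversible.
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII); // unmappable -> '?'
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("àèìòù")); // prints ?????
    }
}
```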
Related
I can successfully write to a mysql database table using java/jdbc for the unicode text "привет моя работа программист"
When I search the database table using the mysql command prompt on windows 10 I see the exact text in the table.
However, when I read the text back using Java JDBC, the text from the result set is as follows:
привет Ð¼Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð° программиÑÑ‚
The url I use to call is
jdbc:mysql://localhost/dbname?useUnicode=true&characterEncoding=utf-8
I use the following code
PreparedStatement ps = con.prepareStatement(SELECT_STATEMENT_EMAIL);
ps.setString(1, idemail);
ps.setString(2, password);
ResultSet res = ps.executeQuery();
if (res.next()) {
String description = res.getString("description");
}
I have converted the database and database table to utf8 using the following commands
ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Can anyone point me in the right direction?
Change the file encoding to UTF8.
Most likely you are using IntelliJ IDEA, which likes to change the encoding of the file...
I fear you have two problems.
привет м sounds like Mojibake.
In trying to decode that, I get привет мо� работа программи�т. The black diamonds usually come from the wrong <meta...> on the output page. But they must be coming from somewhere else.
Never mind. I see that я is D18F. But 8F, treated as latin1 seems to be a non-printing character, thereby messing up the cadence of 2-byte utf8 codes, leading to the black diamond.
The decoding was BINARY(CONVERT(col USING latin1)), but I suspect that cannot be relied upon.
Mojibake usually comes from
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the table was declared CHARACTER SET latin1, or possibly inherited that from the table/database default. (It should have been utf8.)
So, that gives you two things to fix. And, sorry, but I don't think the data already in the table can be fixed. However, I could look deeper, if you provide SELECT col, HEX(col) FROM ... WHERE ... showing either that string, or at least some string with я or с in it.
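The classic Mojibake chain described above (UTF-8 bytes mistakenly decoded as Latin-1) can be reproduced, and, when no bytes were dropped along the way, reversed, in pure Java. This is a sketch of the mechanism, not a fix for data already mangled in the table:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Garble: take correct UTF-8 bytes and decode them as Latin-1,
    // which is what a SET NAMES latin1 connection effectively does.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Reverse: re-encode as Latin-1 and decode as UTF-8. This only works
    // if no bytes were lost (e.g. non-printing 0x8F surviving intact).
    static String ungarble(String mojibake) {
        return new String(mojibake.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble("привет");
        // Each 2-byte UTF-8 Cyrillic letter becomes two Latin-1 characters.
        System.out.println(garbled.length());   // prints 12
        System.out.println(ungarble(garbled));  // prints привет
    }
}
```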
Currently my Spring application connects to a MySQL database with characterEncoding=UTF-8 on the connection, but several tables are encoded in latin1. I don't actually need UTF-8 characters to be stored in the database as-is; saving '?????' instead of them is acceptable for me. But now I'm getting this error:
Caused by: java.sql.SQLException: Incorrect string value: '\xD1\x84\xD1\x8B\xD0\xB2...' for colum
Is there any way to save question marks for the given tables instead of throwing an error? (Converting the tables to UTF-8 is not an acceptable option in my situation.)
You could replace all non-ASCII characters with '?' using
value.replaceAll("[^\\x00-\\x7F]", "?")
(note that in a Java string literal the backslashes must be doubled and the hex escapes need two digits; widen the range to \\xFF if you want to keep all latin1 characters).
Or you could probably simply encode and re-decode using StandardCharsets.US_ASCII.
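Both approaches can be sketched side by side; for BMP text they produce the same result (the class and method names here are illustrative):

```java
import java.nio.charset.StandardCharsets;

public class ToQuestionMarks {
    // Replace every non-ASCII character with '?'. Note the two-digit hex
    // escapes and the doubled backslashes required in a Java string literal.
    static String viaRegex(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "?");
    }

    // Equivalent: round-trip through US-ASCII, whose encoder substitutes
    // '?' for every character it cannot represent.
    static String viaCharset(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII),
                          StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("фыв abc"));   // prints ??? abc
        System.out.println(viaCharset("фыв abc")); // prints ??? abc
    }
}
```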
I have a table in Oracle that contains a column of type NVARCHAR2. I am trying to save Russian characters in this column, but it shows ¿¿¿¿¿¿¿¿¿¿¿ instead. When I fetch the same characters from Java, I get the same string back.
I have also tried NCHAR and VARCHAR2, but the issue is the same in all cases.
Is this problem from Java or from Oracle? The same Java code with the same characters works fine against PostgreSQL, so I cannot tell whether the problem is in Oracle or in Java.
In my Oracle database the NLS_NCHAR_CHARACTERSET property value is AL16UTF16.
Any idea how I can get the UTF-8 characters saved in Oracle to come back unchanged in Java?
The problem with characters is that you cannot trust your eyes. Maybe the database stores the correct character values but your viewing tool does not understand them. Or maybe the characters get converted somewhere along the way due to language settings.
To find out what character values are stored in your table use the dump function:
Example:
select dump(mycolumn),mycolumn from mytable;
This will give you the byte values of your data and you can check whether or not the values in your table are as they should be.
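The same kind of inspection can be done on the Java side. A small sketch (the class and method names are illustrative) that prints the Unicode code points you actually received, independent of what any console or viewer renders:

```java
public class CharDump {
    // Analogous to Oracle's DUMP() on the SQL side: show the code point
    // of each character, rather than trusting how it is displayed.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(dump("я")); // prints U+044F
        System.out.println(dump("¿")); // prints U+00BF
    }
}
```

If the fetched string dumps as a run of U+00BF (¿), the characters were already replaced before they reached Java; if the Cyrillic code points are intact, the fault lies with the display tool.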
After some googling I have resolved my issue. Here is the answer: AL32UTF8 is the Oracle database character set appropriate for XMLType data. It is equivalent to the IANA-registered standard UTF-8 encoding, which supports all valid XML characters.
In other words, set the AL32UTF8 character set when creating the database in Oracle.
Here is link for this
http://docs.oracle.com/cd/B19306_01/server.102/b14231/create.htm#i1008322
You need to specify useUnicode=true&characterEncoding=UTF-8 while getting the connection from the database. Also make sure the column supports UTF-8 encoding. Go through Oracle documentation.
I have a prepared statement:
PreparedStatement st;
and in my code I try to use the st.setString method:
st.setString(1, userName);
The value of userName is şakça. The setString method changes 'şakça' to '?akça'; it doesn't recognize UTF-8 characters. How can I solve this problem?
Thanks.
The number of ways this can get screwed up is actually quite impressive. If you're using MySQL, try adding a characterEncoding=UTF-8 parameter to the end of your JDBC connection URL:
jdbc:mysql://server/database?characterEncoding=UTF-8
You should also check that the table / column character set is UTF-8.
Whenever a database changes a character to ?, it simply means that the codepoint of the character in question is completely outside the range of the character encoding the table is configured to use.
As to the cause of the problem: the ç lies within the ISO-8859-1 range and has exactly the same codepoint as in UTF-8 (U+00E7). However, the codepoint of ş lies completely outside the range of ISO-8859-1 (U+015F, while ISO-8859-1 only goes up to U+00FF). The DB can't persist the character and replaces it with ?.
So, I suspect that your DB table is still configured to use ISO-8859-1 (or in one of other compatible ISO-8859 encodings where ç has the same codepoint as in UTF-8).
The Java/JDBC API is doing its job perfectly fine with regard to character encoding (Java uses Unicode all the way through), and the JDBC connection encoding is also configured correctly. If Java/JDBC had incorrectly used ISO-8859-1, the persisted result would have been Åakça (the ş consists of the bytes 0xC5 and 0x9F, which represent Å and a non-printing control character in ISO-8859-1, and the ç consists of the bytes 0xC3 and 0xA7, which represent Ã and § in ISO-8859-1).
setString methods changes 'şakça' to
'?akça'
How do you know that setString changes this? Or do you see the content in the database and conclude it from that?
It could be that the database is not configured for UTF-8, or simply that the tool you use to view the contents of the database (SQL*Plus for Oracle...) is not capable of displaying UTF-8.
You can use a query like the one below to set Unicode strings in a prepared statement:
PreparedStatement st = conn.prepareStatement("select * from users where username = unistr(?)"); // the unistr function is Oracle-specific
st.setString(1, userName);
I have a problem with DB2 databases that should store unicode characters. The connection is established using JDBC.
What do I have to do if I would like to insert a unicode string into the database?
INSERT INTO my_table(id, string_field) VALUES(1, N'my unicode string');
or
INSERT INTO my_table(id, string_field) VALUES(1, 'my unicode string');
I don't know if I have to use the N-prefix or not. For most of the databases out there it works pretty well when using it but I am not quite sure about DB2. I also have the problem that I do not have a DB2 database at hand where I could test these statements. :-(
Thanks a lot!
The documentation on constants (as of DB2 9.7) says this about graphic strings:
A graphic string constant specifies a varying-length graphic string consisting of a sequence of double-byte characters that starts and ends with a single-byte apostrophe ('), and that is preceded by a single-byte G or N. The characters between the apostrophes must represent an even number of bytes, and the length of the graphic string must not exceed 16 336 bytes.
I have never heard of this in the context of DB2. Googling tells me that it is more SQL Server specific. In DB2, and in every other decent RDBMS, you only need to ensure that the database uses the UTF-8 charset. You normally specify that in the CREATE statement. Here's the DB2 variant:
CREATE DATABASE my_db USING CODESET UTF-8;
That should be it on the DB2 side. You don't need to change the standard SQL statements for that. You also don't need to worry about Java, as it internally already uses Unicode.
Enclosing the Unicode string constant within N'...' worked through a JDBC application against a DB2 database.