I have a prepared statement:
PreparedStatement st;
and in my code I try to use the st.setString method:
st.setString(1, userName);
The value of userName is 'şakça', but setString changes 'şakça' to '?akça'. It doesn't recognize UTF-8 characters. How can I solve this problem?
Thanks.
The number of ways this can get screwed up is actually quite impressive. If you're using MySQL, try adding a characterEncoding=UTF-8 parameter to the end of your JDBC connection URL:
jdbc:mysql://server/database?characterEncoding=UTF-8
You should also check that the table / column character set is UTF-8.
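Putting the pieces together, a minimal sketch might look like this (the host, database, table, and credentials are placeholders, not from the original question):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class Utf8ConnectionDemo {
    // Placeholder connection details; adjust to your environment.
    public static final String URL =
        "jdbc:mysql://localhost:3306/mydb"
        + "?useUnicode=true&characterEncoding=UTF-8";

    // Hypothetical helper; assumes a users(username) table exists.
    public static void insertUser(String userName) throws Exception {
        try (Connection con = DriverManager.getConnection(URL, "user", "pass");
             PreparedStatement st = con.prepareStatement(
                 "INSERT INTO users (username) VALUES (?)")) {
            st.setString(1, userName); // sent to the server as UTF-8
            st.executeUpdate();
        }
    }
}
```

With this URL, the driver transcodes the Java (UTF-16) string to UTF-8 on the wire; the column and table character sets still have to be UTF-8 for the value to survive storage.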
Whenever a database changes a character to ?, it simply means that the codepoint of the character in question is completely outside the range of the character encoding that the table is configured to use.
As to the cause of the problem: the ç lies within ISO-8859-1 range and has exactly the same codepoint as in UTF-8 (U+00E7). However, the UTF-8 codepoint of ş lies completely outside the range of ISO-8859-1 (U+015F while ISO-8859-1 only goes up to U+00FF). The DB won't persist the character and replace it by ?.
So, I suspect that your DB table is still configured to use ISO-8859-1 (or in one of other compatible ISO-8859 encodings where ç has the same codepoint as in UTF-8).
The Java/JDBC API is doing its job perfectly fine with regard to character encoding (Java uses Unicode all the way), and the JDBC connection encoding is also configured correctly. If Java/JDBC had incorrectly used ISO-8859-1, the persisted result would have been 'Åakça' (ş consists of the bytes 0xC5 and 0x9F, which represent Å and an invisible control character in ISO-8859-1, and ç consists of the bytes 0xC3 and 0xA7, which represent Ã and § in ISO-8859-1).
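The byte arithmetic above is easy to verify in plain Java: encode the string as UTF-8 and then (wrongly) decode the bytes as ISO-8859-1. This sketch reproduces the mangled form described above (string literals are written as Unicode escapes to stay independent of source-file encoding):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static String misdecode(String s) {
        // Encode in UTF-8, then (incorrectly) decode as ISO-8859-1.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u015Fak\u00E7a" is "şakça".
        String garbled = misdecode("\u015Fak\u00E7a");
        // ş (0xC5 0x9F) becomes 'Å' plus an invisible control character;
        // ç (0xC3 0xA7) becomes "Ã§".
        System.out.println(garbled);
    }
}
```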
setString methods changes 'şakça' to
'?akça'
How do you know that setString changes this? Or do you see the content in the database and decide this?
It could be that the database is not configured for UTF-8, or simply that the tool you use to view the contents of the database (SQL*Plus for Oracle, ...) is not capable of displaying UTF-8.
You can use a query like the one below to set Unicode strings in a prepared statement:
PreparedStatement st = conn.prepareStatement("select * from users where username = unistr(?)"); // the unistr function is Oracle-specific
st.setString(1, userName);
Related
I'm trying to set the connection charset for my Firebird connection using Pentaho DI, but I still can't read the data in the right encoding.
I have tried many parameters, such as encoding, charSet, etc., but no luck.
What have I missed?
You either need to use encoding with the Firebird name of the character set, or charSet with the Java name of the character set(*).
WIN1256 is not a valid Java character set name, so the connection will fail. If you specify charSet, then you need to use the Java name Cp1256 or - with Jaybird 2.2.1 or newer - windows-1256.
If this doesn't work then either Pentaho is not correctly passing connection properties, or your data is stored in a column with character set NONE in a different encoding than WIN1256 (or worse: stored in a column with character set WIN1256, but the data is actually a different encoding).
*: Technically you can combine encoding and charSet, but it is only for special use cases where you want Firebird to read data in one character set, and have Jaybird interpret it in another character set.
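As a sketch of the first option, here is how the connection properties might be assembled for Jaybird (the URL, database path, and credentials are placeholders):

```java
import java.util.Properties;

public class FirebirdCharsetDemo {
    // Placeholder URL; adjust host, port, and database path for your setup.
    public static final String URL =
        "jdbc:firebirdsql://localhost:3050/C:/data/mydb.fdb";

    public static Properties connectionProps() {
        Properties props = new Properties();
        props.setProperty("user", "SYSDBA");
        props.setProperty("password", "masterkey");
        // Either the Firebird name of the character set...
        props.setProperty("encoding", "WIN1256");
        // ...or the Java name (Jaybird 2.2.1+ also accepts "windows-1256"),
        // but normally not both:
        // props.setProperty("charSet", "windows-1256");
        return props;
    }
}
```

The same keys can be passed as URL query parameters in Pentaho's connection options; the point is that WIN1256 goes with encoding and windows-1256/Cp1256 goes with charSet, never mixed up.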
JDBC and MySQL work just fine in my project except when it comes down to accented characters.
This is the URL I use to access the database:
jdbc:mysql://localhost:3306/dbname?useUnicode=yes&characterEncoding=UTF-8
Suppose a resultSet = preparedStatement.executeQuery(), and then a System.out.println(resultSet.getString("text_with_accents"));. What's stored in the database is àèìòù (note that I've already set the right encoding in the database and all its tables), but what I get is ?????.
Is there a way to fix this?
Try changing your URL to:
url="jdbc:mysql://localhost:3306/dbname?useUnicode=true&characterEncoding=UTF-8"
In an XML configuration file, the & must be written as &amp;.
Probably...
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
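The question-mark substitution itself is easy to reproduce on the Java side: when a string is forced through an encoding that cannot represent it, each unmappable character is replaced with ?. A small sketch of that mechanism (this illustrates the replacement behavior generically, not the exact MySQL-internal conversion):

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    public static String forceThrough(String s) {
        // String.getBytes(Charset) replaces unmappable characters
        // with '?' (0x3F); characters inside ISO-8859-1 pass through.
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }
}
```

A Cyrillic letter such as ш (U+0448) comes back as '?', while ç (U+00E7), which exists in latin1, survives intact.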
I can successfully write to a mysql database table using java/jdbc for the unicode text "привет моя работа программист"
When I search the database table using the mysql command prompt on windows 10 I see the exact text in the table.
However, when I read the text back using Java JDBC, the text from the result set is as follows:
привет Ð¼Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð° программиÑÑ‚
The url I use to call is
jdbc:mysql://localhost/dbname?useUnicode=true&characterEncoding=utf-8
I use the following code
PreparedStatement ps = con.prepareStatement(SELECT_STATEMENT_EMAIL);
ps.setString(1, idemail);
ps.setString(2, password);
ResultSet res = ps.executeQuery();
if (res.next()) {
    String description = res.getString("description");
}
I have converted the database and database table to utf8 using the following commands
ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Can anyone point me in the right direction?
Change the file encoding to UTF-8.
Most likely you are using IntelliJ IDEA, which likes to change the encoding of the file...
I fear you have two problems.
The string привет Ð¼Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð° программиÑÑ‚ looks like Mojibake.
In trying to decode that, I get привет мо� работа программи�т. The black diamonds usually come from the wrong <meta...> on the output page. But they must be coming from somewhere else.
Never mind. I see that я is D18F. But 8F, treated as latin1 seems to be a non-printing character, thereby messing up the cadence of 2-byte utf8 codes, leading to the black diamond.
The decoding was BINARY(CONVERT(col USING latin1)), but I suspect that cannot be relied upon.
Mojibake usually comes from
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the table may or may not have been declared CHARACTER SET latin1 (possibly inherited from the table/database); it should have been utf8.
So, that gives you two things to fix. And, sorry, but I don't think the data already in the table can be fixed. However, I could look deeper, if you provide SELECT col, HEX(col) FROM ... WHERE ... showing either that string, or at least some string with я or с in it.
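The hex check suggested above can also be reproduced on the Java side; this sketch prints the UTF-8 bytes of я, which should come out as D18F, matching the observation earlier in the answer:

```java
import java.nio.charset.StandardCharsets;

public class HexDemo {
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Formatter's %X prints a negative Byte as its unsigned value.
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // я is U+044F; its UTF-8 encoding is the two bytes D1 8F.
        System.out.println(utf8Hex("\u044F")); // prints D18F
    }
}
```

Comparing this against SELECT HEX(col) output tells you whether the stored bytes are correct UTF-8 or already-mangled data.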
I am parsing a bunch of XML files and inserting the value obtained from them into a MySQL database. The character set of the mysql tables is set to utf8. I'm connecting to the database using the following connection url - jdbc:mysql://localhost:3306/articles_data?useUnicode=false&characterEncoding=utf8
Most of the string values with unicode characters are entered fine (like Greek letters etc.), except for some that have a math symbol. An example in particular - when I try to insert a string with mathematical script capital g (img at www.ncbi.nlm.nih.gov/corehtml/pmc/pmcents/1D4A2.gif) ( http://graphemica.com/𝒢 ) (Trying to parse and insert this article), I get the following exception -
java.sql.SQLException: Incorrect string value: '\xF0\x9D\x92\xA2 i...' for column 'text' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3515)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3447)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2554)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1761)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2046)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1964)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1949)
If I change my connection URL to - jdbc:mysql://localhost:3306/articles_data, then the insert works, but all regular UTF8 characters are replaced with a question mark.
There are two possible ways I'm trying to fix it, and haven't succeeded at either yet -
When parsing the article, maintain the encoding. I'm using org.apache.xerces.parsers.DOMParser to parse the xml files, but can't figure out how to prevent it from decoding (relevant XML - <p>𝒢 is a set containing...</p>). I could re-encode it, but that just seems inefficient.
Insert the math symbols into the database.
MySQL up to version 5.1 seems to support only Unicode characters in the Basic Multilingual Plane, which take no more than three bytes when encoded as UTF-8. From the manual on Unicode support in version 5.1:
MySQL 5.1 supports two character sets for storing Unicode data:
ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character
In version 5.5 some new character sets were added:
...
utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character
ucs2 and utf8 support BMP characters. utf8mb4, utf16, and utf32 support BMP and supplementary characters.
So if you are on MySQL 5.1 you would first have to upgrade. In later versions you have to change the character set to utf8mb4 to work with these supplementary characters.
It seems the jdbc connector also requires some further configuration (From Connector/J Notes and Tips):
To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set.
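The need for utf8mb4 can be seen from the character itself: U+1D4A2 lies outside the BMP, so it takes four bytes in UTF-8, which plain utf8 (max three bytes) cannot store. This sketch reproduces exactly the byte sequence reported in the exception:

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryDemo {
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // MATHEMATICAL SCRIPT CAPITAL G, U+1D4A2: a surrogate pair in
        // Java (two UTF-16 code units) and four bytes in UTF-8.
        String g = new String(Character.toChars(0x1D4A2));
        System.out.println(g.length());  // 2 (UTF-16 code units)
        System.out.println(utf8Hex(g));  // \xF0\x9D\x92\xA2
    }
}
```

Those four bytes, \xF0\x9D\x92\xA2, are precisely what MySQL complains about in the "Incorrect string value" error above.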
I have a problem with DB2 databases that should store unicode characters. The connection is established using JDBC.
What do I have to do if I would like to insert a unicode string into the database?
INSERT INTO my_table(id, string_field) VALUES(1, N'my unicode string');
or
INSERT INTO my_table(id, string_field) VALUES(1, 'my unicode string');
I don't know if I have to use the N-prefix or not. For most of the databases out there it works pretty well when using it but I am not quite sure about DB2. I also have the problem that I do not have a DB2 database at hand where I could test these statements. :-(
Thanks a lot!
The documentation on constants (as of DB2 9.7) says this about graphic strings:
A graphic string constant specifies a varying-length graphic string consisting of a sequence of double-byte characters that starts and ends with a single-byte apostrophe ('), and that is preceded by a single-byte G or N. The characters between the apostrophes must represent an even number of bytes, and the length of the graphic string must not exceed 16 336 bytes.
I have never heard of this in the context of DB2. Google tells me that it is more specific to MS SQL Server. In DB2, as in every other decent RDBMS, you only need to ensure that the database is using the UTF-8 charset. You normally specify that in the CREATE statement. Here's the DB2 variant:
CREATE DATABASE my_db USING CODESET UTF-8;
That should be it in the DB2 side. You don't need to change the standard SQL statements for that. You also don't need to worry about Java as it internally already uses Unicode.
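With JDBC you usually don't need a literal prefix at all: a PreparedStatement parameter carries the Java (Unicode) string to the driver directly, sidestepping the N'' question. A minimal sketch (the table and column names are placeholders from the question's example, and a UTF-8 database is assumed):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class Db2InsertDemo {
    public static final String SQL =
        "INSERT INTO my_table (id, string_field) VALUES (?, ?)";

    // Hypothetical helper; assumes my_table exists in a UTF-8 database.
    public static void insert(Connection con, int id, String text)
            throws Exception {
        try (PreparedStatement st = con.prepareStatement(SQL)) {
            st.setInt(1, id);
            st.setString(2, text); // no N'' prefix needed for parameters
            st.executeUpdate();
        }
    }
}
```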
Enclosing the Unicode string constant within N'' worked through a JDBC application for a DB2 database.