DB2 database using Unicode - Java

I have a problem with DB2 databases that should store unicode characters. The connection is established using JDBC.
What do I have to do if I would like to insert a unicode string into the database?
INSERT INTO my_table(id, string_field) VALUES(1, N'my unicode string');
or
INSERT INTO my_table(id, string_field) VALUES(1, 'my unicode string');
I don't know whether I have to use the N-prefix or not. For most databases out there it works fine with the prefix, but I am not quite sure about DB2. I also have the problem that I do not have a DB2 database at hand where I could test these statements. :-(
Thanks a lot!

The documentation on constants (as of DB2 9.7) says this about graphic strings:
A graphic string constant specifies a varying-length graphic string consisting of a sequence of double-byte characters that starts and ends with a single-byte apostrophe ('), and that is preceded by a single-byte G or N. The characters between the apostrophes must represent an even number of bytes, and the length of the graphic string must not exceed 16 336 bytes.

I have never heard of this in the context of DB2. Googling tells me that it is more of an MS SQL Server thing. In DB2, as in every other decent RDBMS, you only need to ensure that the database uses the UTF-8 codeset. You normally specify that in the CREATE statement. Here's the DB2 variant:
CREATE DATABASE my_db USING CODESET UTF-8 TERRITORY US;
That should be it in the DB2 side. You don't need to change the standard SQL statements for that. You also don't need to worry about Java as it internally already uses Unicode.
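To illustrate the point that no N-prefix is needed from Java, here is a minimal sketch of a parameterized insert over JDBC, assuming the my_table layout from the question; the host, port, and credentials are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class Db2UnicodeInsert {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust for your environment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/my_db", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO my_table (id, string_field) VALUES (?, ?)")) {
            ps.setInt(1, 1);
            // Java strings are already Unicode; the driver handles conversion.
            ps.setString(2, "my unicode string \u00e7\u015f\u4e2d");
            ps.executeUpdate();
        }
    }
}

With a bind parameter there is no string literal in the SQL text at all, so the N-prefix question does not even arise.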

Enclosing the Unicode string constant in N'...' worked from a JDBC application against a DB2 database.

Related

Java unable to read Chinese characters from Db2 database

I am trying to read Chinese characters from a Db2 database (XDSN3T) in a Java application.
With the DB2 CLP the data is displayed correctly, and from another Delphi application the Chinese data is also correct. To obtain this I set:
- Regional and language options, Advanced, language for non-Unicode programs --> Chinese (PRC)
- environment variable DB2CODEPAGE = 1252
Only Java is not able to display the data correctly --> ÃæÁÏ¡¢¸¨ÁÏ¡¢¸½¼þ
Maybe it is something related to JDBC?
When you open the connection you can define the encoding. I am not sure whether that is available for Chinese, but here is an example (note that this URL is for MySQL):
Connection con = DriverManager.getConnection("jdbc:mysql://examplehost:8888/dbname?useUnicode=yes&characterEncoding=UTF-8","user", "pass");
As has been said, the encoding might be the issue: characters in Java are stored using the UTF-16 encoding, which has some complications of its own when it comes to encoding Chinese characters (and some emoji).
You can find the character list for UTF-16 here: https://www.fileformat.info/info/charset/UTF-16/list.htm
The issue with UTF-16 arises when a character cannot be encoded in a single 16-bit unit; such characters are encoded using two 16-bit units, which is called a surrogate pair. See: https://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#unicode
Sorry I cannot provide a complete answer, but I hope this will help.
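The garbled output in the question looks like GBK bytes decoded with a single-byte Western codepage. As a hedged diagnostic (this assumes that particular failure mode), you can round-trip the string in Java to see whether the intended Chinese text comes back:

import java.nio.charset.Charset;

public class MojibakeCheck {
    public static void main(String[] args) {
        // The garbled text as it appears in Java.
        String garbled = "ÃæÁÏ¡¢¸¨ÁÏ¡¢¸½¼þ";
        // Assumption: the driver decoded GBK bytes as windows-1252.
        // Re-encode with windows-1252 to recover the raw bytes, then decode as GBK.
        byte[] raw = garbled.getBytes(Charset.forName("windows-1252"));
        String recovered = new String(raw, Charset.forName("GBK"));
        System.out.println(recovered); // the intended Chinese text, if the assumption holds
    }
}

If the round trip produces readable Chinese, the fix belongs at the connection or DB2CODEPAGE level rather than in patching strings after the fact.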

Storing Unicode and special characters in MySQL tables

My current requirement is to store Unicode and other special characters, such as double quotes, in MySQL tables. For that purpose, as many have suggested, we should use Apache's StringEscapeUtils.escapeJava() method. The problem is, although this method does replace special characters with their respective Unicode escapes (\uxxxx), the MySQL table stores them as uxxxx and not \uxxxx. Due to this, when I try to decode the value while fetching it from the database, StringEscapeUtils.unescapeJava() fails (since it cannot find the '\').
Here are my questions:
Why is this happening (that is, why are the '\' characters dropped by the table)?
What is the solution for this?
Don't use Unicode "codepoints" (\uxxxx), use UTF8.
Don't use any special functions. Instead announce that everything is UTF-8 (utf8mb4 in MySQL).
See Best Practice
(If you are being provided \uxxxx, then you are stuck with converting to utf8 first. If your real question is on how to convert, then ask it that way.)
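To make the "don't escape, just use UTF-8" advice concrete, here is a minimal sketch with a bind parameter; the table name and DDL are hypothetical, and the column is assumed to be utf8mb4:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class Utf8mb4Insert {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema:
        // CREATE TABLE notes (id INT PRIMARY KEY, body VARCHAR(255)) CHARACTER SET utf8mb4;
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/dbname?useUnicode=yes&characterEncoding=UTF-8",
                "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO notes (id, body) VALUES (?, ?)")) {
            ps.setInt(1, 1);
            // Double quotes and supplementary characters go in as-is;
            // no StringEscapeUtils round trip is needed.
            ps.setString(2, "He said \"h\u00e9llo\" \uD83D\uDE00");
            ps.executeUpdate();
        }
    }
}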

Getting UTF-8 character issue in Oracle and Java

I have one table in Oracle which contains a column of type NVARCHAR2. In this column I am trying to save Russian characters, but it shows ¿¿¿¿¿¿¿¿¿¿¿ instead. When I try to fetch the same characters from Java, I get the same string.
I have tried NCHAR and VARCHAR2, but in all cases I get the same issue.
Is this problem in Java or Oracle? I have tried the same Java code and the same characters with PostgreSQL, and it works fine. I cannot tell whether the problem is with Oracle or Java.
In my Oracle database the NLS_NCHAR_CHARACTERSET property value is AL16UTF16.
Any idea how I can display, in Java, the UTF-8 characters saved in Oracle exactly as they are?
The problem with characters is that you cannot trust your eyes. Maybe the database stores the correct character values but your viewing tool does not understand them. Or maybe the characters get converted somewhere along the way due to language settings.
To find out what character values are stored in your table use the dump function:
Example:
select dump(mycolumn),mycolumn from mytable;
This will give you the byte values of your data and you can check whether or not the values in your table are as they should be.
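As a hedged worked example of reading that output: if an NVARCHAR2 column (AL16UTF16) really holds Привет, the first character П (U+041F) should appear in the dump as the byte pair 4,31; if the inverted question marks themselves were stored, you would see 0,191 (the UTF-16 bytes of ¿) instead.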
After doing some googling I have resolved my issue. Here is the answer: AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.
This means that while creating the database in Oracle, you should set the AL32UTF8 character set.
Here is the link for this:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/create.htm#i1008322
You may need to specify useUnicode=true&characterEncoding=UTF-8 while getting the connection to the database (note that these particular URL parameters belong to the MySQL JDBC driver; the Oracle thin driver converts based on the database character set). Also make sure the column supports UTF-8 encoding. Go through the Oracle documentation.
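One Oracle-specific detail worth checking, offered as a hedged suggestion rather than the confirmed fix: when binding into NVARCHAR2 columns, the Oracle JDBC driver can be told to treat string binds as national-character data via the oracle.jdbc.defaultNChar property. A minimal sketch, with hypothetical table and connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class NVarchar2Insert {
    public static void main(String[] args) throws Exception {
        // Treat all string binds as NCHAR/NVARCHAR2 so AL16UTF16 data survives.
        System.setProperty("oracle.jdbc.defaultNChar", "true");
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCL", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO t (name) VALUES (?)")) { // hypothetical table t(name NVARCHAR2(100))
            ps.setString(1, "Привет"); // Russian text bound as a plain Java string
            ps.executeUpdate();
        }
    }
}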

Java PreparedStatement UTF-8 character problem

I have a prepared statement:
PreparedStatement st;
and in my code I try to use the st.setString method:
st.setString(1, userName);
The value of userName is şakça. The setString method changes 'şakça' to '?akça'; it doesn't recognize UTF-8 characters. How can I solve this problem?
Thanks.
The number of ways this can get screwed up is actually quite impressive. If you're using MySQL, try adding a characterEncoding=UTF-8 parameter to the end of your JDBC connection URL:
jdbc:mysql://server/database?characterEncoding=UTF-8
You should also check that the table / column character set is UTF-8.
Whenever a database changes a character to ?, it simply means that the codepoint of the character in question is completely outside the range of the character encoding the table is configured to use.
As to the cause of the problem: the ç lies within the ISO-8859-1 range and has exactly the same codepoint as in UTF-8 (U+00E7). However, the codepoint of ş lies completely outside the range of ISO-8859-1 (U+015F, while ISO-8859-1 only goes up to U+00FF). The DB cannot persist the character and replaces it with ?.
So, I suspect that your DB table is still configured to use ISO-8859-1 (or one of the other compatible ISO-8859 encodings in which ç has the same codepoint as in UTF-8).
The Java/JDBC API is doing its job perfectly fine with regard to character encoding (Java uses Unicode all the way) and the JDBC DB connection encoding is also configured correctly. If Java/JDBC had incorrectly used ISO-8859-1, then the persisted result would have been mojibake along the lines of ÅakÃ§a (the UTF-8 ş consists of the bytes 0xC5 and 0x9F, which ISO-8859-1 reads as Å followed by an unprintable control character, and the UTF-8 ç consists of the bytes 0xC3 and 0xA7, which ISO-8859-1 reads as Ã and §).
The setString method changes 'şakça' to '?akça'
How do you know that setString changes this? Or do you see the content in the database and decide this?
It could be that the database is not configured for UTF-8, or simply that the tool you use to view the contents of the database (SQL*Plus for Oracle...) is not capable of displaying UTF-8.
You can use a query like the one below to set Unicode strings in a prepared statement (the unistr function is Oracle-specific):
PreparedStatement st = conn.prepareStatement("select * from users where username = unistr(?)");
st.setString(1, userName);

Are escape sequences preserved in CLOB?

We are using Java and Oracle for development.
I have a table in an Oracle database which has a CLOB column. Some XYZ application dumps a text file into this column. The text file has multiple lines.
Is it possible that while reading the same CLOB through a Java application, the escape sequences (newline characters, etc.) may get lost?
The reason I ask is that we are going to parse this file line by line, and if the escape sequences are lost, we would be in trouble. I would have done this analysis myself, but I am on vacation and my team needs urgent help.
I would really appreciate any thoughts/inputs.
You need to ensure that you use the one correct and same character encoding throughout the whole process. I strongly recommend you pick UTF-8 for that. It covers every human character known to the world. Every step which involves handling of character data should be instructed to use the very same encoding.
In the SQL context, ensure that the DB and table are created with the UTF-8 charset. In the JDBC context, ensure that the JDBC driver is using UTF-8; this is often configurable via the JDBC connection string. In the Java code context, ensure that you're using UTF-8 when reading/writing character data from/to streams; you can specify it as the second constructor argument of InputStreamReader and OutputStreamWriter.
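As a small illustration of that last point, here is a sketch of reading a text file with an explicit UTF-8 charset instead of the platform default; the file name is hypothetical:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8FileRead {
    public static void main(String[] args) throws Exception {
        // The charset is passed explicitly as the second constructor argument.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("dump.txt"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}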
A CLOB stores character data. Carriage returns and line feeds are valid characters, though unprintable ones. As long as your XYZ app is correctly filling your CLOBs, the contents should be just as manageable to you as if they had come from the file.
Depending on the platform and the nature of said "XYZ app," lines could be separated by \r (Mac), \r\n (DOS/Windows) or \n (Unix/Linux), and you should make allowance for this fact if necessary. This is one aspect where BufferedReader.readLine() is more convenient, as it transparently takes care of this difference for you.
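Putting that together for the question's use case, here is a hedged sketch of reading a CLOB line by line; the table and column names are made up for illustration:

import java.io.BufferedReader;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ClobLineReader {
    static void printLines(Connection con) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                 "SELECT file_content FROM uploads WHERE id = ?")) { // hypothetical table/column
            ps.setInt(1, 42);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Clob clob = rs.getClob(1);
                    // readLine() strips \n, \r, and \r\n uniformly, so the
                    // platform of the producing app no longer matters.
                    try (BufferedReader reader = new BufferedReader(clob.getCharacterStream())) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            System.out.println(line);
                        }
                    }
                }
            }
        }
    }
}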
I'm not 100% sure what you mean by escape sequences in this context. Within a (for example) Java literal string, "\n" is an escape sequence representing a newline, but once that string is outputted into something (say, a database), it's not an escape sequence any more, it's an actual newline character.
Anyhow, to your direct question: Java can read text from Oracle CLOBs perfectly fine. Newlines are not lost.
