Java unable to read Chinese characters from Db2 database

I am trying to read Chinese characters from a Db2 database (XDSN3T) in a Java application.
With the DB2 CLP the data are displayed correctly, and another Delphi application also shows the Chinese data correctly. To obtain this I set:
- Regional and language options > Advanced > Language for non-Unicode programs: Chinese (PRC)
- environment variable DB2CODEPAGE = 1252
Only Java is not able to display the data correctly --> ÃæÁÏ¡¢¸¨ÁÏ¡¢¸½¼þ
Maybe something related to JDBC?

When you open the connection you can define the encoding. I am not sure whether this is available for Chinese, but here is an example (for MySQL's JDBC driver):
Connection con = DriverManager.getConnection("jdbc:mysql://examplehost:8888/dbname?useUnicode=yes&characterEncoding=UTF-8", "user", "pass");
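Even without a driver-level setting, you can diagnose the problem in code. The garbled string above is the classic pattern of GBK bytes decoded as windows-1252, so the following sketch (an assumption based on that pattern, not a confirmed fix) recovers the original text by reversing the wrong decoding:

import java.nio.charset.Charset;

public class MojibakeRepair {
    public static void main(String[] args) throws Exception {
        // The garbled value as it arrives in Java
        String garbled = "ÃæÁÏ¡¢¸¨ÁÏ¡¢¸½¼þ";
        // Re-encode with the charset that was wrongly used to decode,
        // which recovers the raw bytes the driver received...
        byte[] raw = garbled.getBytes("windows-1252");
        // ...then decode those bytes with the charset the data is actually in.
        String repaired = new String(raw, Charset.forName("GBK"));
        System.out.println(repaired); // 面料、辅料、附件
    }
}

If this round trip produces readable Chinese, the real fix belongs in the client configuration (the DB2CODEPAGE / client code page), not in the application code.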

As has been said, the encoding might be the issue. Characters in Java are stored using the UTF-16 encoding, which has some subtleties of its own for Chinese characters (and some emoji).
You can find the character list for UTF-16 here: https://www.fileformat.info/info/charset/UTF-16/list.htm
The issue with UTF-16 arises when a character cannot be encoded in a single 16-bit unit; such characters are encoded using two 16-bit units, called a surrogate pair. See: https://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#unicode
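For example (a small self-contained demonstration; U+20BB7 is a CJK character outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it):

public class SurrogateDemo {
    public static void main(String[] args) {
        // U+20BB7 (𠮷) is stored as the surrogate pair D842 DFB7.
        String s = "\uD842\uDFB7";
        System.out.println(s.length());                      // 2 char units
        System.out.println(s.codePointCount(0, s.length())); // 1 actual character
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}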
Sorry I cannot provide a complete answer, but I hope this will help.

Related

Store Arabic in String and insert it into database using Java

I am trying to pass an Arabic String into a function that stores it in a database, but the String's characters are converted into '?'.
For example:
String str = new String();
str = "عشب";
System.out.print(str);
The output will be:
"???"
and it is stored like this in the database.
If I insert into the database directly, it works well.
Make sure your character encoding is UTF-8. The snippet you showed works as expected when it is; for example, if your source files are encoded as windows-1252, it won't work.
The problem is that System.out is a PrintStream, which converts the Arabic string into bytes using the platform default encoding; that encoding presumably cannot handle the Arabic characters. Try:
System.out.write(str.getBytes("UTF-8"));
System.out.println();
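Alternatively (a sketch, not part of the original answer), wrap System.out once in a UTF-8 PrintStream instead of converting at every call site:

import java.io.PrintStream;

public class Utf8Out {
    public static void main(String[] args) throws Exception {
        // Replace the default stdout with one that encodes as UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("عشب"); // displays correctly if the console itself expects UTF-8
    }
}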
Many modern operating systems use UTF-8 as the default encoding, which supports non-Latin characters correctly. Windows is not one of those; ANSI code pages are the default in Western installations (I have not used Windows recently, so that may have changed). Either way, you should probably force the default character encoding for the Java process, irrespective of the platform.
As described in another Stack Overflow question (see Setting the default Java character encoding?), you can change the default for the Java process as follows:
java -Dfile.encoding=UTF-8
Additionally, since you are running in an IDE, you may need to tell it to display the output in the indicated charset or risk corruption; the exact instructions will depend on your IDE.
One other thing: if you are reading or writing text files, always specify the expected character encoding, otherwise you risk falling back to the platform default.
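For instance (a minimal sketch of that last point; the file name is made up):

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadUtf8File {
    public static void main(String[] args) throws Exception {
        // Name the charset explicitly instead of relying on the platform
        // default, which differs between Windows and Linux.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("input.txt"), StandardCharsets.UTF_8)) {
            reader.lines().forEach(System.out::println);
        }
    }
}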
You need to set the character set to UTF-8 for this.
At the Java level you can do:
ByteBuffer encoded = Charset.forName("UTF-8").encode(myString);
If you want to do so at the IDE level (the path below is Eclipse's), go to:
Window > Preferences > General > Content Types, and set UTF-8 as the default encoding for all content types.

Java code page table

We transfer data from a mainframe to Linux/Windows servers using file-transfer software. To decrease the transfer time, we are using the ZipDataset class from the JZOS API Toolkit. There is an option to convert the data beforehand, from EBCDIC to ASCII, and we can use all the encodings available in the Java environment:
https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
The default in the JZOS toolkit is:
public static final String DEFAULT_TARGET_ENCODING = "ISO8859-1";
I need to find the documentation for the EBCDIC-to-ASCII character conversion, and how it is possible to create a custom conversion table.
For example, for ISO8859-1, the EBCDIC character 'A' is x'C1' and the related ASCII character is x'41'.
Where can I find the corresponding documentation for the other code page encodings?
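While this is not official documentation, one way to inspect the exact mapping a given Java charset implements is to decode every byte value yourself. A small sketch (Cp1047 is one of the EBCDIC charsets Java ships; substitute the code page that matches your system, e.g. Cp037 or Cp500):

import java.nio.charset.Charset;

public class EbcdicMapDump {
    public static void main(String[] args) {
        // Cp1047 = IBM EBCDIC Latin-1; swap in Cp037, Cp500, etc. as needed.
        Charset ebcdic = Charset.forName("Cp1047");
        for (int b = 0; b < 256; b++) {
            char c = new String(new byte[] {(byte) b}, ebcdic).charAt(0);
            // e.g. prints: EBCDIC x'C1' -> U+0041 A  (ASCII x'41')
            System.out.printf("EBCDIC x'%02X' -> U+%04X %s%n",
                    b, (int) c,
                    Character.isISOControl(c) ? "(control)" : String.valueOf(c));
        }
    }
}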

Getting UTF-8 character issue in Oracle and Java

I have a table in Oracle with a column of type NVARCHAR2. In this column I am trying to save Russian characters, but it shows ¿¿¿¿¿¿¿¿¿¿¿ instead, and when I fetch the same characters from Java I get the same string.
I have tried NCHAR and VARCHAR2, but in all cases the same issue occurs.
Is this problem from Java or Oracle? I have tried the same Java code and the same characters with PostgreSQL, and it works fine, so I cannot tell whether the problem is in Oracle or Java.
In my oracle database NLS_NCHAR_CHARACTERSET property value is AL16UTF16.
Any idea how I can retrieve the UTF-8 characters saved in Oracle exactly as they were, from Java?
The problem with characters is that you cannot trust your eyes. Maybe the database stores the correct character values but your viewing tool does not understand them; or maybe the characters get converted somewhere along the way due to language settings.
To find out what character values are stored in your table use the dump function:
Example:
select dump(mycolumn),mycolumn from mytable;
This will give you the byte values of your data and you can check whether or not the values in your table are as they should be.
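If the stored values turn out to be wrong, a common cause with NVARCHAR2 columns is binding plain strings that get converted through the database character set first. Here is a sketch (connection details, table, and column names are made up for illustration) of writing through the JDBC national-character APIs:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;

public class NcharInsert {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "myuser");
        props.setProperty("password", "mypassword");
        // Ask the Oracle driver to treat bound strings as national-character data.
        props.setProperty("oracle.jdbc.defaultNChar", "true");

        try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", props);
             PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO mytable (mycolumn) VALUES (?)")) {
            // setNString (JDBC 4.0) binds the value as NVARCHAR2.
            ps.setNString(1, "русский текст");
            ps.executeUpdate();
        }
    }
}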
After some googling I have resolved my issue. Here is the answer: AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.
It means that when creating a database in Oracle, you should set the AL32UTF8 character set.
Here is the link:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/create.htm#i1008322
You need to specify useUnicode=true&characterEncoding=UTF-8 when getting the connection from the database (note that these URL parameters are for MySQL's JDBC driver; Oracle's driver does not use them). Also make sure the column supports UTF-8 encoding; go through the Oracle documentation.

Turkish character while writing to database (postgresql)

I am working with Java and PostgreSQL on Windows. I have some words which include Turkish characters such as İ, ş, ö, ç, etc.
In Java I assign the words to a String and try to write them to the database. When I print them in Java, the encoding appears correct and all characters display correctly. However, when writing them to the database, the text gets mangled/scrambled.
I created my database with this command:
CREATE DATABASE dbname ENCODING "UTF-8"
I tried to fix it by converting the Turkish characters to Unicode escapes (İ -> \u0130, ş -> \u015F) and re-decoding:
// \u0130leti\u015Fim = İletişim
String title = "\u0130leti\u015Fim";
String mytitle = new String(title.getBytes("ISO-8859-1"), "UTF-8");
Then I tried to write mytitle to the database, but it did not work.
Thanks for your advice.
SOLVED: I realized that the Turkish characters were being written to the database correctly; the problem was in the response. I added these lines before writing to the response:
String contentType= "text/html;charset=UTF-8";
response.setContentType(contentType);
response.setCharacterEncoding("utf-8");
After adding this, it works now. I hope I explained it clearly.
When you call title.getBytes("ISO-8859-1"), you're promising the Java runtime that the characters in the string can be represented as ISO-8859-1 bytes, which is not actually true for either \u0130 or \u015f.
Therefore the conversion to bytes already does something unspecified with your Turkish characters; in practice they are replaced with '?' (0x3F).
Next, attempting to interpret whichever bytes you get out of it as UTF-8 even though they're really ISO-8859-1 is then guaranteed to make a complete mess of everything that wasn't ASCII to begin with.
(The repertoire of ISO-8859-1 happens to coincide exactly with the Unicode characters that can be written as \u00XX for some XX.)
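A quick way to see both failure modes (a self-contained sketch):

import java.util.Arrays;

public class EncodingMixupDemo {
    public static void main(String[] args) throws Exception {
        String title = "\u0130leti\u015Fim"; // İletişim

        // İ and ş are not in ISO-8859-1; on common JVMs they come back as '?' (0x3F).
        byte[] latin1 = title.getBytes("ISO-8859-1");
        System.out.println(Arrays.toString(latin1)); // [63, 108, 101, 116, 105, 63, 105, 109]

        // Re-decoding those bytes as UTF-8 cannot restore what was lost.
        System.out.println(new String(latin1, "UTF-8")); // ?leti?im
    }
}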
With encoding issues you have several things to check:
Whether your source file is in the encoding you expect it to be.
How client_encoding is set
What the database encoding is
In the case of Java, PgJDBC requires client_encoding to always be UTF-8 and will choke if you set it to something else, so that's not going to be the issue. You've shown that your database is UTF-8 too. So it seems likely that your Java sources aren't in the same encoding the Java compiler and runtime expect them to be in.
By default javac will interpret your source code in the platform default encoding. If you've saved your sources in a different encoding, weird things will happen. Save your sources either:
in the default encoding for your Windows platform;
as Unicode ("UTF-16" or "UCS-2"); or
as UTF-8 with a Byte Order Mark (BOM); note that many programs don't add a BOM for UTF-8.
Then recompile your program. If that doesn't help, you'll need to follow up with more detail, starting with what exactly "it did not work" means, output of SELECTing the data you inserted with Java using psql, etc.
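You can also tell the compiler explicitly which encoding the sources are in, instead of relying on the platform default (MyApp.java is a placeholder for your own source file):
javac -encoding UTF-8 MyApp.java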
You should create the database like this:
CREATE DATABASE <db name>
WITH OWNER <owner user name>
TEMPLATE template0
ENCODING 'UTF-8'
LC_COLLATE 'tr_TR.UTF-8'
LC_CTYPE 'tr_TR.UTF-8';

Java convert ISO-8859-1 to UTF-8 with correct unicode characters

I have some ISO-8859-1 text that I have tried to convert to UTF-8 but end up with some characters that are not mapped correctly.
I have been using a plethora of the standard built-in Java charset conversions, which are pretty much all based on Charset.decode and the built-in CharsetDecoder.
This leads to two problems:
I have some characters that look fine in ISO-8859-1 but come out garbled in Java, since I output in UTF-8 as do most Java apps.
I cannot insert into MySQL even though it's set to UTF-8.
For MySQL I get the exception:
Caused by: java.sql.SQLException: Incorrect string value: '\xC2\x9Esk\xC3\xA9...' for column 'b' at row 1
Is there a Java iconv, or a better character decoder/mapper than what's built in?
Are you certain that you have ISO-8859-1? You might have some windows-1252, which is sort of close except in the 0x80-0x9F range. That \x9E raises the suspicion with me: 0x9E is a control character in ISO-8859-1, but ž in windows-1252.
Try labeling your source as windows-1252 and it should convert correctly.
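A minimal sketch of that relabeling ("windows-1252" is the charset name Java uses for Win-1252; the byte values are taken from the exception above):

import java.nio.charset.StandardCharsets;

public class Win1252ToUtf8 {
    public static void main(String[] args) throws Exception {
        // Bytes assumed to be ISO-8859-1; 0x9E is ž in windows-1252
        // but an invisible control character in ISO-8859-1.
        byte[] raw = {(byte) 0x9E, 's', 'k', (byte) 0xE9}; // the "žské" fragment

        String wrong = new String(raw, StandardCharsets.ISO_8859_1);
        String right = new String(raw, "windows-1252");

        System.out.println(wrong); // starts with the invisible U+009E control char
        System.out.println(right); // žské -> now safe to re-encode as UTF-8
        byte[] utf8 = right.getBytes(StandardCharsets.UTF_8);
    }
}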
