Getting UTF-8 character issue in Oracle and Java


I have a table in Oracle with a column of type NVARCHAR2. I am trying to save Russian characters in this column, but they show up as ¿¿¿¿¿¿¿¿¿¿¿. When I fetch the same value from Java, I get the same string.
I have also tried NCHAR and VARCHAR2, but the issue is the same in every case.
Is this a Java problem or an Oracle problem? I have tried the same Java code with the same characters against PostgreSQL, and it works fine, so I cannot tell whether the problem lies with Oracle or with Java.
In my Oracle database, the NLS_NCHAR_CHARACTERSET property is AL16UTF16.
Any idea how I can get the UTF-8 characters stored in Oracle back into Java unchanged?

The problem with characters is that you cannot trust your eyes. Maybe the database stores the correct character values but your viewing tool does not understand them. Or maybe the characters get converted somewhere along the way because of language (NLS) settings.
To find out which character values are actually stored in your table, use the DUMP function, for example:
select dump(mycolumn),mycolumn from mytable;
This gives you the byte values of your data, so you can check whether the values in your table are what they should be.
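To compare what DUMP reports with what correct data would look like, a small Java sketch can print the expected bytes of a test string. It assumes the column is NVARCHAR2 with the AL16UTF16 national character set from the question, which stores UTF-16 big-endian; DUMP(mycolumn, 16) prints the bytes in hex, and DUMP(mycolumn, 1016) also names the character set. The class name HexDump and the sample word are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): prints the bytes a correctly stored
// string would have, so they can be compared with the output of DUMP().
final class HexDump {

    // Comma-separated lowercase hex without leading zeros, e.g. "4,1f".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(Integer.toHexString(bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default sample is "Привет", written with escapes to avoid source-encoding issues.
        String text = args.length > 0 ? args[0] : "\u041f\u0440\u0438\u0432\u0435\u0442";
        // AL16UTF16 (the NLS_NCHAR_CHARACTERSET in the question) is UTF-16 big-endian.
        System.out.println("UTF-16BE (AL16UTF16): " + hex(text.getBytes(StandardCharsets.UTF_16BE)));
        // AL32UTF8 is standard UTF-8.
        System.out.println("UTF-8    (AL32UTF8):  " + hex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If DUMP shows bf (or 0,bf in an AL16UTF16 column), which is the code of ¿, where you expected pairs such as 4,1f, the characters were most likely replaced by a lossy conversion before they reached the table, rather than on the way back to Java.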

After some googling I have resolved my issue. Here is the answer: "AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters."
In other words, set the AL32UTF8 character set when creating the database in Oracle.
Here is the link:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/create.htm#i1008322
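Before recreating a database, it may help to confirm which character sets the existing one uses. A minimal JDBC sketch, assuming the Oracle JDBC driver is on the classpath and the account can read the NLS_DATABASE_PARAMETERS view; the class name and the list of Unicode character set names in isUnicode are my own choices for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: reports the database and national character sets.
final class NlsCheck {
    // Oracle character sets able to hold any Unicode character
    // (illustrative list; UTF8 is Oracle's older, CESU-8 style set).
    private static final Set<String> UNICODE_SETS = Set.of("AL32UTF8", "UTF8", "AL16UTF16");

    static boolean isUnicode(String oracleCharset) {
        return oracleCharset != null
                && UNICODE_SETS.contains(oracleCharset.trim().toUpperCase(Locale.ROOT));
    }

    static void printCharsets(Connection conn) throws SQLException {
        String sql = "SELECT parameter, value FROM nls_database_parameters"
                + " WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')";
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                String value = rs.getString(2);
                System.out.printf("%s = %s (Unicode: %b)%n", rs.getString(1), value, isUnicode(value));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: NlsCheck <jdbc-url> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            printCharsets(conn);
        }
    }
}
```

If NLS_CHARACTERSET is a single-byte set such as WE8ISO8859P1, any Russian text that passes through the database character set (for example as a literal in the SQL text or in a VARCHAR2 column) cannot be represented.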

You need to specify useUnicode=true&characterEncoding=UTF-8 when getting the connection to the database. Also make sure the column supports UTF-8 encoding; see the Oracle documentation.
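The useUnicode and characterEncoding URL parameters are, as far as I know, options of MySQL Connector/J; the Oracle JDBC driver converts Java's Unicode strings itself. For an NVARCHAR2 column, one commonly used approach is to bind the value with the standard JDBC 4.0 methods PreparedStatement.setNString and ResultSet.getNString, as in the minimal sketch below; Oracle's driver also documents an oracle.jdbc.defaultNChar=true system property that treats all String binds as national-character data. The table mytable(id, mycolumn) is hypothetical, and the sketch needs the Oracle JDBC driver on the classpath to actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical table: CREATE TABLE mytable (id NUMBER PRIMARY KEY, mycolumn NVARCHAR2(100))
final class NCharRoundTrip {
    static final String INSERT_SQL = "INSERT INTO mytable (id, mycolumn) VALUES (?, ?)";
    static final String SELECT_SQL = "SELECT mycolumn FROM mytable WHERE id = ?";

    static String roundTrip(Connection conn, int id, String text) throws SQLException {
        try (PreparedStatement ins = conn.prepareStatement(INSERT_SQL)) {
            ins.setInt(1, id);
            // setNString marks the bind as national-character (NCHAR) data
            // instead of embedding the text as a literal in the SQL string.
            ins.setNString(2, text);
            ins.executeUpdate();
        }
        try (PreparedStatement sel = conn.prepareStatement(SELECT_SQL)) {
            sel.setInt(1, id);
            try (ResultSet rs = sel.executeQuery()) {
                return rs.next() ? rs.getNString(1) : null;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: NCharRoundTrip <jdbc-url> <user> <password>");
            return;
        }
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            String back = roundTrip(conn, 1, russian);
            System.out.println(russian.equals(back) ? "round trip OK" : "mismatch: " + back);
        }
    }
}
```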

Related

Storing Unicode and special characters in MySQL tables

My current requirement is to store Unicode and other special characters, such as double quotes, in MySQL tables. For that purpose, as many have suggested, we should use Apache's StringEscapeUtils.escapeJava() method. The problem is that although this method does replace special characters with their Unicode escape sequences (\uxxxx), the MySQL table stores them as uxxxx and not \uxxxx. Because of this, when I try to decode the value after fetching it from the database, StringEscapeUtils.unescapeJava() fails (since it cannot find the '\').
Here are my questions:
Why is this happening (that is, why is the '\' dropped by the table)?
What is the solution?

Don't use Unicode "code point" escapes (\uxxxx); use UTF-8.
Don't use any special functions. Instead, declare that everything is UTF-8 (utf8mb4 in MySQL).
See Best Practice
(If you are being given \uxxxx, then you are stuck with converting to UTF-8 first. If your real question is how to do that conversion, then ask it that way.)
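The '\' disappears because MySQL treats the backslash as an escape character inside string literals (unless the NO_BACKSLASH_ESCAPES SQL mode is set), and an unrecognized sequence such as \u is read as a plain u. Binding the raw text with a PreparedStatement avoids both the escaping and the lost backslash. A minimal sketch, assuming a hypothetical utf8mb4 table notes(id, body) and a Connector/J URL with characterEncoding=UTF-8:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical table:
//   CREATE TABLE notes (id INT PRIMARY KEY, body TEXT)
//     CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
final class Utf8mb4Insert {
    static final String INSERT_SQL = "INSERT INTO notes (id, body) VALUES (?, ?)";

    // Stores the text unchanged: no escapeJava(), no hand-built quoting.
    static void save(Connection conn, int id, String body) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, id);
            ps.setString(2, body); // the driver takes care of quotes and backslashes
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: Utf8mb4Insert <jdbc-url> <user> <password>");
            System.out.println("e.g. jdbc:mysql://localhost:3306/mydb?characterEncoding=UTF-8");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            // Double quotes, a backslash and a Cyrillic letter, stored as-is.
            save(conn, 1, "He said \"hi\" \\ \u0444");
        }
    }
}
```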

How are hyphen, en-dash and em-dash stored in an Oracle database with character set WE8ISO8859P1?

I have a Java web application that uses an Oracle database. The character set of my database is WE8ISO8859P1. The problem I am facing is that characters like en-dash and em-dash get stored as an inverted question mark (¿) or some other weird symbol. The same happens when retrieving and displaying the data.
What workaround is there for this problem?

You cannot store “en-dash” (–) and “em-dash” (—) in an Oracle database with character set WE8ISO8859P1, since these characters are not defined in that character set.
The best option would be to convert the database to character set WE8MSWIN1252. You can do that without changing any of the data in the database, since WE8MSWIN1252 is a superset of WE8ISO8859P1. WE8MSWIN1252 contains “en-dash” (code 0x96) and “em-dash” (code 0x97).
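The answer's point can be checked from Java, whose ISO-8859-1 and windows-1252 charsets implement the same encodings as Oracle's WE8ISO8859P1 and WE8MSWIN1252 (treating them as equivalent is an assumption for illustration). The sketch asks each charset whether it can encode a hyphen, an en-dash and an em-dash, and which byte it uses; DashCheck and byteFor are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DashCheck {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Returns the single byte the charset uses for c, or -1 if c is not in the charset.
    static int byteFor(Charset cs, char c) {
        if (!cs.newEncoder().canEncode(c)) {
            return -1;
        }
        return String.valueOf(c).getBytes(cs)[0] & 0xFF;
    }

    static String describe(int b) {
        return b < 0 ? "not encodable" : String.format("0x%02X", b);
    }

    public static void main(String[] args) {
        // hyphen-minus, en-dash, em-dash
        char[] dashes = {'-', '\u2013', '\u2014'};
        for (char c : dashes) {
            System.out.printf("U+%04X  ISO-8859-1: %-14s windows-1252: %s%n",
                    (int) c,
                    describe(byteFor(StandardCharsets.ISO_8859_1, c)),
                    describe(byteFor(WINDOWS_1252, c)));
        }
    }
}
```

The hyphen is 0x2D in both, while the en-dash and em-dash are not encodable in ISO-8859-1 but map to 0x96 and 0x97 in windows-1252, which matches the answer.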

Hibernate: is there any way to save an object with UTF-8 characters to a table with latin1 encoding?

Currently my Spring application connects to a MySQL database with the characterEncoding=UTF-8 connection setting, but several tables use latin1 encoding. I don't really need UTF-8 characters to be stored in the database as-is; saving '?????' instead of them is acceptable for me. But right now I'm getting this error:
Caused by: java.sql.SQLException: Incorrect string value: '\xD1\x84\xD1\x8B\xD0\xB2...' for colum
Is there any way to save question marks for these tables instead of throwing an error? (Converting the tables to UTF-8 is not an acceptable option in my situation.)

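You could either replace all non-Latin-1 characters with '?' using
value.replaceAll("[^\\x00-\\xFF]", "?")
(the pattern [^\\x00-\\x7F] would keep only ASCII), or you could simply encode and re-decode with StandardCharsets.ISO_8859_1 (or StandardCharsets.US_ASCII), which substitutes '?' for characters the charset cannot represent.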
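The two approaches above can be sketched in Java, adjusted to keep everything Latin-1 can hold and turn the rest into '?' before Hibernate writes the value; the class and method names are hypothetical. Note that MySQL documents its latin1 as actually being cp1252, so Charset.forName("windows-1252") may be the closer match if characters such as € must survive.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for values headed to latin1 columns.
final class Latin1Filter {

    // Replaces every character outside U+0000..U+00FF with '?'.
    static String viaRegex(String value) {
        return value.replaceAll("[^\\x00-\\xFF]", "?");
    }

    // Encodes to ISO-8859-1 and decodes again; String.getBytes(Charset) substitutes
    // the charset's replacement byte ('?') for unmappable characters.
    static String viaCharset(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String input = "caf\u00e9 \u0444\u044b\u0432"; // "café фыв"
        System.out.println(viaRegex(input));
        System.out.println(viaCharset(input));
    }
}
```

Both methods keep the é and turn the three Cyrillic letters into "???"; the charset round trip avoids regex overhead and is easier to swap for another target encoding.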

Extra spaces in the end while retrieving CHAR values

I am using iBATIS and Oracle 10. The JDBC driver is oracle.jdbc.driver.OracleDriver. When I retrieve data from a table, I find that two extra spaces are appended. For example, for a column ACTIVE_IND CHAR(1), the data retrieved is 'A  '.
Please note that this happens for all CHAR fields, and the number of extra spaces is always twice the length of the CHAR column. For example, for a CHAR(14) column there are 28 extra spaces at the end.
This happens only in the System Testing environment. On our local desktops, using the same ojdbc14.jar and the same code, we do not get any extra spaces.
I think the only difference in the System Testing environment is the database. Is this related to character encoding? Is there some database configuration we can change?

It sounds very much like a character encoding issue. Have you checked the configuration of the Oracle database in each environment, and which character encoding your application runs under in each environment? You can configure the latter with -Dfile.encoding=UTF8 or similar, which I would strongly recommend.
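To compare the Java side of the two environments, a small sketch can print the JVM's encoding settings; run it (or log the same values from the application) in both environments and compare the output. The class name is hypothetical. On JDK 18 and later the default charset is UTF-8 regardless of platform (JEP 400), so differences here are more likely on older JVMs.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic: the JVM settings that decide how bytes and Strings
// are converted when no charset is given explicitly.
final class EncodingReport {

    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", java.version=" + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

For example, `java -Dfile.encoding=UTF-8 EncodingReport` shows the effect of the flag suggested above.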

It cannot be determined exactly from the information you gave what went wrong. As far as the encoding of a column is concerned, it is affected by the collation setting. You can check the following link for more information: http://www.lessanvaezi.com/collation-in-oracle-using-nls_sort/

DB2 database using Unicode

I have a problem with DB2 databases that should store Unicode characters. The connection is established using JDBC.
What do I have to do if I want to insert a Unicode string into the database?
INSERT INTO my_table(id, string_field) VALUES(1, N'my unicode string');
or
INSERT INTO my_table(id, string_field) VALUES(1, 'my unicode string');
I don't know whether I have to use the N prefix or not. For most databases out there it works well, but I am not sure about DB2. I also don't have a DB2 database at hand where I could test these statements. :-(
Thanks a lot!

The documentation on constants (as of DB2 9.7) says this about graphic strings:
A graphic string constant specifies a varying-length graphic string consisting of a sequence of double-byte characters that starts and ends with a single-byte apostrophe ('), and that is preceded by a single-byte G or N. The characters between the apostrophes must represent an even number of bytes, and the length of the graphic string must not exceed 16 336 bytes.
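If you do build the literal by hand, the documented rules can be checked before sending the statement. A minimal sketch, assuming that in a Unicode DB2 database graphic data is UTF-16, so each character counts as two bytes (four for characters outside the BMP); the class and method names are hypothetical. Binding the value with PreparedStatement.setString avoids the literal, and this question, altogether.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that applies the documented graphic-constant rules.
final class Db2GraphicLiteral {
    // Limit quoted from the DB2 9.7 documentation on graphic string constants.
    static final int MAX_BYTES = 16336;

    // Builds N'...' with embedded apostrophes doubled, rejecting text whose
    // double-byte (UTF-16, assumed) form exceeds the documented limit.
    static String toLiteral(String text) {
        int bytes = text.getBytes(StandardCharsets.UTF_16BE).length; // always even
        if (bytes > MAX_BYTES) {
            throw new IllegalArgumentException(
                    "graphic constant is " + bytes + " bytes; limit is " + MAX_BYTES);
        }
        return "N'" + text.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(toLiteral("it's \u0444"));
    }
}
```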

I have never heard of this in the context of DB2. Google tells me that this is more MS SQL Server specific. In DB2 and every other decent RDBMS you only need to ensure that the database uses the UTF-8 charset. You normally specify that in the CREATE statement. Here's the DB2 variant:
CREATE DATABASE my_db USING CODESET UTF-8;
That should be it on the DB2 side. You don't need to change your standard SQL statements for that. You also don't need to worry about Java, as it already uses Unicode internally.

Enclosing the Unicode string constant in N'' worked through a JDBC application for a DB2 database.
