CSV to H2 - character encoding mismatch - java

In my app, I:
1. let Hibernate create the H2 DB
2. populate the DB through a JDBC SQL statement with a CSV import (INSERT INTO ... SELECT ... FROM CSVREAD(file.csv)). The file is UTF-8 encoded.
On Linux special characters in the DB are correct.
On Windows (default encoding cp1250) special characters are incorrect.
When I try a different CSV file encoding (cp1250, iso-8859-2), it works on Windows, but not on Linux.
Is there any way to tell H2 it needs to respect UTF-8 encoding on Windows?

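Without an explicit charset, CSVREAD falls back to the platform default encoding, which is why the result differs between Linux and Windows. UTF-8 needs to be set in the options parameter of the CSVREAD function, as follows: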
CSVREAD('file.csv', null, 'charset=UTF-8')
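For the JDBC side, a minimal sketch of the import statement could look like the following. The class name CsvImport, its method names, and the use of SELECT * (which assumes the CSV columns line up with the table columns) are illustrative assumptions, not part of the question.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; names are illustrative only.
final class CsvImport {

    // Builds an INSERT ... SELECT ... FROM CSVREAD(...) statement that
    // names the charset explicitly instead of relying on the platform default.
    static String buildImportSql(String table, String csvPath, String charset) {
        // Double any single quote so the path stays a valid SQL string literal.
        String quotedPath = csvPath.replace("'", "''");
        return "INSERT INTO " + table
                + " SELECT * FROM CSVREAD('" + quotedPath + "', null, 'charset=" + charset + "')";
    }

    // Runs the import over an already opened H2 connection.
    static void importCsv(Connection connection, String table, String csvPath) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(buildImportSql(table, csvPath, "UTF-8"));
        }
    }
}
```

Because the charset is part of the SQL text, the same statement should behave identically on Linux and Windows.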

Related

Retrieving data using JDBC in utf-8 from a mysql server where charset is set to latin1

The schema and tables all have a charset and collation of Latin1, but if I try to retrieve Chinese characters from a table, I only get ?????. I don't have access to change the schema or table properties. How can I convert the charset to the actual characters in JDBC?

Exporting HSQLDB database with UTF-8 encoding

I'm trying to export the GeoTools HSQL 2 database and load it back into HSQL 1 for a legacy system that needs the older database format. The tables include characters like the degree symbol. However, it's coming out as the escape sequence \u0080 rather than the encoded character. I need to either fix that or have the HSQL 1 import convert the escaped characters back into the correct encoding.
e.g.
cp modules/plugin/epsg-hsql/src/main/resources/org/geotools/referencing/factory/epsg/EPSG.zip /tmp
cd /tmp
unzip EPSG.zip
java -jar hsqldb-2.4.1.jar
# For the file, put jdbc:hsqldb:file:/tmp/EPSG
SELECT 'epsg-dump'
And in the results I see things like this \u00b5:
INSERT INTO EPSG_ALIAS VALUES(389,'epsg_unitofmeasure',9109,7302,'\u00b5rad','')
Looking into hsqldb, I'm not sure how to control the encoding of the data being written, assuming that this is the correct location to look:
https://github.com/ryenus/hsqldb/blob/master/src/org/hsqldb/scriptio/ScriptWriterText.java
You can use the following procedure:
1. In the source database, create TEXT tables with exactly the same columns as the original tables. Use CREATE TEXT TABLE thecopyname (LIKE thesourcename) for each table.
2. Use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8' for each of the copy tables.
3. INSERT into each thecopyname table with SELECT * FROM thesourcename.
4. Use SET TABLE thecopyname SOURCE OFF for each thecopyname table.
You will now have several thecopyname.csv files (each with its own name) in UTF-8 encoding.
Use the reverse procedure on the target database. You need to explicitly create the TEXT tables, then use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8'
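The export steps above can be generated per table with a small helper. This is only a sketch: the class TextTableExport and its method are hypothetical, and the statements are simply the ones quoted in the procedure with the table names filled in.

```java
import java.util.List;

// Hypothetical helper; it only assembles the statements from the procedure above.
final class TextTableExport {

    // Returns, in order, the statements that copy one source table into a
    // UTF-8 encoded CSV file through a TEXT table.
    static List<String> exportStatements(String sourceTable, String copyTable) {
        return List.of(
                "CREATE TEXT TABLE " + copyTable + " (LIKE " + sourceTable + ")",
                "SET TABLE " + copyTable + " SOURCE '" + copyTable + ".csv;encoding=UTF-8'",
                "INSERT INTO " + copyTable + " SELECT * FROM " + sourceTable,
                "SET TABLE " + copyTable + " SOURCE OFF");
    }
}
```

Each returned string can be passed to java.sql.Statement.execute on a connection to the source database, for example with EPSG_ALIAS as the source table and a copy name of your choosing.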
The escapes look like Unicode escape sequences (\u followed by one to four hex digits).
Try this in bash (quick & dirty):
echo -ne "$(< dump.sql)" > dump_utf8.sql
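If bash is not available, roughly the same conversion can be sketched in Java. The class name UnicodeUnescape is made up for illustration, and this version assumes every escape has exactly four hex digits, as in the \u00b5 example from the question. Unlike echo -e, it leaves other backslash sequences untouched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes four-hex-digit escapes only.
final class UnicodeUnescape {

    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each escape with the character it denotes.
    static String unescape(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            out.append(text, last, matcher.start());
            out.append((char) Integer.parseInt(matcher.group(1), 16));
            last = matcher.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    // Reads the dump, unescapes it, and writes the result as UTF-8 (Java 11+).
    static void convert(Path dump, Path target) throws IOException {
        String content = Files.readString(dump, StandardCharsets.UTF_8);
        Files.writeString(target, unescape(content), StandardCharsets.UTF_8);
    }
}
```

A call such as UnicodeUnescape.convert(Path.of("dump.sql"), Path.of("dump_utf8.sql")) mirrors the file names used in the bash one-liner.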

Unicode messages in oracle raise_application_error

I am working on enabling globalization support in my DB.
I have migrated the character set to Unicode (AL16UTF16).
After the migration, I can pass Unicode characters from Java to Oracle and store them in a table's NVARCHAR2 column. I can also retrieve them from the DB and pass them to Java.
But if I call raise_application_error with Unicode data, it sends the error message to Java like this:
; nested exception is java.sql.SQLException: ORA-20001: ¿¿¿ ¿¿¿¿¿¿¿¿¿
Can anyone tell me what's wrong, and how I can get the Unicode error messages in Java?
Thanks in advance.
The problem was that I had migrated the character set using the steps below, which did not work for me:
1. Back up the database.
2. Run the CSSCAN command.
3. Restart the DB in RESTRICT mode.
4. Run the CSALTER script.
5. Restart the DB.
After that, I tried the following steps instead:
1. Back up the DB using the expdp command.
2. Create a new database with the required character set (Unicode AL32UTF8).
3. Import the backup dump file into the newly created DB.
That's all. It works!
Now I don't need the NVARCHAR2 data type to store Unicode data (VARCHAR2 itself stores Unicode). raise_application_error also works fine (it sends error messages with Unicode data to Java).
Thanks.

UTF-8 Queries with JDBC

I want to send MySQL a query containing UTF-8 text, but it does not work. When I try the following query, the result comes back correctly:
String query = "select * from Terms where Term = 'lol'";
but the following query returns nothing:
String query = "select * from Terms where Term = 'خدابخش'";
where the 'خدابخش' part is in Persian, encoded as UTF-8.
Note that the connection to the database itself works fine.
Chances are that you need to set the character encoding on your JDBC connection. If you are using the MySQL JDBC connector, you do this with the characterEncoding property, for example:
jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8
You may want to read the reference on encoding and character sets in your connector JDBC documentation.
This is the one that mentions the use of characterEncoding for the MySQL JDBC Connector:
Connector JDBC: Using Character Sets and Unicode
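As a rough sketch of how that URL might be used from Java: the class TermLookup and its methods are hypothetical, and binding the term through a PreparedStatement is an added suggestion rather than something the answer requires.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; names are illustrative only.
final class TermLookup {

    // Builds a connection URL carrying the two encoding properties from the answer.
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=yes&characterEncoding=UTF-8";
    }

    // Looks up a term with a bound parameter, so the text is sent through
    // the driver rather than spliced into the SQL string.
    static boolean termExists(Connection connection, String term) throws SQLException {
        String sql = "select * from Terms where Term = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, term);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next();
            }
        }
    }
}
```

The connection itself would come from DriverManager.getConnection(TermLookup.jdbcUrl("localhost", "some_db"), user, password), with the MySQL driver on the classpath.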
One or more of the following is true:
The Java compiler, compiling your code, is set to read the source file with a different encoding from the one in which the source file was actually stored. In other words, there is a discrepancy between the encoding that your editor uses, the encoding in which the file is actually saved, and the encoding with which the Java compiler reads your source code.
Your database isn't set up correctly to accept and store Unicode characters. Ensure that your database is configured correctly. It looks like you're using MySQL, so you may want to create a dump of the database using mysqldump and check how the database was created with respect to character sets.
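One way to check the first possibility is to print the code points the compiler actually produced for the literal. The helper below is a hypothetical diagnostic, not part of either answer.

```java
// Hypothetical diagnostic; prints what the compiler really put in a string.
final class LiteralCheck {

    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String codePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(codePoint -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", codePoint));
        });
        return out.toString();
    }
}
```

For the Persian term in the question, a correctly read source file should yield U+062E U+062F U+0627 U+0628 U+062E U+0634. Anything else suggests the compiler decoded the file with the wrong encoding, which javac's -encoding option can correct.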

Insert query with latin1_swedish_ci charset

I made a program that generates INSERT queries for a MySQL database. The database has 2 fields that use the latin1_swedish_ci collation. If I run the query from phpMyAdmin and then preview the content with my PHP script, some special characters (like "ó", "é"...) are not shown correctly.
I think the problem is that Java encodes Strings as UTF-8, so when I copy and paste the query into phpMyAdmin and run it, the inserted record is stored with the wrong charset. How can I generate the correct query (with the correct charset) in Java?
Thanks In advance for your help
