I'm trying to export the GeoTools HSQL 2 database and load it back into HSQL 1 for a legacy system that needs the older database format. The tables include characters like the degree symbol, but these are coming out as escape sequences such as \u0080 rather than the encoded characters. I need to either fix that, or have the HSQL 1 import convert the escaped sequences back into the correct encoding.
e.g.
cp modules/plugin/epsg-hsql/src/main/resources/org/geotools/referencing/factory/epsg/EPSG.zip /tmp
cd /tmp
unzip EPSG.zip
java -jar hsqldb-2.4.1.jar
# For the file, put jdbc:hsqldb:file:/tmp/EPSG
SELECT 'epsg-dump'
And in the results I see things like this \u00b5:
INSERT INTO EPSG_ALIAS VALUES(389,'epsg_unitofmeasure',9109,7302,'\u00b5rad','')
Looking into hsqldb, I'm not sure how to control the encoding of the data being written, assuming that this is even the correct place to look:
https://github.com/ryenus/hsqldb/blob/master/src/org/hsqldb/scriptio/ScriptWriterText.java
You can use the following procedure:
In the source database, create TEXT tables with exactly the same columns as the original tables, then route them to UTF-8 CSV files (a JDBC sketch of these steps follows below):
1. Use CREATE TEXT TABLE thecopyname (LIKE thesourcename) for each table.
2. Use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8' for each of the copy tables.
3. INSERT into each thecopyname table with SELECT * FROM thesourcename.
4. Use SET TABLE thecopyname SOURCE OFF for each thecopyname.
You will now have several thecopyname.csv files (each with its own name) with UTF-8 encoding.
Use the reverse procedure on the target database: you need to explicitly create the TEXT tables, then use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8'.
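For reference, a minimal JDBC sketch of the export half of the procedure, assuming a single source table named EPSG_ALIAS in the HSQLDB 2.x database from the question (the copy-table name, CSV file name, and credentials are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TextTableExport {
    public static void main(String[] args) throws Exception {
        // Open the source HSQLDB 2.x database (same URL as in the question)
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:file:/tmp/EPSG", "SA", "");
             Statement st = conn.createStatement()) {
            // 1. TEXT table with exactly the same columns as the source table
            st.execute("CREATE TEXT TABLE EPSG_ALIAS_COPY (LIKE EPSG_ALIAS)");
            // 2. Attach a UTF-8 encoded CSV file as the table's backing source
            st.execute("SET TABLE EPSG_ALIAS_COPY SOURCE 'epsg_alias_copy.csv;encoding=UTF-8'");
            // 3. Copy the rows; HSQLDB writes them straight into the CSV file
            st.execute("INSERT INTO EPSG_ALIAS_COPY SELECT * FROM EPSG_ALIAS");
            // 4. Detach the CSV so it can be carried over to the HSQL 1 side
            st.execute("SET TABLE EPSG_ALIAS_COPY SOURCE OFF");
        }
    }
}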
The escapes look like Unicode escape sequences (\u followed by one to four hex digits).
Try this in bash (quick & dirty):
echo -ne "$(< dump.sql)" > dump_utf8.sql
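If you would rather stay in Java, the same unescaping can be done with a short program; a quick sketch, assuming the dump contains only four-hex-digit \uXXXX escapes (file names match the bash example above):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeDump {
    public static void main(String[] args) throws Exception {
        // The dump is ASCII plus \uXXXX escapes, so any ASCII-compatible
        // charset is safe for reading it
        String sql = new String(Files.readAllBytes(Paths.get("dump.sql")),
                StandardCharsets.ISO_8859_1);
        Matcher m = Pattern.compile("\\\\u([0-9a-fA-F]{4})").matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Decode the four hex digits into the character they encode
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        // Write the result back out as real UTF-8
        Files.write(Paths.get("dump_utf8.sql"),
                out.toString().getBytes(StandardCharsets.UTF_8));
    }
}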
Related
I use the following command to import data from a .csv file into a MySQL database table:
String loadQuery = "LOAD DATA LOCAL INFILE '" + file + "' INTO TABLE source_data_android_cell"
        + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
        + " LINES TERMINATED BY '\n'"
        + " IGNORE 1 LINES(.....)"
        + " SET test_date = STR_TO_DATE(#var1, '%d/%m/%Y %k:%i')";
However, one of the columns in the source file contains some really screwy data, namely viva Y31L.RastaMod䋢_Version, and the program refuses to import the data into MySQL, throwing this error:
java.sql.SQLException: Invalid utf8 character string: 'viva
Y31L.RastaMod'
I searched around but can't really understand what exactly the error is, other than that the input format of the string "viva Y31L.RastaMod䋢_Version" was wrong and didn't fit the utf8 format used in the MySQL database.
However, I already ran SET NAMES UTF8MB4 in my MySQL DB, since other questions suggested that UTF8MB4 is more flexible in accepting unusual characters.
I explored this further by manually inserting that weird data into the MySQL table from the command prompt, which worked fine. In fact, the table displayed almost the full entry: viva Y31L.RastaMod?ã¢_Version. But when I run my program from the IDE, the file gets rejected.
Would appreciate any explanations.
A second, minor question related to the process of importing a CSV file into MySQL:
I noticed that I couldn't import a copy of the same file into the MySQL database; the errors said the data was a duplicate. Is that because MySQL rejects duplicate column data? Yet when I changed all the data in one column of the copied file, leaving the rest the same, it imported correctly. Why is that?
I don't think this error is about the destination of the data being unable to cope with UTF-8 characters, but rather about the way you are using LOAD DATA. You can try specifying the character set to use when loading the data. Consider the following LOAD DATA command, which is what you had originally, slightly modified:
LOAD DATA LOCAL INFILE path/to/file INTO TABLE source_data_android_cell
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES(.....)
SET test_date = STR_TO_DATE(#var1, '%d/%m/%Y %k:%i')
This being said, you should also make sure that the target table uses a character set which supports the data you are trying to load into it.
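For completeness, here is the same statement rebuilt as the Java string from the question, with only the CHARACTER SET utf8 clause added (the IGNORE 1 LINES column list stays elided exactly as in the original):

String loadQuery = "LOAD DATA LOCAL INFILE '" + file + "' INTO TABLE source_data_android_cell"
        + " CHARACTER SET utf8"
        + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
        + " LINES TERMINATED BY '\n'"
        + " IGNORE 1 LINES(.....)"
        + " SET test_date = STR_TO_DATE(#var1, '%d/%m/%Y %k:%i')";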
Currently I'm trying to import an SQL INSERT dump from a PostgreSQL database into my Derby development/testing database using Eclipse's Data Tools SQL scratchpad. The export created a lot of data that looks like the following:
CREATE TABLE mytable ( testfield BLOB );
INSERT INTO mytable ( testfield ) VALUES ('\x0123456789ABCDEF');
Executing it in Eclipse's SQL Scratchpad results in (translated from German):
Columns of type 'BLOB' shall not contain values of type 'CHAR'.
The problem seems to be that the PostgreSQL admin tool exported BLOB data in a format like '\x0123456789ABCDEF', which is not recognized by Derby (Embedded).
Changing this to X'0123456789ABCDEF' or simply '0123456789ABCDEF' did not work either.
The only thing that worked was CAST (X'0123456789ABCDEF' AS BLOB), but I'm not yet sure whether this yields the correct binary data when read back in Java, and whether X'0123456789ABCDEF' is 100% portable.
CAST (...whatever... AS BLOB) doesn't work in Java DB / Apache Derby!
One must use the built-in system procedure SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_FROM_EXTFILE; I do not think there is any other way. For instance:
CALL SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_FROM_EXTFILE (
    'MYSCHEMA', 'MYTABLE',
    'MY_KEY, MY_VARCHAR, MY_INT, MY_BLOB_DATA',
    '1,3,4,2',
    'c:\tmp\import.txt', ',', '"', 'UTF-8',
    0);
where the referenced "import.txt" file will be CSV-like (as specified by the ',' and '"' arguments above) and will contain as its 2nd field (I deliberately scrambled the CSV field order versus the DB column order to illustrate) a file name holding the binary data in the proper form for the BLOBs. For instance, "import.txt" looks like:
"A001","c:\tmp\blobimport.dat.0.55/","TEST",123
where the supplied BLOB data file name bears the convention "filepath.offset.length/"
Actually, you can first export your table with
CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE(
'MYSCHEMA', 'MYTABLE', 'c:\tmp\export.txt', ',' ,'"',
'UTF-8', 'c:\tmp\blobexport.dat');
to generate sample files with the syntax to reuse on import.
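If you are driving this from JDBC rather than from the ij tool, the same procedures can be invoked through a plain Statement. A sketch reusing the schema, table, and file names from the example above (the Derby connection URL is a placeholder, and the table-level import variant SYSCS_IMPORT_TABLE_LOBS_FROM_EXTFILE is used here since the exported CSV keeps the table's own column order):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DerbyLobRoundTrip {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
             Statement st = conn.createStatement()) {
            // Export MYTABLE: CSV goes to export.txt, BLOB bytes to blobexport.dat
            st.execute("CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE("
                    + "'MYSCHEMA', 'MYTABLE', 'c:\\tmp\\export.txt', ',', '\"',"
                    + " 'UTF-8', 'c:\\tmp\\blobexport.dat')");
            // Re-import the generated files into a table with the same layout
            // (final 0 = append rather than replace existing rows)
            st.execute("CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE_LOBS_FROM_EXTFILE("
                    + "'MYSCHEMA', 'MYTABLE', 'c:\\tmp\\export.txt', ',', '\"',"
                    + " 'UTF-8', 0)");
        }
    }
}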
I am working on enabling globalization support in my DB.
I have migrated the character set to Unicode (AL16UTF16).
After migration, I can pass Unicode characters from Java to Oracle and store them in a table's NVARCHAR2 column, and I can also retrieve them from the DB and pass them back to Java.
But if I do a raise_application_error with the Unicode data, the error message arrives in Java like below:
; nested exception is java.sql.SQLException: ORA-20001: ¿¿¿ ¿¿¿¿¿¿¿¿¿
Can anyone tell me what's wrong, and how I can get the Unicode error messages in Java?
Thanks in advance.
The problem was that I had done the character set migration using the steps below, which didn't work for me:
1. Back up the database.
2. Run the CSSCAN command.
3. Restart the DB in RESTRICT mode.
4. Run the CSALTER script.
5. Restart the DB.
After that, I tried the following steps:
1. Take a backup of the DB using the expdp command.
2. Create a new database with the required character set (Unicode AL32UTF8).
3. Import the backup dump file into the newly created DB.
That's all. It works!
Now I don't need to use the NVARCHAR2 data type to store Unicode data (VARCHAR2 itself stores Unicode), and raise_application_error also works fine (it sends error messages with Unicode data to Java).
Thanks.
I want to send MySQL a UTF-8 query, but it does not work correctly. When I try the following query, the result comes up fine:
String query = "select * from Terms where Term = 'lol'";
but the following query gets no response:
String query = "select * from Terms where Term = 'خدابخش'";
where the 'خدابخش' part is Persian text, in UTF-8.
Note that the connection to the database is fine.
Chances are that you need to set the character encoding in your JDBC connection. If you are using the MySQL JDBC connector, you do it with the characterEncoding property, somewhat like this:
jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8
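For instance, a minimal connection sketch (database name and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class Utf8Connection {
    public static void main(String[] args) throws Exception {
        // With characterEncoding set, statements and results travel as UTF-8,
        // so a literal such as 'خدابخش' survives the round trip
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8",
                "user", "password")) {
            System.out.println("Connected: " + conn.isValid(5));
        }
    }
}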
You may want to read the reference on encoding and character sets in your connector's JDBC documentation.
This is the one that mentions the use of characterEncoding for the MySQL JDBC Connector:
Connector JDBC: Using Character Sets and Unicode
One or more of the following is true:
The Java compiler, compiling your code, is set to read the source file with a different encoding from the one in which the source file was actually stored. In other words, there is a discrepancy between the encoding your editor uses, the encoding in which the file is actually saved, and the encoding with which the Java compiler reads your source code (you can pin the last one down explicitly, e.g. javac -encoding UTF-8).
Your database isn't set up correctly to accept/store Unicode characters. Ensure that your database is set up correctly. It looks like you're using MySQL; you may want to create a dump of the database using mysqldump and check how the database was created with respect to character sets.
I made a program that generates INSERT queries for a MySQL database. The database has 2 fields encoded with the latin1_swedish_ci charset. If I run the query from phpMyAdmin and then preview the content with my PHP script, some special characters (like "ó", "é"...) are not shown correctly.
I think the problem is that Java encodes the String with the utf8 charset, so when I copy-paste the query into phpMyAdmin and run it, the inserted record is generated with the wrong charset. How can I generate the correct query (with the correct charset) in Java?
Thanks in advance for your help.