Invalid utf8 character string when importing csv file into MySQL database - java

I use the following command to import data from a .csv file into a MySQL database table like so:
String loadQuery = "LOAD DATA LOCAL INFILE '" + file + "' INTO TABLE source_data_android_cell"
        + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
        + " LINES TERMINATED BY '\n'"
        + " IGNORE 1 LINES (.....)"
        + " SET test_date = STR_TO_DATE(@var1, '%d/%m/%Y %k:%i')";
However, one of the columns in the source file contains a really screwy value, viva Y31L.RastaMod䋢_Version, and the program refuses to import the data into MySQL and keeps throwing this error:
java.sql.SQLException: Invalid utf8 character string: 'viva
Y31L.RastaMod'
I searched around but can't really understand what exactly the error is, other than that the input format of the string "viva Y31L.RastaMod䋢_Version" was wrong and didn't fit the utf8 encoding used in the MySQL database.
However, I already ran SET NAMES UTF8MB4 in my MySQL db, since other questions suggested that UTF8MB4 is more flexible in accepting weird characters.
I explored this further by manually inserting that weird value into the MySQL table from the command prompt, which worked fine. In fact, the table displayed almost the full entry: viva Y31L.RastaMod?ã¢_Version. But when I run my program from the IDE, the file gets rejected.
Would appreciate any explanations.
A second, minor question related to importing the csv file into MySQL:
I noticed that I couldn't import a copy of the same file into the MySQL database; the errors said the data was a duplicate. Is that because MySQL rejects duplicate column data? Yet when I changed all the data in one column of that copied file, leaving the rest the same, it was imported correctly. Why is that?

I don't think this immediate error has to do with the destination of the data being unable to cope with UTF-8 characters, but rather with the way you are using LOAD DATA. You can try specifying the character set to be used when loading the data. Consider the following LOAD DATA command, which is what you had originally, slightly modified:
LOAD DATA LOCAL INFILE 'path/to/file' INTO TABLE source_data_android_cell
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES (.....)
SET test_date = STR_TO_DATE(@var1, '%d/%m/%Y %k:%i')
This being said, you should also make sure that the target table uses a character set which supports the data you are trying to load into it.
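For reference, here is a minimal sketch of running that statement from Java over JDBC. The connection URL, credentials, and the column list are placeholders I made up (the original column list was elided); note that Connector/J requires allowLoadLocalInfile=true for LOAD DATA LOCAL to work:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CsvLoader {
    public static void main(String[] args) throws Exception {
        // allowLoadLocalInfile=true is required by Connector/J for LOAD DATA LOCAL
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?allowLoadLocalInfile=true&characterEncoding=UTF-8";
        String file = "data/cells.csv"; // hypothetical path

        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement()) {
            String loadQuery =
                  "LOAD DATA LOCAL INFILE '" + file + "' INTO TABLE source_data_android_cell"
                + " CHARACTER SET utf8"                      // match the file's encoding
                + " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
                + " LINES TERMINATED BY '\\n'"
                + " IGNORE 1 LINES (device_name, @var1)"     // hypothetical column list
                + " SET test_date = STR_TO_DATE(@var1, '%d/%m/%Y %k:%i')";
            int rows = stmt.executeUpdate(loadQuery);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}

If the target column still rejects a value, switching both the CHARACTER SET clause and the table's character set to utf8mb4 is the usual next step, since utf8mb4 covers all of Unicode.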

Related

error while importing data to postgres with java

Well, my problem is that I need to import data into a Postgres table through Java from a csv file. When I import the data with pgAdmin 4, it is imported without errors; however, when I execute this SQL statement from Java:
COPY account FROM 'path/account.csv' DELIMITER ',' QUOTE '"' HEADER CSV
an exception occurs stating:
was aborted: ERROR: date/time field value out of range: "24/3/1995"
Hint: Perhaps you need a different "datestyle" setting.
I have checked the datestyle and it's dmy.
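A minimal sketch of one likely fix, assuming the session that runs the COPY is simply not picking up the DMY datestyle (the URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AccountImport {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/mydb"; // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement()) {
            // Force day/month/year interpretation for this session only
            stmt.execute("SET datestyle TO 'ISO, DMY'");
            stmt.execute("COPY account FROM 'path/account.csv' "
                       + "DELIMITER ',' QUOTE '\"' HEADER CSV");
        }
    }
}

SET datestyle only affects the current session, so it must be executed on the same Connection that issues the COPY.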

Exporting HSQLDB database with UTF-8 encoding

I'm trying to export the GeoTools HSQL 2 database and load it back into HSQL 1 for a legacy system that needs the older database format. The tables include characters like the degree symbol. However, it's coming out as the escape sequence \u0080 rather than the encoded character. I need to either fix that or have the HSQL 1 import convert the escaped characters back into the correct encoding.
e.g.
cp modules/plugin/epsg-hsql/src/main/resources/org/geotools/referencing/factory/epsg/EPSG.zip /tmp
cd /tmp
unzip EPSG.zip
java -jar hsqldb-2.4.1.jar
# For the file, put jdbc:hsqldb:file:/tmp/EPSG
SCRIPT 'epsg-dump'
And in the results I see things like this \u00b5:
INSERT INTO EPSG_ALIAS VALUES(389,'epsg_unitofmeasure',9109,7302,'\u00b5rad','')
Looking into hsqldb, I'm not sure how to control the encoding of the data being written, assuming that this is the correct location to look:
https://github.com/ryenus/hsqldb/blob/master/src/org/hsqldb/scriptio/ScriptWriterText.java
You can use the following procedure:
1. In the source database, create TEXT tables with exactly the same columns as the original tables. Use CREATE TEXT TABLE thecopyname (LIKE thesourcename) for each table.
2. Use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8' for each of the copy tables.
3. INSERT into each thecopyname table with SELECT * FROM thesourcename.
4. Use SET TABLE thecopyname SOURCE OFF for each thecopyname.
You will now have several thecopyname.csv files (each with its own name) with UTF-8 encoding. Use the reverse procedure on the target database: you need to explicitly create the TEXT tables, then use SET TABLE thecopyname SOURCE 'thecopyname.csv;encoding=UTF-8'.
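For illustration, here is the same procedure driven from Java over JDBC (a sketch; the table name EPSG_ALIAS comes from the question, while the copy-table name, file name, and SA credentials are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class Utf8TableExport {
    public static void main(String[] args) throws Exception {
        // Source HSQLDB 2 database, as unzipped in the question;
        // SA with empty password is the HSQLDB default, adjust as needed
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hsqldb:file:/tmp/EPSG", "SA", "");
             Statement stmt = conn.createStatement()) {
            // 1. TEXT table with the same columns as the source table
            stmt.execute("CREATE TEXT TABLE EPSG_ALIAS_COPY (LIKE EPSG_ALIAS)");
            // 2. Back the TEXT table with a UTF-8 encoded CSV file
            stmt.execute("SET TABLE EPSG_ALIAS_COPY SOURCE "
                       + "'epsg_alias_copy.csv;encoding=UTF-8'");
            // 3. Copy the rows; HSQLDB writes them to the CSV as UTF-8
            stmt.execute("INSERT INTO EPSG_ALIAS_COPY SELECT * FROM EPSG_ALIAS");
            // 4. Detach the file so it can be moved to the target database
            stmt.execute("SET TABLE EPSG_ALIAS_COPY SOURCE OFF");
        }
    }
}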
The encoding looks like Unicode (one to four hex digits).
Try this in bash (quick & dirty):
echo -ne "$(< dump.sql)" > dump_utf8.sql
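If bash is not available, the same unescaping can be done in Java (a sketch; the file names mirror the bash example, and it assumes the four-digit \uXXXX form shown in the dump):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeDump {
    public static void main(String[] args) throws Exception {
        String sql = new String(Files.readAllBytes(Paths.get("dump.sql")),
                                StandardCharsets.UTF_8);
        // Replace each \uXXXX escape with the actual character
        Matcher m = Pattern.compile("\\\\u([0-9a-fA-F]{4})").matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        Files.write(Paths.get("dump_utf8.sql"),
                    out.toString().getBytes(StandardCharsets.UTF_8));
    }
}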

Insert BLOB in Derby Database using SQL

Currently I'm trying to import an SQL-INSERT dump from a postgres database into my Derby development/testing database using the Eclipse's Data Tools SQL scratchpad. The export created a lot of data that looked like the following:
CREATE TABLE mytable ( testfield BLOB );
INSERT INTO mytable ( testfield ) VALUES ('\x0123456789ABCDEF');
Executing it in Eclipse's SQL Scratchpad results in (translated from German):
Columns of type 'BLOB' shall not contain values of type 'CHAR'.
The problem seems to be that the PostgreSQL admin tool exported BLOB data in a format like '\x0123456789ABCDEF', which is not recognized by Derby (Embedded).
Changing this to X'0123456789ABCDEF' or simply '0123456789ABCDEF' did not work either.
The only thing that worked was CAST (X'0123456789ABCDEF' AS BLOB), but I'm not yet sure whether this results in the correct binary data when read back in Java, and whether X'0123456789ABCDEF' is 100% portable.
CAST (...whatever... AS BLOB) doesn't work in Java DB / Apache Derby!
One must use the built-in system procedure
SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_FROM_EXTFILE. I do not think there is any other way. For instance:
CALL SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_FROM_EXTFILE (
'MYSCHEMA', 'MYTABLE',
'MY_KEY, MY_VARCHAR, MY_INT, MY_BLOB_DATA',
'1,3,4,2',
'c:\tmp\import.txt', ',' , '"' , 'UTF-8',
0);
where the referenced "import.txt" file will be CSV-like (as specified by the ',' and '"' arguments above) and will contain as its 2nd field (I scrambled the CSV field order versus the DB column order on purpose, to illustrate) a file name that holds the binary data in the proper form for the BLOBs. For instance, "import.txt" looks like:
"A001","c:\tmp\blobimport.dat.0.55/","TEST",123
where the supplied BLOB data file name follows the convention "filepath.offset.length/".
Actually, you can first export your table with
CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE(
'MYSCHEMA', 'MYTABLE', 'c:\tmp\export.txt', ',' ,'"',
'UTF-8', 'c:\tmp\blobexport.dat');
to generate sample files with the syntax to reuse on import.
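For completeness, the export procedure can also be invoked from Java through a CallableStatement (a sketch; the schema, table, and file paths are the same placeholders as above, and the embedded database name is made up):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class DerbyLobExport {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:derby:myDevDb")) { // hypothetical embedded DB name
            // Export table data to CSV, with LOBs written to a separate file
            try (CallableStatement cs = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_TO_EXTFILE(?,?,?,?,?,?,?)")) {
                cs.setString(1, "MYSCHEMA");
                cs.setString(2, "MYTABLE");
                cs.setString(3, "c:\\tmp\\export.txt");     // CSV output file
                cs.setString(4, ",");                       // column delimiter
                cs.setString(5, "\"");                      // character delimiter
                cs.setString(6, "UTF-8");                   // codeset
                cs.setString(7, "c:\\tmp\\blobexport.dat"); // external LOB file
                cs.execute();
            }
        }
    }
}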

Data mismatch while COPY to CSV file

Running a SQL query to export the contents to a CSV file, I notice that certain columns do not get displayed properly; in my case, the date timestamp is not properly displayed in the cell.
Code as below:
COPY (
  select hostname as "Host Name", devicetype as "Device Type", platform as "Model",
         Ipaddress as "IP Address", swversion as "Software Version",
         configuredTime as "Configured", activeTime as "Active", cluster as "Clusters",
         location as "Location", macaddress as "Mac Address", devicepool as "Device Pool"
  from (
    select getmodelinfo.ipaddress,
           max(getmodelinfo.hostname) as hostname,
           max(getmodelinfo.macaddress) as macaddress,
           max(getmodelinfo.devicetype) as devicetype,
           max(getmodelinfo.platform) as platform,
           max(getmodelinfo.swversion) as swversion,
           min(getmodelinfo.day_end_date) as configuredtime,
           max(getmodelinfo.active_end_date) as activetime,
           max(getmodelinfo.ucmclustername) as cluster,
           max(getmodelinfo.ucmlocation) as location,
           max(getmodelinfo.ucmdevicepool) as devicepool
    from (
      select pcwh_inv.uniquedeviceid, pcwh_inv.ipaddress, pcwh_inv.endpointmodel,
             pcwh_inv.hostname, pcwh_inv.devicetype, pcwh_inv.platform,
             pcwh_inv.macaddress, pcwh_inv.version as swversion,
             pcwh_inv.deployed_day_end_date as day_end_date,
             pcwh_inv.lastupdated_day_end_date as active_end_date,
             pcwh_inv.ucmclustername, pcwh_inv.ucmlocation, pcwh_inv.ucmdevicepool
      from pcwh_inventory_20160410 pcwh_inv,
           (select pcwh_inv.uniquedeviceid,
                   max(pcwh_inv.lastupdated_day_end_date) as lasttime
            from pcwh_inventory_20160410 pcwh_inv
            where pcwh_inv.ipaddress notnull
            group by pcwh_inv.uniquedeviceid) gettime
      where pcwh_inv.endpointmodel = 'SX'
        and pcwh_inv.uniquedeviceid = gettime.uniquedeviceid
        and pcwh_inv.lastupdated_day_end_date = gettime.lasttime
        and pcwh_inv.mgmtstatus not in ('Deleted')
    ) getmodelinfo
    group by ipaddress
  ) M
) TO '/opt/emms/emsam/export/raj2.csv' DELIMITER ',' CSV HEADER
Please find below screenshots of the result of running the query without exporting to CSV, of the export to CSV (second image), and of the cell content when I click on a particular row, where it shows correctly (bottom image).
It seems to me to be an issue with the formatting of the CSV file. Could you please let me know how I could, at the query level, make it display exactly the contents of the DB column?
I believe nothing is wrong with your code; rather, Excel is treating that column as something other than a date, which happens often. It can be resolved by selecting all of the affected cells, right-clicking, and changing the format to text through the subsequent menus.
If this CSV is going to be manipulated further by another application, I don't think you will have an issue, as the data itself is seemingly fine.
If for whatever reason you need it to be correctly displayed in Excel dynamically, you may need to look into something like this: Java - Excel Cell Type
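If you do end up producing the spreadsheet from Java, here is a minimal sketch using Apache POI (my assumption; the thread only links to a question about cell types) that pins a cell to Excel's Text format so the timestamp string is not reinterpreted:

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class TextColumnExample {
    public static void main(String[] args) throws Exception {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet("export");
            // "@" is Excel's built-in Text format, preventing date reinterpretation
            CellStyle textStyle = wb.createCellStyle();
            textStyle.setDataFormat(wb.createDataFormat().getFormat("@"));

            Cell cell = sheet.createRow(0).createCell(0);
            cell.setCellValue("2016-04-10 09:30:00"); // sample timestamp string
            cell.setCellStyle(textStyle);

            try (FileOutputStream out = new FileOutputStream("export.xlsx")) {
                wb.write(out);
            }
        }
    }
}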
The problem is straightforward; there is no error with the CSV format. Actually, the data entries are made in the wrong way: you can't have a space in a data entry, and that will cause a problem.
In the 6th and 7th columns you can see there is a space within the data entries, which is why the problem occurs.
To prevent that, just add a new column for the date.
That will solve the problems, and the data will be entered into your database.

Handling word with comma in between in CSV file for input to LOAD DATA LOCAL INFILE

I am working on a Java application which inserts data from a CSV file into a MySQL database table using the query below:
LOAD DATA LOCAL INFILE 'test.csv' INTO TABLE Test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (id, name, address)
I have an issue when there is a comma between words in a field, for example address = pune, india. It pushes the word after the comma into the next column.
I have found some workarounds, such as escaping the comma in the Java code:
strAddress = strAddressWithComma.replaceAll(",", "\\" + "\\,");
This works, but after the data is inserted the DB value looks like "pune, india" (double quotes around the original string).
Another workaround is adding an ENCLOSED BY '\"' clause to LOAD DATA LOCAL INFILE and using the Java code below:
strAddress = "\"" + strAddressWithComma + "\"";
But with this workaround too, the string ends up with double quotes around the original value in the DB when we use LOAD DATA LOCAL INFILE.
I do not want double quotes around the original value in the DB after insertion with the LOAD DATA statement. The DB entry should be the same as the data in the original csv file.
I know we could have a DB trigger that strips the double quotes from the new string, but we want to handle this in the application only.
Any solution or suggestion will be appreciated.
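For what it's worth, here is a sketch of the standard combination: quote the field when writing the CSV and declare OPTIONALLY ENCLOSED BY '"' in the LOAD DATA statement, so MySQL strips the quotes itself during the load (the connection URL, credentials, and sample data are placeholders):

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CommaFieldLoad {
    public static void main(String[] args) throws Exception {
        // Write the CSV with quoted fields; the comma stays inside the field
        try (PrintWriter out = new PrintWriter("test.csv", "UTF-8")) {
            out.println("id,name,address");
            out.println("1,\"John\",\"pune, india\"");
        }

        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement()) {
            // OPTIONALLY ENCLOSED BY '"' makes MySQL remove the enclosing quotes,
            // so the stored value is: pune, india  (no surrounding quotes)
            stmt.executeUpdate(
                  "LOAD DATA LOCAL INFILE 'test.csv' INTO TABLE Test"
                + " FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'"
                + " LINES TERMINATED BY '\\n'"
                + " IGNORE 1 LINES (id, name, address)");
        }
    }
}

Because the enclosing character is declared in the statement, MySQL strips it on load; no escaping or quote-stripping is needed in the Java code or in a trigger.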
