I have a CSV file that I want to import into MySQL, but LINES TERMINATED BY doesn't work with '\n'. I tried replacing '\n' with '\r' or '\r\n', but it still doesn't work.
If I open the file in a hex editor it is obvious that my Java app (which writes this file) works fine: the '\n' terminators are highlighted.
But when I run
LOAD DATA LOCAL INFILE '/home/vsafonov/testDir/test.csv'
INTO TABLE express.objs
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
I get
If I replace the '\n' newlines with some other symbol (for example ';'), LOAD DATA INFILE works fine. But I have no idea why it is impossible to load the file with a '\n' line terminator. Any thoughts?
The table's CREATE TABLE statement:
CREATE TABLE IF NOT EXISTS `test_objs` (
`id` bigint(20) NOT NULL,
`object_id` bigint(20) DEFAULT NULL,
`next_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_unicode_ci;
An example of my file:
id,object_id,next_id
"1866227","98363301","156715750"
"293","171","454"
"1890275","171","177646470"
For me, this query worked:
LOAD DATA LOCAL INFILE '/home/vsafonov/testDir/test.csv'
INTO TABLE express.objs
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES;
The default terminator worked fine with my '\n'.
The terminator in my original query was apparently being taken literally (I don't understand why).
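One quick way to rule out terminator confusion is to inspect the raw bytes from code rather than trusting the editor. A minimal sketch in Java (class and method names are my own, not from the question):

```java
public class TerminatorCheck {
    // Returns a readable name for the first line terminator found in the data:
    // "\r\n", "\r", "\n", or "none" if the data contains no line break at all.
    public static String detect(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                return (i + 1 < data.length && data[i + 1] == '\n') ? "\\r\\n" : "\\r";
            }
            if (data[i] == '\n') {
                return "\\n";
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(detect("id,object_id\n\"1\",\"2\"\n".getBytes())); // prints \n
    }
}
```

Whatever this reports is what LINES TERMINATED BY should name; if the file really uses '\n' and the clause still fails, the problem is elsewhere (e.g. the client mangling the statement).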
I have a txt file that contains the following:
SELECT TOP 20 personid AS "testQu;otes"
FROM myTable
WHERE lname LIKE '%pi%' OR lname LIKE '%m;i%';
SELECT TOP 10 personid AS "testQu;otes"
FROM myTable2
WHERE lname LIKE '%ti%' OR lname LIKE '%h;i%';
............
The above queries can be any legitimate SQL statements (on one or multiple lines, i.e. however the user chooses to type them).
I need to split this text and put it into an array:
File file ... blah blah blah
..........................
String myArray [] = text.split(";");
But this does not work properly because it takes ALL of the ; characters into account. I need to ignore any ; that sits inside double quotes ("...") or single quotes ('...'). For example, the ; in '%h;i%' does not count because it is inside quotes. How can I split correctly?
Assuming that each ; you want to split on is at the end of a line, you can try to split on each ; plus the line separator after it, like:
text.split(";"+System.lineSeparator())
If your file has line separators other than the default one, you can try:
text.split(";\n")
text.split(";\r\n")
text.split(";\r")
By the way, if you want to keep the ; in the split result (i.e. you don't want to get rid of it), you can use a look-behind, like:
text.split("(?<=;)"+System.lineSeparator())
If you are reading the file dynamically line by line, just check whether line.endsWith(";").
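Putting the look-behind variant together, a small sketch (class and method names are mine; it assumes each statement ends with ; at the end of a line, as above):

```java
public class SplitDemo {
    // Splits only where ';' is immediately followed by a line break,
    // keeping the ';' in each piece via a look-behind.
    public static String[] splitStatements(String text) {
        return text.split("(?<=;)\r?\n");
    }

    public static void main(String[] args) {
        String sql = "SELECT 1;\nSELECT 2;";
        for (String stmt : splitStatements(sql)) {
            System.out.println(stmt);
        }
    }
}
```

Note this still splits on a ; that happens to end a line inside a quoted string, so it only works if statements are laid out one per terminator line.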
I see a newline after your ';'. Is that the case throughout the whole text file?
If you must/want to use a regular expression, you could split with a regex of the form
;$
The $ means "end of line" when the MULTILINE flag is enabled (in Java, e.g. via the inline (?m) flag).
I would not use a regex for this kind of task. Parsing the text and tracking whether you are inside ' or " quotes, in order to recognize the real ";" delimiters, is sufficient.
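A sketch of that quote-tracking approach (class and method names are mine; it handles ' and " but not quote characters escaped inside strings):

```java
import java.util.ArrayList;
import java.util.List;

public class SqlSplitter {
    // Splits on ';' only when it occurs outside single or double quotes.
    public static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inSingle = false;
        boolean inDouble = false;
        for (char c : text.toCharArray()) {
            if (c == '\'' && !inDouble) {
                inSingle = !inSingle;
            } else if (c == '"' && !inSingle) {
                inDouble = !inDouble;
            }
            if (c == ';' && !inSingle && !inDouble) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (!current.toString().trim().isEmpty()) {
            parts.add(current.toString().trim());
        }
        return parts;
    }
}
```

With the example from the question, the ; inside "testQu;otes" and '%m;i%' is kept, and only the statement-terminating semicolons split.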
In my application I am reading a CSV file into the DB through the "load data local infile" command in SQL. When a backslash appears in one of the fields, the adjacent fields get merged. How can I ignore the backslash when reading the file into the DB?
For example,
"abcd", "efgh\", "ijk"
goes into the table as
col1 | col2 | col3
abcd | efghijk | null
whereas I want it to go in as
col1 | col2 | col3
abcd | efgh | ijk
Any pointers would be helpful.
Thanks,
Ashish
By default LOAD DATA uses \ as the escape character. Consider your input:
"abcd", "efgh\", "ijk"
That sequence \" is interpreted as a literal non-enclosing quote, not a backslash followed by a quote.
The best solution is to properly escape backslashes in your CSV file, e.g.:
"abcd", "efgh\\", "ijk"
If you cannot do that, you can disable escaping by adding ESCAPED BY '' to your LOAD DATA INFILE statement. That prevents \ from being recognized as an escape character, but keep in mind it disables all other escape sequences in your input file as well. It will also import efgh\ as-is; the backslash will not be removed.
If importing efgh\ is unacceptable, then you will have to fix the format of your input file, or strip the trailing \ later, either in your application logic or with another SQL query.
See MySQL LOAD DATA INFILE Syntax for more information about file format options.
Hope that helps.
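If the CSV is produced by your own code, doubling the backslashes before writing each field is straightforward. A minimal sketch in Java (class and method names are mine, not part of the question):

```java
public class CsvEscape {
    // Doubles every backslash so that MySQL's default escape character ('\')
    // reads each pair back as a single literal backslash.
    public static String escapeField(String field) {
        return field.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // "efgh\" becomes "efgh\\" on disk, which LOAD DATA imports as efgh\
        System.out.println("\"" + escapeField("efgh\\") + "\"");
    }
}
```

The same helper is the place to escape any other characters your ENCLOSED BY / FIELDS TERMINATED BY settings treat specially.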
I've been having this problem for a long time. I've searched the internet many times and tried lots of solutions, but haven't found an adequate one.
I really don't know what to do, so I'd be very thankful for any help.
(Sorry for my poor English.)
Question: How can I solve the charset incompatibility between the input file and a MySQL table?
Problem: When importing the file from my computer, the information appears in my database, but some characters such as 'ã', 'ç', 'á', etc. are shown as ?.
Additional information
I'm using MySQL; my version and variable values are:
MySQL VERSION : 5.5.10
HOST : localhost
USER : root
PORT : 3306
SERVER DEFAULT CHARSET : utf8
character_set_client : utf8
character_set_connection : utf8
character_set_database : utf8
character_set_filesystem : BINARY
character_set_results : utf8
character_set_server : utf8
character_set_system : utf8
collation_connection : utf8_general_ci
collation_database : utf8_general_ci
collation_server : utf8_general_ci
completion_type : NO_CHAIN
concurrent_insert : AUTO
The query that's being used is:
LOAD DATA LOCAL INFILE 'xxxxx/file.txt'
INTO TABLE xxxxTable
FIELDS TERMINATED BY ';'
LINES TERMINATED BY ' '
IGNORE 1 LINES
( status_ordenar,numero,newstatus,rede,data_emissao,inicio,termino,tempo_indisp
, cli_afet,qtd_cli_afet,cod_encerr,uf_ofensor,localidades,clientes_afetados
, especificacao,equipamentos,area_ofens,descricao_encerr,criticidade,cod_erro
, observacao,id_falha_perc,id_falha_conf,nba,solucao,falhapercebida,falhaconfirmada
, resp_i,resp_f,resp_ue,pre_handover,falha_identificada,report_netcool,tipo_falha
, num_notificacao,equip_afetados,descricao)
About the file being imported:
I've opened the file in OpenOffice with several charsets:
UTF8 - Gave me strange chars in place of the 'ç', 'ã', etc...
ISO-8859-1 - OK.
WIN-1252 - OK.
ASCII/US - OK.
Already tested: I've tried some charsets on my database: latin1, utf8, ascii, but all of them gave the same result (? instead of 'á', 'ç', etc.).
Extra: I'm using Java with Java JDBC to generate and send the query.
file.txt is saved in ISO-8859-1 or Windows-1252 (the two are very similar), but is being interpreted as UTF-8 by MySQL. These are incompatible.
How can I tell?
See point 3.: the file displays correctly when interpreted as ISO-8859-1 or Windows-1252.
See point 1.: character_set_database : utf8
Solution: either convert the file to UTF-8, or tell MySQL to interpret it as ISO-8859-1 or Windows-1252.
Background: the characters you mention ('ã' etc.) are single-byte values in Windows-1252, and those bytes are illegal in UTF-8, hence the ?s shown in place of them.
Snippet from MySQL docs:
LOAD DATA INFILE Syntax
The character set indicated by the character_set_database system variable is used to interpret the information in the file.
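If you go the conversion route, a minimal Java sketch (class and method names are mine) that re-reads the Latin-1 bytes and writes them out as UTF-8:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Recode {
    // Decodes raw bytes as ISO-8859-1; every byte maps 1:1 to a Unicode code point,
    // so this never fails, unlike decoding the same bytes as UTF-8.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Reads a Latin-1 file and rewrites it as UTF-8, matching MySQL's utf8 settings.
    public static void toUtf8(Path in, Path out) throws java.io.IOException {
        Files.write(out, decodeLatin1(Files.readAllBytes(in)).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        toUtf8(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The alternative, telling MySQL the truth about the file, is adding CHARACTER SET latin1 to the LOAD DATA statement instead of converting the file.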
I saved your characters with standard Windows Notepad as a UTF-8 file (Notepad++ also works).
Exact file content:
'ã', 'ç', 'á'
MySQL version: 5.5.22
Database charset: utf8
Database collation: utf8_general_ci
CREATE TABLE `abc` (
`qwe` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Imported data with command
LOAD DATA LOCAL INFILE 'C:/test/utf8.txt'
INTO TABLE abc
FIELDS TERMINATED BY ';'
LINES TERMINATED BY ' '
IGNORE 1 LINES
(qwe)
Result (displayed in SQLyog):
So, first: check the original file with a reliable editor (Notepad, Notepad++). If the file is corrupted, you should obtain a fresh copy.
Second: if the file is OK, add the Java code that sends the data to MySQL to your question.
I want to load an Arabic CSV file into MySQL.
I tried the following
set session collation_database=utf8_general_ci;
set session character_set_database=utf8 ;
#SET NAMES utf8;
#SET character_set_database=utf8;
LOAD DATA local INFILE "D:\\trade20120314.csv"
INTO TABLE trade
CHARACTER SET utf8
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n' ;
but the Arabic columns still appear as
"?????? ????????????????".
I am sure the CSV file is encoded as UTF-8, the default character set of both server and client is UTF-8, and the columns' character set and collation are UTF-8 as well.
I don't know where the problem is; any suggestions?
Thanks.
Try adding ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URI (or with & if the URI already has parameters). The driver also has to know the transfer encoding. (Of course, the file itself should be in UTF-8.)
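Appending the parameters can be done defensively depending on whether the URI already has a query string. A small sketch (class and method names are mine):

```java
public class JdbcUrl {
    // Appends the Connector/J Unicode parameters, choosing '?' or '&'
    // depending on whether the URL already carries a query string.
    public static String withUtf8(String baseUrl) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/express"));
    }
}
```

The resulting URL is what you would pass to DriverManager.getConnection (or your pool's configuration).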
I am puzzled by this behavior. In my application I get data either from a server or from my own database (I clone the server database).
.replaceAll ( "\r\n" , "<br/>" ) ;
When the data comes from the server, the replacement works. But when the data comes from the SQLite database, it fails to replace the above. I have tried .replaceAll ( "a" , "??" ) ; and that works.
The data in the database is:
Bradley Ambrose is the freelance cameraman who recorded the John Key and John Banks tea meeting.\r\n\r\nHe intentionally placed a black bag with a recording device on the table where Key and Banks were sitting, although he claims it was a mistake, If that were true then how did so many people get a copy of it???\r\n\r\nAlso this guy bloody changed his name from Brad White what the hell is this guy an international man of mystery or something.
I have also debugged this issue in detail, but the text is not replaced even though the line above executes successfully.
I have also tried
replaceAll ( "\n" , "<br/>" )
replaceAll ( "\r" , "<br/>" )
Here is a debugging picture.
Does the input string contain actual CR and LF characters, or the literal character pairs \ + r and \ + n?
The regex won't work in the latter case. It would require .replaceAll("\\\\r\\\\n", "<br/>")
Can you try with Pattern#quote() ?
Something like:
System.out.println("hello\r\n\r\n something".replaceAll(Pattern.quote("\r\n"), ""));
The code is fine. The data you are seeing in the debug screen is misleading. Run the same debug session, insert a System.out.println, and compare its output with what the debugger shows.
Unless you mean the database actually contains the literal string "\r\n". The above assumes the database contains real carriage return and line feed characters. If it actually contains a backslash followed by 'r' (and likewise for 'n'), then your regex needs a simple tweak: s.replaceAll("\\\\r\\\\n", "")
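To cover both possibilities in one place, a hedged sketch (class and method names are mine) that handles real CR+LF as well as the literal backslash sequences:

```java
public class NewlineFix {
    // Replaces both real CR+LF pairs and the literal two-character
    // sequences backslash-r backslash-n with an HTML line break.
    public static String toBr(String s) {
        return s.replaceAll("\r\n", "<br/>")          // actual CR and LF characters
                .replaceAll("\\\\r\\\\n", "<br/>");   // literal "\r\n" text from the DB
    }

    public static void main(String[] args) {
        System.out.println(toBr("line one\r\nline two")); // prints line one<br/>line two
    }
}
```

If only one form can occur in your data, a single replaceAll with the matching pattern is enough; running both is just a belt-and-braces option.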