Read a CSV file with newline characters in the data (Java)

I need your help here. I'm trying to figure out whether there is a better solution than manipulating the data in the text file.
I have a CSV file, generated from the Salesforce portal, in which some of the data contains newline characters.
The issue occurs when I read this file line by line to insert it into a SQL table in the database:
BufferedReader bReader = new BufferedReader(new InputStreamReader(con.getInputStream()));
String line = bReader.readLine();
readLine() interprets the newline inside the data as the end of the line, so this record is read as two lines.
Any idea how I can handle such data while reading the file?
Any insights would be highly appreciated.
Thanks.

I assume you already know the answer to moffeltje's question, i.e. how to decide which newline really ends a record and which one is embedded. If so, you may use opencsv.
One of its features is "Handling quoted entries with embedded carriage returns (ie entries that span multiple lines)." This may help you work around the issue.
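If adding opencsv is not an option, the idea behind "quoted entries that span multiple lines" can be sketched with the standard library alone. This is a hand-rolled stand-in, not opencsv's API: the class name QuotedCsvReader is hypothetical, and it assumes RFC 4180-style quoting (fields wrapped in double quotes, a literal quote written as ""). Because it reads character by character instead of calling readLine(), a newline inside quotes stays part of the field.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal reader: quoted fields may contain commas, "" and newlines. */
class QuotedCsvReader {
    private final Reader in;
    private int peeked = -2; // -2 means "nothing peeked yet"

    QuotedCsvReader(Reader in) {
        this.in = in;
    }

    private int read() throws IOException {
        if (peeked != -2) {
            int c = peeked;
            peeked = -2;
            return c;
        }
        return in.read();
    }

    private int peek() throws IOException {
        if (peeked == -2) {
            peeked = in.read();
        }
        return peeked;
    }

    /** Returns the fields of the next record, or null at end of input. */
    List<String> readRecord() throws IOException {
        int c = read();
        if (c == -1) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        while (true) {
            if (inQuotes) {
                if (c == -1) {
                    break;                          // unterminated quote: keep what we have
                }
                if (c == '"') {
                    if (peek() == '"') {
                        read();                     // "" inside quotes is a literal quote
                        field.append('"');
                    } else {
                        inQuotes = false;           // closing quote
                    }
                } else {
                    field.append((char) c);         // includes embedded \r and \n
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n' || c == -1) {
                break;                              // unquoted newline ends the record
            } else if (c != '\r') {
                field.append((char) c);             // bare \r outside quotes is dropped
            }
            c = read();
        }
        fields.add(field.toString());
        return fields;
    }
}
```

With the code from the question you would wrap the connection stream, e.g. new QuotedCsvReader(new BufferedReader(new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))), and call readRecord() until it returns null. Dropping a bare \r outside quotes is a deliberate choice so CRLF line endings work.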

Another approach I can think of is to check the last character of each line. Every value in the .csv is in quotes, so if the last character is not a quote, append the next line to it, and keep appending until a line ending in a quote is found.
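A minimal sketch of this last-character check, assuming (as the answer does) that every value is quoted. The class and method names are hypothetical; physical lines are rejoined with \n so the embedded newline survives in the record. It is a heuristic, not a parser: a value whose embedded line break comes right after a quote character (including an opening quote) will end the record too early.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class QuoteTerminatedLineJoiner {
    /** Joins physical lines until the accumulated record ends with '"'. */
    static List<String> readRecords(BufferedReader reader) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder pending = null;
        String line;
        while ((line = reader.readLine()) != null) {
            // Start a new record, or append this line (with its newline) to the open one.
            pending = (pending == null) ? new StringBuilder(line) : pending.append('\n').append(line);
            if (line.endsWith("\"")) {
                // Last character is a quote: assume the record is complete.
                records.add(pending.toString());
                pending = null;
            }
        }
        if (pending != null) {
            records.add(pending.toString()); // keep a trailing incomplete record
        }
        return records;
    }
}
```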

Related

How to remove the comma from each line of a CSV file using Java

Remove the comma from the end of each line of a CSV file using Java.
Ex:
"A","B","C","D","E",""
I need everything up to "E".
I'm not quite sure, but I think you can achieve that by using a BufferedReader. Read the text, cut out the trailing commas, and write the result back out using a BufferedWriter.
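A sketch of that suggestion, assuming the unwanted tail is either a bare trailing comma or a trailing empty quoted field ,"" as in the example. The class and method names are hypothetical; it writes to a separate output file rather than editing the input in place.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TrailingCommaStripper {
    /** Removes a trailing ,"" (empty quoted field) or a bare trailing comma. */
    static String strip(String line) {
        if (line.endsWith(",\"\"")) {
            return line.substring(0, line.length() - 3);
        }
        if (line.endsWith(",")) {
            return line.substring(0, line.length() - 1);
        }
        return line;
    }

    /** Copies `in` to `out` line by line, stripping each line's trailing empty field. */
    static void rewrite(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(strip(line));
                writer.newLine();
            }
        }
    }
}
```

For the example line, strip turns "A","B","C","D","E","" into "A","B","C","D","E".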

Java - CSVReader split correctly with comma inside the values

I am reading a csv file in java using CSVReader. It is "working" but I've found a minor problem when I try to split the line into an array of strings like this:
reader = new CSVReader(new FileReader(csvFile));
line = reader.readNext();
String[] lineDetail = line[0].split(";", -1);
Here is my problem: the line below works correctly:
[ABEL MESQUITA JR.;178957;1;2015;RR;DEM;55;1;MANUTENÇÃO DE ESCRITÓRIO DE APOIO À ATIVIDADE PARLAMENTAR;0;;WM PAPELARIA E ESCRITÓRIO;12132854000100;3592;0;2017-04-26 00:00:00;296;0;296;4;2017;0;;;1377952;5828;0;3074;6266962]
But when I read the line below using CSVReader, it results in 3 arrays of strings:
[ABEL MESQUITA JR.;178957;1;2015;RR;DEM;55;3;COMBUSTÍVEIS E LUBRIFICANTES.;1;Veículos Automotores;B.B. PETROLEO LTDA;03625917000170;4339;0;2017-01-31 00:00:00;4007, 06;0;4007, 06;1;2017;0;;;1354058;5711;0;3074;6196889]
The arrays look like this:
ABEL MESQUITA JR.;178957;1;2015;RR;DEM;55;3;COMBUSTÍVEIS E LUBRIFICANTES.;1;Veículos Automotores;B.B. PETROLEO LTDA;03625917000170;4339;0;2017-01-31 00:00:00;4007
06;0;4007
06;1;2017;0;;;1354058;5711;0;3074;6196889
I think the problem is caused by the value 4007, 06, since in the first line the value is the integer 296.
Does anyone know how to make CSVReader return only one array instead of 3?
Thanks in advance!!
EDIT 1
The result I need is the second and third arrays concatenated with the first, so that 4007,06 stays together instead of being separated.
Your line of CSV data is getting split on commas, because the comma is the default CSV field separator.
To avoid splitting on commas, you could initialize your CSV reader, before reading the file, to use some unused character (for example Tab) as the field separator. That way you would get one field per row. But that workaround is unnecessary, because there is an easier way:
Why not configure ; as the CSV field separator? Then the reader splits your CSV directly into semicolon-separated fields and you won't need any additional splitting. This will also solve your problem with commas.
Your line[0].split(";", -1) is essentially a bug, because it cannot split this valid CSV into 2 values:
Value 1; "Value;2"
Using your approach you will get 3 values instead.
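To illustrate why quote-awareness matters, here is a minimal standard-library sketch of a semicolon splitter that ignores separators inside double quotes and treats "" inside quotes as a literal quote. It is not CSVReader's implementation (the class name is hypothetical); with a real CSV library the equivalent is configuring ; as the separator, as the answer recommends. Note that it keeps the space that precedes the quoted value.

```java
import java.util.ArrayList;
import java.util.List;

class SemicolonSplitter {
    /** Splits one line on ';' characters that are outside double quotes. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside quotes is an escaped quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ';' && !inQuotes) {
                fields.add(field.toString()); // separator outside quotes ends the field
                field.setLength(0);
            } else {
                field.append(c); // commas are ordinary characters here
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

For Value 1; "Value;2" this yields 2 fields (Value 1 and " Value;2"), whereas String.split(";", -1) yields 3; a value like 4007, 06 stays in one field because commas are not separators.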
For further advice on CSVReader, please say which one you are using (by giving its package).

Skip empty lines while CSV parsing

I'm currently pulling a CSV file from a URL and modifying its entries. I'm using a StreamReader to read each line of the CSV and split it into an array, where I can modify each entry based on its position.
The CSV is generated by an e-form provider in which one form entry is a multi-line field where a user can add multiple notes. However, users separate each note with a line break.
CSV Example:
"FName","LName","Email","Note 1: some text
Note 2: some text"
Since my code splits the CSV into entries line by line, when it reaches these notes it treats the second line as a new CSV entry. This breaks my code that modifies the entries, because the element positions become incorrect. (CSV entries with empty or single-line note fields work fine.)
Any ideas on the best approach for this? I've tried adding code to replace carriage returns or to skip empty lines, but it doesn't seem to help.
You can check whether the first column value in a row is null. If it is, continue reading the next line.
Assuming the CSV example you provided is supposed to be a single entry in the CSV file (with the last field spanning several lines because of line breaks), you could try something like this, using two loops:
1. Keep a variable for the current CSV record (of type String[]), currentRecord, and a recordList (a List or an array) to hold all the CSV records.
2. Read a line of the CSV file.
3. Split it into an array of strings using the comma as the delimiter. Keep this array in a temporary variable.
4. If the size of this array is 1, append this string to the last (4th) element of currentRecord (if currentRecord is not null).
5. Keep reading lines from the CSV file, repeating step 4 until the array size is 4.
6. If the size is 4, the line is the next record in the CSV file, so add currentRecord to recordList and make this array the new currentRecord.
7. Keep repeating steps 2 to 6 until you reach the end of the CSV file.
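A sketch of these steps, assuming exactly 4 fields per record as in the example (FName, LName, Email, notes). The class name is hypothetical, and it takes the lines as a List<String> so it stays independent of how the file is read. Unlike step 4, it treats any line that does not split into exactly 4 parts as a continuation, so a note line containing a comma is appended too; a note line with exactly 3 commas would still be misread as a new record.

```java
import java.util.ArrayList;
import java.util.List;

class MultiLineNoteMerger {
    static final int FIELDS = 4; // FName, LName, Email, Notes (assumed from the example)

    static List<String[]> merge(List<String> lines) {
        List<String[]> recordList = new ArrayList<>();
        String[] currentRecord = null;
        for (String line : lines) {
            String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
            if (parts.length == FIELDS) {
                // A full-width line starts a new record; store the previous one.
                if (currentRecord != null) {
                    recordList.add(currentRecord);
                }
                currentRecord = parts;
            } else if (currentRecord != null) {
                // Continuation line: append it, with the line break, to the last field.
                currentRecord[FIELDS - 1] += "\n" + line;
            }
        }
        if (currentRecord != null) {
            recordList.add(currentRecord); // the last record has no successor line
        }
        return recordList;
    }
}
```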
It would be better, though, if you could remove the line breaks in the field and clean the CSV file before parsing it. That makes things much simpler.
Use a proper CSV library to handle the writing and parsing. There are a few edge cases to handle here, not only the newline: users could also insert commas or quotes in their notes, and it becomes very messy to handle this yourself.
Try uniVocity-parsers, as it can handle all sorts of situations when parsing and writing CSV.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Java File Splitting

What is the most efficient way to split a file in Java?
I'd like to get it grid-ready...
(Edit)
Modifying the question.
Basically, after scouring the net, I understand that there are generally two methods for file splitting:
1. Split by number of bytes. I guess the advantage of this method is that it is fast, but suppose all the data for a record is on one line and the split puts half of it in one chunk and the other half in another; what do I do then?
2. Read line by line. This keeps my data intact, but I suppose it isn't as fast as the method above.
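A common way to keep the speed of byte-based splitting without cutting a line in half is to compute the byte cut points first and then move each one forward to just past the next newline. This is a sketch of that technique under stated assumptions: the class name is hypothetical, lines end with '\n', and the encoding is byte-oriented such as ASCII or UTF-8 (where a '\n' byte never occurs inside a multi-byte character). It only aligns on physical lines, so quoted CSV fields containing newlines could still be split.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class NewlineAlignedSplit {
    /**
     * Computes start offsets for splitting a file into about `parts` byte ranges.
     * Each nominal cut at size*k/parts moves forward to just past the next '\n'.
     * Chunk i covers [starts[i], starts[i+1]), the last chunk runs to end of file.
     */
    static List<Long> chunkStarts(RandomAccessFile file, int parts) throws IOException {
        long size = file.length();
        List<Long> starts = new ArrayList<>();
        starts.add(0L);
        for (int k = 1; k < parts; k++) {
            long cut = size * k / parts;
            if (cut == 0) {
                continue;
            }
            file.seek(cut - 1); // start at the byte just before the nominal cut
            int b;
            while ((b = file.read()) != -1 && b != '\n') {
                // skip the rest of the line the cut landed in
            }
            long start = file.getFilePointer(); // just past '\n', or end of file
            if (start > starts.get(starts.size() - 1) && start < size) {
                starts.add(start);
            }
        }
        return starts;
    }
}
```

Each chunk can then be read independently (for example by seeking to its start offset), and every chunk begins at the start of a line.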
Well, just read the file line by line and start saving it to a new file. Then when you decide it's time to split, start saving the lines to a new place.
Don't worry about efficiency too much unless it's a real problem later.
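A minimal sketch of this line-by-line approach that starts a new output file every linesPerFile lines. The class name, the linesPerFile parameter and the part-N.txt naming scheme are hypothetical choices for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    /** Copies `input` into files of at most `linesPerFile` lines each; returns their paths. */
    static List<Path> split(Path input, Path outDir, int linesPerFile) throws IOException {
        List<Path> outputs = new ArrayList<>();
        BufferedWriter writer = null;
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (writer == null || count == linesPerFile) {
                    // Time to split: close the current part and start a new one.
                    if (writer != null) {
                        writer.close();
                    }
                    Path part = outDir.resolve("part-" + outputs.size() + ".txt");
                    outputs.add(part);
                    writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    count = 0;
                }
                writer.write(line);
                writer.newLine();
                count++;
            }
        } finally {
            if (writer != null) {
                writer.close(); // closing an already closed writer has no effect
            }
        }
        return outputs;
    }
}
```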
My first impression is that you have something like a comma-separated values (CSV) file. The usual way to read/parse such files is to:
read them line by line
skip headers and empty lines
use String#split(String regex) to split a line into values (regex is chosen to match the delimiter)
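Those three steps might look like this for simple delimited data without quoting, i.e. where the delimiter never appears inside a value. The class name and the headerLines parameter are hypothetical. Passing -1 as the limit to split keeps trailing empty values, which String.split would otherwise drop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SimpleDelimitedReader {
    /** Reads rows line by line, skipping `headerLines` header lines and blank lines. */
    static List<String[]> read(BufferedReader reader, String delimiterRegex, int headerLines)
            throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (lineNo <= headerLines || line.trim().isEmpty()) {
                continue; // skip headers and empty lines
            }
            rows.add(line.split(delimiterRegex, -1)); // -1 keeps trailing empty values
        }
        return rows;
    }
}
```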

Regarding Java Split Command CSV File Parsing

I have a CSV file in the format below. I get an issue when either of the CSV lines below is read by the program:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below, which I got from an earlier post (referenced after the code), is meant to ignore the commas inside double quotes:
String[] items = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 15);
System.out.println("items.length " + items.length);
Regarding Java Split Command Parsing Csv File
items.length is printed as 14 instead of 15. abc"def is not recognized as an individual field and is incorrectly stored as "D",abc"def in items[0]. I want it to be stored as follows:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when there is a value "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
Also, this split command works perfectly if the double quotes inside the double quotes are doubled (field value is D,"abc""def",1).
How can I resolve this issue?
I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files that have carriage returns within the fields, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that goes through all the characters and splits up the data. It is a lot easier to deal with "non-standard"* CSV files such as yours with a parser than with a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
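The character-by-character loop described above could be sketched as follows. The splitting rule is a hypothetical, lenient one chosen to reproduce the outputs asked for in the question, not a standard: a field that starts with a quote ends at the first quote followed by a comma or end of line, any other field ends at the next comma, and quotes are kept in the returned text. It handles single lines only, so embedded newlines would still need a record-level reader.

```java
import java.util.ArrayList;
import java.util.List;

class LenientCsvLineSplitter {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            int end;
            if (i < n && line.charAt(i) == '"') {
                // Quoted field: ends at a quote followed by ',' or end of line.
                int j = i + 1;
                while (j < n && !(line.charAt(j) == '"' && (j + 1 == n || line.charAt(j + 1) == ','))) {
                    j++;
                }
                end = Math.min(j + 1, n); // include the closing quote (or take the rest if unterminated)
            } else {
                // Unquoted field: ends at the next comma; quotes inside are literal.
                int comma = line.indexOf(',', i);
                end = comma < 0 ? n : comma;
            }
            fields.add(line.substring(i, end));
            if (end >= n) {
                break;
            }
            i = end + 1; // skip the comma that separates this field from the next
        }
        return fields;
    }
}
```

On the first sample line this gives 15 fields with items[0] = "D" and items[1] = abc"def; on the second it gives items[1] = "abc"def", and D,"abc""def",1 still splits into 3 fields.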
opencsv is a great, simple and lightweight CSV parser for Java. It will easily handle your data.
If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882
Opencsv is a very simple and capable API for CSV parsing. The cleanup can also be done with Linux sed commands before processing the file in Java: if the file is not in a proper format, convert your delimiter (",") into a pipe or another unique delimiter, so that Opencsv can easily tell values inside fields apart from column delimiters. Use the power of Linux together with your Java code.
