I'm pulling a CSV file from a URL and modifying its entries. I'm currently using a StreamReader to read each line of the CSV and split it into an array, where I can modify each entry based on its position.
The CSV is generated by an e-form provider, and one particular form entry is a multi-line field in which a user can add multiple notes. However, when users enter a new note, they separate each note with a line return.
CSV Example:
"FName","LName","Email","Note 1: some text
Note 2: some text"
Since my code is splitting each CSV entry by line, once it reaches these notes, it believes it to be a new CSV entry. This is causing my code that modifies the entries to not work since the element positions become incorrect. (CSV entries with empty or single line note fields work fine)
Any ideas on the best approach to take for this? I've tried adding code to replace carriage returns or to skip empty lines but it doesn't seem to help.
You can check whether the first column value in a row is null. If it is null, continue reading the next line.
Assuming the CSV example you have provided is supposed to be just one entry in the CSV file (with the last field spanning over several different lines due to newline breaks), you could try something like this, using 2 loops.
1. Keep a variable for the current CSV record, currentRecord (of type String[]), and a recordList (a List or an array) to hold all the CSV records.
2. Read a line of the CSV file.
3. Split it into an array of strings using the comma as the delimiter. Keep this array in a temporary variable.
4. If the size of this array is 1, append this string to the last element (the 4th) in currentRecord (if currentRecord is not null).
5. Keep reading lines off the CSV file, repeating step 4, until the array size is 4.
6. If the size is 4, this indicates that the line starts the next record in the CSV file, and you can add the currentRecord to recordList.
7. Keep repeating steps 2 to 6 until you reach the end of the CSV file.
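The steps above can be sketched roughly as follows. This is only an illustration under the answer's own assumptions: every record has exactly 4 fields, and any line that does not split into exactly 4 parts is treated as a continuation of the previous record's last field. The names NoteLineMerger and mergeRecords are hypothetical. Notes that themselves contain commas are not handled, and the surrounding quote characters are left in place.

```java
import java.util.ArrayList;
import java.util.List;

class NoteLineMerger {
    // Assumption taken from the example: each record has exactly 4 fields.
    static final int FIELD_COUNT = 4;

    static List<String[]> mergeRecords(List<String> lines) {
        List<String[]> recordList = new ArrayList<>();
        String[] currentRecord = null;
        for (String line : lines) {
            // -1 keeps trailing empty fields instead of dropping them.
            String[] parts = line.split(",", -1);
            if (parts.length == FIELD_COUNT) {
                // A full-width line starts a new record; store the previous one.
                if (currentRecord != null) {
                    recordList.add(currentRecord);
                }
                currentRecord = parts;
            } else if (currentRecord != null) {
                // Continuation line: append it to the last field, keeping the line break.
                currentRecord[FIELD_COUNT - 1] += "\n" + line;
            }
        }
        // Do not forget the record that was still being assembled at end of input.
        if (currentRecord != null) {
            recordList.add(currentRecord);
        }
        return recordList;
    }
}
```

The input list could come from reading the downloaded file line by line; for the example CSV the two physical lines end up as one 4-element array.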
It would be better if you can remove the line breaks in the field and clean the CSV file before parsing it though. It'll make things much simpler.
Use a proper CSV library to handle the writing and parsing. There are a few edge cases to handle here, not only the new line. Users could also insert commas or quotes in their notes, and it will become very messy to handle this by yourself.
Try uniVocity-parsers as it can handle all sorts of situations when parsing and writing CSV.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
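To give a feel for the edge cases mentioned above, here is a minimal, standard-library-only sketch of quote-aware parsing. It is not uniVocity-parsers code and not a substitute for a tested library; the class name QuoteAwareCsv and its parse method are hypothetical. It assumes comma-separated fields, double quotes as the quote character, and a doubled quote ("") as the escape for a quote inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareCsv {
    // Parses the whole CSV text; line breaks inside quoted fields stay in the field.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and newlines are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Flush a final record that is not terminated by a newline.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```

Even this small sketch ignores malformed input, other delimiters, and very large files, which is part of why a dedicated library is the safer choice.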
Related
I am receiving a large production data set as a CSV file in which some records are broken across two or more lines because of newline characters inside the record. The problem is that the field values broken across multiple lines are not enclosed in double quotes.
I am parsing the file with the OpenCSV API (using the readNext method of com.opencsv.CSVReader) in Java. Is there anything in OpenCSV, or in any other API, that can identify such a split record and enclose the split values in double quotes, so that readNext treats it as one record in spite of the newline characters inside it?
The point is that I don't want to write complex logic to reassemble the split records, because the number of column values in a record can be less than or equal to the header count in the file.
I am reading a csv file in java using CSVReader. It is "working" but I've found a minor problem when I try to split the line into an array of strings like this:
reader = new CSVReader(new FileReader(csvFile));
line = reader.readNext();
String[] lineDetail = line[0].split(";", -1);
Here is my problem: the line below works correctly:
[ABEL MESQUITA JR.;178957;1;2015;RR;DEM;55;1;MANUTENÇÃO DE ESCRITÓRIO DE APOIO À ATIVIDADE PARLAMENTAR;0;;WM PAPELARIA E ESCRITÓRIO;12132854000100;3592;0;2017-04-26 00:00:00;296;0;296;4;2017;0;;;1377952;5828;0;3074;6266962]
But when I read the line below using CSVReader, the result is 3 arrays of strings:
[ABEL MESQUITA JR.;178957;1;2015;RR;DEM;55;3;COMBUSTÍVEIS E LUBRIFICANTES.;1;Veículos Automotores;B.B. PETROLEO LTDA;03625917000170;4339;0;2017-01-31 00:00:00;4007, 06;0;4007, 06;1;2017;0;;;1354058;5711;0;3074;6196889]
The arrays look like this:
ABEL MESQUITA JR.;178957;1;2015;RR;DEM;55;3;COMBUSTÍVEIS E LUBRIFICANTES.;1;Veículos Automotores;B.B. PETROLEO LTDA;03625917000170;4339;0;2017-01-31 00:00:00;4007
06;0;4007
06;1;2017;0;;;1354058;5711;0;3074;6196889
I think the problem is caused by the value 4007, 06, since in the first line the corresponding value is the integer 296.
Does anyone know how to make CSVReader return only one array instead of 3?
Thanks in advance!!
EDIT 1
The result that I need is the second and third array concatenated with the first. So I would have the 4007,06 together instead of separated.
Your line of CSV data is being split on commas, the default CSV field separator.
To avoid splitting on commas, initialize your CSV reader, before reading the file, to use some unused character (for example Tab) as the CSV field separator. This way you will get one field per row. But that approach seems pointless, because there is an easier way:
Why don't you configure the CSV field separator to be ; so that the reader splits your CSV directly into semicolon-separated fields and you don't need any additional splitting? This will also solve your problem with commas.
Your line[0].split(";", -1) is basically a bug, because it won't be able to split this valid CSV into 2 values:
Value 1; "Value;2"
Using your approach you will get 3 values instead.
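The difference can be shown with a small sketch using only the standard library. The class SemicolonSplitDemo and both method names are hypothetical; quoteAwareSplit is a simplified stand-in for what a CSV reader configured with ; as its separator does, and it does not handle escaped quotes inside a field.

```java
import java.util.ArrayList;
import java.util.List;

class SemicolonSplitDemo {
    // The approach from the question: split on every semicolon, quoted or not.
    static String[] naiveSplit(String line) {
        return line.split(";", -1);
    }

    // Simplified quote-aware split: semicolons inside double quotes are data.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;       // toggle state, drop the quote itself
            } else if (c == ';' && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

For the line Value 1; "Value;2", naiveSplit returns three pieces, while quoteAwareSplit returns two fields (the second keeps its leading space).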
To get further advice on CSVReader, please add which one you are using (by stating its package).
I am writing some values to a CSV file, but a value containing commas gets split into more than one cell.
e.g. a,b,c is one value and should appear in 1 cell, but it appears in 3 cells.
writer.append(node.getLongName());
This is how I am writing data to CSV files using FileWriter. If node.getLongName() returns a value containing commas, the value is split at its internal commas.
Can anyone please tell me how to make this work and avoid splitting the value?
You are writing to a CSV file, but do you know which fields from your source file should not be separated? If you do, you can change the separator within that field from a comma to some other separator such as '+', and then append it along with the other elements of the CSV. As an example:
10/09/2016, cycling club,(sam+1000+oklahoma),(henry+ 1001+california),( bill+1002+NY)
Here, the parentheses contain the details of students. They were comma separated before, but I changed the separator to a plus sign.
Although it can be manipulated by hand for trivial tasks, the CSV format gets tricky as soon as you need to handle delimiter or newline escaping.
Unless you want to do the heavy testing yourself for all corner cases, your best bet is to rely on a well-known CSV library like the one from Apache.
Here it is still simple enough (assuming you only need to escape commas), and the common usage is to quote fields containing blanks or delimiters. That means writing "a,b,c" rather than a,b,c:
writer.append("\"" + node.getLongName()+ "\"");
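The one-liner above breaks if the value itself contains a double quote. A slightly more defensive sketch, assuming the common CSV convention of doubling embedded quotes, might look like this. The class CsvFieldEscaper and its escape method are hypothetical helper names, not part of any library.

```java
class CsvFieldEscaper {
    // Quotes a field only when needed and doubles any embedded double quotes.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",")
                || value.contains("\"")
                || value.contains("\n")
                || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

With such a helper the write would become writer.append(CsvFieldEscaper.escape(node.getLongName())); a CSV library does the same job with more corner cases covered.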
I want to write strings to a text file, always appending at the bottom of the file. Then, if I search for a certain string in the text file and find it, I want to replace that line with another.
I'm thinking of this: count the rows in the text file, add 1, and then write the string I want to that index. But is it even possible to write to a certain line number in a text file?
And how would I update a certain row to another string?
Thanks!
You do not want to do that: it is a recipe for disaster. If, during the original file modification, you fail to write to it, the original file will be corrupted.
Use a double-write protocol: write the modified content to another file, and only if the write succeeds, rename that file to the original.
Provided your file is not too big, for some definition of "big", I'd recommend creating a List<String> for the destination file: read the original file line by line, add to that list; once the list processing is complete (your question is unclear what should really happen), write each String to the other file, flush and close, and if that succeeds, rename to the original.
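A minimal sketch of this write-then-rename approach with java.nio.file follows. Since the question does not say exactly what should happen to matching lines, the sketch assumes that every line containing the search string is replaced as a whole. SafeLineReplacer and replaceLine are hypothetical names, and the whole file is held in memory.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class SafeLineReplacer {
    static void replaceLine(Path original, String search, String replacement) throws IOException {
        // Build the new content in memory first.
        List<String> output = new ArrayList<>();
        for (String line : Files.readAllLines(original)) {
            output.add(line.contains(search) ? replacement : line);
        }
        // Write to a temporary file in the same directory so the rename stays on one file system.
        Path dir = original.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "rewrite", ".tmp");
        try {
            Files.write(temp, output);
            try {
                // Only after a successful write is the original replaced.
                Files.move(temp, original,
                        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a plain replace where atomic moves are unavailable.
                Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up if something failed.
            Files.deleteIfExists(temp);
        }
    }
}
```

If the write to the temporary file fails, the original file is never touched, which is the point of the protocol.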
If you want to append strings, FileOutputStream has an alternate constructor that takes a boolean append flag; pass true to open the file for appending.
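As a small sketch of appending with that constructor (the class LineAppender and its appendLine method are hypothetical names, and UTF-8 is an assumed encoding):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class LineAppender {
    static void appendLine(String path, String line) throws IOException {
        // The second constructor argument (true) opens the file in append mode.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(path, true), StandardCharsets.UTF_8)) {
            out.write(line);
            out.write(System.lineSeparator());
        }
    }
}
```

Each call adds one line at the end of the file and creates the file if it does not exist yet.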
If you'd like, say, to replace strings in a file without copying it, your best bet would be to rely on RandomAccessFile instead. However, if the line length varies, this is unreliable. For fixed-length records, this should work as follows:
Move to the offset
Write
You can also 'truncate' (via setLength), so if there's a trailing block you need to get rid of, you can discard it that way.
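For fixed-length records, the seek, write, and truncate steps could be sketched like this. The record length of 16 bytes, the space padding, the ASCII encoding, and the names FixedRecordFile, overwrite, and truncateTo are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FixedRecordFile {
    // Assumed fixed record size in bytes.
    static final int RECORD_LENGTH = 16;

    // Overwrites the record at the given index in place.
    static void overwrite(String path, int index, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        if (data.length > RECORD_LENGTH) {
            throw new IllegalArgumentException("text does not fit into one record");
        }
        byte[] record = new byte[RECORD_LENGTH];
        Arrays.fill(record, (byte) ' ');            // pad with spaces
        System.arraycopy(data, 0, record, 0, data.length);
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.seek((long) index * RECORD_LENGTH); // move to the offset
            raf.write(record);                      // write
        }
    }

    // Discards everything after the first recordCount records.
    static void truncateTo(String path, int recordCount) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.setLength((long) recordCount * RECORD_LENGTH);
        }
    }
}
```

Because every record occupies the same number of bytes, the offset of record n is simply n times the record length; with variable-length lines no such calculation is possible.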
A third solution would be to rely on mmap. This requires a memory-mapped ByteBuffer for the whole file. I haven't considered the full feasibility of this solution (it works in plain C), but it actually 'looks' the most correct if you consider both the Java platform and the operating system.
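In Java, memory mapping is available through FileChannel.map. The following is only a sketch of an in-place overwrite under some assumptions: the file fits into a single mapping (under 2 GB), the replacement has the same byte length as the text it overwrites, and the names MappedFilePatcher and overwriteAt are hypothetical. A mapping cannot grow or shrink the file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFilePatcher {
    // Overwrites bytes starting at the given offset without changing the file length.
    static void overwriteAt(Path file, int offset, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = channel.size();
            if (offset < 0 || (long) offset + data.length > size) {
                throw new IllegalArgumentException("write would go past the end of the file");
            }
            // Map the whole file read-write; changes go to the file via the OS page cache.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buffer.position(offset);
            buffer.put(data);
            buffer.force(); // ask the OS to flush the changes to storage
        }
    }
}
```

As with RandomAccessFile, this only helps when the replacement occupies exactly the same bytes as the original text.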
What would be the most efficient way to split a file in Java?
Like to get it grid ready...
(Edit)
Modifying the question.
Basically after scouring the net I understand that there are generally two methods followed for file splitting....
Just split them by the number of bytes
I guess the advantage of this method is that it is fast, but say I have all the data in a line and suppose the file split puts half the data in one split and the other half the data in another split, then what do I do ??
Read them line by line
This will keep my data intact, fine, but I suppose this ain't as fast as the above method
Well, just read the file line by line and start saving it to a new file. Then when you decide it's time to split, start saving the lines to a new place.
Don't worry about efficiency too much unless it's a real problem later.
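A line-by-line split along these lines might look as follows. The rule for when it is "time to split" is an assumption here (a fixed number of lines per part), as are the names LineFileSplitter and split and the .partN file naming.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineFileSplitter {
    // Copies the source into parts of at most linesPerPart lines; no line is cut in half.
    static List<Path> split(Path source, Path outDir, int linesPerPart) throws IOException {
        if (linesPerPart <= 0) {
            throw new IllegalArgumentException("linesPerPart must be positive");
        }
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source)) {
            BufferedWriter writer = null;
            try {
                int count = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    if (writer == null || count == linesPerPart) {
                        // Time to split: close the current part and start a new one.
                        if (writer != null) {
                            writer.close();
                        }
                        Path part = outDir.resolve(source.getFileName() + ".part" + parts.size());
                        parts.add(part);
                        writer = Files.newBufferedWriter(part);
                        count = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    count++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return parts;
    }
}
```

A size-based rule (start a new part once a byte budget is exceeded) would fit into the same loop by replacing the line counter with a running byte count.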
My first impression is that you have something like a comma separated value (csv) file. The usual way to read / parse those files is to
read them line by line
skip headers and empty lines
use String#split(String reg) to split a line into values (reg is chosen to match the delimiter)
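The three steps listed above can be sketched as below. It assumes the first non-empty line is the only header and that fields contain neither quoted delimiters nor line breaks; SimpleCsvReader and read are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCsvReader {
    // Returns one String[] per data line; the first non-empty line is treated as the header.
    static List<String[]> read(List<String> lines, String delimiterRegex) {
        List<String[]> rows = new ArrayList<>();
        boolean headerSkipped = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;               // skip empty lines
            }
            if (!headerSkipped) {
                headerSkipped = true;   // skip the header line
                continue;
            }
            // -1 keeps trailing empty values such as "a,b,"
            rows.add(line.split(delimiterRegex, -1));
        }
        return rows;
    }
}
```

The lines could come from Files.readAllLines or from a reader loop; for files with quoted fields this simple split is not sufficient.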