Remove the comma from the end of each line of a CSV file using Java
Ex:
"A","B","C","D","E",""
I only need the fields up to "E".
Not quite sure, but I think you can achieve that with a BufferedReader: read each line in, cut off the trailing comma, and write the result back out with a BufferedWriter.
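A minimal sketch of that idea, assuming the unwanted tail is either a bare comma or an empty quoted field (,"") as in the example line. The class and method names are made up for illustration, and UTF-8 is an assumption about the file encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
class TrailingCommaStripper {

    // Removes one trailing empty quoted field (,"") or one trailing comma from a line.
    static String stripTrailing(String line) {
        if (line.endsWith(",\"\"")) {
            return line.substring(0, line.length() - 3);
        }
        if (line.endsWith(",")) {
            return line.substring(0, line.length() - 1);
        }
        return line;
    }

    // Copies source to target line by line, applying stripTrailing to each line.
    static void rewrite(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripTrailing(line));
                writer.newLine();
            }
        }
    }
}
```

Writing to a separate target file and renaming it afterwards is usually safer than overwriting the input while it is still being read.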
I am writing some values to a CSV file, but any value containing commas gets split into more than one cell.
e.g. a,b,c is one value and should appear in 1 cell but it's appearing in 3 cells.
writer.append(node.getLongName());
This is how I am writing data to CSV files using FileWriter. If node.getLongName() returns a value containing commas, the value is split at each internal comma.
Can anyone please tell me how to make this work and avoid the value being split?
You are writing to a CSV file, but do you know which fields from your source should not be split? If you do, you can change the separator inside those fields from a comma to some other character such as '+', and then append them along with the other elements of the CSV line. As an example:
10/09/2016, cycling club,(sam+1000+oklahoma),(henry+ 1001+california),( bill+1002+NY)
Here the parentheses hold the details of each student. They were comma separated before, but I changed the separator to a plus sign.
Although it can be manipulated by hand for trivial tasks, the CSV format gets tricky as soon as you need to handle delimiter or newline escaping.
Unless you want to do the heavy testing yourself for all the corner cases, your best bet is to rely on a well-known CSV library like the one from Apache (Commons CSV).
Here it is still simple enough (assuming you only need to escape commas), and the common convention is to quote fields containing blanks or delimiters. That means writing "a,b,c" rather than a,b,c:
writer.append("\"" + node.getLongName()+ "\"");
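If the value itself might contain a double quote, the one-liner above would produce a broken field. A small sketch of a helper that also doubles embedded quotes, which is the usual CSV convention; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of any library.
class CsvQuoting {

    // Wraps a value in double quotes and doubles any embedded double quotes,
    // so commas and quotes inside the value stay within a single field.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

With this in place the write call would become writer.append(CsvQuoting.quote(node.getLongName())), assuming getLongName() never returns null.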
I need your help here. I am trying to figure out whether there is a better solution than manipulating the data in the text file.
I have a CSV file where some of the data has newline characters in it. This file is generated from the Salesforce portal.
The issue occurs when I try to read this file line by line to insert it into a SQL table in the database.
BufferedReader bReader = new BufferedReader(new InputStreamReader(con.getInputStream()));
String line = bReader.readLine();
This interprets a newline inside the data as the end of the line, so a single record is read as two lines.
Any idea how I can handle such data while reading the file?
Any insights would be highly appreciated.
Thanks.
I assume you already know the answer to moffeltje's question, that is, how to decide which newline really ends a record and which one is embedded in the data. If so, you may use opencsv.
One of its listed features is "Handling quoted entries with embedded carriage returns (ie entries that span multiple lines)", which may help you work around the issue.
Another approach I can think of is to check the last character of each line. Each value in the .csv is in quotes, so if the last character is not a quote, append the next line to the current one until a closing quote is found.
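A rough sketch of a variation on that idea: instead of looking only at the last character, it counts the double quotes collected so far and keeps appending physical lines while the count is odd (meaning a quoted field is still open). It assumes every field is quoted and that embedded quotes are doubled; the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class LogicalLineReader {

    // True when the text holds an odd number of double quotes,
    // i.e. a quoted field has been opened but not closed yet.
    static boolean hasOpenQuote(CharSequence text) {
        int quotes = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 != 0;
    }

    // Reads physical lines and merges them into complete records.
    static List<String> readRecords(Reader source) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            StringBuilder current = new StringBuilder();
            boolean pending = false;
            String line;
            while ((line = reader.readLine()) != null) {
                if (pending) {
                    current.append('\n'); // restore the newline that readLine removed
                }
                current.append(line);
                pending = true;
                if (!hasOpenQuote(current)) {
                    records.add(current.toString());
                    current.setLength(0);
                    pending = false;
                }
            }
            if (pending) {
                records.add(current.toString()); // input ended inside an open quote
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

In the question's setup the reader would be new InputStreamReader(con.getInputStream()). Recounting quotes on every appended line is quadratic for very long records, which is one reason a library such as opencsv is the safer choice.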
I'm currently writing something which validates our VBScript files. Right at the start I want to remove all lines of code that are comments. I was expecting to be able to use "'" (the comment symbol in VBScript) and '\n' to find them. However, when I write the content of the file to the screen, the newlines do not show up. Does this mean there are actually no newlines in the original VBScript file and, if not, how could I remove the comments?
First read the whole file into a string,
then use a regex or simple substring operations to remove the comments.
How are you parsing the file? Are you also taking the '\r' into consideration when removing the comments? Or maybe you are accidentally removing all newline characters.
I would create some state flags to tell the parser when I was in a comment or not.
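A small sketch of the state-flag idea, written in Java on the assumption that the validator is a Java program. It only handles the apostrophe comment marker (not Rem) and treats anything between double quotes as a string literal; the names are hypothetical.

```java
// Hypothetical helper, not part of any library.
class VbsCommentStripper {

    // Returns the line with any apostrophe comment cut off. Apostrophes inside
    // double-quoted string literals are left alone.
    static String stripComment(String line) {
        boolean inString = false; // state flag: currently inside "..."
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inString = !inString; // a doubled "" toggles twice, so the state is kept
            } else if (c == '\'' && !inString) {
                return line.substring(0, i);
            }
        }
        return line;
    }

    // Strips comments from a whole script and drops lines that end up blank.
    // Splitting on \r?\n copes with both Windows and Unix line endings.
    static String stripAll(String script) {
        StringBuilder out = new StringBuilder();
        for (String line : script.split("\r?\n")) {
            String code = stripComment(line);
            if (!code.trim().isEmpty()) {
                out.append(code).append(System.lineSeparator());
            }
        }
        return out.toString();
    }
}
```

Note that stripAll also drops lines that were blank in the original script, which may or may not be what a validator wants.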
What is the most efficient way to split a file in Java?
Like to get it grid ready...
(Edit)
Modifying the question.
Basically, after scouring the net, I understand that there are generally two methods for file splitting:
1. Just split them by the number of bytes
   I guess the advantage of this method is that it is fast. But say all my data for a record is on one line, and the split puts half of it in one part and the other half in the next part. What do I do then?
2. Read them line by line
   This will keep my data intact, fine, but I suppose it isn't as fast as the above method.
Well, just read the file line by line and start saving it to a new file. Then when you decide it's time to split, start saving the lines to a new place.
Don't worry about efficiency too much unless it's a real problem later.
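A minimal sketch of the line-by-line approach, assuming the rule for "time to split" is a fixed number of lines per part. The part-N.txt naming, the UTF-8 encoding and the class name are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class LineFileSplitter {

    // Splits source into files of at most linesPerPart lines each, written to
    // outputDir as part-0.txt, part-1.txt, and so on. Returns the part paths.
    static List<Path> split(Path source, Path outputDir, int linesPerPart) {
        if (linesPerPart <= 0) {
            throw new IllegalArgumentException("linesPerPart must be positive");
        }
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInPart = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new part for the first line and whenever the current part is full.
                    if (writer == null || linesInPart == linesPerPart) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path part = outputDir.resolve("part-" + parts.size() + ".txt");
                        writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        parts.add(part);
                        linesInPart = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInPart++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }
}
```

Because both the reader and the writers are buffered, this tends to be dominated by disk speed rather than by the per-line work, which fits the advice not to worry about efficiency up front.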
My first impression is that you have something like a comma separated value (csv) file. The usual way to read / parse those files is to
read them line by line
skip headers and empty lines
use String#split(String reg) to split a line into values (reg is chosen to match the delimiter)
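The three steps above can be sketched roughly like this, assuming the first line is the only header and no field contains the delimiter or a newline (plain String#split cannot cope with quoted delimiters). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class SimpleCsvReader {

    // Reads rows line by line, skipping the first (header) line and blank lines,
    // and splits each remaining line on the given delimiter regex.
    static List<String[]> readRows(Reader source, String delimiterRegex) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // assumed header line, discarded
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip empty lines
                }
                // Limit -1 keeps trailing empty values instead of dropping them.
                rows.add(line.split(delimiterRegex, -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

The -1 limit matters for CSV: without it, String#split silently discards trailing empty strings, so a row ending in empty fields would come back shorter than the others.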
I have a CSV file in the format below. I get an issue if either of the following CSV lines is read by the program:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below is used to ignore commas inside double quotes. I got it from an earlier post, linked below.
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 15);
System.out.println("items.length"+items.length);
Regarding Java Split Command Parsing Csv File
The items.length is printed as 14 instead of 15. The value abc"def is not recognized as an individual field and is incorrectly stored as
"D",abc"def in items[0]. I want it to be stored as follows:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when the value is "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
Also, this split command works perfectly if the double quote is doubled inside the quoted field (for example D,"abc""def",1).
How can I resolve this issue?
I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files with carriage returns within the fields, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that goes through all the characters and splits up the data. It would be a lot easier to deal with "non-standard"* CSV files such as yours when you have a parser rather than a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
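A rough sketch of such a character loop, under one assumed leniency rule: a quote opens a field only when it is the first character of the field, and closes it only when it is immediately followed by a comma or the end of the line. Any other quote is treated as ordinary data. Quotes are kept in the output, as the question asks. The class name is made up, and this is not how opencsv or any other library works.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class LenientCsvSplitter {

    // Splits one line into fields, keeping the surrounding quotes and
    // tolerating stray quotes in the middle of a field.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                boolean atFieldStart = field.length() == 0;
                boolean beforeDelimiter = i + 1 == line.length() || line.charAt(i + 1) == ',';
                if (!inQuotes && atFieldStart) {
                    inQuotes = true;   // opening quote
                } else if (inQuotes && beforeDelimiter) {
                    inQuotes = false;  // closing quote
                }
                field.append(c);       // every quote is kept as data
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

The rule is a heuristic: a quoted value that itself contains a quote directly followed by a comma is still ambiguous and will be split early. It also works on one physical line at a time, so embedded newlines need to be handled before calling it.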
opencsv is a great, simple, lightweight CSV parser for Java. It will easily handle your data.
If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882
opencsv is a very simple API and a good choice for CSV parsing. The file can also be cleaned up with Linux sed commands before it is processed in Java. If the file is not in a proper format, convert the column delimiter (",") into a pipe or some other unique delimiter, so that commas inside a field value and the column delimiter can easily be told apart by opencsv. Use the power of Linux with your Java code.