How to get proper string array when parsing CSV? - java

Using jcsv I'm trying to parse a CSV to a specified type. When I parse it, it says length of the data param is 1. This is incorrect. I tried removing line breaks, but it still says 1. Am I just missing something in plain sight?
This is my input string csvString variable
"Symbol","Last","Chg(%)","Vol",
INTC,23.90,1.06,28419200,
GE,26.83,0.19,22707700,
PFE,31.88,-0.03,17036200,
MRK,49.83,0.50,11565500,
T,35.41,0.37,11471300,
This is the Parser
public class BuySignalParser implements CSVEntryParser<BuySignal> {
#Override
public BuySignal parseEntry(String... data) {
// console says "Length 1"
System.out.println("Length " + data.length);
if (data.length != 4) {
throw new IllegalArgumentException("data is not a valid BuySignal record");
}
String symbol = data[0];
double last = Double.parseDouble(data[1]);
double change = Double.parseDouble(data[2]);
double volume = Double.parseDouble(data[3]);
return new BuySignal(symbol, last, change, volume);
}
}
And this is where I use the parser (right from the example)
CSVReader<BuySignal> cReader = new CSVReaderBuilder<BuySignal>(new StringReader( csvString)).entryParser(new BuySignalParser()).build();
List<BuySignal> signals = cReader.readAll();

jcsv allows different delimiter characters. The default is semicolon. Use CSVStrategy.UK_DEFAULT to get to use commas.
Also, you have four commas, and that usually indicates five values. You might want to remove the delimiters off the end.
I don't know how to make jcsv ignore the first line
I typically use CSVHelper to parse CSV files, and while jcsv seems pretty good, here is how you would do it with CVSHelper:
Reader reader = new InputStreamReader(new FileInputStream("persons.csv"), "UTF-8");
//bring in the first line with the headers if you want them
List<String> firstRow = CSVHelper.parseLine(reader);
List<String> dataRow = CSVHelper.parseLine(reader);
while (dataRow!=null) {
...put your code here to construct your objects from the strings
dataRow = CSVHelper.parseLine(reader);
}

You shouldn't have commas at the end of lines. Generally there are cell delimiters (commas) and line delimiters (newlines). By placing commas at the end of the line it looks like the entire file is one long line.

Related

Importing two CSV files into Java and then parsing them. The first one works the second doesnt

Im working on my code where I am importing two csv files and then parsing them
//Importing CSV File for betreuen
String filename = "betreuen_4.csv";
File file = new File(filename);
//Importing CSV File for lieferant
String filename1 = "lieferant.csv";
File file1 = new File(filename1);
I then proceed to parse them. For the first csv file everything works fine. The code is
try {
Scanner inputStream = new Scanner(file);
while(inputStream.hasNext()) {
String data = inputStream.next();
String[] values = data.split(",");
int PInummer = Integer.parseInt(values[1]);
String MNummer = values[0];
String KundenID = values[2];
//System.out.println(MNummer);
//create the caring object with the required paramaters
//Caring caring = new Caring(MNummer,PInummer,KundenID);
//betreuen.add(caring);
}
inputStream.close();
}catch(FileNotFoundException d) {
d.printStackTrace();
}
I then proceed to parse the other csv file the code is
// parsing csv file lieferant
try {
Scanner inputStream1 = new Scanner(file1);
while(inputStream1.hasNext()) {
String data1 = inputStream1.next();
String[] values1 = data1.split(",");
int LIDnummer = Integer.parseInt(values1[0]);
String citynames = values1[1];
System.out.println(LIDnummer);
String firmanames = values1[2];
//create the suppliers object with the required paramaters
//Suppliers suppliers = new
//Suppliers(LIDnummer,citynames,firmanames);
//lieferant.add(suppliers);
}
inputStream1.close();
}catch(FileNotFoundException d) {
d.printStackTrace();
}
the first error I get is
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at Verbindung.main(Verbindung.java:61)
So I look at my array which is firmaname at line 61 and I think, well it's impossible that its out of range since in my CSV file there are three columns and at index 2 (which I know is the third column in the CSV file) is my list of company names. I know the array is not empty because when i wrote
`System.out.println(firmanames)`
it would print out three of the first company names. So in order to see if there is something else causing the problem I commented line 61 out and I ran the code again. I get the following error
`Exception in thread "main" java.lang.NumberFormatException: For input
string: "Ridge"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at Verbindung.main(Verbindung.java:58)`
I google these errors and you know it was saying im trying to parse something into an Integer which cannot be an integer, but the only thing that I am trying to parse into an Integer is the code
int LIDnummer = Integer.parseInt(values1[0]);
Which indeed is a column containing only Integers.
My second column is also indeed just a column of city names in the USA. The only thing with that column is that there are spaces in some town names like Middle brook but I don't think that would cause problems for a String type. Also in my company columns there are names like AT&T but i would think that the & symbol would also not cause problems for a string. I don't know where I am going wrong here.
I cant include the csv file but here is a pic of a part of it. The length of each column is a 1000.
A pic of the csv file
Scanner by default splits its input by whitespace (docs). Whitespace means spaces, tabs and newlines.
So your code will, I think, split the whole input file at every space and every newline, which is not what you want.
So, the first three elements your code will read are
5416499,Prairie
Ridge,NIKE
1765368,Edison,Cartier
I suggest using method readLine of BufferedReader then calling split on that.
The alternative is to explicitly tell Scanner how you want it to split the input
Scanner inputStream1 = new Scanner(file1).useDelimiter("\n");
but I think this is not the best use of Scanner when a simpler class (BufferedReader) will do.
First of all, I would highly suggest you try and use an existing CSV parser, for example this one.
But if you really want to use your own, you are going to need to do some simple debugging. I don't know how large your file is, but the symptoms you are describing lead me to believe that somewhere in the csv there may be a missing comma or an accidental escape character. You need to find out what line it is. So run this code and check its output before it crashes:
int line = 1;
try {
Scanner inputStream1 = new Scanner(file1);
while(inputStream1.hasNext()) {
String data1 = inputStream1.next();
String[] values1 = data1.split(",");
int LIDnummer = Integer.parseInt(values1[0]);
String citynames = values1[1];
System.out.println(LIDnummer);
String firmanames = values1[2];
line++;
}
} catch (ArrayIndexOutOfBoundsException e){
System.err.println("The issue in the csv is at line:" + line);
}
Once you find what line it is, the answer should be obvious. If not, post a picture of that line and we'll see...

Line breaks in field treated as end of line while parsing csv file

IN a csv file that I have a record that renders like this:
,"SKYY SPA MARTINI
2 oz. SKYY Vodka
Fresh cucumber
Fresh mint
Splash of simple syrup
Muddle cucumber & mint with syrup.
Add SKYY Vodka and shake with ice.
Strain into a chilled martini glass.
Garnish with a fresh mint sprig and cucumber slice.",
with each line ending with a LF carriage return.
I thought that this would be treated as a string and the carriage returns wouldn't be treated as new lines, but this isn't the case, and is breaking my script. Is there a way to have the reader only have line breaks parsed if they're not flanked by quotes? I'm currently using this as my code, couldn't find a setting for the tokenizer that would allow me to perform this action.
// instantiate description line mapper
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
DefaultLineMapper<LCBOProduct> lineMapper = new DefaultLineMapper<>();
lineMapper.setLineTokenizer(lineTokenizer);
lineMapper.setFieldSetMapper(fieldSetMapper);
// set description line mapper
reader.setLineMapper(lineMapper);
return reader;
Inspired by this CSV regex post, I have written a quick-and-dirty method for doing this:
public static void main(String[] args) {
String line = "\"BEEP\",\"BOOP\",\"TWO SHOTS\rOF VODKA\"\r\"BOOP\",\"BEEP\",\"LEMON\rWEDGES\"";
String quote = "\"";
String splitter = "\r";
String delimiter = ",";
parse(line, delimiter, quote, splitter);
}
public static void parse(String data, String delimiter, String quote, String splitter) {
String regex = splitter+"(?=(?:[^"+quote+"]*\"[^"+quote+"]*\")*[^"+quote+"]*$)";
String[] lines = data.split(regex, -1);
List<String[]> records = new ArrayList<String[]>();
for(String line : lines) {
records.add(line.split(delimiter, -1));
}
for(String[] line : records) {
for(String record : line) {
System.out.println("RECORD: " + record); //do whatever
}
}
}
Of course, considering the large size of some CSV files, you will need to chug along with a StringBuilder and likely use myStringBuilder.toString().split(regex, -1); for the parse method.
This is likely not the Spring way of doing things. But as Jim Garrison commented, this is an edge case that I'm not sure if Spring has ways of solving.
A more complex regex may be required if the records start using other nasty characters (commas, quotes, etc.). I don't know what the source of these records could be, but some sanitizing may be in order before splitting the file.

Best way to populate a user defined object using the values of string array

I am reading two different csv files and populating data into two different objects. I am splitting each line of csv file based on regex(regex is different for two csv files) and populating the object using each data of that array which is obtained by splitting each line using regex as shown below:
public static <T> List<T> readCsv(String filePath, String type) {
List<T> list = new ArrayList<T>();
try {
File file = new File(filePath);
FileInputStream fileInputStream = new FileInputStream(file);
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader)
list = bufferedReader.lines().skip(1).map(line -> {
T obj = null;
String[] data = null;
if (type.equalsIgnoreCase("Student")) {
data = line.split(",");
ABC abc = new ABC();
abc.setName(data[0]);
abc.setRollNo(data[1]);
abc.setMobileNo(data[2]);
obj = (T)abc;
} else if (type.equalsIgnoreCase("Employee")) {
data = line.split("\\|");
XYZ xyz = new XYZ();s
xyz.setName(Integer.parseInt(data[0]));
xyz.setCity(data[1]);
xyz.setEmployer(data[2]);
xyz.setDesignation(data[3]);
obj = (T)xyz;
}
return obj;
}).collect(Collectors.toList());} catch(Exception e) {
}}
csv files are as below:
i. csv file to populate ABC object:
Name,rollNo,mobileNo
Test1,1000,8888888888
Test2,1001,9999999990
ii. csv file to populate XYZ object
Name|City|Employer|Designation
Test1|City1|Emp1|SSE
Test2|City2|Emp2|
The issue is there can be a missing data for any of the above columns in the csv file as shown in the second csv file. In that case, I will get ArrayIndexOutOfBounds exception.
Can anyone let me know what is the best way to populate the object using the data of the string array?
Thanks in advance.
In addition to the other mistakes you made and that were pointed out to you in the comments your actual problem is caused by line.split("\\|") calling line.split("\\|", 0) which discards the trailing empty String. You need to call it with line.split("\\|", -1) instead and it will work.
The problem appears to be that one or more of the last values on any given CSV line may be empty. In that case, you run into the fact that String.split(String) suppresses trailing empty strings.
Supposing that you can rely on all the fields in fact being present, even if empty, you can simply use the two-arg form of split():
data = line.split(",", -1);
You can find details in that method's API docs.
If you cannot be confident that the fields will be present at all, then you can force them to be by adding delimiters to the end of the input string:
data = (line + ",,").split(",", -1);
Since you only use the first values few values, any extra trailing values introduced by the extra delimiters would be ignored.

How to write two comma seperated values as one value

I am retrieving the values using regular expression in jmeter and writing those values into a csv file.But one of my value returns values as (value1,value2), how can i add write those 2 values as one value in csv file.Below is my code
String statusvar = vars.get("guid");
String guidstat = vars.get("guidn");
String custstat = vars.get("custType");
String fpath = vars.get("write_file_path");
String newStatus;
FileWriter fstream = new FileWriter(fpath+"new_record.csv", false);
BufferedWriter out = new BufferedWriter(fstream);
out.write(statusvar+","+guidstat+","+custstat);
out.newLine();
out.flush();
Write your values within quotes and it should be OK. If a value contains quotes, then you'd need to escape them. Just replace each " by "", so value"a,valueB is written as "value""a,valueB"
If this becomes too tricky then I suggest getting a CSV parsing/writing library to do the job for you such as univocity-parsers - I'm the author of this one by the way.

Trying to convert CSV to XLSX, but columns are being split up using the wrong commas

I am using the accepted answer from here. Basically, I am converting a csv to .xlsx, and it looks like the solution pulls everything in individual cells into 1 line using the buffered reader, and then using:
String str[] = currentLine.split(",");
.. the string is split up into separate parts of the array for each column. My problem is that in some of my data, there are commas, so the algorithm gets confused and makes more columns than needed, splitting sentences into different columns which doesn't really work for me. Is there another way I can split the sentences up perhaps? I'd happily split the string up using a different unique character (maybe |?), but I don't know how to replace the comma provided by the bufferedreader. Any help would be great. Code I am using below for reference:
public static void csvToXLSX() {
try {
String csvFileAddress = "test.csv"; //csv file address
String xlsxFileAddress = "test.xlsx"; //xlsx file address
XSSFWorkbook workBook = new XSSFWorkbook();
XSSFSheet sheet = workBook.createSheet("sheet1");
String currentLine=null;
int RowNum=0;
BufferedReader br = new BufferedReader(new FileReader(csvFileAddress));
while ((currentLine = br.readLine()) != null) {
String str[] = currentLine.split(",");
RowNum++;
XSSFRow currentRow=sheet.createRow(RowNum);
for(int i=0;i<str.length;i++){
currentRow.createCell(i).setCellValue(str[i]);
}
}
FileOutputStream fileOutputStream = new FileOutputStream(xlsxFileAddress);
workBook.write(fileOutputStream);
fileOutputStream.close();
System.out.println("Done");
} catch (Exception ex) {
System.out.println(ex.getMessage()+"Exception in try");
}
}
Well, CSV is something more than just text file with lines separated with commas.
For example, some fields in CSV can be quoted; this is the way comma is escaped within one field.
Quotes are quoted as well, with double-quotes.
And there also could be newlines within one CSV line, they must also be quoted.
So, to sum up, a CSV lines
1,"2,3","4
5",6,7,""""
should be parsed to array of "1", "2,3", "4\n5", "6", "7","\"" (and that is a single row of a CSV table).
As you can see, you can't just mindlessly split every line by comma. I suggest you to use some library instead of doing this by yourself. http://www.liquibase.org/javadoc/liquibase/util/csv/opencsv/CSVReader.html will work just fine.

Categories