I think I found a bug. Or maybe it isn't one, but Super CSV can't handle the situation well.
I'm parsing a CSV file with 41 columns using a MapReader. However, the web service that gives me the CSV messes up one line. The header line is a tab-delimited row with 41 cells, but the bad line is a tab-delimited row with only 36 cells, and its content doesn't make any sense.
This is the code I'm using:
InputStream fis = new FileInputStream(pathToCsv);
InputStreamReader inReader = new InputStreamReader(fis, "ISO-8859-1");
ICsvMapReader mapReader = new CsvMapReader(inReader,
        new CsvPreference.Builder('"', '\t', "\r\n").build());

final String[] headers = mapReader.getHeader(true);
Map<String, String> row;
while ((row = mapReader.read(headers)) != null) {
    // do something
}
I get an exception when executing mapReader.read(headers) on the row I mentioned above. This is the exception:
org.supercsv.exception.SuperCsvException:
the nameMapping array and the sourceList should be the same size (nameMapping length = 41, sourceList size = 36)
context=null
at org.supercsv.util.Util.filterListToMap(Util.java:121)
at org.supercsv.io.CsvMapReader.read(CsvMapReader.java:79)
at test.MyClass.readCSV(MyClass.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
What do you think I should do?
I don't want the whole application to crash just because one row is messed up; I'd rather skip that row.
This is a good question! As a Super CSV developer, I'll look into adding some exception-handling examples to the website.
You could keep it simple and use CsvListReader (which doesn't care how many columns there are), and then just create the Map yourself:
public class HandlingExceptions {

    private static final String INPUT =
        "name\tage\nTom\t25\nAlice\nJim\t44\nMary\t33\tInvalid";

    public static void main(String[] args) throws IOException {

        // use CsvListReader (can't be sure there's the correct no. of columns)
        ICsvListReader listReader = new CsvListReader(new StringReader(INPUT),
            new CsvPreference.Builder('"', '\t', "\r\n").build());

        final String[] headers = listReader.getHeader(true);

        List<String> row = null;
        while ((row = listReader.read()) != null) {

            if (listReader.length() != headers.length) {
                // skip row with invalid number of columns
                System.out.println("skipping invalid row: " + row);
                continue;
            }

            // safe to create map now
            Map<String, String> rowMap = new HashMap<String, String>();
            Util.filterListToMap(rowMap, headers, row);

            // do something with your map
            System.out.println(rowMap);
        }

        listReader.close();
    }
}
Output:
{name=Tom, age=25}
skipping invalid row: [Alice]
{name=Jim, age=44}
skipping invalid row: [Mary, 33, Invalid]
If you were concerned about using Super CSV's Util class (it's really an internal utility class, so it's possible it could change), you could build the Map yourself (a sketch follows), or combine two readers as I've suggested here.
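For instance, a minimal sketch of building the Map by hand instead of calling Util.filterListToMap, assuming the column count has already been validated as in the loop above (the null check is intended to mirror how filterListToMap ignores null header entries):

Map<String, String> rowMap = new HashMap<String, String>();
for (int i = 0; i < headers.length; i++) {
    // skip columns with no header, as filterListToMap does
    if (headers[i] != null) {
        rowMap.put(headers[i], row.get(i));
    }
}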
You could try catching SuperCsvException, but you might end up suppressing more than just an invalid number of columns. The only Super CSV exception I'd recommend catching (though it's not applicable in your situation, as you're not using cell processors) is SuperCsvConstraintViolationException, as it indicates the file is in the correct format but the data doesn't satisfy your expected constraints.
You have to ask yourself what should happen if the CSV file contains data that cannot be parsed, and how critical it would be to skip those lines. In one scenario it could be OK to just drop them; in other scenarios it might be better to stop the whole process and tell the user to fix the file first.
I am sure you can build both scenarios with Super CSV. You definitely have to handle that exception and react appropriately for your scenario.
Well, I came up with a solution, but I don't think it's optimal:
while (true) {
    try {
        if ((row = mapReader.read(headers)) == null) {
            break;
        } else {
            // do something
        }
    } catch (SuperCsvException ex) {
        continue;
    }
}
UPDATE
Changed Exception to SuperCsvException.
Related
I'm working on my code where I import two CSV files and then parse them:
//Importing CSV File for betreuen
String filename = "betreuen_4.csv";
File file = new File(filename);
//Importing CSV File for lieferant
String filename1 = "lieferant.csv";
File file1 = new File(filename1);
I then proceed to parse them. For the first CSV file everything works fine. The code is:
try {
    Scanner inputStream = new Scanner(file);
    while (inputStream.hasNext()) {
        String data = inputStream.next();
        String[] values = data.split(",");
        int PInummer = Integer.parseInt(values[1]);
        String MNummer = values[0];
        String KundenID = values[2];
        //System.out.println(MNummer);
        //create the caring object with the required parameters
        //Caring caring = new Caring(MNummer, PInummer, KundenID);
        //betreuen.add(caring);
    }
    inputStream.close();
} catch (FileNotFoundException d) {
    d.printStackTrace();
}
I then proceed to parse the other CSV file. The code is:
// parsing csv file lieferant
try {
    Scanner inputStream1 = new Scanner(file1);
    while (inputStream1.hasNext()) {
        String data1 = inputStream1.next();
        String[] values1 = data1.split(",");
        int LIDnummer = Integer.parseInt(values1[0]);
        String citynames = values1[1];
        System.out.println(LIDnummer);
        String firmanames = values1[2];
        //create the suppliers object with the required parameters
        //Suppliers suppliers = new Suppliers(LIDnummer, citynames, firmanames);
        //lieferant.add(suppliers);
    }
    inputStream1.close();
} catch (FileNotFoundException d) {
    d.printStackTrace();
}
The first error I get is:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at Verbindung.main(Verbindung.java:61)
So I look at my array access at line 61, which is firmanames, and I think: it's impossible that it's out of range, since my CSV file has three columns, and index 2 (which I know is the third column in the CSV file) holds my list of company names. I know the array is not empty, because when I wrote
`System.out.println(firmanames)`
it would print out the first three company names. So, in order to see if something else was causing the problem, I commented line 61 out and ran the code again. I get the following error:
`Exception in thread "main" java.lang.NumberFormatException: For input
string: "Ridge"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at Verbindung.main(Verbindung.java:58)`
I googled these errors, and they say I'm trying to parse something into an Integer that cannot be an integer, but the only thing I am trying to parse into an Integer is this line:
int LIDnummer = Integer.parseInt(values1[0]);
And that is indeed a column containing only integers.
My second column is indeed just a column of city names in the USA. The only thing with that column is that some town names contain spaces, like Middle brook, but I don't think that would cause problems for a String. Also, my company column contains names like AT&T, but I would think that the & symbol would also not cause problems for a String. I don't know where I am going wrong here.
I can't include the CSV file, but here is a pic of a part of it. The length of each column is 1000.
A pic of the csv file
Scanner by default splits its input by whitespace (docs). Whitespace means spaces, tabs and newlines.
So your code will, I think, split the whole input file at every space and every newline, which is not what you want.
So, the first three elements your code will read are
5416499,Prairie
Ridge,NIKE
1765368,Edison,Cartier
I suggest using the readLine method of BufferedReader and then calling split on each line (see the sketch below).
The alternative is to explicitly tell Scanner how you want it to split the input:
Scanner inputStream1 = new Scanner(file1).useDelimiter("\n");
but I think this is not the best use of Scanner when a simpler class (BufferedReader) will do.
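A minimal sketch of the BufferedReader approach, reusing the variable names from the question (exception handling omitted for brevity):

BufferedReader reader = new BufferedReader(new FileReader(file1));
String line;
while ((line = reader.readLine()) != null) {
    // splitting a whole line at a time keeps spaces inside values intact
    String[] values1 = line.split(",");
    int LIDnummer = Integer.parseInt(values1[0]);
    String citynames = values1[1];
    String firmanames = values1[2];
    // ... create the Suppliers object as before
}
reader.close();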
First of all, I would highly suggest you try to use an existing CSV parser, for example this one.
But if you really want to use your own, you are going to need to do some simple debugging. I don't know how large your file is, but the symptoms you are describing lead me to believe that somewhere in the CSV there may be a missing comma or an accidental escape character. You need to find out which line it is, so run this code and check its output before it crashes:
int line = 1;
try {
    Scanner inputStream1 = new Scanner(file1);
    while (inputStream1.hasNext()) {
        String data1 = inputStream1.next();
        String[] values1 = data1.split(",");
        int LIDnummer = Integer.parseInt(values1[0]);
        String citynames = values1[1];
        System.out.println(LIDnummer);
        String firmanames = values1[2];
        line++;
    }
} catch (ArrayIndexOutOfBoundsException e) {
    System.err.println("The issue in the csv is at line:" + line);
}
Once you find what line it is, the answer should be obvious. If not, post a picture of that line and we'll see...
I have a CSV file, 'Master List', with 800K records; each record has 13 values.
The combination of cell[0] and cell[1] gives a unique record, and I need to update the value of cell[12], say the status, for every record.
I have another CSV file, say 'Updated subset list'. This is a sort of subset of the 'Master list' file. For all the records in my second CSV, which are far fewer in number, say 10,000, I need to update the cell[11] (aka status) column value of each matching record.
I tried a direct BufferedReader, the CsvParser from commons-csv, and the CsvParser from univocity-parsers.
But reading the whole file and creating a list of 800K records gives an out-of-memory exception.
The same code will be deployed on different servers, so I want efficient code for reading a huge CSV file and updating that same file.
Partially reading the huge file and writing to the same file might corrupt the data.
Any suggestions on how I can do this?
File inputF = new File(inputFilePath);
if (inputF.exists()) {
    InputStream inputFS = new FileInputStream(inputF);
    BufferedReader br = new BufferedReader(new InputStreamReader(inputFS));
    // skip the header of the file
    String line = br.readLine();
    mandatesList = new ArrayList<DdMandates>();
    while ((line = br.readLine()) != null) {
        mandatesList.add(mapToItem(line));
    }
    br.close();
}
The memory issue was resolved by processing in chunks. Reading a single line and writing a single line might take more time; I didn't try it, because my issue was resolved by using batches of 100K records at a time and clearing the list after writing each batch.
Now the issue is that updating the status takes too much looping.
I have two CSVs. The master sheet ('Master list') has 800K records, and I also have a subset CSV with, say, 10K records. This subset CSV is updated from some other system, and it carries an updated status, say 'OK' or 'NOT OK'. I need to update this status in the master sheet. What is the best possible way to do that? The dumbest way, which I am using, is the following:
// The master list is processed in batches, but it contains 800K records and 12 columns
List<DdMandates> mandatesList = new ArrayList<DdMandates>();
// The subset list holds the updated status
List<DdMandates> updatedMandatesList = new ArrayList<DdMandates>();

// Read the subset csv file, map each line to a DdMandates item and add it to the updated mandate list
File inputF = new File(Property.inputFilePath);
if (inputF.exists()) {
    InputStream inputFS = new FileInputStream(inputF);
    BufferedReader br = new BufferedReader(new InputStreamReader(inputFS, "UTF-8"));
    checkFilterAndmapToItem(br);
    br.close();
}
In the method checkFilterAndmapToItem(BufferedReader br):
private static void checkFilterAndmapToItem(BufferedReader br) {
    FileWriter fileWriter = null;
    try {
        // skip the header of the csv
        String line = br.readLine();
        int batchSize = 0, currentBatchNo = 0;
        fileWriter = new FileWriter(Property.outputFilePath);
        // write the CSV file header
        fileWriter.append(FILE_HEADER.toString());
        // add a new line separator after the header
        fileWriter.append(NEW_LINE_SEPARATOR);
        if (!Property.batchSize.isEmpty()) {
            batchSize = Integer.parseInt(Property.batchSize.trim());
        }
        while ((line = br.readLine()) != null) {
            DdMandates item = new DdMandates();
            String[] p = line.concat(" ").split(SEPERATOR);
            // parse each p[x] and map it onto the DdMandates item;
            // then iterate over the updated mandate list to check whether this
            // item is present there, and if so copy the updated status onto it
            // (so here is an inner for loop over, say, 10K elements)
            mandatesList.add(item);
            if (batchSize != 0 && mandatesList.size() == batchSize) {
                currentBatchNo++;
                logger.info("Batch no. : " + currentBatchNo + " is executing...");
                processOutputFile(fileWriter);
                mandatesList.clear();
            }
        }
        // process the output file here for the last batch ...
    } catch (IOException e) {
        e.printStackTrace();
    }
}
That means a while loop with 800K iterations and an inner loop with 10K iterations for each element, so at least 800K * 10K iterations in total.
Please help me find the best possible approach and reduce the iterations.
Thanks in advance.
Suppose you are reading the 'Main Data File' in batches of 50K: store this data in a Java HashMap, using cell[0] + cell[1] as the key and the rest of the columns as the value.
The complexity of get and put is O(1) most of the time (see here).
So the complexity of searching for 10K records in that particular batch will be O(10K).
HashMap<String, DdMandates> hmap = new HashMap<String, DdMandates>();
Use key=DdMandates.get(0)+DdMandates.get(1)
Note: if 50K records exceed the memory limit of the HashMap, create smaller batches.
For further performance enhancement you can use multi-threading, creating small batches and processing them on different threads. A sketch of the lookup idea follows.
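A minimal sketch of that approach, assuming the key is built from the first two cells and that DdMandates exposes getter/setter accessors (the names getCell0, getCell1, getStatus and setStatus below are hypothetical):

// index the updated records once: key = cell[0] + "|" + cell[1]
Map<String, DdMandates> updates = new HashMap<String, DdMandates>();
for (DdMandates updated : updatedMandatesList) {
    updates.put(updated.getCell0() + "|" + updated.getCell1(), updated);
}

// then, for each item in a master batch, the lookup is O(1) instead of a 10K scan
for (DdMandates item : mandatesList) {
    DdMandates updated = updates.get(item.getCell0() + "|" + item.getCell1());
    if (updated != null) {
        item.setStatus(updated.getStatus());
    }
}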
The first suggestion: when you create an ArrayList, its default capacity is 10, so if you work with a large amount of data, initialize it with a suitable capacity first:
private static final int LIST_CAPACITY = 800000;
mandatesList = new ArrayList<DdMandates>(LIST_CAPACITY);
The second suggestion: don't store the data in memory; read it line by line, apply your business logic, then free the memory:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        /* your business rule here */
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
I am reading two different CSV files and populating data into two different objects. I split each line of the CSV file based on a regex (the regex differs between the two files) and populate the object from the resulting array, as shown below:
public static <T> List<T> readCsv(String filePath, String type) {
    List<T> list = new ArrayList<T>();
    try {
        File file = new File(filePath);
        FileInputStream fileInputStream = new FileInputStream(file);
        InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
        list = bufferedReader.lines().skip(1).map(line -> {
            T obj = null;
            String[] data = null;
            if (type.equalsIgnoreCase("Student")) {
                data = line.split(",");
                ABC abc = new ABC();
                abc.setName(data[0]);
                abc.setRollNo(data[1]);
                abc.setMobileNo(data[2]);
                obj = (T) abc;
            } else if (type.equalsIgnoreCase("Employee")) {
                data = line.split("\\|");
                XYZ xyz = new XYZ();
                xyz.setName(Integer.parseInt(data[0]));
                xyz.setCity(data[1]);
                xyz.setEmployer(data[2]);
                xyz.setDesignation(data[3]);
                obj = (T) xyz;
            }
            return obj;
        }).collect(Collectors.toList());
    } catch (Exception e) {
        e.printStackTrace();
    }
    return list;
}
The CSV files are as below:
i. csv file to populate ABC object:
Name,rollNo,mobileNo
Test1,1000,8888888888
Test2,1001,9999999990
ii. csv file to populate XYZ object
Name|City|Employer|Designation
Test1|City1|Emp1|SSE
Test2|City2|Emp2|
The issue is that data can be missing for any of the above columns, as shown in the second CSV file. In that case, I get an ArrayIndexOutOfBounds exception.
Can anyone let me know the best way to populate the object using the data of the string array?
Thanks in advance.
In addition to the other mistakes you made, which were pointed out in the comments, your actual problem is caused by line.split("\\|"): it is equivalent to line.split("\\|", 0), which discards trailing empty strings. You need to call line.split("\\|", -1) instead, and it will work.
The problem appears to be that one or more of the last values on any given CSV line may be empty. In that case, you run into the fact that String.split(String) suppresses trailing empty strings.
Supposing that you can rely on all the fields in fact being present, even if empty, you can simply use the two-arg form of split():
data = line.split(",", -1);
You can find details in that method's API docs.
If you cannot be confident that the fields will be present at all, then you can force them to be by adding delimiters to the end of the input string:
data = (line + ",,").split(",", -1);
Since you only use the first few values, any extra trailing values introduced by the extra delimiters would be ignored.
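A quick sketch illustrating the difference (the expected output is shown in the comments):

import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String line = "Test2|City2|Emp2|";
        // the one-arg form uses a limit of 0 and drops trailing empty strings
        System.out.println(Arrays.toString(line.split("\\|")));     // [Test2, City2, Emp2]
        // a negative limit keeps the trailing empty string
        System.out.println(Arrays.toString(line.split("\\|", -1))); // [Test2, City2, Emp2, ]
    }
}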
I have an Excel file and its contents are shown below.
The first line is the header; from the second line onwards is the data.
Cell A1 contains the line below (header):
**IsShooting; Velocity; Location_x; Location_y; Location_z; Onslaught_ONSAV ; ***Event***; EventParams...**
Cell A2 contains the line below:
0;0;0;0;0;0;0;0;0;0;0.000;0.000;0;0.000;0.000;None;0;0;0.000;-1983.610;-
Cell A3 contains the line below:
;0.250;0.000;0.000;0.000;0.000;***BOT_KILLED***;CTF-Geothermal.GBxBot10;XWeapons.DamTypeFlakChunk
Cell A4 contains the line below:
4.110;161.900;0.000;0.000;0.000;0.000;0.000;0.000;0.000;0.000;0.000;0.000;0.000;0.000;0.000;4.320;0.000;0.260;0.000;0.000;***FLAG_PICKEDUP***;0;CTF-Geothermal.GBxBot10
I want to know if there are any open-source CSV parsers I can use to get the data out of this file.
The file contains 400 lines of data. All I want from it is the COUNT of FLAG_PICKEDUP and BOT_KILLED.
Thanks!
This is the easiest way I can think of: use a BufferedReader to read each line, split each line into a String array, then check each String to see if it equals the constants that mark flag pickups or bot kills.
I think Apache makes a CSV parser, but I've never used it. For something this simple it might just be easier to code it yourself. This is what I came up with in about 5 minutes.
NOTE: with all due respect, Stack Overflow generally asks that you attempt to solve the problem yourself first. Since you didn't post code, we can't help you debug, and we don't know whether you tried to solve the problem yourself. This was simple, so I helped, but you may find it easier to get support if you post your (failed) solution first.
public static void main(String[] args) throws Exception {
    int botKilledCount = 0, flagPickedUpCount = 0;
    String line, botLiteral = "***BOT_KILLED***", flagLiteral = "***FLAG_PICKEDUP***";
    BufferedReader reader = new BufferedReader(new FileReader(new File("!!YOUR FILE HERE!!")));
    while ((line = reader.readLine()) != null)
        for (String s : line.split(";"))
            if (s.equals(botLiteral))
                botKilledCount++;
            else if (s.equals(flagLiteral))
                flagPickedUpCount++;
    reader.close();
    System.out.println("Bot Killed Count: " + botKilledCount + ", Flag Pickup Count: " + flagPickedUpCount);
}
This was the output:
Bot Killed Count: 1, Flag Pickup Count: 1
I am using the following code to read in a CSV file:
String next[] = {};
List<String[]> dataArray = new ArrayList<String[]>();
try {
    CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("inputFile.csv")));
    for (;;) {
        next = reader.readNext();
        if (next != null) {
            dataArray.add(next);
        } else {
            break;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
This turns a CSV file into the list 'dataArray'. My application is a dictionary-type app: the input data's first column is a list of words, and the second column holds the definitions of those words. Here is an example of the data loaded in:
Term 1, Definition 1
Term 2, Definition 2
Term 3, Definition 3
In order to access one of the strings in the array, I use the following code:
dataArray.get(rowNumber)[columnNumber]
However, I need to be able to generate a list of all the terms so that they can be displayed in the dictionary application. As I understand it, accessing a column by itself is a much lengthier process than accessing a row (I come from a MATLAB background, where this would be simple).
It seems that, in order to have ready access to any row of my input data, I would be better off transposing the data and reading it in that way, i.e.:
Term 1, Term 2, Term3
Definition 1, Definition 2, Definition 3
Of course, I could just provide a CSV file that is transposed in the first place, but Excel and OO Calc don't allow more than 256 columns, and my dictionary contains around 2000 terms.
Any of the following solutions would be welcomed:
A way to transpose an array once it has been read in
An alteration to the code posted above, such that it reads in data in the 'transposed' way
A simple way to read an entire column of an array as a whole
You would probably be better served by using a Map data structure (e.g. HashMap):
String next[] = {};
HashMap<String, String> dataMap = new HashMap<String, String>();
try {
    CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("inputFile.csv")));
    for (;;) {
        next = reader.readNext();
        if (next != null) {
            dataMap.put(next[0], next[1]);
        } else {
            break;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
Then you can access the first column by
dataMap.keySet();
and the second column by
dataMap.values();
Note one assumption here: that the first column of your input data is all unique values (that is, there are no repeated values in the "Term" column).
To be able to access the keys (terms) as an array, you can simply do as follows:
String[] terms = new String[dataMap.keySet().size()];
terms = dataMap.keySet().toArray(terms);
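One caveat: HashMap does not guarantee any iteration order, so keySet() won't return the terms in file order. If the display order matters, a LinkedHashMap (same API) preserves insertion order; a one-line swap, assuming the rest of the code stays as above:

// LinkedHashMap iterates keys in insertion order, i.e. file order here
Map<String, String> dataMap = new LinkedHashMap<String, String>();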
If each row has two values, where the first one is the term and the second one is the definition, you could build a Map of it like this (by the way, this while loop does the exact same thing as your for loop):
String next[] = {};
Map<String, String> dataMap = new HashMap<String, String>();
try {
    CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("inputFile.csv")));
    while ((next = reader.readNext()) != null) {
        dataMap.put(next[0], next[1]);
    }
} catch (IOException e) {
    e.printStackTrace();
}
Then you can get the definition from a term via:
String definition = dataMap.get(term);
or all definitions like this:
for (String term : dataMap.keySet()) {
    String definition = dataMap.get(term);
}
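If you iterate over all the pairs anyway, entrySet() avoids the extra lookup per key, a minor but idiomatic improvement:

for (Map.Entry<String, String> entry : dataMap.entrySet()) {
    String term = entry.getKey();
    String definition = entry.getValue();
    // display the term and its definition
}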