Java: Using InputStream and Apache Commons CSV without line numbers

This is probably very simple, but I have not been able to find an option to do this. I'm trying to use Apache Commons CSV to read a file for later validation. The CSV in question is submitted as an InputStream, which seems to add an additional column to the file when it reads it, containing the line numbers. I would like to be able to ignore that column, if possible, as the header row does not contain a number, which causes an error. Is there an option already in InputStream to do this, or will I have to set up some kind of post-processing?
The code I'm using is as follows:
public String validateFile(InputStream filePath) throws Exception {
    System.out.println("Sending file to reader");
    System.out.println(filePath);
    InputStreamReader in = new InputStreamReader(filePath);
    // CSVFormat's parse needs a Reader object
    System.out.println("sending reader to CSV parse");
    for (CSVRecord record : CSVFormat.DEFAULT.withHeader().parse(in)) {
        for (String field : record) {
            System.out.print("\"" + field + "\", ");
        }
        System.out.println();
    }
    return null;
}
When using withHeader(), I end up with the following error:
java.lang.IllegalArgumentException: A header name is missing in [, Employee_ID, Department, Email]
and I can't simply skip it, as I will need to do some validations on the header row.
Also, here is an example CSV file:
"Employee_ID", "Department", "Email"
"0123456","Department of Hello World","John.Doe@gmail.com"
EDIT: Also, the end goal is to validate the following:
That there are columns called "Employee_ID", "Department", and "Email". For this, I think I'll need to remove .withHeader().
That each line is comma-delimited.
That there are no empty cell values.

Newer versions of Commons-CSV have trouble with empty headers.
Maybe that's the case here as well?
You just mentioned "no empty cell values"; not sure if that includes headers as well...
Also see: https://issues.apache.org/jira/browse/CSV-257
Setting .setAllowMissingColumnNames(true) did the trick for me.
final CSVFormat csvFormat = CSVFormat.Builder.create()
        .setHeader(HEADERS)
        .setAllowMissingColumnNames(true)
        .build();
final Iterable<CSVRecord> records = csvFormat.parse(reader);
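For the three validation goals listed in the question, here is a minimal, dependency-free sketch of the logic (no CSV library; the class and helper names are made up for illustration, and it naively assumes quoted fields with no embedded commas):

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

public class CsvHeaderCheck {

    // Split one CSV line on commas and strip optional surrounding quotes.
    static List<String> fields(String line) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim().replaceAll("^\"|\"$", "");
        }
        return Arrays.asList(parts);
    }

    // Returns null when the file passes, otherwise a short error message.
    static String validate(BufferedReader reader) throws Exception {
        List<String> required = Arrays.asList("Employee_ID", "Department", "Email");
        String headerLine = reader.readLine();
        if (headerLine == null || !fields(headerLine).containsAll(required)) {
            return "missing required header column";
        }
        String line;
        while ((line = reader.readLine()) != null) {
            for (String field : fields(line)) {
                if (field.isEmpty()) {
                    return "empty cell value";
                }
            }
        }
        return null; // file is valid
    }
}
```

Reading the header line yourself, before handing anything to a parser, sidesteps the withHeader() error entirely; a real implementation would still delegate the row parsing to Commons CSV.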

Related

Read specific information from an .xlsx file using java

The xlsx file has contents in the format shown in a screenshot (image not included here).
I want to capture the highlighted information into different fields that go into the database as strings:
The final date is in the 3rd highlighted row, and would be stored in String finaldate;
Row 6, which has the final status "Fail", would go into String status;
Row 24 (DATAID): the value before the "." has to be retrieved, e.g. "3ABC36812" has to be stored via line.split("\\.")[0] into String dataid;
Since these columns might vary between rows within the Excel sheet, how do I capture these values specifically and accurately using the BufferedReader component?
String line;
try (BufferedReader br = new BufferedReader(new FileReader(str))) {
    for (int i = 0; i < n; i++)
        br.readLine();
    line = br.readLine();
    System.out.println(line);
    if (line.startsWith("FINAL DATE:")) {
        // split on the first colon only, since HH:MM also contains colons
        String[] split = line.split(":", 2);
        finaldate = split[1];
    }
    // checking if the DATAID column exists using startsWith, then fetching
    // the row below it that holds the data id
    if (line.startsWith("DATAID")) {
        needcat = true;
        System.out.println("bye " + needcat);
    }
}
I don't want to use Apache POI, since my version of Java does not support it, and I would prefer to explore using the BufferedReader/file-stream components in Java.
I really don't think you're going to get what you want the way you're trying to do it. Take a look at this page:
https://docs.fileformat.com/spreadsheet/xlsx/
It looks like they're suggesting that .xlsx files are zip files.
If that's true, you're not going to have success the way you're reading it.
I don't understand why you can't use POI. If you need Java prior to 11, maybe you can grab an older copy from 10 years ago or something.
Otherwise, you'll want to use a Zip library to unpack it first.
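To illustrate the point that .xlsx is a zip container, here is a stdlib-only sketch (no POI; the class name is made up) that lists the entries inside any zip-based file. The entry names in the comment are typical of xlsx workbooks but vary by file:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class XlsxPeek {

    // An .xlsx file is a zip container; list the entry names inside it.
    // Typical entries: [Content_Types].xml, xl/workbook.xml, xl/worksheets/sheet1.xml
    static List<String> listEntries(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

Running this against the .xlsx in question would show why BufferedReader sees garbage: the sheet data lives in compressed XML entries, not plain text lines.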

Using I/O stream to parse CSV file

I have a CSV file of US population data for every county in the US. I need to get each population from the 8th column of the file. I'm using a fileReader() and bufferedStream() and not sure how to use the split method to accomplish this. I know this isn't much information but I know that I'll be using my args[0] as the destination in my class.
I'm at a loss as to where to begin, to be honest.
import java.io.BufferedReader;
import java.io.FileReader;

public class Main {
    public static void main(String[] args) {
        try (BufferedReader buff = new BufferedReader(new FileReader(args[0]))) {
            String line;
            // ...
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The output should be an integer of the total US population. Any help with pointing me in the right direction would be great.
Don't reinvent the wheel; don't parse CSV yourself: use a library. Even such a simple format as CSV has nuances: fields can be escaped with quotes or unescaped, the file may or may not have a header, and so on. Besides that, you have to test and maintain the code you've written. So writing less code and reusing libraries is good.
There are a plenty of libraries for CSV in Java:
Apache Commons CSV
OpenCSV
Super CSV
Univocity
flatpack
IMHO, the first two are the most popular.
Here is an example for Apache Commons CSV:
final Reader in = new FileReader("counties.csv");
final Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (final CSVRecord record : records) { // Simply iterate over the records; all the parsing is handled for you
    String populationString = record.get(7); // Indexes are zero-based
    // Or, if your file has headers: String populationString = record.get("population");
    // ... Do whatever you want with the population
}
Look how easy it is! And it will be similar with other parsers.
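For the asker's end goal (totalling the 8th column), the loop body just needs to accumulate the parsed values. Here is a dependency-free sketch of that accumulation (hypothetical class name; it assumes no header row and naively splits on commas, which is exactly the fragility the answer warns about, so treat it as illustration only):

```java
import java.io.BufferedReader;
import java.io.Reader;

public class PopulationSum {

    // Sum the values in the 8th column (index 7) of simple, unquoted CSV input.
    static long totalPopulation(Reader input) throws Exception {
        long total = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(",");
                total += Long.parseLong(cells[7].trim());
            }
        }
        return total;
    }
}
```

With Commons CSV, the split/parseLong pair would be replaced by Long.parseLong(record.get(7)) inside the foreach loop shown in the answer.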

Univocity - Detect missing column when parsing CSV

I'm using the Univocity library to parse CSV and it works perfectly, but I need a way to detect if the file being parsed has fewer columns than required.
For example, if I'm expecting a 3-column file with columns mapped to [H1,H2,H3], and I receive a file (which has no headers) that looks like
V1_H1,V1_H2
V2_H1,V2_H2
When using
record.getString("H3");
this would return null, instead, I need this file to either fail to be parsed or I can check if it misses a column and stop processing it
Is there any way to achieve this?
So since my main issue here is to make sure that the headers count is the same as the number of columns provided in the CSV file, and since I'm using an iterator to iterate over records, I've added a check like:
CsvParser parser = new CsvParser(settings);
ResultIterator<Record, ParsingContext> iterator = parser.iterateRecords(inputStream).iterator();
if (iterator.getContext().parsedHeaders().length != settings.getHeaders().length) {
    throw new Exception("Invalid file");
}
It's working for me, not sure if there is a better way to do it.
I've looked at the Univocity documentation and found that there is a way to add annotations to the destination objects you generate from the CSV input:
@Parsed
@Validate
public String notNulNotBlank; // This should fail if the field is null or blank

@Parsed
@Validate(nullable = true)
public String nullButNotBlank;

@Parsed
@Validate(allowBlanks = true)
public String notNullButBlank;
This will also help you to use the objects instead of having to work with fields.
Hope that helps :-)

How to convert exponents in a csv file from Java

I am printing some data into a CSV file using Apache Commons CSV. One of the fields contains a 15-digit number and is of type String. This field prints as an exponential number in the CSV instead of the complete number. I know Excel does that, but is there a way in Java to print it as a complete number?
I am not doing anything special. Initially I thought that Commons CSV would take care of it.
public void createCSV() {
    fileWriter = new FileWriter("fileName");
    csvFileFormat = CSVFormat.EXCEL.withHeader("header1", "header2");
    csvFilePrinter = new CSVPrinter(fileWriter, csvFileFormat);
    for (UiIntegrationDTO dto : myList) {
        String csvData = dto.getPolicyNumber();
        csvFilePrinter.printRecord(csvData);
    }
}
Prepend apostrophe
As far as I understand from the discussion in the comments, this is a question about Excel's interpretation of the CSV file; the file itself contains all the necessary data.
I think csvFilePrinter.printRecord("'" + csvData); should help. The apostrophe makes Excel interpret the field as a string, not as a number.
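A minimal sketch of the apostrophe trick without any library (the class name and field value are made up; the row builder does no escaping). The written cell keeps all 15 digits, and Excel displays it as text:

```java
public class ExcelTextField {

    // Prepend an apostrophe so Excel interprets the cell as text rather than
    // converting a long digit string to exponential notation.
    static String asExcelText(String value) {
        return "'" + value;
    }

    // Build a simple CSV row (naive join; no escaping of commas or quotes).
    static String row(String... cells) {
        return String.join(",", cells);
    }
}
```

Note the apostrophe becomes part of the stored value if the file is later read back by anything other than Excel, which is the main drawback of this workaround.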

Invalid char between encapsulated token and delimiter in Apache Commons CSV library

I am getting the following error while parsing the CSV file using the Apache Commons CSV library.
Exception in thread "main" java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
at org.apache.commons.csv.CSVParser.getRecords(CSVParser.java:327)
at parse.csv.file.CSVFileParser.main(CSVFileParser.java:29)
What's the meaning of this error ?
We ran into this issue when we had embedded quotes in our data:
0,"020"1,"BS:5252525 ORDER:99999"4
The solution applied was CSVFormat csvFileFormat = CSVFormat.DEFAULT.withQuote(null);
@Cuga's tip helped us resolve it. Thanks @Cuga.
Full code is
public static void main(String[] args) throws IOException {
    CSVFormat csvFileFormat = CSVFormat.DEFAULT.withQuote(null);
    String fileName = "test.csv";
    FileReader fileReader = new FileReader(fileName);
    CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat);
    List<CSVRecord> csvRecords = csvFileParser.getRecords();
    for (CSVRecord csvRecord : csvRecords) {
        System.out.println(csvRecord);
    }
    csvFileParser.close();
}
Result is
CSVRecord [comment=null, mapping=null, recordNumber=1, values=[0, "020"1, "BS:5252525 ORDER:99999"4]]
That line in the CSV file contains an invalid character between one of your cells and either the end of line, end of file, or the next cell. A very common cause for this is a failure to escape your encapsulating character (the character used to "wrap" each cell, so CSV knows where a cell, i.e. token, starts and ends).
I found the solution to the problem.
One of my CSV files has an attribute as follows:
"attribute with nested "quote" "
Due to the nested quote in the attribute, the parser fails.
To avoid this problem, escape the nested quote by doubling it:
"attribute with nested ""quote"" "
This is one way to solve the problem.
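Doubling embedded quotes is the standard CSV escape defined by RFC 4180. A small sketch of a quoting helper that applies it (hypothetical class and method names):

```java
public class CsvQuote {

    // Quote a field per RFC 4180: wrap it in double quotes and double any
    // embedded double-quote characters.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

Applying this helper when the CSV is generated avoids the parser error on the reading side.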
We ran into this same error with data containing quotes in otherwise unquoted input, i.e.:
some cell|this "cell" caused issues|other data
It was hard to find, but in Apache's docs they mention the withQuote() method, which can take null as a value.
We were getting the exact same error message, and this (thankfully) ended up fixing the issue for us.
I ran into this issue when I forgot to call .withNullString("") on my CSVFormat. Basically, this exception always occurs when:
your quote symbol is wrong
your null string representation is wrong
your column separator char is wrong
Make sure you know the details of your format. Also, some programs use leading byte-order-marks (for example, Excel uses \uFEFF) to denote the encoding of the file. This can also trip up your parser.
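For the byte-order-mark case mentioned above, one option is to consume a leading \uFEFF before handing the Reader to the parser; otherwise the BOM ends up glued to the first header name. A small stdlib sketch (hypothetical helper name) using a PushbackReader:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

public class BomSkipper {

    // Wrap a Reader and consume a single leading U+FEFF byte-order mark,
    // if present, so downstream parsers never see it.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pushback = new PushbackReader(in, 1);
        int first = pushback.read();
        if (first != -1 && first != '\uFEFF') {
            pushback.unread(first); // no BOM: put the character back
        }
        return pushback;
    }
}
```

The wrapped Reader can then be passed straight to CSVFormat's parse method in place of the original one.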
