How to handle importing a CSV file with differing column lengths


I'm working on a project for school and am having a really hard time figuring out how to import a CSV file and turn it into a usable format. Each row of the CSV contains a movie name in the first column, followed by that movie's showtimes, so it looks something like this:
movie1, 7pm, 8pm, 9pm, 10pm
movie2, 5pm, 8pm
movie3, 3pm, 7pm, 10pm
I think I want to split each row into its own array, maybe an ArrayList of the arrays? I really don't know where to even start, so any pointers would be appreciated.
Preferably, I don't want to use any external libraries.

I would go with a Map that has the movie name as the key and the list of timings as the value, like the one below:
Map<String, List<String>> movieTimings = new HashMap<>();
Read through the CSV file and put the values into this map. If the key already exists, you just need to add the value to its list. You can use the computeIfAbsent method of Map (Java 8) to create the list only when the entry does not exist yet, e.g.:
import java.util.*;

public class Main {
    public static void main(String[] args) {
        Map<String, List<String>> movieTimings = new HashMap<>();
        String timing = "7pm"; // in practice this is read from the CSV
        // create the list for the key if it is absent, then append the timing
        movieTimings.computeIfAbsent("test", s -> new ArrayList<>()).add(timing);
        System.out.println(movieTimings);
    }
}
This populates your map as the file is read. As for reading the file, you can use BufferedReader, or OpenCSV if your project allows third-party libraries.
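To tie the two pieces together, here is a minimal sketch that reads the rows with a BufferedReader and fills the map with computeIfAbsent. The class name MovieTimingsReader and the read method are made up for illustration, and a LinkedHashMap is swapped in for the HashMap only to keep the movies in file order. It assumes movie names and showtimes never contain commas.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class MovieTimingsReader {

    // Reads "movie, time, time, ..." lines into a map of movie name -> showtimes.
    static Map<String, List<String>> read(BufferedReader reader) {
        Map<String, List<String>> movieTimings = new LinkedHashMap<>();
        reader.lines()
              .filter(line -> !line.trim().isEmpty()) // skip blank lines
              .forEach(line -> {
                  String[] fields = line.split(",");
                  // first field is the movie name; create its list on first sight
                  List<String> timings =
                          movieTimings.computeIfAbsent(fields[0].trim(), k -> new ArrayList<>());
                  // every remaining field is a showtime, however many there are
                  for (int i = 1; i < fields.length; i++) {
                      timings.add(fields[i].trim());
                  }
              });
        return movieTimings;
    }

    public static void main(String[] args) {
        // Sample data from the question; a StringReader stands in for the file.
        String csv = "movie1, 7pm, 8pm, 9pm, 10pm\n"
                   + "movie2, 5pm, 8pm\n"
                   + "movie3, 3pm, 7pm, 10pm\n";
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        // prints {movie1=[7pm, 8pm, 9pm, 10pm], movie2=[5pm, 8pm], movie3=[3pm, 7pm, 10pm]}
        System.out.println(read(reader));
    }
}
```

To read a real file, wrap a FileReader instead of the StringReader and close it with try-with-resources.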

I have no affiliation with Univocity - but their Java CSV parser is amazing and free. When I had a question, one of the developers got back to me immediately. http://www.univocity.com/pages/about-parsers
You read in a line and then cycle through the fields. Since you know the movie name is always there and at least one movie time, you can set it up any way you like including an arraylist of arraylists (so both are variable length arrays).
It works well with or without quotes around the fields (quotes are necessary when there are apostrophes or commas in the movie names). In the problem I solved, all rows had the same number of columns, but I did not know the number of columns before I parsed the file, and each file often had a different number of columns and different column names. It worked perfectly.

You can use opencsv to read the CSV file and add each String[] to an ArrayList. There are examples in the FAQ section of opencsv's website.
Edit: If you don't want to use external libraries you can read the CSV using a BufferedReader and split the lines by commas.
// declared outside the try block so the rows are still available afterwards
List<String[]> data = new ArrayList<>();
// try-with-resources closes the reader automatically
try (BufferedReader br = new BufferedReader(new FileReader("csvfile"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // each row keeps its own length, so rows may have different column counts
        String[] lineData = line.split(",");
        data.add(lineData);
    }
} catch (IOException e) {
    e.printStackTrace();
}

Related

Using I/O stream to parse CSV file

I have a CSV file of US population data for every county in the US. I need to get each population from the 8th column of the file. I'm using a fileReader() and bufferedStream() and not sure how to use the split method to accomplish this. I know this isn't much information but I know that I'll be using my args[0] as the destination in my class.
I'm at a loss as to where to begin, to be honest.
import java.io.FileReader;
public class Main {
public static void main(String[] args) {
BufferedReader() buff = new BufferedReader(new FileReader(args[0]));
String
}
try {
}
}
The output should be an integer of the total US population. Any help with pointing me in the right direction would be great.

Don't reinvent the wheel, don't parse CSV yourself: use a library. Even a format as simple as CSV has nuances: fields can be quoted or unquoted, the file may or may not have a header, and so on. Besides that, you have to test and maintain the code you've written. So writing less code and reusing libraries is good.
There are plenty of CSV libraries for Java:
Apache Commons CSV
OpenCSV
Super CSV
Univocity
flatpack
IMHO, the first two are the most popular.
Here is an example for Apache Commons CSV:
final Reader in = new FileReader("counties.csv");
final Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (final CSVRecord record : records) { // Simply iterate over the records with a for-each loop. All the parsing is handled for you
    String populationString = record.get(7); // Indexes are zero-based
    // Or, if your file has headers and the format is configured to read them, look the column up by name:
    // String populationString = record.get("population");
    … // Do whatever you want with the population
}
Look how easy it is! And it will be similar with other parsers.
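If a library is not an option, the split-based approach the question asks about can be sketched as below. This is only a rough illustration under the assumption that no field contains a quoted comma (exactly the nuance the answer warns about). The class name PopulationSum, the sumColumn method, and the sample rows are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;

// Hypothetical helper class, not part of any library.
class PopulationSum {

    // Sums the integer values found in the given zero-based column.
    // Rows that are too short to contain the column are ignored.
    static long sumColumn(BufferedReader reader, int columnIndex, boolean hasHeader) {
        return reader.lines()
                .skip(hasHeader ? 1 : 0)                 // drop the header row if present
                .map(line -> line.split(","))            // naive split: no quoted commas
                .filter(fields -> fields.length > columnIndex)
                .mapToLong(fields -> Long.parseLong(fields[columnIndex].trim()))
                .sum();
    }

    public static void main(String[] args) {
        // Made-up sample rows; the population sits in the 8th column (index 7).
        String csv = "a,b,c,d,e,f,g,population\n"
                   + "x,x,x,x,x,x,x,100\n"
                   + "y,y,y,y,y,y,y,250\n";
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        System.out.println(sumColumn(reader, 7, true)); // prints 350
    }
}
```

For the real file, the reader would wrap new FileReader(args[0]) instead of the StringReader. A long is used for the total so that the sum cannot overflow.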

How do I get a specific line from a text file into an array list?

I want to get specific lines of data from a text file using an ArrayList.
(More context: student details are stored in Student.txt.
If I have to update a specific student, I want to get that student's line from the text file into an ArrayList in order to edit it.)
Text file looks like this (ID, Name, Degreelevel, Email, Conatctno : courses);
A30, sarah, sarah#gmail.com, +64732 ; Computer Science, Cyber Security, Digital Media
A45,zaha, zaha#gmail.com, +3683: Software Engineering
My code prints all the data in the text file into the arraylist.
try {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    ArrayList<String> lines = new ArrayList<>();
    String line = reader.readLine();
    while (line != null)
    {
        lines.add(line);
        line = reader.readLine();
    }
    reader.close();
    System.out.println(lines);
}
current output:[ID, Name, Degreelevel, Email, Conatctno : courses, A30, sarah, sarah#gmail.com, +64732 ; Computer Science, Cyber Security, Digital Media, A45,zaha, zaha#gmail.com, +3683: Software Engineering]
So how can I search for a specific student ID and put only that line into the ArrayList? Any hint helps. Thank you.

You have to go over each line until you find your matching id. This is one of the reasons why we typically like to store this type of information in databases rather than text files.
Simply add an if-statement that checks you got the right line before you add it to the arraylist.
A very simple way of doing this would be:
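String myId = "A30";
while (line != null) {
    // only keep the line that starts with the ID we are looking for
    if (line.startsWith(myId)) {
        lines.add(line);
        break; // stop reading once the student is found
    }
    line = reader.readLine();
}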
Normally when parsing text files one would split up each line using some data delimiter, e.g. line.split(","), rather than using startsWith.
However, the data in your example is badly structured: it uses colons, semicolons, and commas as delimiters. One common way of dealing with a string that contains a delimiter is to enclose every string in quotation marks and treat any special character found within the quotation marks as a regular character.
CSV is a semi-formalized structured format for data in text files. Most languages (including Java) have libraries for parsing CSV files.
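The quoting idea can be illustrated with a small hand-rolled splitter. This is a simplified sketch, not a full CSV parser: the class name QuotedLineSplitter and its split method are invented here, and escaped quotes inside a field are deliberately not handled. A real CSV library covers those cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class QuotedLineSplitter {

    // Splits one line on the delimiter, ignoring delimiters inside double quotes.
    // Simplification: escaped quotes ("") inside a quoted field are not supported.
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle state; the quote itself is dropped
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);          // start collecting the next field
            } else {
                current.append(c);             // regular character, or a delimiter inside quotes
            }
        }
        fields.add(current.toString().trim()); // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = split("A30,sarah,\"Computer Science, Cyber Security\"", ',');
        System.out.println(fields.size());  // prints 3
        System.out.println(fields.get(2));  // prints Computer Science, Cyber Security
    }
}
```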

Java Parsing CSV to Array OutofBoundsException

It seems like this is a fairly commonly asked question, but I just can't figure out a fix for my issue. I'm attempting to pull the data from a CSV file titled "Movies", which lists data as follows:
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
Up to 3952 movies. I have written the following code to parse the data into an array as follows:
try (BufferedReader br = new BufferedReader(new FileReader(moviesCSV))) {
    while ((line = br.readLine()) != null) {
        movies = line.split("::");
    }
}
catch (IOException e) {
    e.printStackTrace();
}
Which then throws the following exception:
3952Contender, The (2000)Drama|ThrillerException in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
at Assignment7.main(Assignment7.java:39)
Now obviously this is some sort of issue that occurs when my loop hits the final null line, but I can't seem to get it to work. I've tried throwing in if statements to detect that it's on line 3952 and exit after that, but nothing seems to work. Any help is greatly appreciated!
One additional question, as I feel like this is something I'm going to have an issue with later. What I'm trying to do once I have this data (and data from two other CSV files) is load these arrays to an SQL database. I need to separate the movie from the year and create a separate year column. Is there a way to do that while I'm parsing my CSV? I figured that is easier done once I already have the array created. Thanks!

Univocity parser - Handling lines with weird constructs

I am trying to figure out the best way to use the Univocity parser to handle a CSV log file whose lines look like the one below:
"23.62.3.74",80,"testUserName",147653,"Log Collection Device 100","31/02/15 00:05:10 GMT",-1,"10.37.255.3","TCP","destination_ip=192.62.3.74|product_id=0071|option1_type=(s-dns)|proxy_machine_ip=10.1.255.3"
As you can see, this is a comma-delimited file, but the last column contains a bunch of values prefixed with their field names. My requirement is to capture values from the normal fields and selectively from this last big field.
I know about the master-detail row processor in Univocity, but I doubt this fits into that category. Could you point me in the right direction, please?
Note: I can handle the name prefixed fields in rowProcessed(String[] row, ParsingContext context) if I implement a row processor but I am looking for something native to Univocity if possible?
Thanks,
R

There's nothing native in the parser for that. Probably the easiest way to go about it is to have your RowProcessor as you mentioned.
One thing you can try to do to make your life easier is to use another instance of CsvParser to parse that last record:
// initialize a parser for the pipe separated bit
CsvParserSettings detailSettings = new CsvParserSettings();
detailSettings.getFormat().setDelimiter('=');
detailSettings.getFormat().setLineSeparator("|");
CsvParser detailParser = new CsvParser(detailSettings);

// here is the content of the last column (assuming you got it from the parser)
String details = "destination_ip=192.62.3.74|product_id=0071|option1_type=(s-dns)|proxy_machine_ip=10.1.255.3";

// the result will be a list of pairs
List<String[]> pairs = detailParser.parseAll(new StringReader(details));

// you can add the pairs to a map
Map<String, String> map = new HashMap<String, String>();
for (String[] pair : pairs) {
    map.put(pair[0], pair[1]);
}

// this should print: {destination_ip=192.62.3.74, product_id=0071, proxy_machine_ip=10.1.255.3, option1_type=(s-dns)}
System.out.println(map);
That won't be extremely fast but at least it's easy to work with a map if that input can have random column names and values associated with them.
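As a library-free alternative for that last column, the same pairs can be pulled out with plain String.split. This is a sketch under the assumption that neither keys nor values contain a '|' and that the first '=' in each pair separates key from value. The class name DetailFieldParser is hypothetical, and a LinkedHashMap is used only to keep the pairs in their original order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of Univocity.
class DetailFieldParser {

    // Turns "k1=v1|k2=v2|..." into a map, keeping the original order of the pairs.
    static Map<String, String> parse(String details) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : details.split("\\|")) {   // '|' must be escaped in a regex
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;                            // skip malformed pairs without '='
            }
            map.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        String details = "destination_ip=192.62.3.74|product_id=0071|option1_type=(s-dns)|proxy_machine_ip=10.1.255.3";
        // prints {destination_ip=192.62.3.74, product_id=0071, option1_type=(s-dns), proxy_machine_ip=10.1.255.3}
        System.out.println(parse(details));
    }
}
```

This could be called from rowProcessed(String[] row, ParsingContext context) on the last element of the row.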

Parsing a CSV file for a unique row using the new Java 8 Streams API

I am trying to use the new Java 8 Streams API (for which I am a complete newbie) to parse for a particular row (the one with 'Neda' in the name column) in a CSV file. Using the following article for motivation, I modified and fixed some errors so that I could parse the file containing 3 columns - 'name', 'age' and 'height'.
name,age,height
Marianne,12,61
Julie,13,73
Neda,14,66
Julia,15,62
Maryam,18,70
The parsing code is as follows:
@Override
public void init() throws Exception {
    Map<String, String> params = getParameters().getNamed();
    if (params.containsKey("csvfile")) {
        Path path = Paths.get(params.get("csvfile"));
        if (Files.exists(path)) {
            // use the new java 8 streams api to read the CSV column headings
            Stream<String> lines = Files.lines(path);
            List<String> columns = lines
                .findFirst()
                .map((line) -> Arrays.asList(line.split(",")))
                .get();
            columns.forEach((l) -> System.out.println(l));
            // find the relevant sections from the CSV file
            // we are only interested in the row with Neda's name
            int nameIndex = columns.indexOf("name");
            int ageIndex = columns.indexOf("age");
            int heightIndex = columns.indexOf("height");
            // we need to know the index positions of the
            // have to re-read the csv file to extract the values
            lines = Files.lines(path);
            List<List<String>> values = lines
                .skip(1)
                .map((line) -> Arrays.asList(line.split(",")))
                .collect(Collectors.toList());
            values.forEach((l) -> System.out.println(l));
        }
    }
}
Is there any way to avoid re-reading the file following the extraction of the header line? Although this is a very small example file, I will be applying this logic to a large CSV file.
Is there a technique to use the streams API to create a map from the extracted column names (in the first scan of the file) to the values in the remaining rows?
How can I return just one row in the form of List<String> (instead of List<List<String>> containing all the rows). I would prefer to just find the row as a mapping between the column names and their corresponding values. (a bit like a result set in JDBC). I see a Collectors.mapMerger function that might be helpful here, but I have no idea how to use it.

Use a BufferedReader explicitly:
List<String> columns;
List<List<String>> values;
try (BufferedReader br = Files.newBufferedReader(path)) {
    String firstLine = br.readLine();
    if (firstLine == null) throw new IOException("empty file");
    columns = Arrays.asList(firstLine.split(","));
    values = br.lines()
        .map(line -> Arrays.asList(line.split(",")))
        .collect(Collectors.toList());
}
Files.lines(…) also resorts to BufferedReader.lines(…). The only difference is that Files.lines will configure the stream so that closing the stream will close the reader, which we don’t need here, as the explicit try(…) statement already ensures the closing of the BufferedReader.
Note that there is no guarantee about the state of the reader after the stream returned by lines() has been processed, but we can safely read lines before performing the stream operation.

First, your concern that this code reads the whole file twice is unfounded. Files.lines returns a Stream of the lines that is lazily populated. So the first part of the code only reads the first line, and the second part of the code reads the rest (it does read the first line a second time, though, even if that line is skipped). Quoting its documentation:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Onto your second concern about returning just a single row. In functional programming, what you are trying to do is called filtering. The Stream API provides such a method with the help of Stream.filter. This method takes a Predicate as argument, which is a function that returns true for all the items that should be kept, and false otherwise.
In this case, we want a Predicate that would return true when the name is equal to "Neda". This could be written as the lambda expression s -> s.equals("Neda").
So in the second part of your code, you could have:
lines = Files.lines(path);
List<List<String>> values = lines
    .skip(1)
    .map(line -> Arrays.asList(line.split(",")))
    .filter(list -> list.get(0).equals("Neda")) // keep only items where the name is "Neda"
    .collect(Collectors.toList());
Note however that this does not ensure that there is only a single item where the name is "Neda", it collects all possible items into a List<List<String>>. You could add some logic to find the first item or throw an exception if no items are found, depending on your business requirement.
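As one possible way to add that "find the first item" logic, the sketch below reads the header once with a BufferedReader, filters the remaining lines, and returns the first match as a map from column name to value. The class name FirstRowFinder and its methods are hypothetical, and the code assumes a simple comma-separated file without quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper class, not part of any library.
class FirstRowFinder {

    // Returns the first row whose "name" column equals the given name,
    // as a map of column name -> value, or an empty Optional if none matches.
    static Optional<Map<String, String>> findByName(BufferedReader reader, String name) {
        try {
            String headerLine = reader.readLine();      // read the header exactly once
            if (headerLine == null) {
                return Optional.empty();                // empty input
            }
            List<String> columns = Arrays.asList(headerLine.split(","));
            int nameIndex = columns.indexOf("name");
            if (nameIndex < 0) {
                return Optional.empty();                // no "name" column
            }
            return reader.lines()
                    .map(line -> line.split(","))
                    .filter(f -> f.length == columns.size() && f[nameIndex].equals(name))
                    .findFirst()                        // stops reading at the first match
                    .map(f -> toMap(columns, f));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Pairs each column name with the value at the same position.
    private static Map<String, String> toMap(List<String> columns, String[] fields) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String csv = "name,age,height\nMarianne,12,61\nJulie,13,73\nNeda,14,66\nJulia,15,62\n";
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        // prints {name=Neda, age=14, height=66}
        findByName(reader, "Neda").ifPresent(System.out::println);
    }
}
```

Returning an Optional leaves it to the caller to decide whether a missing row is an error, for example with orElseThrow.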
Note also that calling Files.lines(path) twice can be avoided by using a BufferedReader directly, as in @Holger's answer.

Using a CSV-processing library
Other Answers are good. But I recommend using a CSV-processing library to read your input files. As others noted, the CSV format is not as simple as it may seem. To begin with, the values may or may not be nested in quote-marks. And there are many variations of CSV, such as those used in Postgres, MySQL, Mongo, Microsoft Excel, and so on.
The Java ecosystem offers several such libraries. I use Apache Commons CSV.
The Apache Commons CSV library does not make use of streams. But you have no need for streams if you use a library to do the scut work. The library makes easy work of looping over the rows of the file without loading a large file into memory.
create a map between the extracted column names (in the first scan of the file) to the values in the remaining rows?
Apache Commons CSV does this automatically when you call withHeader.
return just one row in the form of List
Yes, easy to do.
As you requested, we can fill List with each of the 3 field values for one particular row. This List acts as a tuple.
List < String > tuple = List.of(); // Our goal is to fill this list of values from a single row. Initialize to an empty nonmodifiable list.
We specify the format we expect of our input file: standard CSV (RFC 4180), with the first row populated by column names.
CSVFormat format = CSVFormat.RFC4180.withHeader() ;
We specify the file path where to find our input file.
Path path = Path.of("/Users/basilbourque/people.csv");
We use try-with-resources syntax (see Tutorial) to automatically close our parser.
As we read in each row, we check for the name being Neda. If found, we fill our tuple List with that row's field values and interrupt the looping. We use List.of to conveniently return a List object of some unknown concrete class that is unmodifiable, meaning you can neither add nor remove elements from the list.
try (
    CSVParser parser = CSVParser.parse( path , StandardCharsets.UTF_8 , format ) ;
)
{
    for ( CSVRecord record : parser )
    {
        if ( record.get( "name" ).equals( "Neda" ) )
        {
            tuple = List.of( record.get( "name" ) , record.get( "age" ) , record.get( "height" ) );
            break ;
        }
    }
}
catch ( FileNotFoundException e )
{
    e.printStackTrace();
}
catch ( IOException e )
{
    e.printStackTrace();
}
If we found success, we should see some items in our List.
if ( tuple.isEmpty() )
{
    System.out.println( "Bummer. Failed to report a row for `Neda` name." );
} else
{
    System.out.println( "Success. Found this row for name of `Neda`:" );
    System.out.println( tuple.toString() );
}
When run.
Success. Found this row for name of `Neda`:
[Neda, 14, 66]
Instead of using a List as a tuple, I suggest you define a Person class to represent this data with proper data types. Our code here would then return a Person instance rather than a List<String>.
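A Person class along those lines might look like the following sketch. The field types (int for age and height) and the fromFields factory method are assumptions made for illustration; the source only names the three columns.

```java
import java.util.List;

// Hypothetical value class for one row of the name,age,height file.
final class Person {
    private final String name;
    private final int age;
    private final int height;

    Person(String name, int age, int height) {
        this.name = name;
        this.age = age;
        this.height = height;
    }

    // Builds a Person from the three string fields [name, age, height].
    // Integer.parseInt throws NumberFormatException if age or height is not numeric.
    static Person fromFields(List<String> fields) {
        return new Person(
                fields.get(0),
                Integer.parseInt(fields.get(1).trim()),
                Integer.parseInt(fields.get(2).trim()));
    }

    String getName() { return name; }
    int getAge() { return age; }
    int getHeight() { return height; }

    @Override
    public String toString() {
        return "Person{name=" + name + ", age=" + age + ", height=" + height + "}";
    }

    public static void main(String[] args) {
        Person neda = Person.fromFields(List.of("Neda", "14", "66"));
        System.out.println(neda); // prints Person{name=Neda, age=14, height=66}
    }
}
```

Inside the parsing loop, the tuple assignment could then become a call such as Person.fromFields(List.of(record.get("name"), record.get("age"), record.get("height"))).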

I know I'm responding very late, but maybe it will help someone in the future.
I've made a CSV parser/writer that is easy to use thanks to its builder pattern.
For your case, you can filter the lines you want to parse using
csvLineFilter(Predicate<String>)
I hope you find it handy. Here is the source code:
https://github.com/i7paradise/CsvUtils-Java8/
I've included a main class, Demo.java, to show how it works.
