How to read a file line by line with Java Stream

I'm trying to read a long file line by line while extracting some information from each line.
Here is an example of what I'm doing:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-1GB.txt";
        File file = new File(fileName);
        try (Stream<String> linesStream = Files.lines(file.toPath())) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
Each line in my file is divided into three parts:
10 1010101 15
I want to read those three pieces of information from every line, like:
String str1 = line[0];
String str2 = line[1];
String str3 = line[2];
Ideally, the solution should not convert the stream to a collection.
I will use those three Strings to create a graph data structure, something like:
createGraphe(str1, str2, str3);
I know that I could pass the whole line as a String, but since I'm learning streams I'm interested in how to extract those values.
Thank you.

You can map to an array by splitting each line, then call the method you want.
Files.lines(filePath)
     .map(l -> l.split(" "))
     .forEach(a -> createGraphe(a[0], a[1], a[2]));
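One caveat: the stream returned by Files.lines holds an open file handle, so it is safer to close it with try-with-resources. A minimal sketch of the same pipeline with explicit closing (filePath and createGraphe are assumed from the question):
try (Stream<String> lines = Files.lines(filePath)) {
    lines.map(l -> l.split(" "))
         .forEach(a -> createGraphe(a[0], a[1], a[2]));
}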

The lines() method you are using already does what you expect it to do.
Java 8 added a method called lines() to the Files class which can be used to read a file line by line. The beauty of this method is that it returns all lines from the file as a Stream of String, populated lazily as the stream is consumed. So if you have a huge file and only read the first 100 lines, the rest will not be loaded into memory, which results in better performance.
This is slightly different from the Files.readAllLines() method (which reads all lines into a List), because lines() reads the file lazily, only when a terminal operation such as forEach() or count() is called on the Stream. With count() you can count the number of lines in the file, or count the number of empty lines by filtering on emptiness.
Reference: https://javarevisited.blogspot.com/2015/07/3-ways-to-read-file-line-by-line-in.html
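To illustrate that laziness, a small sketch of counting lines and empty lines with count() (the file name is the one from the question):
try (Stream<String> lines = Files.lines(Paths.get("c:\\temp\\sample-1GB.txt"))) {
    long totalLines = lines.count(); // terminal operation; triggers the lazy read
    System.out.println(totalLines);
}
try (Stream<String> lines = Files.lines(Paths.get("c:\\temp\\sample-1GB.txt"))) {
    long emptyLines = lines.filter(String::isEmpty).count(); // count only empty lines
    System.out.println(emptyLines);
}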

Since you want to solve this problem and learn how streams can be useful in this situation:
Read the file (Java 8); this gives you all the lines in the file: Stream<String> lines = Files.lines(Paths.get(filePath))
Split each line into its three sections: lines.map(line -> line.split(pattern))
Pass the resulting arrays to your function: .forEach(arg -> createGraphe(arg[0], arg[1], arg[2]))
Putting it together, see the sketch below; I hope this lays it out clearly enough.
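A small self-contained version of those steps (the file name is taken from the question; createGraphe here is a hypothetical stand-in for the asker's graph-building method):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class GraphLoader {
    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get("c:\\temp\\sample-1GB.txt"))) {
            lines.map(line -> line.split(" "))
                 .forEach(parts -> createGraphe(parts[0], parts[1], parts[2]));
        }
    }

    // hypothetical stand-in for the asker's method
    static void createGraphe(String str1, String str2, String str3) {
        System.out.println(str1 + " " + str2 + " " + str3);
    }
}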

Related

Using I/O stream to parse CSV file

I have a CSV file of US population data for every county in the US. I need to get each population from the 8th column of the file. I'm using a FileReader and a BufferedReader and not sure how to use the split method to accomplish this. I know this isn't much information, but I know that I'll be using args[0] as the file location in my class.
I'm at a loss as to where to begin, to be honest.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        try (BufferedReader buff = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = buff.readLine()) != null) {
                // split each line on commas here and read the 8th column
            }
        }
    }
}
The output should be an integer of the total US population. Any help with pointing me in the right direction would be great.
Don't reinvent the wheel, don't parse CSV yourself: use a library. Even such a simple format as CSV has nuances: fields can be escaped with quotes or unescaped, the file may or may not have a header, and so on. Besides that, you have to test and maintain the code you've written. So writing less code and reusing libraries is good.
There are plenty of libraries for CSV in Java:
Apache Commons CSV
OpenCSV
Super CSV
Univocity
flatpack
IMHO, the first two are the most popular.
Here is an example for Apache Commons CSV:
final Reader in = new FileReader("counties.csv");
final Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (final CSVRecord record : records) { // simply iterate over the records; all the parsing is handled for you
    String populationString = record.get(7); // indexes are zero-based
    // or, if your file has headers, you can use them instead: record.get("population")
    // ... do whatever you want with the population
}
Look how easy it is! And it will be similar with other parsers.
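Since the goal is the total US population, here is a sketch of the summation on top of that loop (assuming the 8th column holds a plain integer; the file name and column index are from the question):
long totalPopulation = 0;
try (Reader in = new FileReader("counties.csv")) {
    for (CSVRecord record : CSVFormat.DEFAULT.parse(in)) {
        totalPopulation += Long.parseLong(record.get(7)); // 8th column, zero-based
    }
}
System.out.println(totalPopulation);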

Parsing entire csv file vs parsing line by line in java

I have a somewhat large csv file, approximately 80K to 120K rows (depending on the day). I'm successfully running code which parses the entire csv file into Java objects using the @CsvBindByName annotation. Sample code:
Reader reader = Files.newBufferedReader(Paths.get(file));
CsvToBean<MyCustomClass> csvToBean = new CsvToBeanBuilder<MyCustomClass>(reader)
        .withType(MyCustomClass.class)
        .withIgnoreLeadingWhiteSpace(true)
        .build();
List<MyCustomClass> myCustomClass = csvToBean.parse();
I want to change this code to parse the csv file line by line instead of entire file but retain the neatness of mapping to java bean object. Essentially something like this:
CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(csvFileLoc)));
String[] headerRow = csvReader.readNext(); // save the header row
String[] nextLine;
MyCustomClass myCustomClass = new MyCustomClass();
while ((nextLine = csvReader.readNext()) != null) {
    myCustomClass.setField1(nextLine[0]);
    myCustomClass.setField2(nextLine[1]);
    // ... and so on
}
But the above solution ties me to knowing the column position of each field. What I would like is to map the string array I get from the CSV based on the header row, similar to what opencsv does when parsing the entire file. However, I am not able to do that with opencsv, as far as I can tell. I had assumed this would be a pretty common requirement, but I am unable to find any references to it online. It could be that I am not understanding the CsvToBean usage correctly. I could use csvToBean.iterator() to iterate over the beans, but I think the entire csv file is loaded into memory by the build() method, which kind of defeats the purpose of reading line by line. Any suggestions welcome.
Looking at the API docs further, I see that CsvToBean<T> implements Iterable<T> and has an iterator() method that returns an Iterator<T> that is documented as follows:
The iterator returned by this method takes one line of input at a time and returns one bean at a time.
So it looks like you could just write your loop as:
for (MyCustomClass myCustomClass : csvToBean) {
    // . . . do something with the bean . . .
}
Just to clear up some potential confusion, you can see in the source code that the build() method of CsvToBeanBuilder just creates the CsvToBean object and doesn't do the actual input, and that the parse() method and the iterator of the CsvToBean object each perform the input themselves.
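For completeness, a minimal sketch of the iterator approach (assuming the same MyCustomClass bean and csvFileLoc from the question, with opencsv on the classpath):
try (Reader reader = Files.newBufferedReader(Paths.get(csvFileLoc))) {
    CsvToBean<MyCustomClass> csvToBean = new CsvToBeanBuilder<MyCustomClass>(reader)
            .withType(MyCustomClass.class)
            .withIgnoreLeadingWhiteSpace(true)
            .build();
    for (MyCustomClass bean : csvToBean) {
        // one line is read and mapped per iteration; handle the bean here
        System.out.println(bean);
    }
}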

Parsing a CSV file for a unique row using the new Java 8 Streams API

I am trying to use the new Java 8 Streams API (for which I am a complete newbie) to parse for a particular row (the one with 'Neda' in the name column) in a CSV file. Using the following article for motivation, I modified and fixed some errors so that I could parse the file containing 3 columns - 'name', 'age' and 'height'.
name,age,height
Marianne,12,61
Julie,13,73
Neda,14,66
Julia,15,62
Maryam,18,70
The parsing code is as follows:
@Override
public void init() throws Exception {
    Map<String, String> params = getParameters().getNamed();
    if (params.containsKey("csvfile")) {
        Path path = Paths.get(params.get("csvfile"));
        if (Files.exists(path)) {
            // use the new java 8 streams api to read the CSV column headings
            Stream<String> lines = Files.lines(path);
            List<String> columns = lines
                    .findFirst()
                    .map((line) -> Arrays.asList(line.split(",")))
                    .get();
            columns.forEach((l) -> System.out.println(l));
            // find the relevant sections from the CSV file;
            // we are only interested in the row with Neda's name,
            // so we need to know the index positions of the columns
            int nameIndex = columns.indexOf("name");
            int ageIndex = columns.indexOf("age");
            int heightIndex = columns.indexOf("height");
            // we have to re-read the csv file to extract the values
            lines = Files.lines(path);
            List<List<String>> values = lines
                    .skip(1)
                    .map((line) -> Arrays.asList(line.split(",")))
                    .collect(Collectors.toList());
            values.forEach((l) -> System.out.println(l));
        }
    }
}
Is there any way to avoid re-reading the file following the extraction of the header line? Although this is a very small example file, I will be applying this logic to a large CSV file.
Is there a technique to use the streams API to create a map from the extracted column names (from the first scan of the file) to the values in the remaining rows?
How can I return just one row in the form of List<String> (instead of a List<List<String>> containing all the rows)? I would prefer to just find the row as a mapping between the column names and their corresponding values (a bit like a result set in JDBC). I see a Collectors.mapMerger function that might be helpful here, but I have no idea how to use it.
Use a BufferedReader explicitly:
List<String> columns;
List<List<String>> values;
try (BufferedReader br = Files.newBufferedReader(path)) {
    String firstLine = br.readLine();
    if (firstLine == null) throw new IOException("empty file");
    columns = Arrays.asList(firstLine.split(","));
    values = br.lines()
            .map(line -> Arrays.asList(line.split(",")))
            .collect(Collectors.toList());
}
Files.lines(…) also resorts to BufferedReader.lines(…). The only difference is that Files.lines will configure the stream so that closing the stream will close the reader, which we don’t need here, as the explicit try(…) statement already ensures the closing of the BufferedReader.
Note that there is no guarantee about the state of the reader after the stream returned by lines() has been processed, but we can safely read lines before performing the stream operation.
First, your concern that this code reads the file twice is unfounded. Actually, Files.lines returns a Stream of the lines that is populated lazily. So the first part of the code only reads the first line, and the second part reads the rest (it does read the first line a second time, though, even if it's ignored). Quoting its documentation:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Onto your second concern about returning just a single row. In functional programming, what you are trying to do is called filtering. The Stream API provides such a method with the help of Stream.filter. This method takes a Predicate as argument, which is a function that returns true for all the items that should be kept, and false otherwise.
In this case, we want a Predicate that would return true when the name is equal to "Neda". This could be written as the lambda expression s -> s.equals("Neda").
So in the second part of your code, you could have:
lines = Files.lines(path);
List<List<String>> values = lines
.skip(1)
.map(line -> Arrays.asList(line.split(",")))
.filter(list -> list.get(0).equals("Neda")) // keep only items where the name is "Neda"
.collect(Collectors.toList());
Note however that this does not ensure that there is only a single item where the name is "Neda", it collects all possible items into a List<List<String>>. You could add some logic to find the first item or throw an exception if no items are found, depending on your business requirement.
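For instance, a sketch of keeping only the first match with findFirst() (which also short-circuits, so the rest of the file is not read once a match is found; throwing an exception is just one possible choice):
try (Stream<String> lines = Files.lines(path)) {
    List<String> neda = lines
            .skip(1)
            .map(line -> Arrays.asList(line.split(",")))
            .filter(list -> list.get(0).equals("Neda"))
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("no row with name Neda"));
}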
Note also that calling Files.lines(path) twice can be avoided by directly using a BufferedReader, as in Holger's answer.
Using a CSV-processing library
The other answers are good. But I recommend using a CSV-processing library to read your input files. As others noted, the CSV format is not as simple as it may seem. To begin with, the values may or may not be nested in quote marks. And there are many variations of CSV, such as those used in Postgres, MySQL, Mongo, Microsoft Excel, and so on.
The Java ecosystem offers several such libraries. I use Apache Commons CSV.
The Apache Commons CSV library does not make use of streams. But you have no need for streams if you use a library to do the scut work. The library makes easy work of looping over the rows of the file, without loading a large file into memory.
create a map between the extracted column names (in the first scan of the file) to the values in the remaining rows?
Apache Commons CSV does this automatically when you call withHeader.
return just one row in the form of List
Yes, easy to do.
As you requested, we can fill a List with the 3 field values for one particular row. This List acts as a tuple.
List<String> tuple = List.of(); // Our goal is to fill this list with the values from a single row. Initialize to an empty unmodifiable list.
We specify the format we expect of our input file: standard CSV (RFC 4180), with the first row populated by column names.
CSVFormat format = CSVFormat.RFC4180.withHeader() ;
We specify the file path where to find our input file.
Path path = Path.of("/Users/basilbourque/people.csv");
We use try-with-resources syntax (see Tutorial) to automatically close our parser.
As we read in each row, we check for the name being Neda. If found, we fill our tuple List with that row's field values. And we interrupt the looping. We use List.of to conveniently return a List object of some unknown concrete class that is unmodifiable, meaning you cannot add or remove elements.
try (
CSVParser parser = CSVParser.parse( path , StandardCharsets.UTF_8 , format ) ;
)
{
for ( CSVRecord record : parser )
{
if ( record.get( "name" ).equals( "Neda" ) )
{
tuple = List.of( record.get( "name" ) , record.get( "age" ) , record.get( "height" ) );
break ;
}
}
}
catch ( FileNotFoundException e )
{
e.printStackTrace();
}
catch ( IOException e )
{
e.printStackTrace();
}
If we found success, we should see some items in our List.
if ( tuple.isEmpty() )
{
System.out.println( "Bummer. Failed to report a row for `Neda` name." );
} else
{
System.out.println( "Success. Found this row for name of `Neda`:" );
System.out.println( tuple.toString() );
}
When run.
Success. Found this row for name of Neda:
[Neda, 14, 66]
Instead of using a List as a tuple, I suggest you define a Person class to represent this data with proper data types. Our code here would then return a Person instance rather than a List<String>.
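A sketch of such a class (a plain Java 8-compatible class; parsing age and height into int is an assumption about the data):
public class Person {
    private final String name;
    private final int age;
    private final int height;

    public Person(String name, int age, int height) {
        this.name = name;
        this.age = age;
        this.height = height;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
    public int getHeight() { return height; }
}
Inside the loop you would then build it as: new Person(record.get("name"), Integer.parseInt(record.get("age")), Integer.parseInt(record.get("height"))).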
I know I'm responding late, but maybe it will help someone in the future.
I've made a CSV parser/writer that is easy to use thanks to its builder pattern.
For your case, you can filter the lines you want to parse using csvLineFilter(Predicate<String>).
Hope you find it handy; here is the source code:
https://github.com/i7paradise/CsvUtils-Java8/
I've included a main class, Demo.java, to show how it works.

JSON to CSV with Java using CDL: possible to replace comma-separated by semicolon-separated values?

Everything is in the title :)
I'm using org.json.CDL to convert a JSONArray into CSV data, but it renders a string with ',' as the separator.
I'd like to know if it's possible to replace it with ';'.
Here is a simple example of what I'm doing:
public String exportAsCsv() throws Exception {
    return CDL.toString(
            new JSONArray(
                    mapper.writeValueAsString(extractAccounts())));
}
Thanks in advance for any advice on that question.
Edit: a plain string-replacement solution is not an option, of course, as it could have an impact on large data; ideally the library used should let me specify the field separator.
Edit 2: In the end, extracting the data as a JSONArray (and a String...) was not a good approach, especially for large data files.
So I made the following changes:
use a Java CSV library (for example: http://www.csvreader.com/java_csv_samples.php)
refactor the code to stream data from the JSON input source to the CSV output source
This works much better for large data. If you have comments, don't hesitate.
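As a side note, if switching libraries is an option, Apache Commons CSV (mentioned elsewhere in this thread) lets you set the separator directly. A hedged sketch, with hypothetical file and column names:
CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';');
try (CSVPrinter printer = new CSVPrinter(new FileWriter("accounts.csv"), format)) {
    printer.printRecord("id", "name", "balance"); // header row
    printer.printRecord(1, "Alice", 1000);        // one data row per call
}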
String output = "Hello,This,is,separated,by,a,comma";
// Simply call the replace method.
output = output.replace(',', ';');
I found this in the String documentation.
Example
String value = "Hello,this,is,a,string";
value = value.replace(',', ';');
System.out.println(value);
// Outputs: Hello;this;is;a;string

Trying to read binary file as text but scanner stops at first line

I'm trying to read a binary file, but my program just stops at the first line.
I think it's because of the strange characters the file contains. I just want to extract some file paths from it. Is there a way to do this?
public static void main(String[] args) throws IOException {
    Scanner readF = new Scanner(new File("D:\\CurrentDatabase_372.txt"));
    String line = null;
    String newLine = System.getProperty("line.separator");
    FileWriter writeF = new FileWriter("D:\\Songs.txt");
    while (readF.hasNext()) {
        line = readF.nextLine();
        if (line.contains("D:\\") && line.contains(".mp3")) {
            writeF.write(line.substring(line.indexOf("D:\\"), line.indexOf(".mp3") + 4) + newLine);
        }
    }
    readF.close();
    writeF.close();
}
The file starts like this:
pppppamepD:\Music\Korn\Untouchables\03 Blame.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables003pMetalKornUntouchables003pBlameKornUntouchables003pKornKornUntouchables003pMP3pppppCpppÀppp#ppøp·pppŸú#pdppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒppp’ÍpET?ppppppôpp¼}`Ñ#ãâK†¡H¤*(DppppppppppppppppuÞѤéú:M®$#]jkÝW0ÛœFµú½XVNp`w—wâÊp:ºŽwâÊpppp8Npdpp¡pp{)pppppppppppppppppyY:¸[ªA¥Bi `Û¯pppppppppppp2pppppppppppppppppppppppppppppppppppp¿ÞpAppppppp€ppp€;€?€CpCpC€H€N€S€`€e€y€~p~p~€’€«€Ê€â€Hollow LifepD:\Musica\Korn\Untouchables\04 Hollow Life.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables004pMetalKornUntouchables004pHollow LifeKornUntouchables004pKornKornUntouchables004pMP3pppppCpppÀHppppppøp¸pppǺxp‰ppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒpppŠºppppppppppôpp¼}`Ñ#ãâK†¡H¤*(DpppppppppppppppppãG#™R‚CA—®þ^bN °mbŽ‚^¨pG¦sp;5p5ÓÐùšwâÊp
)ŽwâÊpppp8Npdpp!cpp{pppppppppppppppppyY:¸[ªA¥Bi `ۯǺxp‰pppppp2pppppppppppppppppppppppppppppppppppp¿
I want to extract file paths like "D:\Music\Korn\Untouchables\03 Blame.mp3".
You cannot use a line-oriented scanner to read binary files. You have no guarantee that the binary file even has "lines" delimited by newline characters. For example, what would your scanner do if there were TWO files matching the pattern "D:\.*.mp3" with no intervening newline? You would extract everything between the first "D:\" and the last ".mp3", with all the garbage in between. Extracting file names from a non-delimited stream such as this requires a different strategy.
If I were writing this, I'd use a relatively simple finite-state recognizer that processes characters one at a time. When it encounters a "D" it starts saving characters, checking each character to ensure that it matches the required pattern, and ending when it sees the "3" in ".mp3". If at any point it detects a character that doesn't fit, it resets and continues looking.
EDIT: If the files to be processed are small (less than 50mb or so) you could load the entire file into memory, which would make scanning simpler.
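A minimal sketch of that idea (simplified: it looks for the literal start marker "D:\" and then buffers greedily until ".mp3", rather than validating every character, so nested markers and stray matches are not handled):
import java.io.FileInputStream;
import java.io.IOException;

public class Mp3PathExtractor {
    public static void main(String[] args) throws IOException {
        StringBuilder window = new StringBuilder(); // sliding window for the start marker
        StringBuilder current = null;               // non-null while inside a candidate path
        try (FileInputStream in = new FileInputStream("D:\\CurrentDatabase_372.txt")) {
            int b;
            while ((b = in.read()) != -1) {
                char c = (char) b;
                if (current == null) {
                    window.append(c);
                    if (window.length() > 3) window.deleteCharAt(0);
                    if (window.toString().equals("D:\\")) {
                        current = new StringBuilder("D:\\");
                        window.setLength(0);
                    }
                } else {
                    current.append(c);
                    if (current.toString().endsWith(".mp3")) {
                        System.out.println(current); // found one path
                        current = null;
                    }
                }
            }
        }
    }
}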
As was said, since it is a binary file you can't use a Scanner or other character-based readers. You could use a regular FileInputStream to read the actual raw bytes of the file. Java's String class has a constructor that takes an array of bytes and turns it into a string. You can then search that string for the file name(s). This may work if you just use the default character set.
String(byte[]):
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
FileInputStream for reading bytes:
http://download.oracle.com/javase/tutorial/essential/io/bytestreams.html
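A sketch of that approach (assuming the file fits in memory; ISO-8859-1 is used here instead of the default charset so that each byte maps to exactly one char, which keeps the ASCII path text intact):
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Mp3PathScanner {
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("D:\\CurrentDatabase_372.txt"));
        String content = new String(bytes, StandardCharsets.ISO_8859_1);
        int from = 0;
        int start;
        // scan for every "D:\" ... ".mp3" span in the decoded text
        while ((start = content.indexOf("D:\\", from)) != -1) {
            int end = content.indexOf(".mp3", start);
            if (end == -1) break;
            System.out.println(content.substring(start, end + 4));
            from = end + 4;
        }
    }
}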
Use hasNextLine() instead of hasNext() in the while loop check.
while (readF.hasNextLine()) {
    String line = readF.nextLine();
    // Your code
}
