Currently I am using Scanner/FileReader with a while (hasNextLine()) loop. I suspect this method is not very efficient. Is there another way to read a file line by line with similar functionality?
public void Read(String file) {
    Scanner sc = null;
    try {
        sc = new Scanner(new FileReader(file));
        while (sc.hasNextLine()) {
            String text = sc.nextLine();
            String[] file_Array = text.split(" ", 3);
            if (file_Array[0].equalsIgnoreCase("case")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("object")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("classes")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("function")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("ignore")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("display")) {
                //do something
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("Input file " + file + " not found");
        System.exit(1);
    } finally {
        // sc is still null if the FileReader constructor threw
        if (sc != null) {
            sc.close();
        }
    }
}
You will find that BufferedReader.readLine() is as fast as you need: you can read millions of lines a second with it. It is more probable that your string splitting and handling is causing whatever performance problems you are encountering.
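A minimal sketch of the question's loop rewritten on top of BufferedReader, assuming UTF-8 input. The class name `KeywordReader` and the `counts` map are hypothetical: counting lines per keyword stands in for the original `//do something` branches, and a `switch` on the lower-cased first token replaces the `equalsIgnoreCase` chain (equivalent for these ASCII keywords).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class KeywordReader {

    // Reads the file line by line and counts lines per leading keyword.
    // The counting is a placeholder for the "do something" branches in the question.
    static Map<String, Integer> read(Path file) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Same split as the question: keyword, second token, rest of the line
                String[] parts = line.split(" ", 3);
                String keyword = parts[0].toLowerCase(Locale.ROOT);
                switch (keyword) {
                    case "case":
                    case "object":
                    case "classes":
                    case "function":
                    case "ignore":
                    case "display":
                        counts.merge(keyword, 1, Integer::sum);
                        break;
                    default:
                        // unknown keyword or empty line: skip it
                        break;
                }
            }
        }
        return counts;
    }
}
```

If profiling then shows the time is spent in the branches rather than in `readLine()`, that confirms the point above.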
I made a gist comparing different methods:
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Function;
public class Main {
public static void main(String[] args) {
String path = "resources/testfile.txt";
measureTime("BufferedReader.readLine() into LinkedList", Main::bufferReaderToLinkedList, path);
measureTime("BufferedReader.readLine() into ArrayList", Main::bufferReaderToArrayList, path);
measureTime("Files.readAllLines()", Main::readAllLines, path);
measureTime("Scanner.nextLine() into ArrayList", Main::scannerArrayList, path);
measureTime("Scanner.nextLine() into LinkedList", Main::scannerLinkedList, path);
measureTime("RandomAccessFile.readLine() into ArrayList", Main::randomAccessFileArrayList, path);
measureTime("RandomAccessFile.readLine() into LinkedList", Main::randomAccessFileLinkedList, path);
System.out.println("-----------------------------------------------------------");
}
private static void measureTime(String name, Function<String, List<String>> fn, String path) {
System.out.println("-----------------------------------------------------------");
System.out.println("run: " + name);
long startTime = System.nanoTime();
List<String> l = fn.apply(path);
long estimatedTime = System.nanoTime() - startTime;
System.out.println("lines: " + l.size());
System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);
}
private static List<String> bufferReaderToLinkedList(String path) {
return bufferReaderToList(path, new LinkedList<>());
}
private static List<String> bufferReaderToArrayList(String path) {
return bufferReaderToList(path, new ArrayList<>());
}
private static List<String> bufferReaderToList(String path, List<String> list) {
try {
final BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
String line;
while ((line = in.readLine()) != null) {
list.add(line);
}
in.close();
} catch (final IOException e) {
e.printStackTrace();
}
return list;
}
private static List<String> readAllLines(String path) {
try {
return Files.readAllLines(Paths.get(path));
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
private static List<String> randomAccessFileLinkedList(String path) {
return randomAccessFile(path, new LinkedList<>());
}
private static List<String> randomAccessFileArrayList(String path) {
return randomAccessFile(path, new ArrayList<>());
}
private static List<String> randomAccessFile(String path, List<String> list) {
try {
RandomAccessFile file = new RandomAccessFile(path, "r");
String str;
while ((str = file.readLine()) != null) {
list.add(str);
}
file.close();
} catch (IOException e) {
e.printStackTrace();
}
return list;
}
private static List<String> scannerLinkedList(String path) {
return scanner(path, new LinkedList<>());
}
private static List<String> scannerArrayList(String path) {
return scanner(path, new ArrayList<>());
}
private static List<String> scanner(String path, List<String> list) {
try {
Scanner scanner = new Scanner(new File(path));
while (scanner.hasNextLine()) {
list.add(scanner.nextLine());
}
scanner.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
return list;
}
}
run: BufferedReader.readLine() into LinkedList
lines: 1000000
estimatedTime: 0.105118655
run: BufferedReader.readLine() into ArrayList
lines: 1000000
estimatedTime: 0.072696934
run: Files.readAllLines()
lines: 1000000
estimatedTime: 0.087753316
run: Scanner.nextLine() into ArrayList
lines: 1000000
estimatedTime: 0.743121734
run: Scanner.nextLine() into LinkedList
lines: 1000000
estimatedTime: 0.867049885
run: RandomAccessFile.readLine() into ArrayList
lines: 1000000
estimatedTime: 11.413323046
run: RandomAccessFile.readLine() into LinkedList
lines: 1000000
estimatedTime: 11.423862897
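In short: BufferedReader is the fastest, Files.readAllLines() is also acceptable, Scanner is slow because of its regex-based parsing, and RandomAccessFile is unacceptable (it is unbuffered, so readLine() reads the file one byte per call).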
Scanner can't be as fast as BufferedReader, because it uses regular expressions to parse text input. With BufferedReader, the file is read a block at a time into an internal buffer.
BufferedReader bf = new BufferedReader(new FileReader("FileName"));
You can then call readLine() on bf to read one line at a time.
Hope it serves your purpose.
You can use FileChannel and ByteBuffer from Java NIO. In my observation, the ByteBuffer size is the most critical factor in reading data faster.
The code below reads the content of the file:
import java.io.File;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public static void main(String[] args) throws Exception
{
    try (FileInputStream fileInputStream = new FileInputStream(new File("sample4.txt"));
         FileChannel fileChannel = fileInputStream.getChannel()) {
        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        // read() fills at most 1024 bytes per call, so loop until it returns -1 (end of file)
        while (fileChannel.read(byteBuffer) != -1) {
            byteBuffer.flip();
            while (byteBuffer.hasRemaining()) {
                // casting a byte to char is only correct for ASCII content
                System.out.print((char) byteBuffer.get());
            }
            byteBuffer.clear();
        }
    }
}
You can check for '\n' here to detect the end of each line. Thanks.
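Building on the suggestion to check for '\n', here is a sketch that splits a FileChannel's bytes into lines. It assumes UTF-8 content and '\n' line endings (a trailing '\r' is not stripped). Bytes are accumulated per line before decoding, so a multi-byte character or a line that straddles two buffer refills is still decoded correctly. The names `ChannelLines` and `readLines` are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class ChannelLines {

    static List<String> readLines(Path path, int bufferSize) throws IOException {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Refill the buffer until the channel reports end of file (-1)
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    byte b = buffer.get();
                    if (b == '\n') {
                        // End of line: decode the accumulated bytes as UTF-8
                        lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                        current.reset();
                    } else {
                        current.write(b);
                    }
                }
                buffer.clear();
            }
        }
        // A last line without a trailing '\n' is still a line
        if (current.size() > 0) {
            lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return lines;
    }
}
```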
You can even use a scattering read (scatter/gather I/O) to read files faster, i.e.:
fileChannel.read(buffers);
where
ByteBuffer b1 = ByteBuffer.allocate(B1);
ByteBuffer b2 = ByteBuffer.allocate(B2);
ByteBuffer b3 = ByteBuffer.allocate(B3);
ByteBuffer[] buffers = {b1, b2, b3};
This saves the user process from making several system calls (which can be expensive) and allows the kernel to optimize handling of the data, because it has information about the total transfer. If multiple CPUs are available, it may even be possible to fill and drain several buffers simultaneously.
From this book.
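A small sketch of a scattering read. FileChannel implements ScatteringByteChannel, so read(ByteBuffer[]) fills the buffers in order: b1 first, then b2, then b3. The 4/4/8 sizes are arbitrary stand-ins for B1, B2, B3 (for example a fixed-size header followed by a body), and `ScatterRead.readParts` is an illustrative name. Whether this is measurably faster depends on the OS and file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ScatterRead {

    // Reads up to 16 bytes from the start of the file, scattering them across
    // three buffers, and returns each buffer's content (assumes ASCII/UTF-8 text).
    static String[] readParts(Path path) throws IOException {
        ByteBuffer b1 = ByteBuffer.allocate(4);
        ByteBuffer b2 = ByteBuffer.allocate(4);
        ByteBuffer b3 = ByteBuffer.allocate(8);
        ByteBuffer[] buffers = {b1, b2, b3};
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // One read may transfer fewer bytes than requested; buffers fill in order,
            // so keep reading until the last one is full or end of file is reached
            while (b3.hasRemaining() && channel.read(buffers) != -1) {
                // nothing else to do per iteration
            }
        }
        String[] parts = new String[buffers.length];
        for (int i = 0; i < buffers.length; i++) {
            buffers[i].flip();
            parts[i] = StandardCharsets.UTF_8.decode(buffers[i]).toString();
        }
        return parts;
    }
}
```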
Use BufferedReader for high-performance file access. But the default buffer size of 8192 characters may be too small. For huge files you can increase the buffer size by orders of magnitude to boost your file reading performance. For example:
BufferedReader br = new BufferedReader(new FileReader("file.dat"), 1000 * 8192);
String thisLine;
while ((thisLine = br.readLine()) != null) {
    System.out.println(thisLine);
}
Just updating this thread: since Java 8 you can do this job with a single call:
List<String> lines = Files.readAllLines(Paths.get(file_path));
You must first find out which part of the program is taking the time.
As per EJP's answer, you should use BufferedReader.
If the string processing really is what takes the time, consider using threads: one thread reads lines from the file and puts them on a queue, while other processor threads take lines off the queue and process them. You will need to experiment to find how many threads to use; the number should be related to the number of CPU cores, so that the application makes full use of the CPU.
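A minimal sketch of that reader/worker split using a BlockingQueue, assuming the per-line work is independent. The "processing" here is just counting lines that contain a given keyword; the class name, the identity-compared sentinel (poison pill) and the queue capacity of 10,000 are illustrative choices, not tuned values.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

final class ParallelLines {

    // Sentinel compared by identity; a file line with the same text is a different object
    private static final String POISON = new String("<end-of-input>");

    // The calling thread reads the file into a bounded queue; 'workers' threads process lines
    static long countMatching(Path file, String keyword, int workers)
            throws IOException, InterruptedException, ExecutionException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        LongAdder matches = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            futures.add(pool.submit(() -> {
                // Worker: take lines until the sentinel arrives
                String line;
                while ((line = queue.take()) != POISON) {
                    if (line.contains(keyword)) {
                        matches.increment();
                    }
                }
                return null;
            }));
        }
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                queue.put(line); // blocks while the queue is full
            }
        } finally {
            // One sentinel per worker so that every worker stops
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);
            }
            pool.shutdown();
        }
        for (Future<?> f : futures) {
            f.get(); // rethrows a worker failure, if any
        }
        return matches.sum();
    }
}
```

The bounded queue keeps memory flat: if the workers fall behind, the reader simply blocks.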
You can read the file in chunks if there are millions of records. That avoids potential memory issues. You need to keep track of the last position to calculate the offset into the file.
try (FileReader reader = new FileReader(filePath);
     BufferedReader bufferedReader = new BufferedReader(reader)) {
    int pageOffset = lastOffset + counter;
    int skipRecords = (pageOffset - 1) * batchSize;
    bufferedReader.lines()
            .skip(skipRecords)
            .limit(batchSize) // stop after one batch
            .forEach(cline -> {
                // PRINT (or otherwise process) the line
                System.out.println(cline);
            });
}
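Note that `lines().skip(n)` still reads and discards the first n lines, so each new batch rescans the file from the start. If the reader can stay open, a sketch like the following processes the file in fixed-size batches in a single pass. `BatchReader`, `batchSize` and the `Consumer<List<String>>` handler are assumptions standing in for whatever the real batch step is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchReader {

    // Reads lines in batches of batchSize and hands each batch to the handler.
    // Returns the number of batches delivered.
    static int readInBatches(Reader source, int batchSize, Consumer<List<String>> handler)
            throws IOException {
        int batches = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batches++;
                    // fresh list, so the handler may keep a reference to the old one
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) { // final, partial batch
                handler.accept(batch);
                batches++;
            }
        }
        return batches;
    }
}
```

Only one batch is held in memory at a time, which is the property the chunked approach is after.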
If you wish to read all lines at once, have a look at the Files API of Java 7. It's really simple to use.
But a better approach would be to process this file in batches. Have a reader that reads chunks of lines from the file and a writer that does the required processing or persists the data. Batching ensures that it will keep working even if the number of lines grows to billions in the future. You can also have the batch use multithreading to increase its overall performance. I would recommend that you have a look at Spring Batch.
Given there are some files Customer-1.txt, Customer-2.txt and Customer-3.txt and these files have the following content:
Customer-1.txt
1|1|MARY|SMITH
2|1|PATRICIA|JOHNSON
4|2|BARBARA|JONES
Customer-2.txt
1|1|MARY|SMITH
2|1|PATRICIA|JOHNSON
3|1|LINDA|WILLIAMS
4|2|BARBARA|JONES
Customer-3.txt
2|1|PATRICIA|JOHNSON
3|1|LINDA|WILLIAMS
5|2|ALEXANDER|ANDERSON
These files have a lot of duplicate data, but it is possible that each file contains some data that is unique.
And given that the actual files are sorted, big (a few GB each file) and there are many files...
Then what is the:
a) memory cheapest
b) cpu cheapest
c) fastest
way in Java to create one file out of these three files that contains all the unique data of each file, sorted and concatenated like this:
Customer-final.txt
1|1|MARY|SMITH
2|1|PATRICIA|JOHNSON
3|1|LINDA|WILLIAMS
4|2|BARBARA|JONES
5|2|ALEXANDER|ANDERSON
I looked into the following solution https://github.com/upcrob/spring-batch-sort-merge , but I would like to know if it's possible to do it with FileInputStream and/or without Spring Batch.
Using an in-memory or real database to join them is not viable for my use case, due to the size of the files and the absence of an actual database.
Since the input files are already sorted, a simple parallel iteration of the files, merging their content, is the memory cheapest, cpu cheapest, and fastest way to do it.
This is a multi-way merge join, i.e. a sort-merge join without the "sort", with elimination of duplicates, similar to a SQL DISTINCT.
Here is a version that can handle an unlimited number of input files (well, as many as you can have open at once). It uses a helper class to stage the next line from each input file, so the leading ID value only has to be parsed once per line.
private static void merge(StringWriter out, BufferedReader ... in) throws IOException {
CustomerReader[] customerReader = new CustomerReader[in.length];
for (int i = 0; i < in.length; i++)
customerReader[i] = new CustomerReader(in[i]);
merge(out, customerReader);
}
private static void merge(StringWriter out, CustomerReader ... in) throws IOException {
List<CustomerReader> min = new ArrayList<>(in.length);
for (;;) {
min.clear();
for (CustomerReader reader : in)
if (reader.hasData()) {
int cmp = (min.isEmpty() ? 0 : reader.compareTo(min.get(0)));
if (cmp < 0)
min.clear();
if (cmp <= 0)
min.add(reader);
}
if (min.isEmpty())
break; // all done
// optional: Verify that lines that compared equal by ID are entirely equal
out.write(min.get(0).getCustomerLine());
out.write(System.lineSeparator());
for (CustomerReader reader : min)
reader.readNext();
}
}
private static final class CustomerReader implements Comparable<CustomerReader> {
private BufferedReader in;
private String customerLine;
private int customerId;
CustomerReader(BufferedReader in) throws IOException {
this.in = in;
readNext();
}
void readNext() throws IOException {
if ((this.customerLine = this.in.readLine()) == null)
this.customerId = Integer.MAX_VALUE;
else
this.customerId = Integer.parseInt(this.customerLine.substring(0, this.customerLine.indexOf('|')));
}
boolean hasData() {
return (this.customerLine != null);
}
String getCustomerLine() {
return this.customerLine;
}
@Override
public int compareTo(CustomerReader that) {
// Order by customerId only. Inconsistent with equals()
return Integer.compare(this.customerId, that.customerId);
}
}
TEST
String file1data = "1|1|MARY|SMITH\n" +
"2|1|PATRICIA|JOHNSON\n" +
"4|2|BARBARA|JONES\n";
String file2data = "1|1|MARY|SMITH\n" +
"2|1|PATRICIA|JOHNSON\n" +
"3|1|LINDA|WILLIAMS\n" +
"4|2|BARBARA|JONES\n";
String file3data = "2|1|PATRICIA|JOHNSON\n" +
"3|1|LINDA|WILLIAMS\n" +
"5|2|ALEXANDER|ANDERSON\n";
try (
BufferedReader in1 = new BufferedReader(new StringReader(file1data));
BufferedReader in2 = new BufferedReader(new StringReader(file2data));
BufferedReader in3 = new BufferedReader(new StringReader(file3data));
StringWriter out = new StringWriter();
) {
merge(out, in1, in2, in3);
System.out.print(out);
}
OUTPUT
1|1|MARY|SMITH
2|1|PATRICIA|JOHNSON
3|1|LINDA|WILLIAMS
4|2|BARBARA|JONES
5|2|ALEXANDER|ANDERSON
The code merges purely by ID value and doesn't verify that the rest of the line is actually equal. Insert code at the optional comment to check for that, if needed.
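With many input files, scanning every reader for each output line costs O(k) per line for k files. A common alternative (a swapped-in technique, not the code above) keeps the readers in a PriorityQueue ordered by current ID, which costs O(log k) per line. This sketch assumes the same `id|...` layout and, like the answer, deduplicates purely by ID; `HeapMerge` and `Cursor` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.PriorityQueue;

final class HeapMerge {

    // One input positioned on its current line
    private static final class Cursor {
        final BufferedReader in;
        String line;
        int id;

        Cursor(BufferedReader in) {
            this.in = in;
        }

        // Advances to the next line; returns false at end of input
        boolean advance() throws IOException {
            line = in.readLine();
            if (line == null) {
                return false;
            }
            id = Integer.parseInt(line.substring(0, line.indexOf('|')));
            return true;
        }
    }

    static void merge(Writer out, BufferedReader... inputs) throws IOException {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> Integer.compare(a.id, b.id));
        for (BufferedReader in : inputs) {
            Cursor c = new Cursor(in);
            if (c.advance()) {
                heap.add(c);
            }
        }
        boolean first = true;
        int lastId = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            // Inputs are sorted, so duplicates of an ID are polled consecutively
            if (first || c.id != lastId) {
                out.write(c.line);
                out.write(System.lineSeparator());
                lastId = c.id;
                first = false;
            }
            if (c.advance()) {
                heap.add(c);
            }
        }
    }
}
```

For real files, the readers can come from Files.newBufferedReader and the writer from Files.newBufferedWriter; memory use stays at one line per input file.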
This might help:
import java.io.*;
import java.util.Map;
import java.util.TreeMap;

public static void main(String[] args) {
    String files[] = {"Customer-1.txt", "Customer-2.txt", "Customer-3.txt"};
    // TreeMap iterates in ascending ID order; a HashMap guarantees no order at all.
    // Note: this keeps every unique line in memory, which may not fit for multi-GB files.
    Map<Integer, String> customers = new TreeMap<Integer, String>();
    try {
        String line;
        for (int i = 0; i < files.length; i++) {
            BufferedReader reader = new BufferedReader(new FileReader("data/" + files[i]));
            while ((line = reader.readLine()) != null) {
                // split() takes a regex, so the pipe must be escaped
                Integer uuid = Integer.valueOf(line.split("\\|")[0]);
                customers.put(uuid, line);
            }
            reader.close();
        }
        BufferedWriter writer = new BufferedWriter(new FileWriter("data/Customer-final.txt"));
        for (String customer : customers.values()) {
            writer.write(customer + "\n");
        }
        writer.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
If you have any questions, ask me.
I have tried doing it like this:
import java.io.*;
public class ConvertChar {
public static void main(String args[]) {
Long now = System.nanoTime();
String nomCompletFichier = "C:\\Users\\aahamed\\Desktop\\test\\test.xml";
Convert(nomCompletFichier);
Long inter = System.nanoTime() - now;
System.out.println(inter);
}
public static void Convert(String nomCompletFichier) {
FileWriter writer = null;
BufferedReader reader = null;
try {
File file = new File(nomCompletFichier);
reader = new BufferedReader(new FileReader(file));
String oldtext = "";
while (reader.ready()) {
oldtext += reader.readLine() + "\n";
}
reader.close();
// replace a word in a file
// String newtext = oldtext.replaceAll("drink", "Love");
// To replace a line in a file
String newtext = oldtext.replaceAll("&(?!amp;)", "&amp;");
writer = new FileWriter(file);
writer.write(newtext);
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
}
However, the code above takes longer to execute than this version, which writes to a second (temporary) file:
import java.io.*;
public class ConvertChar {
public static void main(String args[]) {
Long now = System.nanoTime();
String nomCompletFichier = "C:\\Users\\aahamed\\Desktop\\test\\test.xml";
Convert(nomCompletFichier);
Long inter = System.nanoTime() - now;
System.out.println(inter);
}
private static void Convert(String nomCompletFichier) {
BufferedReader br = null;
BufferedWriter bw = null;
try {
File file = new File(nomCompletFichier);
File tempFile = File.createTempFile("buffer", ".tmp");
bw = new BufferedWriter(new FileWriter(tempFile, true));
br = new BufferedReader(new FileReader(file));
while (br.ready()) {
bw.write(br.readLine().replaceAll("&(?!amp;)", "&amp;") + "\n");
}
bw.close();
br.close();
file.delete();
tempFile.renameTo(file);
} catch (IOException e) {
// writeLog("Erreur lors de la conversion des caractères : " + e.getMessage(), 0);
} finally {
try {
bw.close();
} catch (Exception ignore) {
}
try {
br.close();
} catch (Exception ignore) {
}
}
}
}
Is there any way to make the second version work without creating a temp file, while still reducing the execution time? I am optimizing this code.
The main reason your first program is slow is probably that it builds up the string oldtext incrementally. Because Java strings are immutable, each time you append another line, everything built so far has to be copied into a new string. Since each copy takes time roughly proportional to the length of the string being copied, your execution time will scale with the square of the size of your input file.
You can check whether this is your problem by trying with files of different lengths and seeing how the runtime depends on the file size.
If so, one easy way to get around the problem is Java's StringBuilder class which is intended for exactly this task: building up a large string incrementally.
The main culprit in your first example is that you're building oldtext inefficiently using String concatenations, as explained here. This allocates a new string for every concatenation. Java provides you StringBuilder for building strings:
StringBuilder builder = new StringBuilder();
while(reader.ready()){
builder.append(reader.readLine());
builder.append("\n");
}
String oldtext = builder.toString();
You can also do the replacement when you're building your text in StringBuilder. Another problem with your code is that you shouldn't use ready() to check if there is some content left in the file - check the result of readLine(). Finally, closing the stream should be in a finally or try-with-resources block. The result could look like this:
StringBuilder builder = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
String line = reader.readLine();
while (line != null) {
builder.append(line.replaceAll("&(?!amp;)", "&amp;"));
builder.append('\n');
line = reader.readLine();
}
}
String newText = builder.toString();
Writing to a temporary file is a good solution too, though. The amount of I/O, which is the slowest part, is the same in both cases: read the full content once, write the result once.
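A minimal sketch of both variants with java.nio.file, assuming UTF-8 input and that each bare `&` should become `&amp;` (the regex from the question). The pattern is compiled once, since String.replaceAll recompiles it on every call. The in-place variant holds the whole file in memory; the temp-file variant streams line by line and uses Files.move with REPLACE_EXISTING instead of delete() plus renameTo(), whose boolean results the question's code ignores. `EscapeAmpersands` is an illustrative name.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.regex.Pattern;

final class EscapeAmpersands {

    // '&' not already followed by "amp;" (same regex as in the question)
    private static final Pattern BARE_AMP = Pattern.compile("&(?!amp;)");

    // Variant 1: no temp file; the whole content is held in memory
    static void inPlace(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String fixed = BARE_AMP.matcher(text).replaceAll("&amp;");
        Files.write(file, fixed.getBytes(StandardCharsets.UTF_8));
    }

    // Variant 2: stream through a temp file in the same directory, then replace the original
    static void viaTempFile(Path file) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "buffer", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(BARE_AMP.matcher(line).replaceAll("&amp;"));
                out.write('\n');
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```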
I am messing about with some code and was wondering: is there a way to sort the output in ascending/descending order using FileOutputStream?
code:
public static void main(String[] args) throws IOException
{
String directory = "C:\\Users\\xxxx\\Desktop\\Files\\ex1.txt";
String output = "C:\\Users\\xxxxx\\Desktop\\Files\\ex1_temp.txt";
BufferedInputStream readFile = null;
BufferedOutputStream writeFile = null;
try {
readFile = new BufferedInputStream(new FileInputStream(directory));
writeFile = new BufferedOutputStream(new FileOutputStream(output));
int data;
while ((data = readFile.read()) != -1) {
//System.out.println(data);
//Collections.sort(data);
writeFile.write(data);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
if (readFile != null)
readFile.close();
if (writeFile != null)
writeFile.close();
}
}
Generally, you need to have the data in memory to sort them, so you can't use streams well for that.
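For a file that fits in memory, the usual approach is to read all lines, sort them, and write them out. A minimal sketch, assuming line-oriented UTF-8 text and plain lexicographic String order (numeric data would need a numeric comparator); `SortLines.sortFile` is an illustrative name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortLines {

    // Sorts the lines of 'input' and writes them to 'output'
    static void sortFile(Path input, Path output, boolean descending) throws IOException {
        // Copy into an ArrayList: readAllLines does not promise a modifiable list
        List<String> lines = new ArrayList<>(Files.readAllLines(input, StandardCharsets.UTF_8));
        Comparator<String> order = Comparator.naturalOrder();
        if (descending) {
            order = order.reversed();
        }
        lines.sort(order);
        Files.write(output, lines, StandardCharsets.UTF_8);
    }
}
```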
If you need to sort large amounts of data, you can use external sorting. While implementing such an algorithm, you'll probably end up using streams (to read the original file in smaller chunks, etc.), but streams alone won't help you here; they're merely part of the solution.
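A compact sketch of external sorting, assuming line-oriented UTF-8 text, lexicographic order, and a hypothetical `maxLinesInMemory` limit standing in for a real memory budget. Phase 1 sorts fixed-size chunks into temporary "run" files; phase 2 does a k-way merge of the runs with a PriorityQueue, so only one line per run is in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class ExternalSort {

    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        // Phase 1: split the input into sorted runs stored in temp files
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == maxLinesInMemory) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeRun(chunk));
            }
        }
        // Phase 2: k-way merge; each heap entry holds a run's current line and its reader
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            PriorityQueue<Head> heap = new PriorityQueue<>();
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) {
                    heap.add(new Head(first, r));
                }
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                out.write(h.line);
                out.newLine();
                String next = h.reader.readLine();
                if (next != null) {
                    heap.add(new Head(next, h.reader));
                }
            }
        } finally {
            for (BufferedReader r : readers) {
                r.close();
            }
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }

    // Sorts one chunk in memory and writes it to a new temp file
    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }

        @Override
        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }
}
```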
It seems there are different ways to read and write data of files in Java.
I want to read ASCII data from a file. What are the possible ways and their differences?
My favorite way to read a small file is to use a BufferedReader and a StringBuilder. It is very simple and to the point (though not particularly efficient, but good enough for most cases):
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String everything = sb.toString();
} finally {
br.close();
}
Some have pointed out that since Java 7 you should use the try-with-resources (i.e. auto-close) feature:
try(BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String everything = sb.toString();
}
When I read strings like this, I usually want to do some string handling per line anyway, so then I go for this implementation.
Though if I just want to read a file into a String, I always use the IOUtils.toString() method from Apache Commons IO. You can have a look at the source here:
http://www.docjar.com/html/api/org/apache/commons/io/IOUtils.java.html
FileInputStream inputStream = new FileInputStream("foo.txt");
try {
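// the one-argument overload relies on the platform default charset and is deprecated
String everything = IOUtils.toString(inputStream, StandardCharsets.UTF_8);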
} finally {
inputStream.close();
}
And even simpler with Java 7:
try(FileInputStream inputStream = new FileInputStream("foo.txt")) {
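// the one-argument overload relies on the platform default charset and is deprecated
String everything = IOUtils.toString(inputStream, StandardCharsets.UTF_8);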
// do something with everything string
}
ASCII files are text files, so you would use Readers to read them. Java also supports reading binary files using InputStreams. If the files being read are huge, you would want to use a BufferedReader on top of a FileReader to improve read performance.
Go through this article on how to use a Reader
I'd also recommend you download and read this wonderful (yet free) book called Thinking In Java
In Java 7:
new String(Files.readAllBytes(...))
(docs)
or
Files.readAllLines(...)
(docs)
In Java 8:
Files.lines(..).forEach(...)
(docs)
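A runnable sketch of the three NIO options listed above, assuming a UTF-8 file; `NioRead` and its method names are illustrative. Passing the charset explicitly avoids depending on the platform default, and the Stream returned by Files.lines keeps the file open until it is closed, hence the try-with-resources.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class NioRead {

    // Java 7: whole file as one String
    static String asString(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Java 7: whole file as a list of lines (line terminators removed)
    static List<String> asLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Java 8: lazy stream of lines; here it counts the non-blank ones
    static long countNonBlank(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }
}
```

The first two load the whole file into memory; Files.lines reads lazily, so it also suits large files.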
The easiest way is to use the Scanner class in Java and the FileReader object. Simple example:
Scanner in = new Scanner(new FileReader("filename.txt"));
Scanner has several methods for reading in strings, numbers, etc. You can find more information on the Java documentation page.
For example, reading the whole content into a String:
StringBuilder sb = new StringBuilder();
// next() would drop the whitespace between tokens, so read whole lines instead
while (in.hasNextLine()) {
    sb.append(in.nextLine());
    sb.append('\n');
}
in.close();
String outString = sb.toString();
Also if you need a specific encoding you can use this instead of FileReader:
new InputStreamReader(new FileInputStream(fileUtf8), StandardCharsets.UTF_8)
Here is a simple solution:
String content = new String(Files.readAllBytes(Paths.get("sample.txt")), StandardCharsets.UTF_8);
Or to read it as a list of lines:
List<String> content = Files.readAllLines(Paths.get("sample.txt"));
Here's another way to do it without using external libraries:
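import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public String readFile(String filename)
{
    String content = null;
    File file = new File(filename); // For example, foo.txt
    // try-with-resources closes the reader; a bare close() in finally would not compile
    try (FileReader reader = new FileReader(file)) {
        // file.length() is in bytes, an upper bound on the char count for encodings such as UTF-8
        char[] chars = new char[(int) file.length()];
        int total = 0;
        int n;
        // read() may return fewer chars than requested, so loop until the buffer is full or EOF
        while (total < chars.length && (n = reader.read(chars, total, chars.length - total)) != -1) {
            total += n;
        }
        content = new String(chars, 0, total);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return content;
}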
I had to benchmark the different ways. I shall comment on my findings but, in short, the fastest way is to use a plain old BufferedInputStream over a FileInputStream. If many files must be read then three threads will reduce the total execution time to roughly half, but adding more threads will progressively degrade performance until making it take three times longer to complete with twenty threads than with just one thread.
The assumption is that you must read a file and do something meaningful with its contents. The example here reads lines from a log and counts the ones containing values that exceed a certain threshold. So I am assuming that the Java 8 one-liner Files.lines(Paths.get("/path/to/file.txt")).map(line -> line.split(";")) is not an option.
I tested on Java 1.8, Windows 7 and both SSD and HDD drives.
I wrote six different implementations:
rawParse: Use BufferedInputStream over a FileInputStream and then cut lines reading byte by byte. This outperformed any other single-thread approach, but it may be very inconvenient for non-ASCII files.
lineReaderParse: Use a BufferedReader over a FileReader, read line by line, and split lines by calling String.split(). This is approximately 20% slower than rawParse.
lineReaderParseParallel: This is the same as lineReaderParse, but it uses several threads. This is the fastest option overall in all cases.
nioFilesParse: Use java.nio.file.Files.lines().
nioAsyncParse: Use an AsynchronousFileChannel with a completion handler and a thread pool.
nioMemoryMappedParse: Use a memory-mapped file. This is really a bad idea yielding execution times at least three times longer than any other implementation.
These are the average times for reading 204 files of 4 MB each on a quad-core i7 with an SSD drive. The files are generated on the fly to avoid disk caching.
rawParse 11.10 sec
lineReaderParse 13.86 sec
lineReaderParseParallel 6.00 sec
nioFilesParse 13.52 sec
nioAsyncParse 16.06 sec
nioMemoryMappedParse 37.68 sec
The difference between running on an SSD and an HDD drive was smaller than I expected, with the SSD approximately 15% faster. This may be because the files are generated on an unfragmented HDD and read sequentially, so the spinning drive can perform nearly as well as an SSD.
I was surprised by the low performance of the nioAsyncParse implementation. Either I have implemented something the wrong way, or the multi-threaded implementation using NIO and a completion handler performs the same as (or even worse than) a single-threaded implementation with the java.io API. Moreover, the asynchronous parse with a CompletionHandler takes many more lines of code and is trickier to implement correctly than a straightforward implementation on old streams.
Below are the six implementations, followed by a link to a class containing them all plus a parameterizable main() method that lets you play with the number of files, the file size and the degree of concurrency. Note that the file sizes vary by ±20%, to avoid any effect due to all the files being exactly the same size.
rawParse
public void rawParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
overrunCount = 0;
final int dl = (int) ';';
StringBuffer lineBuffer = new StringBuffer(1024);
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileInputStream fin = new FileInputStream(fl);
BufferedInputStream bin = new BufferedInputStream(fin);
int character;
while((character=bin.read())!=-1) {
if (character==dl) {
// Here is where something is done with each line
doSomethingWithRawLine(lineBuffer.toString());
lineBuffer.setLength(0);
}
else {
lineBuffer.append((char) character);
}
}
bin.close();
fin.close();
}
}
public final void doSomethingWithRawLine(String line) throws ParseException {
// What to do for each line
int fieldNumber = 0;
final int len = line.length();
StringBuffer fieldBuffer = new StringBuffer(256);
for (int charPos=0; charPos<len; charPos++) {
char c = line.charAt(charPos);
if (c==DL0) {
String fieldValue = fieldBuffer.toString();
if (fieldValue.length()>0) {
switch (fieldNumber) {
case 0:
Date dt = fmt.parse(fieldValue);
fieldNumber++;
break;
case 1:
double d = Double.parseDouble(fieldValue);
fieldNumber++;
break;
case 2:
int t = Integer.parseInt(fieldValue);
fieldNumber++;
break;
case 3:
if (fieldValue.equals("overrun"))
overrunCount++;
break;
}
}
fieldBuffer.setLength(0);
}
else {
fieldBuffer.append(c);
}
}
}
lineReaderParse
public void lineReaderParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
String line;
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null)
doSomethingWithLine(line);
brd.close();
frd.close();
}
}
public final void doSomethingWithLine(String line) throws ParseException {
// Example of what to do for each line
String[] fields = line.split(";");
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
lineReaderParseParallel
public void lineReaderParseParallel(final String targetDir, final int numberOfFiles, final int degreeOfParalelism) throws IOException, ParseException, InterruptedException {
Thread[] pool = new Thread[degreeOfParalelism];
int batchSize = numberOfFiles / degreeOfParalelism;
for (int b=0; b<degreeOfParalelism; b++) {
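// each thread gets the range [b*batchSize, fileTo); the last one also takes the remainder
int fileTo = (b == degreeOfParalelism - 1) ? numberOfFiles : (b + 1) * batchSize;
pool[b] = new LineReaderParseThread(targetDir, b*batchSize, fileTo);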
pool[b].start();
}
for (int b=0; b<degreeOfParalelism; b++)
pool[b].join();
}
class LineReaderParseThread extends Thread {
private String targetDir;
private int fileFrom;
private int fileTo;
private DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private int overrunCounter = 0;
public LineReaderParseThread(String targetDir, int fileFrom, int fileTo) {
this.targetDir = targetDir;
this.fileFrom = fileFrom;
this.fileTo = fileTo;
}
private void doSomethingWithTheLine(String line) throws ParseException {
String[] fields = line.split(DL);
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCounter++;
}
@Override
public void run() {
String line;
for (int f=fileFrom; f<fileTo; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
try {
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null) {
doSomethingWithTheLine(line);
}
brd.close();
frd.close();
} catch (IOException | ParseException ioe) { }
}
}
}
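Splitting files into fixed index ranges by hand is easy to get wrong. A sketch of the same idea with an ExecutorService, where each file is one task and the pool size is the degree of parallelism. The file list and the per-line check (counting lines whose fourth `;`-separated field is `overrun`, as in the examples above) are assumptions made for illustration, and `ParallelFileParse` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFileParse {

    static long countOverruns(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path file : files) {
                // One task per file; the pool decides which thread runs it
                Callable<Long> task = () -> countInFile(file);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // an IOException in a task surfaces as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    private static long countInFile(Path file) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(";");
                if (fields.length > 3 && fields[3].equals("overrun")) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With one task per file there is no remainder to handle when the file count is not a multiple of the thread count, and errors are not silently swallowed.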
nioFilesParse
public void nioFilesParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
for (int f=0; f<numberOfFiles; f++) {
Path ph = Paths.get(targetDir+filenamePreffix+String.valueOf(f)+".txt");
Consumer<String> action = new LineConsumer();
Stream<String> lines = Files.lines(ph);
lines.forEach(action);
lines.close();
}
}
class LineConsumer implements Consumer<String> {
@Override
public void accept(String line) {
// What to do for each line
String[] fields = line.split(DL);
if (fields.length>1) {
try {
Date dt = fmt.parse(fields[0]);
}
catch (ParseException e) {
}
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
}
}
nioAsyncParse
public void nioAsyncParse(final String targetDir, final int numberOfFiles, final int numberOfThreads, final int bufferSize) throws IOException, ParseException, InterruptedException {
ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(numberOfThreads);
ConcurrentLinkedQueue<ByteBuffer> byteBuffers = new ConcurrentLinkedQueue<ByteBuffer>();
for (int b=0; b<numberOfThreads; b++)
byteBuffers.add(ByteBuffer.allocate(bufferSize));
for (int f=0; f<numberOfFiles; f++) {
consumerThreads.acquire();
String fileName = targetDir+filenamePreffix+String.valueOf(f)+".txt";
AsynchronousFileChannel channel = AsynchronousFileChannel.open(Paths.get(fileName), EnumSet.of(StandardOpenOption.READ), pool);
BufferConsumer consumer = new BufferConsumer(byteBuffers, fileName, bufferSize);
channel.read(consumer.buffer(), 0l, channel, consumer);
}
consumerThreads.acquire(numberOfThreads);
}
class BufferConsumer implements CompletionHandler<Integer, AsynchronousFileChannel> {
private ConcurrentLinkedQueue<ByteBuffer> buffers;
private ByteBuffer bytes;
private String file;
private StringBuffer chars;
private int limit;
private long position;
private DateFormat frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
public BufferConsumer(ConcurrentLinkedQueue<ByteBuffer> byteBuffers, String fileName, int bufferSize) {
buffers = byteBuffers;
bytes = buffers.poll();
if (bytes==null)
bytes = ByteBuffer.allocate(bufferSize);
file = fileName;
chars = new StringBuffer(bufferSize);
frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
limit = bufferSize;
position = 0l;
}
public ByteBuffer buffer() {
return bytes;
}
@Override
public synchronized void completed(Integer result, AsynchronousFileChannel channel) {
if (result!=-1) {
bytes.flip();
final int len = bytes.limit();
int i = 0;
try {
for (i = 0; i < len; i++) {
byte by = bytes.get();
if (by=='\n') {
// ***
// The code used to process the line goes here
chars.setLength(0);
}
else {
chars.append((char) by);
}
}
}
catch (Exception x) {
System.out.println(
"Caught exception " + x.getClass().getName() + " " + x.getMessage() +
" i=" + String.valueOf(i) + ", limit=" + String.valueOf(len) +
", position="+String.valueOf(position));
}
if (len==limit) {
bytes.clear();
position += len;
channel.read(bytes, position, channel, this);
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
@Override
public void failed(Throwable e, AsynchronousFileChannel channel) {
}
};
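For comparison, here is a much smaller single-file sketch of asynchronous reading. It is not the benchmark's code: it swaps the CompletionHandler for the Future-returning AsynchronousFileChannel.read(ByteBuffer, long) overload, assumes UTF-8 text that fits in memory, and the AsyncFileText name is hypothetical. It keeps reading until read() reports -1, rather than treating the first short read as the end of the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper; assumes UTF-8 text small enough to hold in memory.
final class AsyncFileText {

    static String readAll(Path path, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            long position = 0;
            while (true) {
                Future<Integer> pending = channel.read(buffer, position);
                int n = pending.get(); // blocks until this chunk has been read
                if (n == -1) {
                    break; // position is at or past the end of the file
                }
                buffer.flip();
                out.write(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining());
                position += n;
                buffer.clear();
            }
        }
        // Decode once at the end, so multi-byte characters split across chunks are handled
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Waiting on each Future in turn gives up the overlap a CompletionHandler can provide, but it keeps the control flow linear and easy to verify.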
FULL RUNNABLE IMPLEMENTATION OF ALL CASES
https://github.com/sergiomt/javaiobenchmark/blob/master/FileReadBenchmark.java
Here are three working and tested methods, plus a fourth that reads the entire file without a loop:
Using BufferedReader
package io;
import java.io.*;
public class ReadFromFile2 {
public static void main(String[] args)throws Exception {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
try (BufferedReader br = new BufferedReader(new FileReader(file))) { // closed automatically
String st;
while((st=br.readLine()) != null){
System.out.println(st);
}
}
}
}
Using Scanner
package io;
import java.io.File;
import java.util.Scanner;
public class ReadFromFileUsingScanner {
public static void main(String[] args) throws Exception {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
try (Scanner sc = new Scanner(file)) { // closed automatically
while(sc.hasNextLine()){
System.out.println(sc.nextLine());
}
}
}
}
Using FileReader
package io;
import java.io.*;
public class ReadingFromFile {
public static void main(String[] args) throws Exception {
try (FileReader fr = new FileReader("C:\\Users\\pankaj\\Desktop\\test.java")) {
int i;
// read() returns one char at a time, or -1 at end of file
while ((i=fr.read()) != -1){
System.out.print((char) i);
}
}
}
}
Read the entire file without a loop using the Scanner class
package io;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ReadingEntireFileWithoutLoop {
public static void main(String[] args) throws FileNotFoundException {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
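try (Scanner sc = new Scanner(file)) {
sc.useDelimiter("\\Z"); // \Z = end of input, so the whole file is one token
System.out.println(sc.next()); // throws NoSuchElementException if the file is empty
}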
}
}
The methods within org.apache.commons.io.FileUtils may also be very handy, e.g.:
/**
* Reads the contents of a file line by line to a List
* of Strings using the default encoding for the VM.
*/
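static List<String> readLines(File file) throws IOException
(In Commons IO 2.5 and later this overload is deprecated in favor of readLines(File file, Charset charset), which takes the encoding explicitly.)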
I documented 15 ways to read a file in Java and then tested them for speed with file sizes from 1 KB to 1 GB. Here are the top three ways to do this:
java.nio.file.Files.readAllBytes()
Tested to work in Java 7, 8, and 9.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
public class ReadFile_Files_ReadAllBytes {
public static void main(String [] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-10KB.txt";
File file = new File(fileName);
byte [] fileBytes = Files.readAllBytes(file.toPath());
// Casting each byte to char is only correct for single-byte text such as ASCII
char singleChar;
for(byte b : fileBytes) {
singleChar = (char) b;
System.out.print(singleChar);
}
}
}
java.io.BufferedReader.readLine()
Tested to work in Java 7, 8, 9.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class ReadFile_BufferedReader_ReadLine {
public static void main(String [] args) throws IOException {
String fileName = "c:\\temp\\sample-10KB.txt";
FileReader fileReader = new FileReader(fileName);
try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
String line;
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
}
}
}
java.nio.file.Files.lines()
This was tested to work in Java 8 and 9 but won't work in Java 7, because Files.lines() and the Stream API were only introduced in Java 8.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;
public class ReadFile_Files_Lines {
public static void main(String[] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-10KB.txt";
File file = new File(fileName);
try (Stream<String> linesStream = Files.lines(file.toPath())) {
linesStream.forEach(line -> {
System.out.println(line);
});
}
}
}
What do you want to do with the text? Is the file small enough to fit into memory? I would try to find the simplest way to handle the file for your needs. The FileUtils library is very handy for this.
for(String line: FileUtils.readLines(new File("my-text-file"), StandardCharsets.UTF_8))
System.out.println(line);
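Below is a Java 8 one-liner, assuming the text.txt file is in the root of the Eclipse project directory:
Files.lines(Paths.get("text.txt")).collect(Collectors.toList());
Note that the stream returned by Files.lines() keeps the file open until the stream is closed; Files.readAllLines(Paths.get("text.txt")) returns the lines as a list and closes the file for you.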
The most intuitive method, introduced in Java 11, is Files.readString:
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String args[]) throws IOException {
String content = Files.readString(Paths.get("D:\\sandbox\\mvn\\my-app\\my-app.iml"));
System.out.print(content);
}
}
PHP has had this luxury for decades! ☺
The buffered stream classes are much more performant in practice, so much so that the NIO.2 API includes methods that specifically return these stream classes, in part to encourage you always to use buffered streams in your application.
Here is an example:
Path path = Paths.get("/myfolder/myfile.ext");
try (BufferedReader reader = Files.newBufferedReader(path)) {
// Read from the stream
String currentLine = null;
while ((currentLine = reader.readLine()) != null) {
//do your code here
}
} catch (IOException e) {
// Handle file I/O exception...
}
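To see why buffering matters, the sketch below counts how many read calls actually reach an underlying stream when a program reads one byte at a time, with and without a BufferedInputStream in between. CountingInputStream and ReadCallCounter are hypothetical helpers written for this illustration, and an in-memory ByteArrayInputStream stands in for a file, where each unbuffered FileInputStream.read() call would typically be a native read.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo: counts how many reads reach the wrapped stream.
final class ReadCallCounter {

    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads every byte one at a time, directly or through a BufferedInputStream,
    // and returns how many read calls hit the underlying stream.
    static int underlyingReads(byte[] data, boolean buffered) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = buffered ? new BufferedInputStream(counting) : counting) {
            while (in.read() != -1) {
                // consume one byte per call
            }
        }
        return counting.calls;
    }
}
```

With 10,000 bytes the unbuffered path makes one underlying call per byte plus one for end of file, while the buffered path makes only a handful, because BufferedInputStream refills an internal array (8192 bytes by default in current JDKs).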
You can replace this code
BufferedReader reader = Files.newBufferedReader(path);
with
BufferedReader br = new BufferedReader(new FileReader("/myfolder/myfile.ext"));
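These two lines are not strictly interchangeable: Files.newBufferedReader(path) always decodes UTF-8, while FileReader uses the platform default charset (UTF-8 by default only since Java 18). A minimal sketch, assuming you want the same explicit charset either way; the class and method names are hypothetical. Before Java 11 FileReader has no charset parameter, so the java.io variant goes through an InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the same read done with NIO.2 and with java.io, charset made explicit.
final class ExplicitCharsetReaders {

    // NIO.2: the charset is an argument of Files.newBufferedReader.
    static List<String> viaFiles(Path path, Charset cs) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, cs)) {
            return readAll(reader);
        }
    }

    // java.io: wrap a FileInputStream in an InputStreamReader to choose the charset.
    static List<String> viaStreams(Path path, Charset cs) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path.toFile()), cs))) {
            return readAll(reader);
        }
    }

    // Collects every line; readLine() strips the line terminator.
    private static List<String> readAll(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}
```

One difference remains even with the same charset: the reader from Files.newBufferedReader throws a MalformedInputException on invalid byte sequences, whereas InputStreamReader silently substitutes replacement characters.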
I recommend this article to learn the main uses of Java NIO and IO.
Using BufferedReader:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
try (BufferedReader br = new BufferedReader(new FileReader("/fileToRead.txt"))) {
String x;
while ( (x = br.readLine()) != null ) {
// Printing out each line in the file
System.out.println(x);
}
}
catch (FileNotFoundException e) {
// Must be caught before IOException, of which it is a subclass
System.out.println(e);
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
This is basically the exact same as Jesus Ramos' answer, except with File instead of FileReader plus iteration to step through the contents of the file.
try (Scanner in = new Scanner(new File("filename.txt"))) { // closed automatically, avoiding resource leaks
while (in.hasNextLine()) { // Iterates over each line in the file
String line = in.nextLine();
// Do something with line
}
}
// The enclosing method must declare: throws FileNotFoundException
Probably not as fast as with buffered I/O, but quite terse:
String content;
try (Scanner scanner = new Scanner(textFile).useDelimiter("\\Z")) {
content = scanner.next();
}
The \Z pattern tells the Scanner that the delimiter is EOF.
The simplest way to read data from a file in Java is to use the File class to locate the file and the Scanner class to read its contents.
public static void main(String args[])throws Exception
{
File f = new File("input.txt");
takeInputIn2DArray(f);
}
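public static int[][] takeInputIn2DArray(File f) throws Exception
{
int a[][] = new int[20][20];
try (Scanner s = new Scanner(f)) // closes the file when done
{
for(int i=0; i<20; i++)
{
for(int j=0; j<20; j++)
{
a[i][j] = s.nextInt(); // assumes at least 400 whitespace-separated integers
}
}
}
return a;
}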
PS: Don't forget to import java.io.File and java.util.Scanner (or java.util.*) for this to compile.
You can use readAllLines and the String.join method to get the whole file content as a single string:
String str = String.join("\n",Files.readAllLines(Paths.get("e:\\text.txt")));
It uses UTF-8 encoding by default, which reads ASCII data correctly.
Also you can use readAllBytes:
String str = new String(Files.readAllBytes(Paths.get("e:\\text.txt")), StandardCharsets.UTF_8);
I think readAllBytes is faster and more faithful to the original, because readAllLines strips the line terminators (which may be \r\n rather than \n) and String.join then re-inserts only \n, with no trailing newline. Which one is suitable depends on your needs.
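The difference in line terminators is easy to check. The sketch below reads the same file both ways, assuming UTF-8 content; LineEndingComparison is a hypothetical name. For a file containing "a\r\nb\r\n", the readAllLines version returns "a\nb" while the readAllBytes version returns "a\r\nb\r\n" unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes the file is UTF-8 encoded.
final class LineEndingComparison {

    // Splits into lines, then re-joins with "\n": original terminators and any trailing newline are lost.
    static String viaReadAllLines(Path p) throws IOException {
        return String.join("\n", Files.readAllLines(p, StandardCharsets.UTF_8));
    }

    // Decodes the raw bytes: "\r\n" terminators and the trailing newline are preserved.
    static String viaReadAllBytes(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }
}
```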
I don't see it mentioned in the other answers so far, but if "best" means speed, then the new Java I/O (NIO) might provide the fastest performance, though it is not always the easiest to figure out for someone learning.
http://download.oracle.com/javase/tutorial/essential/io/file.html
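As a minimal NIO sketch, a FileChannel can read a whole file into a ByteBuffer that is then decoded. It assumes UTF-8 text that fits in memory and is under 2 GB, and ChannelReader is a hypothetical name. Whether this is actually faster than buffered java.io depends on the workload, so it is worth measuring before switching.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; assumes UTF-8 text under 2 GB that fits in memory.
final class ChannelReader {

    static String readAll(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for a single buffer: " + size);
            }
            ByteBuffer buffer = ByteBuffer.allocate((int) size);
            // read() may return fewer bytes than requested, so loop until the buffer is full or EOF
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep filling
            }
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }
}
```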
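Guava provides a one-liner for this (filePath must be a java.io.File here; newer Guava releases deprecate Files.toString in favor of Files.asCharSource(file, charset).read()):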
import com.google.common.base.Charsets;
import com.google.common.io.Files;
String contents = Files.toString(filePath, Charsets.UTF_8);
Cactoos gives you a declarative one-liner:
new TextOf(new File("a.txt")).asString();
This might not be the exact answer to the question. It's just another way of reading a file where you do not explicitly specify the path to your file in your Java code, and instead feed the file to the program's standard input using shell redirection.
With the following code,
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
public class InputReader{
public static void main(String[] args)throws IOException{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String s="";
while((s=br.readLine())!=null){
System.out.println(s);
}
}
}
just go ahead and run it with:
java InputReader < input.txt
This would read the contents of input.txt and print them to your console.
You can also redirect the output of System.out.println() to a specific file from the command line as follows:
java InputReader < input.txt > output.txt
This would read from input.txt and write to output.txt.
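If you want a program to accept either style, a small sketch can read the file named by the first command-line argument and fall back to standard input otherwise. ArgsOrStdin is a hypothetical helper, and it assumes UTF-8 input (which also covers plain ASCII). In main you would call ArgsOrStdin.open(args, System.in) and loop over readLine() exactly as above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; assumes UTF-8 input.
final class ArgsOrStdin {

    // Opens the file named by args[0] if present (e.g. java InputReader input.txt),
    // otherwise wraps the given stdin stream (e.g. java InputReader < input.txt).
    static BufferedReader open(String[] args, InputStream stdin) throws IOException {
        if (args.length > 0) {
            return Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
        }
        return new BufferedReader(new InputStreamReader(stdin, StandardCharsets.UTF_8));
    }
}
```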
For JSF-based Maven web applications, just use the ClassLoader and the resources folder to read in any file you want:
Put any file you want to read in the resources folder (src/main/resources in the standard Maven layout).
Put the Apache Commons IO dependency into your POM:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-io</artifactId>
<version>1.3.2</version>
</dependency>
Use the code below to read it (e.g. below is reading in a .json file):
String metadata = null;
ClassLoader loader = Thread.currentThread().getContextClassLoader();
// ClassLoader resource names have no leading "/", and the returned stream
// is a plain InputStream (not necessarily a FileInputStream), so don't cast it
try (InputStream inputStream = loader.getResourceAsStream("metadata.json")) {
if (inputStream != null) { // null means the resource was not found
metadata = IOUtils.toString(inputStream, "UTF-8");
}
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return metadata;
You can do the same for text files, .properties files, XSD schemas, etc.
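On Java 9 or later the Commons IO dependency is not strictly needed, since InputStream.readAllBytes() can do the same job. A minimal sketch, assuming UTF-8 content; ClasspathText is a hypothetical name, and resource names passed to a ClassLoader are written without a leading slash.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper; requires Java 9+ and assumes UTF-8 resources.
final class ClasspathText {

    // Loads e.g. src/main/resources/metadata.json via read("metadata.json").
    static Optional<String> read(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // A null resource is allowed in try-with-resources; close() is simply skipped
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                return Optional.empty(); // resource not found on the classpath
            }
            return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning an Optional makes the "not found" case explicit, for example ClasspathText.read("metadata.json").orElse("{}").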
try {
File f = new File("filename.txt");
Scanner r = new Scanner(f);
while (r.hasNextLine()) {
String data = r.nextLine();
JOptionPane.showMessageDialog(null, data); // the first argument is the parent component; null uses a default frame
}
r.close();
} catch (FileNotFoundException ex) {
JOptionPane.showMessageDialog(null, "Error occurred");
ex.printStackTrace();
}
Use Java kiss if this is about simplicity of structure:
import static kiss.API.*;
class App {
void run() {
String line;
try (Close in = inOpen("file.dat")) {
while ((line = readLine()) != null) {
println(line);
}
}
}
}
import java.util.stream.Stream;
import java.nio.file.*;
import java.io.*;
class ReadFile {
public static void main(String[] args) {
String filename = "Test.txt";
try(Stream<String> stream = Files.lines(Paths.get(filename))) {
stream.forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
}
}
Just use a Java 8 Stream.
In case you have a large file you can use Apache Commons IO to process the file iteratively without exhausting the available memory.
try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
while (it.hasNext()) {
String line = it.nextLine();
// do something with line
}
}
try (Stream<String> stream = Files.lines(Paths.get(String.valueOf(new File("yourFile.txt"))))) {
stream.forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
new File(<path_name>)
Creates a new File instance by converting the given pathname string into an abstract pathname. If the given string is the empty string, then the result is the empty abstract pathname.
Params:
pathname – A pathname string
Throws:
NullPointerException – If the pathname argument is null
Files.lines returns a Stream of String:
Stream<String> stream = Files.lines(Paths.get(String.valueOf(new File("yourFile.txt"))))
(Paths.get("yourFile.txt") or new File("yourFile.txt").toPath() gives the same Path more directly.) Files.lines can throw an IOException, for example NoSuchFileException if the file does not exist, so keeping it inside the try block lets the catch clause handle it, and try-with-resources closes the file afterwards.
stream.forEach(System.out::println);
This iterates over the stream and prints each line to the console.
If you have a different use case, you can provide your own function to process the stream of lines.
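As one example of such a custom function, the sketch below dispatches each line on its first word, replacing a chain of if/else checks on keywords such as "case" or "object" with a map of handlers. KeywordDispatcher and its methods are hypothetical; it assumes UTF-8 input with space-separated words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical helper; assumes UTF-8 input with space-separated words.
final class KeywordDispatcher {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    // Registers a handler for a first word; matching is case-insensitive.
    KeywordDispatcher on(String keyword, Consumer<String> handler) {
        handlers.put(keyword.toLowerCase(Locale.ROOT), handler);
        return this;
    }

    // Streams the file lazily and dispatches every line; try-with-resources closes the file.
    void process(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            lines.forEach(this::dispatch);
        }
    }

    // Splits off the first word and passes the rest of the line (possibly empty) to its handler.
    // Lines whose first word has no handler are ignored.
    void dispatch(String line) {
        String[] parts = line.split(" ", 2);
        Consumer<String> handler = handlers.get(parts[0].toLowerCase(Locale.ROOT));
        if (handler != null) {
            handler.accept(parts.length > 1 ? parts[1] : "");
        }
    }
}
```

Usage: new KeywordDispatcher().on("case", rest -> ...).on("object", rest -> ...).process(path). A map lookup per line also avoids comparing the first word against every keyword in turn.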
My new favorite approach to simply read a whole text file from a BufferedReader input goes:
String text = input.lines().collect(Collectors.joining(System.lineSeparator()));
This reads the whole input and joins the lines with the platform line separator (System.lineSeparator()). Collectors.joining() puts the separator only between lines, so there is no trailing separator at the end; without any separator, all lines would be glued together into one.
BufferedReader.lines() has existed since Java 8.
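A minimal sketch to check that behavior (JoinLines is a hypothetical helper name): reading "a\nb\n" and joining with "\n" gives "a\nb", and joining with an empty separator gives "ab".

```java
import java.io.BufferedReader;
import java.util.stream.Collectors;

// Hypothetical helper wrapping the one-liner with a configurable separator.
final class JoinLines {

    // BufferedReader.lines() strips line terminators; Collectors.joining inserts the separator
    // only between elements. I/O errors surface as UncheckedIOException from the stream.
    static String readJoined(BufferedReader input, String separator) {
        return input.lines().collect(Collectors.joining(separator));
    }
}
```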
For Android developers ending up here (who use Kotlin):
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val text = myFileUrl.readText() // Not recommended for huge files
println(text)
Other solution:
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
val lines = file.readLines() // Not recommended for huge files
lines.forEach(::println)
Another good solution which can be used for huge files as well:
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file
.bufferedReader()
.lineSequence()
.forEach(::println)
Or:
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file.useLines { lines ->
lines.forEach(::println)
}
Notes:
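The vegetables.txt file should be on your classpath (for example, in the src/main/resources directory). The File(myFileUrl.toURI()) variants only work when the resource is a plain file on disk, not when it is packaged inside a JAR.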
The above solutions all treat the file encodings as UTF-8 by default. You can specify your desired encoding as the argument for the functions.
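The readText(), readLines() and useLines solutions need no further action such as closing files or readers; the Kotlin standard library takes care of that. The bufferedReader().lineSequence() variant does not close the reader by itself, so wrap it in use { }, e.g. file.bufferedReader().use { it.lineSequence().forEach(::println) }.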
It seems there are different ways to read and write data of files in Java.
I want to read ASCII data from a file. What are the possible ways and their differences?
My favorite way to read a small file is to use a BufferedReader and a StringBuilder. It is very simple and to the point (though not particularly efficient, it is good enough for most cases):
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String everything = sb.toString();
} finally {
br.close();
}
Some have pointed out that since Java 7 you should use the try-with-resources (i.e. auto-close) feature:
try(BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String everything = sb.toString();
}
When I read strings like this, I usually want to do some string handling per line anyways, so then I go for this implementation.
Though if I actually just want to read a file into a String, I always use the IOUtils.toString() method from Apache Commons IO. You can have a look at the source here:
http://www.docjar.com/html/api/org/apache/commons/io/IOUtils.java.html
FileInputStream inputStream = new FileInputStream("foo.txt");
try {
String everything = IOUtils.toString(inputStream);
} finally {
inputStream.close();
}
And even simpler with Java 7:
try(FileInputStream inputStream = new FileInputStream("foo.txt")) {
String everything = IOUtils.toString(inputStream);
// do something with everything string
}
An ASCII file is a text file, so you would use Readers to read it. Java also supports reading binary files using InputStreams. If the files being read are huge, you would want to wrap the FileReader in a BufferedReader to improve read performance.
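To make the Reader/InputStream distinction concrete, this sketch counts the same file once as raw bytes through an InputStream and once as decoded characters through a Reader. BytesVersusChars is a hypothetical name. For pure ASCII the two counts match, since every ASCII character is one byte in ASCII and UTF-8; for other text they diverge (for example, "é" is two bytes in UTF-8 but one char).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo comparing the binary and text views of one file.
final class BytesVersusChars {

    // Binary view: an InputStream yields raw bytes.
    static long countBytes(Path p) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(p))) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Text view: a Reader yields chars decoded with the given charset.
    static long countChars(Path p, Charset cs) throws IOException {
        long count = 0;
        try (Reader in = Files.newBufferedReader(p, cs)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }
}
```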
Go through this article on how to use a Reader
I'd also recommend you download and read this wonderful (yet free) book called Thinking In Java
In Java 7:
new String(Files.readAllBytes(...))
or
Files.readAllLines(...)
In Java 8:
Files.lines(..).forEach(...)
The easiest way is to use the Scanner class in Java and the FileReader object. Simple example:
Scanner in = new Scanner(new FileReader("filename.txt"));
Scanner has several methods for reading in strings, numbers, etc... You can look for more information on this on the Java documentation page.
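For example, reading the whole content into a String:
StringBuilder sb = new StringBuilder();
while(in.hasNextLine()) {
// nextLine() keeps the spaces inside each line; next() would drop all whitespace between tokens
sb.append(in.nextLine()).append(System.lineSeparator());
}
in.close();
String outString = sb.toString();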
Also if you need a specific encoding you can use this instead of FileReader:
new InputStreamReader(new FileInputStream(fileUtf8), StandardCharsets.UTF_8)
Here is a simple solution:
String content = new String(Files.readAllBytes(Paths.get("sample.txt")));
Or to read as list:
List<String> content = Files.readAllLines(Paths.get("sample.txt"));
Here's another way to do it without using external libraries:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
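public String readFile(String filename)
{
String content = null;
File file = new File(filename); // For example, foo.txt
// try-with-resources closes the reader even if read() throws
try (FileReader reader = new FileReader(file)) {
// file.length() counts bytes; for single-byte text such as ASCII it equals the number of chars
char[] chars = new char[(int) file.length()];
int total = 0;
int n;
// A single read() call is not guaranteed to fill the array, so keep reading until it is full or EOF
while (total < chars.length && (n = reader.read(chars, total, chars.length - total)) != -1) {
total += n;
}
content = new String(chars, 0, total);
} catch (IOException e) {
e.printStackTrace();
}
return content;
}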
I had to benchmark the different ways. I shall comment on my findings but, in short, the fastest single-threaded way is to use a plain old BufferedInputStream over a FileInputStream. If many files must be read, then three threads will reduce the total execution time to roughly half, but adding more threads progressively degrades performance, until with twenty threads it takes three times longer to complete than with just one thread.
The assumption is that you must read a file and do something meaningful with its contents. The examples here read lines from a log and count the ones that contain values exceeding a certain threshold. So I am assuming that the Java 8 one-liner Files.lines(Paths.get("/path/to/file.txt")).map(line -> line.split(";")) is not an option.
I tested on Java 1.8, Windows 7 and both SSD and HDD drives.
I wrote six different implementations:
rawParse: Use BufferedInputStream over a FileInputStream and then cut lines reading byte by byte. This outperformed any other single-thread approach, but it may be very inconvenient for non-ASCII files.
lineReaderParse: Use a BufferedReader over a FileReader, read line by line, and split lines by calling String.split(). This is approximately 20% slower than rawParse.
lineReaderParseParallel: This is the same as lineReaderParse, but it uses several threads. This is the fastest option overall in all cases.
nioFilesParse: Use java.nio.file.Files.lines()
nioAsyncParse: Use an AsynchronousFileChannel with a completion handler and a thread pool.
nioMemoryMappedParse: Use a memory-mapped file. This is really a bad idea, yielding execution times more than twice as long as any other implementation.
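For reference, memory mapping in Java looks roughly like the sketch below, which maps a file read-only and counts '\n' bytes. It is a minimal illustration, not the benchmark's nioMemoryMappedParse; MappedLineCounter is a hypothetical name, and it assumes the file is under 2 GB because a single MappedByteBuffer is indexed by int.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; assumes the file is smaller than 2 GB.
final class MappedLineCounter {

    static long countNewlines(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size == 0) {
                return 0;
            }
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("File too large to map in one buffer: " + size);
            }
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long count = 0;
            while (map.hasRemaining()) {
                if (map.get() == '\n') {
                    count++;
                }
            }
            return count; // the mapping stays valid until the buffer is garbage-collected
        }
    }
}
```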
These are the average times for reading 204 files of 4 MB each on a quad-core i7 with an SSD drive. The files are generated on the fly to avoid disk caching.
rawParse 11.10 sec
lineReaderParse 13.86 sec
lineReaderParseParallel 6.00 sec
nioFilesParse 13.52 sec
nioAsyncParse 16.06 sec
nioMemoryMappedParse 37.68 sec
The difference between running on an SSD and on an HDD was smaller than I expected, with the SSD approximately 15% faster. This may be because the files are generated on an unfragmented HDD and read sequentially, so the spinning drive can perform nearly as well as an SSD.
I was surprised by the low performance of the nioAsyncParse implementation. Either I have implemented something the wrong way, or the multi-threaded implementation using NIO and a completion handler performs the same as (or even worse than) a single-threaded implementation with the java.io API. Moreover, the asynchronous parse with a CompletionHandler is much longer and trickier to implement correctly than a straightforward implementation on old streams.
Now the six implementations follow, along with a class containing them all plus a parameterizable main() method that lets you play with the number of files, file size and degree of concurrency. Note that file sizes vary by plus or minus 20%, to avoid any effect due to all the files being exactly the same size.
rawParse
public void rawParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
overrunCount = 0;
final int dl = (int) ';';
StringBuffer lineBuffer = new StringBuffer(1024);
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileInputStream fin = new FileInputStream(fl);
BufferedInputStream bin = new BufferedInputStream(fin);
int character;
while((character=bin.read())!=-1) {
if (character==dl) {
// Here is where something is done with each line
doSomethingWithRawLine(lineBuffer.toString());
lineBuffer.setLength(0);
}
else {
lineBuffer.append((char) character);
}
}
bin.close();
fin.close();
}
}
public final void doSomethingWithRawLine(String line) throws ParseException {
// What to do for each line
int fieldNumber = 0;
final int len = line.length();
StringBuffer fieldBuffer = new StringBuffer(256);
for (int charPos=0; charPos<len; charPos++) {
char c = line.charAt(charPos);
if (c==DL0) {
String fieldValue = fieldBuffer.toString();
if (fieldValue.length()>0) {
switch (fieldNumber) {
case 0:
Date dt = fmt.parse(fieldValue);
fieldNumber++;
break;
case 1:
double d = Double.parseDouble(fieldValue);
fieldNumber++;
break;
case 2:
int t = Integer.parseInt(fieldValue);
fieldNumber++;
break;
case 3:
if (fieldValue.equals("overrun"))
overrunCount++;
break;
}
}
fieldBuffer.setLength(0);
}
else {
fieldBuffer.append(c);
}
}
}
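The byte-by-byte approach can be made safe for non-ASCII text by collecting each line's bytes and decoding them as UTF-8 only once the line is complete, instead of casting each byte to char. This works because in UTF-8 the byte 0x0A ('\n') never occurs inside a multi-byte character. A minimal sketch under those assumptions (UTF-8 input, lines ending in '\n' rather than the ';' used by rawParse above); RawLineSplitter is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes UTF-8 input with '\n' (or "\r\n") line endings.
final class RawLineSplitter {

    // Reads bytes through a BufferedInputStream and cuts lines at '\n'.
    // Closes the stream it is given.
    static List<String> readLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream(256);
        try (InputStream in = new BufferedInputStream(raw)) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines.add(decode(current));
                    current.reset();
                } else {
                    current.write(b);
                }
            }
        }
        if (current.size() > 0) {
            lines.add(decode(current)); // last line without a trailing '\n'
        }
        return lines;
    }

    // Decodes a complete line, dropping a trailing '\r' so "\r\n" endings are tolerated.
    private static String decode(ByteArrayOutputStream bytes) {
        String s = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }
}
```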
lineReaderParse
public void lineReaderParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
String line;
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null)
doSomethingWithLine(line);
brd.close();
frd.close();
}
}
public final void doSomethingWithLine(String line) throws ParseException {
// Example of what to do for each line
String[] fields = line.split(";");
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
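One detail of the per-line work matters once threads are involved: SimpleDateFormat is not thread-safe, so each thread needs its own instance. With java.time (Java 8+), an immutable DateTimeFormatter can be shared instead. A minimal sketch, assuming the same field layout the examples parse (date;decimal;integer;status with the "yyyy-MM-dd HH:mm:ss" pattern); LogLineParser is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; assumes lines shaped like "2020-01-02 03:04:05;1.5;7;overrun".
final class LogLineParser {

    // Immutable and thread-safe: one shared instance is enough, unlike SimpleDateFormat.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Reports whether the status field is "overrun". Lines with fewer than four fields count as
    // "not overrun"; malformed values throw DateTimeParseException or NumberFormatException (unchecked).
    static boolean isOverrun(String line) {
        String[] fields = line.split(";");
        if (fields.length < 4) {
            return false;
        }
        // Parsed only to validate the fields, mirroring the benchmark's per-line work
        LocalDateTime.parse(fields[0], FMT);
        Double.parseDouble(fields[1]);
        Integer.parseInt(fields[2]);
        return fields[3].equals("overrun");
    }
}
```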
lineReaderParseParallel
public void lineReaderParseParallel(final String targetDir, final int numberOfFiles, final int degreeOfParalelism) throws IOException, ParseException, InterruptedException {
Thread[] pool = new Thread[degreeOfParalelism];
int batchSize = numberOfFiles / degreeOfParalelism;
for (int b=0; b<degreeOfParalelism; b++) {
pool[b] = new LineReaderParseThread(targetDir, b*batchSize, b == degreeOfParalelism-1 ? numberOfFiles : (b+1)*batchSize); // the last thread also takes the remainder files
pool[b].start();
}
for (int b=0; b<degreeOfParalelism; b++)
pool[b].join();
}
class LineReaderParseThread extends Thread {
private String targetDir;
private int fileFrom;
private int fileTo;
private DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private int overrunCounter = 0;
public LineReaderParseThread(String targetDir, int fileFrom, int fileTo) {
this.targetDir = targetDir;
this.fileFrom = fileFrom;
this.fileTo = fileTo;
}
private void doSomethingWithTheLine(String line) throws ParseException {
String[] fields = line.split(DL);
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCounter++;
}
@Override
public void run() {
String line;
for (int f=fileFrom; f<fileTo; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
try {
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null) {
doSomethingWithTheLine(line);
}
brd.close();
frd.close();
} catch (IOException | ParseException ioe) { }
}
}
}
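A thread pool removes the need to compute file ranges by hand, so there is no batch arithmetic to get wrong. The sketch below submits one task per file to a fixed-size ExecutorService and sums line counts. ParallelLineCounter is a hypothetical name, counting lines stands in for the real per-line work, UTF-8 input is assumed, and the best pool size (about three threads on the author's machine) will depend on your hardware.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; assumes UTF-8 files.
final class ParallelLineCounter {

    // One task per file on a fixed pool of `threads` workers; results are summed in order.
    // An IOException inside a task surfaces here as an ExecutionException from Future.get().
    static long countLines(List<Path> files, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path file : files) {
                results.add(pool.submit(() -> {
                    long n = 0;
                    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                        while (reader.readLine() != null) {
                            n++; // replace with real per-line work
                        }
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown(); // lets worker threads exit once queued tasks finish
        }
    }
}
```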
nioFilesParse
public void nioFilesParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
for (int f=0; f<numberOfFiles; f++) {
Path ph = Paths.get(targetDir+filenamePreffix+String.valueOf(f)+".txt");
Consumer<String> action = new LineConsumer();
Stream<String> lines = Files.lines(ph);
lines.forEach(action);
lines.close();
}
}
class LineConsumer implements Consumer<String> {
@Override
public void accept(String line) {
// What to do for each line
String[] fields = line.split(DL);
if (fields.length>3) { // all four fields are accessed below
try {
Date dt = fmt.parse(fields[0]);
}
catch (ParseException e) {
}
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
}
}
nioAsyncParse
public void nioAsyncParse(final String targetDir, final int numberOfFiles, final int numberOfThreads, final int bufferSize) throws IOException, ParseException, InterruptedException {
ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(numberOfThreads);
ConcurrentLinkedQueue<ByteBuffer> byteBuffers = new ConcurrentLinkedQueue<ByteBuffer>();
for (int b=0; b<numberOfThreads; b++)
byteBuffers.add(ByteBuffer.allocate(bufferSize));
for (int f=0; f<numberOfFiles; f++) {
consumerThreads.acquire();
String fileName = targetDir+filenamePreffix+String.valueOf(f)+".txt";
AsynchronousFileChannel channel = AsynchronousFileChannel.open(Paths.get(fileName), EnumSet.of(StandardOpenOption.READ), pool);
BufferConsumer consumer = new BufferConsumer(byteBuffers, fileName, bufferSize);
channel.read(consumer.buffer(), 0l, channel, consumer);
}
consumerThreads.acquire(numberOfThreads);
}
class BufferConsumer implements CompletionHandler<Integer, AsynchronousFileChannel> {
private ConcurrentLinkedQueue<ByteBuffer> buffers;
private ByteBuffer bytes;
private String file;
private StringBuffer chars;
private int limit;
private long position;
private DateFormat frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
public BufferConsumer(ConcurrentLinkedQueue<ByteBuffer> byteBuffers, String fileName, int bufferSize) {
buffers = byteBuffers;
bytes = buffers.poll();
if (bytes==null)
bytes = ByteBuffer.allocate(bufferSize);
file = fileName;
chars = new StringBuffer(bufferSize);
frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
limit = bufferSize;
position = 0l;
}
public ByteBuffer buffer() {
return bytes;
}
#Override
public synchronized void completed(Integer result, AsynchronousFileChannel channel) {
if (result!=-1) {
bytes.flip();
final int len = bytes.limit();
int i = 0;
try {
for (i = 0; i < len; i++) {
byte by = bytes.get();
if (by=='\n') {
// ***
// The code used to process the line goes here
chars.setLength(0);
}
else {
chars.append((char) by);
}
}
}
catch (Exception x) {
System.out.println(
"Caught exception " + x.getClass().getName() + " " + x.getMessage() +
" i=" + String.valueOf(i) + ", limit=" + String.valueOf(len) +
", position="+String.valueOf(position));
}
if (len==limit) {
bytes.clear();
position += len;
channel.read(bytes, position, channel, this);
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
#Override
public void failed(Throwable e, AsynchronousFileChannel channel) {
}
};
FULL RUNNABLE IMPLEMENTATION OF ALL CASES
https://github.com/sergiomt/javaiobenchmark/blob/master/FileReadBenchmark.java
Here are the three working and tested methods:
Using BufferedReader
package io;
import java.io.*;
public class ReadFromFile2 {
public static void main(String[] args)throws Exception {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while((st=br.readLine()) != null){
System.out.println(st);
}
}
}
Using Scanner
package io;
import java.io.File;
import java.util.Scanner;
public class ReadFromFileUsingScanner {
public static void main(String[] args) throws Exception {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
Scanner sc = new Scanner(file);
while(sc.hasNextLine()){
System.out.println(sc.nextLine());
}
}
}
Using FileReader
package io;
import java.io.*;
public class ReadingFromFile {
public static void main(String[] args) throws Exception {
FileReader fr = new FileReader("C:\\Users\\pankaj\\Desktop\\test.java");
int i;
while ((i=fr.read()) != -1){
System.out.print((char) i);
}
}
}
Read the entire file without a loop using the Scanner class
package io;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ReadingEntireFileWithoutLoop {
public static void main(String[] args) throws FileNotFoundException {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
Scanner sc = new Scanner(file);
sc.useDelimiter("\\Z");
System.out.println(sc.next());
}
}
The methods within org.apache.commons.io.FileUtils may also be very handy, e.g.:
/**
* Reads the contents of a file line by line to a List
* of Strings using the default encoding for the VM.
*/
static List readLines(File file)
I documented 15 ways to read a file in Java and then tested them for speed with various file sizes - from 1 KB to 1 GB and here are the top three ways to do this:
java.nio.file.Files.readAllBytes()
Tested to work in Java 7, 8, and 9.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
public class ReadFile_Files_ReadAllBytes {
public static void main(String [] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-10KB.txt";
File file = new File(fileName);
byte [] fileBytes = Files.readAllBytes(file.toPath());
char singleChar;
for(byte b : fileBytes) {
singleChar = (char) b;
System.out.print(singleChar);
}
}
}
java.io.BufferedReader.readLine()
Tested to work in Java 7, 8, 9.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class ReadFile_BufferedReader_ReadLine {
public static void main(String [] args) throws IOException {
String fileName = "c:\\temp\\sample-10KB.txt";
FileReader fileReader = new FileReader(fileName);
try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
String line;
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
}
}
}
java.nio.file.Files.lines()
This was tested to work in Java 8 and 9 but won't work in Java 7 because of the lambda expression requirement.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;
public class ReadFile_Files_Lines {
public static void main(String[] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-10KB.txt";
File file = new File(fileName);
try (Stream linesStream = Files.lines(file.toPath())) {
linesStream.forEach(line -> {
System.out.println(line);
});
}
}
}
What do you want to do with the text? Is the file small enough to fit into memory? I would try to find the simplest way to handle the file for your needs. The FileUtils library is very handle for this.
for(String line: FileUtils.readLines("my-text-file"))
System.out.println(line);
Below is a one-liner of doing it in the Java 8 way. Assuming text.txt file is in the root of the project directory of the Eclipse.
Files.lines(Paths.get("text.txt")).collect(Collectors.toList());
The most intuitive method, Files.readString, was introduced in Java 11:
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String args[]) throws IOException {
String content = Files.readString(Paths.get("D:\\sandbox\\mvn\\my-app\\my-app.iml"));
System.out.print(content);
}
}
PHP has had this luxury for decades! ☺
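Files.readString decodes as UTF-8 by default and throws a CharacterCodingException (such as MalformedInputException) if the file is not valid UTF-8, rather than silently replacing bad bytes. The sketch below shows the overload that takes a charset. The fallback to ISO-8859-1 is an illustrative policy chosen for this example, not something readString does on its own. Requires Java 11.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadStringCharset {
    // Reads the whole file as UTF-8; if decoding fails, re-reads it as ISO-8859-1,
    // which maps every byte to a character (illustrative fallback policy).
    static String readWithFallback(Path path) throws IOException {
        try {
            return Files.readString(path); // UTF-8 by default; throws on malformed input
        } catch (CharacterCodingException e) {
            return Files.readString(path, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("text", ".txt"); // demo file
        Files.writeString(tmp, "hello");
        System.out.println(readWithFallback(tmp)); // hello
        Files.write(tmp, new byte[] {'h', 'i', (byte) 0xFF}); // 0xFF is never valid UTF-8
        System.out.println(readWithFallback(tmp).length()); // 3
        Files.delete(tmp);
    }
}
```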
The buffered stream classes are much more performant in practice, so much so that the NIO.2 API includes methods that specifically return these stream classes, in part to encourage you always to use buffered streams in your application.
Here is an example:
Path path = Paths.get("/myfolder/myfile.ext");
try (BufferedReader reader = Files.newBufferedReader(path)) {
// Read from the stream
String currentLine = null;
while ((currentLine = reader.readLine()) != null) {
// do your code here
}
} catch (IOException e) {
// Handle file I/O exception...
}
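You can replace this line
BufferedReader reader = Files.newBufferedReader(path);
with
BufferedReader reader = new BufferedReader(new FileReader("/myfolder/myfile.ext"));
Note that FileReader decodes with the platform default charset (which is UTF-8 only from Java 18 on), whereas Files.newBufferedReader defaults to UTF-8.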
I recommend this article to learn the main uses of Java NIO and IO.
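If you want to stay with java.io classes but still control the charset, a common approach is to wrap an InputStreamReader with an explicit charset instead of using FileReader. The sketch below reads all lines that way; the class and method names are illustrative. It is roughly, not exactly, equivalent to Files.newBufferedReader: InputStreamReader replaces malformed bytes with U+FFFD, whereas Files.newBufferedReader throws.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplicitCharsetReader {
    // java.io counterpart of Files.newBufferedReader(path): decode explicitly as UTF-8
    // instead of relying on FileReader's platform default charset.
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String currentLine;
            while ((currentLine = reader.readLine()) != null) {
                lines.add(currentLine);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Demo file in a temporary location instead of /myfolder/myfile.ext
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, Arrays.asList("first", "second"), StandardCharsets.UTF_8);
        for (String line : readLines(tmp.toString())) {
            System.out.println(line);
        }
        Files.delete(tmp);
    }
}
```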
Using BufferedReader:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader br = new BufferedReader(new FileReader("/fileToRead.txt"))) {
String x;
while ( (x = br.readLine()) != null ) {
// Printing out each line in the file
System.out.println(x);
}
}
catch (FileNotFoundException e) {
System.out.println("File not found: " + e.getMessage());
}
catch (IOException e) {
e.printStackTrace();
}
This is basically the exact same as Jesus Ramos' answer, except with File instead of FileReader plus iteration to step through the contents of the file.
Scanner in = new Scanner(new File("filename.txt"));
while (in.hasNextLine()) { // Iterates over each line in the file
String line = in.nextLine();
// Do something with line
}
in.close(); // Close the Scanner to avoid a resource leak
The enclosing method must declare throws FileNotFoundException (or catch it).
Probably not as fast as with buffered I/O, but quite terse:
String content;
try (Scanner scanner = new Scanner(textFile).useDelimiter("\\Z")) {
content = scanner.next();
}
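The \Z pattern matches the end of the input, so the whole file is returned as a single token. Note that next() throws NoSuchElementException if the file is empty.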
The simplest way to read data from a file in Java is to use the File class to open the file and the Scanner class to read its contents.
public static void main(String args[])throws Exception
{
File f = new File("input.txt");
takeInputIn2DArray(f);
}
public static void takeInputIn2DArray(File f) throws Exception
{
Scanner s = new Scanner(f);
int a[][] = new int[20][20];
for(int i=0; i<20; i++)
{
for(int j=0; j<20; j++)
{
a[i][j] = s.nextInt();
}
}
s.close(); // release the file handle
}
PS: Don't forget to import java.io.File and java.util.Scanner (or java.util.*) for this to compile.
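The fixed 20x20 loop can be generalized when the file states its own size. The sketch below assumes a hypothetical format where the first two integers are the row and column counts; that format is not from the original answer. Since nextInt() skips any whitespace, the line layout of the file does not matter, and Scanner also accepts a String, which makes the parsing easy to try out.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadGrid {
    // Hypothetical input format: rows and columns first, then the values.
    // nextInt() skips whitespace, so line breaks in the input don't matter.
    static int[][] readGrid(Scanner s) {
        int rows = s.nextInt();
        int cols = s.nextInt();
        int[][] a = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                a[i][j] = s.nextInt();
            }
        }
        return a;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Parse from a String for the demo
        int[][] demo = readGrid(new Scanner("2 3\n1 2 3\n4 5 6"));
        System.out.println(demo[1][2]); // 6
        // Optionally parse a real file given on the command line
        if (args.length > 0) {
            try (Scanner s = new Scanner(new File(args[0]))) {
                System.out.println(readGrid(s).length);
            }
        }
    }
}
```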
You can use readAllLines and String.join to get the whole file content as a single string:
String str = String.join("\n",Files.readAllLines(Paths.get("e:\\text.txt")));
It uses UTF-8 encoding by default, which reads ASCII data correctly.
Also you can use readAllBytes:
String str = new String(Files.readAllBytes(Paths.get("e:\\text.txt")), StandardCharsets.UTF_8);
I think readAllBytes is faster and more precise, because it keeps the original line terminators (which may be \r\n), whereas readAllLines strips them and join re-inserts \n only between lines, so a trailing newline is lost. Which one is suitable depends on your needs.
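The difference in line terminators is easy to check with a file that uses Windows line endings. In this sketch (the demo file and method names are made up) the readAllLines-plus-join version normalizes \r\n to \n and drops the trailing terminator, while readAllBytes returns the bytes unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class JoinVsBytes {
    static String viaReadAllLines(Path p) throws IOException {
        // readAllLines strips \n, \r and \r\n; join re-inserts "\n" between lines only
        return String.join("\n", Files.readAllLines(p));
    }

    static String viaReadAllBytes(Path p) throws IOException {
        // readAllBytes keeps every byte, including the original line terminators
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crlf", ".txt"); // demo file with Windows line endings
        Files.write(tmp, "a\r\nb\r\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(viaReadAllLines(tmp).equals("a\nb"));       // true
        System.out.println(viaReadAllBytes(tmp).equals("a\r\nb\r\n")); // true
        Files.delete(tmp);
    }
}
```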
I don't see it mentioned in the other answers so far, but if "best" means speed, then the new Java I/O (NIO) API may provide the fastest performance, although it is not always the easiest to figure out for someone learning.
http://download.oracle.com/javase/tutorial/essential/io/file.html
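Guava provides a one-liner for this (filePath is a java.io.File):
import com.google.common.base.Charsets;
import com.google.common.io.Files;
String contents = Files.toString(filePath, Charsets.UTF_8);
In newer Guava versions Files.toString is deprecated; Files.asCharSource(filePath, Charsets.UTF_8).read() is its replacement.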
Cactoos gives you a declarative one-liner:
new TextOf(new File("a.txt")).asString();
This might not be the exact answer to the question. It's just another way of reading a file where you do not explicitly specify the path to your file in your Java code and instead supply the file on the command line through input redirection.
With the following code,
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
public class InputReader{
public static void main(String[] args)throws IOException{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String s="";
while((s=br.readLine())!=null){
System.out.println(s);
}
}
}
just go ahead and run it with:
java InputReader < input.txt
This would read the contents of input.txt and print them to your console.
You can also make System.out.println() write to a specific file through the command line as follows:
java InputReader < input.txt > output.txt
This would read from input.txt and write to output.txt.
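If you would rather pass the file name as an actual command-line argument, the same loop can read from a named file when one is given and fall back to standard input otherwise. This variant is a sketch; the class name InputReaderArgs and the echo helper are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class InputReaderArgs {
    // Copies every line from the reader to standard output and returns the line count
    static int echo(BufferedReader br) throws IOException {
        int count = 0;
        String s;
        while ((s = br.readLine()) != null) {
            System.out.println(s);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // java InputReaderArgs input.txt   -> reads the named file
        // java InputReaderArgs < input.txt -> no argument, so it falls back to stdin
        try (BufferedReader br = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new BufferedReader(new InputStreamReader(System.in))) {
            echo(br);
        }
    }
}
```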
For JSF-based Maven web applications, just use ClassLoader and the Resources folder to read in any file you want:
Put any file you want to read in the Resources folder.
Put the Apache Commons IO dependency into your POM:
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.11.0</version>
</dependency>
Use the code below to read it (this example reads a .json file):
String metadata = null;
ClassLoader loader = Thread.currentThread().getContextClassLoader();
// ClassLoader resource names take no leading "/", and the stream is not necessarily
// a FileInputStream (e.g. when the resource is inside a JAR), so keep it as InputStream
try (InputStream inputStream = loader.getResourceAsStream("metadata.json")) {
if (inputStream == null) {
throw new FileNotFoundException("metadata.json not found on the classpath");
}
metadata = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
}
catch (IOException e) {
// FileNotFoundException is an IOException, so both cases end up here
e.printStackTrace();
}
return metadata;
You can do the same for text files, .properties files, XSD schemas, etc.
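On Java 9 and later you can read a classpath resource without Commons IO, because InputStream has readAllBytes(). The sketch below returns null when the resource is missing, which is a choice made for this example. The resource name metadata.json is taken from the answer above and is assumed to be on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResourceText {
    // Reads a classpath resource as UTF-8 text (Java 9+, no external library).
    // Returns null when the resource is not on the classpath (choice made for this sketch).
    static String readResource(String name) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try (InputStream in = loader.getResourceAsStream(name)) { // no leading "/" for ClassLoader
            if (in == null) {
                return null;
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String metadata = readResource("metadata.json");
        System.out.println(metadata == null ? "metadata.json not found on classpath" : metadata);
    }
}
```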
try {
File f = new File("filename.txt");
Scanner r = new Scanner(f);
while (r.hasNextLine()) {
String data = r.nextLine();
JOptionPane.showMessageDialog(null, data);
}
r.close();
} catch (FileNotFoundException ex) {
JOptionPane.showMessageDialog(null, "Error occurred");
ex.printStackTrace();
}
Use Java kiss if simplicity of structure is what you are after:
import static kiss.API.*;
class App {
void run() {
String line;
try (Close in = inOpen("file.dat")) {
while ((line = readLine()) != null) {
println(line);
}
}
}
}
import java.util.stream.Stream;
import java.nio.file.*;
import java.io.*;
class ReadFile {
public static void main(String[] args) {
String filename = "Test.txt";
try(Stream<String> stream = Files.lines(Paths.get(filename))) {
stream.forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
}
}
Just use a Java 8 Stream.
If you have a large file, you can use Apache Commons IO to process it line by line without exhausting the available memory.
try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
while (it.hasNext()) {
String line = it.nextLine();
// do something with line
}
}
try (Stream<String> stream = Files.lines(Paths.get(String.valueOf(new File("yourFile.txt"))))) {
stream.forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
new File(<path_name>)
Creates a new File instance by converting the given pathname string into an abstract pathname. If the given string is the empty string, then the result is the empty abstract pathname.
Params:
pathname – A pathname string
Throws:
NullPointerException – If the pathname argument is null
Files.lines returns a stream of String:
Stream<String> stream = Files.lines(Paths.get(String.valueOf(new File("yourFile.txt"))))
Files.lines throws an IOException (for example NoSuchFileException) if the file cannot be opened, so keeping it inside the try block lets the catch handle that at runtime, and try-with-resources closes the stream afterwards.
stream.forEach(System.out::println);
This iterates over the stream and prints each line to the console.
If you have a different use case, you can supply your own function to process the stream of lines.
My new favorite approach to read a whole text file from a BufferedReader named input is:
String text = input.lines().collect(Collectors.joining(System.lineSeparator()));
This reads the whole file and puts the platform line separator (System.lineSeparator()) between consecutive lines. Without the separator, all lines would be joined together as one.
BufferedReader.lines() has existed since Java 8.
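A small check of what Collectors.joining does: the separator goes between lines, not after each one, so there is no trailing separator. In this sketch a StringReader stands in for the file, and "|" is used as the separator only to make the result visible.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;

public class JoinLines {
    // lines() streams the lines without their terminators;
    // joining() inserts the separator only between consecutive lines.
    static String readAll(BufferedReader input, String separator) {
        return input.lines().collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        // StringReader stands in for a file here
        BufferedReader input = new BufferedReader(new StringReader("one\ntwo\n"));
        System.out.println(readAll(input, "|")); // one|two
    }
}
```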
For Android developers ending up here (who use Kotlin):
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val text = myFileUrl.readText() // Not recommended for huge files
println(text)
Other solution:
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
val lines = file.readLines() // Not recommended for huge files
lines.forEach(::println)
Another good solution, which works for huge files as well (use closes the reader when done):
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file
.bufferedReader()
.use { reader -> reader.lineSequence().forEach(::println) }
Or:
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file.useLines { lines ->
lines.forEach(::println)
}
Notes:
The vegetables.txt file should be in your classpath (for example, in src/main/resources directory)
The above solutions all read the file as UTF-8 by default. You can pass your desired charset as an argument to these functions.
The above solutions do not need any further action like closing the files or readers. They are automatically taken care of by the Kotlin standard library.