Optimize Scanner performance for large files - java

I wrote a program in Java that uses a Scanner to read two elements separated by a space from each line and store them in an object kept in an ArrayList. It works perfectly, but with inputs of tens of thousands of lines it becomes very slow. I read a few topics and websites (such as this) saying that BufferedReader would be a lot more efficient than Scanner, but I did not see any improvement when I tried it.
Here are the lines I use so far to parse each line of my input:
String charsetName = "UTF-8";
Scanner scanner = new Scanner(new BufferedInputStream(System.in), charsetName);
Then I have a loop that runs once per line, calling:
String[] mid = scanner.nextLine().split(" ");
So I tried to replace the Scanner with:
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
String[] base = reader.readLine().split(" ");
It did not change anything (8 seconds for 12,000 lines in both cases).
Am I going in the right direction to get the program to work a lot faster? Or does the problem come from using a loop to go through each line?

I used the following code to read through a file of 280,000 lines (consisting of two words per line separated by a space) and split them on a space. It took 0.105 seconds. So I would like to know more about the line you are parsing and what you are doing with it. Paste more code please.
public static void main(String args[]) throws Exception {
Date start = new Date();
BufferedReader b = new BufferedReader(new FileReader("aa.txt"));
String line;
while ((line = b.readLine())!=null) {
String[] splat = line.split(" ");
}
b.close();
Date end = new Date();
System.out.println("Took " + (end.getTime() - start.getTime()) / 1000.0 + " seconds");
}
I modified the code above to add each splat array to an array list (not sure why you want to do this, but I am guessing this is what you are trying to do, from your OP). The code slowed down to 0.244 seconds. Still way less than a second. More info please.
Supplement - FULL CODE (compile with javac Julien.java). Remember to replace aa.txt with your file name.
import java.util.*;
import java.io.*;
public class Julien {
public static void main(String args[]) throws Exception {
Date start = new Date();
// List arrl = new ArrayList();
BufferedReader b = new BufferedReader(new FileReader("aa.txt"));
String line;
while ((line = b.readLine())!=null) {
String[] splat = line.split(" ");
// arrl.add(splat);
}
b.close();
Date end = new Date();
System.out.println("Took " + (end.getTime() - start.getTime()) / 1000.0 + " seconds");
}
}
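For the original question, a minimal sketch of reading two space-separated fields per line into an ArrayList might look like the following. The names PairReader, Pair and readPairs are hypothetical and not from the posts above, and the sketch assumes each useful line contains at least one space. Timing the call with System.nanoTime() can help confirm whether the reading loop or the later processing is the slow part.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original posts.
class PairReader {

    // Simple holder for the two fields found on each line.
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    // Reads every line from the source and splits it at the first space.
    // Lines without a space are skipped.
    static List<Pair> readPairs(Reader source) {
        List<Pair> pairs = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int space = line.indexOf(' ');
                if (space < 0) {
                    continue;
                }
                pairs.add(new Pair(line.substring(0, space), line.substring(space + 1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // In a real program the source could be new InputStreamReader(System.in).
        long start = System.nanoTime();
        List<Pair> pairs = readPairs(new StringReader("a b\nc d\n"));
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(pairs.size() + " pairs read in " + elapsedMicros + " microseconds");
    }
}
```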

Related

Read from file with BufferedReader

Basically I've got an assignment which reads multiple lines from a .txt file.
There are 4 values in the text file per line and each value is separated by 2 spaces.
There are about 10 lines of data in the file.
After taking the input from the file the program then puts it onto a Database. The database connection functionality works fine.
My issue now is with reading from the file using a BufferedReader.
The issue is that if I uncomment any 1 of the 3 lines at the bottom the BufferedReader reads every other line. And if I don't use them then there's an exception as the next input is of type String.
I have contemplated using a Scanner with the .hasNextLine() method.
Any thoughts on what could be the problem and how to fix it?
Thanks.
File file = new File(FILE_INPUT_NAME);
FileReader fr = new FileReader(file);
BufferedReader readFile = new BufferedReader(fr);
String line = null;
while ((line = readFile.readLine()) != null) {
String[] split = line.split(" ", 4);
String id = split[0];
nameFromFile = split[1];
String year = split[2];
String mark = split[3];
idFromFile = Integer.parseInt(id);
yearOfStudyFromFile = Integer.parseInt(year);
markFromFile = Integer.parseInt(mark);
//line = readFile.readLine();
//readFile.readLine();
//System.out.println(readFile.readLine());
}
Edit: There was an error in the formatting of the .txt file: a missing value.
But now I get an ArrayIndexOutOfBoundsException.
Edit edit: Another error in the .txt file! Turns out there was a single space instead of a double. It seems to be working now. But any advice on how to deal with file errors like this in the future?
The issue is that if I uncomment any 1 of the 3 lines at the bottom the BufferedReader reads every other line.
Correct. You are already reading a line in the while condition, so you don't need another read. If you put any of those lines of code in, the extra line of text they read will be thrown away and not processed.
A compilable version of the code posted could be
public void read() throws IOException {
File file = new File(FILE_INPUT_NAME);
FileReader fr = new FileReader(file);
BufferedReader readFile = new BufferedReader(fr);
String line;
while ((line = readFile.readLine()) != null) {
String[] split = line.split(" ", 4);
if (split.length != 4) { // Not enough tokens (e.g., empty line) read
continue;
}
String id = split[0];
String nameFromFile = split[1];
String year = split[2];
String mark = split[3];
int idFromFile = Integer.parseInt(id);
int yearOfStudyFromFile = Integer.parseInt(year);
int markFromFile = Integer.parseInt(mark);
//line = readFile.readLine();
//readFile.readLine();
//System.out.println(readFile.readLine());
}
}
The above splits on a single space (" ") instead of the original two spaces. To split on any amount of whitespace, a regular expression can be used, e.g. "\\s+". Of course, exactly two spaces can also be used, if that reflects the structure of the input data.
What the method should do with the extracted values (e.g., returning them in an object of some type, or saving them to a database directly), is up to the application using it.
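As for dealing with file errors like these in the future, one possible approach is to validate each line before using it. The sketch below is only an illustration: the class StudentLineParser, the Student holder and its field names are hypothetical, and it assumes names contain no whitespace. It splits on "\\s+" so one or two spaces both work, and returns an empty Optional for lines with the wrong number of fields or non-numeric values instead of throwing.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of the original posts.
class StudentLineParser {

    // Holder for the four values expected on each line.
    static final class Student {
        final int id;
        final String name;
        final int year;
        final int mark;

        Student(int id, String name, int year, int mark) {
            this.id = id;
            this.name = name;
            this.year = year;
            this.mark = mark;
        }
    }

    // Returns the parsed student, or Optional.empty() if the line is malformed.
    static Optional<Student> parse(String line) {
        // Split on any run of whitespace, ignoring leading and trailing spaces.
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Student(
                    Integer.parseInt(parts[0]),
                    parts[1],
                    Integer.parseInt(parts[2]),
                    Integer.parseInt(parts[3])));
        } catch (NumberFormatException e) {
            // One of the numeric fields was not a valid integer.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("12  Alice  2  87").isPresent());
        System.out.println(parse("12 Alice 2").isPresent());
    }
}
```

The caller can then decide whether to skip a malformed line, log it with its line number, or stop processing.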

Read different variables from a file [java] and put them into a JList

I have some difficulties writing code which should be able to read heterogeneous features from a .txt file.
This is a sample file:
size=1.523763e-13 Type= aBc, KCd, EIf
I need to find these features and then put them in a JList in NetBeans.
To find the size variable I've thought to use the BufferedReader class, but I don't know what to do next!
Any help?
My code so far:
public String findSize() {
String spec = "";
try {
BufferedReader reader = new BufferedReader(new FileReader("sample.txt"));
String line = reader.readLine();
while(line!=null) {
if (line.contains("size")) {
for(int i = line.indexOf("size")+1, i = line.length(), i++)
spec +=...;
You can do it easily by splitting the string you read via BufferedReader or Scanner.
In the below example I have used Scanner and am reading the lines from System.in. You can replace it to read the lines from your source file.
Here is the code snippet:
public static void main (String[] args)
{
Scanner in = new Scanner(System.in);
List<String> typeString;
while(in.hasNextLine()) {
String[] str = in.nextLine().split("=");
System.out.println("Size: " + str[1].split(" ")[0] + " Type: " + str[2]);
typeString = new ArrayList<>(Arrays.asList(str[2].split(", ")));
}
}
Please note that this is just for demonstration purposes. You can split the String, work with the substrings, and store them any way you want.
Input:
size=1.523763e-13 Type=aBc, KCd, EIf
Output:
Size: 1.523763e-13 Type: aBc, KCd, EIf
typeString --> {aBc, KCd, EIf}
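As an alternative to chained split calls, the same line could be taken apart with a regular expression. This is a minimal sketch under the assumption that every line has the form size=<number> Type=<comma-separated list>, as in the sample; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original posts.
class FeatureLineParser {

    // Group 1 is the size value, group 2 is the raw comma-separated type list.
    private static final Pattern LINE = Pattern.compile("size=(\\S+)\\s+Type=\\s*(.*)");

    private static Matcher match(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        return m;
    }

    // Extracts the numeric value that follows "size=".
    static double parseSize(String line) {
        return Double.parseDouble(match(line).group(1));
    }

    // Extracts the entries that follow "Type=", trimmed of surrounding spaces.
    static List<String> parseTypes(String line) {
        String raw = match(line).group(2).trim();
        if (raw.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(raw.split("\\s*,\\s*")));
    }

    public static void main(String[] args) {
        String line = "size=1.523763e-13 Type= aBc, KCd, EIf";
        System.out.println("Size: " + parseSize(line));
        System.out.println("Types: " + parseTypes(line));
    }
}
```

The resulting list could then be handed to a JList, for example with jList.setListData(types.toArray(new String[0])), where jList is a hypothetical JList<String> field in the form.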

Reading in a file and processing data

I am a newbie at programming and I can't seem to figure out what to do.
I am to write a Java program that reads in any number of lines from a file and generates a report with:
the count of the number of values read
the total sum
the average score (to 2 decimal places)
the maximum value along with the corresponding name.
the minimum value along with the corresponding name.
The input file looks like this:
55527 levaaj01
57508 levaaj02
58537 schrsd01
59552 waterj01
60552 boersm01
61552 kercvj01
62552 buttkp02
64552 duncdj01
65552 beingm01
The program runs fine, but when I add in
score = input.nextInt(); and
player = input.next();
The program stops working and the keyboard input seems to stop working for the filename.
I am trying to read each line with the int and name separately so that I can process the data and complete my assignment. I don't really know what to do next.
Here is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Scanner;
public class Program1 {
private Scanner input = new Scanner(System.in);
private static int fileRead = 0;
private String fileName = "";
private int count = 0;
private int score = 0;
private String player = "";
public static void main(String[] args) {
Program1 p1 = new Program1();
p1.getFirstDecision();
p1.readIn();
}
public void getFirstDecision() { //*************************************
System.out.println("What is the name of the input file?");
fileName = input.nextLine(); // gcgc_dat.txt
}
public void readIn(){ //*********************************************
try {
FileReader fr = new FileReader(fileName + ".txt");
fileRead = 1;
BufferedReader br = new BufferedReader(fr);
String str;
int line = 0;
while((str = br.readLine()) != null){
score = input.nextInt();
player = input.next();
System.out.println(str);
line++;
score = score + score;
count++;
}
System.out.println(count);
System.out.println(score);
br.close();
}
catch (Exception ex){
System.out.println("There is no shop named: " + fileName);
}
}
}
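The way you combined BufferedReader with Scanner is wrong: the Scanner named input reads from System.in (the keyboard), not from the file, so input.nextInt() waits for keyboard input instead of reading the line you just fetched.
Note: you can pass a BufferedReader to the Scanner constructor.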
For example :
try( Scanner input = new Scanner( new BufferedReader(new FileReader("your file path goes here")))){
}catch(IOException e){
}
Note: your file reading and any other processing must happen inside the try block, because the Scanner (and the underlying file) is closed automatically when the block ends, before any catch block runs. This construct is called try-with-resources.
Note:
A BufferedReader will create a buffer. This should result in faster
reading from the file. Why? Because the buffer gets filled with the
contents of the file. So, you put a bigger chunk of the file in RAM
(if you are dealing with small files, the buffer can contain the whole
file). Now if the Scanner wants to read two bytes, it can read two
bytes from the buffer, instead of having to ask for two bytes to the
hard drive.
Generally speaking, it is much faster to read 10 times 4096 bytes
instead of 4096 times 10 bytes.
Source BufferedReader in Scanner's constructor
Suggestion: you can read each line of your file with BufferedReader and do the parsing yourself, or you can use the Scanner class, which can parse tokens for you.
difference between Scanner and BufferReader
As a hint you can use this sample for your parsing goal
Code:
String input = "Kick 20";
String[] inputSplited = input.split(" ");
System.out.println("My splited name is " + inputSplited[0]);
System.out.println("Next year I am " + (Integer.parseInt(inputSplited[1])+1));
Output:
My splited name is Kick
Next year I am 21
Hope you can fix your program with these hints.
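Putting these hints together, a minimal sketch of the report could look like the following. It assumes every line holds an integer score followed by a name, as in the sample input; the class ScoreReport and its members are hypothetical names. The Scanner is built directly on the file contents (here a StringReader stands in for the file), so nextInt() and next() read from the data rather than from the keyboard.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper for illustration; not part of the original posts.
class ScoreReport {
    int count;
    long sum;
    int max;
    String maxName;
    int min;
    String minName;

    // Reads "score name" pairs until no further integer is found.
    static ScoreReport fromSource(Readable source) {
        ScoreReport report = new ScoreReport();
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextInt()) {
                int score = in.nextInt();
                if (!in.hasNext()) {
                    break; // score without a name: stop reading
                }
                String player = in.next();
                if (report.count == 0 || score > report.max) {
                    report.max = score;
                    report.maxName = player;
                }
                if (report.count == 0 || score < report.min) {
                    report.min = score;
                    report.minName = player;
                }
                report.sum += score;
                report.count++;
            }
        }
        return report;
    }

    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        // For a real file, pass new FileReader(fileName) instead of the StringReader.
        String data = "55527 levaaj01\n57508 levaaj02\n58537 schrsd01\n";
        ScoreReport r = fromSource(new StringReader(data));
        System.out.println("Count: " + r.count);
        System.out.println("Sum: " + r.sum);
        System.out.println(String.format(Locale.ROOT, "Average: %.2f", r.average()));
        System.out.println("Max: " + r.max + " (" + r.maxName + ")");
        System.out.println("Min: " + r.min + " (" + r.minName + ")");
    }
}
```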

Reading each line of a file and searching for a specific word in java

So I have an assignment that requires me to "search a file line by line for a given string". The output must contain the line number and the line itself. For example, if the word "files" was picked, the output would look something like:
5: He had the files
9: the fILEs were his
Code:
void Search(String input) throws IOException {
int x = 1;
FileReader Search = new FileReader(f);
Scanner in = new Scanner(f);
LineNumberReader L = new LineNumberReader(Search, x);
StreamTokenizer token = new StreamTokenizer(Search);
while (in.hasNextLine())
{
try
{
if (!in.findInLine(input).isEmpty())
{
display(Integer.toString(L.getLineNumber()) + ": " + L.readLine(), "\n");
in.nextLine();
}
} catch (NullPointerException e)
{
System.out.println("Something Happened");
in.nextLine();
}
}
}
So far there are 3 issues I need to figure out with my code.
As soon as a line occurs that does not contain the search term, the program immediately displays the next line, even though the searched word is not in it, and then terminates without displaying the rest of the lines that contain the word.
It is supposed to display lines with the word regardless of case, but it does not.
Preferably, it should display all of them at once, but instead it displays them line by line until it errors out and terminates.
Your main problem is here...
FileReader Search = new FileReader(f);
Scanner in = new Scanner(f);
LineNumberReader L = new LineNumberReader(Search, x);
StreamTokenizer token = new StreamTokenizer(Search);
while (in.hasNextLine())
{
You've basically opened two file readers against the same file, but you seem to be expecting them to know about each other. You advance the Scanner, but that has no effect on the LineNumberReader. This then messes up the reporting and line reading process.
Reading from Scanner should look more like...
while (in.hasNextLine()) {
String text = in.nextLine();
Having said that, I'd actually drop the Scanner in favor of the LineNumberReader as it will provide you with more useful information which you would otherwise have to do yourself.
For example...
FileReader Search = new FileReader(new File("TestFile"));
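// The second constructor argument is a buffer size, not a starting line number, so it is omitted here
LineNumberReader L = new LineNumberReader(Search);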
String text = null;
while ((text = L.readLine()) != null) {
// Convert the two values to lower case for comparison...
if (text.toLowerCase().contains(input.toLowerCase())) {
System.out.println(L.getLineNumber() + ": " + text);
}
}
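To display all matches at once, as the question prefers, the matching lines can be collected first and shown afterwards. The following is a minimal self-contained sketch of that idea; the class LineSearch and the method findMatches are hypothetical names, and a StringReader stands in for the file.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original posts.
class LineSearch {

    // Returns "lineNumber: line" for every line containing the word, ignoring case.
    static List<String> findMatches(Reader source, String word) {
        List<String> matches = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String text;
            while ((text = reader.readLine()) != null) {
                if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                    // getLineNumber() is the 1-based number of the line just read.
                    matches.add(reader.getLineNumber() + ": " + text);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        // For a real file, pass new FileReader(file) instead of the StringReader.
        String data = "first line\nHe had the files\nnothing here\nthe fILEs were his\n";
        // Join the matches so they can be displayed in a single call.
        System.out.println(String.join("\n", findMatches(new StringReader(data), "files")));
    }
}
```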

Java: How to read a text file

I want to read a text file containing space separated values. Values are integers.
How can I read it and put it in an array list?
Here is an example of contents of the text file:
1 62 4 55 5 6 77
I want to have it in an arraylist as [1, 62, 4, 55, 5, 6, 77]. How can I do it in Java?
You can use Files#readAllLines() to get all lines of a text file into a List<String>.
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
// ...
}
Tutorial: Basic I/O > File I/O > Reading, Writing and Creating text files
You can use String#split() to split a String in parts based on a regular expression.
for (String part : line.split("\\s+")) {
// ...
}
Tutorial: Numbers and Strings > Strings > Manipulating Characters in a String
You can use Integer#valueOf() to convert a String into an Integer.
Integer i = Integer.valueOf(part);
Tutorial: Numbers and Strings > Strings > Converting between Numbers and Strings
You can use List#add() to add an element to a List.
numbers.add(i);
Tutorial: Interfaces > The List Interface
So, in a nutshell (assuming that the file doesn't have empty lines nor trailing/leading whitespace).
List<Integer> numbers = new ArrayList<>();
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
for (String part : line.split("\\s+")) {
Integer i = Integer.valueOf(part);
numbers.add(i);
}
}
If you happen to be at Java 8 already, then you can even use Stream API for this, starting with Files#lines().
List<Integer> numbers = Files.lines(Paths.get("/path/to/test.txt"))
.map(line -> line.split("\\s+")).flatMap(Arrays::stream)
.map(Integer::valueOf)
.collect(Collectors.toList());
Tutorial: Processing data with Java 8 streams
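One caveat with Files#lines() is that the returned stream keeps the file open until it is closed, so it is usually wrapped in try-with-resources. A minimal sketch of that, which also tolerates empty lines and leading or trailing whitespace, might look like this; the class and method names are hypothetical, and the demo writes its own temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of the original posts.
class IntegerFileReader {

    // Reads all whitespace-separated integers from the file.
    static List<Integer> readIntegers(Path file) {
        // try-with-resources closes the stream, and with it the file.
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                    .filter(token -> !token.isEmpty()) // skips blank lines
                    .map(Integer::valueOf)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("numbers", ".txt");
        Files.write(tmp, "1 62 4 55 5 6 77\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readIntegers(tmp)); // prints [1, 62, 4, 55, 5, 6, 77]
        Files.delete(tmp);
    }
}
```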
Java 1.5 introduced the Scanner class for handling input from file and streams.
It can be used to read integers from a file, which would look something like this:
List<Integer> integers = new ArrayList<Integer>();
Scanner fileScanner = new Scanner(new File("c:\\file.txt"));
while (fileScanner.hasNextInt()){
integers.add(fileScanner.nextInt());
}
Check the API though. There are many more options for dealing with different types of input sources, differing delimiters, and differing data types.
This example code shows you how to read file in Java.
import java.io.*;
/**
* This example code shows you how to read file in Java
*
* IN MY CASE RAILWAY IS MY TEXT FILE WHICH I WANT TO DISPLAY YOU CHANGE WITH YOUR OWN
*/
public class ReadFileExample
{
public static void main(String[] args)
{
System.out.println("Reading File from Java code");
//Name of the file
String fileName="RAILWAY.txt";
try{
//Create object of FileReader
FileReader inputFile = new FileReader(fileName);
//Instantiate the BufferedReader Class
BufferedReader bufferReader = new BufferedReader(inputFile);
//Variable to hold the one line data
String line;
// Read file line by line and print on the console
while ((line = bufferReader.readLine()) != null) {
System.out.println(line);
}
//Close the buffer reader
bufferReader.close();
}catch(Exception e){
System.out.println("Error while reading file line by line:" + e.getMessage());
}
}
}
Look at this example, and try to do your own:
import java.io.*;
public class ReadFile {
public static void main(String[] args){
String string = "";
String file = "textFile.txt";
// Reading
try{
InputStream ips = new FileInputStream(file);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
String line;
while ((line = br.readLine()) != null){
System.out.println(line);
string += line + "\n";
}
br.close();
}
catch (Exception e){
System.out.println(e.toString());
}
// Writing
try {
FileWriter fw = new FileWriter (file);
BufferedWriter bw = new BufferedWriter (fw);
PrintWriter fileOut = new PrintWriter (bw);
fileOut.println (string+"\n test of read and write !!");
fileOut.close();
System.out.println("the file " + file + " is created!");
}
catch (Exception e){
System.out.println(e.toString());
}
}
}
Just for fun, here's what I'd probably do in a real project, where I'm already using all my favourite libraries (in this case Guava, formerly known as Google Collections).
String text = Files.toString(new File("textfile.txt"), Charsets.UTF_8);
List<Integer> list = Lists.newArrayList();
for (String s : text.split("\\s")) {
list.add(Integer.valueOf(s));
}
Benefit: Not much own code to maintain (contrast with e.g. this). Edit: Although it is worth noting that in this case tschaible's Scanner solution doesn't have any more code!
Drawback: you obviously may not want to add new library dependencies just for this. (Then again, you'd be silly not to make use of Guava in your projects. ;-)
Use Apache Commons (IO and Lang) for simple/common things like this.
Imports:
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.ArrayUtils;
Code:
String contents = FileUtils.readFileToString(new File("path/to/your/file.txt"));
String[] array = ArrayUtils.toArray(contents.split(" "));
Done.
Using Java 7 to read files with NIO.2
Import these packages:
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
This is the process to read a file:
Path file = Paths.get("C:\\Java\\file.txt");
if(Files.exists(file) && Files.isReadable(file)) {
try {
// File reader
BufferedReader reader = Files.newBufferedReader(file, Charset.defaultCharset());
String line;
// read each line
while((line = reader.readLine()) != null) {
System.out.println(line);
// tokenize each number
StringTokenizer tokenizer = new StringTokenizer(line, " ");
while (tokenizer.hasMoreElements()) {
// parse each integer in file
int element = Integer.parseInt(tokenizer.nextToken());
}
}
reader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
To read all lines of a file at once:
Path file = Paths.get("C:\\Java\\file.txt");
List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
All the answers so far given involve reading the file line by line, taking the line in as a String, and then processing the String.
There is no question that this is the easiest approach to understand, and if the file is fairly short (say, tens of thousands of lines), it'll also be acceptable in terms of efficiency. But if the file is long, it's a very inefficient way to do it, for two reasons:
Every character gets processed twice, once in constructing the String, and once in processing it.
The garbage collector will not be your friend if there are lots of lines in the file. You're constructing a new String for each line, and then throwing it away when you move to the next line. The garbage collector will eventually have to dispose of all these String objects that you don't want any more. Someone's got to clean up after you.
If you care about speed, you are much better off reading a block of data and then processing it byte by byte rather than line by line. Every time you come to the end of a number, you add it to the List you're building.
It will come out something like this:
private List<Integer> readIntegers(File file) throws IOException {
List<Integer> result = new ArrayList<>();
RandomAccessFile raf = new RandomAccessFile(file, "r");
byte buf[] = new byte[16 * 1024];
final FileChannel ch = raf.getChannel();
int fileLength = (int) ch.size();
final MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0,
fileLength);
int acc = 0;
while (mb.hasRemaining()) {
int len = Math.min(mb.remaining(), buf.length);
mb.get(buf, 0, len);
for (int i = 0; i < len; i++)
if ((buf[i] >= 48) && (buf[i] <= 57))
acc = acc * 10 + buf[i] - 48;
else {
result.add(acc);
acc = 0;
}
}
ch.close();
raf.close();
return result;
}
The code above assumes that this is ASCII (though it could be easily tweaked for other encodings), and that anything that isn't a digit (in particular, a space or a newline) represents a boundary between digits. It also assumes that the file ends with a non-digit (in practice, that the last line ends with a newline), though, again, it could be tweaked to deal with the case where it doesn't.
It's much, much faster than any of the String-based approaches also given as answers to this question. There is a detailed investigation of a very similar issue in this question. You'll see there that there's the possibility of improving it still further if you want to go down the multi-threaded line.
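One way the tweak mentioned above might look is to track whether the parser is currently inside a number. This is only a sketch of the inner loop, working on a byte array rather than a mapped file, with hypothetical names; it still assumes ASCII digits, non-negative values and no overflow. With the flag, a final number without a trailing newline is kept, and consecutive separators (for example "\r\n") no longer add a spurious zero.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original posts.
class DigitParser {

    // Parses runs of ASCII digits into integers; any other byte is a separator.
    static List<Integer> parseIntegers(byte[] data) {
        List<Integer> result = new ArrayList<>();
        int acc = 0;
        boolean inNumber = false;
        for (byte b : data) {
            if (b >= '0' && b <= '9') {
                acc = acc * 10 + (b - '0');
                inNumber = true;
            } else if (inNumber) {
                // First separator after a number: emit it and reset.
                result.add(acc);
                acc = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            // The data ended in the middle of a number.
            result.add(acc);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = "1 62 4\r\n55 5 6 77".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseIntegers(data)); // prints [1, 62, 4, 55, 5, 6, 77]
    }
}
```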
Read the file and then do whatever you want with the lines (Java 8):
Files.lines(Paths.get("c://lines.txt")).collect(Collectors.toList());
