Java reading long text file is very slow

I have a text file (XML created with XStream) which is 63000 lines (3.5 MB) long. I'm trying to read it using a BufferedReader:
BufferedReader br = new BufferedReader(new FileReader(file));
try {
    String s = "";
    String tempString;
    int i = 0;
    while ((tempString = br.readLine()) != null) {
        s = s.concat(tempString);
        // s=s+tempString;
        i = i + 1;
        if (i % 1000 == 0) {
            System.out.println(Integer.toString(i));
        }
    }
} finally {
    br.close();
}
Here you can see my attempt to measure the reading speed, and it's very low: after line 10,000 it takes seconds to read each further 1000 lines. I'm clearly doing something wrong, but I can't understand what. Thanks in advance for your help.

@PaulGrime is right. You are copying the string each time the loop reads a line. Once the string gets big (say 10,000 lines big), it does a lot of work to make that copy.
Try this:
StringBuilder sb = new StringBuilder();
String tempString;
while ((tempString = br.readLine()) != null) {
    sb.append(tempString).append('\n'); // add the newline back: readLine() strips it
}
String s = sb.toString();
Note: read Paul's answer below on why stripping newlines makes this a bad way to read in a file. Also, as mentioned in the question comments, XStream provides a way to read the file directly, and even if it did not, IOUtils.toString(reader) would be a safer way to read a file.
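For reference, a minimal sketch of those two alternatives; it assumes a java.io.File named file, XStream on the classpath, and Apache Commons IO for IOUtils:
// Let XStream parse the file itself; line endings inside CDATA stay intact.
XStream xstream = new XStream();
try (InputStream in = new FileInputStream(file)) {
    Object parsed = xstream.fromXML(in);
}

// Or, if the raw text is really needed, let Commons IO slurp the whole reader.
try (Reader reader = new FileReader(file)) {
    String xml = IOUtils.toString(reader);
}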

Some immediate improvements you can make:
Use a StringBuilder instead of concat and +. Using + and concat can really hurt performance, especially in loops.
Reduce access to the disk by using a larger buffer:
BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), SIZE);
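Putting both improvements together, a minimal sketch (the 64 KB buffer size is an arbitrary example, and file is the File from the question):
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(file), 64 * 1024)) {
    String tempString;
    while ((tempString = br.readLine()) != null) {
        sb.append(tempString).append('\n'); // readLine() strips the newline, so add it back
    }
}
String s = sb.toString();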

You should use a StringBuilder, as repeated String concatenation is extremely slow: every concat copies the whole string built so far.
Further, try using NIO rather than a BufferedReader.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public static void main(String[] args) throws IOException {
    final File file = //some file
    try (final FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel()) {
        final StringBuilder stringBuilder = new StringBuilder();
        final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        final CharsetDecoder charsetDecoder = Charset.forName("UTF-8").newDecoder();
        while (fileChannel.read(byteBuffer) > 0) {
            byteBuffer.flip();                                  // switch the buffer from writing to reading
            stringBuilder.append(charsetDecoder.decode(byteBuffer));
            byteBuffer.clear();                                 // make the buffer writable again
        }
    }
}
You can tune the buffer size if it's still too slow; which buffer size works best is heavily system dependent. For me it makes very little difference whether the buffer is 1K or 4K, but on other systems I have known that change to speed things up by an order of magnitude.

In addition to what has already been said, depending on your use of the XML, your code is potentially incorrect as it discards line endings. For example, this code:
package temp.stackoverflow.q15849706;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import com.thoughtworks.xstream.XStream;
public class ReadXmlLines {
public String read1(BufferedReader br) throws IOException {
try {
String s = "";
String tempString;
int i = 0;
while ((tempString = br.readLine()) != null) {
s = s.concat(tempString);
// s=s+tempString;
i = i + 1;
if (i % 1000 == 0) {
System.out.println(Integer.toString(i));
}
}
return s;
} finally {
br.close();
}
}
public static void main(String[] args) throws IOException {
ReadXmlLines r = new ReadXmlLines();
URL url = ReadXmlLines.class.getResource("xml.xml");
String xmlStr = r.read1(new BufferedReader(new InputStreamReader(url.openStream())));
Object ob = null;
XStream xs = new XStream();
xs.alias("root", Root.class);
// This is incorrectly read/parsed, as the line endings are not
// preserved.
System.out.println("----------1");
System.out.println(xmlStr);
ob = xs.fromXML(xmlStr);
System.out.println(ob);
// This is correctly read/parsed, when passing in the URL directly
ob = xs.fromXML(url);
System.out.println("----------2");
System.out.println(ob);
// This is correctly read/parsed, when passing in the InputStream
// directly
ob = xs.fromXML(url.openStream());
System.out.println("----------3");
System.out.println(ob);
}
public static class Root {
public String script;
public String toString() {
return script;
}
}
}
and this xml.xml file on the classpath (in the same package as the class):
<root>
<script>
<![CDATA[
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
</root>
produces the following output. The first two lines show that the line endings have been removed, which makes the JavaScript in the CDATA section invalid: the first JS comment now comments out the whole script, because all the JS lines have been merged into one.
----------1
<root> <script><![CDATA[// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then { return 1; }else { return 0; }}]]> </script></root>
// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then { return 1; }else { return 0; }}
----------2
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
{
return 1;
}
else
{
return 0;
}
}
...

Related

The process cannot access the file because it is being used by another process (buffer read and try with resource not work)

Hi, I'm trying to implement a CSV edit method using https://stackoverflow.com/a/1377322 as an example. Everything works, except that when I try to Files.copy/move/delete the file I get the error "The process cannot access the file because it is being used by another process". I use try-with-resources, and I also tried a thread-based implementation, but it still doesn't work. The error only occurs for the file read with the BufferedReader, so may I ask what the problem seems to be?
protected static void edit_csv_data(String columname, String new_data) throws IOException {
try (FileReader fr = new FileReader(accountfile); BufferedReader reader = new BufferedReader(fr)) {
File tempFile = new File(tempfile);
BufferedWriter writer = new BufferedWriter(new FileWriter(tempFile));
String currentLine;
int index = 10;
int line = 0;
while ((currentLine = reader.readLine()) != null) {
if (line == Userprofile.getUserline()) {
currentLine.trim();
String[] data = currentLine.split(",");
for (int x = 0; x < data.length; x++) {
if (index == x) {
writer.write(new_data);
} else writer.write(data[x]);
if (x < data.length - 1) writer.write(",");
}
writer.write(System.getProperty("line.separator"));
continue;
}else {
writer.write(currentLine + System.getProperty("line.separator"));
}
line += 1;
}
writer.flush();
writer.close();
reader.close();
System.err.println("write fin");
return;
}
}
I also checked Resource Monitor on Windows 11: the java.exe process has the file open. I can't delete the file while Java is running, or even after the error is thrown, but I can edit accountfile after closing Java; and if I leave the program open long enough, I can eventually delete the file even while Java is running. So I believe the BufferedReader is stuck and not closed properly.
CSV File
'''
Username,Password,Account_Type,Name,Surname,ID,Email,Picture_name,Ban_status,Attemp_Login_during_Baned,Last_Login
admin,1234,admin,Chicken,Little,62001,ASD.l#hotmail.com,picture.jpg,false,0,132
'''
protected static void update_user_data(String columname, String new_data) throws IOException {
DataEdit.edit_csv_data(columname,new_data);
System.err.println(Files.isWritable(Path.of(accountfile)));
Files.copy(Path.of(tempfile), Path.of(accountfile_back),StandardCopyOption.REPLACE_EXISTING);
Files.copy(Path.of(accountfile), Path.of(accountfile_back), StandardCopyOption.REPLACE_EXISTING);
Files.copy(Path.of(tempfile), Path.of(accountfile),StandardCopyOption.REPLACE_EXISTING);
return;
}
Error
java.nio.file.FileSystemException: src\main\java\allaccount\data\Account.csv: The process cannot access the file because it is being used by another process
FYI: I use the IntelliJ IDE, Maven for the build, and Java 17.
The code contains only one serious bug: the continue skips the line += 1, so once Userprofile.getUserline() is reached, line never advances and every following line is also rewritten as if it were the target line.
Also, String objects are immutable, so you need currentLine = currentLine.trim(): trim() returns a new String and does not alter the original one.
Furthermore, I used the Files utility methods.
protected static void editCsvData(String columname, String newData) throws IOException {
Path accountPath = accountfile.toPath();
// When File. Or when String: Paths.get(accountfile).
Path tempPath = tempFile.toPath();
try (BufferedReader reader = Files.newBufferedReader(accountPath,
Charset.defaultCharset());
BufferedWriter writer = Files.newBufferedWriter(tempPath,
Charset.defaultCharset())) {
String currentLine;
int index = 10;
int line = 0;
while ((currentLine = reader.readLine()) != null) {
if (line == Userprofile.getUserline()) {
currentLine = currentLine.trim();
String[] data = currentLine.split(",");
for (int x = 0; x < data.length; x++) {
if (x > 0) {
writer.write(",");
}
writer.write(index == x ? newData : data[x]);
}
} else {
writer.write(currentLine);
}
writer.write(System.getProperty("line.separator"));
++line;
}
System.err.println("write fin");
}
}
As for the error: there are two files here, accountFile and tempFile. Either one might not have been closed (and so is still in use), which could cause the mentioned error.

how can I write a code to read each name in a file

I typed 3 names in the file, and I wanted to write code to count how many times each name was repeated (example: Alex was repeated in the file 3 times, and so on). The code I wrote only counted each name once, which is wrong because the names were repeated more than once. Can you help me with the part that could be causing this problem?
public class MainClass {
public static void readFile() throws IOException {
//File file;
FileWriter writer=null;
String name, line;
List <String> list = new ArrayList <>();
int countM = 0, countAl = 0, countAh = 0;
try
{
File file = new File("\\Users\\Admin\\Desktop\\namesList.txt");
Scanner scan = new Scanner(file);
while(scan.hasNextLine()) {
line = scan.nextLine();
list.add(line);
}
for (int i=0; i<list.size(); i++)
{
name=list.get(i);
if (name.equals("Ali"))
{
countAl= +1;
}
if (name.equals("Ahmed"))
{
countAh= +1;
}
if (name.equals("Muhammad"))
{
countM = +1;
}
}
Collections.sort(list);
writer = new FileWriter("\\Users\\Admin\\Desktop\\newNameList");
for(int i=0; i<list.size(); i++)
{
name = list.get(i);
writer.write(name +"\n");
}
writer.close();
System.out.println("How many times is the name (Ali) in the file? " + countAl);
System.out.println("How many times is the name (Ahmed) in the file? " + countAh);
System.out.println("How many times is the name (Muhammad) in the file? " + countM);
}
catch(IOException e) {
System.out.println(e.toString());
}
}
public static void main(String[] args) throws IOException {
readFile();
}
}
You can do this much more simply:
//Open a reader, this is autoclosed so you don't need to worry about closing it
try (BufferedReader reader = new BufferedReader(new FileReader("path to file"))) {
//Create a map to hold the counts
Map<String, Integer> nameCountMap = new HashMap<>();
//read all of the names, this assumes 1 name per line
for (String name = reader.readLine(); name != null; name = reader.readLine()) {
//merge the value into the count map
nameCountMap.merge(name, 1, (o, n) -> o+n);
}
//Print out the map
System.out.println(nameCountMap);
} catch (IOException e) {
e.printStackTrace();
}
try:
for (int i=0; i<list.size(); i++)
{
name=list.get(i);
if (name.equals("Ali"))
{
countAl += 1;
}
if (name.equals("Ahmed"))
{
countAh += 1;
}
if (name.equals("Muhammad"))
{
countM += 1;
}
}
This works for me.
Note that += is not the same as = + (writing countAl = +1 just assigns the value +1; it does not increment the counter).
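A tiny illustration of the difference, since this is exactly what went wrong in the original loop:
int countAl = 0;
countAl = +1;   // assigns the constant +1 (unary plus), so countAl is 1 no matter how often this runs
countAl = +1;
System.out.println(countAl); // prints 1

countAl = 0;
countAl += 1;   // shorthand for countAl = countAl + 1
countAl += 1;
System.out.println(countAl); // prints 2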
You need to process each line, bearing in mind that the file may be very large in some cases. Better safe than sorry: consider a solution that does not use too many resources.
Streaming Through the File
I'm going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(file_path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
// System.out.println(line);
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
This solution iterates through all the lines in the file, allowing each line to be processed without keeping references to the others, and therefore without keeping them all in memory.
Streaming With Apache Commons IO
The same can be achieved using the Commons IO library as well, by using the custom LineIterator provided by the library:
LineIterator it = FileUtils.lineIterator(your_file, "UTF-8");
try {
while (it.hasNext()) {
String line = it.nextLine();
// do something with line
}
} finally {
LineIterator.closeQuietly(it);
}
Since the entire file is never fully in memory, this also keeps memory consumption low.
BufferedReader
try (BufferedReader br = Files.newBufferedReader(Paths.get("file_name"), StandardCharsets.UTF_8)) {
for (String line = null; (line = br.readLine()) != null;) {
// Do something with the line
}
}
ByteBuffer
try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    StringBuilder text = new StringBuilder();
    int n;
    while ((n = ch.read(bb)) != -1) {  // -1 signals the end of the file
        bb.flip();                     // switch the buffer to reading mode
        text.append(StandardCharsets.UTF_8.decode(bb));
        bb.clear();
        // Do something with the buffered text (e.g. split it into lines).
        // Note: a multi-byte character split across two reads would be garbled by this simple decode.
    }
}
The above examples process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with such large files.

Reading large log files in real time in Java

What can I use to read log file in real time in Java 8?
I have read blogs suggesting that BufferedReader is a good option for this kind of reading.
I tried below:
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while (true) {
    line = reader.readLine(); // blocks until next line available
    // do whatever You want with line
}
However, it keeps printing null regardless of whether the file has been updated or not. Any idea what could be going wrong?
Any other options?
Details are as below :
I am trying to create a utility in Java 8 or above, where I need to read the log file of an application in real time (as live transactions occur and get printed in the logs).
I can access the log file, as I am on the same server, so that is not an issue.
So some of the specifics are below
-> I don't want to poll the log file for changes; I want to keep the bridge open and read the log file in a "while true" loop. So ideally I want my reader to block if there are no new lines being printed.
-> I don't want to store the entire content of the file in memory at any time, as I want it to be memory efficient.
-> My code will run as a separate application that reads the log file of another application.
-> The only job of my code is to read the log, match it against a pattern and, if it matches, send a message with the log content.
Kindly let me know if any detail is ambiguous.
Any help is appreciated, thanks.
For this to work, your inputStream must block until new data becomes available, which a standard FileInputStream does not when reaching the end-of-file.
I suppose you initialize inputStream to just new FileInputStream("my-logfile.log");. This stream will only read up to the current end of the log file and then signal the "end of file" condition to the BufferedReader, which in turn signals "end of file" by returning null from readLine().
Have a look at the utility org.apache.commons.io.input.Tailer. It allows you to write programs like the Unix utility tail -f.
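A minimal sketch using Tailer (this assumes commons-io is on the classpath; the file name and the 1-second poll delay are placeholders):
import java.io.File;
import org.apache.commons.io.input.Tailer;
import org.apache.commons.io.input.TailerListenerAdapter;

public class LogTail {
    public static void main(String[] args) throws InterruptedException {
        TailerListenerAdapter listener = new TailerListenerAdapter() {
            @Override
            public void handle(String line) {
                // match the line against your pattern and send the message here
                System.out.println("read: " + line);
            }
        };
        // Creates a daemon thread that polls the file every 1000 ms and
        // calls handle() for each newly appended line.
        Tailer.create(new File("my-logfile.log"), listener, 1000);
        // Keep the application alive, since the tailer runs on a daemon thread.
        Thread.currentThread().join();
    }
}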
To make your code work, you would have to use an "infinite" input stream that could be realized using a RandomAccessFile as in the following example:
package test;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
public class TestRead {
public static void main(String[] args) throws IOException, InterruptedException {
File logFile = new File("my-log.log");
// Make sure to start from a defined condition.
logFile.delete();
try (OutputStream out = Files.newOutputStream(logFile.toPath(), StandardOpenOption.CREATE)) {
// Just create an empty file to append later on.
}
Thread analyzer = Thread.currentThread();
// Simulate log file writing.
new Thread() {
@Override
public void run() {
try {
for (int n = 0; n < 16; n++) {
try (OutputStream out = Files.newOutputStream(logFile.toPath(), StandardOpenOption.APPEND)) {
PrintWriter printer = new PrintWriter(out);
String line = "Line " + n;
printer.println(line);
printer.flush();
System.out.println("wrote: " + line);
}
Thread.sleep(1000);
}
} catch (Exception ex) {
ex.printStackTrace();
} finally {
analyzer.interrupt();
}
}
}.start();
// The original code reading the log file.
try (InputStream inputStream = new InfiniteInputStream(logFile);) {
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream), 8);
String line;
while (true) {
line = reader.readLine();
if (line == null) {
System.out.println("End-of-file.");
break;
}
System.out.println("read: " + line);
}
}
}
public static class InfiniteInputStream extends InputStream {
private final RandomAccessFile _in;
public InfiniteInputStream(File file) throws IOException {
_in = new RandomAccessFile(file, "r");
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
int c = read();
if (c == -1) {
return -1;
}
b[off] = (byte)c;
int i = 1;
try {
for (; i < len ; i++) {
c = readDirect();
if (c == -1) {
break;
}
b[off + i] = (byte)c;
}
} catch (IOException ee) {
}
return i;
}
@Override
public int read() throws IOException {
int result;
while ((result = readDirect()) < 0) {
// Poll until more data becomes available.
try {
Thread.sleep(500);
} catch (InterruptedException ex) {
return -1;
}
}
return result;
}
private int readDirect() throws IOException {
return _in.read();
}
}
}

Input/Output Runtime Errors

I have three programs to write for my Object-Oriented Programming course, all involving file input/output. None of them contains compile errors, yet they do not do what they are supposed to at run time (they don't print to the outFile like they're supposed to).
I know that the input file is being read and saved in the correct location, because Eclipse would indicate if either of these was not the case.
Furthermore, I have not (to my knowledge) committed any of the common errors, such as leaving out throws declarations or not closing the files used for reading/writing.
I am attaching the first of my I/O assignments here in the hope that the other files have similar errors that I can fix as soon as I can figure out what's wrong with this one.
import java.io.*;
public class GreenK4_Lab8 {
public static void main(String[] args) throws IOException {
int[] numbers = new int[countLines()];
int i = 0;
for(i = 0; i < numbers.length; i++) {
numbers[i] = readValues(i);
}
printOdd(numbers);
}
public static int countLines() throws IOException {
BufferedReader inFile = new BufferedReader(
new FileReader( "Lab8_TestFile.txt" ) );
int lineNumber = 1;
String nextLine = inFile.readLine();
while( nextLine != null ) {
lineNumber ++;
}
inFile.close();
return lineNumber;
}
public static int readValues(int number) throws IOException {
BufferedReader inFile = new BufferedReader(
new FileReader( "Lab8_TestFile.txt" ) );
int value = 0;
for(int i = 0; i < number; i++) {
String nextLine = inFile.readLine();
value = Integer.parseInt( nextLine );
}
inFile.close();
return value;
}
public static void printOdd(int[] array) throws IOException {
PrintWriter outFile = new PrintWriter( "results.out" );
for(int i = 0; i < array.length; i++) {
int value = array[i];
if( value % 2 != 0)
outFile.println( value );
}
outFile.close();
}
}
The following are the contents of the Lab8_TestFile.txt
4
6
2
10
8
1
-1
-2147483648
2147483647
5
9
3
7
-7
As other commenters pointed out, change your code in the countLines function from
String nextLine = inFile.readLine();
while( nextLine != null ) {
lineNumber ++;
}
to
while (inFile.readLine() != null) {
lineNumber ++;
}
With this change your program works as expected.
There are multiple things wrong with your code. Let's start from the beginning: your countLines method does not work as intended and will create an infinite loop, because its while-condition will never evaluate to false (unless your file is empty):
// String nextLine = inFile.readLine();
// while(nextLine != null) {
while (inFile.readLine() != null) {
lineNumber++;
}
You may want to check Number of lines in a file in Java for a faster and better performing version of retrieving the line count of a file.
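For instance, a compact Java 8 alternative (not necessarily the fastest option discussed there, but short and correct; it needs java.nio.file.Files, java.nio.file.Paths and java.util.stream.Stream):
public static int countLines() throws IOException {
    // Files.lines streams the file lazily; the try-with-resources closes it.
    try (Stream<String> lines = Files.lines(Paths.get("Lab8_TestFile.txt"))) {
        return (int) lines.count();
    }
}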
Additionally, your readValues function opens the file for every line it wants to read, reads the file up to that line, and closes the file again -> BAD. What you should do instead is the following:
public static void readValues(int[] contentsOfFile) throws IOException {
BufferedReader inFile = new BufferedReader(new FileReader("Lab8_TestFile.txt"));
for(int i = 0; i < contentsOfFile.length; i++) {
String nextLine = inFile.readLine();
contentsOfFile[i] = Integer.parseInt( nextLine );
}
inFile.close();
}
However, that is not pretty either, since you rely on an adequately sized int array being passed in. If you still want to get the line count separately from reading the values, do so, but let readValues handle the reading by itself. That could result in something like:
public static ArrayList<Integer> readValues() throws IOException {
BufferedReader inFile = new BufferedReader(new FileReader("Lab8_TestFile.txt"));
ArrayList<Integer> integerContents = new ArrayList<>();
String nextLine = null;
while ((nextLine = inFile.readLine()) != null) {
integerContents.add(Integer.parseInt(nextLine));
}
inFile.close();
return integerContents;
}
That way you parse the file only once when reading the values. If you need an int[] back, take a look at How to convert an ArrayList containing Integers to primitive int array? to get an idea of how to extract that from the given data structure.
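One possible helper for that conversion (sketched here with Java 8 streams; the name convertIntegers simply matches the call in the main method below):
public static int[] convertIntegers(List<Integer> integers) {
    // Unbox every Integer into a primitive int array.
    return integers.stream().mapToInt(Integer::intValue).toArray();
}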
Your main function might result in something like:
public static void main(String[] args) throws IOException {
int numberOfLines = countLines(); // technically no longer needed.
int[] intContents = convertIntegers(readValues());
printOdd(intContents);
}

Java: Read up to x chars from a file into array

I want to read a text file and store its contents in an array where each element of the array holds up to 500 characters from the file (i.e. keep reading 500 characters at a time until there are no more characters to read).
I'm having trouble doing this because I don't understand the differences between all of the ways to do I/O in Java, and I can't find one that performs the task I want.
Also, will I need to use an ArrayList, since I don't initially know how many items will be in the array?
It would be hard to avoid using an ArrayList or something similar. If you know the file is ASCII, you could do
int partSize = 500;
File f = new File("file.txt");
String[] parts = new String[(int) ((f.length() + partSize - 1) / partSize)]; // f.length() is a long, so cast the result
But if the file uses a variable-width encoding like UTF-8, this won't work. This code will do the job.
static String[] readFileInParts(String fname) throws IOException {
int partSize = 500;
FileReader fr = new FileReader(fname);
List<String> parts = new ArrayList<String>();
char[] buf = new char[partSize];
int pos = 0;
for (;;) {
int nRead = fr.read(buf, pos, partSize - pos);
if (nRead == -1) {
if (pos > 0)
parts.add(new String(buf, 0, pos));
break;
}
pos += nRead;
if (pos == partSize) {
parts.add(new String(buf));
pos = 0;
}
}
fr.close(); // close the reader (ideally in a finally block or try-with-resources)
return parts.toArray(new String[parts.size()]);
}
Note that FileReader uses the platform default encoding. To specify a particular encoding, replace it with new InputStreamReader(new FileInputStream(fname), charSet). It's a bit ugly, but that's the best way to do it.
An ArrayList will definitely be more suitable, as you don't know how many elements you're going to have.
There are many ways to read a file, but since you want to count characters to collect 500 of them, you could use the read() method of the Reader object, which reads character by character. Once you have collected the 500 characters you need (in a String, I guess), just add them to your ArrayList (all of that in a loop, of course).
The Reader object needs to be initialized with an object that extends Reader, such as an InputStreamReader (this one takes an implementation of InputStream as a parameter, a FileInputStream when working with a file as input).
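A minimal sketch of that approach (the file name is a placeholder, it relies on the platform default charset, and the surrounding method would need to declare throws IOException):
List<String> parts = new ArrayList<String>();
try (Reader reader = new InputStreamReader(new FileInputStream("file.txt"))) {
    StringBuilder chunk = new StringBuilder(500);
    int c;
    while ((c = reader.read()) != -1) {   // read() returns one character, or -1 at the end of the stream
        chunk.append((char) c);
        if (chunk.length() == 500) {      // collected 500 characters: store them and start over
            parts.add(chunk.toString());
            chunk.setLength(0);
        }
    }
    if (chunk.length() > 0) {
        parts.add(chunk.toString());      // the last, possibly shorter, chunk
    }
}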
Not sure if this will work, but you might want to try something like this (Caution: untested code):
private void doStuff() {
    ArrayList<String> stringList = new ArrayList<String>();
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader("file.txt"));
        String str;
        StringBuilder temp = new StringBuilder();
        while ((str = in.readLine()) != null) {
            for (int i = 0; i < str.length(); i++) { // < (not <=) to stay inside the string
                temp.append(str.charAt(i));
                if (temp.length() == 500) {          // collected 500 characters: store the chunk
                    stringList.add(temp.toString());
                    temp.setLength(0);
                }
            }
        }
        if (temp.length() > 0) {                     // store the final, shorter chunk
            stringList.add(temp.toString());
        }
    } catch (IOException e) {
        // handle
    } finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
