I'm writing a multithreaded Java application in which I want to pause and resume a thread.
The thread reads a file line by line, looking for lines that match a pattern, and it has to continue from the place where it was paused. To read the file I use a BufferedReader in combination with an InputStreamReader and FileInputStream.
fip = new FileInputStream(new File(*file*));
fileBuffer = new BufferedReader(new InputStreamReader(fip));
I use a FileInputStream because I need the file pointer to track the position in the file.
While processing the lines, the thread writes matching lines to a MySQL database. To share MySQL connections between threads I use a connection pool, which makes sure only one thread at a time uses a given connection.
The problem is that when I pause the threads and resume them, a few matching lines simply disappear. I also tried subtracting the buffer size from the offset, but the problem persists.
What is a decent way to solve this problem, or what am I doing wrong?
Some more details:
The loop
// Regex engine
RunAutomaton ra = new RunAutomaton(this.conf.getAuto(), true);
lw = new LogWriter();

    while ((line = fileBuffer.readLine()) != null) {
        if (line.length() > 0) {
            if (ra.run(line)) {
                // Write to LogWriter
                lw.write(line, this.file.getName());
                lw.execute();
            }
        }
    }

    // Loop when paused.
    while (pause) { }
}
Calculating place in file
// Get the position in the file
public long getFilePosition() throws IOException {
    long position = fip.getChannel().position() - bufferSize + fileBuffer.getNextChar();
    return position;
}
Putting it into the database
// Get the connector
ConnectionPoolManager cpl = ConnectionPoolManager.getManager();
Connector con = null;
while (con == null)
    con = cpl.getConnectionFromPool();

// Insert the query
con.executeUpdate(this.sql.toString());
cpl.returnConnectionToPool(con);
Here's an example of what I believe you're looking for. You didn't show much of your implementation, so it's hard to debug what might be causing the gaps for you. Note that the position of the FileInputStream is going to be a multiple of 8192, because the BufferedReader uses a buffer of that size. If you want to use multiple threads to read the same file, you might find this answer helpful.
public class ReaderThread extends Thread {

    private final FileInputStream fip;
    private final BufferedReader fileBuffer;
    private volatile boolean paused;

    public ReaderThread(File file) throws FileNotFoundException {
        fip = new FileInputStream(file);
        fileBuffer = new BufferedReader(new InputStreamReader(fip));
    }

    public void setPaused(boolean paused) {
        this.paused = paused;
    }

    public long getFilePos() throws IOException {
        return fip.getChannel().position();
    }

    public void run() {
        try {
            String line;
            while ((line = fileBuffer.readLine()) != null) {
                // process your line here
                System.out.println(line);
                while (paused) {
                    sleep(10);
                }
            }
        } catch (IOException e) {
            // handle I/O errors
        } catch (InterruptedException e) {
            // handle interrupt
        }
    }
}
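For completeness, a caller might drive it like this (a sketch; the file name is illustrative):

ReaderThread t = new ReaderThread(new File("app.log"));
t.start();
// ... later, from another thread:
t.setPaused(true);          // the reader now spins in its sleep loop
long pos = t.getFilePos();  // a multiple of 8192, per the note above
t.setPaused(false);         // resume reading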
I think the root of the problem is that you shouldn't be subtracting bufferSize. Rather, you should be subtracting the number of unread characters in the buffer, and I don't think there's a way to get that.
The easiest solution I can think of is to create a custom subclass of FilterReader that keeps track of the number of characters read. Then stack the streams as follows:
FileReader
< BufferedReader
< custom filter reader
< BufferedReader(sz == 1)
The final BufferedReader is there so that you can use readLine ... but you need to set the buffer size to 1 so that the character count from your filter matches the position that the application has reached.
Alternatively, you could implement your own readLine() method in the custom filter reader.
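To make the idea concrete, here is a minimal sketch of such a counting filter (the class name and accessor are my own, and note that a character count only equals a byte offset for single-byte encodings):

import java.io.*;

class CountingReader extends FilterReader {

    private long charsRead = 0;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) charsRead++;
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0) charsRead += n;
        return n;
    }

    public long getCharsRead() {
        return charsRead;
    }
}

// Stacked as described above:
CountingReader counter = new CountingReader(new BufferedReader(new FileReader(file)));
BufferedReader lines = new BufferedReader(counter, 1); // sz == 1 keeps the count in step with readLine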
After a few days of searching I found out that subtracting the buffer size and adding the position in the buffer was indeed not the right way to do it. The position was never right and I was always missing some lines.
When looking for a new approach I didn't count the characters, because there are simply too many characters to count without hurting performance a lot. But I found something else. Software engineer Mark S. Kolich created a class JumpToLine which uses the Apache IO library to jump to a given line. It can also provide the last line it has read, so this is really what I need.
There are some examples on his homepage for those interested.
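I won't reproduce JumpToLine's API here, but the same skip-to-a-saved-line idea can be sketched with Apache Commons IO's LineIterator (the file name and saved line count are illustrative; note that the skipped lines are still read from disk, just not processed):

import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

long linesAlreadyRead = 1_000_000L; // saved from the previous run
LineIterator it = FileUtils.lineIterator(new File("input.log"), "UTF-8");
try {
    for (long i = 0; i < linesAlreadyRead && it.hasNext(); i++) {
        it.nextLine(); // skip what was processed last time
    }
    while (it.hasNext()) {
        String line = it.nextLine();
        // process(line) ...
    }
} finally {
    LineIterator.closeQuietly(it);
}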
Related
A part of my application writes data to a .csv file in the following way:
public class ExampleWriter {

    public static final int COUNT = 10_000;
    public static final String FILE = "test.csv";

    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream(FILE)) {
            os.write(239);
            os.write(187);
            os.write(191);
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
            for (int i = 0; i < COUNT; i++) {
                writer.write(Integer.toString(i));
                writer.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(checkLineCount(COUNT, new File(FILE)));
    }

    public static String checkLineCount(int expectedLineCount, File file) throws Exception {
        BufferedReader expectedReader = new BufferedReader(new FileReader(file));
        try {
            int lineCount = 0;
            while (expectedReader.readLine() != null) {
                lineCount++;
            }
            if (expectedLineCount == lineCount) {
                return "correct";
            } else {
                return "incorrect";
            }
        } finally {
            expectedReader.close();
        }
    }
}
The file will be opened in Excel, and all kinds of languages are present in the data. The os.write calls prefix the file with a byte order mark so that all kinds of characters can be represented.
Somehow the number of lines in the file does not match the count in the loop, and I cannot figure out why. Any help on what I am doing wrong here would be greatly appreciated.
You simply need to flush and close your output stream before opening the file for input and counting. Try adding:

writer.flush();
writer.close();

inside your try block, after the for loop in the main method.
(As a side note.)
Note that using a BOM is optional and in many cases reduces the portability of your files, because not all consuming applications can handle it well. It does not guarantee that the file actually has the advertised character encoding. So I would recommend removing the BOM. When using Excel, just select the file and choose UTF-8 as the encoding.
You are not flushing the stream. Refer to the Oracle docs for more info, which say:
Flushes this output stream and forces any buffered output bytes to be
written out. The general contract of flush is that calling it is an
indication that, if any bytes previously written have been buffered by
the implementation of the output stream, such bytes should immediately
be written to their intended destination. If the intended destination
of this stream is an abstraction provided by the underlying operating
system, for example a file, then flushing the stream guarantees only
that bytes previously written to the stream are passed to the
operating system for writing; it does not guarantee that they are
actually written to a physical device such as a disk drive.
The flush method of OutputStream does nothing.
You need to flush as well as close the stream. There are two ways:
manually call close() and flush(), or
use try-with-resources.
As I can see from your code, you have already implemented try-with-resources, and since BufferedWriter also implements Closeable and Flushable, you can use code like this:
public static void main(String[] args) throws Exception {
    try (OutputStream os = new FileOutputStream(FILE);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8))) {
        os.write(239);
        os.write(187);
        os.write(191);
        for (int i = 0; i < COUNT; i++) {
            writer.write(Integer.toString(i));
            writer.newLine();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println(checkLineCount(COUNT, new File(FILE)));
}
When COUNT is 1, the code in main() will write a file with two lines: a line with data plus an empty line afterwards. You then call checkLineCount(COUNT, file) expecting it to return 1, but it returns 2 because the file actually has two lines.
Therefore if you want the counter to match you must not write a new line after the last line.
(As another side note.)
Notice that writing CSV files the way you are doing it is really bad practice. CSV is not as easy as it may look at first sight! So, unless you really know what you are doing (and are aware of all the CSV quirks), use a library!
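For instance, a minimal sketch with Apache Commons CSV (the library choice, format, and records are illustrative, not from the original post):

import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("test.csv"), StandardCharsets.UTF_8);
     CSVPrinter printer = new CSVPrinter(writer, CSVFormat.DEFAULT)) {
    printer.printRecord("id", "text");
    printer.printRecord(1, "a value, with a comma"); // quoting and escaping handled for you
}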
I am reading a huge text file of words (one word per line), but I have to stop reading from time to time and resume it the next day. Right now I'm using Apache's LineIterator, but it's totally the wrong solution. My file is 7 GB and I had to interrupt reading at around 1 GB. To resume reading I saved the number of lines already read, which means I now have an if statement in the while loop. Apache's FileUtils doesn't allow seeking, so that was my workaround.
What is the best/fastest solution? I thought of using RandomAccessFile to get to the right place and continue reading, but I'm not sure how to seek to the right place and how to save the position I last read. Rereading a couple of lines is fine, so precision is not that important, but I haven't found a way to get the pointer. I have a BufferedReader to read the file and a RandomAccessFile to seek to the right place, but I don't know how to periodically save a position while reading with the BufferedReader.
Any hints?
Code (note the "SOMETHING" where I should print the value I can later use as seekToByte):
try {
    RandomAccessFile rand = new RandomAccessFile(file, "r");
    rand.seek(seekToByte);
    startAtByte = rand.getFilePointer();
    rand.close();
} catch (IOException e) {
    // do something
}

// Do it using the BufferedReader
BufferedReader reader = null;
FileReader freader = null;
try {
    freader = new FileReader(file);
    reader = new BufferedReader(freader);
    reader.skip(startAtByte);
    long i = 0;
    for (String line; (line = reader.readLine()) != null; ) {
        lines.add(line);
        System.out.print(i + " ");
        if (lines.size() > 1000) {
            commit(lines);
            System.out.println("");
            lines.clear();
            System.out.println(SOMETHING?);
        }
    }
} catch (Exception e) {
    // handle this
} finally {
    if (reader != null) {
        try { reader.close(); } catch (Exception ignore) {}
    }
}
RandomAccessFile is indeed one way to go. Use

long position = file.getFilePointer();

when you stop reading, to save where you are in the file, and then restore with

file.seek(position);

to resume reading at the same place.
However, be careful when using RandomAccessFile, as its readLine method does not fully support Unicode.
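Put together, a minimal sketch (savedPosition and process are illustrative; readLine here decodes each byte as a character, effectively Latin-1, which is why the Unicode caveat above matters):

RandomAccessFile raf = new RandomAccessFile(file, "r");
raf.seek(savedPosition);                  // restore where we left off
String line;
while ((line = raf.readLine()) != null) {
    process(line);                        // hypothetical handler
    savedPosition = raf.getFilePointer(); // a safe point to resume from later
}
raf.close();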
Can you somehow use predetermined offsets? For instance, chop the file into four pieces, (offset0, offset1), (offset1, offset2), etc., and use RecursiveAction (from the ForkJoin API) to take advantage of parallelism, as in the sketch below.
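A very rough sketch of that idea (CHUNK, processRange, and the splitting strategy are all illustrative; real code must also take care to align chunk boundaries with line breaks):

import java.io.File;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ChunkTask extends RecursiveAction {

    private static final long CHUNK = 64L * 1024 * 1024; // max bytes per leaf task
    private final File file;
    private final long start, end;

    ChunkTask(File file, long start, long end) {
        this.file = file;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= CHUNK) {
            processRange(file, start, end); // hypothetical helper: read and handle [start, end)
        } else {
            long mid = (start + end) / 2;
            // fork both halves and wait for them
            invokeAll(new ChunkTask(file, start, mid), new ChunkTask(file, mid, end));
        }
    }

    private void processRange(File file, long from, long to) {
        // seek to 'from' and read up to 'to'; omitted here
    }
}

// usage: new ForkJoinPool().invoke(new ChunkTask(file, 0, file.length()));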
Note: I understand that the console is for debugging and that games should use a GUI. This is for testing/experience.
I'm writing a game that runs at 60 fps. Every update, I check whether the user has entered a String command. If so, it gets passed through; if not, null is passed through, and the null is ignored.
Scanner is out of the question, since hasNext(), the method used to see if there is data to read, can block, which causes problems.
I've tried using BufferedReader.ready(). Not only did I have problems with it (it never returned true), but I've also read that it's not recommended for a few reasons.
available() always returned 0; the documentation states that InputStream.available() will always return 0 unless overridden. Here is my attempt:
class Game {

    public static void main(String[] args) {
        InputReader reader = new InputReader(System.in);
        int timePerLoop = 1000 / 30;
        Game game = new Game();
        while (true) {
            long start = System.nanoTime();
            game.update(reader.next());
            long end = System.nanoTime();
            long sleepTime = timePerLoop + ((start - end) / 1000000); // nanoseconds to milliseconds
            if (sleepTime > 0) {
                try {
                    Thread.sleep(sleepTime);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            } else {
                Thread.yield();
            }
        }
    }

    public void update(String command) {
        if (command != null) {
            // handle command
        }
        // update game
    }
}
InputReader.java
public class InputReader {

    private InputStream in;

    public InputReader(InputStream stream) {
        in = stream;
    }

    public String next() {
        String input = null;
        try {
            while (in.available() > 0) {
                if (input == null)
                    input = "";
                input += (char) in.read();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return input;
    }
}
InputStream by itself has the same problem as above. I'm not completely sure what the concrete type of the object stored in System.in is, but using available() yields the same results.
I've tried using the reader() from System.console(), but console() returns null. I've read into the subject, so I'm not confused about why. This is not the way to do it.
The goal is to check the stream to see if it contains data to read, so I can read the data knowing it won't block.
I do not want to use a separate Thread to handle user input, so please don't recommend one or ask why.
The input has to come from the console. No new sockets are to be created in the process. I have read a few topics about this, but none of them clearly states a solution. Is this possible?
As you have said yourself, a custom GUI or an additional thread is the correct way to do this. However, in the absence of that, have you tried using readLine()? For example: String inputR = System.console().readLine();
Some alterations to main():
Replace InputReader reader = new InputReader(System.in); with:
Console c = System.console();
Replace game.update(reader.next()); with:
game.update(c.readLine());
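Put together, a minimal sketch of the suggested change (note that Console.readLine() blocks until the user presses Enter, so this trades away the non-blocking requirement for simplicity):

Console c = System.console(); // may be null when not attached to an interactive terminal
if (c == null) {
    System.err.println("No interactive console available");
    return;
}
Game game = new Game();
while (true) {
    game.update(c.readLine()); // blocks until a full line is entered
}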
Edit: This thread could also be helpful: Java: How to get input from System.console()
I have the following code in my application, which does two things:
Parse the file, which has 'n' records of data.
For each record in the file, make two web service calls.
public static List<String> parseFile(String fileName) {
    List<String> idList = new ArrayList<String>();
    try {
        BufferedReader cfgFile = new BufferedReader(new FileReader(new File(fileName)));
        String line = null;
        cfgFile.readLine(); // read and discard the first line
        while ((line = cfgFile.readLine()) != null) {
            if (!line.trim().equals("")) {
                String[] fields = line.split("\\|");
                idList.add(fields[0]);
            }
        }
        cfgFile.close();
    } catch (IOException e) {
        System.out.println(e + " Unexpected File IO Error.");
    }
    return idList;
}
When I try to parse a file with 1 million lines of records, the Java process fails after processing a certain amount of data with a java.lang.OutOfMemoryError: Java heap space. I can partly figure out that the Java process stops because of the huge amount of data. Kindly suggest how I should proceed with this much data.
EDIT: Will this part of the code, new BufferedReader(new FileReader(new File(fileName)));, read the whole file, and is it affected by the size of the file?
The problem is that you are accumulating all the data in the list. The best approach is to process the file in a streaming fashion: do not accumulate all the ids in the list, but call your web service for each row, or accumulate a smaller buffer and then make the calls.
Opening the file and creating the BufferedReader will have no impact on memory consumption, as the bytes from the file will be read (more or less) line by line. The problem is at this point in the code: idList.add(fields[0]); the list will grow as large as the file, because you keep accumulating all of the file's data in it.
Your code should do something like this:
while ((line = cfgFile.readLine()) != null) {
    if (!line.trim().equals("")) {
        String[] fields = line.split("\\|");
        callToRemoteWebService(fields[0]);
    }
}
Increase your Java heap size using the -Xms and -Xmx options. If not set explicitly, the JVM sets the heap size to ergonomic defaults, which in your case are not enough. Read this paper to find out more about tuning memory in the JVM: http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf
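For example, a typical invocation might look like this (the sizes and class name are illustrative; tune them for your workload):

java -Xms512m -Xmx4g com.example.Main

Here -Xms sets the initial heap size and -Xmx the maximum heap size.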
EDIT: An alternative is to do this in a producer-consumer style to exploit parallel processing. The general idea is to create a producer thread that reads the file and queues tasks for processing, and n consumer threads that consume them. A very general sketch (for illustrative purposes) follows:
// blocking queue holding the tasks to be executed
final SynchronousQueue<Callable<Void>> queue = new SynchronousQueue<>();

// reads the file and submits tasks for processing
final Runnable producer = new Runnable() {
    public void run() {
        BufferedReader in = null;
        try {
            in = new BufferedReader(new FileReader(new File(fileName)));
            String line = null;
            while ((line = in.readLine()) != null) {
                if (!line.trim().equals("")) {
                    final String[] fields = line.split("\\|");
                    // this will block if there are no available consumer threads to process it...
                    queue.put(new Callable<Void>() {
                        public Void call() {
                            process(fields);
                            return null;
                        }
                    });
                }
            }
        } catch (IOException e) {
            // handle I/O errors...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // close the buffered reader here...
        }
    }
};

// Consumes the tasks submitted by the producer. Consumers can be pooled
// for parallel processing.
final Runnable consumer = new Runnable() {
    public void run() {
        try {
            while (true) {
                // this call blocks if there are no items left for processing in the queue...
                Callable<Void> task = queue.take();
                task.call();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            // handle failures of individual tasks...
        }
    }
};
Of course you have to write code that manages the lifecycle of the consumer and producer threads. The right way to do this would be by implementing it using an Executor.
When you want to work with big data, you have two choices:
Use a big enough heap to fit all the data. This will "work" for a while, but if your data size is unbounded, it will eventually fail.
Work with the data incrementally, keeping only part of the data (of a bounded size) in memory at any one time. This is the ideal solution, as it will scale to any amount of data.
I'm trying to read a webpage using the following code:
URL url = new URL("somewebsitecomeshere");
URLConnection c = url.openConnection();
if (getHttpResponseCode(c) == 200) {
    if (isContentValid(c)) { // accept html/xml only!
        InputStream is = c.getInputStream();
        Reader r = new InputStreamReader(is);
        System.out.println(r.toString());

        // after commenting this everything works great!
        setHTMLString(getStringFromReader(r));
        System.out.println(getHTMLString());

        ParserDelegator parser = new ParserDelegator();
        parser.parse(r, new Parser(url), true);
        r.close();
        is.close();
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    } else {
        log("content is not valid!");
    }
} else {
    System.out.println("ERROR" + c.getContentType() + c.getURL());
}
//---------------------------------------------------
private String getStringFromReader(Reader reader) throws IOException {
    char[] arr = new char[8 * 1024]; // 8K at a time
    StringBuffer buf = new StringBuffer();
    int numChars;
    while ((numChars = reader.read(arr, 0, arr.length)) > 0) {
        buf.append(arr, 0, numChars);
    }
    // Reset position to 0
    reader.reset();
    return buf.toString();
}
If I try to read the string using getStringFromReader(), the rest of the code is effectively skipped because the Reader's position has moved to EOF, so I tried to reset the position to 0 but got the following error:
java.io.IOException: reset() not supported
at java.io.Reader.reset(Unknown Source)
at sample.getStringFromReader(Spider.java:248)
at default(sample.java:286)
at default.main(sample.java:130)
How can I reset the Reader position to 0?
Short answer, your stream doesn't support reset or mark methods. Check the result of:
is.markSupported()
Long answer: an InputStream is a flow of bytes. Bytes can come from a file, a network resource, a string, etc. So basically, some streams support resetting the read position to the start of the stream (a random access file, for example), while others don't.
A stream from a web site will normally use an underlying network connection to provide the data. That means it's up to the underlying network protocol (TCP/IP, for example) to support resetting the stream or not, and normally it doesn't.
In order to reset any stream you would have to know the entire flow, from start to end. Network communications send a bunch of packets (which may arrive in order or not) to transfer data. Packets may get lost or even duplicated, so information is normally buffered and interpreted as it is received. It would be very expensive to reconstruct all messages at the network level, so that is normally left up to the receiver, if it wants to do it.
In your case, if what you want is to print the input stream, I would recommend creating a custom InputStream which receives the original InputStream and, whenever it is read, prints the value read and returns it at the same time. For example:
class MyInputStream extends InputStream {

    InputStream original = null;

    public MyInputStream(InputStream original) {
        this.original = original;
    }

    @Override
    public int read() throws IOException {
        int c = original.read();
        if (c != -1) {
            System.out.printf("%c", c); // echo each byte as it is read
        }
        return c;
    }
}
Then wrap your original InputStream with it:

...
InputStream myIs = new MyInputStream(is);
Reader r = new InputStreamReader(myIs);
...
Hope it helps.
InputStreamReader does not support reset(). Also, you did not call mark(0) beforehand.
What you could do is wrap your reader in a BufferedReader with a sufficiently large buffer so that reset is supported. If you cannot do that, then you should open a new connection to your URL.
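A minimal sketch of that wrapping, reusing the names from the question (the 1 MB read-ahead limit is illustrative; it must be at least as large as everything you read before calling reset()):

BufferedReader br = new BufferedReader(new InputStreamReader(is));
br.mark(1 << 20);                        // remember the start of the stream
setHTMLString(getStringFromReader(br));  // its internal reset() now rewinds to the mark
parser.parse(br, new Parser(url), true); // parse the same content from the beginning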