I want to read a file incrementally in Java while the file is being modified/written by some other process. So suppose process "A" is writing/logging to a file "X" and another process "B" wants to incrementally read the file "X", say every 1 sec (or even continuously), to find a particular pattern. What's the best way to do this in Java? I know I can use RandomAccessFile's 'seek' method, but will that interfere with the writing of the file? Is there a better way to do this?
Poll the last-modified date of the file. Be aware that opening it for reading can prevent other programs from writing to the file at the same time, depending on how the file is opened and on the platform.
If you're able to use Java 7, you could take advantage of the WatchService ... but it doesn't solve having to parse the whole file.
The only thing I can think of is maintaining some kind of "marker" that indicates the last position you read up to. The next time you come to read the file, you can skip to that point and read from there (updating the marker when you're done).
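A minimal sketch of the marker idea, assuming the writer only appends and the log is UTF-8 text with lines shorter than 1 MiB. The names `LogTailer` and `readNew` are made up for illustration. The reader keeps its own byte offset, reopens the file on each poll, and returns only complete lines so a half-written line is picked up on the next poll. Note that `seek` only moves this reader's own file pointer; it does not change where the writer writes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LogTailer {
    private final String path;
    private long marker = 0; // byte offset of the first byte not yet returned

    public LogTailer(String path) {
        this.path = path;
    }

    /** Returns the complete lines appended since the last call (possibly empty). */
    public String readNew() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            if (length < marker) {
                marker = 0; // file was truncated or replaced: start over
            }
            if (length == marker) {
                return "";
            }
            // Read at most 1 MiB per call (assumes no single line is longer than that).
            byte[] buf = new byte[(int) Math.min(length - marker, 1 << 20)];
            raf.seek(marker);
            raf.readFully(buf);
            // Keep only whole lines; a trailing partial line stays unread for now.
            int end = buf.length;
            while (end > 0 && buf[end - 1] != '\n') {
                end--;
            }
            marker += end;
            return new String(buf, 0, end, StandardCharsets.UTF_8);
        }
    }
}
```

Call `readNew()` from a timer (or whenever the last-modified date changes) and run the pattern match on the returned text.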
The current documentation for StandardOpenOption.APPEND says:
If the file is opened for WRITE access then bytes will be written to the end of the file rather than the beginning.
However, I can't seem to find any further information on how this works internally.
My use case involves appending data to a huge file. I currently use BufferedWriter, but my understanding is that, if I have some way to maintain a pointer to the end of the file, I can easily append to it, without first traversing from start of file, till end of file.
So, my question is: Does StandardOpenOption.APPEND actually work in a similar way? Or does it also, internally, move to the end of the file and then perform the append?
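For reference, a small sketch of what appending with this option looks like from the caller's side. It only shows the API call and does not demonstrate anything about the internals asked about above. The class and method names (`AppendExample`, `appendLine`) are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendExample {
    /** Appends one line to the file, creating the file if it does not exist yet. */
    public static void appendLine(Path file, String line) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(line);
            writer.newLine();
        }
    }
}
```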
I was writing a program that implements a dictionary.
What I actually did was write a Java applet that shows the words defined in an .xml file, using the org.w3c.dom package.
Now I want to add a new feature: users can modify a word in the dictionary from within the program, and the modification is then saved to the original .xml file.
Here is my question: what should I do to save the changes? Note that users can only modify one word at a time, so I don't want to load the whole file, modify that one part, and rewrite the whole file to disk. Is there a novel way to do that?
An XML file is a sequential text file. This means that there is no formula or other convenient way to locate the n-th word in a dictionary stored in XML. Elements need to be written one after the other, character by character (and one character may or may not map to exactly one byte). Thus, what is called a random update is out.
Look at JAXB for a most convenient way to read and write XML, and invest some work so that a user cannot update in memory and terminate the program without saving.
Reading and writing files in specific formats is a little bit trickier than what you portray.
Seen with "XML eyes" you are only changing a portion of the file - but to do that on the file level you need to seek to the position of change and write new bytes from there. The problem with that is that the content after that position won't adjust according to the new portion you write.
TL;DR - no - you need to read+write the complete XML file when making changes.
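To illustrate the read + modify + write cycle with org.w3c.dom, here is a rough sketch. The layout `<word name="...">definition</word>` is an assumption (the question does not show the file), and `DictionaryEditor` / `updateDefinition` are names invented for this example. It writes to a temporary file first and then moves it over the original, so an interrupted save is less likely to leave a half-written dictionary.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DictionaryEditor {
    /**
     * Replaces the text of the <word name="..."> element (hypothetical layout)
     * and rewrites the whole file. Returns false if the word was not found.
     */
    public static boolean updateDefinition(Path xml, String word, String newText) throws Exception {
        // Read: parse the complete document into memory.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml.toFile());

        // Modify: find the matching element and change its text.
        NodeList words = doc.getElementsByTagName("word");
        Element match = null;
        for (int i = 0; i < words.getLength(); i++) {
            Element e = (Element) words.item(i);
            if (word.equals(e.getAttribute("name"))) {
                match = e;
                break;
            }
        }
        if (match == null) {
            return false;
        }
        match.setTextContent(newText);

        // Write: serialize the complete document to a temp file, then replace the original.
        Path tmp = Files.createTempFile(xml.toAbsolutePath().getParent(), "dict", ".tmp");
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = Files.newOutputStream(tmp)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
        Files.move(tmp, xml, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```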
I need my application to be able to read large (very large, 100GB+) text files and process their content, potentially at different times. For instance, it might run for an hour and finish processing a few GBs, and then I shut it down and come back a few days later to resume processing the same file.
To do this I will need to read the files in memory-friendly chunks; each chunk/page/block/etc. will be read in, one at a time, and processed before the next chunk is read into memory.
I need the program to be able to mark where it is inside the input file, so if it shuts down, or if I need to "replay" the last chunk being processed, I can jump right to the point in the file where I am and continue processing. Specifically, I need to be able to do the following things:
When the processing begins, scan a file for a "MARKER" (some marker that indicates where we left off processing the last time)
If the MARKER exists, jump to it and begin processing from that point
Else, if the MARKER doesn't exist, then place a MARKER after the first chunk (for now, let's say that a "chunk" is just a line-of-text, as BufferedReader#readLine() would read in) and begin processing the first chunk/line
For each chunk/line processed, move the MARKER after the next chunk (thus, advancing the MARKER further down the file)
If we reach a point where there are no more chunks/lines after the current MARKER, we've finished processing the file
I tried coding this up myself and noticed that BufferedReader has some interesting methods that sound like they're suited for this very purpose: mark(), reset(), etc. But the Javadocs on them are kind of vague, and I'm not sure that these "file marker" methods will accomplish all the things I need to be able to do. I'm also completely open to a 3rd-party JAR/lib that has this capability built in, but Google didn't turn anything up.
Any ideas here?
Forget about markers. You cannot "insert" text without rewriting the whole file.
Use a RandomAccessFile and store the position you have read up to. When you need to open the file again, just use seek to go back to that position.
A Reader's "mark" is not persistent; it solely forms part of the state of the Reader itself.
I suggest that you not store the state information in the text file itself; instead, have a file alongside which stores the byte offset of the most recently processed chunk. That'll eliminate the obvious problems involving overwriting data in the original text file.
The marker of the buffered reader is not persisted between runs of your application. I also would not change the content of that huge file to mark a position, since this can lead to significant I/O and/or filesystem fragmentation, depending on your OS.
I would use a properties file to store the state of the program externally. Have a look at the documentation; the API is straightforward:
http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html
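Putting the suggestions above together, a sketch under some assumptions: the byte offset lives in a separate properties file under a key I have called `offset`, and a "chunk" is one line. `ResumableLineProcessor` and `process` are illustrative names, not an existing library. `RandomAccessFile.readLine()` is unbuffered and converts each byte directly to a char, so for a real 100GB+ UTF-8 file you would want your own buffered line splitting and would save the offset less often than after every line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.Consumer;

public class ResumableLineProcessor {
    /** Reads the saved byte offset, or 0 if there is no checkpoint file yet. */
    static long loadOffset(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(checkpoint)) {
            props.load(in);
        }
        return Long.parseLong(props.getProperty("offset", "0"));
    }

    /** Stores the byte offset in the checkpoint file, next to (not inside) the data file. */
    static void saveOffset(Path checkpoint, long offset) throws IOException {
        Properties props = new Properties();
        props.setProperty("offset", Long.toString(offset));
        try (OutputStream out = Files.newOutputStream(checkpoint)) {
            props.store(out, "resume position");
        }
    }

    /** Processes up to maxLines lines from the saved position; returns how many were handled. */
    public static int process(Path data, Path checkpoint, int maxLines,
                              Consumer<String> handler) throws IOException {
        int handled = 0;
        try (RandomAccessFile raf = new RandomAccessFile(data.toFile(), "r")) {
            raf.seek(loadOffset(checkpoint)); // jump straight to where we left off
            String line;
            while (handled < maxLines && (line = raf.readLine()) != null) {
                handler.accept(line);
                // Advance the "marker" only after the line has been processed.
                saveOffset(checkpoint, raf.getFilePointer());
                handled++;
            }
        }
        return handled;
    }
}
```

To replay the last chunk, you could also keep the previous offset under a second key and seek to that instead.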
I have a set of text files providing information that is parsed and analysed in order to build a model. Sometimes, the user of this model wants to know which part of a text file was used to generate a given model item.
For that I am thinking of keeping track of the range of line (or byte) ids, so that the appropriate part of the text can be read when required.
My question is: does there exist any Java Reader able to read a file using a start and stop line (or byte) id, instead of reading the file from the beginning and counting the lines (bytes)?
Best regards
If you know the exact number of bytes that should be skipped, you can use the seek method of RandomAccessFile.
To read from a certain byte, use SeekableByteChannel. Of course, there aren't any Readers able to start from a line id, because the positions of the line separators are unknown.
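A small sketch of reading a stored byte range with SeekableByteChannel, assuming the model keeps a start (inclusive) and end (exclusive) byte offset per item, that the file is UTF-8, and that the range fits in memory. `ByteRangeReader` is just an illustrative name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteRangeReader {
    /** Reads the bytes in [start, end) and decodes them as UTF-8. */
    public static String read(Path file, long start, long end) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(end - start));
        try (SeekableByteChannel channel = Files.newByteChannel(file)) {
            channel.position(start); // jump to the range without reading what precedes it
            // A single read() may return fewer bytes than requested, so loop until full or EOF.
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading
            }
        }
        return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
    }
}
```

If you prefer line ids, one option is to record the byte offset of each line start while parsing and translate line ranges to byte ranges with that table.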
You can use InputStream.skip() (together with mark() and reset(), where the stream supports them) to navigate to a specific position in the file.
But are you sure you really have to implement this yourself? Take a look at Lucene, the indexing library, which will probably help you.
I'm writing a little program in Java which must read a really big file. Because of this, I want to access the file without reading it completely each time. My question is the following: can I change the offset of the file descriptor with a simple instruction, or is the only solution to read all the previous lines, which I don't need?
In other words, can I simulate the lseek command in my input file?
I think it's not necessary this time, but if someone wants code, I'll post it.
Regards!
I think you probably want RandomAccessFile.
Specifically, you want the seek(long) method.
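Roughly, as a sketch: seek to the byte offset, then read from there. Since RandomAccessFile itself is unbuffered, this example wraps its file descriptor in a BufferedReader after seeking. The class and method names are made up, and the offset is assumed to fall on a character boundary of a UTF-8 file.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class SeekExample {
    /** Returns the first line found at or after the given byte offset, or null at end of file. */
    public static String firstLineFrom(String path, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset); // the lseek equivalent: nothing before the offset is read
            // The stream shares the descriptor (and therefore the position) with raf;
            // closing raf at the end of the try block releases it.
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new FileInputStream(raf.getFD()), StandardCharsets.UTF_8));
            return reader.readLine();
        }
    }
}
```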