I have a zipped CSV file.
I have a Quartz job scheduler that reads the file, but a user can also click and read the file at the same time. If the file is open during the user operation and the Quartz job scheduler then starts reading it as well, could the zipped CSV file become corrupted?
Special note: there is no write operation on the file.
You can read the same file in as many threads as you like.
It will only be corrupted if you write to it in one thread while using it in another at the same time.
There is no problem reading the same file simultaneously from several threads. Just create a separate FileInputStream for each thread.
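A minimal sketch of this, assuming the file is gzip-compressed (the class name, file name, and two-reader setup are all illustrative):

```java
import java.io.*;
import java.nio.file.*;
import java.util.concurrent.*;
import java.util.zip.*;

public class ConcurrentZipRead {
    // Count lines in a gzipped CSV using a stream private to the calling thread.
    static long countLines(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(Files.newInputStream(file))))) {
            return reader.lines().count();
        }
    }

    public static void main(String[] args) throws Exception {
        // Create a sample gzipped CSV standing in for the real file.
        Path file = Files.createTempFile("data", ".csv.gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)))) {
            w.write("id,value\n1,a\n2,b\n");
        }

        // Two concurrent readers (the scheduled job and the user click).
        // Each opens its own stream, so neither disturbs the other.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Long> scheduledJob = pool.submit(() -> countLines(file));
        Future<Long> userClick = pool.submit(() -> countLines(file));
        System.out.println(scheduledJob.get() + " " + userClick.get()); // 3 3
        pool.shutdown();
    }
}
```

Each reader gets its own file descriptor and its own decompressor state, so there is no shared state to corrupt, and the file on disk is never modified.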
If there is no write operation on your file, then you will have no problem reading it.
You can read it from all your threads.
Problems only arise if you try to write to it.
Even then you can avoid the problem by synchronizing access to the file, e.g. by having every thread enter a synchronized block on a shared lock object before touching it (a File object itself cannot be "made synchronized").
Related
I am trying to consume (stream) a big zip file with Apache Camel. The streaming should begin as soon as the file starts being written. Below is the file consumer code.
rest("/api/request/{Id}/")
.get()
.produces(MediaType.APPLICATION_OCTET_STREAM_VALUE)
.process(new FindFileName)
.pollEnrich().simple("file:" + outputDir + "?fileName=${property.filnavn}&noop=false&readLock=none&delete=true").timeout(pollTimeout)
Claus Ibsen suggested using readLock=none to get the stream.
When I use that option, the stream closes right away and I only get a 0-byte file with the correct filename.
How do I configure camel's file endpoint to use readLock=none and consume the file until it is completed?
A separate route writes the file.
There is no safe way to know when a file has been completely written by a 3rd party. What you do there is get hold of a java.io.File in the poll enrich, which Camel can convert to a FileInputStream to read from. But that stream has no way of knowing when the 3rd party is finished writing the file.
Therefore it is really bad practice to read files that are still in the process of being written.
To know when a file is completely written, 3rd parties may use a strategy such as:
write a 2nd dummy marker file to tell it's finished
write a 2nd in-progress dummy file to tell that the file is currently being written, and delete this file when it's finished
write the file using a temporary name and rename when done
write the file in another folder and move when done
monitor the file's modified timestamp, and if the timestamp doesn't change for X period, assume it has finished being written
attempt to rename the file, and assume that if the OS fails to do this, the 3rd party is still writing to the file
etc...
The JDK File Lock API does not work across file systems and is generally not very usable for obtaining file locks. It may work from within the same JVM, but not when 2 different systems are involved.
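The temporary-name strategy from the list above can be sketched as follows (directory and file names are illustrative; assumes Java 11+ for Files.writeString):

```java
import java.io.IOException;
import java.nio.file.*;

public class TempNameWriter {
    // Write content under a temporary name, then rename it, so consumers
    // never observe a half-written file under the final name.
    static Path writeThenRename(Path dir, String finalName, String content) throws IOException {
        Path tmp = dir.resolve(finalName + ".tmp");
        Path target = dir.resolve(finalName);
        Files.writeString(tmp, content);
        // ATOMIC_MOVE fails outright rather than silently falling back
        // to a non-atomic copy-and-delete.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drop");
        Path done = writeThenRename(dir, "report.xml", "<data/>");
        System.out.println(Files.exists(done)); // true
    }
}
```

A consumer polling the directory for report.xml will either see nothing or the complete file, never a partial one, as long as the temporary and final names are on the same file system.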
I have a requirement where I write a file to a server. Another application has a scheduled job that reads the file at a specific interval. The file shouldn't be readable until my write is complete. I have tried using
File.isReadable(false)
But this is not working, and the scheduler picks up incomplete data from the file if I am still writing to it.
Any Solutions?
Write to a different file name, and when the write is complete, rename the file to the name the scheduler expects. If you're running on Linux or a similar OS, file renames within the same file system are atomic.
File tempFile = new File("/path/to/file.tmp");
// write to tempFile
if (!tempFile.renameTo(new File("/path/to/file"))) {
    // renameTo reports failure via its return value, not an exception
    throw new IOException("rename failed");
}
You can use another file with the same base name as a marker. Start writing into FileName.txt and, when it is finished, create the file FileName.rdy.
Your application then checks only for *.rdy files; when one is found, it reads the corresponding FileName.txt.
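A sketch of that protocol (class, directory, and file names are illustrative; assumes Java 11+):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MarkerFilePoll {
    // Producer: write the data file first, then create the empty marker.
    static void publish(Path dir, String name, String content) throws IOException {
        Files.writeString(dir.resolve(name + ".txt"), content);
        Files.createFile(dir.resolve(name + ".rdy"));
    }

    // Consumer: only data files whose .rdy marker exists are safe to read.
    static List<Path> readyFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(p -> p.toString().endsWith(".rdy"))
                          .map(p -> Paths.get(p.toString().replaceFirst("\\.rdy$", ".txt")))
                          .filter(Files::exists)
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("inbox");
        publish(dir, "FileName", "some data");
        System.out.println(readyFiles(dir).size()); // 1
    }
}
```

The marker is created only after the data file is fully written, so the poller can never observe a half-written FileName.txt through its *.rdy scan.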
You can use the FileLock API.
I explained briefly how it works here.
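Roughly how the FileLock approach looks (names are illustrative; note that these locks are advisory, so they only help if the reader also asks for the lock, and as mentioned elsewhere on this page they are unreliable across machines):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.*;

public class LockedWrite {
    // Hold an exclusive OS-level lock for the duration of the write.
    // A cooperating reader calling channel.lock() or tryLock() on the
    // same file will block or fail until the lock is released.
    static void writeLocked(Path file, String content) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            channel.write(ByteBuffer.wrap(content.getBytes()));
        } // the lock is released, then the channel is closed
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared", ".txt");
        writeLocked(file, "complete record\n");
        System.out.println(Files.readString(file));
    }
}
```

The reader side must opt in: a reader that simply opens a FileInputStream without attempting to acquire the lock will still see the file mid-write.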
A better option would be to synchronize the read and write procedures:
put your code that reads the file and your code that writes the file in synchronized {} blocks on a shared lock, so that one waits until the other completes. (Note this only helps when both run in the same JVM.)
In Linux when you open an input stream of a file, another process can rename that file. So when a file is rolled over, you still can read from the stream. In Windows, when you open an input stream, that file cannot be renamed until the input stream is closed. How can I read a file without affecting the 'rename' process?
I have tried using java.nio.FileChannel. It works for reading and writing to a file at the same time from different processes (e.g. a Java process reads while Notepad writes), but not for renaming the file (e.g. the Java process reads, but the rename command does not work).
The simplest solution would be the following (copied from one of my comments):
After you have read the newly appended lines, close the reader; this way the other process trying to do a rotation will succeed. The process doing log rotation has to retry several times, until it sees that no other process is reading from the file.
The Apache Commons IO Tailer can do this.
I have a Java thread A that continuously polls a folder RESULTFOLDER and checks whether any new files are present in it.
The files are posted to RESULTFOLDER by another program running on a different machine. The posted files are all XML files (only XML), so at any point RESULTFOLDER holds only XML files.
Thread A continuously polls RESULTFOLDER, parses the XML files one at a time, and then deletes them.
Sometimes, if thread A tries to read and parse file A while the other program is still posting it, I get a parsing exception saying "premature end of file".
How can I resolve the problem?
One way I can think of is to check the file's creation time and ensure the file has been present for at least a minute or so, but I don't think Java provides such an API. How can I go about solving this problem?
You can write the .xml file to the folder, and then write a separate control file after it. The control file would have zero bytes and a different extension, such as .ctl, but the same base name.
When the thread polling the result folder finds the .ctl file, it knows it is safe to open the same-named file with a .xml extension.
This approach has the added benefit that it will work even when the writing task is on another computer.
Have the creating thread call setWritable(true, true) and setReadable(true, true) on the file at creation time. This prevents anyone but the creator from accessing the file while it is being written. After the file is completely written, call setWritable(true, false) and setReadable(true, false). The polling thread then checks writability at polling time to decide whether the file is ready to be read.
Alternatively, you could provide a mutex for the directory. Have the thread that is creating the file acquire the mutex for the directory, create and populate the file, then release the mutex. When the polling thread needs to do its check, grab the mutex, check the directory, process the files, then release the mutex.
Three approaches:
While the file is being written, it has the name foo.tmp. Once it is completely written, the producer renames it to foo.xml. Thus the consumer won't see the XML file until it has been completely written by the producer.
(Same answer as #aaaa bbbb).
Once the file foo.xml is completely written, another file (which can be empty) named foo.ctl is created. The consumer does not process the XML file until it sees the CTL file, after which it can remove both.
(Same answer as #tafoo85).
The consumer cannot read the file until it has been completely written and made readable by the producer.
These approaches have the added benefit of working correctly even if the producer thread dies while in the middle of writing an incomplete XML file.
I'm working on a small Java application (Java 1.6, Solaris) that will use multiple background threads to monitor a series of text files for output lines that match a particular regex pattern and then make use of those lines. I have one thread per file; they write the lines of interest into a queue and another background thread simply monitors the queue to collect all the lines of interest across the whole collection of files being monitored.
One problem I have is when one of the files I'm monitoring is reopened. Many of the applications that create the files I'm monitoring will simply restart their logfile when they are restarted; they don't append to what's already there.
I need my Java application to detect that the file has been reopened and restart following the file.
How can I best do this?
Could you keep a record of the length of each file? When the current length goes back to zero or is smaller than the last length you recorded, you know the file has been restarted by the app.
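That check can be sketched like this (the class is hypothetical; it tracks only the length, exactly as suggested above):

```java
import java.io.IOException;
import java.nio.file.*;

public class RotationDetector {
    private long lastLength = 0;

    // Returns true when the file shrank since the last check, meaning the
    // writing application truncated or recreated it.
    boolean wasRestarted(Path file) throws IOException {
        long length = Files.exists(file) ? Files.size(file) : 0;
        boolean restarted = length < lastLength;
        lastLength = length;
        return restarted;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("app", ".log");
        RotationDetector detector = new RotationDetector();
        Files.writeString(log, "line1\nline2\n");
        System.out.println(detector.wasRestarted(log)); // false: file grew
        Files.writeString(log, "");                     // app restarts its log
        System.out.println(detector.wasRestarted(log)); // true: file shrank
    }
}
```

When wasRestarted returns true, the monitoring thread would reopen the file and resume reading from offset zero.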
Using a lockfile is a solution, as Jurassic mentioned.
Another way is to reopen the file while you're matching the pattern and check whether the file has a new size and creation time. If the creation time is not the same as when you first found the file, you can be sure it has been recreated.
You could place something on the filesystem that indicates you are reading a given file. Suppose, next to the file being read (a.txt), you create a file (a.txt.lock) that indicates a.txt is being read. When your process is done with it, a.txt.lock is deleted. Every time a process goes to open a file for reading, it checks for the lock file beforehand. If there is no lockfile, the file is not in use. I hope that makes sense and answers your question. Cheers!
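A sketch of the lockfile idea (class and file names are illustrative). Files.createFile is atomic, so two processes cannot both "win" creation of the same lock file:

```java
import java.io.IOException;
import java.nio.file.*;

public class LockFileGuard {
    // Try to claim the file by atomically creating name.lock;
    // returns false if another process already holds the lock.
    static boolean acquire(Path file) throws IOException {
        try {
            Files.createFile(Paths.get(file.toString() + ".lock"));
            return true;
        } catch (FileAlreadyExistsException held) {
            return false;
        }
    }

    static void release(Path file) throws IOException {
        Files.deleteIfExists(Paths.get(file.toString() + ".lock"));
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("a", ".txt");
        System.out.println(acquire(data)); // true: we hold the lock
        System.out.println(acquire(data)); // false: already locked
        release(data);
        System.out.println(acquire(data)); // true again
    }
}
```

One caveat: a process that crashes leaves a stale .lock file behind, so production use usually adds a timeout or a cleanup step.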