I am trying to consume (stream) a big zip file with Apache Camel. The streaming should begin as soon as the file starts being written. Below is the file consumer code.
rest("/api/request/{Id}/")
    .get()
    .produces(MediaType.APPLICATION_OCTET_STREAM_VALUE)
    .process(new FindFileName())
    .pollEnrich().simple("file:" + outputDir + "?fileName=${property.filnavn}&noop=false&readLock=none&delete=true").timeout(pollTimeout);
Claus Ibsen suggested using readLock=none to get the stream.
When I use this option, the stream closes right away and I only get a 0-byte file with the correct filename.
How do I configure Camel's file endpoint to use readLock=none and consume the file until it is complete?
A separate route writes the file.
There is no safe way to know when a file has been completely written by a third party. What you get in the poll enrich is a java.io.File pointing to the file, which Camel can convert to a FileInputStream to read from. But that stream has no way of knowing when the third party has finished writing the file.
It's really bad practice to read files that are still in the process of being written.
To know when a file has been completely written, one of these strategies can be used:
write a second, dummy marker file to signal that it's finished
write a second, in-progress dummy file to signal that the file is currently being written, and delete that file when it's finished
write the file using a temporary name and rename it when done
write the file in another folder and move it when done
monitor the file's modified timestamp, and if the timestamp doesn't change after X period, assume it has finished being written
attempt to rename the file, assuming that if the OS fails to do so then the third party is still writing to it
etc...
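As an illustration of the in-progress marker strategy, here is a minimal plain-Java sketch (not Camel-specific). The `.inprogress` suffix and the class and method names are hypothetical; writer and reader simply have to agree on the same convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical convention: while "data.zip" is being written, an empty
// "data.zip.inprogress" file exists next to it.
class InProgressMarker {

    static Path markerFor(Path file) {
        return file.resolveSibling(file.getFileName() + ".inprogress");
    }

    // Writer side: create the marker before the data file appears, and delete
    // it only after the data has been written successfully. If writing fails,
    // the marker stays behind, so readers never pick up the partial file.
    static void writeWithMarker(Path file, byte[] data) throws IOException {
        Path marker = markerFor(file);
        Files.createFile(marker);
        Files.write(file, data);
        Files.delete(marker);
    }

    // Reader side: check the data file first, then the marker. Because the
    // writer creates the marker before the data file, a data file that exists
    // without a marker has been completely written.
    static boolean isComplete(Path file) {
        return Files.exists(file) && Files.notExists(markerFor(file));
    }
}
```

The order of the two checks in `isComplete` matters: checking the marker first could race with a writer that creates both files between the two checks.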
The JDK file lock API does not work across file systems and is generally not very usable for getting file locks. It may work from within the same JVM, but not when two different systems are involved.
I have a requirement where I write a file to a server. Another application has a scheduled job that reads the file at a specific interval. The file shouldn't be readable until my write is complete. I have tried using
file.setReadable(false)
But this is not working: the scheduler still picks up incomplete data from the file while I am still writing to it.
Any solutions?
Write to a different file name and then when the write is complete rename the file to the name the scheduler expects. If you're running on Linux or similar then file renames within the same file system are atomic.
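File tempFile = new File("/path/to/file.tmp");
// write to tempFile and close it before renaming
if (!tempFile.renameTo(new File("/path/to/file"))) {
    // renameTo reports failure through its return value, not an exception
    throw new IOException("Rename failed");
}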
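The same idea can be sketched with java.nio, where a failed rename surfaces as an exception. This is a minimal sketch; the `.tmp` suffix and the `AtomicPublish.publish` name are illustrative assumptions. `StandardCopyOption.ATOMIC_MOVE` makes `Files.move` throw `AtomicMoveNotSupportedException` rather than silently falling back to copy-and-delete; per the `Files.move` Javadoc, whether an existing target is replaced during an atomic move is implementation specific.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicPublish {

    // Writes the content to "<name>.tmp" in the same directory (so both names
    // are on the same file system), then renames it to the final name in one
    // step. Readers looking for the final name never see a partial file.
    static Path publish(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8)); // closes the file
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```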
You can use another file with the same base name as a marker. Start writing to FileName.txt, and when it is finished, create the file FileName.rdy.
Your reading application then checks only for *.rdy files; when it finds one, it reads the matching FileName.txt.
You can use the FileLock API.
I explained briefly how it works here.
The better option would be to synchronize the read and write procedures:
put your file-reading and file-writing code in synchronized {} blocks so that one waits until the other completes.
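A minimal sketch of that idea, assuming both the reader and the writer are threads in the same JVM and share one hypothetical `SharedFileAccess` instance. A `synchronized` block cannot coordinate with a separate process or another machine, so this does not help when the writer is an external program.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// All reads and writes of the file go through one shared lock object, so a
// reader never observes a write that is still in progress. This only
// coordinates threads inside one JVM; other processes ignore the lock.
class SharedFileAccess {

    private final Object lock = new Object();
    private final Path file;

    SharedFileAccess(Path file) {
        this.file = file;
    }

    void write(String content) throws IOException {
        synchronized (lock) {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        }
    }

    String read() throws IOException {
        synchronized (lock) {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        }
    }
}
```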
In Linux, when you open an input stream on a file, another process can still rename that file, so when a file is rolled over you can still read from the stream. In Windows, when you open an input stream, the file cannot be renamed until the input stream is closed. How can I read a file without affecting the 'rename' process?
I have tried using java.nio.FileChannel. It works for reading and writing the same file at the same time from different processes (e.g. a Java process reads while Notepad writes), but not for renaming the file (e.g. the Java process reads, but the rename command does not work).
The simplest solution would be the following (copied from one of my comments):
After you have read the newly appended lines, close the reader; this way the other process trying to do a rotation will succeed. However, the process doing the log rotation has to retry several times until no other process is reading from the file.
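On the rotation side, the retry loop described above could look like the following minimal sketch. The class name, the attempt count and the pause are hypothetical; the sketch assumes that a rename blocked by an open reader surfaces as an `IOException` from `Files.move`, while a missing source or an already-existing target fails immediately because retrying cannot fix those.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class RotateWithRetry {

    // Rotation side: on Windows the rename fails while a reader still has the
    // file open, so retry a bounded number of times with a pause in between.
    static void rotate(Path current, Path rotated, int attempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        for (int i = 1; ; i++) {
            try {
                Files.move(current, rotated);
                return;
            } catch (NoSuchFileException | FileAlreadyExistsException e) {
                throw e; // retrying cannot fix a missing source or an existing target
            } catch (IOException e) {
                if (i >= attempts) {
                    throw e; // give up: a reader kept the file open too long
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```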
The Apache Commons IO Tailer can do this.
I have a file scanner application in Java that keeps scanning a directory on a server using FTP. It gets the list of files in the directory and downloads them one by one. On the other side, on the server, there's a process that writes these files. If I'm lucky I won't try to download an incomplete file, but how can I make sure that the write process on the server is complete, the file handle is closed, and the file is ready to be downloaded?
I have no control over the write process on the server. Moreover, I don't have write permission on the directory, so I cannot try to get a write handle to check whether another write handle is already open; that option is off the table.
Is there an FTP function addressing this problem?
This is a very old and well-known problem.
There is no way to be absolutely certain that a file being written by the FTP daemon is complete. It's even possible that the file transfer fails and then gets restarted and completed. You must poll the file's size and set a time limit, say 5 minutes. If the size does not change during that time, you can assume the file is complete.
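The size-polling approach can be sketched as follows. This is a minimal sketch under stated assumptions: `sizeProbe` is a hypothetical hook (for a local `java.io.File` it could be `file::length`; over FTP it would read the size from a directory listing), and the overall `timeout` is an extra safety limit not mentioned in the answer.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

class StableSizeWaiter {

    // Polls sizeProbe every pollInterval. Returns true once the size has stayed
    // the same for at least quietPeriod, or false if timeout elapses first.
    static boolean waitForStableSize(LongSupplier sizeProbe, Duration quietPeriod,
                                     Duration pollInterval, Duration timeout)
            throws InterruptedException {
        long start = System.nanoTime();
        long lastSize = sizeProbe.getAsLong();
        long lastChange = start;
        while (System.nanoTime() - start < timeout.toNanos()) {
            Thread.sleep(pollInterval.toMillis());
            long size = sizeProbe.getAsLong();
            long now = System.nanoTime();
            if (size != lastSize) {
                lastSize = size;   // still growing (or restarted): reset the clock
                lastChange = now;
            } else if (now - lastChange >= quietPeriod.toNanos()) {
                return true;       // unchanged for the whole quiet period
            }
        }
        return false;
    }
}
```

With the values from the answer, the call would use `Duration.ofMinutes(5)` as the quiet period. Even a `true` result is only a heuristic: a stalled upload looks exactly like a finished one.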
If possible, the program that processes the file should be able to deal with partial files.
A much better alternative is rsync, which is much more robust and deterministic. It can even be configured (via command-line option) to write the data initially to a temporary location and move it to its final destination path upon successful completion. If the file exists where you expect it, then it is by definition complete.
A possible solution would be to first upload the file with a different filename (e.g. adding ".partial") and then rename it to its final name.
If the server finds the final name then the upload has been completed.
If you cannot control the upload process then what you are asking is impossible by definition: the file upload could stop because of a network problem or because the sending process is stopped for whatever reason.
What the receiving end will observe is just a closing of the incoming stream; there is no way to guarantee that the data will not be a partial transfer.
Other workarounds could be checking for an end-of-data marker or using a request to the sending server to check if (in their view) the transfer has been completed.
This is more fundamental than FTP: you'd have a similar problem reading those files even if they were being created on the local machine.
If you can't modify the writing process, you'll need to jump through some hoops. None are great, but some are safer than others.
Keep reading until nothing changes for some window (maybe a minute, like David Schwartz suggests). You could optimize this a bit by watching the file size.
Figure out if the files are written serially in a reliable order. When you see file N appear, you know that file N-1 is ready. (Assumes that the directory is empty before the files are written, though you could also look at timestamps.) The downside is that your logic will break if the writer ever changes order or starts writing in parallel.
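A minimal sketch of the ordering heuristic, using last-modified timestamps as the order. The class name is hypothetical, and the sketch inherits the heuristic's weakness: it silently misbehaves if the writer ever writes files in parallel or out of order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SerialOrderHeuristic {

    // Assumes the writer produces files one at a time: every file older than
    // the most recently modified one is treated as complete.
    static List<Path> readyFiles(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile)
                           .collect(Collectors.toCollection(ArrayList::new));
        }
        files.sort(Comparator.comparing(SerialOrderHeuristic::lastModified));
        if (files.isEmpty()) {
            return files;
        }
        // The newest file may still be in progress, so leave it out.
        return new ArrayList<>(files.subList(0, files.size() - 1));
    }

    private static FileTime lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```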
The reliable, safe solutions require improving the writer process.
Writer can write the files to hidden or temporary locations and only make them visible once the entire file (or directory) is ready, using symlinks or file-moving or chmod.
Writer creates a special file (e.g., "./DONE") only after all other files have been written, and reader doesn't read any files until that file is present.
Depending on the file type, the writer could add some kind of end-of-file record/line at the end of the file, and the reader could ensure that it's present.
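For the end-of-file record option, the reader-side check might look like this minimal sketch. The `#EOF` trailer and the class name are hypothetical conventions that writer and reader would have to agree on. The file is decoded with `new String(...)`, which replaces invalid bytes (such as a multi-byte character cut off mid-write) instead of throwing; for very large files you would read only the tail.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TrailerCheck {

    // Hypothetical convention agreed with the writer: the last line is "#EOF".
    static final String TRAILER = "#EOF";

    static boolean hasTrailer(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // Drop trailing line terminators, then look at the last line only.
        String trimmed = content.replaceAll("[\\r\\n]+$", "");
        int start = trimmed.lastIndexOf('\n') + 1;
        return trimmed.substring(start).equals(TRAILER);
    }
}
```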
You can use the FTP client from the Apache Commons Net library (FTPClient):
boolean flag = ftpClient.retrieveFile(remote, local); // remote: String path on the server, local: OutputStream
The returned flag is true if the transfer of the current file completed successfully and false otherwise.
My Java application is supposed to read log data from a Snort application on a Debian server.
The Snort application runs independently of my evaluation app and writes its logs to a file.
My evaluation app is supposed to check only the new content every 5 minutes. That's why I will move the log file, so that the Snort application has to create a new file while my app checks the already-written data in the old one.
Now the question: How can I ensure that I don't destroy the file in the case that I move it at the moment the Snort application is writing to it? Does Java have functionality to check the current operations on the file so that no data is lost? Does the OS lock the file while writing?
Thanks for your help, Kn0rK3
Not exactly what you are looking for, but I would do this in a very different way: record either the line number / timestamp of the last entry read from the log file, or the position in a RandomAccessFile (the second option is more efficient for obvious reasons). The next time you read the file, only read from the recorded position to EOF, at which point you record the last read position again.
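The RandomAccessFile variant can be sketched like this. `IncrementalReader` is a hypothetical name, the offset lives only in memory, and treating a shrinking file as "truncated or replaced, start over" is a simple assumption that can miss data if a rotated-in file has already grown past the old offset.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

class IncrementalReader {

    // Byte offset of the first byte not yet consumed; persist it if the
    // application restarts between polls.
    private long position = 0;

    // Reads everything appended since the previous call, then closes the file
    // again (try-with-resources), so the file is not held open between polls.
    String readNew(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < position) {
                position = 0; // file shrank: it was truncated or replaced, start over
            }
            byte[] chunk = new byte[Math.toIntExact(length - position)];
            raf.seek(position);
            raf.readFully(chunk);
            position = length;
            // A chunk can end in the middle of a line (or a multi-byte
            // character); a real reader would carry that remainder over.
            return new String(chunk, StandardCharsets.UTF_8);
        }
    }
}
```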
Also, you can replace the "poll every 5 minutes" strategy with a "poll every time I get an update notification for this file" strategy.
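One way to get such notifications in the JDK is `java.nio.file.WatchService`, which watches directories rather than single files. The following is a minimal sketch with a hypothetical `LogChangeWatcher.awaitChange` helper; after it returns `true` you would read the new data from the recorded position. Note that some platforms implement `WatchService` by polling, so notifications may arrive with a delay.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

class LogChangeWatcher {

    // Waits until fileName inside dir is created or modified, or until the
    // timeout elapses. Events for other files in dir are ignored.
    static boolean awaitChange(Path dir, String fileName, long timeoutMillis)
            throws IOException, InterruptedException {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return false; // timed out
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    // context() is the changed entry's name relative to dir
                    // (null for OVERFLOW events, which instanceof rejects).
                    Object changed = event.context();
                    if (changed instanceof Path && changed.toString().equals(fileName)) {
                        return true;
                    }
                }
                key.reset();
            }
        }
    }
}
```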
Since I assume that you don't have control of the code of the "Snort" application, I don't think that NIO FileLocks will help you.
It should not be an issue. Typically a logging application keeps a file descriptor or stream open to a file. If the file gets renamed, that doesn't affect the writing application in any way: the name is independent of the file's contents and its location on disk. Snort should continue writing to the renamed file until it notices that the file has been renamed, at which point it reopens a new log file under the old name and switches to writing to that one.
That is the whole reason it reopens the file in the first place: to support this sort of mechanism.
Now the question: How can I ensure that I don't destroy the file in the case...
The only thing you have to worry about is making sure you rename the file to a name that does not already exist. I would recommend moving it to a .YYYYMMDD.HHMMSS extension or something similar.
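A minimal sketch of such a timestamped rename; the class name and the `yyyyMMdd.HHmmss` pattern are illustrative assumptions. Calling `Files.move` without `REPLACE_EXISTING` makes it throw `FileAlreadyExistsException` instead of overwriting an earlier rotated log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TimestampedRotation {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss");

    // Renames e.g. "alert" to "alert.20240102.030405". Without
    // REPLACE_EXISTING, Files.move refuses to overwrite an existing target.
    static Path rotate(Path logFile, LocalDateTime now) throws IOException {
        Path target = logFile.resolveSibling(logFile.getFileName() + "." + STAMP.format(now));
        return Files.move(logFile, target);
    }
}
```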
NOTE: In threaded logging operations, even if the new file has been opened, you may have to wait a bit for all of the threads to switch to the new logging stream. I'm not sure how Snort works but I have seen the log.YYYYMMDD file growing even after the log file was re-opened. I just wait a minute before I consume the renamed logfile. FYI.
I have a Java thread A that continuously polls a folder RESULTFOLDER and checks whether there are new files in it.
The files are posted into RESULTFOLDER by another program running on another machine. The posted files are all XML files (only XML), so at any point RESULTFOLDER holds only XML files.
My thread A continuously polls RESULTFOLDER, parses the XML files one at a time, and then deletes them.
Sometimes thread A tries to read and parse file A while the other program is still posting it. In this case I get a parsing exception saying premature end of file.
How can I resolve this problem?
One idea is to check the file's creation date and time and ensure that the file has been present for at least a minute or so. But I don't think Java provides such an API. How can I go about solving this problem?
You can write the .xml file to the folder, and then write a separate control file written after that. The control file would have zero bytes, and have a different extension, such as .ctl, but would have the same first part of the name.
When the thread polling the result folder finds the .ctl file, it knows it is safe to open the same-named file with a .xml extension.
This approach has the added benefit that it will work even when the writing task is on another computer.
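The control-file handshake can be sketched as below, with both sides in one hypothetical class for brevity. The sketch assumes the consumer may delete both files once the XML has been read; parsing is left out and the raw content is returned instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ControlFileHandshake {

    // Producer: write foo.xml completely, then create the empty foo.ctl.
    static void produce(Path dir, String baseName, String xml) throws IOException {
        Files.write(dir.resolve(baseName + ".xml"), xml.getBytes(StandardCharsets.UTF_8));
        Files.createFile(dir.resolve(baseName + ".ctl"));
    }

    // Consumer: only open foo.xml once foo.ctl exists; remove both afterwards.
    static List<String> consumeReady(Path dir) throws IOException {
        List<Path> controls = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.ctl")) {
            stream.forEach(controls::add); // collect first, delete later
        }
        List<String> documents = new ArrayList<>();
        for (Path ctl : controls) {
            String name = ctl.getFileName().toString();
            String base = name.substring(0, name.length() - ".ctl".length());
            Path xml = ctl.resolveSibling(base + ".xml");
            if (Files.exists(xml)) {
                documents.add(new String(Files.readAllBytes(xml), StandardCharsets.UTF_8));
                Files.delete(xml);
                Files.delete(ctl);
            }
        }
        return documents;
    }
}
```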
Have the creating thread call setWritable(true, true) and setReadable(true, true) on the file at creation time. This is meant to prevent threads other than the creating one from accessing the file while it is being created. After file creation, call setWritable(true, false) and setReadable(true, false). The polling thread then needs to check the file's writability at polling time to ensure that the file is ready to be read.
Alternatively, you could provide a mutex for the directory. Have the thread that is creating the file acquire the mutex for the directory, create and populate the file, then release the mutex. When the polling thread needs to do its check, grab the mutex, check the directory, process the files, then release the mutex.
Three approaches:
1. While the file is being written, it has the name foo.tmp. Once it is completely written, the producer renames it to foo.xml. Thus the consumer won't see the XML file until it has been completely written by the producer.
(Same answer as #aaaa bbbb).
2. Once the file foo.xml is completely written, another file (which can be empty) named foo.ctl is created. The consumer does not process the XML file until it sees the CTL file, after which it can remove both.
(Same answer as #tafoo85).
3. The consumer cannot read the file until it has been completely written and made readable by the producer.
These approaches have the added benefit of working correctly even if the producer thread dies while in the middle of writing an incomplete XML file.