How to check if a file exists in Java without blocking?

Calling new File(path).exists() generally requires a call to the file system, so in my understanding this is not an appropriate call to make from a single-threaded event loop. All the NIO-based file classes such as AsynchronousFileChannel appear to do non-blocking reads or writes, but checking for the existence of the file appears to be blocking. Is there a way to check that a file exists, and/or get metadata such as file size, in a non-blocking fashion?

As per the comment from @davmac, the answer is "you can't".

so in my understanding this is not an appropriate call to make from a single-threaded event loop such as in Netty.
Your understanding is not correct, and this is a non sequitur.
All the nio-based file classes such as FileChannel appear to do non-blocking reads or writes
That's not correct either. FileChannel does not perform non-blocking I/O.
but opening the file appears to be blocking.
You will have to explain how you arrive at that conclusion and how it is adversely affecting you.
Is there a way to check a file exists, and/or get metadata such as file size in a non-blocking fashion?
There are several ways to check for file existence in Java, such as File.exists(), the corresponding Files.exists() method, and simply trying to open the file and catching the FileNotFoundException that results if it doesn't exist.
None of them blocks, at least not for any appreciable amount of time.
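For illustration, here is a minimal sketch of those options, plus one way to keep the check off an event loop entirely by running it on another thread. The file name data.txt is hypothetical:

    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.concurrent.CompletableFuture;

    public class ExistenceCheck {
        public static void main(String[] args) {
            Path path = Paths.get("data.txt"); // hypothetical file

            // The usual checks; each makes a brief call into the file system.
            boolean viaFile = new File("data.txt").exists();
            boolean viaFiles = Files.exists(path);
            long size = viaFile ? new File("data.txt").length() : -1;

            // If you still want the check off the event loop, offload it:
            CompletableFuture<Boolean> pending =
                    CompletableFuture.supplyAsync(() -> Files.exists(path));
            pending.thenAccept(exists -> System.out.println("exists = " + exists));
        }
    }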

Related

Fastest way to write file with multiple threads: FileChannel vs multiple RandomAccessFiles [duplicate]

I am trying to write a single huge file in Java using multiple threads.
I have tried both the FileWriter and BufferedWriter classes in Java.
The content being written is actually an entire table (Postgres) being read using CopyManager and written. Each line in the file is a single tuple from the table and I am writing 100s of lines at a time.
Approach to write:
The single to-be-written file is opened by multiple threads in append mode.
Each thread thereafter tries writing to the file.
Following are the issues I face:
Once in a while, the contents of the file get overwritten, i.e. one line remains incomplete and the next line starts from there. My assumption here is that the buffers for the writer are getting full. This forces the writer to immediately write the data to the file. The data written may not be a complete line, and before it can write the remainder, the next thread writes its content to the file.
While using FileWriter, once in a while I see a single blank line in the file.
Any suggestions on how to avoid this data integrity issue?
Shared Resource == Contention
Writing to a normal file is by definition a serialized operation. You gain no performance by trying to write to it from multiple threads; I/O is a finite, bounded resource with orders of magnitude less bandwidth than even the slowest or most overloaded CPU.
Concurrent access to a shared resource can be complicated (and slow)
If you have multiple threads that are doing expensive calculations then you have options; if you are just using multiple threads because you think you are going to speed something up, you are going to do the opposite. Contention for I/O always slows down access to the resource; it never speeds it up, because of the lock waits and other overhead.
You have to have a critical section that is protected and allows only a single writer at a time. Just look up the source code for any logging writer that supports concurrency and you will see that there is only a single thread that writes to the file.
If your application is primarily:
CPU Bound: You can use some locking mechanism/data construct to let only one thread out of many write to the file at a time. As a naive solution this will be useless from a concurrency standpoint, but if these threads are CPU bound with little I/O it might work.
I/O Bound: This is the most common case. You must use a message-passing system with a queue of some sort, have all the threads post to the queue/buffer, and have a single thread pull from it and write to the file. This will be the most scalable and easiest-to-implement solution.
Journaling - Async Writes
If you need to create a single super large file where the order of writes is unimportant and the program is CPU bound, you can use a journaling technique.
Have each process write to a separate file and then concatenate the multiple files into a single large file at the end. This is a very old-school, low-tech solution that works well and has for decades.
Obviously the more storage I/O bandwidth you have, the better the final concatenation will perform.
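As a rough sketch of that final concatenation step (class and file names are hypothetical), FileChannel.transferTo can move each journal file into the combined file without copying through a user-space buffer:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.List;

    public class JournalConcat {
        // Concatenates the per-thread journal files into one output file.
        static void concat(List<Path> parts, Path out) throws IOException {
            try (FileChannel dest = FileChannel.open(out,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                for (Path part : parts) {
                    try (FileChannel src = FileChannel.open(part, StandardOpenOption.READ)) {
                        long pos = 0, size = src.size();
                        while (pos < size) {
                            // transferTo may move fewer bytes than requested, so loop.
                            pos += src.transferTo(pos, size - pos, dest);
                        }
                    }
                }
            }
        }
    }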
I am trying to write a single huge file in Java using multiple threads.
I would recommend that you have X threads reading from the database and a single thread writing to your output file. This is going to be much easier to implement as opposed to doing file locking and the like.
You could use a shared BlockingQueue (maybe ArrayBlockingQueue) so the database readers would add(...) to the queue and your writer would be in a take() loop on the queue. When the readers finish, they could add some special IM_DONE string constant and as soon as the writing thread sees X of these constants (i.e. one for each reader), it would close the output file and exit.
So then you can use a single BufferedWriter without any locks and the like. Chances are that you will be blocked by the database calls instead of the local IO. Certainly the extra thread isn't going to slow you down at all.
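A minimal sketch of this pattern, assuming four reader threads and a hypothetical out.txt output file; the IM_DONE sentinel plays the role of the special constant described above:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class SingleWriter {
        static final String IM_DONE = "\u0000IM_DONE\u0000"; // sentinel value
        static final int READERS = 4;                        // hypothetical reader count

        public static void main(String[] args) throws Exception {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

            Thread writer = new Thread(() -> {
                try (BufferedWriter out = Files.newBufferedWriter(Paths.get("out.txt"))) {
                    int done = 0;
                    while (done < READERS) {
                        String line = queue.take();   // blocks while the queue is empty
                        if (IM_DONE.equals(line)) { done++; continue; }
                        out.write(line);
                        out.newLine();
                    }
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            writer.start();

            // Each database-reader thread would do something like:
            //   for each row fetched: queue.put(row);
            //   queue.put(IM_DONE);   // when its result set is exhausted
        }
    }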
The single to-be-written file is opened by multiple threads in append mode. Each thread thereafter tries writing to the file.
If you are adamant that your reading threads also do the writing, then you should add a synchronized block around access to a single shared BufferedWriter -- you could synchronize on the BufferedWriter object itself. Knowing when to close the writer is a bit of an issue, since each thread would have to know whether the others have exited. Each thread could increment a shared AtomicInteger when it starts and decrement it when it is done. The thread that looks at the run count and sees 0 would be the one to close the writer.
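A sketch of that arrangement, with one small variation: the counter starts at the known reader count and is only decremented, which avoids a race at startup. All names here are illustrative:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.concurrent.atomic.AtomicInteger;

    public class SharedWriter {
        private final BufferedWriter out;
        private final AtomicInteger running;

        SharedWriter(BufferedWriter out, int readerCount) {
            this.out = out;
            this.running = new AtomicInteger(readerCount);
        }

        // Called by each reader thread for every batch of lines.
        void write(String lines) {
            synchronized (out) {                 // lock on the writer itself
                try {
                    out.write(lines);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }

        // Called once by each reader thread when it finishes.
        void done() throws IOException {
            if (running.decrementAndGet() == 0) {
                out.close();                     // last thread out closes the writer
            }
        }
    }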
Instead of synchronized methods, the better solution would be a thread pool with a single thread, backed by a blocking queue. The messages the application wants to write are pushed onto the blocking queue. The log-writer thread keeps reading from the blocking queue (blocking when the queue is empty) and writes each message to the single file.
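A sketch of that single-thread-pool variant; Executors.newSingleThreadExecutor is backed by an unbounded queue, so execute() effectively pushes the message onto it (class and file names are illustrative):

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class LogWriter {
        private final ExecutorService single = Executors.newSingleThreadExecutor();
        private final BufferedWriter out;

        LogWriter(String file) throws IOException {
            out = Files.newBufferedWriter(Paths.get(file));
        }

        void log(String message) {
            single.execute(() -> {               // queued; written by the one pool thread
                try {
                    out.write(message);
                    out.newLine();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }

        void close() throws IOException {
            single.shutdown();  // in real code, awaitTermination before closing
            out.close();
        }
    }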

Safely getting the file pointer of the end of a file

I want to append a line of text to a file; however, I want to get the position of the string in the file, such that I can later access the string directly using a RandomAccessFile and file.seek() (or similar).
The issue is that a lot of file I/O operations are async, and the write operations can happen within very short time intervals, which suggests an async write, since everything else is inefficient. How do I make sure the file pointer is calculated correctly? I am a newcomer to Java and don't yet understand the details of the different methods of file I/O, so excuse my question if using a BufferedWriter is exactly what I am looking for; but how do you get the current length of that?
EDIT: Reading the entire file is NOT an option. The file is large and, as I said, the write operations happen often, several hundred every second at peak times.
Refer to the FileChannel class: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html
Relevant snippets from the link:
The file itself contains a variable-length sequence of bytes that can be read and written and whose current size can be queried. The size of the file increases when bytes are written beyond its current size;
...
File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.
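Building on those guarantees, one possible sketch: open the channel without APPEND, take channel.size() as the record's offset under a lock, and write at an explicit position so the channel's own position is never involved. The class and method names are illustrative:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class PositionedAppender {
        private final FileChannel channel;

        public PositionedAppender(Path path) throws IOException {
            // Deliberately NOT using APPEND, so positional writes can be used.
            channel = FileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        }

        /** Appends a line and returns the file offset at which it starts. */
        public synchronized long appendLine(String line) throws IOException {
            long start = channel.size();        // current end of file
            ByteBuffer buf = ByteBuffer.wrap(
                    (line + "\n").getBytes(StandardCharsets.UTF_8));
            long pos = start;
            while (buf.hasRemaining()) {
                pos += channel.write(buf, pos); // positional write; channel position untouched
            }
            return start;                       // seek() here later to re-read the line
        }
    }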

Is Opening Java FileOutputStream efficient?

I'm writing a singleton logger for my program right now, and I was wondering whether it would be better to open and close it every time I log something, or to open the stream at the creation of the singleton and close it at the termination of the program? And if I were to do that, how would I close it at termination?
The main advantage of opening the file once is performance. You save yourself the penalty of an open call each time, plus the seek to the end of the file for appending; this gets worse if the file is big (and some logs tend to be).
The cons are:
You might not be able to read the last log line immediately, if there is some buffering in the writer (delayed writes). However, this can be fixed by flushing after each write (you might lose some performance, but this is not usually relevant).
You cannot simultaneously write to the same log from different processes. But you probably don't need this - and if you need it, the open-and-close solution still needs to deal with concurrency.
Some external log processing (typically, log rotation with renaming) becomes problematic. To allow for this, you might need to implement some signalling that closes and reopens the file.
Typically, the advantage outweighs the cons, so the general rule is to keep the log file open. But that depends on the scenario.
(As other answers point out, normally you'd prefer to use some standard logging library instead of implementing this on your own. But it's instructive to give it a try, or at least to think of all the issues involved).
Do not close it, just flush; this is what Log4j's FileAppender does by default.
You should open once (and close once). If you do nothing, Java will close it for you. You may prefer to explicitly override Object.finalize().
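To make the open-once/close-once lifecycle concrete, here is one possible sketch using an enum singleton and a JVM shutdown hook for the close at termination (a shutdown hook is one alternative to finalize(); app.log is a hypothetical file name):

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public enum SingletonLogger {
        INSTANCE;

        private final BufferedWriter out;

        SingletonLogger() {
            try {
                out = Files.newBufferedWriter(Paths.get("app.log"),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Close the stream once, at JVM termination.
            Runtime.getRuntime().addShutdownHook(new Thread(this::close));
        }

        public synchronized void log(String message) {
            try {
                out.write(message);
                out.newLine();
                out.flush();   // flush each write so the last line is readable
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        private synchronized void close() {
            try { out.close(); } catch (IOException ignored) { }
        }
    }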

Java: A FileInputStream that blocks in read() while other thread downloads remainder of file?

I have an FFmpeg-based video-playing app which is able to play content from any arbitrary InputStream.
It is important that the app is able to play a video file which is in the process of being downloaded. What I seem to need for this is a special kind of FileInputStream that will (a) share file access with the downloading thread, and (b) if it reaches the end of the downloaded portion, will quietly block until more content becomes available.
(a) seems easy enough thanks to RandomAccessFile, but I'm a bit puzzled about (b). I can probably hack something up that will work, but I am wondering if there's a standard approach to implementing this. Thinking about it in detail gives me a feeling that I may be missing something obvious.
Any thoughts? How would you guys do this?
If you can push the data not only into the file but also into an OutputStream (or perhaps write simultaneously to both a FileOutputStream and a shared PipedOutputStream), this would be the easiest solution:
Use PipedOutputStream and PipedInputStream. This will allow you to implement both (a) and (b); however, you will need to somehow implement video buffering on the viewer side.
Basically your downloader thread will write every bit of data it gets to the PipedOutputStream. The write() method does not block while there is room in the pipe's internal buffer (it blocks once that buffer fills up).
Your viewer thread will simply read() from the PipedInputStream; as the API says: "This method blocks until input data is available, the end of the stream is detected, or an exception is thrown."
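A minimal sketch of the piped-stream wiring; the downloader loop and the player call are only indicated in comments, since they depend on your networking and FFmpeg code:

    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;

    public class PipedDemo {
        public static void main(String[] args) throws IOException {
            PipedOutputStream toPlayer = new PipedOutputStream();
            // 1 MiB pipe buffer; the downloader blocks only when it gets this far ahead.
            PipedInputStream fromDownloader = new PipedInputStream(toPlayer, 1 << 20);

            Thread downloader = new Thread(() -> {
                try {
                    byte[] chunk = new byte[8192];
                    // for each chunk of n bytes read from the network:
                    //     fileOut.write(chunk, 0, n);    // persist to disk as before
                    //     toPlayer.write(chunk, 0, n);   // and feed the player
                    toPlayer.close();                     // signals EOF to the reader
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            downloader.start();

            // The player thread just reads; read() blocks until data arrives or EOF.
            // ffmpegPlayer.play(fromDownloader);   // hypothetical consumer
        }
    }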
You have to poll the length of the file. There is no way to block waiting for the length of the file to change using the file alone. You can busy poll, or poll every 10 or 100 ms.
If the writer and reader are in the same process, you can use locking/synchronized blocks to notify the reader when more data has been added.
With multiple processes, you could use a socket to either send the data, or at least notify when the length has changed allowing the reader to block.
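For the same-process polling approach, a sketch of an InputStream that retries at EOF until a (hypothetical) flag says the download is complete:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.RandomAccessFile;

    /** An InputStream over a growing file: at EOF it polls until more bytes appear. */
    public class TailingInputStream extends InputStream {
        private final RandomAccessFile file;
        private volatile boolean downloadFinished = false; // set by the downloader

        public TailingInputStream(String path) throws IOException {
            this.file = new RandomAccessFile(path, "r");
        }

        public void markDownloadFinished() { downloadFinished = true; }

        @Override
        public int read() throws IOException {
            while (true) {
                int b = file.read();
                if (b != -1) return b;
                if (downloadFinished) return -1;    // real end of stream
                try {
                    Thread.sleep(50);               // poll for new data every 50 ms
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted", e);
                }
            }
        }
    }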
In case you do not control the download process and want to play just ANY file that is being downloaded (even by some other downloader), you can watch the directory for changes, as described in the Java tutorial "Watching a Directory for Changes".
It needs to be mentioned that this method is cross-platform and cross-filesystem. Here's a quote from the same article:
Most file system implementations have native support for file change notification. The Watch Service API takes advantage of this support where available. However, when a file system does not support this mechanism, the Watch Service will poll the file system, waiting for events.
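A bare-bones sketch of watching a (hypothetical) downloads directory for modifications with the WatchService API:

    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardWatchEventKinds;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;

    public class WatchDemo {
        public static void main(String[] args) throws Exception {
            Path dir = Paths.get("downloads");               // hypothetical directory
            try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
                dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
                while (true) {
                    WatchKey key = watcher.take();           // blocks until a change occurs
                    for (WatchEvent<?> event : key.pollEvents()) {
                        Path changed = (Path) event.context();
                        System.out.println(changed + " grew; wake the reader");
                    }
                    if (!key.reset()) break;                 // directory no longer watchable
                }
            }
        }
    }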
I believe there is no real answer to this question. I've got something which works, but it looks like inelegant hacking to me. Perhaps sometimes that's inevitable.
