FileInput/OutputStream versus FileChannels -- which gives better performance

FileInput/OutputStream versus FileChannels -- which gives better performance - java

I am writing a program that has to copy a sizeable, but not huge amount of data from folder to folder (in the range of several dozen photos at once). Originally I was using java.io.FileOutputStream to simply read to buffer and write out, but then I heard about potential performance increases using java.nio.FileChannel.
I don't have the resources to run a serious, controlled test with the data I have, but there seems to be no consensus on what the advantages of each are (other than FileChannel being thread safe). Some users report FileChannel being great for smaller files, others report huge speed increases with larger files.
I am wondering if anyone knows exactly what the intent of creating FileChannel was in the first place: was it designed for better performance? In what cases? And is there a definitive performance increase for general kinds of data, or are the differences I should expect to see trivial because I am not working with data that is specialized enough?
EDIT: Assume my data does not need to be thread safe.

FileChannel.transferFrom/To should be faster than IO stream for file copying.
Or you can simply use Java 7's java.nio.file.Files.copy(source, target). That should be as fast as it can get.
However, in the end, performance won't be noticeably different - hard disk speed is the bottleneck.
FileChannel is not non-blocking, and it is not selectable. Not sure if they are going to add these features in future. Java 7 has AsynchronousFileChannel though.

Input and Output Streams assume a stream styled access to the file or resource. There are a few extra items which help (array reads) but the basic idea is that of a stream where you read in one or more characters at a time (possibly blocking until you have more characters available).
Channels are the means to copy information into Buffers. This provides a lower level of access to input and output routines. With thoughtful buffer sizing, the speed-ups can be impressive. Structuring your code around buffers can reduce the time spent in a read loop (also increasing performance). Finally, while it is possible to do pre-checking of input stream state in an attempt to avoid blocking, Channels and Buffers allow operations to perform in a non-blocking manner (even in the worst conditions).

Have you take a look at commons-io?
FileUtils.copyFileToDirectory(srcFile, destDir);

Related

can reading from disk from difference threads optimize program?

I am wondering is there a way to optimize reading from disk in java. I mean for example I want to print the contains of all text files in some directory, but after uppercase them. I can create another thread do uppercase them, but can I optimize reading by adding another(thread(s)) to read files too? I mean 2,3 or more threads to read difference files from disk. Is there some optimization for doing this or not? I hope that I explain the problem clearly.

I want to print the contains of all text files
This is most likely your bottleneck. If not, you should focus on what you bottleneck is as optimising anything else is likely to complicate your code for no benefit.
I can create another thread do uppercase them,
You can, though passing the work to another thread could be more expensive than making it uppercase depending on how your do this.
can I optimize reading by adding another(thread(s)) to read files too?
Possibly. How many disks do you have. If you have one disk, it can usually only do one thing at a time.
I mean 2,3 or more threads to read difference files from disk.
Most desktop drives can only do one operation at a time.
Is there some optimization for doing this or not?
Yes, but as I said, until you know what your bottleneck is, it's hard to jump to a solution.

I can create another thread do uppercase them
That's actually going in the right direction, but simply making all letters uppercase doesn't take enough time to really matter unless you're processing really large chunks of the file.
Because the standard single-threaded model of read-then-process means you're either reading data or processing it, when you could be doing both at the same time.
For example, you could be creating a series of highly compressed (say, JPEG2000 because it's so CPU intensive) images from a large video stream file. You could have one thread reading frames from the stream, placing them into a queue to process, and then have N threads each processing a frame into an image.
You'd tune the number of threads reading data and the number of threads processing data to keep both your disks and CPUs maximally busy without excess contention.
There are some cases where you can use multiple threads to read from a single file to get better performance. But you need a system designed from the ground up to do that. You need lots of disks (less so if they're SSDs), a pretty substantial IO infrastructure along with a system that has a lot of IO bandwidth, and then you need a file system that can handle multiple simultaneous access to a single file. Then the code you have to write to get better performance from reading using more than one thread has to match things like the physical layout of your files on disk.
That works best if you're doing lots of random reads from a file spread over multiple devices. Like a large, high-powered database server.
For example, lets say I have a huge data file spread over four or five disks (or even RAID arrays), with the file spread out over the disks in 64KB chunks. A handful of threads doing 64KB reads would be ideal to read or write such a file in a random-access mode. Let's say everything is really fast and you can read or write 1 GB/sec from such a file.
But if you turn around and just try to copy that data in a stream, you can still use multiple threads to get maximum performance - say 1 GB/sec - but if you just used a single thread to do read() calls in 1 MB chunks you'd probably get 950 MB/sec - or 95% or maximum multithreaded read performance.
I've actually benchmarked such systems and most of the time, multithreaded IO isn't worth the trouble unless you've invested a lot of money in your hardware and software (opensource file systems tend not to do this very well - you need to get into the realm of IBM's GPFS and Oracle's (nee LSC's then Sun's) QFS) and you know exactly what you're doing when you set it up.

Using NIO, do I need to care about R/W on block-boundaries?

Background
A lot of work has gone into optimizing database design, especially in the realm of the most optimal ways to read and write data from disks (both spindle and SSD).
The knowledge that has come out of the work suggests that reading and writing on block boundaries, matching the block sizes of the filesystem you are running on, is the most optimal approach.
Question
Say I am operating in a relatively low-memory environment and want to use a small 32MB memory-mapped file to read and write the contents of a huge 500GB file.
If I were using Java's NIO mechanisms, specifically the MappedByteBuffer (Java's memory-mapped file mechanism), would I need to take care to execute READ and WRITE operations on block boundaries (e.g. 4KB) into memory before pairing out the data I needed, or can I just issue R/W ops at any location I want and allow the operating system, VM paging logic, filesystem and storage firmware handle the optimization of the operations and culling of additional block data I didn't need as-needed?
Additional Detail
The reason for the question is in database design, I see this obsessive focus on block-optimization to the point that there doesn't seem to exist a world where you would ever just read and write data without the concept of a block.
What confuses me is that the filesystem is the one enforcing the block units of operation, why would my higher level app need to worry about this then? If I want the 17,631 bytes at offset 71, can't I just grab them and read them in, or is it really faster for me to figure out that
the read operation starts at block 0 and falls across the boundaries of blocks 0, 1 and 2... read all of those 3 blocks in to an internal byte[], then cull out the 17,631 bytes I wanted in the first place?
If the literature on DB design wasn't so religious about this block idea, the question would have never come up in my mind, but because it is, I am wondering if I am missing a critical detail here WRT filesystems and optimal block device I/O.
Thank you for reading.

I think part of the reason databases have awareness of a block size (which may not be exactly the same as the fs block size, but of course should align) is not just to perform block-aligned I/O, but also to manage how the disk data is cached in memory rather than just relying on the OS caching. Some databases bypass the OS filesystem cache completely, in fact. Having the database manage the cache sometimes allows greater intelligence as to how that cache is utilised, that the OS might not be able to provide.
An rdbms will typically take account of the number of blocks that could be read/written during a query in order to compare different execution plans: and the possibilities for all the data to be fetched from the same block can be a useful optimisation to take note of.
Most databases I'm familiar with have the concept of a block cache/buffer where some portion of the working set of the database lives. Managing a cache entirely made up of arbitrary extents could potentially be quite a bit harder to manage. Also many databases actually arrange their stored data as a sequence of blocks, so the I/O pattern grows out of that. Of course, this might simply be a legacy of databases originally written for platforms that didn't have rich OS caching facilities...
Trying to conclude this ramble with some sort of answer to your question... my feeling would be that reading from arbitrary extents within the mapped file and letting the OS deal with the extra slop should be fine. Performance-wise, it's probably more important to try and let the OS do read-ahead: e.g. using the "advise" calls so the OS can start reading the next extent from disk while you process the current one. And, of course, a way to advise the OS to uncache extents you've finished with.

4KB blocks are important because it's typically the granularity of the MMU and hence the OS virtual memory manager. When items are frequently used together, it's important to design your database layout so that these items end up in the same page. This way, a page fault will page in all the items in the page.

What about buffering FileInputStream?

I have a piece of code that reads hell of a lot (hundreds of thousand) of relatively small files (couple of KB) from the local file system in a loop. For each file there is a java.io.FileInputStream created to read the content. The process its very slow and take ages.
Do you think that wrapping the FIS into java.io.BufferedInputStream would make a significant difference?

If you aren't already using a byte[] buffer of a decent size in the read/write loop (the latest implementation of BufferedInputStream uses 8KB), then it will certainly make difference. Give it a try yourself. Don't forget to make any OutputStream a BufferedOutputStream as well.
But if you already have buffered it using a byte[] and/or it after all makes only little difference, then you've hit the harddisk and I/O controller speed as the bottleneck.

I very much doubt whether that will make any difference.
Your fundamental problem is the hundreds of throusands of tiny files. Reading those is going to make the disk thrash and take forever, no matter how you do it, you'll spend 99,9% of the time waiting on mechanical movement inside the harddisk.
There are two ways to fix this:
Save your data on an SSD - they have much lower (as in five orders of magnitude less) latency.
Rearrange your data into few large files and read those sequentially

That depends on how you're reading the data. If you're reading from the FileInputStream in a very inefficient way (for example, calling read() byte-by-byte), then using a BufferedInputStream could improve things dramatically. But if you're already using a reasonable-sized buffer with FileInputStream, switching to a BufferedInputStream won't matter.
Since you're talking a large number of very small files, there's a strong possibility that a lot of the delay is due to directory operations (open, close), not the actual reading of bytes from the files.

Buffer a large file; BufferedInputStream limited to 2gb; Arrays limited to 2^31 bytes

I am sequentially processing a large file and I'd like to keep a large chunk of it in memory, 16gb ram available on a 64 bit system.
A quick and dirty way is to do this, is simply wrap the input stream into a buffered input stream, unfortunately, this only gives me a 2gb buffer. I'd like to have more of it in memory, what alternatives do I have?

How about letting the OS deal with the buffering of the file? Have you checked what the performance impact of not copying the whole file into JVMs memory is?
EDIT: You could then use either RandomAccessFile or the FileChannel to efficiently read the necessary parts of the file into the JVMs memory.

Have you considered the MappedByteBuffer in java.nio? It's over my head but maybe it is what you are looking for.

I doubt that buffering more than 2gb at a time is going to be a huge win anyway. Depending on the amount of processing you're doing, you might be able to read in nearly as fast as you process. To speed it up, you might try using a two-threaded producer-consumer model (one thread reads the file and hands the data off to the other thread for processing).

The OS is going to cache as much of the file as it can, so trying to outsmart the cache manager probably isn't going to get you very much.
From a performance perspective, you will be much better served by keeping the bytes outside the JVM (transferring huge chunks of data between the OS and JVM is relatively slow). You can achieve this goal by using a MappedByteBuffer backed by a direct memory block.
Here's a pertinent how-to type of article: article

I think there are 64 bit JVMs that will support nonstandard limits.
You might try buffering chunks.

How to avoid OutOfMemoryError when using Bytebuffers and NIO?

I'm using ByteBuffers and FileChannels to write binary data to a file. When doing that for big files or successively for multiple files, I get an OutOfMemoryError exception.
I've read elsewhere that using Bytebuffers with NIO is broken and should be avoided. Does any of you already faced this kind of problem and found a solution to efficiently save large amounts of binary data in a file in java?
Is the jvm option -XX:MaxDirectMemorySize the way to go?

I would say don't create a huge ByteBuffer that contains ALL of the data at once. Create a much smaller ByteBuffer, fill it with data, then write this data to the FileChannel. Then reset the ByteBuffer and continue until all the data is written.

Check out Java's Mapped Byte Buffers, also known as 'direct buffers'. Basically, this mechanism uses the OS's virtual memory paging system to 'map' your buffer directly to disk. The OS will manage moving the bytes to/from disk and memory auto-magically, very quickly, and you won't have to worry about changing your virtual machine options. This will also allow you to take advantage of NIO's improved performance over traditional java stream-based i/o, without any weird hacks.
The only two catches that I can think of are:
On 32-bit system, you are limited to just under 4GB total for all mapped byte buffers. (That is actually a limit for my application, and I now run on 64-bit architectures.)
Implementation is JVM specific and not a requirement. I use Sun's JVM and there are no problems, but YMMV.
Kirk Pepperdine (a somewhat famous Java performance guru) is involved with a website, www.JavaPerformanceTuning.com, that has some more MBB details: NIO Performance Tips

If you access files in a random fashion (read here, skip, write there, move back) then you have a problem ;-)
But if you only write big files, you should seriously consider using streams. java.io.FileOutputStream can be used directly to write file byte after byte or wrapped in any other stream (i.e. DataOutputStream, ObjectOutputStream) for convenience of writing floats, ints, Strings or even serializeable objects. Similar classes exist for reading files.
Streams offer you convenience of manipulating arbitrarily large files in (almost) arbitrarily small memory. They are preferred way of accessing file system in vast majority of cases.

Using the transferFrom method should help with this, assuming you write to the channel incrementally and not all at once as previous answers also point out.

This can depend on the particular JDK vendor and version.
There is a bug in GC in some Sun JVMs. Shortages of direct memory will not trigger a GC in the main heap, but the direct memory is pinned down by garbage direct ByteBuffers in the main heap. If the main heap is mostly empty they many not be collected for a long time.
This can burn you even if you aren't using direct buffers on your own, because the JVM may be creating direct buffers on your behalf. For instance, writing a non-direct ByteBuffer to a SocketChannel creates a direct buffer under the covers to use for the actual I/O operation.
The workaround is to use a small number of direct buffers yourself, and keep them around for reuse.

The previous two responses seem pretty reasonable. As for whether the command line switch will work, it depends how quickly your memory usage hits the limit. If you don't have enough ram and virtual memory available to at least triple the memory available, then you will need to use one of the alternate suggestions given.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.