Background:
I have a Java application which does intensive IO on quite large
memory mapped files (> 500 MB). The program reads data, writes data,
and sometimes does both.
All read/write functions have similar computational complexity.
I benchmarked the IO layer of the program and noticed strange
performance characteristics of memory mapped files:
It performs 90k reads per second (read 1KB every iteration at random position)
It performs 38k writes per second (write 1KB every iteration sequentially)
It performs 43k writes per second (write 4 bytes every iteration at random position)
It performs only 9k combined read/write operations per second (read 12 bytes then write 1KB every iteration, at a random position)
The program runs on 64-bit JDK 1.7, Linux 3.4.
The machine is an ordinary Intel PC with a CPU providing 8 hardware threads and 4 GB of physical memory. Only 1 GB was assigned to the JVM heap when conducting the benchmark.
If more details are needed, here is the benchmark code: https://github.com/HouzuoGuo/Aurinko2/blob/master/src/test/scala/storage/Benchmark.scala
And here is the implementation of the above read, write, read/write functions: https://github.com/HouzuoGuo/Aurinko2/blob/master/src/main/scala/aurinko2/storage/Collection.scala
So my questions are:
Given fixed file size and memory size, what factors affect memory mapped file random read performance?
Given fixed file size and memory size, what factors affect memory mapped file random write performance?
How do I explain the benchmark result of the combined read/write operation? (I was expecting it to perform over 20k iterations per second.)
Thank you.
Memory-mapped file performance depends on disk performance, file system type, free memory available for the file system cache, and read/write block size. The page size on Linux is 4 KB, so you should expect the best performance with 4 KB reads/writes. An access at a random position causes a page fault if the page is not mapped, which pulls in a fresh page read. Usually, you want a memory-mapped file when you want to see the file as one memory array (or a ByteBuffer in Java).
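To make the random-access pattern concrete, here is a minimal sketch in plain Java (not the Scala from the linked benchmark) of the combined read-then-write case; the file name, mapped size and iteration count are invented for the example:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.ThreadLocalRandom;

public class MappedRandomAccess {
    public static void main(String[] args) throws Exception {
        // "data.bin" and the 512 MB size are placeholders for the benchmark's real file.
        try (RandomAccessFile raf = new RandomAccessFile("data.bin", "rw");
             FileChannel ch = raf.getChannel()) {
            long fileSize = 512L * 1024 * 1024;
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);

            byte[] header = new byte[12];    // small read, as in the combined benchmark
            byte[] payload = new byte[1024]; // 1 KB write

            for (int i = 0; i < 100_000; i++) {
                // Random position; touching an unmapped page here costs a page fault.
                int pos = ThreadLocalRandom.current().nextInt((int) fileSize - payload.length);
                map.position(pos);
                map.get(header);     // read 12 bytes
                map.position(pos);
                map.put(payload);    // then write 1 KB at the same position
            }
        }
    }
}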
I have several sorted binary files which store information in some variable length format (meaning one of the segments contains the length of the variable length segment).
I need to merge them into one sorted file. I can do so with BufferedInputStream successfully. Nevertheless, it takes a very long time on a mechanical disk. On a machine with an SSD it's much faster, as expected.
What bothers me is the fact that even on SSD, the CPU utilization is very low, and makes me suspect there's a way to improve the speed. I assume this happens because most of the time the CPU waits on the disk. I tried to increase the buffers to hundreds of MBs to no avail.
I have tried to use a memory-mapped buffer and a file channel, but it didn't improve the runtime.
Any ideas?
Edit: Using MappedByteBuffer failed because the merged file size is over 2 GB, which is the size limitation of a MappedByteBuffer. But even before having merged the smaller files into GB-sized files, I didn't notice an improvement in speed or CPU utilization.
Thanks
Perhaps you can compress the files better, or is that not an option? If the bottleneck is I/O, then reducing the amount of data is a good angle of attack.
http://www.oracle.com/technetwork/articles/java/compress-1565076.html
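For instance, the standard java.util.zip streams drop straight into a stream-based merge pipeline, since reading a compressed file back only means wrapping the input stream in a GZIPInputStream. The following is only a sketch with placeholder file arguments:

import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GzipCopy {
    // Compress one file; the merge logic itself stays stream-based.
    public static void compress(File in, File out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        try (InputStream src = new BufferedInputStream(new FileInputStream(in));
             OutputStream dst = new GZIPOutputStream(
                     new BufferedOutputStream(new FileOutputStream(out)))) {
            int n;
            while ((n = src.read(buf)) != -1) {
                dst.write(buf, 0, n);
            }
        }
    }
}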
I have a file of size 5 GB. I would like to memory-map that file in Java. I understand one memory-mapped portion cannot be > 2 GB.
My question is: is it possible to create 5 x 1 GB memory-mapped portions to map the complete 5 GB file and access them in the same Java application?
No, it's not possible.
There are two issues here:
First of all, a 32-bit machine (or a 32-bit OS on a 64-bit machine) only has an address space of 4 GB (32 bits), so you can't map a 5 GB file all at the same time even from C.
The other issue is the limitation of Java's implementation of memory mapping, which is handled via a MappedByteBuffer. Even though the method FileChannel.map() takes longs for offset and size, it returns a MappedByteBuffer which can only use ints for its limit and position. This means that even on a 64-bit machine and OS, where you can map the whole 5 GB file as a single area from C, in Java you will have to manually create a series of mapped regions, each no larger than 2 GB. Still, you will at least be able to map the 5 GB in chunks, while on a 32-bit OS you can't have them all mapped at the same time.
And given that in Java unmapping a file region requires some ugly tricks, it's not convenient (though possible) to map and unmap regions as needed in order to keep them within the limit. You can have a look at the source code of Lucene or Cassandra; as far as I remember, they also use libraries with native code when possible in order to handle mapping and unmapping in a more efficient way than pure Java allows.
To make things even more complicated, 2 GB is the theoretical limit, which may not be reachable on a 32-bit OS due to memory fragmentation. Some OSes may also be configured with a 3-1 memory split, which leaves just 1 GB of address space available to user-space programs, with the rest going to the OS address space. So, in practice, the chunks you should try mapping should be much smaller than 2 GB; you are more likely to succeed in mapping 4-6 chunks of 250 MB than a single 2 GB chunk.
Please see MappedByteBuffer and FileChannel.map() javadocs.
I'm not an expert in Java NIO, so I'm not sure if the byte buffer handles chunks automatically or if you have to use multiple MappedByteBuffers. Feel free to code a simple class to test and play around with your huge file.
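To illustrate the manual chunking described above, here is a rough sketch that maps a file as a list of read-only regions; the path, the chunk size and the helper names are invented for the example:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class ChunkedMapping {
    // Map a large file as a series of regions, each no bigger than chunkSize.
    // "bigfile.dat" and a 1 GB chunk size are just example values.
    public static List<MappedByteBuffer> mapInChunks(String path, long chunkSize) throws Exception {
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            long fileSize = ch.size();
            for (long offset = 0; offset < fileSize; offset += chunkSize) {
                long len = Math.min(chunkSize, fileSize - offset);
                chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, offset, len));
            }
        }
        return chunks;
    }

    // Reading the byte at absolute position p means picking the right chunk first.
    static byte byteAt(List<MappedByteBuffer> chunks, long p, long chunkSize) {
        return chunks.get((int) (p / chunkSize)).get((int) (p % chunkSize));
    }
}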
I have a program in Java that creates a log file about 1 KB in size. If I run a test that deletes the old log, creates a new log, and then saves it, repeated a million times, and the size of the file grows over time (up to a few MB), will I risk damage to my SSD? Is there a size limit for the log file that would avoid this risk, or can anyone help me understand the mechanics of the risk?
In the case of constantly opening/closing the same file with a gradual file-size increase, there are two protection mechanisms, at the file system and SSD levels, that will prevent early disk failure.
First, on every file delete, the file system will issue a Trim (aka Discard, aka logical erase) command to the SSD. The Trim address range will cover the entire size of the deleted file. Trim greatly helps the SSD reclaim free space for new data. Using Trim in combination with writes when accessing the same data range is the best operational mode for an SSD in terms of preserving its endurance. Just make sure that your OS has Trim enabled (usually it is by default); all modern SSDs should support it as well. An important note: Trim is a logical erase; it will not trigger an immediate erase of the physical media. The physical erase happens later, indirectly, as part of the SSD's internal garbage collection.
Second, when accessing the same file, the file system will most likely issue writes to the SSD at the same addresses; just the amount of written data will grow as the file size grows. Such a pattern is known as hot-range access. It is a nasty pattern for an SSD in terms of endurance: the SSD has to allocate free resources (physical pages) on every file write, but the lifetime of the data is very short because it is deleted almost immediately. Overall, the amount of unique data on the physical media is very low, but the amount of allocated and processed resources (physical pages) is huge. Modern SSDs have protection against hot-range access by using physical media units in a round-robin manner, which evens out the wear.
I advise monitoring the SSD's SMART health data (the life-time-left parameter), for example by using https://www.smartmontools.org/ or software provided by the SSD vendor. It will help you see how your access pattern is affecting endurance.
Like with any file, if the disk doesn't contain enough space to write to a file, the OS (or Java) won't allow the file to be written until space is cleared. The only way you can "screw up" a disk in this manner is if you mess around with addresses at the kernel level.
The VisualVM screenshot was taken while I ran an I/O-intensive JVM program (written in Scala); the heap size was 4 GB and only 2 GB were in use. The JVM program uses memory-mapped files.
What does "mapped buffer pool" and "direct buffer pool" mean?
Those pools seem to be very full. Since the JVM program uses memory-mapped files, would I see increased performance if the pools were larger? If so, how do I increase their size?
All mapped files together are about 1.1 GB in size.
Direct Buffer
A direct buffer is a chunk of memory typically used to interface Java to the OS I/O subsystems, for example as a place where the OS writes data as it receives it from a socket or disk, and from which Java can read directly.
Sharing the buffer with the OS is much more efficient than the original approach of copying data from the OS into Java's memory model, which makes the data subject to garbage collection and to inefficiencies such as re-copying as it migrates from the eden -> survivor -> tenured generations.
In the screenshot you have just one direct buffer of 16 KB. Java will grow this pool as required, so the fact that the blue area is at the top of the block is merely a statement that all buffer memory allocated so far is in use. I don't see this as an issue.
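To make the direct buffer concrete, here is a minimal sketch that allocates one and reads a file through it; the file name and buffer size are arbitrary:

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectBufferRead {
    public static void main(String[] args) throws Exception {
        // A direct buffer lives outside the Java heap, so the OS can fill it
        // without an extra copy into heap memory. "input.dat" is a placeholder.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024); // shows up in the "direct" pool
        try (FileChannel ch = FileChannel.open(Paths.get("input.dat"), StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();
                // ... consume buf here ...
                buf.clear();
            }
        }
    }
}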
Mapped buffer pool
The mapped buffer pool is all the memory used by Java for MappedByteBuffers, i.e. file regions mapped into memory via FileChannel.map().
Each mapped buffer is shared with the OS (similar to the direct buffer, with all the efficiency benefits). The memory is essentially an in-RAM window onto a portion of the file. Depending on the mode (read, write or both), Java can read and/or modify the file's contents directly, and the OS can directly supply data to the buffer or flush modified data to disk.
Additional advantages of this approach are that the OS can flush this buffer directly to disk as it sees fit, for example when the OS is shutting down, and that the OS can lock that portion of the file against other processes on the computer.
The screenshot indicates you have about 680 MB in use by 12 mapped buffers. Again, Java will grow this if Scala needs more (and the JVM can get additional memory from the OS), so the fact that all 680 MB is in use is not important. Given their size, it certainly seems to me that the program has already been optimized to use these buffers effectively.
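If you want these pool figures without VisualVM, they are also exposed through the standard BufferPoolMXBean (Java 7+); a small sketch:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class BufferPools {
    public static void main(String[] args) {
        // The same "direct" and "mapped" pools that VisualVM plots are exposed as MXBeans.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-8s count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}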
Increasing the size of the mapped buffer pool
Java allocates the memory for these buffers outside the garbage collection space. This means the normal heap-size parameters such as -Xmx are not important here.
The size of a mapped buffer is set with the FileChannel.map method; changing this would entail changing your Scala program.
Once the buffer has reached a threshold size, of the order of tens to hundreds of KB, increasing the FileChannel buffer size may or may not increase performance - it depends on how the program uses the buffer:
No: If the file is read precisely once from end to end, almost all the time is spent either waiting for the disk or in the processing algorithm.
Maybe: If, however, the algorithm frequently scans the file, revisiting portions many times, increasing the size might improve performance:
If modifying or writing the file, a larger buffer can consolidate more writes into a single flush (see the sketch after this list).
If reading the file, the operating system will likely have cached the file already (the disk cache), so any gains are likely to be marginal. Perversely, increasing the size of the JVM might decrease performance by shrinking the effective disk cache.
In any case the application would have to be specifically coded to get any benefits, for example by implementing its own logical record pointer onto the cache.
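As a rough illustration of the write-consolidation point in the list above, here is a sketch (the file name and sizes are invented) of many small writes landing in one mapped window, followed by a single explicit flush:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedWrite {
    public static void main(String[] args) throws Exception {
        // "out.dat" and the 64 MB window are example values only.
        try (RandomAccessFile raf = new RandomAccessFile("out.dat", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64 * 1024 * 1024);
            for (int i = 0; i < 1_000_000; i++) {
                map.putLong((long) i);   // many small writes land in the mapped window...
            }
            map.force();                 // ...and are flushed to disk in one go
        }
    }
}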
Try profiling the application and look for I/O waits (JProfiler and YourKit are good at this). It may be that file I/O is not actually a problem - don't be a victim of premature optimization. If I/O waits are a significant portion of the total elapsed time, then it might be worth trying a larger buffer size.
Further information
https://blogs.oracle.com/alanb/entry/monitoring_direct_buffers
Also be aware of a reported JVM bug: FileChannel is not good at releasing memory. It's detailed in Prevent OutOfMemory when using java.nio.MappedByteBuffer.
I have arrays like
byte[] b = new byte[10];
byte[] b1 = new byte[1024*1024];
I populate them with some values. Say,
for (int i = 0; i < 10; i++) {
    b[i] = 1;
}
for (int i = 0; i < 1024 * 1024; i++) {
    b1[i] = 1;
}
Then I write them to a RandomAccessFile and read them back from that file into the same arrays using,
randomAccessFile.write(arrayName);
and
randomAccessFile.read(arrayName);
When I calculate the throughput for both of these arrays (using the time measured for the file write and read) of varying sizes (10 bytes and 1 MB), the throughput appears to be higher for the 1 MB array.
Sample Output:
Throughput of 10kb array: 0.1 Mb/sec.
Throughput of 1Mb array: 1000.0 Mb/sec.
Why does this happen? I have an Intel i7 quad-core processor. Could my hardware configuration be responsible for this? If not, what could be the possible reason?
The reason for the big difference is the overhead involved in I/O that occurs no matter what the size of the data being transferred - it's like the flag fall of a taxi ride. These overheads are not restricted to Java and include many O/S operations, such as:
Finding the file on disk
Checking O/S permissions on the file
Opening the file for I/O
Closing the file
Updating file info in the file system
Many other tasks
Also, disk I/O is performed in pages (the size depends on the O/S, but is usually 2K), so I/O of 1 byte probably costs the same as I/O of 2048 bytes: a slightly fairer comparison would be a 2048-byte array with a 1 MB array.
If you are using buffered I/O, that can further speed up larger I/O tasks.
Finally, what you report as "10Kb" is in fact just 10 bytes, so your calculation is possibly incorrect.
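For what it's worth, a fairer measurement amortises the fixed open/seek overhead over many transfers. The sketch below (file name, sizes and repeat counts are arbitrary) times only the write loop:

import java.io.IOException;
import java.io.RandomAccessFile;

public class ThroughputBench {
    // Time only the transfer, amortising the fixed per-call overhead over many writes.
    static double mbPerSec(int arraySize, int repeats) throws IOException {
        byte[] data = new byte[arraySize];
        try (RandomAccessFile raf = new RandomAccessFile("bench.tmp", "rw")) {
            long start = System.nanoTime();
            for (int i = 0; i < repeats; i++) {
                raf.seek(0);
                raf.write(data);
            }
            long elapsed = System.nanoTime() - start;
            double mb = (double) arraySize * repeats / (1024.0 * 1024.0);
            return mb / (elapsed / 1_000_000_000.0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("10 B : " + mbPerSec(10, 100_000) + " MB/s");
        System.out.println("1 MB : " + mbPerSec(1024 * 1024, 100) + " MB/s");
    }
}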