From the Java docs,
The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious
Also from the Java docs,
MappedByteBuffer: A direct byte buffer whose content is a memory-mapped region of a file.
and
A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected.
I believe that off-heap memory allocations cannot be reclaimed by the GC. These statements therefore make me curious about the memory management of a MappedByteBuffer: what happens if the direct ByteBuffer backing a MappedByteBuffer sits outside the normal heap?
Related
I am reading the HBase docs and came across the Off-heap read path.
As far as I understand, off-heap is a place in memory where Java stores bytes/objects outside the reach of the garbage collector. I also went searching for libraries that facilitate using off-heap memory and found Ehcache. However, I could not find any official docs from Oracle or about the JVM on this. So is this standard JVM functionality, or is it some kind of hack? If it is, what are the underlying classes and techniques used to do this?
You should look at ByteBuffer. From the Javadoc:
Direct vs. non-direct buffers
A byte buffer is either direct or non-direct. Given a direct byte
buffer, the Java virtual machine will make a best effort to perform
native I/O operations directly upon it. That is, it will attempt to
avoid copying the buffer's content to (or from) an intermediate buffer
before (or after) each invocation of one of the underlying operating
system's native I/O operations.
A direct byte buffer may be created by invoking the allocateDirect
factory method of this class. The buffers returned by this method
typically have somewhat higher allocation and deallocation costs than
non-direct buffers. The contents of direct buffers may reside outside
of the normal garbage-collected heap, and so their impact upon the
memory footprint of an application might not be obvious. It is
therefore recommended that direct buffers be allocated primarily for
large, long-lived buffers that are subject to the underlying system's
native I/O operations. In general it is best to allocate direct
buffers only when they yield a measurable gain in program
performance.
A direct byte buffer may also be created by mapping a region of a file
directly into memory. An implementation of the Java platform may
optionally support the creation of direct byte buffers from native
code via JNI. If an instance of one of these kinds of buffers refers
to an inaccessible region of memory then an attempt to access that
region will not change the buffer's content and will cause an
unspecified exception to be thrown either at the time of the access or
at some later time.
Whether a byte buffer is direct or non-direct may be determined by
invoking its isDirect method. This method is provided so that explicit
buffer management can be done in performance-critical code.
It's up to the JVM implementation how it handles direct ByteBuffers, but at least the OpenJDK JVM allocates the memory off-heap.
JEP 383: Foreign-Memory Access API (Second Incubator) is incubating in Java 15. This feature will make accessing off-heap memory standard by providing a public API.
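To see the distinction in practice, here is a minimal sketch using the classic NIO API (the 64 MB size is arbitrary):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Allocated outside the Java heap; on HotSpot its capacity counts
        // against -XX:MaxDirectMemorySize rather than -Xmx.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024);

        // A heap buffer, by contrast, is backed by an ordinary byte[].
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024 * 1024);

        System.out.println(direct.isDirect()); // true
        System.out.println(heap.isDirect());   // false

        // The native memory behind 'direct' is released only after the
        // buffer object itself becomes unreachable and is collected.
    }
}
```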
Is there a way to increase the allocated memory of an off-heap ByteBuffer once it has been created?
A direct byte buffer may also be created by mapping a region of a file directly into memory. An implementation of the Java platform may optionally support the creation of direct byte buffers from native code via JNI. If an instance of one of these kinds of buffers refers to an inaccessible region of memory then an attempt to access that region will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time.
The API has no provisions, but there might be a JVM that allows it via JNI.
I would say NO.
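If you outgrow a direct buffer, the usual workaround is to allocate a larger one and copy the contents across; a minimal sketch (the helper name grow is made up):

```java
import java.nio.ByteBuffer;

public class GrowDirectBuffer {
    // Hypothetical helper: "grows" a direct buffer by allocating a new,
    // larger one and copying the old contents into it. The old buffer's
    // native memory is reclaimed only once the GC collects the object.
    static ByteBuffer grow(ByteBuffer old, int newCapacity) {
        ByteBuffer bigger = ByteBuffer.allocateDirect(newCapacity);
        old.flip();        // switch the old buffer from writing to reading
        bigger.put(old);   // copy the existing contents
        return bigger;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        buf.putInt(42);
        buf = grow(buf, 2048);
        System.out.println(buf.capacity()); // 2048
    }
}
```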
Direct memory was introduced in Java 1.4. The new I/O (NIO) classes brought a new way of performing I/O based on channels and buffers. NIO added support for direct ByteBuffers, whose contents live in native memory rather than on the Java heap. This makes them significantly faster in some scenarios because they avoid copying data between the Java heap and the native heap.
I never understood why we use direct memory. Can someone give an example?
All system calls, such as reading and writing sockets and files, only use native memory. They can't use the heap. This means that while you can copy between the heap and native memory, avoiding this copy can improve efficiency.
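As an illustration, reading a file through a direct buffer lets the kernel fill the native memory directly, whereas a heap buffer forces the JVM to copy through a temporary native buffer first. A minimal sketch (the file path is illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectReadDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("/tmp/data.bin"),
                                               StandardOpenOption.READ)) {
            // The kernel can write into this native buffer directly,
            // skipping the extra heap <-> native copy.
            ByteBuffer buf = ByteBuffer.allocateDirect(8192);
            while (ch.read(buf) != -1) {
                buf.flip();
                // ... consume buf here ...
                buf.clear();
            }
        }
    }
}
```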
We use off-heap/native memory for storing most of our data which has a number of advantages.
It can be larger than the heap size.
It can be larger than main memory.
It can be shared between JVMs, i.e. one copy for multiple JVMs.
It can be persisted and retained across restarts of the JVM or even the machine.
It has little to no impact on GC pause times.
Depending on usage, it can be faster.
The reason it is not used more is that it is harder to make it both efficient and work like normal Java objects. For this reason, we have libraries such as Chronicle Map, which acts as a ConcurrentMap but uses off-heap memory, and Chronicle Queue, which is a journal, logger and persisted IPC between processes.
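Plain JDK memory-mapped files are one standard mechanism behind several of these properties (persistence, sharing between processes, no GC impact); libraries like Chronicle build on the same idea. A minimal sketch, with an illustrative file name:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedOffHeapDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("/tmp/shared.dat"),
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping lives outside the heap, survives JVM restarts
            // (it is backed by a file), and can be mapped by several
            // JVMs at once.
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(0, System.currentTimeMillis());
            buf.force(); // flush the modified page(s) to disk
        }
    }
}
```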
The JVM relies on the concept of garbage collection for reclaiming memory that is no longer used. This allows JVM language developers (e.g., of Java, Scala, etc.) not to have to worry about memory allocation and deallocation. You simply ask for memory and let the JVM worry about when it will be reclaimed, or garbage collected.
While this is extremely convenient, it comes with the added overhead of a separate thread, consuming CPU and having to go through the JVM heap constantly, reclaiming objects that are not reachable anymore. There are entire books written about the topic, but if you want to read a bit more about JVM garbage collection, there are a ton of references out there; this one is decent: https://dzone.com/articles/understanding-the-java-memory-model-and-the-garbag
Anyway, if you know your app is going to be doing massive amounts of copying and updating of objects and values, you can elect to handle those objects and their memory consumption yourself. So, regardless of how much churn there is in those objects, they will never be moved around in the heap and never be garbage collected, and thus won't impact garbage collection in the JVM. There's a bit more detail in this answer: https://stackoverflow.com/a/6091680/236528
From the Official Javadoc:
Direct vs. non-direct buffers
A byte buffer is either direct or non-direct. Given a direct byte
buffer, the Java virtual machine will make a best effort to perform
native I/O operations directly upon it. That is, it will attempt to
avoid copying the buffer's content to (or from) an intermediate buffer
before (or after) each invocation of one of the underlying operating
system's native I/O operations.
A direct byte buffer may be created by invoking the allocateDirect
factory method of this class. The buffers returned by this method
typically have somewhat higher allocation and deallocation costs than
non-direct buffers. The contents of direct buffers may reside
outside of the normal garbage-collected heap, and so their impact upon
the memory footprint of an application might not be obvious. It is
therefore recommended that direct buffers be allocated primarily for
large, long-lived buffers that are subject to the underlying system's
native I/O operations. In general it is best to allocate direct
buffers only when they yield a measurable gain in program
performance.
https://download.java.net/java/early_access/jdk11/docs/api/java.base/java/nio/ByteBuffer.html
I am trying to implement a proof-of-concept memory-aware scheduling functionality by extending an existing Java program. The program uses buffers in the form of byte[]. For my purpose, byte[] is problematic because
they are garbage collected
they are allocated upfront instead of lazily (the JVM seems to touch all pages it has allocated when creating the buffer)
they make the JVM allocate more and more memory which is not given back to the OS.
To achieve my goal, I would like buffers to be lazily allocated (pages allocated only when written to) and freeable on demand, similar to how it would happen in C++.
In addition, as much as possible, I would like to minimize the changes to the existing code-base.
I looked at nio.ByteBuffer and at the Unsafe classes. Neither fits my case because
java.nio.ByteBuffers don't seem to be lazily allocated. When I allocate an empty 1 GB buffer, the RSS of the program immediately goes to 1 GB.
Memory from Unsafe.allocateMemory is lazily allocated, but I do not know how to reference it as a byte[].
Is there any way to solve this?
Any way to view memory allocated with Unsafe.allocateMemory() as a byte []?
Or change an existing byte [] to point to memory allocated with Unsafe?
Thank you
Java is designed to have separate regions of memory.
On-heap memory, such as byte[], is designed to be relocatable by the GC, is zeroed out on allocation, and is not lazy. This is because the memory is managed.
Off-heap memory has the advantage that it can be lazy, but it can't pretend to be a managed data type like byte[].
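To make the contrast concrete, off-heap memory obtained from Unsafe is addressed through raw pointers, not array indices, and there is no supported way to view it as a byte[]. A sketch (sun.misc.Unsafe is an internal API, obtained here via reflection, and access may be restricted in newer JDKs):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeDemo {
    public static void main(String[] args) throws Exception {
        // Unsafe has no public constructor; grab the singleton via reflection.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long size = 1L << 30; // 1 GB; large allocations are typically
                              // committed lazily by the OS
        long addr = unsafe.allocateMemory(size);
        unsafe.putByte(addr, (byte) 42);  // touching a page commits it
        System.out.println(unsafe.getByte(addr));

        unsafe.freeMemory(addr);          // freed on demand, not by the GC
    }
}
```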
The screenshot of VisualVM was taken while I was running an I/O-intensive JVM program (written in Scala); the heap size was 4 GB and only 2 GB were in use. The JVM program uses memory-mapped files.
What does "mapped buffer pool" and "direct buffer pool" mean?
Those pools seem to be very full. Since the JVM program uses memory-mapped files, would I see increased performance if the pools were larger? If so, how do I increase their size?
All mapped files together are about 1.1 GB in size.
Direct Buffer
A direct buffer is a chunk of memory typically used to interface Java to the OS I/O subsystems, for example as a place where the OS writes data as it receives it from a socket or disk, and from which Java can read directly.
Sharing the buffer with the OS is much more efficient than the original approach of copying data from the OS into Java's memory model, which then makes the data subject to garbage collection and inefficiencies such as the re-copying of data as it migrates through the eden -> survivor -> tenured generations.
In the screenshot you have just one direct buffer of 16 KB. Java will grow this pool as required, so the fact that the blue area is at the top of the block is merely a statement that all buffer memory allocated so far is in use. I don't see this as an issue.
Mapped buffer pool
The mapped buffer pool is all the memory used by Java for the MappedByteBuffers created from its FileChannel instances.
Each mapped FileChannel has a buffer shared with the OS (similar to a direct buffer, with all the same efficiency benefits). The memory is essentially an in-RAM window onto a portion of the file. Depending on the mode (read, write or both), Java can read and/or modify the file's contents directly, and the OS can directly supply data to the buffer or flush modified data to disk.
Additional advantages of this approach are that the OS can flush this buffer directly to the disk as it sees fit, such as when the OS is shutting down, and the OS can lock that portion of the file from other processes on the computer.
The screenshot indicates you have about 680 MB in use by 12 FileChannel objects. Again, Java will grow this if the Scala program needs more (and the JVM can get additional memory from the OS), so the fact that all 680 MB is in use is not important. Given their size, it certainly seems to me that the program has already been optimized to use these buffers effectively.
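Incidentally, the same two pools that VisualVM displays can be queried programmatically through the standard BufferPoolMXBean; a minimal sketch:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class BufferPoolsDemo {
    public static void main(String[] args) {
        // The platform typically exposes a "direct" and a "mapped" pool.
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```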
Increasing the size of the mapped buffer pool
Java allocates the memory for FileChannel buffers outside the garbage-collected space. This means the normal heap size parameters, such as -Xmx, are not important here.
The size of the buffer in a FileChannel is set with the map method; changing this would entail changing your Scala program.
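For reference, that size is the length argument passed to map; a sketch (the path and the 1 GB cap are illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapSizeDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("/data/big.file"),
                                               StandardOpenOption.READ)) {
            // The third argument controls how much of the file is mapped;
            // this, not -Xmx, determines the mapped buffer pool's size.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0,
                                          Math.min(ch.size(), 1L << 30));
            // ... read from buf ...
        }
    }
}
```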
Once the buffer has reached a threshold size, on the order of tens to hundreds of KB, increasing the FileChannel buffer size may or may not increase performance; it depends on how the program uses the buffer:
No: If the file is read precisely once from end to end: Almost all the time is either waiting for the disk or the processing algorithm
Maybe: If, however, the algorithm frequently scans the file revisiting portions many times, increasing the size might improve performance:
If modifying or writing the file, a larger buffer can consolidate more writes into a single flush.
If reading the file, the operating system will likely have already cached the file (the disk cache), so any gains are likely marginal. Perversely, increasing the size of the JVM might decrease performance by shrinking the effective disk cache.
In any case the application would have to be specifically coded to get any benefits, for example by implementing its own logical record pointer onto the cache.
Try profiling the application and look for I/O waits (JProfiler and YourKit are good at this). It may be that file I/O is not actually a problem; don't be a victim of premature optimization. If I/O waits are a significant portion of the total elapsed time, then it might be worth trying out a larger buffer size.
Further information
https://blogs.oracle.com/alanb/entry/monitoring_direct_buffers
Also be aware of a reported JVM bug whereby FileChannel is not good at releasing memory. It's detailed in Prevent OutOfMemory when using java.nio.MappedByteBuffer.