I am reading the HBase docs and came across the off-heap read path.
As far as I understand, off-heap is a place in memory where Java stores bytes/objects outside the reach of the garbage collector. I also went searching for libraries that facilitate using off-heap memory and found Ehcache. However, I could not find any official docs from Oracle or the JVM about this. So is this standard JVM functionality, or is it some kind of a hack? And if it is, what are the underlying classes and techniques used to do it?
You should look at ByteBuffer:
Direct vs. non-direct buffers
A byte buffer is either direct or non-direct. Given a direct byte
buffer, the Java virtual machine will make a best effort to perform
native I/O operations directly upon it. That is, it will attempt to
avoid copying the buffer's content to (or from) an intermediate buffer
before (or after) each invocation of one of the underlying operating
system's native I/O operations.
A direct byte buffer may be created by invoking the allocateDirect
factory method of this class. The buffers returned by this method
typically have somewhat higher allocation and deallocation costs than
non-direct buffers. The contents of direct buffers may reside outside
of the normal garbage-collected heap, and so their impact upon the
memory footprint of an application might not be obvious. It is
therefore recommended that direct buffers be allocated primarily for
large, long-lived buffers that are subject to the underlying system's
native I/O operations. In general it is best to allocate direct
buffers only when they yield a measurable gain in program
performance.
A direct byte buffer may also be created by mapping a region of a file
directly into memory. An implementation of the Java platform may
optionally support the creation of direct byte buffers from native
code via JNI. If an instance of one of these kinds of buffers refers
to an inaccessible region of memory then an attempt to access that
region will not change the buffer's content and will cause an
unspecified exception to be thrown either at the time of the access or
at some later time.
Whether a byte buffer is direct or non-direct may be determined by
invoking its isDirect method. This method is provided so that explicit
buffer management can be done in performance-critical code.
It's up to the JVM implementation how it handles direct ByteBuffers, but at least the OpenJDK JVM allocates the memory off-heap.
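For a concrete starting point, here is a minimal sketch using only the standard java.nio API:

import java.nio.ByteBuffer;

ByteBuffer direct = ByteBuffer.allocateDirect(1024); // native memory, outside the GC-managed heap
ByteBuffer heap = ByteBuffer.allocate(1024);         // backed by a byte[] on the Java heap
System.out.println(direct.isDirect()); // true
System.out.println(heap.isDirect());   // false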
JEP 383: Foreign-Memory Access API (Second Incubator) is incubating in Java 15. It will make accessing off-heap memory standard by providing a public API.
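The incubating API has since been finalized as java.lang.foreign in later JDKs; a minimal sketch of the finalized form, assuming JDK 22 or newer:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

try (Arena arena = Arena.ofConfined()) {        // confined arena: memory is freed deterministically on close
    MemorySegment seg = arena.allocate(1024);   // 1 KiB of off-heap memory
    seg.set(ValueLayout.JAVA_INT, 0, 42);       // write an int at offset 0
    int v = seg.get(ValueLayout.JAVA_INT, 0);   // read it back
}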
Related
Is there a way to increase the allocated memory of an off-heap ByteBuffer once it has been created?
A direct byte buffer may also be created by mapping a region of a file directly into memory. An implementation of the Java platform may optionally support the creation of direct byte buffers from native code via JNI. If an instance of one of these kinds of buffers refers to an inaccessible region of memory then an attempt to access that region will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time.
The API has no provisions, but there might be a JVM that allows it via JNI.
I would say NO.
Direct memory was introduced in Java 1.4. The new I/O (NIO) classes introduced a new way of performing I/O based on channels and buffers. NIO added support for direct ByteBuffers, which can be passed directly to native memory rather than through the Java heap, making them significantly faster in some scenarios because they avoid copying data between the Java heap and the native heap.
I never understood why we use direct memory. Can someone give an example?
All system calls, such as reading and writing sockets and files, use only native memory; they can't use the heap. This means that while you can copy to/from native memory from the heap, avoiding this copy can improve efficiency.
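A small sketch of the difference, assuming a readable file (the file name is illustrative):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

FileChannel ch = FileChannel.open(Path.of("data.bin"), StandardOpenOption.READ);
ByteBuffer heapBuf = ByteBuffer.allocate(8192);         // on-heap: the JDK copies through an internal direct buffer
ByteBuffer directBuf = ByteBuffer.allocateDirect(8192); // off-heap: the OS can read into it directly
ch.read(heapBuf);   // read syscall plus an extra copy into the heap array
ch.read(directBuf); // read syscall straight into native memory, no extra copy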
We use off-heap/native memory for storing most of our data, which has a number of advantages:
It can be larger than the heap size.
It can be larger than main memory.
It can be shared between JVMs, i.e. one copy for multiple JVMs.
It can be persisted and retained across restarts of the JVM, or even of the machine.
It has little to no impact on GC pause times.
Depending on usage, it can be faster.
The reason it is not used more is that it is harder to make it both efficient and work like normal Java objects. For this reason, we have libraries such as Chronicle Map, which acts as a ConcurrentMap backed by off-heap memory, and Chronicle Queue, which is a journal, logger, and persisted IPC between processes.
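As a rough sketch of what that looks like with Chronicle Map's builder-style API (method names follow its docs, but verify them against the version you use; the map name, sizes, and file are illustrative):

import java.io.File;
import net.openhft.chronicle.map.ChronicleMap;

ChronicleMap<Long, CharSequence> map = ChronicleMap
        .of(Long.class, CharSequence.class)
        .name("users")                             // illustrative map name
        .entries(1_000_000)                        // expected number of entries
        .averageValue("a value of typical size")   // sizing hint for variable-length values
        .createPersistedTo(new File("users.dat")); // off-heap, survives JVM restarts, shareable between JVMs

map.put(1L, "alice");
CharSequence v = map.get(1L);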
The JVM relies on garbage collection to reclaim memory that is no longer used. This allows developers in JVM languages (e.g., Java, Scala) not to have to worry about memory allocation and deallocation: you simply ask for memory and let the JVM worry about when it will be reclaimed, or garbage collected.
While this is extremely convenient, it comes with the added overhead of a separate thread consuming CPU and constantly walking the JVM heap, reclaiming objects that are no longer reachable. There are entire books written about the topic, but if you want to read a bit more about JVM garbage collection, there is a ton of references out there; this one is decent: https://dzone.com/articles/understanding-the-java-memory-model-and-the-garbag
Anyway, if you know your app is going to be doing massive amounts of copying and updating of objects and values, you can elect to handle those objects and their memory yourself. Regardless of how much churn there is in those objects, they will never be moved around in the heap and will never be garbage collected, and thus won't impact garbage collection in the JVM. There's a bit more detail in this answer: https://stackoverflow.com/a/6091680/236528
From the Official Javadoc:
Direct vs. non-direct buffers
A byte buffer is either direct or non-direct. Given a direct byte
buffer, the Java virtual machine will make a best effort to perform
native I/O operations directly upon it. That is, it will attempt to
avoid copying the buffer's content to (or from) an intermediate buffer
before (or after) each invocation of one of the underlying operating
system's native I/O operations.
A direct byte buffer may be created by invoking the allocateDirect
factory method of this class. The buffers returned by this method
typically have somewhat higher allocation and deallocation costs than
non-direct buffers. The contents of direct buffers may reside
outside of the normal garbage-collected heap, and so their impact upon
the memory footprint of an application might not be obvious. It is
therefore recommended that direct buffers be allocated primarily for
large, long-lived buffers that are subject to the underlying system's
native I/O operations. In general it is best to allocate direct
buffers only when they yield a measurable gain in program
performance.
https://download.java.net/java/early_access/jdk11/docs/api/java.base/java/nio/ByteBuffer.html
I just read a wiki here; one of the passages said:
Although theoretically these are general-purpose data structures, the
implementation may select memory for alignment or paging
characteristics, which are not otherwise accessible in Java.
Typically, this would be used to allow the buffer contents to occupy
the same physical memory used by the underlying operating system for
its native I/O operations, thus allowing the most direct transfer
mechanism, and eliminating the need for any additional copying.
I am curious about the words "eliminating the need for any additional copying": when does the JVM need this extra copy, and why can NIO avoid it?
It's talking about a direct mapping between a kernel data structure and a user-space data structure; normally an extra copy is required when data moves between the two. However, with NIO and a direct buffer, that intermediate copy does not occur.
From java.nio package API:
A byte buffer can be allocated as a direct buffer, in which case the Java virtual machine will make a best effort to perform native I/O operations directly upon it.
Example:
FileChannel fc = FileChannel.open(Path.of("file.bin"), StandardOpenOption.READ); // any readable channel will do; the path is illustrative
ByteBuffer buf = ByteBuffer.allocateDirect(8192);
int n = fc.read(buf); // the OS reads straight into the direct buffer, with no intermediate heap copy
Simply put, the old IO way always copies data from the kernel into heap memory. NIO lets you use buffers into which the kernel maps the file/network stream directly. The result: less memory consumption and far better performance.
Many developers know only a single JVM, the Oracle HotSpot JVM, and speak of garbage collection in general when they are really referring to Oracle's HotSpot implementation specifically. In any case, check Bob's post.
The new input/output (NIO) library, introduced with JDK 1.4, provides high-speed, block-oriented I/O in standard Java code.
A few points on NIO:
IO is stream-oriented, whereas NIO is buffer-oriented.
NIO offers non-blocking I/O operations.
NIO avoids an extra copy of data passed between Java and native memory.
NIO allows reading and writing blocks of data directly from disk, rather than byte by byte.
The NIO API introduces a new primitive I/O abstraction called a channel. A channel represents an open connection to an entity such as a hardware device, a file, or a network socket.
When you use the APIs FileChannel.transferTo() or FileChannel.transferFrom(), the JVM can use the OS's access to DMA (Direct Memory Access), which is a potential advantage.
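For example, a file-to-file copy that lets the kernel move the bytes without pulling them through the Java heap (paths are illustrative):

import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

FileChannel src = FileChannel.open(Path.of("in.bin"), StandardOpenOption.READ);
FileChannel dst = FileChannel.open(Path.of("out.bin"), StandardOpenOption.WRITE, StandardOpenOption.CREATE);
long done = 0;
while (done < src.size()) {
    done += src.transferTo(done, src.size() - done, dst); // kernel-side transfer; no intermediate heap buffer
}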
According to Ron Hitchens in Java NIO:
Direct buffers are intended for interaction with channels and native
I/O routines. They make a best effort to store the byte elements in a
memory area that a channel can use for direct, or raw, access by using
native code to tell the operating system to drain or fill the memory
area directly.
Direct byte buffers are usually the best choice for I/O operations. By
design, they support the most efficient I/O mechanism available to the
JVM. Nondirect byte buffers can be passed to channels, but doing so
may incur a performance penalty. It's usually not possible for a
nondirect buffer to be the target of a native I/O operation.
Direct buffers are optimal for I/O, but they may be more expensive to
create than nondirect byte buffers. The memory used by direct buffers
is allocated by calling through to native, operating system-specific
code, bypassing the standard JVM heap. Setting up and tearing down
direct buffers could be significantly more expensive than
heap-resident buffers, depending on the host operating system and JVM
implementation. The memory-storage areas of direct buffers are not
subject to garbage collection because they are outside the standard
JVM heap.
Chapter 2 of the tutorial below will give you more insight (especially sections 2.4 and 2.4.2):
http://blogimg.chinaunix.net/blog/upfile2/090901134800.pdf
I have an Android project (targeting Android 1.6 and up) which includes native code written in C/C++, accessed via the NDK. I'm wondering what the most efficient way is to pass an array of bytes from Java through the NDK to my JNI glue layer. My concern is whether the NDK for Android will copy the array of bytes or just give me a direct reference. I need read-only access to the bytes at the C++ level, so any copying behind the scenes would be a waste of time from my perspective.
It's easy to find info about this on the web, but I'm not sure what is the most pertinent info. Examples:
Get the pointer of a Java ByteBuffer though JNI
http://www.milk.com/kodebase/dalvik-docs-mirror/docs/jni-tips.html
http://elliotth.blogspot.com/2007/03/optimizing-jni-array-access.html
So does anyone know what is the best (most efficient, least copying) way to do this in the current NDK? GetByteArrayRegion? GetByteArrayElements? Something else?
According to the documentation, GetDirectBufferAddress will give you the address without copying the array.
However, to call this function you need to allocate a direct buffer with ByteBuffer.allocateDirect() instead of a plain byte array. There is a trade-off, as explained here:
A direct byte buffer may be created by invoking the allocateDirect
factory method of this class. The buffers returned by this method
typically have somewhat higher allocation and deallocation costs than
non-direct buffers. The contents of direct buffers may reside outside
of the normal garbage-collected heap, and so their impact upon the
memory footprint of an application might not be obvious. It is
therefore recommended that direct buffers be allocated primarily for
large, long-lived buffers that are subject to the underlying system's
native I/O operations. In general it is best to allocate direct
buffers only when they yield a measurable gain in program
performance.
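A rough Java-side sketch: processBuffer is a hypothetical native method of your own; on the C side, the standard JNI functions GetDirectBufferAddress and GetDirectBufferCapacity expose the same memory to native code with no copy:

import java.nio.ByteBuffer;

ByteBuffer buf = ByteBuffer.allocateDirect(4096); // native-addressable, so JNI can reach it without copying
processBuffer(buf); // hypothetical: declared as private native void processBuffer(ByteBuffer buf);
// In the JNI glue (C): void* p = (*env)->GetDirectBufferAddress(env, buf);
//                      jlong len = (*env)->GetDirectBufferCapacity(env, buf);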
I'm using ByteBuffers and FileChannels to write binary data to a file. When doing that for big files, or successively for multiple files, I get an OutOfMemoryError.
I've read elsewhere that using ByteBuffers with NIO is broken and should be avoided. Have any of you faced this kind of problem and found a solution for efficiently saving large amounts of binary data to a file in Java?
Is the JVM option -XX:MaxDirectMemorySize the way to go?
I would say don't create a huge ByteBuffer that contains ALL of the data at once. Create a much smaller ByteBuffer, fill it with data, then write that data to the FileChannel. Reset the ByteBuffer and repeat until all the data is written.
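A minimal sketch of that loop, assuming a data source and an open FileChannel of your own (hasMoreData, fillNextChunk, and channel are hypothetical):

import java.nio.ByteBuffer;

ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // small, reused buffer
while (hasMoreData()) {        // hypothetical: true while data remains
    buf.clear();
    fillNextChunk(buf);        // hypothetical: copies the next chunk into buf
    buf.flip();
    while (buf.hasRemaining()) {
        channel.write(buf);    // drain the buffer into the FileChannel
    }
}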
Check out Java's Mapped Byte Buffers, also known as 'direct buffers'. Basically, this mechanism uses the OS's virtual memory paging system to 'map' your buffer directly to disk. The OS will manage moving the bytes to/from disk and memory auto-magically, very quickly, and you won't have to worry about changing your virtual machine options. This will also allow you to take advantage of NIO's improved performance over traditional java stream-based i/o, without any weird hacks.
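A minimal sketch, assuming a file you want mapped read-write (the name and mapped size are illustrative):

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

FileChannel fc = FileChannel.open(Path.of("big.dat"),
        StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20); // map the first 1 MiB
mbb.putLong(0, 42L); // the OS pages this to disk; no explicit write() call needed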
The only two catches that I can think of are:
On a 32-bit system, you are limited to just under 4 GB total for all mapped byte buffers. (That is actually a limit for my application, and I now run on 64-bit architectures.)
Implementation is JVM-specific and not a requirement. I use Sun's JVM and there are no problems, but YMMV.
Kirk Pepperdine (a somewhat famous Java performance guru) is involved with a website, www.JavaPerformanceTuning.com, that has some more MBB details: NIO Performance Tips
If you access files in a random fashion (read here, skip, write there, move back) then you have a problem ;-)
But if you only write big files, you should seriously consider using streams. java.io.FileOutputStream can be used directly to write a file byte after byte, or it can be wrapped in another stream (e.g., DataOutputStream, ObjectOutputStream) for the convenience of writing floats, ints, Strings, or even serializable objects. Similar classes exist for reading files.
Streams offer you the convenience of manipulating arbitrarily large files in (almost) arbitrarily small memory. They are the preferred way of accessing the file system in the vast majority of cases.
Using the transferFrom method should help with this, assuming you write to the channel incrementally and not all at once as previous answers also point out.
This can depend on the particular JDK vendor and version.
There is a bug in the GC in some Sun JVMs. A shortage of direct memory will not trigger a GC in the main heap, yet the direct memory stays pinned down by garbage direct ByteBuffers in the main heap. If the main heap is mostly empty, they may not be collected for a long time.
This can burn you even if you aren't using direct buffers on your own, because the JVM may be creating direct buffers on your behalf. For instance, writing a non-direct ByteBuffer to a SocketChannel creates a direct buffer under the covers to use for the actual I/O operation.
The workaround is to use a small number of direct buffers yourself, and keep them around for reuse.
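A minimal reuse sketch, e.g. one direct buffer per thread (the size is illustrative):

import java.nio.ByteBuffer;

static final ThreadLocal<ByteBuffer> IO_BUF =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(64 * 1024));

ByteBuffer buf = IO_BUF.get(); // reused across I/O calls on this thread
buf.clear();                   // reset before each use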
The previous two responses seem pretty reasonable. As for whether the command-line switch will work, it depends on how quickly your memory usage hits the limit. If you don't have enough RAM and virtual memory available to at least triple the memory available, then you will need to use one of the alternative suggestions given.