Predicting Java memory

Predicting Java memory - java

Is there a way to predict how much memory my Java program is going to take? I come from a C++ background where I implemented methods such as "size_in_bytes()" on classes and I could fairly accurately predict the runtime memory footprint of my app. Now I'm in a Java world, and that is not so easy... There are references, pools, immutable objects that are shared... but I'd still like to be able to predict my memory footprint before I look at the process size in top.

You can inspect the size of objects if you use the instrumentation API. It is a bit tricky to use -- it requires a "premain" method and extra VM parameters -- but there are plenty of examples on the web. "java instrumentation size" should find you these.
Note that the default methods will only give you a shallow size. And unless you avoid any object construction outside of the constructor (which is next to impossible), there will be dead objects around waiting to be garbage collected.
But in general, you could use these to estimate the memory requirements of your application, if you have a good control on the amount of objects generated.

You can't predict the amount of memory a program is going to take. However, you can predict how much an object will take. Edit it turns out I'm almost completely wrong, this document describes the memory usage of objects better: http://www.javamex.com/tutorials/memory/object_memory_usage.shtml

In general, you can predict fairly closely what a given object will require. There's some overhead that is relatively fixed, plus the instance fields in the object, plus a modest amount of padding. But then object size is rounded up to at least (on most JVMs) a 16-byte boundary, and some JVMs round up some object sizes to larger boundaries (to allow the use of standard sized pre-allocated object frames). But all this is relatively fixed for a given JVM.
What varies, of course, is the overhead required for garbage collection. A naive garbage collector requires 100% overhead (at least one free byte for every allocated byte), though certain varieties of "generational" collectors can improve on this to a degree. But how much space is required for GC is highly dependent on the workload (on most JVMs).
The other problem is that when you're running at a relatively low level of allocation (where you're only using maybe 10% of max available heap) then garbage will accumulate. It isn't actively referenced, but the bits of garbage are interspersed with your active objects, so it takes up working set. As a result, your working set tends to be roughly equal to your current overall garbage-collected heap size (plus other system overhead).
You can, of course, "throttle" the heap size so that you run at a higher % utilization, but that increases the frequency of garbage collection (and the overall cost of GC to a lesser degree).

You can use profilers to understand the constant set of objects that are always in memory. Then you should execute all the code paths to check for memory leaks. JProfiler is a good one to start with.

Related

Java: reliably allocate large array on heap

The Task
Allocate X=4..8MB of byte array (on heap), e.g. using ByteBuffer.allocate() such that it will not cause an OutOfMemoryError. It is not allowed to split the array and process it in smaller portions. Note that the allocation happens on heap, this is not a direct ByteBuffer.
The Challenges
Memory can be fragmented, and if there is enough memory (greater than X), a continuous portion of size X bytes may still be unavailable to allocate the array (any API to find out is there a continuous region of X bytes is available probably would help).
Heap memory is divided into regions to keep objects of different generations, and an object cannot span two or more regions of the heap: Huge arrays throws out of memory despite enough memory available and Large Array allocation across young and tenured portions of java Heap
Large objects are immediately allocated in a tenured region, but it is tricky to reliably reason about which region exactly even using ManagementFactory.getMemoryPoolMXBeans(): how can I know size of each generation in java heap with jmx Some JVMs dynamically adjust LOAs: https://www.ibm.com/docs/en/sdk-java-technology/8?topic=SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/mm_allocation_loa.html
Question
Is there a way in Java to code as follows?
if (<I can reliably allocate an array sized X bytes on heap right now>) {
ByteBuffer.allocate(X);
}

There’s a fundamental problem with the idea to do
if (<I can reliably allocate an array sized X bytes on heap right now>) {
ByteBuffer.allocate(X);
}
known as “check-then-act” anti-pattern. Regardless of how the check in the if’s condition is supposed to work, you need to ensure that it doesn’t change between the check and the subsequent action, i.e. the allocation.
To ensure that the result doesn’t change, you’d not only need to stop all other threads of the same JVM from performing allocations (or concurrent garbage collection from completing) but also prevent all other processes of the same machine from allocating memory, as it is possible that the operating system did not reserve memory for your JVM exclusively but still allows other processing to take it right at this point.
The condition itself has the challenges already named in your question and, as you said yourself, all this fiddling with implementation specific memory regions might be moot when the JVM is capable of reconfiguring them on-the-fly. Since this is usually done as response to the result of a garbage collection, you’d need to perform a full garbage collection first, to determine the resulting situation. Only in this case we were able to be sure that another GC won’t change the situation, if we were able to stop all other threads and processes from doing allocations.
And on some JVMs the only way to reliably trigger a garbage collection, is to perform an actual allocation.
So you need a way to atomically perform the check, followed by an actual allocation that ensures that the memory stays available to you no matter what happens in the environment or an answer that the memory is not available. This mechanism does exist. Just call ByteBuffer.allocate(X) and if it completes normally, the returned reference ensures that the memory stays available as long as you keep it. Otherwise, the thrown OutOfMemoryError signals the unavailability of the memory. Since this mechanism exist, there is no reason to provide a second one with the same outcome.

No, there is no reliable way to do this in Java.
There are several ways to get estimates or best-effort guesses for the available memory, but nothing reliable. Also note that even if there were such a thing, another thread could change the available amount between the condition and the call to allocate.
This related answer contains a way to get such an estimate, and also explains some of the reasons why this can not be reliable.

Does garbage collection change the object addresses in Java?

I read that garbage collection can lead to memory fragmentation problem at run-time. To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
This means that the object addresses must change from time to time? Also, if this happens,
Are the references to these objects also re-assigned?
Won't this cause significant performance issues? How does Java cope with it?

I read that garbage collection can lead to memory fragmentation problem at run-time.
This is not an exclusive problem of garbage collected heaps. When you have a manually managed heap and free memory in a different order than the preceding allocations, you may get a fragmented heap as well. And being able to have different lifetimes than the last-in-first-out order of automatic storage aka stack memory, is one of the main motivations to use the heap memory.
To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
Not necessarily all objects. Typical implementation strategies will divide the memory into logical regions and only move objects from a specific region to another, but not all existing objects at a time. These strategies may incorporate the age of the objects, like generational collectors moving objects of the young generation from the Eden space to a Survivor space, or the distribution of the remaining objects, like the “Garbage First” collector which will, like the name suggests, evacuate the fragment with the highest garbage ratio first, which implies the smallest work to get a free contiguous memory block.
This means that the object addresses must change from time to time?
Of course, yes.
Also, if this happens,
Are the references to these objects also re-assigned?
The specification does not mandate how object references are implemented. An indirect pointer may eliminate the need to adapt all references, see also this Q&A. However, for JVMs using direct pointers, this does indeed imply that these pointers need to get adapted.
Won't this cause significant performance issues? How does Java cope with it?
First, we have to consider what we gain from that. To “eliminate fragmentation” is not an end in itself. If we don’t do it, we have to scan the reachable objects for gaps between them and create a data structure maintaining this information, which we would call “free memory” then. We also needed to implement memory allocations as a search for matching chunks in this data structure or to split chunks if no exact match has been found. This is a rather expensive operation compared to an allocation from a contiguous free memory block, where we only have to bump the pointer to the next free byte by the required size.
Given that allocations happen much more often than garbage collection, which only runs when the memory is full (or a threshold has been crossed), this does already justify more expensive copy operations. It also implies that just using a larger heap can solve performance issues, as it reduces the number of required garbage collector runs, whereas the number of survivor objects will not scale with the memory (unreachable objects stay unreachable, regardless of how long you defer the collection). In fact, deferring the collection raises the chances that more objects became unreachable in the meanwhile. Compare also with this answer.
The costs of adapting references are not much higher than the costs of traversing references in the marking phase. In fact, non-concurrent collectors could even combine these two steps, transferring an object on first encounter and adapting subsequently encountered references, instead of marking the object. The actual copying is the more expensive aspect, but as explained above, it is reduced by not copying all objects but using certain strategies based on typical application behavior, like generational approaches or the “garbage first” strategy, to minimize the required work.

If you move an object around the memory, its address will change. Therefore, references pointing to it will need to be updated. Memory fragmentation occurs when an object in a contigous (in memory) sequence of objects gets deleted. This creates a hole in the memory space, which is generally bad because contigous chunks of memory have faster access times and a higher probability of fitting in chache lines, among other things. It should be noted that the use of indirection tables can prevent reference updates up to the maximum level of indirection used.
Garbage collection has a moderate performance overhead, not just in Java but in other languages as well, such as C# for example. As how Java copes with this, the strategies for performing garbage collection and how to minimize its impact on performance depends on the particular JVM being used, since each JVM can implement garbage collection however it pleases; the only requirement is that it meets the JVM specification.
However, as a programmer, there are some best practices you should follow to make the best out of garbage collection and to minimze its performance hit on your application. See this, also this, this, this blog post, and this other blog post. You might want to check the JVM specs but it's a bit dense.

How to gracefully tell Java about total memory limits?

I have troubles with Java memory consumption.
I'd like to say to Java something like this: "you have 8GB of memory, please use it, and only it. Only if you really can't put all your resources in this memory pool, then fail with OOM".
I know, there are default parameters like -Xmx - they limit only the heap. There are also plenty of other parameters, I know. The problems with these parameters are:
They aren't relevant. I don't want to limit the heap size to 6GB (and trust that native memory won't take more than 2GB). I do want to limit all the memory (heap, native, whatever). And do that effectively, not just saying "-Xmx1GB" - to be safe.
There is too many different parameters related to memory, and I don't know how to configure all of them to achieve the goal.
So, I don't want to go there and care about heap, perm and whatever types of memory. My high-level expectation is: since there is only 8GB, and some static memory is needed - take the static memory from the 8GB, and carefully split the remaining memory between other dynamic memory entities.
Also, ulimit and similar things don't work. I don't want to kill the java process once it consumes more memory than expected. I want Java does its best to not reach the limit firstly, and only if it really, really can't - kill the process.
And I'm OK to define even 100 java parameters, why not. :) But then I need assistance with the full list of needed parameters (for, say, Java 8).

Have you tried -XX:MetaspaceSize?
Is this what you need?
Please, read this article: http://karunsubramanian.com/websphere/one-important-change-in-memory-management-in-java-8/
Keep in mind that this is only valid to Java 8.

AFAIK, there is no java command line parameter or set of parameters that will do that.
Your best bet (IMO) is to set the max heap size and the max metaspace size and hope that other things are going to be pretty static / predictable for your application. (It won't cover the size of the JVM binary and it probably won't cover native libraries, memory mapped files, stacks and so on.)
In a comment you said:
So I'm forced to have a significant amount of memory unused to be safe.
I think you are worrying about the wrong thing here. Assuming that you are not constrained by address space or swap space limitations, memory that is never used doesn't matter.
If a page of your address space is not used, the OS will (in the long term) swap it out, and give the physical RAM page to something else.
Pages in the heap won't be in that situation in a typical Java application. (Address space pages will cycle between in-use and free as the GC moves objects within and between "spaces".)
However, the flip-side is that a GC needs the total heap size to be significantly larger than the sum of the live objects. If too much of the heap is occupied with reachable objects, the interval between garbage collection runs decreases, and your GC ergonomics suffer. In the worst case, a JVM can grind to a halt as the time spent in the GC tends to 100%. Ugly. The GC overhead limit mechanism prevents this, but that just means that your JVM gets an OOME sooner.
So, in the normal heap case, a better way to think about it is that you need to keep a portion of memory "unused" so that the GC can operate efficiently.

Is Java Native Memory Faster than the heap?

I'm exploring options to help my memory-intensive application, and in doing so I came across Terracotta's BigMemory. From what I gather, they take advantage of non-garbage-collected, off-heap "native memory," and apparently this is about 10x slower than heap-storage due to serialization/deserialization issues. Prior to reading about BigMemory, I'd never heard of "native memory" outside of normal JNI. Although BigMemory is an interesting option that warrants further consideration, I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])? Or do the vagaries of garbage collection, etc. render this question unanswerable? I know "measure it" is a common answer around here, but I'm afraid I would not set up a representative test as I don't yet know enough about how native memory works in Java.

Direct memory is faster when performing IO because it avoid one copy of the data. However, for 95% of application you won't notice the difference.
You can store data in direct memory, however it won't be faster than storing data POJOs. (or as safe or readable or maintainable) If you are worried about GC, try creating your objects (have to be mutable) in advance and reuse them without discarding them. If you don't discard your objects, there is nothing to collect.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])?
Direct memory can be faster than using a byte[] if you use use non bytes like int as it can read/write the whole four bytes without turning the data into bytes.
However it is slower than using POJOs as it has to bounds check every access.
Or do the vagaries of garbage collection, etc. render this question unanswerable?
The speed has nothing to do with the GC. The GC only matters when creating or discard objects.
BTW: If you minimise the number of object you discard and increase your Eden size, you can prevent even minor collection occurring for a long time e.g. a whole day.

The point of BigMemory is not that native memory is faster, but rather, it's to reduce the overhead of the garbage collector having to go through the effort of tracking down references to memory and cleaning it up. As your heap size increases, so do your GC intervals and CPU commitment. Depending upon the situation, this can create a sort of "glass ceiling" where the Java heap gets so big that the GC turns into a hog, taking up huge amounts of processor power each time the GC kicks in. Also, many GC algorithms require some level of locking that means nobody can do anything until that portion of the GC reference tracking algorithm finishes, though many JVM's have gotten much better at handling this. Where I work, with our app server and JVM's, we found that the "glass ceiling" is about 1.5 GB. If we try to configure the heap larger than that, the GC routine starts eating up more than 50% of total CPU time, so it's a very real cost. We've determined this through various forms of GC analysis provided by our JVM vendor.
BigMemory, on the other hand, takes a more manual approach to memory management. It reduces the overhead and sort of takes us back to having to do our own memory cleanup, as we did in C, albeit in a much simpler approach akin to a HashMap. This essentially eliminates the need for a traditional garbage collection routine, and as a result, we eliminate that overhead. I believe that the Terracotta folks used native memory via a ByteBuffer as it's an easy way to get out from under the Java garbage collector.
The following whitepaper has some good info on how they architected BigMemory and some background on the overhead of the GC: http://www.terracotta.org/resources/whitepapers/bigmemory-whitepaper.

I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
I think that your question is predicated on a false assumption. AFAIK, it is impossible to bypass the serialization issue that they are talking about here. The only thing you could do would be to simplify the objects that you put into BigMemory and use custom serialization / deserialization code to reduce the overheads.
While benchmarks might give you a rough idea of the overheads, the actual overheads will be very application specific. My advice would be:
Only go down this route if you know you need to. (You will be tying your application to a particular implementation technology.)
Be prepared for some intrusive changes to your application if the data involved isn't already managed using as a cache.
Be prepared to spend some time in (re-)tuning your caching code to get good performance with BigMemory.
If your data structures are complicated, expect a proportionately larger runtime overheads and tuning effort.

Does allocation speed depend on the garbage collector being used?

My app is allocating a ton of objects (>1mln per second; most objects are byte arrays of size ~80-100 and strings of the same size) and I think it might be the source of its poor performance.
The app's working set is only tens of megabytes. Profiling the app shows that GC time is negligibly small.
However, I suspect that perhaps the allocation procedure depends on which GC is being used, and some settings might make allocation faster or perhaps make a positive influence on cache hit rate, etc.
Is that so? Or is allocation performance independent on GC settings under the assumption that garbage collection itself takes little time?

Of course your performance depends on the allocator used. But you have profiled GC and saw that it is not much of an issue. Also, one of the strengths of the GC is fast allocation at the expense of slower collection.
I think you are having issues with resulting fragmentation which makes memory access pattern problematic for the cpu, since it may need to invalidate its cache too often. Most GC algorithms doesn't reclaim space in an optimum way.
Since your working set is limited and predictable, you might want to use an object pool which is allocated beforehand. You may also want to use reference counting to avoid much of the manual memory management. Technically it is still GC but not in the common sense of the GC.
Still, I don't think the performance is much affected by how you manage memory but how you actually use, access it. Most likely your profiler has the definite answer.

There are two distinct aspects to object allocation. The first is finding a suitable area of memory - with todays generational garbarge collectors, this is usually very fast (in the order of a few 10ths of machine cycles).
The second is the initialization of the objects you allocate. Since everything you allocate in Java is initialized, the cost for initialization can easily outweight the cost of allocation (except for the most simple, smallest objects). There is more. Since initialization requires writing the entire memory area the new object occupies (if you allocate a "new byte[1<<20]" for example, the entire megabyte needs to be set to zeros), this also usually pulls that memory into the cpu's cache, evicting other, older cache lines (which may or may not belong to your current "hot" working set).
If you do comparatively little processing on each of your arrays, those effects can severly affect the performance of your code. This can be partially avoided by re-using the same arrays over and over, but it usually makes the program logic more complex. It is also often not easy to determine if cache trashing is really the culprit. Its impossible to say from what little information is given in your question.

Does your VM try to pool strings? I had heard once, that IBM's VM did something like string interning but dynamically (no idea if its true) perhaps your VM is trying doing extra work to build an internal data structure of String internals.
Are you doing something like byte b[] = new byte[100]; String s = new String(b); by any chance? You might try not to allocate the String objects, and instead allocate some random object which has a reference to the byte[] (for comparison).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.