Dynamic Memory Handling Java vs C++


I am a C++ programmer currently learning Java. In C++ I have a habit of keeping track of dynamic memory allocations and using techniques like RAII to avoid memory leaks. Java, as we know, provides a garbage collector (GC) to take care of memory leaks. So while programming in Java, should one let go of all worries about heap memory and leave it to the GC, or should one take an approach similar to programming in languages without a GC: try to take care of the memory you allocate and let the GC handle whatever you miss? What should the approach be? What are the downsides of either?

I'm not sure what you mean by trying to take care of the memory you allocate in presence of a GC, but I'll try some mind reading.
Basically, you shouldn't "worry" about your memory being collected. If you don't reference objects anymore, they will be picked up. Logical memory leaks are still possible if you create a situation where objects stay referenced for the rest of your program (for example, registering listeners and never unregistering them, or implementing a vector-like collection that doesn't set slots to null when removing items off the end).
However, if you have a strong RAII background, you'll be disappointed to learn that there is no direct equivalent in Java. The GC is a first-class tool for dealing with memory, but there is no guarantee on when (or even if) finalizers are called. This means that the first-class treatment applied to memory is not applied to any other resource, such as windows, database connections, sockets, files, synchronization primitives, etc.
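To make the vector-like collection case above concrete, here is a minimal sketch of an array-backed stack. The class name SimpleStack and its methods are made up for illustration and do not come from any library. The one line that matters is the assignment of null in pop(): without it, the backing array keeps a stale reference and the popped object can never be collected while the stack is alive.

```java
import java.util.Arrays;

// Hypothetical array-backed stack, used only to illustrate the logical leak.
final class SimpleStack<T> {
    private Object[] items = new Object[8];
    private int size = 0;

    void push(T item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow the backing array
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        T item = (T) items[--size];
        // Without this line the array keeps a stale reference, so the popped
        // object stays reachable for as long as the stack itself lives.
        items[size] = null;
        return item;
    }

    int size() {
        return size;
    }

    // Exposed only so the example can show that the slot was cleared.
    boolean slotIsCleared(int index) {
        return items[index] == null;
    }
}
```

The standard collections (ArrayList, ArrayDeque and so on) already clear removed slots, so this only becomes a concern when you write your own containers.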

With Java and .NET (and, I imagine, most other GC'ed languages), you don't need to worry much about heap memory at all. What you do need to worry about is native resources like file handles, sockets, and GUI primitives such as fonts. Those generally need to be "disposed", which releases the native resources. (They often dispose themselves on finalization anyway, but it's kinda iffy to fall back on that. Dispose stuff yourself.)

With a GC, you have to:
still take care to properly release non-memory resources like file handles and DB connections.
make sure you don't keep references to objects you don't need anymore (like keeping them in collections). Because if you do that, you have a memory leak.
Apart from that, you can't really "take care of the memory you allocate", and trying to do so would be a waste of time.

Technically you don't need to worry about cleaning up after memory allocations, since the GC tracks which objects are still reachable and will take care of everything else. In practice, an overactive GC will negatively impact performance. So while Java does not have a delete operator, you will do well to reuse objects as much as possible.
Also, Java does not have destructors, since objects exist until the GC gets to them. Java therefore has the finally construct, which you should use to ensure that all non-memory resources (files, sockets, etc.) are closed when you are finished with them. Do not rely on the finalize method to do this for you.
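As a rough illustration of the finally advice above, the sketch below releases a resource in a finally block and then does the same with try-with-resources (Java 7 and later), which is the closest thing Java has to scope-bound cleanup. TrackedResource and ResourceDemo are hypothetical names; the resource only records its lifecycle in a list so the order of events is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that logs its lifecycle so the ordering can be inspected.
final class TrackedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    TrackedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    void use() {
        log.add("use " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

final class ResourceDemo {
    // Classic form: release in finally so it runs on every exit path.
    static List<String> withFinally() {
        List<String> log = new ArrayList<>();
        TrackedResource r = new TrackedResource("A", log);
        try {
            r.use();
        } finally {
            r.close();
        }
        return log;
    }

    // try-with-resources: close() is called automatically,
    // in reverse order of declaration.
    static List<String> withTryWithResources() {
        List<String> log = new ArrayList<>();
        try (TrackedResource a = new TrackedResource("A", log);
             TrackedResource b = new TrackedResource("B", log)) {
            a.use();
            b.use();
        }
        return log;
    }
}
```

The second form closes B before A, which mirrors the reverse-order destruction a C++ programmer would expect from stack objects.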

In Java the GC takes care of allocating memory and freeing unused memory. This does not mean you can disregard the issue altogether.
The Java GC frees objects that are not reachable from a root. This means that Java can still have memory leaks if you are not careful to remove references from global contexts, such as caches in global HashMaps.
If a cluster of objects that reference each other is not reachable from a root, the Java GC will free them. That is, it does not work with reference counts, so you do not need to null all object references (although some coding styles do prefer clearing references as soon as they are no longer needed).

Related

Out of memory errors in Java API that uses finalizers in order to free memory allocated by C calls

We have a Java API that is a wrapper around a C API.
As such, we end up with several Java classes that are wrappers around C++ classes.
These classes implement the finalize method in order to free the memory that has been allocated for them.
Generally, this works fine. However, in high-load scenarios we get out of memory exceptions.
Memory dumps indicate that virtually all the memory (around 6 GB in this case) is filled with the finalizer queue and the objects waiting to be finalized.
For comparison, the C API on its own never goes over around 150 MB of memory usage.
Under low load, the Java implementation can run indefinitely, so this doesn't seem to be a memory leak as such. It just seems that under high load, new objects that require finalizing are generated faster than finalizers get executed.
Obviously, the 'correct' fix is to reduce the number of objects being created. However, that's a significant undertaking and will take a while. In the meantime, is there a mechanism that might help alleviate this issue? For example, by giving the GC more resources.

Java was designed around the idea that finalizers could be used as the primary cleanup mechanism for objects that go out of scope. Such an approach may have been almost workable when the total number of objects was small enough that the overhead of an "always scan everything" garbage collector would have been acceptable, but there are relatively few cases where finalization is an appropriate cleanup measure in a system with a generational garbage collector (which nearly all JVM implementations have, because it offers a huge speed boost compared to always scanning everything).
Using Closeable along with a try-with-resources construct is a vastly superior approach whenever it's workable. There is no guarantee that finalize methods will get called with any degree of timeliness, and there are many situations where patterns of interrelated objects may prevent them from getting called at all. While finalize can be useful for some purposes, such as identifying objects which got improperly abandoned while holding resources, there are relatively few purposes for which it would be the proper tool.
If you do need to use finalizers, you should understand an important principle: contrary to popular belief, finalizers do not trigger when an object is actually garbage collected. They fire when an object would have been garbage collected but for the existence of a finalizer somewhere (including, but not limited to, the object's own finalizer). No object can actually be garbage collected while any reference to it exists in any local variable, in any other object to which any reference exists, or in any object with a finalizer that hasn't run to completion. Further, to avoid having to examine all objects on every garbage-collection cycle, objects which have been alive for a while will be given a "free pass" on most GC cycles. Thus, if an object with a finalizer is alive for a while before it is abandoned, it may take quite a while for its finalizer to run, and it will keep objects to which it holds references around long enough that they're likely to also earn a "free pass".
I would thus suggest that, to the extent possible, even when it's necessary to use finalizers, you should limit their use to privately-held objects which in turn avoid holding strong references to anything which isn't explicitly needed for their cleanup task.

Phantom references are an alternative to finalizers available in Java.
Phantom references give you better control over the resource reclamation process:
you can combine explicit resource disposal (e.g. the try-with-resources construct) with GC-based disposal
you can employ multiple threads for post-mortem housekeeping
Using phantom references is complicated, though. In this article you can find a minimal example of phantom-reference-based resource housekeeping.
In modern Java there is also the Cleaner class (java.lang.ref.Cleaner, available since Java 9), which is also based on phantom references but provides the infrastructure (reference queue, worker thread, etc.) for ease of use.
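A minimal sketch of how explicit disposal and a Cleaner fallback might be combined, assuming Java 9 or later. The names NativeBuffer, State and LIVE_ALLOCATIONS are hypothetical, and the counter stands in for the real calls into native code, which are not shown here.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: explicit close() plus a Cleaner as a safety net (Java 9+).
final class NativeBuffer implements AutoCloseable {
    // One shared Cleaner; it owns the reference queue and its worker thread.
    private static final Cleaner CLEANER = Cleaner.create();

    // Stand-in for native allocation bookkeeping (assumption for this example).
    static final AtomicInteger LIVE_ALLOCATIONS = new AtomicInteger();

    // The cleanup state must not reference the NativeBuffer itself,
    // otherwise the buffer could never become phantom reachable.
    private static final class State implements Runnable {
        private final long handle;

        State(long handle) {
            this.handle = handle;
        }

        @Override
        public void run() {
            // A real wrapper would pass `handle` to a native free function here.
            LIVE_ALLOCATIONS.decrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeBuffer(long handle) {
        LIVE_ALLOCATIONS.incrementAndGet();
        this.cleanable = CLEANER.register(this, new State(handle));
    }

    @Override
    public void close() {
        // The cleaning action runs at most once, whether it is triggered here
        // or later by the Cleaner after the object becomes unreachable.
        cleanable.clean();
    }
}
```

Callers would normally use this in a try-with-resources block, so the cleaning action runs deterministically; the Cleaner only covers instances that were abandoned without close().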

Is there any way that I can free up memory in the java code that is generated to bind C code via JNI/JNA?

I am using a Java library that uses JNA to bind to the original C library (called Leptonica). I encountered a situation where free(data) has to be called in the C code to free the memory. Is there any function in Java that I can use to free that memory?
In the C code:
void ImageData::SetPixInternal(Pix* pix, GenericVector<char>* image_data) {
    l_uint8* data;
    size_t size;
    pixWriteMem(&data, &size, pix, IFF_PNG);
    pixDestroy(&pix);
    image_data->init_to_size(size, 0);
    memcpy(&(*image_data)[0], data, size);
    free(data);
}
The function pixWriteMem() allocates memory for "data", which you later need to release by calling free(data).
In Java code, I can only access pixWriteMem(), not SetPixInternal(), so I have no way to free "data", which creates a memory leak.

The other comments and answers here all seem to be suggesting that you just rely on the garbage collector or tell the garbage collector to run. That is not the correct answer for memory allocated in C and being used in Java via JNI.
It looks like execution() does free the memory: the last line you show us is free(data). Still, to answer the question as you asked it, the answer is "not directly." If you have the ability to add to the C code, you could create another C function which frees the data and then call that using JNI. Perhaps there is more that we are not seeing which relates better to your concern about the memory leak?
Also, be careful about freeing memory allocated by a library you are using. You should make sure that the library doesn't still need it and is leaking it before you go trying to free it.
And now back to memory management in general...
Java is indeed a garbage-collected language. This means that you do not specifically delete objects. Instead, you make sure there are no references to an object, and the garbage collector then takes care of the memory management. This does not mean that Java is free from memory leaks, as there are ways to accidentally keep a reference hanging around such that the object never gets garbage collected. If you have a situation like this, you might want to read up on the different kinds of references in Java (strong/weak/etc.).
Again, this is not the problem here. This is a C/Java hybrid, and the code in question is in C being called by Java. In C, you allocate the memory you want to use and then you need to free the memory yourself when you are done with it. Even if the C code is being run by Java via the JNI, you are still responsible for your own memory. You cannot just malloc() a bunch of memory and expect the Java garbage collector to know when to clean it up. Hence the OP's question.
If you need to add the functionality yourself to do a free, even without the source code for the C part, you might still be able to write your own C interface for freeing the memory if you have access to the pointer to the memory in question. You could write basically a tiny library that just frees the memory for you, make the JNI interface for it, and pass the pointer to that. If you go this route then, depending on your OS, you might need to guarantee that your tiny free library's native code is running in the same process as the rest of the native code, or if not the same process then at least that the process you run it from has write access to the memory owned by the other code's process; this memory/process issue is probably not an issue in your case, but I'm throwing it out there for completeness.

In Java code, I can only access createData(), not execution(), so I have no way to free the "data", which creates a memory leak.
Then it sucks to be you.
Seriously, if you want to free memory allocated by a native method and not freed before that method returns, then you need to maintain a handle of some kind on that memory and later pass it to another native method that will free the memory. If you do not presently have such a native method available, then you'll need to create one.
The other question is how to ensure that the needed native method is invoked. Relying on users to invoke it, directly or indirectly, leaves you open to memory leaks should users fail to do so. There are two main ways to solve that problem:
Give your class a finalizer that ensures the memory is freed. This is the core use case for finalizers, but even so, there are good reasons to prefer to avoid writing them. The other alternative is to
Create a reference object (SoftReference, WeakReference, or PhantomReference), associate the reference with a mechanism for freeing the native-allocated memory belonging to the referenced Java object (but not via that object), and register that object with a reference queue. The reference will be enqueued when the object is GC'd, at which point you know to free the native-allocated memory.
That does not necessarily mean that you should prevent users from explicitly freeing the memory, for with enough bookkeeping you can track whether anything still needs to be freed at any given time. Allowing users to release resources explicitly may help keep your overall resource usage lower. But if you want to avoid memory leaks then you need to have a fallback.
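The reference-queue approach with bookkeeping might look roughly like the sketch below. All names (NativeBlock, NativeBlockTracker, BlockRef, FREED) are hypothetical, and the free() method only increments a counter where a real implementation would call a native function through JNI or JNA. The reference object carries the pointer itself, so cleanup never needs the collected Java object.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java-side owner of a block allocated by native code.
final class NativeBlock {
    final long pointer;

    NativeBlock(long pointer) {
        this.pointer = pointer;
    }
}

final class NativeBlockTracker {
    // Stand-in for the native free function (assumption for this example).
    static final AtomicInteger FREED = new AtomicInteger();

    // The reference stores the pointer, never the NativeBlock itself.
    private static final class BlockRef extends PhantomReference<NativeBlock> {
        final long pointer;

        BlockRef(NativeBlock block, ReferenceQueue<NativeBlock> queue) {
            super(block, queue);
            this.pointer = block.pointer;
        }
    }

    private final ReferenceQueue<NativeBlock> queue = new ReferenceQueue<>();
    // Keeps the reference objects reachable until their block has been freed.
    private final Map<Long, BlockRef> pending = new ConcurrentHashMap<>();

    NativeBlock track(long pointer) {
        NativeBlock block = new NativeBlock(pointer);
        pending.put(pointer, new BlockRef(block, queue));
        return block;
    }

    // Explicit release; returns true only if this call freed the block.
    boolean release(NativeBlock block) {
        BlockRef ref = pending.remove(block.pointer);
        if (ref == null) {
            return false; // already freed, explicitly or by the fallback
        }
        ref.clear(); // a cleared reference is never enqueued
        free(ref.pointer);
        return true;
    }

    // Fallback: free blocks whose owners were collected without release().
    int drainCollected() {
        int count = 0;
        Reference<? extends NativeBlock> ref;
        while ((ref = queue.poll()) != null) {
            BlockRef blockRef = (BlockRef) ref;
            if (pending.remove(blockRef.pointer, blockRef)) {
                free(blockRef.pointer);
                count++;
            }
        }
        return count;
    }

    private static void free(long pointer) {
        FREED.incrementAndGet(); // a real version would call native free(pointer)
    }
}
```

In practice drainCollected() would be called from a background thread or at convenient points such as before each new allocation. When the fallback actually fires depends on the garbage collector, so explicit release remains the path to rely on.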

No, there is no function like C's free() in Java, but you can suggest that the garbage collector run by calling System.gc().

How to get the statistics of the garbage collected objects?

Is it possible to see the Java objects (and their class types) that were set to null and which are:
not yet garbage collected/cleaned
garbage collected/cleaned
This statistic would help to find out how many objects are created repeatedly (through faulty logic) instead of being created once.

I think that it is theoretically possible, though frankly you would be crazy to try it.
The route to finding unreachable objects is to use the Java VM Tool Interface (JVMTI) to iterate over all objects in the heap (reachable or unreachable) in order to find the one you are looking for. Then you extract its state via JVMTI and (somehow) reify it so that you can display it.
Normally you would do this in a separate JVM; e.g. the one running your debugger or profiling tool. But it is possible for an application to attach an agent to itself, and use it to dig around in the JVM. However, this is not the intended usage for JVMTI, and I would anticipate that there could be "hazards" in doing this.
You can read more here:
Creating a Debugging and Profiling Agent with JVMTI
Own your heap: Iterate class instances with JVMTI
But please don't blame me if you go crazy trying to get this working.
UPDATE I concur with Marko's note that you are unlikely to learn anything significant by looking at unreachable objects.

to display the unwanted or null java objects that are not cleaned by the java garbage process
This is not a well-defined concept; at least there are no useful definitions which would give you anything of relevance.
A piece of memory where an object was allocated can be considered free for all practical purposes as soon as that object has become unreachable. The amount of memory that the block represents is available to the JVM allocator in the sense that no out-of-memory event will happen due to that block being "overlooked" in some sense.
Further note that many "garbage collection" algorithms usually do the exact opposite: they find live objects and relocate them so they occupy a contiguous block of memory. The algorithms are simply oblivious to "garbage" objects and treat them as just empty space.
So, even if you manage to write up some low-level Java Agent-based module which will enumerate all the objects on the heap, you will not gain any interesting insight: the unreachable objects which you encounter will just happen to linger on because the JVM has not yet felt the need to reuse their memory.

What if I create many objects in a short time in Java?

I know the best practice is to avoid object creation, but sometimes I have to.
These new instances just behave as temporary objects. They are not referenced after they are created. How does the Java GC react to this?
Can it cause an OOM? If they can be collected immediately, how can I remove them from memory?

In a modern JVM, you'll find that creating lots and lots of very short-lived objects is cheap. "Short-lived" is key here, as things do get more expensive for objects that live long enough to get promoted out of the Eden space.
As a practical matter, I recommend using a good profiler to examine the actual performance of your application.
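Short of a full profiler, a quick first look is possible with the standard java.lang.management API. The sketch below (ShortLivedDemo, churn and totalCollections are hypothetical names) allocates one small temporary object per iteration and reports how many collections the JVM ran in the meantime. The numbers vary by JVM, collector and heap settings, and the JIT may remove the allocations entirely through escape analysis, so treat the output as a hint rather than a measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class ShortLivedDemo {
    // Allocates one small temporary array per iteration; none outlive the loop body.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] temp = new int[] { i, i + 1 };
            checksum += temp[0] + temp[1];
        }
        return checksum;
    }

    // Sums the collection counts over all collectors the JVM exposes.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long checksum = churn(50_000_000);
        long after = totalCollections();
        System.out.println("checksum=" + checksum
                + ", collections during run=" + (after - before));
    }
}
```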

If you worry so much about memory management you could also consider using C++ instead, in order to have complete control of your actions. On Garbage Collector behaviour.

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
Something along the lines of http://wwwasd.web.cern.ch/wwwasd/lhc++/Objectivity/V5.2/Java/guide/jgdStorage.fm.html and specifically non-garbage-collectible containers there (non-garbage-collectable?).
The problem is that I have lots of ordinary temporary objects, but I also have an even bigger set (several gigabytes) of objects that are stored for cache purposes. There is no reason for the Java GC to traverse all those cache gigabytes trying to find anything to collect, because they contain cached data that has its own timeouts.
This way I could partition my data in a custom way into infinite-lived and normal-lived objects, and hopefully GC would be quite fast because normal objects don't live so long and amount to smaller amounts.
There are some workarounds to this problem, such as Apache DirectMemory and Commercial Terracotta BigMemory(http://terracotta.org/products/bigmemory), but a java-native solution would be nicer (I mean free and probably more reliable?). Also I want to avoid serialization overhead which means it should happen within same jvm. To my understanding DirectMemory and BigMemory operate mainly off heap which means that the objects must be serialized/deserialized to/from memory outside jvm. Simply marking non-gc regions within the jvm would seem a better solution. Using Files for cache is not an option either, it has the same unaffordable serialization/deserialization overhead - use case is a HA server with lots of data used in random (human) order and low latency needed.

Any memory the JVM manages is also garbage-collected by the JVM. And any “live” objects which are directly available to Java methods without deserialization have to live in JVM memory. Therefore in my understanding you cannot have live objects which are immune to garbage collection.
On the other hand, the usage you describe should make the generational approach to garbage collection quite efficient. If your big objects stay around for a while, they will be checked for reclamation less often. So I doubt there is much to be gained from avoiding those checks.

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
No it is not possible.
You can prevent objects from being garbage collected by keeping them reachable, but the GC will still need to trace them to check reachability on each full GC (at least).
It is simply my assumption that when the JVM is starving it begins scanning all those unnecessary objects too.
Yes. That is correct. However, unless you've got LOTS of objects that you want to be treated this way, the overhead is likely to be insignificant. (And anyway, a better idea is to give the JVM more memory ... if that is possible.)

Quite simply, for you to be able to do this, the garbage collection algorithm would need to be aware of such a flag, and take it into account when doing its work.
I'm not aware of any of the standard GC algorithms having such a flag, so for this to work you would need to write your own GC algorithm (after deciding on some feasible way to communicate this information to it).
In principle, in fact, you've already started down this track: you're deciding how garbage collection should be done rather than being happy to leave it to the JVM's GC algorithm. Is the situation you describe a measurable problem for you, something for which the existing garbage collection is insufficient but your plan would work? Garbage collectors are extremely well-tuned, so I wouldn't be surprised if the "inefficient" default strategy is actually faster than your naively-optimal one.
(Doing manual memory management is tricky and error-prone at the best of times; managing some memory yourself while using a stock garbage collector to handle the rest seems even worse. I expect you'd run into a lot of edge cases where the GC assumes it "knows" what's happening with the whole heap, which would no longer be true. Steer clear if you can...)

The recommended approaches would be to use either a commercial RTSJ implementation to avoid GC, or to use off-heap memory. One could also look into soft references for caches (they do get collected).
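For the soft-reference option, a cache might be sketched as follows. SoftCache is a hypothetical class written for this example, not a library type. Values are held through SoftReference, so the GC is allowed to clear them under memory pressure, and a cleared entry is simply reloaded on the next access.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a cache whose values the GC may reclaim when memory runs low.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();

    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never cached, or cleared by the GC: load it again.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

Note that this sketch never removes map entries whose referents were cleared, and the cached objects are still traced by the GC while they are alive, so it addresses memory pressure rather than GC scan time.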
This is not recommended:
If for some reason you do not believe these options are sufficient, you could look into direct memory access, which is UNSAFE (part of sun.misc.Unsafe). You can use the 'theUnsafe' field to get the 'Unsafe' instance. Unsafe lets you allocate and deallocate memory via 'allocateMemory' and 'freeMemory'. This memory is not under GC control, nor is it limited by the JVM heap size. The impact on the GC and the application, once you go down this route, is not guaranteed, which is why using byte buffers might be the way to go (if you're not using an RTSJ-like implementation).
Hope this helps.

Living Java objects will always be part of the GC life cycle. Said another way, marking an object as non-GC would carry the same order of overhead as having your object referenced by a root reference (a static final map, for instance).
But thinking a bit further, data put in a cache is most likely temporary and will eventually be evicted. At that point you will start to like the JVM and the GC again.
If you have hundreds of GBs of permanent data, you may want to rethink the architecture of your application, and try to shard and distribute your data (horizontal scalability).
Last but not least, lots of work has been done around serialization, and the overhead of serialization (I'm not speaking about the poor reputation of ObjectInputStream and ObjectOutputStream) is not that big.
More than that, if your data is mainly composed of primitive types (including byte arrays), there are efficient ways to readInt() or readBytes() from off-heap buffers (for instance netty.io's ChannelBuffer). This could be a way to go.
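As a rough sketch of the off-heap idea for primitive data, the example below uses the JDK's own java.nio.ByteBuffer rather than Netty's ChannelBuffer, which is a substitution made here to keep the example dependency-free. OffHeapIntTable is a hypothetical class. The int values live in a direct buffer outside the Java heap, so the collector has no per-element objects to trace.

```java
import java.nio.ByteBuffer;

// Sketch: fixed-size table of int values stored in off-heap (direct) memory.
final class OffHeapIntTable {
    private final ByteBuffer buffer;

    OffHeapIntTable(int capacity) {
        // Direct buffers are allocated outside the Java heap; only the small
        // ByteBuffer wrapper object lives on the heap.
        this.buffer = ByteBuffer.allocateDirect(capacity * Integer.BYTES);
    }

    void set(int index, int value) {
        buffer.putInt(index * Integer.BYTES, value); // absolute write, no position change
    }

    int get(int index) {
        return buffer.getInt(index * Integer.BYTES); // absolute read
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }
}
```

Reading and writing primitives this way involves no object serialization, but anything with structure (strings, nested objects) still has to be encoded into bytes by hand.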
