How to get the statistics of the garbage collected objects? - java

Is it possible to see the java objects (and their class type) that were made null and which are
Not yet garbage collected/cleaned
garbage collected/cleaned.
This statistic will help to know that how many objects repeatedly created (by a wrong logic) instead of creating one time.

I think that it is theoretically possible, though frankly you would be crazy to try it.
The route to finding unreachable objects is to use the Java VM Tool Interface (JVMTI) to iterate over all objects in the heap (reachable or unreachable) in order to find the one you are looking for. Then you extract its state via JVMTI and (somehow) reify it so that you can display it.
Normally you would do this in a separate JVM; e.g. the one running your debugger or profiling tool. But it is possible for an application to attach an agent to itself, and use it to dig around in the JVM. However, this is not the intended usage for JVMTI, and I would anticipate that there could be "hazards" in doing this.
You can read more here:
Creating a Debugging and Profiling Agent with JVMTI
Own your heap: Iterate class instances with JVMTI
But please don't blame me if you go crazy trying to get this working.
UPDATE I concur with Marko's note that you are unlikely to learn anything significant by looking at unreachable objects.

to display the unwanted or null java objects that are not cleaned by the java garbage process
This is not a well-defined concept; at least there are no useful definitions which would give you anything of relevance.
A piece of memory where an object was allocated can be considered free for all practical purposes as soon as that object has become unreachable. The amount of memory that the block represents is available to the JVM allocator in the sense that no out-of-memory event will happen due to that block being "overlooked" in some sense.
Further note that many "garbage collection" algorithms usually do the exact opposite: they find live objects and relocate them so they occupy a contiguous block of memory. The algorithms are simply oblivious to "garbage" objects and treat them as just empty space.
So, even if you manage to write up some low-level Java Agent-based module which will enumerate all the objects on the heap, you will not gain any interesting insight: the unreachable objects which you encounter will just happen to linger on because the JVM has not yet felt the need to reuse their memory.

Related

After Text was deleted from JTextArea, heap is not empty

I have an application with AWT GUI, and I use JTextArea for logging output. If I erase the text with setText(null) or removeAll() or setText("") and then run garbage collector System.gc(), I notice that the whole text still in memory. How can I really delete the text?
I'm not very familiar with profiler, here is what I see in memory dump after setText(null):
Please have a read on: How Garbage Collection works in Java.
As per the docs System.gc():
Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects
NB - suggests. This means that the garbage collector is only suggested to do a clean up and not forced also it may entirely ignore your request, thus we cannot know when the garbage will be collected only that it will be in time.
NB - disgarded objects: this refers to all objects that are not static/final or in use/referenced by any other instances/classes/fields/variables etc.
Here is also an interesting question I found on the topic:
Why is it a bad practice to call System.gc?
with the top answer going along the lines of:
The reason everyone always says to avoid System.gc() is that it is a
pretty good indicator of fundamentally broken code. Any code that
depends on it for correctness is certainly broken; any that rely on it
for performance are most likely broken
and further there has even been a bug submitted for the bad phrasing of the documentation:
http://bugs.sun.com/view_bug.do?bug_id=6668279
.
As #DavidK notes System.gc() is not a useful way to examine this. Using the mechanism described here, most profilers can force garbage collection in a way that, subject to some limitations, is a useful debugging tool.
if there are any String objects holding this content in your client program, please set them to null as well.
Also you don't need to explicitly call the System.gc() mothod. JVM does garbage collects the orphaned objects when ever it needs more memory to allocate for other objects.
you only need to worry about if you a see an out of memory / continuous heap memory increase usage etc.

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?

Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
Something along the lines of http://wwwasd.web.cern.ch/wwwasd/lhc++/Objectivity/V5.2/Java/guide/jgdStorage.fm.html and specifically non-garbage-collectible containers there (non-garbage-collectable?).
The problem is that I have lots of ordinary temporary objects, but I have even bigger (several Gigs) of objects that are stored for Cache purposes. For no reason should the Java GC traverse all those Cache gigabytes trying to find anything to collect, because they contain cached data which have their own timeouts.
This way I could partition my data in a custom way into infinite-lived and normal-lived objects, and hopefully GC would be quite fast because normal objects don't live so long and amount to smaller amounts.
There are some workarounds to this problem, such as Apache DirectMemory and Commercial Terracotta BigMemory(http://terracotta.org/products/bigmemory), but a java-native solution would be nicer (I mean free and probably more reliable?). Also I want to avoid serialization overhead which means it should happen within same jvm. To my understanding DirectMemory and BigMemory operate mainly off heap which means that the objects must be serialized/deserialized to/from memory outside jvm. Simply marking non-gc regions within the jvm would seem a better solution. Using Files for cache is not an option either, it has the same unaffordable serialization/deserialization overhead - use case is a HA server with lots of data used in random (human) order and low latency needed.
Any memory the JVM manages is also garbage-collected by the JVM. And any “live” objects which are directly available to Java methods without deserialization have to live in JVM memory. Therefore in my understanding you cannot have live objects which are immune to garbage collection.
On the other hand, the usage you describe should make the generational approach to garbage collection quite efficient. If your big objects stay around for a while, they will be checked for reclamation less often. So I doubt there is much to be gained from avoiding those checks.
Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
No it is not possible.
You can prevent objects from being garbage collected by keeping them reachable, but the GC will still need to trace them to check reachability on each full; GC (at least).
Is simply my assumption, that when the jvm is starving it begins scanning all those unnecessary objects too.
Yes. That is correct. However, unless you've got LOTS of objects that you want to be treated this way, the overhead is likely to be insignificant. (And anyway, a better idea is to give the JVM more memory ... if that is possible.)
Quite simply, for you to be able to do this, the garbage collection algorithm would need to be aware of such a flag, and take it into account when doing its work.
I'm not aware of any of the standard GC algorithms having such a flag, so for this to work you would need to write your own GC algorithm (after deciding on some feasible way to communicate this information to it).
In principle, in fact, you've already started down this track - you're deciding how garbage collection should be done rather than being happy to leaving it to the JVM's GC algo. Is the situation you describe a measurable problem for you; something for which the existing garbage collection is insufficient, but your plan would work? Garbage collectors are extremely well-tuned, so I wouldn't be surprised if the "inefficient" default strategy is actually faster than your naively-optimal one.
(Doing manual memory management is tricky and error-prone at the best of times; managing some memory yourself while using a stock garbage collector to handle the rest seems even worse. I expect you'd run into a lot of edge cases where the GC assumes it "knows" what's happening with the whole heap, which would no longer be true. Steer clear if you can...)
The recommended approaches would be to use either a commerical RTSJ implementation to avoid GC, or to use off heap memory. One could also look into soft references for caches as well (they do get collected).
This is not recommended:
If for some reason you do not believe these options are sufficient, you could look into direct memory access which is UNSAFE (part of sun.misc.Unsafe). You can use the 'theUnsafe' field to get the 'Unsafe' instance. Unsafe allows to allocation/deallocate memory via 'allocateMemory' and 'freeMemory'. This is not under GC control nor limited by JVM heap size. The impact on GC/application, once you go down this route, is not guaranteed - which is why using byte buffers might be the way to go (if you're not using a RTSJ like implementation).
Hope this helps.
Living Java objects will always be part of the GC life cycle. Or said another way, marking an object to be non-gc is the same order of overhead than having your object referenced by a root reference (a static final map for instance).
But thinking a bit further, data put in a cache are most likely to be temporary, and would eventually be evicted. At that point you will start again to like the JVM and the GC.
If you have 100's of GBs of permanent data, you may want to rethink the architecture of your application, and try to shard and distribute your data (horizontally scalability).
Last but not least, lots of work has been done around serialization, and the overhead of serialization (I'm not speaking about the poor reputation of ObjectInputStream and ObjectOutputStream) is not that big.
More than that, if your data is mainly composed of primitive types (including bytes array), there is efficient way to readInt() or readBytes() from off heap buffers (for instannce netty.io's ChannelBuffer). This could be a way to go.

How to memory profile in Java?

I'm still learning the ropes of Java so sorry if there's a obvious answer to this. I have a program that is taking a ton of memory and I want to figure a way to reduce its usage, but after reading many SO questions I have the idea that I need to prove where the problem is before I start optimizing it.
So here's what I did, I added a break point to the start of my program and ran it, then I started visualVM and had it profile the memory(I also did the same thing in netbeans just to compare the results and they are the same). My problem is I don't know how to read them, I got the highest area just saying char[] and I can't see any code or anything(which makes sense because visualvm is connecting to the jvm and can't see my source, but netbeans also does not show me the source as it does when doing cpu profiling).
Basically what I want to know is which variable(and hopefully more details like in which method) all the memory is being used so I can focus on working there. Is there a easy way to do this? I right now I am using eclipse and java to develop(and installed visualVM and netbeans specifically for profiling but am willing to install anything else that you feel gets this job done).
EDIT: Ideally, I'm looking for something that will take all my objects and sort them by size(so I can see which one is hogging memory). Currently it returns generic information such as string[] or int[] but I want to know which object its referring to so I can work on getting its size more optimized.
Strings are problematic
Basically in Java, String references ( things that use char[] behind the scenes ) will dominate most business applications memory wise. How they are created determines how much memory they consume in the JVM.
Just because they are so fundamental to most business applications as a data type, and they are one of the most memory hungry as well. This isn't just a Java thing, String data types take up lots of memory in pretty much every language and run time library, because at the least they are just arrays of 1 byte per character or at the worse ( Unicode ) they are arrays of multiple bytes per character.
Once when profiling CPU usage on a web app that also had an Oracle JDBC dependency I discovered that StringBuffer.append() dominated the CPU cycles by many orders of magnitude over all other method calls combined, much less any other single method call. The JDBC driver did lots and lots of String manipulation, kind of the trade off of using PreparedStatements for everything.
What you are concerned about you can't control, not directly anyway
What you should focus on is what in in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely in Java memory isn't consumed by one single uber-object that is leaking ( dangling reference in other environments ).
It is most likely lots of smaller allocations because of StringBuffer/StringBuilder objects not sized appropriately on first instantantations and then having to automatically grow the char[] arrays to hold subsequent append() calls.
These intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates, but because it considers that there is plenty of memory still to be had that it might be too expensive time wise to flush them out at that point in time, and it will wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic, if you are doing degenerate things, it will cause it to not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These un-referenced objects may just have not reached the time that the garbage collector thinks it needs them to for them to be expunged from memory, or there could be references to them held by some other object ( List ) for example that you don't realize still points to that object. This is what is most commonly referred to as a leak in Java, which is a reference leak more specifically.
EXAMPLE: If you know you need to build a 4K String using a StringBuilder create it with new StringBuilder(4096); not the default, which is like 32 and will immediately start creating garbage that can represent many times what you think the object should be size wise.
You can discover how many of what types of objects are instantiated with VisualVM, this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class that says, "This is the big memory consumer!", that is unless there is only one instance of some char[] that you are reading some massive file into, and this is not possible either, because lots of other classes use char[] internally; and then you pretty much knew that already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code, the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manual memory management language / runtime. You don't get to decide what is in memory or not at the detail level you are expecting you should be able to.
In JProfiler, you can take go to the heap walker and activate the biggest objects view. You will see the objects the retain most memory. "Retained" memory is the memory that would be freed by the garbage collector if you removed the object.
You can then open the object nodes to see the reference tree of the retained objects. Here's a screen shot of the biggest object view:
Disclaimer: My company develops JProfiler
I would recommend capturing heap dumps and using a tool like Eclipse MAT that lets you analyze them. There are many tutorials available. It provides a view of the dominator tree to provide insight into the relationships between the objects on the heap. Specifically for what you mentioned, the "path to GC roots" feature of MAT will tell you where the majority of those char[], String[] and int[] objects are being referenced. JVisualVM can also be useful in identifying leaks and allocations, particularly by using snapshots with allocation stack traces. There are quite a few walk-throughs of the process of getting the snapshots and comparing them to find the allocation point.
Java JDK comes with JVisualVM under bin folder, once your application server (for example is running) you can run visualvm and connect it to your localhost, which will provide you memory allocation and enable you to perform heap dump
If you use visualVM to check your memory usage, it focuses on the data, not the methods. Maybe your big char[] data is caused by many String values? Unless you are using recursion, the data will not be from local variables. So you can focus on the methods that insert elements into large data structures. To find out what precise statements cause your "memory leakage", I suggest you additionally
read Josh Bloch's Effective Java Item 6: (Eliminate obsolete object references)
use a logging framework an log instance creations on the highest verbosity level.
There are generally two distinct approaches to analyse Java code to gain an understanding of its memory allocation profile. If you're trying to measure the impact of a specific, small section of code – say you want to compare two alternative implementations in order to decide which one gives better runtime performance – you would use a microbenchmarking tool such as JMH.
While you can pause the running program, the JVM is a sophisticated runtime that performs a variety of housekeeping tasks and it's really hard to get a "point in time" snapshot and an accurate reading of the "level of memory usage". It might allocate/free memory at a rate that does not directly reflect the behaviour of the running Java program. Similarly, performing a Java object heap dump does not fully capture the low-level machine specific memory layout that dictates the actual memory footprint, as this could depend on the machine architecture, JVM version, and other runtime factors.
Tools like JMH get around this by repeatedly running a small section of code, and observing a long-running average of memory allocations across a number of invocations. E.g. in the GC profiling sample JMH benchmark the derived *·gc.alloc.rate.norm metric gives a reasonably accurate per-invocation normalised memory cost.
In the more general case, you can attach a profiler to a running application and get JVM-level metrics, or perform a heap dump for offline analysis. Some commonly used tools for profiling full applications are Async Profiler and the newly open-sourced Java Flight Recorder in conjunction with Java Mission Control to visualise results.

Dynamic Memory Handling Java vs C++

I am a C++ programmer currently trying to work on Java. Working on C++ I have an habit of keeping track of dynamic memory allocations and employing various techniques like RAII to avoid memory leak. Java as we know provides a Garbage Collector(GC) to take care of memory leaks.So while programing in Java should one just let go all the wholesome worries of heap memory and leave it for GC to take care of the memory leaks or should one have a approach similar to that while programming languages without GC, try to take care of the memory you allocate and just let GC take care of ones that you might miss out. What should be the approach? What are the downsides of either?
I'm not sure what you mean by trying to take care of the memory you allocate in presence of a GC, but I'll try some mind reading.
Basically, you shouldn't "worry" about your memory being collected. If you don't reference objects anymore, they will be picked up. Logical memory leaks are still possible if you create a situation where objects are referenced for the rest of your program (example: register listeners and never un-register them, example: implementing a vector-like collection that doesn't set items to null when removing items off the end).
However, if you have a strong RAII background, you'll be disapointed to learn that there is no direct equivalent in Java. The GC is a first-class tool for dealing with memory, but there is no guaranteed on when (or even if) finalizers are called. This means that the first-class treatment applied to memory is not applied to any other resource, such as: windows, database connections, sockets, files, synchronization primitives, etc.
With Java and .net (and i imagine, most other GC'ed languages), you don't need to worry much about heap memory at all. What you do need to worry about, is native resources like file handles, sockets, GUI primitives like fonts and such. Those generally need to be "disposed", which releases the native resources. (They often dispose themselves on finalization anyway, but it's kinda iffy to fall back on that. Dispose stuff yourself.)
With a GC, you have to:
still take care to properly release non-memory resources like file handles and DB connections.
make sure you don't keep references to objects you don't need anymore (like keeping them in collections). Because if you do that, you have a memory leak.
Apart from that, you can't really "take care of the memory you allocate", and trying to do so would be a waste of time.
Technically you don't need to worry about cleaning up after memory allocations since all objects are properly reference counted and the GC will take care of everything. In practice, an overactive GC will negatively impact performance. So while Java does not have a delete operator, you will do well to reuse objects as much as possible.
Also Java does not have destructors since objects will exists until the GC gets to them,. Java therefore has the finally construct which you should use to ensure that all non-memory related resources (files sockets etc) are closed when you are finished with them. Do not rely on the finalise method to do this for you.
In Java the GC takes care of allocating memory and freeing unused memory. This does not mean you can disregard the issue alltogether.
The Java GC frees objects that have are not referenced from the root. This means that Java can still have memory leaks if you are not carefull to remove references from global contexts like caches in global HashMaps, etc.
If any cluster of objects that reference eachother are not referenced from the root, the Java GC will free them. I.e. it does not work with reference counts, so you do not need to null all object references (although some coding styles do prefer clearing references as sonn as they are not needed anymore.)

Why is it bad practice to call System.gc()?

After answering a question about how to force-free objects in Java (the guy was clearing a 1.5GB HashMap) with System.gc(), I was told it's bad practice to call System.gc() manually, but the comments were not entirely convincing. In addition, no one seemed to dare to upvote, nor downvote my answer.
I was told there that it's bad practice, but then I was also told that garbage collector runs don't systematically stop the world anymore, and that it could also effectively be used by the JVM only as a hint, so I'm kind of at loss.
I do understand that the JVM usually knows better than you when it needs to reclaim memory. I also understand that worrying about a few kilobytes of data is silly. I also understand that even megabytes of data isn't what it was a few years back. But still, 1.5 gigabytes? And you know there's like 1.5 GB of data hanging around in memory; it's not like it's a shot in the dark. Is System.gc() systematically bad, or is there some point at which it becomes okay?
So the question is actually double:
Why is or isn't it bad practice to call System.gc()? Is it really merely a hint to the JVM under certain implementations, or is it always a full collection cycle? Are there really garbage collector implementations that can do their work without stopping the world? Please shed some light over the various assertions people have made in the comments to my answer.
Where's the threshold? Is it never a good idea to call System.gc(), or are there times when it's acceptable? If so, what are those times?
The reason everyone always says to avoid System.gc() is that it is a pretty good indicator of fundamentally broken code. Any code that depends on it for correctness is certainly broken; any that rely on it for performance are most likely broken.
You don't know what sort of garbage collector you are running under. There are certainly some that do not "stop the world" as you assert, but some JVMs aren't that smart or for various reasons (perhaps they are on a phone?) don't do it. You don't know what it's going to do.
Also, it's not guaranteed to do anything. The JVM may just entirely ignore your request.
The combination of "you don't know what it will do," "you don't know if it will even help," and "you shouldn't need to call it anyway" are why people are so forceful in saying that generally you shouldn't call it. I think it's a case of "if you need to ask whether you should be using this, you shouldn't"
EDIT to address a few concerns from the other thread:
After reading the thread you linked, there's a few more things I'd like to point out.
First, someone suggested that calling gc() may return memory to the system. That's certainly not necessarily true - the Java heap itself grows independently of Java allocations.
As in, the JVM will hold memory (many tens of megabytes) and grow the heap as necessary. It doesn't necessarily return that memory to the system even when you free Java objects; it is perfectly free to hold on to the allocated memory to use for future Java allocations.
To show that it's possible that System.gc() does nothing, view
JDK bug 6668279
and in particular that there's a -XX:DisableExplicitGC VM option:
By default calls to System.gc() are enabled (-XX:-DisableExplicitGC). Use -XX:+DisableExplicitGC to disable calls to System.gc(). Note that the JVM still performs garbage collection when necessary.
It has already been explained that calling system.gc() may do nothing, and that any code that "needs" the garbage collector to run is broken.
However, the pragmatic reason that it is bad practice to call System.gc() is that it is inefficient. And in the worst case, it is horribly inefficient! Let me explain.
A typical GC algorithm identifies garbage by traversing all non-garbage objects in the heap, and inferring that any object not visited must be garbage. From this, we can model the total work of a garbage collection consists of one part that is proportional to the amount of live data, and another part that is proportional to the amount of garbage; i.e. work = (live * W1 + garbage * W2).
Now suppose that you do the following in a single-threaded application.
System.gc(); System.gc();
The first call will (we predict) do (live * W1 + garbage * W2) work, and get rid of the outstanding garbage.
The second call will do (live* W1 + 0 * W2) work and reclaim nothing. In other words we have done (live * W1) work and achieved absolutely nothing.
We can model the efficiency of the collector as the amount of work needed to collect a unit of garbage; i.e. efficiency = (live * W1 + garbage * W2) / garbage. So to make the GC as efficient as possible, we need to maximize the value of garbage when we run the GC; i.e. wait until the heap is full. (And also, make the heap as big as possible. But that is a separate topic.)
If the application does not interfere (by calling System.gc()), the GC will wait until the heap is full before running, resulting in efficient collection of garbage1. But if the application forces the GC to run, the chances are that the heap won't be full, and the result will be that garbage is collected inefficiently. And the more often the application forces GC, the more inefficient the GC becomes.
Note: the above explanation glosses over the fact that a typical modern GC partitions the heap into "spaces", the GC may dynamically expand the heap, the application's working set of non-garbage objects may vary and so on. Even so, the same basic principal applies across the board to all true garbage collectors2. It is inefficient to force the GC to run.
1 - This is how the "throughput" collector works. Concurrent collectors such as CMS and G1 use different criteria to decide when to start the garbage collector.
2 - I'm also excluding memory managers that use reference counting exclusively, but no current Java implementation uses that approach ... for good reason.
Lots of people seem to be telling you not to do this. I disagree. If, after a large loading process like loading a level, you believe that:
You have a lot of objects that are unreachable and may not have been gc'ed. and
You think the user could put up with a small slowdown at this point
there is no harm in calling System.gc(). I look at it like the c/c++ inline keyword. It's just a hint to the gc that you, the developer, have decided that time/performance is not as important as it usually is and that some of it could be used reclaiming memory.
Advice to not rely on it doing anything is correct. Don't rely on it working, but giving the hint that now is an acceptable time to collect is perfectly fine. I'd rather waste time at a point in the code where it doesn't matter (loading screen) than when the user is actively interacting with the program (like during a level of a game.)
There is one time when i will force collection: when attempting to find out is a particular object leaks (either native code or large, complex callback interaction. Oh and any UI component that so much as glances at Matlab.) This should never be used in production code.
People have been doing a good job explaining why NOT to use, so I will tell you a couple situations where you should use it:
(The following comments apply to Hotspot running on Linux with the CMS collector, where I feel confident saying that System.gc() does in fact always invoke a full garbage collection).
After the initial work of starting up your application, you may be a terrible state of memory usage. Half your tenured generation could be full of garbage, meaning that you are that much closer to your first CMS. In applications where that matters, it is not a bad idea to call System.gc() to "reset" your heap to the starting state of live data.
Along the same lines as #1, if you monitor your heap usage closely, you want to have an accurate reading of what your baseline memory usage is. If the first 2 minutes of your application's uptime is all initialization, your data is going to be messed up unless you force (ahem... "suggest") the full gc up front.
You may have an application that is designed to never promote anything to the tenured generation while it is running. But maybe you need to initialize some data up-front that is not-so-huge as to automatically get moved to the tenured generation. Unless you call System.gc() after everything is set up, your data could sit in the new generation until the time comes for it to get promoted. All of a sudden your super-duper low-latency, low-GC application gets hit with a HUGE (relatively speaking, of course) latency penalty for promoting those objects during normal operations.
It is sometimes useful to have a System.gc call available in a production application for verifying the existence of a memory leak. If you know that the set of live data at time X should exist in a certain ratio to the set of live data at time Y, then it could be useful to call System.gc() a time X and time Y and compare memory usage.
This is a very bothersome question, and I feel contributes to many being opposed to Java despite how useful of a language it is.
The fact that you can't trust "System.gc" to do anything is incredibly daunting and can easily invoke "Fear, Uncertainty, Doubt" feel to the language.
In many cases, it is nice to deal with memory spikes that you cause on purpose before an important event occurs, which would cause users to think your program is badly designed/unresponsive.
Having ability to control the garbage collection would be very a great education tool, in turn improving people's understanding how the garbage collection works and how to make programs exploit it's default behavior as well as controlled behavior.
Let me review the arguments of this thread.
It is inefficient:
Often, the program may not be doing anything and you know it's not doing anything because of the way it was designed. For instance, it might be doing some kind of long wait with a large wait message box, and at the end it may as well add a call to collect garbage because the time to run it will take a really small fraction of the time of the long wait but will avoid gc from acting up in the middle of a more important operation.
It is always a bad practice and indicates broken code.
I disagree, it doesn't matter what garbage collector you have. Its' job is to track garbage and clean it.
By calling the gc during times where usage is less critical, you reduce odds of it running when your life relies on the specific code being run but instead it decides to collect garbage.
Sure, it might not behave the way you want or expect, but when you do want to call it, you know nothing is happening, and user is willing to tolerate slowness/downtime. If the System.gc works, great! If it doesn't, at least you tried. There's simply no down side unless the garbage collector has inherent side effects that do something horribly unexpected to how a garbage collector is suppose to behave if invoked manually, and this by itself causes distrust.
It is not a common use case:
It is a use case that cannot be achieved reliably, but could be if the system was designed that way. It's like making a traffic light and making it so that some/all of the traffic lights' buttons don't do anything, it makes you question why the button is there to begin with, javascript doesn't have garbage collection function so we don't scrutinize it as much for it.
The spec says that System.gc() is a hint that GC should run and the VM is free to ignore it.
what is a "hint"? what is "ignore"? a computer cannot simply take hints or ignore something, there are strict behavior paths it takes that may be dynamic that are guided by the intent of the system. A proper answer would include what the garbage collector is actually doing, at implementation level, that causes it to not perform collection when you request it. Is the feature simply a nop? Is there some kind of conditions that must me met? What are these conditions?
As it stands, Java's GC often seems like a monster that you just don't trust. You don't know when it's going to come or go, you don't know what it's going to do, how it's going to do it. I can imagine some experts having better idea of how their Garbage Collection works on per-instruction basis, but vast majority simply hopes it "just works", and having to trust an opaque-seeming algorithm to do work for you is frustrating.
There is a big gap between reading about something or being taught something, and actually seeing the implementation of it, the differences across systems, and being able to play with it without having to look at the source code. This creates confidence and feeling of mastery/understanding/control.
To summarize, there is an inherent problem with the answers "this feature might not do anything, and I won't go into details how to tell when it does do something and when it doesn't and why it won't or will, often implying that it is simply against the philosophy to try to do it, even if the intent behind it is reasonable".
It might be okay for Java GC to behave the way it does, or it might not, but to understand it, it is difficult to truly follow in which direction to go to get a comprehensive overview of what you can trust the GC to do and not to do, so it's too easy simply distrust the language, because the purpose of a language is to have controlled behavior up to philosophical extent(it's easy for a programmer, especially novices to fall into existential crisis from certain system/language behaviors) you are capable of tolerating(and if you can't, you just won't use the language until you have to), and more things you can't control for no known reason why you can't control them is inherently harmful.
Sometimes (not often!) you do truly know more about past, current and future memory usage then the run time does. This does not happen very often, and I would claim never in a web application while normal pages are being served.
Many year ago I work on a report generator, that
Had a single thread
Read the “report request” from a queue
Loaded the data needed for the report from the database
Generated the report and emailed it out.
Repeated forever, sleeping when there were no outstanding requests.
It did not reuse any data between reports and did not do any cashing.
Firstly as it was not real time and the users expected to wait for a report, a delay while the GC run was not an issue, but we needed to produce reports at a rate that was faster than they were requested.
Looking at the above outline of the process, it is clear that.
We know there would be very few live objects just after a report had been emailed out, as the next request had not started being processed yet.
It is well known that the cost of running a garbage collection cycle is depending on the number of live objects, the amount of garbage has little effect on the cost of a GC run.
That when the queue is empty there is nothing better to do then run the GC.
Therefore clearly it was well worth while doing a GC run whenever the request queue was empty; there was no downside to this.
It may be worth doing a GC run after each report is emailed, as we know this is a good time for a GC run. However if the computer had enough ram, better results would be obtained by delaying the GC run.
This behaviour was configured on a per installation bases, for some customers enabling a forced GC after each report greatly speeded up the production of reports. (I expect this was due to low memory on their server and it running lots of other processes, so hence a well time forced GC reduced paging.)
We never detected an installation that did not benefit from a forced GC run every time the work queue was empty.
But, let be clear, the above is not a common case.
These days I would be more inclined to run each report in a seperate process leaving the operating system to clear up memory rather then the garbage collector and having the custom queue manager service use mulple working processes on large servers.
GC efficiency relies on a number of heuristics. For instance, a common heuristic is that write accesses to objects usually occur on objects which were created not long ago. Another is that many objects are very short-lived (some objects will be used for a long time, but many will be discarded a few microseconds after their creation).
Calling System.gc() is like kicking the GC. It means: "all those carefully tuned parameters, those smart organizations, all the effort you just put into allocating and managing the objects such that things go smoothly, well, just drop the whole lot, and start from scratch". It may improve performance, but most of the time it just degrades performance.
To use System.gc() reliably(*) you need to know how the GC operates in all its fine details. Such details tend to change quite a bit if you use a JVM from another vendor, or the next version from the same vendor, or the same JVM but with slightly different command-line options. So it is rarely a good idea, unless you want to address a specific issue in which you control all those parameters. Hence the notion of "bad practice": that's not forbidden, the method exists, but it rarely pays off.
(*) I am talking about efficiency here. System.gc() will never break a correct Java program. It will neither conjure extra memory that the JVM could not have obtained otherwise: before throwing an OutOfMemoryError, the JVM does the job of System.gc(), even if as a last resort.
Maybe I write crappy code, but I've come to realize that clicking the trash-can icon on eclipse and netbeans IDEs is a 'good practice'.
Some of what I am about to write is simply a summarization of what has already been written in other answers, and some is new.
The question "Why is it bad practice to call System.gc()?" does not compute. It assumes that it is bad practice, while it is not. It greatly depends on what you are trying to accomplish.
The vast majority of programmers out there have no need for System.gc(), and it will never do anything useful to them in the vast majority of use cases. So, for the majority, calling it is bad practice because it will not do whatever it is that they think it will do, it will only add overhead.
However, there are a few rare cases where invoking System.gc() is actually beneficial:
When you are absolutely sure that you have some CPU time to spare now, and you want to improve the throughput of code that will run later. For example, a web server that discovers that there are no pending web requests at the moment can initiate garbage collection now, so as to reduce the chances that garbage collection will be needed during the processing of a barrage of web requests later on. (Of course this can hurt if a web request arrives during collection, but the web server could be smart about it and abandon collection if a request comes in.) Desktop GUIs are another example: on the idle event (or, more broadly, after a period of inactivity,) you can give the JVM a hint that if it has any garbage collection to do, now is better than later.
When you want to detect memory leaks. This is often done in combination with a debug-mode-only finalizer, or with the java.lang.ref.Cleaner class from Java 9 onwards. The idea is that by forcing garbage collection now, and thus discovering memory leaks now as opposed to some random point in time in the future, you can detect the memory leaks as soon as possible after they have happened, and therefore be in a better position to tell precisely which piece of code has leaked memory and why. (Incidentally, this is also one of, or perhaps the only, legitimate use cases for finalizers or the Cleaner. The practice of using finalization for recycling of unmanaged resources is flawed, despite being very widespread and even officially recommended, because it is non-deterministic. For more on this topic, read this: https://blog.michael.gr/2021/01/object-lifetime-awareness.html)
When you are measuring the performance of code, (benchmarking,) in order to reduce/minimize the chances of garbage collection occurring during the benchmark, or in order to guarantee that whatever overhead is suffered due to garbage collection during the benchmark is due to garbage generated by the code under benchmark, and not by unrelated code. A good benchmark always starts with an as thorough as possible garbage collection.
When you are measuring the memory consumption of code, in order to determine how much garbage is generated by a piece of code. The idea is to perform a full garbage collection so as to start in a clean state, run the code under measurement, obtain the heap size, then do another full garbage collection, obtain the heap size again, and take the difference. (Incidentally, the ability to temporarily suppress garbage collection while running the code under measurement would be useful here, alas, the JVM does not support that. This is deplorable.)
Note that of the above use cases, only one is in a production scenario; the rest are in testing / diagnostics scenarios.
This means that System.gc() can be quite useful under some circumstances, which in turn means that it being "only a hint" is problematic.
(For as long as the JVM is not offering some deterministic and guaranteed means of controlling garbage collection, the JVM is broken in this respect.)
Here is how you can turn System.gc() into a bit less of a hint:
private static void runGarbageCollection()
{
for( WeakReference<Object> ref = new WeakReference<>( new Object() ); ; )
{
System.gc(); //optional
Runtime.getRuntime().runFinalization(); //optional
if( ref.get() == null )
break;
Thread.yield();
}
}
This still does not guarantee that you will get a full GC, but it gets a lot closer. Specifically, it will give you some amount of garbage collection even if the -XX:DisableExplicitGC VM option has been used. (So, it truly uses System.gc() as a hint; it does not rely on it.)
Yes, calling System.gc() doesn't guarantee that it will run, it's a request to the JVM that may be ignored. From the docs:
Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects
It's almost always a bad idea to call it because the automatic memory management usually knows better than you when to gc. It will do so when its internal pool of free memory is low, or if the OS requests some memory be handed back.
It might be acceptable to call System.gc() if you know that it helps. By that I mean you've thoroughly tested and measured the behaviour of both scenarios on the deployment platform, and you can show it helps. Be aware though that the gc isn't easily predictable - it may help on one run and hurt on another.
First, there is a difference between spec and reality. The spec says that System.gc() is a hint that GC should run and the VM is free to ignore it. The reality is, the VM will never ignore a call to System.gc().
Calling GC comes with a non-trivial overhead to the call and if you do this at some random point in time it's likely you'll see no reward for your efforts. On the other hand, a naturally triggered collection is very likely to recoup the costs of the call. If you have information that indicates that a GC should be run than you can make the call to System.gc() and you should see benefits. However, it's my experience that this happens only in a few edge cases as it's very unlikely that you'll have enough information to understand if and when System.gc() should be called.
One example listed here, hitting the garbage can in your IDE. If you're off to a meeting why not hit it. The overhead isn't going to affect you and heap might be cleaned up for when you get back. Do this in a production system and frequent calls to collect will bring it to a grinding halt! Even occasional calls such as those made by RMI can be disruptive to performance.
In my experience, using System.gc() is effectively a platform-specific form of optimization (where "platform" is the combination of hardware architecture, OS, JVM version and possible more runtime parameters such as RAM available), because its behaviour, while roughly predictable on a specific platform, can (and will) vary considerably between platforms.
Yes, there are situations where System.gc() will improve (perceived) performance. On example is if delays are tolerable in some parts of your app, but not in others (the game example cited above, where you want GC to happen at the start of a level, not during the level).
However, whether it will help or hurt (or do nothing) is highly dependent on the platform (as defined above).
So I think it is valid as a last-resort platform-specific optimization (i.e. if other performance optimizations are not enough). But you should never call it just because you believe it might help(without specific benchmarks), because chances are it will not.
Since objects are dynamically allocated by using the new operator,
you might be wondering how such objects are destroyed and their
memory released for later reallocation.
In some languages, such as C++, dynamically allocated objects must
be manually released by use of a delete operator.
Java takes a different approach; it handles deallocation for you
automatically.
The technique that accomplishes this is called garbage collection.
It works like this: when no references to an object exist, that object is assumed to be no longer needed, and the memory occupied by the object can be reclaimed. There is no explicit need to destroy objects as in C++.
Garbage collection only occurs sporadically (if at all) during the
execution of your program.
It will not occur simply because one or more objects exist that are
no longer used.
Furthermore, different Java run-time implementations will take
varying approaches to garbage collection, but for the most part, you
should not have to think about it while writing your programs.

Categories