Related
Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that is often counter intuitive to try and outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method and I am curious if this is a good practice to pick up or one of those voodoo programming superstitions?
Typically, no.
But like all things: it depends. The GC in Java these days is VERY good and everything should be cleaned up very shortly after it is no longer reachable. This is just after leaving a method for local variables, and when a class instance is no longer referenced for fields.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
RangeCheck(index);
modCount++;
E oldValue = (E) elementData[index];
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // Let gc do its work
return oldValue;
}
Also, explicitly nulling an object will not cause an object to be collected any sooner than if it just went out of scope naturally as long as no references remain.
Both:
void foo() {
Object o = new Object();
/// do stuff with o
}
and:
void foo() {
Object o = new Object();
/// do stuff with o
o = null;
}
Are functionally equivalent.
In my experience, more often than not, people null out references out of paranoia not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
Good article is today's coding horror.
The way GC's work is by looking for objects that do not have any pointers to them, the area of their search is heap/stack and any other spaces they have. So if you set a variable to null, the actual object is now not pointed by anyone, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. But if your method is fairly long (in terms of execution time) it might be worth it since you will be increasing your chances of GC collecting that object.
The problem can also be complicated with code optimizations, if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.
At least in java, it's not voodoo programming at all. When you create an object in java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. So long as that reference or another exists, the specific object can't be gc'd. however, when you assign null to that reference...
bar = null ;
and assuming nothing else has a reference to the object, it's freed and available for gc the next time the garbage collector passes by.
It depends.
Generally speaking shorter you keep references to your objects, faster they'll get collected.
If your method takes say 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If GC sees that after one second, your object is still referenced, next time it might check it in a minute or so.
Anyway, setting all references to null by default is to me premature optimization and nobody should do it unless in specific rare cases where it measurably decreases memory consuption.
Explicitly setting a reference to null instead of just letting the variable go out of scope, does not help the garbage collector, unless the object held is very large, where setting it to null as soon as you are done with is a good idea.
Generally setting references to null, mean to the READER of the code that this object is completely done with and should not be concerned about any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
int l;
{ // <- here
String bigThing = ....;
l = bigThing.length();
} // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.
public class JavaMemory {
private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);
public void f() {
{
byte[] data = new byte[dataSize];
//data = null;
}
byte[] data2 = new byte[dataSize];
}
public static void main(String[] args) {
JavaMemory jmp = new JavaMemory();
jmp.f();
}
}
Above program throws OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError is solved. It is always good practice to set the unused variable to null
I was working on a video conferencing application one time and noticed a huge huge huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004 and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However after I made it a point to null objects the GC stopped pausing my application.
So it depends on what your doing...
Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)
I assume the OP is referring to things like this:
private void Blah()
{
MyObj a;
MyObj b;
try {
a = new MyObj();
b = new MyObj;
// do real work
} finally {
a = null;
b = null;
}
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to get GC'd before they would if they just went out of scope? If so, the VM may spend time GC'ing the object when the memory isn't needed anyway, which would actually cause worse performance CPU usage wise because it would be GC'ing more earlier.
Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these ugly nullifications? They would only be clutter and obscure the intent code that contains them.
Its a rare codebase that has no better candidate for optimisation than trying to outsmart the Garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere instead, ditching that crufty Xml parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require you dirty up your codebase with noise.
Oracle doc point out "Assign null to Variables That Are No Longer Needed" https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html
"It depends"
I do not know about Java but in .net (C#, VB.net...) it is usually not required to assign a null when you no longer require a object.
However note that it is "usually not required".
By analyzing your code the .net compiler makes a good valuation of the life time of the variable...to accurately tell when the object is not being used anymore. So if you write obj=null it might actually look as if the obj is still being used...in this case it is counter productive to assign a null.
There are a few cases where it might actually help to assign a null. One example is you have a huge code that runs for long time or a method that is running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know its not being used anymore.
There is no hard & fast rule for this. Going by the above place null-assigns in your code and do run a profiler to see if it helps in any way. Most probably you might not see a benefit.
If it is .net code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx
In the future execution of your program, the values of some data members will be used to computer an output visible external to the program. Others might or might not be used, depending on future (And impossible to predict) inputs to the program. Other data members might be guaranteed not to be used. All resources, including memory, allocated to those unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, and which do not affect the correctness of your program, might nevertheless affect the operation of the GC, either making it run faster to do the same job, or to sooner identify unused items. So this kind of change, setting an unusdd object reference to null, in theory is not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.
As you said there are optimizations, i.e. JVM knows the place when the variable was last used and the object referenced by it can be GCed right after this last point (still executing in current scope). So nulling out references in most cases does not help GC.
But it can be useful to avoid "nepotism" (or "floating garbage") problem (read more here or watch video). The problem exists because heap is split into Old and Young generations and there are different GC mechanisms applied: Minor GC (which is fast and happens often to clean young gen) and Major Gc (which causes longer pause to clean Old gen). "Nepotism" does not allow for garbage in Young gen to be collected if it is referenced by garbage which was already tenured to an Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism it's a good idea to null out references from an object which is supposed to be removed. You can see this technique applied in JDK classes: LinkedList and LinkedHashMap
private E unlinkFirst(Node<E> f) {
final E element = f.item;
final Node<E> next = f.next;
f.item = null;
f.next = null; // help GC
// ...
}
Java programmers know that JVM runs a Garbage Collector, and System.gc() would just be a suggestion to JVM to run a Garbage Collector. It is not necessarily that if we use System.gc(), it would immediately run the GC.Please correct me if I misunderstand Java's Garbage Collector.
Is/are there any other way/s doing memory management other than relying on Java's Garbage Collector?If you intend to answer the question by some sort of programming practice that would help managing the memory, please do so.
The most important thing to remember about Java memory management is "nullify" your reference.
Only objects that are not referenced are to be garbage collected.
For example, objects in the following code is never get collected and your memory will be full just to do nothing.
List objs = new ArrayList();
for (int i = 0; i < Integer.MAX_VALUE; i++) objs.add(new Object());
But if you don't reference those object ... you can loop as much as you like without memory problem.
List objs = new ArrayList();
for (int i = 0; i < Integer.MAX_VALUE; i++) new Object();
So what ever you do, make sure you remove reference to object to no longer used (set reference to null or clear collection).
When the garbage collector will run is best left to JVM to decide. Well unless your program is about to start doing things that use a lot of memory and is speed critical so you may suggest JVM to run GC before going in as you may likely get the garbaged collected and extra memory to go on. Other wise, I personally see no reason to run System.gc().
Hope this helps.
Below is little summary I wrote back in the days (I stole it from some blog, but I can't remember where from - so no reference, sorry)
There is no manual way of doing garbage collection in Java.
Java Heap is divided into three generation for the sake of garbage collection. These are the young generation, tenured or old generation, and Perm area.
New objects are created in the young generation and subsequently moved to the old generation.
String pool is created in Perm area of Heap, Garbage collection can occur in perm space but depends on upon JVM to JVM.
Minor garbage collection is used to move an object from Eden space to Survivor 1 and Survivor 2 space, and Major collection is used to move an object from young to tenured generation.
Whenever Major garbage collection occurs application, threads stops during that period which will reduce application’s performance and throughput.
There are few performance improvements has been applied in garbage collection in Java 6 and we usually use JRE 1.6.20 for running our application.
JVM command line options -Xms and -Xmx is used to setup starting and max size for Java Heap. The ideal ratio of this parameter is either 1:1 or 1:1.5 based on my experience, for example, you can have either both –Xmx and –Xms as 1GB or –Xms 1.2 GB and 1.8 GB.
Command line options: -Xms:<min size> -Xmx:<max size>
Just to add to the discussion: Garbage Collection is not the only form of Memory Management in Java.
In the past, there have been efforts to avoid the GC in Java when implementing the memory management (see Real-time Specification for Java (RTSJ)). These efforts were mainly dedicated to real-time and embedded programming in Java for which GC was not suitable - due to performance overhead or GC-introduced latency.
The RTSJ characteristics
Immortal and Scoped Memory Management - see below for examples.
GC and Immortal/Scoped Memory can coexist withing one application
RTSJ requires a specially modified JVM.
RTSJ advantages:
low latency, no GC pauses
delivers predictable performance that is able to meet real-time system requirements
Why RTSJ failed/Did not make a big impact:
Scoped Memory concept is hard to program with, error-prone and difficult to learn.
Advance in Real-time GC algoritms reduced the GC pause-time in such way that Real-time GCs replaced the RTSJ in most of the real-time apps. However, Scoped Memories are still used in places where no latencies are tolerated.
Scoped Memory Code Example (take from An Example of Scoped Memory Usage):
import javax.realtime.*;
public class ScopedMemoryExample{
private LTMemory myMem;
public ScopedMemoryExample(int Size) {
// initialize memory
myMem = new LTMemory(1000, 5000);
}
public void periodicTask() {
while (true)) {
myMem.enter(new Runnable() {
public void run() {
// do some work in the SCOPED MEMORY
new Object();
...
// end of the enter() method, the scoped Memory is emptied.
}
});
}
}
}
Here, a ScopedMemory implementation called LTMemory is preallocated. Then a thread enters the scoped memory, allocates the temporary data that are needed only during the time of the computation. After the end of the computation, the thread leaves the scoped memory which immediately makes the whole content of the specific ScopedMemory to be emptied. No latency introduced, done in constant time e.g. predictable time, no GC is triggered.
From my experience, in java you should rely on the memory management that is provided by JVM itself.
The point I'd focus on in this topic is to configure it in a way acceptable for your use case. Maybe checking/understanding JVM tuning options would be useful: http://docs.oracle.com/cd/E15523_01/web.1111/e13814/jvm_tuning.htm
You cannot avoid garbage collection if you use Java. Maybe there are some obscure JVM implementations that do, but I don't know of any.
A properly tuned JVM shouldn't require any System.gc() hints to operate smoothly. The exact tuning you would need depends heavily on what your application does, but in my experience, I always turn on the concurrent-mark-and-sweep option with the following flag: -XX:+UseConcMarkSweepGC. This flag allows the JVM to take advantage of the extra cores in your CPU to clean up dead memory on a background thread. It helps to drastically reduce the amount of time your program is forcefully paused when doing garbage collections.
Well, the GC is always there -- you can't create objects that are outside its grasp (unless you use native calls or allocate a direct byte buffer, but in the latter case you don't really have an object, just a bunch of bytes). That said, it's definitely possible to circumvent the GC by reusing objects. For instance, if you need a bunch of ArrayList objects, you could just create each one as you need it and let the GC handle memory management; or you could call list.clear() on each one after you finish with it, and put it onto some queue where somebody else can use it.
Standard best practices are to not do that sort of reuse unless you have good reason to (ie, you've profiled and seen that the allocations + GC are a problem, and that reusing objects fixes that problem). It leads to more complicated code, and if you get it wrong it can actually make the GC's job harder (because of how the GC tracks objects).
Basically the idea in Java is that you should not deal with memory except using "new" to allocate new objects and ensure that there is no references left to objects when you are done with them.
All the rest is deliberately left to the Java Runtime and is - also deliberately - defined as vaguely as possible to allow the JVM designers the most freedom in doing so efficiently.
To use an analogy: Your operating system manages named areas of harddisk space (called "files") for you. Including deleting and reusing areas you do not want to use any more. You do not circumvent that mechanism but leave it to the operating system
You should focus on writing clear, simple code and ensure that your objects are properly done with. This will give the JVM the best possible working conditions.
You are correct in saying that System.gc() is a request to the compiler and not a command. But using below program you can make sure it happens.
import java.lang.ref.WeakReference;
public class GCRun {
public static void main(String[] args) {
String str = new String("TEMP");
WeakReference<String> wr = new WeakReference<String>(str);
str = null;
String temp = wr.get();
System.out.println("temp -- " + temp);
while(wr.get() != null) {
System.gc();
}
}
}
I would suggest to take a look at the following tutorials and its contents
This is a four part tutorial series to know about the basics of garbage collection in Java:
Java Garbage Collection Introduction
How Java Garbage Collection Works?
Types of Java Garbage Collectors
Monitoring and Analyzing Java Garbage Collection
I found This tutorial very helpful.
"Nullify"ing the reference when not required is the best way to make an object eligible for Garbage collection.
There are 4 ways in which an object can be Garbage collected.
Point the reference to null, once it is no longer required.
String s = new String("Java");
Once this String is not required, you can point it to null.
s = null;
Hence, s will be eligible for Garbage collection.
Point one object to another, so that both reference points to same object and one of the object is eligible for GC.
String s1 = new String("Java");
String s2 = new String("C++");
In future if s2 also needs to pointed to s1 then;
s1 = s2;
Then the object having "Java" will be eligible for GC.
All the objects created within a method are eligible for GC once the method is completed. Hence, once the method is destroyed from the stack of the thread then the corresponding objects in that method will be destroyed.
Island of Isolation is another concept where the objects with internal links and no extrinsic link to reference is eligible for Garbage collection.
"Island of isolation" of Garbage Collection
Examples:
Below is a method of Camera class in android. See how the developer has pointed mCameraSource to null once it is not required. This is expert level code.
public void release() {
if (mCameraSource != null) {
mCameraSource.release();
mCameraSource = null;
}
}
How Garbage Collector works?
Garbage collection is performed by the daemon thread called Garbage Collector. When there is sufficient memory available that time this demon thread has low priority and it runs in background. But when JVM finds that the heap is full and JVM wants to reclaim some memory then it increases the priority of Garbage collector thread and calls Runtime.getRuntime.gc() method which searches for all the objects which are not having reference or null reference and destroys those objects.
I just had an interview where I was asked to create a memory leak with Java.
Needless to say, I felt pretty dumb, having no idea how to start creating one.
What would an example be?
Here's a good way to create a true memory leak (objects inaccessible by running code but still stored in memory) in pure Java:
The application creates a long-running thread (or use a thread pool to leak even faster).
The thread loads a class via an (optionally custom) ClassLoader.
The class allocates a large chunk of memory (e.g. new byte[1000000]), stores a strong reference to it in a static field, and then stores a reference to itself in a ThreadLocal. Allocating the extra memory is optional (leaking the class instance is enough), but it will make the leak work that much faster.
The application clears all references to the custom class or the ClassLoader it was loaded from.
Repeat.
Due to the way ThreadLocal is implemented in Oracle's JDK, this creates a memory leak:
Each Thread has a private field threadLocals, which actually stores the thread-local values.
Each key in this map is a weak reference to a ThreadLocal object, so after that ThreadLocal object is garbage-collected, its entry is removed from the map.
But each value is a strong reference, so when a value (directly or indirectly) points to the ThreadLocal object that is its key, that object will neither be garbage-collected nor removed from the map as long as the thread lives.
In this example, the chain of strong references looks like this:
Thread object → threadLocals map → instance of example class → example class → static ThreadLocal field → ThreadLocal object.
(The ClassLoader doesn't really play a role in creating the leak, it just makes the leak worse because of this additional reference chain: example class → ClassLoader → all the classes it has loaded. It was even worse in many JVM implementations, especially prior to Java 7, because classes and ClassLoaders were allocated straight into permgen and were never garbage-collected at all.)
A variation on this pattern is why application containers (like Tomcat) can leak memory like a sieve if you frequently redeploy applications which happen to use ThreadLocals that in some way point back to themselves. This can happen for a number of subtle reasons and is often hard to debug and/or fix.
Update: Since lots of people keep asking for it, here's some example code that shows this behavior in action.
Static field holding an object reference [especially a final field]
class MemorableClass {
static final ArrayList list = new ArrayList(100);
}
(Unclosed) open streams (file , network, etc.)
try {
BufferedReader br = new BufferedReader(new FileReader(inputFile));
...
...
} catch (Exception e) {
e.printStackTrace();
}
Unclosed connections
try {
Connection conn = ConnectionFactory.getConnection();
...
...
} catch (Exception e) {
e.printStackTrace();
}
Areas that are unreachable from JVM's garbage collector, such as memory allocated through native methods.
In web applications, some objects are stored in application scope until the application is explicitly stopped or removed.
getServletContext().setAttribute("SOME_MAP", map);
Incorrect or inappropriate JVM options, such as the noclassgc option on IBM JDK that prevents unused class garbage collection
See IBM JDK settings.
A simple thing to do is to use a HashSet with an incorrect (or non-existent) hashCode() or equals(), and then keep adding "duplicates". Instead of ignoring duplicates as it should, the set will only ever grow and you won't be able to remove them.
If you want these bad keys/elements to hang around you can use a static field like
class BadKey {
// no hashCode or equals();
public final String key;
public BadKey(String key) { this.key = key; }
}
Map map = System.getProperties();
map.put(new BadKey("key"), "value"); // Memory leak even if your threads die.
Below there will be a non-obvious case where Java leaks, besides the standard case of forgotten listeners, static references, bogus/modifiable keys in hashmaps, or just threads stuck without any chance to end their life-cycle.
File.deleteOnExit() - always leaks the string, if the string is a substring, the leak is even worse (the underlying char[] is also leaked) - in Java 7 substring also copies the char[], so the later doesn't apply; #Daniel, no needs for votes, though.
I'll concentrate on threads to show the danger of unmanaged threads mostly, don't wish to even touch swing.
Runtime.addShutdownHook and not remove... and then even with removeShutdownHook due to a bug in ThreadGroup class regarding unstarted threads it may not get collected, effectively leak the ThreadGroup. JGroup has the leak in GossipRouter.
Creating, but not starting, a Thread goes into the same category as above.
Creating a thread inherits the ContextClassLoader and AccessControlContext, plus the ThreadGroup and any InheritedThreadLocal, all those references are potential leaks, along with the entire classes loaded by the classloader and all static references, and ja-ja. The effect is especially visible with the entire j.u.c.Executor framework that features a super simple ThreadFactory interface, yet most developers have no clue of the lurking danger. Also a lot of libraries do start threads upon request (way too many industry popular libraries).
ThreadLocal caches; those are evil in many cases. I am sure everyone has seen quite a bit of simple caches based on ThreadLocal, well the bad news: if the thread keeps going more than expected the life the context ClassLoader, it is a pure nice little leak. Do not use ThreadLocal caches unless really needed.
Calling ThreadGroup.destroy() when the ThreadGroup has no threads itself, but it still keeps child ThreadGroups. A bad leak that will prevent the ThreadGroup to remove from its parent, but all the children become un-enumerateable.
Using WeakHashMap and the value (in)directly references the key. This is a hard one to find without a heap dump. That applies to all extended Weak/SoftReference that might keep a hard reference back to the guarded object.
Using java.net.URL with the HTTP(S) protocol and loading the resource from(!). This one is special, the KeepAliveCache creates a new thread in the system ThreadGroup which leaks the current thread's context classloader. The thread is created upon the first request when no alive thread exists, so either you may get lucky or just leak. The leak is already fixed in Java 7 and the code that creates thread properly removes the context classloader. There are few more cases (like ImageFetcher, also fixed) of creating similar threads.
Using InflaterInputStream passing new java.util.zip.Inflater() in the constructor (PNGImageDecoder for instance) and not calling end() of the inflater. Well, if you pass in the constructor with just new, no chance... And yes, calling close() on the stream does not close the inflater if it's manually passed as constructor parameter. This is not a true leak since it'd be released by the finalizer... when it deems it necessary. Till that moment it eats native memory so badly it can cause Linux oom_killer to kill the process with impunity. The main issue is that finalization in Java is very unreliable and G1 made it worse till 7.0.2. Moral of the story: release native resources as soon as you can; the finalizer is just too poor.
The same case with java.util.zip.Deflater. This one is far worse since Deflater is memory hungry in Java, i.e. always uses 15 bits (max) and 8 memory levels (9 is max) allocating several hundreds KB of native memory. Fortunately, Deflater is not widely used and to my knowledge JDK contains no misuses. Always call end() if you manually create a Deflater or Inflater. The best part of the last two: you can't find them via normal profiling tools available.
(I can add some more time wasters I have encountered upon request.)
Good luck and stay safe; leaks are evil!
Most examples here are "too complex". They are edge cases. With these examples, the programmer made a mistake (like don't redefining equals/hashcode), or has been bitten by a corner case of the JVM/JAVA (load of class with static...). I think that's not the type of example an interviewer want or even the most common case.
But there are really simpler cases for memory leaks. The garbage collector only frees what is no longer referenced. We as Java developers don't care about memory. We allocate it when needed and let it be freed automatically. Fine.
But any long-lived application tend to have shared state. It can be anything, statics, singletons... Often non-trivial applications tend to make complex objects graphs. Just forgetting to set a reference to null or more often forgetting to remove one object from a collection is enough to make a memory leak.
Of course all sort of listeners (like UI listeners), caches, or any long-lived shared state tend to produce memory leak if not properly handled. What shall be understood is that this is not a Java corner case, or a problem with the garbage collector. It is a design problem. We design that we add a listener to a long-lived object, but we don't remove the listener when no longer needed. We cache objects, but we have no strategy to remove them from the cache.
We maybe have a complex graph that store the previous state that is needed by a computation. But the previous state is itself linked to the state before and so on.
Like we have to close SQL connections or files. We need to set proper references to null and remove elements from the collection. We shall have proper caching strategies (maximum memory size, number of elements, or timers). All objects that allow a listener to be notified must provide both a addListener and removeListener method. And when these notifiers are no longer used, they must clear their listener list.
A memory leak is indeed truly possible and is perfectly predictable. No need for special language features or corner cases. Memory leaks are either an indicator that something is maybe missing or even of design problems.
The answer depends entirely on what the interviewer thought they were asking.
Is it possible in practice to make Java leak? Of course it is, and there are plenty of examples in the other answers.
But there are multiple meta-questions that may have been being asked?
Is a theoretically "perfect" Java implementation vulnerable to leaks?
Does the candidate understand the difference between theory and reality?
Does the candidate understand how garbage collection works?
Or how garbage collection is supposed to work in an ideal case?
Do they know they can call other languages through native interfaces?
Do they know to leak memory in those other languages?
Does the candidate even know what memory management is, and what is going on behind the scene in Java?
I'm reading your meta-question as "What's an answer I could have used in this interview situation". And hence, I'm going to focus on interview skills instead of Java. I believe you're more likely to repeat the situation of not knowing the answer to a question in an interview than you are to be in a place of needing to know how to make Java leak. So, hopefully, this will help.
One of the most important skills you can develop for interviewing is learning to actively listen to the questions and working with the interviewer to extract their intent. Not only does this let you answer their question the way they want, but also shows that you have some vital communication skills. And when it comes down to a choice between many equally talented developers, I'll hire the one who listens, thinks, and understands before they respond every time.
The following is a pretty pointless example if you do not understand JDBC. Or at least how JDBC expects a developer to close Connection, Statement, and ResultSet instances before discarding them or losing references to them, instead of relying on implementing the finalize method.
void doWork() {
try {
Connection conn = ConnectionFactory.getConnection();
PreparedStatement stmt = conn.preparedStatement("some query");
// executes a valid query
ResultSet rs = stmt.executeQuery();
while(rs.hasNext()) {
// ... process the result set
}
} catch(SQLException sqlEx) {
log(sqlEx);
}
}
The problem with the above is that the Connection object is not closed, and hence the physical Connection will remain open until the garbage collector comes around and sees that it is unreachable. GC will invoke the finalize method, but there are JDBC drivers that do not implement the finalize, at least not in the same way that Connection.close is implemented. The resulting behavior is that while the JVM will reclaim memory due to unreachable objects being collected, resources (including memory) associated with the Connection object might not be reclaimed.
As such, Connection's final method does not clean up everything. One might find that the physical Connection to the database server will last several garbage collection cycles until the database server eventually figures out that the Connection is not alive (if it does) and should be closed.
Even if the JDBC driver implemented finalize, the compiler can throw exceptions during finalization. The resulting behavior is that any memory associated with the now "dormant" object will not be reclaimed by the compiler, as finalize is guaranteed to be invoked only once.
The above scenario of encountering exceptions during object finalization is related to another scenario that could lead to a memory leak - object resurrection. Object resurrection is often done intentionally by creating a strong reference to the object from being finalized, from another object. When object resurrection is misused it will lead to a memory leak in combination with other sources of memory leaks.
There are plenty more examples that you can conjure up - like
Managing a List instance where you are only adding to the list and not deleting from it (although you should be getting rid of elements you no longer need), or
Opening Sockets or Files, but not closing them when they are no longer needed (similar to the above example involving the Connection class).
Not unloading Singletons when bringing down a Java EE application. The Classloader that loaded the singleton class will retain a reference to the class, and hence the singleton instance will never be collected by the JVM. When a new instance of the application is deployed, a new class loader is usually created, and the former class loader will continue to exist due to the singleton.
Probably one of the simplest examples of a potential memory leak, and how to avoid it, is the implementation of ArrayList.remove(int):
public E remove(int index) {
RangeCheck(index);
modCount++;
E oldValue = (E) elementData[index];
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index + 1, elementData, index,
numMoved);
elementData[--size] = null; // (!) Let gc do its work
return oldValue;
}
If you were implementing it yourself, would you have thought to clear the array element that is no longer used (elementData[--size] = null)? That reference might keep a huge object alive ...
Any time you keep references around to objects that you no longer need you have a memory leak. See Handling memory leaks in Java programs for examples of how memory leaks manifest themselves in Java and what you can do about it.
You are able to make memory leak with sun.misc.Unsafe class. In fact this service class is used in different standard classes (for example in java.nio classes). You can't create instances of this class directly, but you may use reflection to get an instance.
Code doesn't compile in the Eclipse IDE - compile it using command javac (during compilation you'll get warnings)
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
public class TestUnsafe {
public static void main(String[] args) throws Exception{
Class unsafeClass = Class.forName("sun.misc.Unsafe");
Field f = unsafeClass.getDeclaredField("theUnsafe");
f.setAccessible(true);
Unsafe unsafe = (Unsafe) f.get(null);
System.out.print("4..3..2..1...");
try
{
for(;;)
unsafe.allocateMemory(1024*1024);
} catch(Error e) {
System.out.println("Boom :)");
e.printStackTrace();
}
}
}
I can copy my answer from here:
Easiest way to cause memory leak in Java
"A memory leak, in computer science (or leakage, in this context), occurs when a computer program consumes memory but is unable to release it back to the operating system." (Wikipedia)
The easy answer is: You can't. Java does automatic memory management and will free resources that are not needed for you. You can't stop this from happening. It will always be able to release the resources. In programs with manual memory management, this is different. You can get some memory in C using malloc(). To free the memory, you need the pointer that malloc returned and call free() on it. But if you don't have the pointer any more (overwritten, or lifetime exceeded), then you are unfortunately incapable of freeing this memory and thus you have a memory leak.
All the other answers so far are in my definition not really memory leaks. They all aim at filling the memory with pointless stuff real fast. But at any time you could still dereference the objects you created and thus freeing the memory --> no leak. acconrad's answer comes pretty close though as I have to admit since his solution is effectively to just "crash" the garbage collector by forcing it in an endless loop).
The long answer is: You can get a memory leak by writing a library for Java using the JNI, which can have manual memory management and thus have memory leaks. If you call this library, your Java process will leak memory. Or, you can have bugs in the JVM, so that the JVM looses memory. There are probably bugs in the JVM, there may even be some known ones since garbage collection is not that trivial, but then it's still a bug. By design this is not possible. You may be asking for some Java code that is effected by such a bug. Sorry I don't know one and it might well not be a bug any more in the next Java version anyway.
Here's a simple/sinister one via http://wiki.eclipse.org/Performance_Bloopers#String.substring.28.29.
public class StringLeaker
{
private final String muchSmallerString;
public StringLeaker()
{
// Imagine the whole Declaration of Independence here
String veryLongString = "We hold these truths to be self-evident...";
// The substring here maintains a reference to the internal char[]
// representation of the original string.
this.muchSmallerString = veryLongString.substring(0, 1);
}
}
Because the substring refers to the internal representation of the original, much longer string, the original stays in memory. Thus, as long as you have a StringLeaker in play, you have the whole original string in memory, too, even though you might think you're just holding on to a single-character string.
The way to avoid storing an unwanted reference to the original string is to do something like this:
...
this.muchSmallerString = new String(veryLongString.substring(0, 1));
...
For added badness, you might also .intern() the substring:
...
this.muchSmallerString = veryLongString.substring(0, 1).intern();
...
Doing so will keep both the original long string and the derived substring in memory even after the StringLeaker instance has been discarded.
A common example of this in GUI code is when creating a widget/component and adding a listener to some static/application scoped object and then not removing the listener when the widget is destroyed. Not only do you get a memory leak, but also a performance hit as when whatever you are listening to fires events, all your old listeners are called too.
Take any web application running in any servlet container (Tomcat, Jetty, GlassFish, whatever...). Redeploy the application 10 or 20 times in a row (it may be enough to simply touch the WAR in the server's autodeploy directory.
Unless anybody has actually tested this, chances are high that you'll get an OutOfMemoryError after a couple of redeployments, because the application did not take care to clean up after itself. You may even find a bug in your server with this test.
The problem is, the lifetime of the container is longer than the lifetime of your application. You have to make sure that all references the container might have to objects or classes of your application can be garbage collected.
If there is just one reference surviving the undeployment of your web application, the corresponding classloader and by consequence all classes of your web application cannot be garbage collected.
Threads started by your application, ThreadLocal variables, logging appenders are some of the usual suspects to cause classloader leaks.
Maybe by using external native code through JNI?
With pure Java, it is almost impossible.
But that is about a "standard" type of memory leak, when you cannot access the memory anymore, but it is still owned by the application. You can instead keep references to unused objects, or open streams without closing them afterwards.
I have had a nice "memory leak" in relation to PermGen and XML parsing once.
The XML parser we used (I can't remember which one it was) did a String.intern() on tag names, to make comparison faster.
One of our customers had the great idea to store data values not in XML attributes or text, but as tagnames, so we had a document like:
<data>
<1>bla</1>
<2>foo</>
...
</data>
In fact, they did not use numbers but longer textual IDs (around 20 characters), which were unique and came in at a rate of 10-15 million a day. That makes 200 MB of rubbish a day, which is never needed again, and never GCed (since it is in PermGen). We had permgen set to 512 MB, so it took around two days for the out-of-memory exception (OOME) to arrive...
The interviewer was probably looking for a circular reference like the code below (which incidentally only leak memory in very old JVMs that used reference counting, which isn't the case anymore). But it's a pretty vague question, so it's a prime opportunity to show off your understanding of JVM memory management.
class A {
B bRef;
}
class B {
A aRef;
}
public class Main {
public static void main(String args[]) {
A myA = new A();
B myB = new B();
myA.bRef = myB;
myB.aRef = myA;
myA=null;
myB=null;
/* at this point, there is no access to the myA and myB objects, */
/* even though both objects still have active references. */
} /* main */
}
Then you can explain that with reference counting, the above code would leak memory. But most modern JVMs don't use reference counting any longer. Most use a sweep garbage collector, which will in fact collect this memory.
Next, you might explain creating an Object that has an underlying native resource, like this:
public class Main {
public static void main(String args[]) {
Socket s = new Socket(InetAddress.getByName("google.com"),80);
s=null;
/* at this point, because you didn't close the socket properly, */
/* you have a leak of a native descriptor, which uses memory. */
}
}
Then you can explain this is technically a memory leak, but really the leak is caused by native code in the JVM allocating underlying native resources, which weren't freed by your Java code.
At the end of the day, with a modern JVM, you need to write some Java code that allocates a native resource outside the normal scope of the JVM's awareness.
What's a memory leak:
It's caused by a bug or bad design.
It's a waste of memory.
It gets worse over time.
The garbage collector cannot clean it.
Typical example:
A cache of objects is a good starting point to mess things up.
private static final Map<String, Info> myCache = new HashMap<>();
public void getInfo(String key)
{
// uses cache
Info info = myCache.get(key);
if (info != null) return info;
// if it's not in cache, then fetch it from the database
info = Database.fetch(key);
if (info == null) return null;
// and store it in the cache
myCache.put(key, info);
return info;
}
Your cache grows and grows. And pretty soon the entire database gets sucked into memory. A better design uses an LRUMap (Only keeps recently used objects in cache).
Sure, you can make things a lot more complicated:
using ThreadLocal constructions.
adding more complex reference trees.
or leaks caused by 3rd party libraries.
What often happens:
If this Info object has references to other objects, which again have references to other objects. In a way you could also consider this to be some kind of memory leak, (caused by bad design).
I thought it was interesting that no one used the internal class examples. If you have an internal class; it inherently maintains a reference to the containing class. Of course it is not technically a memory leak because Java WILL eventually clean it up; but this can cause classes to hang around longer than anticipated.
public class Example1 {
public Example2 getNewExample2() {
return this.new Example2();
}
public class Example2 {
public Example2() {}
}
}
Now if you call Example1 and get an Example2 discarding Example1, you will inherently still have a link to an Example1 object.
public class Referencer {
public static Example2 GetAnExample2() {
Example1 ex = new Example1();
return ex.getNewExample2();
}
public static void main(String[] args) {
Example2 ex = Referencer.GetAnExample2();
// As long as ex is reachable; Example1 will always remain in memory.
}
}
I've also heard a rumor that if you have a variable that exists for longer than a specific amount of time; Java assumes that it will always exist and will actually never try to clean it up if cannot be reached in code anymore. But that is completely unverified.
I recently encountered a memory leak situation caused in a way by log4j.
Log4j has this mechanism called Nested Diagnostic Context(NDC) which is an instrument to distinguish interleaved log output from different sources. The granularity at which NDC works is threads, so it distinguishes log outputs from different threads separately.
In order to store thread specific tags, log4j's NDC class uses a Hashtable which is keyed by the Thread object itself (as opposed to say the thread id), and thus till the NDC tag stays in memory all the objects that hang off of the thread object also stay in memory. In our web application we use NDC to tag logoutputs with a request id to distinguish logs from a single request separately. The container that associates the NDC tag with a thread, also removes it while returning the response from a request. The problem occurred when during the course of processing a request, a child thread was spawned, something like the following code:
pubclic class RequestProcessor {
private static final Logger logger = Logger.getLogger(RequestProcessor.class);
public void doSomething() {
....
final List<String> hugeList = new ArrayList<String>(10000);
new Thread() {
public void run() {
logger.info("Child thread spawned")
for(String s:hugeList) {
....
}
}
}.start();
}
}
So an NDC context was associated with inline thread that was spawned. The thread object that was the key for this NDC context, is the inline thread which has the hugeList object hanging off of it. Hence even after the thread finished doing what it was doing, the reference to the hugeList was kept alive by the NDC context Hastable, thus causing a memory leak.
Create a static Map and keep adding hard references to it. Those will never be garbage collected.
public class Leaker {
private static final Map<String, Object> CACHE = new HashMap<String, Object>();
// Keep adding until failure.
public static void addToCache(String key, Object value) { Leaker.CACHE.put(key, value); }
}
Everyone always forgets the native code route. Here's a simple formula for a leak:
Declare a native method.
In the native method, call malloc. Don't call free.
Call the native method.
Remember, memory allocations in native code come from the JVM heap.
You can create a moving memory leak by creating a new instance of a class in that class's finalize method. Bonus points if the finalizer creates multiple instances. Here's a simple program that leaks the entire heap in sometime between a few seconds and a few minutes depending on your heap size:
class Leakee {
public void check() {
if (depth > 2) {
Leaker.done();
}
}
private int depth;
public Leakee(int d) {
depth = d;
}
protected void finalize() {
new Leakee(depth + 1).check();
new Leakee(depth + 1).check();
}
}
public class Leaker {
private static boolean makeMore = true;
public static void done() {
makeMore = false;
}
public static void main(String[] args) throws InterruptedException {
// make a bunch of them until the garbage collector gets active
while (makeMore) {
new Leakee(0).check();
}
// sit back and watch the finalizers chew through memory
while (true) {
Thread.sleep(1000);
System.out.println("memory=" +
Runtime.getRuntime().freeMemory() + " / " +
Runtime.getRuntime().totalMemory());
}
}
}
I don't think anyone has said this yet: you can resurrect an object by overriding the finalize() method such that finalize() stores a reference of this somewhere. The garbage collector will only be called once on the object so after that the object will never destroyed.
I came across a more subtle kind of resource leak recently.
We open resources via class loader's getResourceAsStream and it happened that the input stream handles were not closed.
Uhm, you might say, what an idiot.
Well, what makes this interesting is: this way, you can leak heap memory of the underlying process, rather than from JVM's heap.
All you need is a jar file with a file inside which will be referenced from Java code. The bigger the jar file, the quicker memory gets allocated.
You can easily create such a jar with the following class:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
public class BigJarCreator {
public static void main(String[] args) throws IOException {
ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(new File("big.jar")));
zos.putNextEntry(new ZipEntry("resource.txt"));
zos.write("not too much in here".getBytes());
zos.closeEntry();
zos.putNextEntry(new ZipEntry("largeFile.out"));
for (int i=0 ; i<10000000 ; i++) {
zos.write((int) (Math.round(Math.random()*100)+20));
}
zos.closeEntry();
zos.close();
}
}
Just paste into a file named BigJarCreator.java, compile and run it from command line:
javac BigJarCreator.java
java -cp . BigJarCreator
Et voilà: you find a jar archive in your current working directory with two files inside.
Let's create a second class:
public class MemLeak {
public static void main(String[] args) throws InterruptedException {
int ITERATIONS=100000;
for (int i=0 ; i<ITERATIONS ; i++) {
MemLeak.class.getClassLoader().getResourceAsStream("resource.txt");
}
System.out.println("finished creation of streams, now waiting to be killed");
Thread.sleep(Long.MAX_VALUE);
}
}
This class basically does nothing, but create unreferenced InputStream objects. Those objects will be garbage collected immediately and thus, do not contribute to heap size.
It is important for our example to load an existing resource from a jar file, and size does matter here!
If you're doubtful, try to compile and start the class above, but make sure to chose a decent heap size (2 MB):
javac MemLeak.java
java -Xmx2m -classpath .:big.jar MemLeak
You will not encounter an OOM error here, as no references are kept, the application will keep running no matter how large you chose ITERATIONS in the above example.
The memory consumption of your process (visible in top (RES/RSS) or process explorer) grows unless the application gets to the wait command. In the setup above, it will allocate around 150 MB in memory.
If you want the application to play safe, close the input stream right where it's created:
MemLeak.class.getClassLoader().getResourceAsStream("resource.txt").close();
and your process will not exceed 35 MB, independent of the iteration count.
Quite simple and surprising.
As a lot of people have suggested, resource leaks are fairly easy to cause - like the JDBC examples. Actual memory leaks are a bit harder - especially if you aren't relying on broken bits of the JVM to do it for you...
The ideas of creating objects that have a very large footprint and then not being able to access them aren't real memory leaks either. If nothing can access it then it will be garbage collected, and if something can access it then it's not a leak...
One way that used to work though - and I don't know if it still does - is to have a three-deep circular chain. As in Object A has a reference to Object B, Object B has a reference to Object C and Object C has a reference to Object A. The GC was clever enough to know that a two deep chain - as in A <--> B - can safely be collected if A and B aren't accessible by anything else, but couldn't handle the three-way chain...
Another way to create potentially huge memory leaks is to hold references to Map.Entry<K,V> of a TreeMap.
It is hard to asses why this applies only to TreeMaps, but by looking at the implementation the reason might be that: a TreeMap.Entry stores references to its siblings, therefore if a TreeMap is ready to be collected, but some other class holds a reference to any of its Map.Entry, then the entire Map will be retained into memory.
Real-life scenario:
Imagine having a db query that returns a big TreeMap data structure. People usually use TreeMaps as the element insertion order is retained.
public static Map<String, Integer> pseudoQueryDatabase();
If the query was called lots of times and, for each query (so, for each Map returned) you save an Entry somewhere, the memory would constantly keep growing.
Consider the following wrapper class:
class EntryHolder {
Map.Entry<String, Integer> entry;
EntryHolder(Map.Entry<String, Integer> entry) {
this.entry = entry;
}
}
Application:
public class LeakTest {
private final List<EntryHolder> holdersCache = new ArrayList<>();
private static final int MAP_SIZE = 100_000;
public void run() {
// create 500 entries each holding a reference to an Entry of a TreeMap
IntStream.range(0, 500).forEach(value -> {
// create map
final Map<String, Integer> map = pseudoQueryDatabase();
final int index = new Random().nextInt(MAP_SIZE);
// get random entry from map
for (Map.Entry<String, Integer> entry : map.entrySet()) {
if (entry.getValue().equals(index)) {
holdersCache.add(new EntryHolder(entry));
break;
}
}
// to observe behavior in visualvm
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
});
}
public static Map<String, Integer> pseudoQueryDatabase() {
final Map<String, Integer> map = new TreeMap<>();
IntStream.range(0, MAP_SIZE).forEach(i -> map.put(String.valueOf(i), i));
return map;
}
public static void main(String[] args) throws Exception {
new LeakTest().run();
}
}
After each pseudoQueryDatabase() call, the map instances should be ready for collection, but it won't happen, as at least one Entry is stored somewhere else.
Depending on your jvm settings, the application may crash in the early stage due to a OutOfMemoryError.
You can see from this visualvm graph how the memory keeps growing.
The same does not happen with a hashed data-structure (HashMap).
This is the graph when using a HashMap.
The solution? Just directly save the key / value (as you probably already do) rather than saving the Map.Entry.
I have written a more extensive benchmark here.
There are many good examples of memory leaks in Java, and I will mention two of them in this answer.
Example 1:
Here is a good example of a memory leak from the book Effective Java, Third Edition (item 7: Eliminate obsolete object references):
// Can you spot the "memory leak"?
public class Stack {
private static final int DEFAULT_INITIAL_CAPACITY = 16;
private Object[] elements;
private int size = 0;
public Stack() {
elements = new Object[DEFAULT_INITIAL_CAPACITY];
}
public void push(Object e) {
ensureCapacity();
elements[size++] = e;
}
public Object pop() {
if (size == 0) throw new EmptyStackException();
return elements[--size];
}
/*** Ensure space for at least one more element, roughly* doubling the capacity each time the array needs to grow.*/
private void ensureCapacity() {
if (elements.length == size) elements = Arrays.copyOf(elements, 2 * size + 1);
}
}
This is the paragraph of the book that describes why this implementation will cause a memory leak:
If a stack grows and then shrinks, the objects that were popped off the
stack will not be garbage collected, even if the program using the
stack has no more references to them. This is because the
stack maintains obsolete references to these objects. An obsolete
reference is simply a reference that will never be dereferenced
again. In this case, any references outside of the “active portion” of
the element array are obsolete. The active portion consists of the
elements whose index is less than size
Here is the solution of the book to tackle this memory leak:
The fix for this sort of problem is simple: null out
references once they become obsolete. In the case of our Stack class,
the reference to an item becomes obsolete as soon as it’s popped
off the stack. The corrected version of the pop method looks like this:
public Object pop() {
if (size == 0) throw new EmptyStackException();
Object result = elements[--size];
elements[size] = null; // Eliminate obsolete reference
return result;
}
But how can we prevent a memory leak from happening? This is a good caveat from the book:
Generally speaking, whenever a class manages its own memory,
the programmer should be alert for memory leaks. Whenever an element
is freed, any object references contained in the element should be
nulled out.
Example 2:
The observer pattern also can cause a memory leak. You can read about this pattern in the following link: Observer pattern.
This is one implementation of the Observer pattern:
class EventSource {
public interface Observer {
void update(String event);
}
private final List<Observer> observers = new ArrayList<>();
private void notifyObservers(String event) {
observers.forEach(observer -> observer.update(event)); //alternative lambda expression: observers.forEach(Observer::update);
}
public void addObserver(Observer observer) {
observers.add(observer);
}
public void scanSystemIn() {
Scanner scanner = new Scanner(System.in);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
notifyObservers(line);
}
}
}
In this implementation, EventSource, which is Observable in the Observer design pattern, can hold links to Observer objects, but this link is never removed from the observers field in EventSource. So they will never be collected by the garbage collector. One solution to tackle this problem is providing another method to the client for removing the aforementioned observers from the observers field when they don't need those observers anymore:
public void removeObserver(Observer observer) {
observers.remove(observer);
}
Threads are not collected until they terminate. They serve as roots of garbage collection. They are one of the few objects that won't be reclaimed simply by forgetting about them or clearing references to them.
Consider: the basic pattern to terminate a worker thread is to set some condition variable seen by the thread. The thread can check the variable periodically and use that as a signal to terminate. If the variable is not declared volatile, then the change to the variable might not be seen by the thread, so it won't know to terminate. Or imagine if some threads want to update a shared object, but deadlock while trying to lock on it.
If you only have a handful of threads these bugs will probably be obvious because your program will stop working properly. If you have a thread pool that creates more threads as needed, then the obsolete/stuck threads might not be noticed, and will accumulate indefinitely, causing a memory leak. Threads are likely to use other data in your application, so will also prevent anything they directly reference from ever being collected.
As a toy example:
static void leakMe(final Object object) {
new Thread() {
public void run() {
Object o = object;
for (;;) {
try {
sleep(Long.MAX_VALUE);
} catch (InterruptedException e) {}
}
}
}.start();
}
Call System.gc() all you like, but the object passed to leakMe will never die.
The interviewer might have been looking for a circular reference solution:
public static void main(String[] args) {
while (true) {
Element first = new Element();
first.next = new Element();
first.next.next = first;
}
}
This is a classic problem with reference counting garbage collectors. You would then politely explain that JVMs use a much more sophisticated algorithm that doesn't have this limitation.
There is no Automatic Garbage Collection in C/C++.
Assume that I wrote a simple program in C/C++ and created a single object.
Assume that there are exactly 10 or extremely limited number of addresses for allocation.
I have a for loop running for 100 times inside which this single object is created every time the loop is run.
In Java, because there is Automatic Garbage collection, after a single loop is executed the address of the object is removed each time automatically.
Rough Example:
for(int i = 0; i < 100; i++)
{
Object o = new Object;
}
In C/C++ we have to manually remove the Object address inside the for loop. Do we have to reboot each time as well to properly remove the Object reference in C++?
For those who say that there are no problems deleting an object more than once in C++. from Wikipedia:
When an object is deleted more than
once, or when the programmer attempts
to release a pointer to an object not
allocated from the free store,
catastrophic failure of the dynamic
memory management system can result.
The result of such actions can include
heap corruption, premature destruction
of a different (and newly created)
object which happens to occupy the
same location in memory as the
multiply deleted object, and other
forms of undefined behavior.
The link: Manual Memory Management
So there IS a risk?
C++ doesn't have (built-in) garbage collection, but that doesn't mean you should let your dynamically allocated memory stay allocated forever. You can and should free it. The most basic way to that would be to free your object reference (or pointer, since this is C++) manually when you don't need it anymore:
for(int i = 0; i < 100; i++)
{
// Dynamically allocate an object
Object* o = new Object();
// do something with object
// Release object memory (and call object destructor, if there is one)
delete o;
}
If you allocated your object on the stack however, it always gets automatically released when it goes out of scope, and you don't have to wait for garbage collection to happen - it's always released immediately:
for(int i = 0; i < 100; i++)
{
// Create a new object on the stack
Object o = Object(); // Note we're not using the new keyword here.
// Do something with object
// Object gets automatically deallocated, or more accurately, in this specific
// case (a loop), the compiler will optimize things, so object's destructor
// will get called, but the object's stack memory will be reused.
}
This behavior of C++ stack values (automatic destruction when they go out of scope), which is awkwardly termed RAII (Resource Acquisition Is Initialization) allows for very nice things that Java just can't do. One of them is smart pointers, that allow dynamically allocated objects to be freed automatically just like their stack counterparts, and you can use sophisticated smart pointers to implement your own version of garbage collection if you want.
Another advantage of RAII is that finally-blocks are rarely necessary in C++: local variables that reference to a resource that should be freed immediately are usually allocated on the stack, and therefore get released automatically. No need for finally blocks here.
Practically speaking, when programming in C++ you'd usually put local variables on the stack (and get all the advantages of RAII without lifting a finger), unless you need to keep them alive for longer than that (e.g. you need to create an object and store a reference to it that stays alive when you leave the function that created the object). In that case, you can use pointers directly, but if you don't want to deal with manually deleting pointers and all the problems it can lead to, you'd usually use a smart pointer. A smart pointer is an object allocated on the stack, that wraps a 'dumb' (i.e. regular) pointer, and deletes the pointer when its destructor gets called. More advanced version of smart pointers can implement reference counting (so the pointed object will be released only when all smart pointers referencing it go out of scope) and even garbage collection. The standard library comes with two smart pointers: auto_ptr and smart_ptr (the latter is reference-counting, but some old compiler may not support it).
Java doesn't automatically remove the object at the end of each loop. Instead it waits until there is a lot of garbage and then goes through and collects it. Java makes no guarantees about how long it will be before the object is collected.
In C++, once the program has exited all resources are returned. You don't need to reboot (assuming a reasonable operating system) in order to make sure that resources are returned. Your operating system will take care of making sure that any resources your program didn't release get released when it is shutdown.
If you delete the object in C++, then its gone right then. you don't have to do anything else to recover the memory.
C++ certainly does have automatic storage duration; it just has other types of storage duration as well, and the ability to choose what is most appropriate.
In something like your example, you would use automatic storage:
for (int i = 0; i < 100; ++i) {
Object o;
// The object is automatically destroyed after each iteration
}
If the object is required to outlive the loop, perhaps because it is passed to another object to manage, then you would use smart pointers (the closest C++ equivalent to Java's references). Here are examples using auto_ptr and shared_ptr:
// some function that takes ownership of an object
void register(auto_ptr<Object> const & o);
void register(shared_ptr<Object> const & o);
for (int i = 0; i < 100; ++i) {
auto_ptr<Object> o(new Object);
register(o); // ownership may be transferred; our pointer is now null in that case
// The pointer is automatically destroyed after each iteration,
// deleting the object if it still owns it.
}
for (int i = 0; i < 100; ++i) {
shared_ptr<Object> o(new Object);
register(o); // ownership is shared; our pointer is still valid
// The pointer is automatically destroyed after each iteration,
// deleting the object if there are no other shared pointers to it.
}
In none of these cases do you need to manually delete the object; that is only necessary when dealing with raw object pointers which, in my opinion, should only be done when absolutely necessary.
C++ also has an advantage over Java here: destruction is always deterministic (that is, you know exactly when it happens). In Java, once an object is discarded, you do not know exactly when (or even if) the garbage collector will remove it. This means that, if the object manages a resource (such as a lock, or a database connection) that needs to be released after use, then it is up to the user of the object to manually release it. In C++, this can be done automatically in the destructor, making the class easier and less error-prone to use.
There is no Automatic Garbage Collection in C/C++.
False. C and C++ don't mandate automatic garbage collection, but it's available anyway (e.g., see the Boehm/Demers/Weiser collector).
In C/C++ we have to manually remove the Object address inside the for loop.
Again, false. In C or C++, we'd define the object using the automatic storage class, which would deterministically destroy the object:
for(int i = 0; i < 100; i++)
{
Object o;
}
Just for example, let's do a quick test, by defining Object something like this:
struct Object {
Object() { std::cout << "Created an Object\n"; }
~Object() { std::cout << "Destroyed an Object\n"; }
};
For the loop above, this generates:
Created an Object
Destroyed an Object
Created an Object
Destroyed an Object
Created an Object
Destroyed an Object
[97 more repetitions of the same pattern removed ]
Do we have to reboot each time as well to properly remove the Object reference in C++?
No, of course not. The fundamental difference between C++ and Java in this respect is that in C++ the object destruction is deterministic, but in Java it's not. Just for example, in C++, the code above must follow exactly the prescribed pattern -- the body of the loop is a block, and the object must be created on entry to the block, and destroyed on exit from the block. In Java, you get no such assurance. It might only destroy the first object after it has allocated 10 or 20. The objects may be destroyed in any order, and there's not really a guarantee that any particular object will be destroyed at all.
That difference isn't always important, but certainly can be. In C++, it's used to support RAII (aka., SBRM -- stack bound resource management). This used to assure not only that memory is freed when no longer needed (i.e., the same thing Java's automatic garbage collector handles) but also that other resources (anything from files to widgets to network or database connections) are handled automatically as well.
If you have short lived objects and performance is critical (most of the time it is not) you can create a mutable object which is reused on each loop. This moves work from a per-iteration to per-loop.
List list = new ArrayList(); // mutable object.
for(int i = 0; i < 100; i++) {
list.clear();
// do something with the list.
}
// one list is freed later.
You can make the list a member field (meaning the mutable object might never be freed), which is fine, provided your class doesn't need to be thread safe.
Assuming you are on a typical operating system (Windows/Linux) - no you do not need to reboot. The OS will protect you via the Process/Virtual Memory structure.
Only your process will run out of memory. The OS will clean up after you when you process ends.
Running on many small embedded systems without an OS - yes you would crash or lockup the processor and require a reboot.
Adding to the already given answers, you can always write signal handlers, upon receiving which you can clean up the memory used by your process..
Normally the OS deallocate your program when you quit it. But some OS might not (I've seen that on a http://www.beck-ipc.com/en/products/sc1x/sc13.asp : RTOS didn't deallocate on exit so I ran out of memory after a couple of launch when I didn't deallocate all my objects, so yes I had to reboot).
But most OS will clear/deallocate previously used programs (linux and windows does), so you shouldn't have to worry.
Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that is often counter intuitive to try and outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method and I am curious if this is a good practice to pick up or one of those voodoo programming superstitions?
Typically, no.
But like all things: it depends. The GC in Java these days is VERY good and everything should be cleaned up very shortly after it is no longer reachable. This is just after leaving a method for local variables, and when a class instance is no longer referenced for fields.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
RangeCheck(index);
modCount++;
E oldValue = (E) elementData[index];
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // Let gc do its work
return oldValue;
}
Also, explicitly nulling an object will not cause an object to be collected any sooner than if it just went out of scope naturally as long as no references remain.
Both:
void foo() {
Object o = new Object();
/// do stuff with o
}
and:
void foo() {
Object o = new Object();
/// do stuff with o
o = null;
}
Are functionally equivalent.
In my experience, more often than not, people null out references out of paranoia not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
Good article is today's coding horror.
The way GC's work is by looking for objects that do not have any pointers to them, the area of their search is heap/stack and any other spaces they have. So if you set a variable to null, the actual object is now not pointed by anyone, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. But if your method is fairly long (in terms of execution time) it might be worth it since you will be increasing your chances of GC collecting that object.
The problem can also be complicated with code optimizations, if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.
At least in java, it's not voodoo programming at all. When you create an object in java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. So long as that reference or another exists, the specific object can't be gc'd. however, when you assign null to that reference...
bar = null ;
and assuming nothing else has a reference to the object, it's freed and available for gc the next time the garbage collector passes by.
It depends.
Generally speaking shorter you keep references to your objects, faster they'll get collected.
If your method takes say 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If GC sees that after one second, your object is still referenced, next time it might check it in a minute or so.
Anyway, setting all references to null by default is to me premature optimization and nobody should do it unless in specific rare cases where it measurably decreases memory consuption.
Explicitly setting a reference to null instead of just letting the variable go out of scope, does not help the garbage collector, unless the object held is very large, where setting it to null as soon as you are done with is a good idea.
Generally setting references to null, mean to the READER of the code that this object is completely done with and should not be concerned about any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
int l;
{ // <- here
String bigThing = ....;
l = bigThing.length();
} // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.
public class JavaMemory {
private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);
public void f() {
{
byte[] data = new byte[dataSize];
//data = null;
}
byte[] data2 = new byte[dataSize];
}
public static void main(String[] args) {
JavaMemory jmp = new JavaMemory();
jmp.f();
}
}
Above program throws OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError is solved. It is always good practice to set the unused variable to null
I was working on a video conferencing application one time and noticed a huge huge huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004 and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However after I made it a point to null objects the GC stopped pausing my application.
So it depends on what your doing...
Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)
I assume the OP is referring to things like this:
private void Blah()
{
MyObj a;
MyObj b;
try {
a = new MyObj();
b = new MyObj;
// do real work
} finally {
a = null;
b = null;
}
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to get GC'd before they would if they just went out of scope? If so, the VM may spend time GC'ing the object when the memory isn't needed anyway, which would actually cause worse performance CPU usage wise because it would be GC'ing more earlier.
Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these ugly nullifications? They would only be clutter and obscure the intent code that contains them.
Its a rare codebase that has no better candidate for optimisation than trying to outsmart the Garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere instead, ditching that crufty Xml parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require you dirty up your codebase with noise.
Oracle doc point out "Assign null to Variables That Are No Longer Needed" https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html
"It depends"
I do not know about Java but in .net (C#, VB.net...) it is usually not required to assign a null when you no longer require a object.
However note that it is "usually not required".
By analyzing your code the .net compiler makes a good valuation of the life time of the variable...to accurately tell when the object is not being used anymore. So if you write obj=null it might actually look as if the obj is still being used...in this case it is counter productive to assign a null.
There are a few cases where it might actually help to assign a null. One example is you have a huge code that runs for long time or a method that is running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know its not being used anymore.
There is no hard & fast rule for this. Going by the above place null-assigns in your code and do run a profiler to see if it helps in any way. Most probably you might not see a benefit.
If it is .net code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx
In the future execution of your program, the values of some data members will be used to computer an output visible external to the program. Others might or might not be used, depending on future (And impossible to predict) inputs to the program. Other data members might be guaranteed not to be used. All resources, including memory, allocated to those unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, and which do not affect the correctness of your program, might nevertheless affect the operation of the GC, either making it run faster to do the same job, or to sooner identify unused items. So this kind of change, setting an unusdd object reference to null, in theory is not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.
As you said there are optimizations, i.e. JVM knows the place when the variable was last used and the object referenced by it can be GCed right after this last point (still executing in current scope). So nulling out references in most cases does not help GC.
But it can be useful to avoid "nepotism" (or "floating garbage") problem (read more here or watch video). The problem exists because heap is split into Old and Young generations and there are different GC mechanisms applied: Minor GC (which is fast and happens often to clean young gen) and Major Gc (which causes longer pause to clean Old gen). "Nepotism" does not allow for garbage in Young gen to be collected if it is referenced by garbage which was already tenured to an Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism it's a good idea to null out references from an object which is supposed to be removed. You can see this technique applied in JDK classes: LinkedList and LinkedHashMap
private E unlinkFirst(Node<E> f) {
final E element = f.item;
final Node<E> next = f.next;
f.item = null;
f.next = null; // help GC
// ...
}