I have a very weird problem with GC in Java. I am running the following piece of code:
while(some condition){
//do a lot of work...
logger.info("Generating resulting time series...");
Collection<MetricTimeSeries> allSeries = manager.getTimeSeries();
logger.info(String.format("Generated %,d time series! Storing in files now...", allSeries.size()));
//for (MetricTimeSeries series : allSeries) {
// just empty loop
//}
}
When I look into JConsole, at the restart of every loop iteration, my old gen heap space, if I manually force GC, takes a size of about 90 MB. If I uncomment the loop, like this
while(some condition){
//do a lot of work...
logger.info("Generating resulting time series...");
Collection<MetricTimeSeries> allSeries = manager.getTimeSeries();
logger.info(String.format("Generated %,d time series! Storing in files now...", allSeries.size()));
for (MetricTimeSeries series : allSeries) {
// just empty loop
}
}
Even if I force it to refresh, it won't fall below 550 MB. According to the YourKit profiler, the TimeSeries objects are accessible via the main thread's local variable (the collection), just after the GC at the restart of a new iteration... And the collection is huge (250K time series)... Why is this happening and how can I "fight" this (incorrect?) behaviour?
Yup, the garbage collector can be mysterious.. but it beats managing your own memory ;)
Collections and Maps have a way of hanging onto references longer than you might like, thus preventing garbage collection when you might expect it. As you noticed, setting the allSeries reference to null will earmark the collection for garbage collection, and thus its contents are up for grabs as well. Another way would be to call allSeries.clear(): this will unlink all its MetricTimeSeries objects and they will be free for garbage collection.
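A minimal sketch of the two options mentioned above, assuming a stand-in for manager.getTimeSeries(). The class ReleaseExample, the method loadSeries and the use of double[] in place of MetricTimeSeries are all made up for illustration; whether either option changes what JConsole shows depends on the JVM and its settings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ReleaseExample {
    // Hypothetical stand-in for manager.getTimeSeries(); double[] replaces MetricTimeSeries.
    static Collection<double[]> loadSeries(int count) {
        List<double[]> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(new double[] {i});
        }
        return out;
    }

    static int runIterations(int iterations, int perIteration) {
        int processed = 0;
        for (int i = 0; i < iterations; i++) {
            Collection<double[]> allSeries = loadSeries(perIteration);
            for (double[] series : allSeries) {
                processed++;
            }
            // Option 1: unlink the elements from the collection.
            allSeries.clear();
            // Option 2: drop the local reference to the collection itself.
            allSeries = null;
            // ... the next iteration's heavy work starts without the old data being referenced
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(runIterations(3, 1000));
    }
}
```

Either line alone should be enough; both are shown only to put the two options side by side.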
Why does removing the loop get around this problem also? This is the more interesting question. I'm tempted to suggest the compiler is optimizing the reference to allSeries.. but you are still calling allSeries.size() so it can't completely optimize out the reference.
To muddy the waters, different compilers (and settings) behave differently and use different garbage collectors, which themselves behave differently. It's tough to say exactly what's happening under the hood without more information.
Since you're building a (large) ArrayList of time series, it will occupy the heap as long as it's referenced, and will get promoted to the old generation if it stays long enough (or if the young generation is too small to actually hold it). I'm not sure how you're associating the information you're seeing in JConsole or YourKit with a specific point in the program, but until the empty loop is optimized by several JIT passes, your while loop will take longer and keep the collection longer, which might explain the perceived difference while there's actually not a lot.
There's nothing incorrect about that behaviour. If you don't want to consume so much memory, you need to change your Collection so it's not an eagerly-filled ArrayList, but a lazy collection, more of a stream (if you've ever done XML processing, think DOM vs SAX) which gets evaluated as it's iterated. If you don't need the whole collection to be sorted, that's doable, especially since you seem to be saying that the collection is a concatenation of sub-collections returned by underlying objects.
If you can change your return type from Collection to Iterable, you could for example use Guava's FluentIterable.transformAndConcat() to transform the collection of underlying objects to a lazily-evaluated Iterable concatenation of their time series. Of course, the size of the collection is not directly available anymore (and if you try to get it independently of the iteration, you'll evaluate the lazy collection twice).
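A rough sketch of the lazy idea in plain Java, without Guava. This is not FluentIterable.transformAndConcat itself, just the same shape hand-rolled; the names LazyConcat, concat and count are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class LazyConcat {
    /** Returns an Iterable that expands each source only when iteration reaches it. */
    static <S, T> Iterable<T> concat(Iterable<S> sources, Function<S, ? extends Iterable<T>> expand) {
        return () -> new Iterator<T>() {
            private final Iterator<S> outer = sources.iterator();
            private Iterator<T> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Move on to the next non-empty sub-collection, expanding it on demand.
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = expand.apply(outer.next()).iterator();
                }
                return inner.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    /** Counting requires a full pass, which is the cost mentioned above. */
    static int count(Iterable<?> items) {
        int n = 0;
        for (Object ignored : items) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        Iterable<Integer> all = concat(groups, g -> g);
        for (Integer value : all) {
            System.out.println(value);
        }
    }
}
```

Only one sub-collection's iterator is held at a time, so the full concatenation never has to exist in memory at once.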
In Effective Java, 3rd edition, on page 50, the author talks about the total time an object lasted from its creation to the time it was garbage collected.
On my machine, the time to create a simple AutoCloseable object, to close
it using try-with-resources, and to have the garbage collector reclaim it is about
12 ns. Using a finalizer instead increases the time to 550 ns.
How can we calculate such time? Is there some reliable mechanism for calculating this time?
The only reliable method I am aware of (I being emphasized here) is in java-9 via the Cleaner API, something like this:
static class MyObject {
long start;
public MyObject() {
start = System.nanoTime();
}
}
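private static void test() {
MyObject m = new MyObject();
// Copy the timestamp: the cleaning action must not reference m itself,
// otherwise m can never become phantom reachable.
long start = m.start;
Cleaner c = Cleaner.create();
Cleanable clean = c.register(m, () -> {
// ms from birth to death
System.out.println("done" + (System.nanoTime() - start) / 1_000_000);
});
clean.clean();
System.out.println(m.hashCode());
}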
The documentation for register says:
Runnable to invoke when the object becomes phantom reachable
And my question was really: what is phantom reachable after all? (It's something I still doubt I really understand.)
In java-8 the documentation says (for PhantomReference)
Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An
object that is reachable via phantom references will remain so until all
such references are cleared or themselves become unreachable.
There are good topics here on SO that try to explain why this is so, taking into consideration that PhantomReference#get will always return null, so there is not much use in not collecting them immediately.
There are also topics here (I'll try to dig them up) where it is shown how easy it is to resurrect an Object in the finalize method (by making it strongly reachable again - I think this was not intended by the API in any way to begin with).
In java-9 that sentence about phantom references not being automatically cleared is removed from the documentation, so they are collected.
Any attempt to track the object’s lifetime is invasive enough to alter the result significantly.
That’s especially true for the AutoCloseable variant, which may be subject to Escape Analysis in the best case, reducing the costs of allocation and deallocation close to zero. Any tracking approach implies creating a global reference which will hinder this optimization.
In practice, the exact time of deallocation is irrelevant for ordinary objects (i.e. those without a special finalize() method). The memory of all unreachable objects will be reclaimed en bloc the next time the memory manager actually needs free memory. So for real life scenarios, there is no sense in trying to measure a single object in isolation.
If you want to measure the costs of allocation and deallocation in a noninvasive way that tries to be closer to a real application’s behavior, you may do the following:
Limit the JVM’s heap memory to n
Run a test program that allocates and abandons a significant number of the test instances, such that their required amount of memory is orders of magnitude higher than the heap memory n.
Measure the total time needed to execute the test program and divide it by the number of objects it created.
You know for sure that objects not fitting into the limited heap must have been reclaimed to make room for newer objects. Since this doesn’t apply to the last allocated objects, you know that you have a maximum error matching the number of objects fitting into n. When you followed the recipe and allocated large multiples of that number, you have a rather small error, especially when comparing the numbers reveals something like variant A needing ~12 ns per instance on average and variant B needing 550 ns (as already stated here, these numbers are clearly marked with “on my machine” and not meant to be reproducible exactly).
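The recipe above could be sketched roughly like this. The class AllocationCost, the nested Resource type and its 64-byte payload are assumptions for illustration, and the heap limit is expected to come from the standard -Xmx option (for example java -Xmx16m AllocationCost), not from the code.

```java
final class AllocationCost {
    /** Hypothetical test instance; the payload just gives it some size. */
    static final class Resource implements AutoCloseable {
        final byte[] payload = new byte[64];

        @Override
        public void close() {
            // nothing to release in this sketch
        }
    }

    /** Accumulates payload sizes so the loop has an observable result. */
    static long sink;

    /** Allocates and abandons count instances, returning the average time per instance. */
    static double averageNanosPerInstance(int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            try (Resource r = new Resource()) {
                sink += r.payload.length;
            }
        }
        return (System.nanoTime() - start) / (double) count;
    }

    public static void main(String[] args) {
        // Far more memory in total than a small heap can hold at once.
        int count = 50_000_000;
        System.out.printf("%.1f ns per instance%n", averageNanosPerInstance(count));
    }
}
```

As noted earlier, escape analysis may remove the allocation entirely for such a simple object, so the number is only meaningful when compared against another variant measured the same way.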
Depending on the test environment, you may even have to slow down the allocating thread for the variant with finalize(), to allow the finalizer thread to catch up.
That’s a real life issue: when relying only on finalize(), allocating too many resources in a loop can break the program.
tl;dr: In Java, which is better, reusing of container object or creating object every time and let garbage collector do the work
I am dealing with huge amount of data in Java where frequently I have following type of code structure:-
Version1:
for(...){//outer loop
HashSet<Integer> test = new HashSet<>(); //Some container
for(...){
//Inner loop working on the above container Data Structure
}
//More operation on the container defined above
}//Outer loop ends
Here I allocate new memory every time in the loop and do some operations in the inner/outer loop before allocating empty memory again.
Now I am concerned about memory leaks in Java. I know that Java has a fairly good garbage collector, but instead of relying on that, should I modify my code as follows:-
Version2:
HashSet<Integer> test = null;
for(...){//outer loop
if(test == null){
test = new HashSet<>(); //Some container
}else{
test.clear();
}
for(...){
//Inner loop working on the above container Data Structure
}
//More operation on the container defined above
}//Outer loop ends
I have three questions:-
Which will perform better, or is there no definitive answer?
Will the second version have higher time complexity? In other words, is the clear() function O(1) or O(n) in complexity? I didn't find anything in the javadocs.
This pattern is quite common; which version is the recommended one?
In my opinion it's better to use the first approach. Note that HashSet.clear never shrinks the size of the hash table. Thus if the first iteration of the outer loop adds many elements to the set, the hash table will become quite big, and on subsequent iterations it won't be shrunk even if much less space is necessary.
Also first version makes the further refactoring easier: you may later want to put the whole inner loop into the separate method. Using the first version you can just move it together with HashSet.
Finally note that for garbage-collection it's usually easier to manage short-lived objects. If your HashSet is long-lived, it may be moved to old generation and removed only during the full GC.
I think it's simpler to create a new HashSet each time, and likely to be less prone to refactoring errors later on. Unless you have a good reason to reuse the HashSet (garbage collection pauses are an issue for you, and profiling shows this part of the code is the cause), I would keep things as simple as possible and stick to version 1. Focus on maintainability; premature optimization should be avoided.
I would recommend you stick to the first variant. The main reason behind this is keeping the scope of your HashSet variable as small as possible. This way you actually ensure that it will be eligible for garbage collection after the iteration has ended. Widening its scope may cause other problems - the reference can later be used to actually change the state of the object.
Also, most modern Java compilers will produce the same byte code if you are creating the instance inside or outside the loop.
Which one is faster? Actually, the answer could vary depending on various factors.
Version-1 advantages :
Predictive branching at processor level might make this faster.
Scope of instance is limited to the first loop. If reference doesn't escape, JIT might actually compile your method. GC's job will probably be easier.
Version -2 :
Less time in creation of new containers (frankly, this is not too much).
clear() is O(n)
Escaped reference might prevent JIT from making some optimizations.
Which one to choose? Measure performance for both versions several times. Then if you find a significant difference, change your code; if not, don't do anything :)
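A crude sketch of what "measure both" could look like. The class ContainerReuse and its method names are made up, the workload (inserting j % 100) is arbitrary, and System.nanoTime timing like this is only a first impression; a proper harness such as JMH would be the more trustworthy tool.

```java
import java.util.HashSet;
import java.util.Set;

final class ContainerReuse {
    /** Version 1: a fresh HashSet per outer iteration. */
    static long freshEachTime(int outer, int inner) {
        long total = 0;
        for (int i = 0; i < outer; i++) {
            Set<Integer> test = new HashSet<>();
            for (int j = 0; j < inner; j++) {
                test.add(j % 100);
            }
            total += test.size();
        }
        return total;
    }

    /** Version 2: one HashSet, cleared at the start of each outer iteration. */
    static long reused(int outer, int inner) {
        long total = 0;
        Set<Integer> test = new HashSet<>();
        for (int i = 0; i < outer; i++) {
            test.clear();
            for (int j = 0; j < inner; j++) {
                test.add(j % 100);
            }
            total += test.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Repeat several rounds so the JIT has a chance to warm up both variants.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = freshEachTime(10_000, 1_000);
            long t1 = System.nanoTime();
            long b = reused(10_000, 1_000);
            long t2 = System.nanoTime();
            System.out.printf("fresh: %d ms, reused: %d ms (checksums %d / %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```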
Version 2 is better: it will take a little more time, but memory performance will be good.
It depends.
Recycling objects can be useful in tight loops to eliminate GC pressure, especially when the object is too large for the young generation or the loop runs long enough for it to be tenured.
But in your particular example it may not help much, because a HashSet still contains node objects which will be created on inserting and become eligible for GC on clearing.
On the other hand, if you put so many items into the set that its internal Object[] array has to be resized multiple times and becomes too large for the young generation then it might be useful to recycle the set. But in that case you should be pre-sizing the set anyway.
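Pre-sizing could look something like the sketch below. The helper name withExpectedSize is an assumption; the only fact relied on is that HashSet grows once its size exceeds capacity times the load factor, which is 0.75 by default.

```java
import java.util.HashSet;
import java.util.Set;

final class PresizedSet {
    /** Creates a HashSet that should hold `expected` elements without resizing. */
    static <T> Set<T> withExpectedSize(int expected) {
        // Leave headroom for the default load factor of 0.75.
        int capacity = (int) (expected / 0.75f) + 1;
        return new HashSet<>(capacity);
    }

    public static void main(String[] args) {
        Set<Integer> test = withExpectedSize(100_000);
        for (int i = 0; i < 100_000; i++) {
            test.add(i);
        }
        System.out.println(test.size());
    }
}
```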
Additionally objects that only live for the duration of a code block may be eligible for object decomposition/stack allocation via escape analysis. The shorter their lifetime and the less complex the code-paths touching those objects the more likely it is for EA to succeed.
In the end it doesn't matter much though until this method actually becomes an allocation hotspot in your application, in which case it would show up in profiler results and you could act accordingly.
I know similar questions have been asked many times previously, but I am still not convinced about when objects become eligible for GC and which approach is more efficient.
Approach one:
for (Item item : items) {
MyObject myObject = new MyObject();
//use myObject.
}
Approach Two:
MyObject myObject = null;
for (Item item : items) {
myObject = new MyObject();
//use myObject.
}
I understand: "By minimizing the scope of local variables, you increase the readability and maintainability of your code and reduce the likelihood of error". (Joshua Bloch).
But how about performance/memory consumption? In Java, objects are garbage collected when there is no reference left to the object. If there are e.g. 100000 items then 100000 objects will be created. In Approach One each object will have a reference (myObject) to it so they are not eligible for GC?
Whereas in Approach Two, with every loop iteration you are removing the reference from the object created in the previous iteration, so surely objects start becoming eligible after the first loop iteration.
Or is it a trade off between performance and code readability & maintainability?
What have I misunderstood?
Note:
Assuming I care about performance and myObject is not needed after the loop.
Thanks In Advance
If there are e.g. 100000 items then 100000 objects will be created in Approach One and each object will have a reference (myObject) to it so they are not eligible for GC?
No, from Garbage Collector's point of view both the approaches work the same i.e. no memory is leaked. With approach two, as soon as the following statement runs
myObject = new MyObject();
the previous MyObject that was being referenced becomes an orphan (unless while using that Object you passed it around, say, to another method where that reference was saved) and is eligible for garbage collection.
The difference is that once the loop runs out you would have the last instance of MyObject still reachable through the myObject reference originally created outside the loop.
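A small sketch of that difference. LoopReachability and its nested MyObject (with an id field) are stand-ins invented for the example; nothing here forces or observes a GC, it only shows what is still referenced once the loop is done.

```java
final class LoopReachability {
    /** Hypothetical stand-in for the question's MyObject. */
    static final class MyObject {
        final int id;

        MyObject(int id) {
            this.id = id;
        }
    }

    /** Approach one: the variable lives inside the loop body. */
    static int approachOne(int n) {
        int used = 0;
        for (int i = 0; i < n; i++) {
            MyObject myObject = new MyObject(i);
            used += myObject.id;
        }
        // myObject is out of scope here; this method no longer refers to any instance.
        return used;
    }

    /** Approach two: the variable is declared outside the loop. */
    static MyObject approachTwo(int n) {
        MyObject myObject = null;
        for (int i = 0; i < n; i++) {
            // The instance from the previous iteration loses its only reference here.
            myObject = new MyObject(i);
        }
        // The last instance is still reachable through myObject after the loop.
        return myObject;
    }

    public static void main(String[] args) {
        System.out.println(approachOne(5));
        System.out.println(approachTwo(5).id);
    }
}
```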
Does GC know when references go out of scope during the loop execution or it can only know at the end of method?
First of all there's only one reference, not references. It's the objects that are getting unreferenced in the loop. Secondly, the garbage collection doesn't kick in spontaneously. So forget the loop, it may not even happen when the method exits.
Notice that I said orphan objects become eligible for GC, not that they get collected immediately. Garbage collection never happens in real time, it happens in phases. In the mark phase, all the objects that are still reachable from a live thread are marked; whatever is left unmarked is garbage. Then in the sweep phase, that memory is reclaimed and additionally compacted, much like defragmenting a hard drive. So, it works more like a batch rather than piecemeal operations.
GC isn't bothered about scopes or methods as such. It only looks for unreferenced objects and it does so when it feels like doing it. You can't force it. The only thing that you can be sure of is that GC would run if the JVM is running out of memory but you can't pin exactly when it would do so.
But, all this does not mean that GC can't kick in while the method executes or even while the loop is running. If you had, say, a Message Processor that processed 10,000 messages every 10 mins or so and then slept in between i.e. the bean waits within the loop, does 10,000 iterations and then waits again; GC would definitely kick into action to reclaim memory even though the method hasn't run to completion yet.
You have misunderstood when objects become eligible for GC - they do this when they are no longer reachable from an active thread. In this context that means:
When the only reference to them goes out of scope (approach 1).
When the only reference to them is assigned another value (approach 2).
So, the instance of MyObject would be eligible for GC at the end of each loop iteration whichever approach was used. The difference (theoretically) between the two approaches is that the JVM would have to allocate memory for a new object reference each iteration in approach 1 but not in approach 2. However, this assumes the Java compiler and/or Just-In-Time compiler is not smart enough to optimise approach 1 to actually act like approach 2.
In any case, I would go for the more readable and less error prone approach 1 on the grounds that:
The performance overhead for a single object reference allocation is tiny.
It will probably get optimised away anyway.
In both approaches objects will get Garbage collected.
In Approach 1: as each loop iteration ends, the local variables inside the for loop go out of scope, so the objects they referenced become eligible for garbage collection.
In Approach 2: when a new reference is assigned to the myObject variable, the earlier object has no remaining reference, so it becomes eligible for garbage collection, and so on for as long as the loop runs.
So in both approaches there is no performance bottleneck.
I would not expect declaring the variable inside a block to have a detrimental impact on performance.
At least notionally the JVM allocates the stack frame at the start of the method and destroys it at the end. By implication it will have the cumulative size to accommodate all the local variables.
See section 2.6 in here:
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html
That is consistent with other languages such as C where resizing the stack frame as the function/method executes is an overhead with no apparent return.
So wherever you declare it shouldn't make a difference.
Indeed declaring variables in blocks may help the compiler realize that the effective size of the stack frame can be smaller:
void foo() {
int x=6;
int y=7;
int z=8;
//.....
}
Versus
void bar() {
{
int x=6;
//....
}
{
int y=7;
//....
}
{
int z=8;
//....
}
}
Notice that bar() clearly only needs one local variable not 3.
Though making the stack frame smaller is unlikely to have any real influence on performance!
However, a reference going out of scope may make the object it references available for garbage collection. You would otherwise need to set references to null, which is an untidy and unnecessary bother (and a teeny-weeny overhead).
Without question you should declare variables inside a loop if (and only if) you don't need to access them outside the loop.
IMHO blocked statements (like bar has above) are under used.
If a method proceeds in stages you can protect the later stages from variable pollution using blocks.
With suitable (short) comments it can often be a more readable (and efficient) way of structuring code than breaking it down into lots of private methods.
I have a chunky algorithm (Hashlife) where making earlier artifacts available for garbage collection during the method can make the difference between getting to the end and getting OutOfMemoryError.
I have an application in which the number of java.util.LinkedList$Entry objects seems to be steadily increasing. This application contains a method that contains the following code:
final List<Boolean> correctnessList = new ArrayList<Boolean>();
final List<Double> discriminationList = new ArrayList<Double>();
final List<Double> difficultyList = new ArrayList<Double>();
final List<Double> guessingList = new ArrayList<Double>();
.
.
.
for (ItemData datum : candidateItemData) {
.
.
.
correctnessList.add(datum.isCorrect);
discriminationList.add(iRTParameter.discrimination);
difficultyList.add(iRTParameter.difficulty);
guessingList.add(iRTParameter.guessing);
.
.
.
}
The method that contains this code is called many, many times. And, of course, each time the method returns, the List objects go out of scope, and, presumably, are available for garbage collection.
However, as I said, the number of java.util.LinkedList$Entry objects seems to be steadily increasing.
Have I created a memory leak here? Should I call some method on the List objects at the end of the method so that the LinkedList$Entry objects can be garbage collected?
No, you don't need to do any explicit de-initialization for the objects to be claimable.
Your best bet is to find out why the elements are not garbage collected. To do this, use your preferred memory profiler, take a snapshot and try to trace the path from some of those elements to the nearest GC root (personally I'd suggest VisualVM, since it's relatively simple to use and still powerful enough for many things).
Also: in your sample you use ArrayList as your List implementation. That implementation does not use Entry objects. So you need to check where in your code you use a LinkedList.
Did you check whether the garbage collector actually ran? Under normal circumstances the JVM decides to run the gc at regular intervals or when memory is scarce.
And as Joachim Sauer said, there might be some dangling reference from an active thread to your lists. So if the gc ran but did not collect at least some of those objects (it might sometimes not collect all objects that are eligible for gc, so that's not generally a problem) you should check where the references are.
We once had a problem with database connection entries that were closed but not released and thus held tons of data in some maps. Getting rid of the references to those connections helped in that case, but it was only obvious when we employed a memory tracing tool (JProbe in our case).
It looks like there are no memory leaks here. It depends on how you use the result of this. In general, the garbage collector will collect all of them.
On the other hand, it would be better for memory usage and handling to have only one list, with the data wrapped in a structure that contains these fields. When a list outgrows its backing array (for a default ArrayList, on adding the 11th item), a new block is allocated in memory and the previous items are moved to the new block. So it is better to do that only once instead of 4 times.
And the last note: it is better to use the constructor where you can provide the count of items. It will allocate an appropriately sized block of memory and avoid possible reallocations while you fill the collection.
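Something along these lines, as a sketch only. The class ItemStats and the values put into it are placeholders invented for the example; in the question's code they would come from datum and iRTParameter.

```java
import java.util.ArrayList;
import java.util.List;

final class ItemStats {
    final boolean correct;
    final double discrimination;
    final double difficulty;
    final double guessing;

    ItemStats(boolean correct, double discrimination, double difficulty, double guessing) {
        this.correct = correct;
        this.discrimination = discrimination;
        this.difficulty = difficulty;
        this.guessing = guessing;
    }

    /** One pre-sized list of records instead of four parallel lists. */
    static List<ItemStats> collect(int expectedCount) {
        // The capacity argument sizes the backing array once, up front.
        List<ItemStats> stats = new ArrayList<>(expectedCount);
        for (int i = 0; i < expectedCount; i++) {
            // Placeholder values standing in for the real item data.
            stats.add(new ItemStats(i % 2 == 0, 1.0, 0.5, 0.25));
        }
        return stats;
    }

    public static void main(String[] args) {
        System.out.println(collect(100).size());
    }
}
```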
I have a class with a static member like this:
class C
{
static Map m=new HashMap();
static {
... initialize the map with some values ...
}
}
AFAIK, this would consume memory practically to the end of the program. I was wondering, if I could solve it with soft references, like this:
class C
{
static volatile SoftReference<Map> m=null;
static Map getM() {
Map ret;
if(m == null || (ret = m.get()) == null) {
ret=new HashMap();
... initialize the map ...
m=new SoftReference(ret);
}
return ret;
}
}
The question is
is this approach (and the implementation) right?
if it is, does it pay off in real situations?
First, the code above is not threadsafe.
Second, while it works in theory, I doubt there is a realistic scenario where it pays off. Think about it: In order for this to be useful, the map's contents would have to be:
Big enough so that their memory usage is relevant
Able to be recreated on the fly without unacceptable delays
Used only at times when other parts of the program require less memory - otherwise the maximum memory required would be the same, only the average would be less, and you probably wouldn't even see this outside the JVM since it gives back heap memory to the OS very reluctantly.
Here, 1. and 2. are sort of contradictory - large objects also take longer to create.
This is okay if your access to getM is single threaded and it only acts as a cache.
A better alternative is to have a fixed size cache as this provides a consistent benefit.
getM() should be synchronized, to avoid m being initialized at the same time by different threads.
How big is this map going to be? Is it worth the effort to handle it this way? Have you measured the memory consumption of this? (For what it's worth, I believe the above is generally ok, but my first question with optimisations is "what does it really save me".)
You're returning the reference to the map, so you need to ensure that your clients don't hold onto this reference (and prevent garbage collection). Perhaps your class can hold the reference, and provide a getKey() method to access the content of the map on behalf of clients ? That way you'll maintain control of the reference to the map in one place.
I would synchronise the above, in case the map gets garbage collected and two threads hit getM() at the same time. Otherwise you're going to create two maps simultaneously!
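A possible shape for the synchronised variant, as a sketch. The class name SoftCachedMap, the String/Integer types, the build() contents and the lookup accessor are all assumptions; the question's code uses raw types and elides the initialisation.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

final class SoftCachedMap {
    // Guarded by the class lock taken in getM().
    private static SoftReference<Map<String, Integer>> ref;

    /** Returns the cached map, rebuilding it if it was never built or has been cleared. */
    static synchronized Map<String, Integer> getM() {
        Map<String, Integer> map = (ref == null) ? null : ref.get();
        if (map == null) {
            map = build();
            ref = new SoftReference<>(map);
        }
        return map;
    }

    /** Lets clients read a value without holding on to the map itself. */
    static Integer lookup(String key) {
        return getM().get(key);
    }

    // Placeholder initialisation standing in for the elided "initialize the map" step.
    private static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("one", 1);
        map.put("two", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(lookup("two"));
    }
}
```

Because only one thread at a time can be inside getM(), two maps are never built simultaneously; the cost is that every access takes the lock.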
Maybe you are looking for WeakHashMap? Then entries in the map can be garbage collected separately.
Though in my experience it didn't help much, so I instead built an LRU cache using LinkedHashMap. The advantage is that I can control the size so that it isn't too big and still useful.
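For reference, a small LRU cache on top of LinkedHashMap can be sketched like this. The class name LruCache and the maxEntries parameter are made up; the access-order constructor and removeEldestEntry are the standard LinkedHashMap hooks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so that "b" becomes the eldest entry
        cache.put("c", 3); // evicts "b"
        System.out.println(cache.keySet());
    }
}
```

Note that LinkedHashMap is not thread-safe, so a shared cache would need external synchronisation.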
I was wondering, if I could solve it with soft references
What is it that you are trying to solve? Are you running into memory problems, or are you prematurely optimizing?
In any case,
The implementation should be altered a bit if you were to use it. As has been noted, it isn't thread-safe. Multiple threads could access the method at the same time, allowing multiple copies of your collection to be created. If these collections were then strongly referenced for the remainder of your program you would end up with more memory consumption, not less.
A reason to use SoftReferences is to avoid running out of memory, as there is no contract other than that they will be cleared before the VM throws an OutOfMemoryError. Therefore there is no guaranteed benefit of this approach, other than not creating the cache until it is first used.
The first thing I notice about the code is that it mixes generic with raw types. That is just going to lead to a mess. javac in JDK7 has -Xlint:rawtypes to quickly spot that kind of mistake before trouble starts.
The code is not thread-safe but uses statics, so it is published across all threads. You probably don't want it to be synchronized, because that causes problems if contended on multithreaded machines.
A problem with using a SoftReference for the entire cache is that you will cause spikes when the reference is cleared. In some circumstances it might work out better to have ThreadLocal<SoftReference<Map<K,V>>>, which would spread the spikes and help thread-safety at the expense of not sharing between threads.
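A sketch of the per-thread variant, under the assumption of String keys and Integer values. PerThreadSoftCache and build() are names invented here; each thread ends up with its own softly referenced copy, so no locking is needed but nothing is shared.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

final class PerThreadSoftCache {
    private static final ThreadLocal<SoftReference<Map<String, Integer>>> CACHE = new ThreadLocal<>();

    /** Returns the calling thread's map, rebuilding it if missing or cleared by the GC. */
    static Map<String, Integer> get() {
        SoftReference<Map<String, Integer>> ref = CACHE.get();
        Map<String, Integer> map = (ref == null) ? null : ref.get();
        if (map == null) {
            map = build();
            CACHE.set(new SoftReference<>(map));
        }
        return map;
    }

    // Placeholder initialisation; the real contents are not given in the question.
    private static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("one", 1);
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> mine = get();
        // The other thread has no entry in its thread-local yet, so it builds its own copy.
        Thread other = new Thread(
                () -> System.out.println("other thread shares the map: " + (get() == mine)));
        other.start();
        other.join();
    }
}
```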
However, creating a smarter cache is more difficult. Often you end up with values referencing keys. There are ways around this but it is a mess. I don't think ephemerons (essentially a pair of linked References) are going to make JDK7. You might find the Google Collections worth looking at (although I haven't).
java.util.LinkedHashMap gives an easy way to limit the number of cached entries, but is not much use if you can't be sure how big the entries are, and can cause problems if it stops collection of large object systems such as ClassLoaders. Some people have said you shouldn't leave cache eviction up to the whims of the garbage collector, but then some people say you shouldn't use GC.