What is retained size for an object on heapdump? - java

I've recently increased my use of the Profiler in NetBeans (6.7); it's a great tool.
I have a question, however. When taking a heap dump, on the summary page (expect window) it is possible to 'find the biggest objects by retained size'.
What is this value and how is it used to analyze memory usage?

The retained size of an object is the amount of memory that object keeps from being garbage collected.
The formal definition is "the size of the object plus the size of all objects referenced only by the first object, recursively".
For more explanations about what the retained memory is, see this article.
One easy way to remember it: the retained size is all the memory that could be garbage collected if this object itself were no longer referenced.
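As a concrete illustration (the class and field names below are invented for this sketch): the byte array is reachable only through its Holder, so in a heap dump the holder's retained size would include the array's roughly 1 MB, not just the few bytes of the Holder object itself.

// Hypothetical example: the byte[] is referenced only by its Holder,
// so the Holder's retained size includes the ~1 MB array.
class Holder
{
    private final byte[] payload = new byte[1024 * 1024]; // ~1 MB
}

public class RetainedSizeDemo
{
    public static void main(String[] args)
    {
        Holder holder = new Holder();
        // While 'holder' is reachable, the 1 MB array is retained by it.
        holder = null;
        // Now the Holder and its array are both eligible for collection:
        // dropping the holder frees everything in its retained set.
    }
}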

Related

How much memory is allocated to a Stack object upon creation?

I was wondering- when defining a new stack through the stack class
Stack stack = new Stack();
How much memory is allocated to it? It cannot depend on the number of objects it will hold (the way it does for arrays and lists, for example), because it is initialized without any information about how many objects will be placed in it.
However, it also doesn't make much sense for it to have a fixed amount of memory like an int or a double, because you keep placing objects into it.
Does the push command increase the memory allocated to the stack?
I assume it is placed in the 'heap' memory?
Thanks!
I'm speaking from a C# background, so bear with me. Whenever you allocate memory for a local variable, it goes on the stack. The heap is for things like objects: allocating an object creates a reference plus the actual object, and the garbage collector later walks those references to figure out which objects need to be cleaned up and which don't.
In this case, I believe you are allocating the object on the heap, because all a "stack" object is, is a FILO (first-in, last-out) data structure.
The call stack in Java only stores primitives and references that exist within a local scope, so the stack is usually pretty small. Its size depends on several factors and can vary at runtime: the initial size, for example, is typically calculated from how much memory the runtime thinks it will need, and it then grows as needed (I think Windows, for example, grows the stack a page at a time, but don't hold me to that).
In your case, since you are asking about the initial size of a newly created, empty stack object, the answer is the size of the Stack object itself, and it grows as you add elements to it.
Hope that helps.
Stack extends Vector, and Stack() calls Vector() implicitly, which uses a default initial capacity of 10.
Stack inherits from Vector. The default constructor of Vector initializes an array with size 10.
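A quick way to see this for yourself: java.util.Vector exposes the length of its backing array through capacity(), which Stack inherits.

import java.util.Stack;

public class StackCapacityDemo
{
    public static void main(String[] args)
    {
        Stack<String> stack = new Stack<>();
        // Stack extends Vector, whose no-arg constructor creates a
        // backing array of length 10; capacity() reports that length.
        System.out.println(stack.size());     // 0  - no elements yet
        System.out.println(stack.capacity()); // 10 - initial backing array
        stack.push("hello");
        // The backing array grows automatically (Vector doubles it by
        // default) once more elements are pushed than it can hold.
    }
}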

Explain the difference between contiguous allocation (memory held in a heap in the stack) vs. memory in a heap [closed]

Explain the difference between contiguous allocation (memory held in a heap in the stack) vs. memory in a heap.
I'm new at this and not entirely sure.
The question isn't contiguous versus heap, but automatic versus heap.
Automatic storage is set up on entry to a block of code -- traditionally on entry to a function or method -- and discarded when that function returns, so its memory space on the stack can be reused by the next function call. That's how most local variables are handled. Obviously this isn't useful for anything which is intended to persist past the end of that function call.
In Java, objects are never allocated from automatic storage. Instead, they are allocated from the heap, on demand, when the new operator is evaluated. There are several reasons for this which, frankly, you don't need to know about unless you're designing a programming language, and it's too large a topic to cover here. The important thing is that since objects are obtained from the heap, their lifetime is independent of any stack frame. And since Java is a garbage-collected language, their memory is automatically recovered for reuse sometime after the last reference to them goes away -- again, the details are too large a topic to cover here, but basically you can trust that the GC comes through periodically to pick up the clothes we dropped on the floor and toss them into the laundry.
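A small sketch of the distinction (the names here are invented for illustration): the int and the reference live in the method's stack frame, while the StringBuilder object created by new lives on the heap, and it simply becomes garbage once the frame is discarded.

static void example()
{
    int count = 42;                          // automatic storage: lives in this
                                             // method's stack frame
    StringBuilder sb = new StringBuilder();  // 'sb' (the reference) is automatic,
                                             // but the StringBuilder object itself
                                             // is allocated from the heap by 'new'
}   // 'count' and 'sb' disappear here; the StringBuilder becomes unreachable
    // and its memory is recovered by the GC some time later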
A stack frame only exists for the life of a method call: memory is allocated in it to hold all the local variables and method parameters the method uses to do its work.
A typical example of what lives in a stack frame is a temporary variable used to keep track of an index position in an array you are iterating through. Once the method returns, its stack frame is popped off the stack, which means all the temporary memory allocated for its local variables and method parameters is released back to the system.
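For example (an invented sketch, not from the original answer), the index i and the counter below exist only in this method's stack frame; only the returned int value is copied back to the caller.

static int countPositives(int[] values)
{
    int count = 0;
    for (int i = 0; i < values.length; i++)  // 'i' and 'count' live in this frame
    {
        if (values[i] > 0)
        {
            count++;
        }
    }
    return count;  // the int result is copied to the caller's frame
}                  // frame popped here; 'i' and 'count' are gone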
The heap is different because it is where objects live, not "pointers" to objects.
When I was learning I found it hard to work out the difference between the two.
The key point that helped me was this: a reference to an object is kept in a stack frame, in a little bit of temporary memory that exists for the lifetime of the method call. So you can only reach the object through that variable while the method is in scope.
The reference contains a memory address that leads to the location of the object stored on the heap, which lets you get back to the object and change its state later.
public static void main(String[] args)
{
    Person person = new Person("Steven", 30);
}
When you run this program:
The new keyword means Java will allocate memory on the heap for the space required to store the Person object's instance variables.
The important part to understand is that no memory is required to store an object's methods. When a method is called, a new stack frame is created, which allocates temporary memory for the duration of the call. Using the example above, a Person has two instance variables: a String name and an int age. This means the memory required for this Person object is the memory needed to store a reference variable of type String (the bit pattern of the memory address of a String object on the heap) plus the memory to hold the bit pattern of an int.
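The answer never shows the Person class itself; a minimal version consistent with that description (the field and parameter names are assumptions) might look like this:

class Person
{
    // Each instance holds one reference to a String (the String object
    // itself lives on the heap) plus the bits of one int.
    private String name;
    private int age;

    Person(String name, int age)
    {
        this.name = name;
        this.age = age;
    }
}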
Lastly, the main method is a stack frame too, so when main finishes, you no longer have a reference to a Person object or access to any temporary variables that may have existed in main.
This is true for any method: if you have a method that creates an object but doesn't return a reference to it, then you can never access the object again, and the Java garbage collector comes along at a later time and cleans up all the objects on the heap that have no references pointing to them.
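For instance, reusing the hypothetical Person class sketched above, the object created in the method below is never returned or stored anywhere, so it becomes unreachable the moment the method's stack frame is popped:

static void createAndForget()
{
    // 'temp' is the only reference to this Person, and it lives in
    // this method's stack frame.
    Person temp = new Person("Ada", 36);
}   // frame popped here: the Person object on the heap is now unreachable
    // and will eventually be reclaimed by the garbage collector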
If you are starting out, I highly recommend Head First Java. It is a great book IMO and covers these topics in easy-to-understand ways.
When we talk about memory or disc allocation, the word "contiguous" simply means "without any gaps".
A single stack or heap memory allocation is always contiguous ... in every programming language runtime I've ever encountered where it makes sense to talk about allocations at all.
A sequence of allocations is contiguous if there is no gap between the individual allocations.
This is orthogonal to stack versus heap. Both stack allocations and heap allocations can be contiguous ... or non-contiguous.
Well ... not quite orthogonal.
If you are talking about strictly contiguous memory addresses (physical or virtual), a typical heap node consists of the area of memory that the application can use, plus a small node header. So, if you look at the available memory for two consecutive heap nodes, there is a gap ... comprising the node header ... that prevents the two regions being used by the application as a single contiguous region. (And you'd better not try 'cos if you overwrite the node header, bad things could happen.)
However, when we are talking about Java this is not relevant. Java does not allow an application to join objects or arrays together. (That would be a fundamental violation of runtime type safety.) So the notional gap in the address ranges doesn't matter. In the Java context, we would say that two objects are contiguous, ignoring the heap node / object header.
Besides, in Java you can't explicitly allocate things on the stack either. In a classical JVM, only local variables holding primitive types and references go on the stack. There is no way to say "allocate this array on the stack". (The JVM might do such stack allocation under certain circumstances, e.g. via escape analysis, but it is entirely transparent to the application, and certainly not something that you could make use of.)

How do the card table and write barrier actually work?

I am reading some materials about garbage collection in Java in order to get to know more deeply what really happens in GC process.
I came across a mechanism called the "card table". I've googled for it and haven't found comprehensive information; most explanations are rather shallow and describe it as if it were magic.
My questions are: How do the card table and write barrier work? What is marked in the card table? How does the garbage collector know that a particular object is referenced by another object living in the older generation?
I would like to get a practical picture of the mechanism, as if I were going to write a simulation of it.
I don't know whether you found some exceptionally bad descriptions or whether you expect too many details; I've been quite satisfied with the explanations I've seen. If the descriptions are brief and sound simplistic, that's because it really is a rather simple mechanism.
As you apparently already know, a generational garbage collector needs to be able to enumerate old objects that refer to young objects. It would be correct to scan all old objects, but that destroys the advantages of the generational approach, so you have to narrow it down. Regardless of how you do that, you need a write barrier - a piece of code executed whenever a member variable (of a reference type) is assigned/written to. If the new reference points to a young object and it's stored in an old object, the write barrier records that fact for the garbage collector. The difference lies in how it's recorded. There are exact schemes using so-called remembered sets: a collection of every old object that has (or had at some point) a reference to a young object. As you can imagine, this takes quite a bit of space.
The card table is a trade-off: instead of telling you exactly which objects contain young pointers (or at least did at some point), it groups objects into fixed-size buckets and tracks which buckets contain objects with young pointers. This, of course, reduces space usage. For correctness, it doesn't really matter how you bucket the objects, as long as you're consistent about it. For efficiency, you group them by their memory address (because you have that available for free), divided by some larger power of two (so that the division is a cheap bit shift).
Also, instead of maintaining an explicit list of buckets, you reserve some space for each possible bucket up front. Specifically, there is an array of N bits or bytes, where N is the number of buckets, such that the i-th value is 0 if the i-th bucket contains no young pointers and 1 if it does. This is the card table proper. Typically this space is allocated and freed along with a large block of memory used as (part of) the heap; it may even be embedded at the start of that memory block if it doesn't need to grow. Unless the entire address space is used as heap (which is very rare), the division described above gives numbers starting from start_of_memory_region >> K rather than 0, so to get an index into the card table you have to subtract the start address of the heap first.
In summary, when the write barrier finds that the statement some_obj.field = other_obj; stores a young pointer in an old object, it does this:
card_table[(&some_obj - start_of_heap) >> K] = 1;
where &some_obj is the address of the object that now contains a young pointer (it is already in a register, because it was just determined to be an old object).
During minor GC, the garbage collector looks at the card table to determine which heap regions to scan for young pointers:
for i from 0 to (heap_size >> K):
    if card_table[i]:
        scan heap[i << K .. (i + 1) << K] for young pointers
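If it helps to see the same idea as running code, here is a toy simulation in Java (the card size, heap size, addresses, and names are all invented; a real JVM does this in compiled code on raw memory, not with Java arrays):

public class CardTableDemo
{
    static final int CARD_SHIFT = 9;                    // 2^9 = 512-byte cards
    static final int HEAP_SIZE = 1 << 20;               // pretend 1 MB heap
    static final byte[] cardTable = new byte[HEAP_SIZE >> CARD_SHIFT];

    // Write barrier: conceptually run on every reference store
    // "oldObj.field = youngObj".
    static void writeBarrier(int oldObjAddress)
    {
        cardTable[oldObjAddress >> CARD_SHIFT] = 1;     // mark the card dirty
    }

    // Minor GC: scan only the dirty cards instead of the whole old generation.
    static void scanDirtyCards()
    {
        for (int i = 0; i < cardTable.length; i++)
        {
            if (cardTable[i] == 1)
            {
                int from = i << CARD_SHIFT;
                int to = (i + 1) << CARD_SHIFT;
                System.out.println("scan heap[" + from + ".." + to + ") for young pointers");
                cardTable[i] = 0;                       // clean the card again
            }
        }
    }

    public static void main(String[] args)
    {
        writeBarrier(12345);    // an old object at pretend address 12345
        scanDirtyCards();       // only card 24 (12345 >> 9) gets scanned
    }
}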
Some time ago I wrote an article explaining the mechanics of young collection in the HotSpot JVM:
Understanding GC pauses in JVM, HotSpot's minor GC
The principle of the dirty-card write barrier is very simple: each time the program modifies a reference in memory, it marks the modified memory page as dirty. There is a special card table in the JVM, and each 512-byte page of memory has an associated one-byte entry in it.
Normally, collecting all references from the old space to the young space would require scanning through all objects in the old space; that is why we need the write barrier. All objects in the young space have been created (or relocated) since the last reset of the write barrier, so non-dirty pages cannot contain references into the young space. This means we only need to scan objects in dirty pages.
For anyone who is looking for a simple answer:
In JVM, the memory space of objects is broken down into two spaces:
Young generation (space): All new allocations (objects) are created inside this space.
Old generation (space): This is where long lived objects exist (and probably die)
The idea is that once an object survives a few garbage collections, it is likely to survive for a long time. So objects that survive garbage collection more than a threshold number of times are promoted to the old generation. The garbage collector runs more frequently in the young generation and less frequently in the old generation, because most objects live for a very short time.
We use generational garbage collection to avoid scanning of the whole memory space (like Mark and Sweep approach). In JVM, we have a minor garbage collection which is when GC runs inside the young generation and a major garbage collection (or full GC) which encompasses garbage collection of both young and old generations.
When doing a minor garbage collection, the JVM follows every reference from the live roots to objects in the young generation and marks those objects as live, which excludes them from collection. The problem is that there may also be references from objects in the old generation to objects in the young generation; those must be considered too, meaning young-generation objects referenced by old-generation objects must also be marked as live and excluded from collection.
One approach to solving this problem is to scan all of the objects in the old generation and find their references to young objects. But that contradicts the whole idea of generational garbage collection (why did we break the memory space into generations in the first place?).
Another approach is to use write barriers and a card table. When the program writes a reference to a young-generation object into an old-generation object, that write goes through a write barrier, and when the JVM sees it, it updates the corresponding entry in the card table. The card table is a table in which each entry corresponds to 512 bytes of memory; you can think of it as an array of 0 and 1 entries. A 1 means there is an object in the corresponding area of memory that contains references to young-generation objects.
Now, when a minor garbage collection happens, first every reference from the live roots to young objects is followed and the referenced young-generation objects are marked as live. Then, instead of scanning all of the old objects for references to young objects, the card table is scanned: if the GC finds a marked entry, it loads the corresponding area, follows its references to young objects, and marks those as live as well.

How does the JVM actually allocate memory, and how do the different areas differ?

This might seem like a lot of questions, but they are all interrelated. I'm a little confused about where the heap space is allocated and where the stack memory is located. If both are in main memory, why is it said that stack memory is easier to access, and why can't we allocate objects in stack memory? Since classes are stored in PermGen, where is that space allocated, how does it differ from heap space, and where are constant strings stored?
"Where are the heap and stack allocated?" The accepted answer to this question covers this. Each thread gets its own stack and they all share one heap. The operating system controls the exact memory locations of the stacks and heap items and it varies.
"Why is stack memory easier to access" Each thread has its own stack, so there are fewer concurrency issues. The stack and heap are both eligible for caching in the L1, L2, and L3 portions of the memory hierarchy, so I disagree with Daniel's answer here. Really I would not say that one kind of memory is particularly easier to access than the other.
"Why can't we allocated objects in stack memory?" This is a design decision taken by the JVM. In other languages like C/C++ you can allocate objects on the stack. Once you return from the function that allocated that stack frame such objects are lost. A common source of errors in C/C++ programs is sharing a pointer to such a stack allocated object. I bet that's why the JVM designers made this choice, though I am not sure.
The PermGen is another piece of the heap. Constant strings are stored here for the lifetime of the JVM. It is garbage collected just like the rest of the heap.
If both are present in main memory then why it is said that stack memory is easier to access
There's speed of access and speed of allocation. Stack allocation (as in alloca) is fast because there's no need to search for an unused block of memory. But Java doesn't allow stack allocation, unless you count the allocation of new stack frames.
Accessing stack memory is fast because it tends to be cached. Not only are locals near one another, they are also stored very compactly.
and why can't we allocate objects in stack memory ?
This would be useful, but dangerous. A user could allocate an object on the stack, create references to it from permanent objects, and then try to access the object after the associated stack frame is gone.
It's safe to store primitives on the stack because we can't create references to them from elsewhere.
Since classes are stored in PermGen where is this space allocated and how does it differ from heap space and where are constant strings stored ?
PermGen is just another heap space. String literals are stored in the literal pool, which is just a table in memory which is allocated when a class is loaded.
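A small illustration of the literal pool in action (behaviour you can expect from any compliant JVM): two identical literals resolve to the same pooled String object, while new String(...) always creates a fresh object on the ordinary heap.

public class LiteralPoolDemo
{
    public static void main(String[] args)
    {
        String a = "hello";              // resolved from the literal pool
        String b = "hello";              // same pooled instance as 'a'
        String c = new String("hello");  // forces a new object on the heap

        System.out.println(a == b);          // true  - same pooled object
        System.out.println(a == c);          // false - different objects
        System.out.println(a.equals(c));     // true  - same characters
        System.out.println(a == c.intern()); // true  - intern() returns the pooled one
    }
}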

How does Java Garbage collector handle self-reference?

Hopefully a simple question. Take for instance a Circularly-linked list:
class ListContainer
{
    private ListContainer next;
    <..>
    public void setNext(ListContainer next)
    {
        this.next = next;
    }
}
class List
{
    private ListContainer entry;
    <..>
}
Now since it's a circularly-linked list, when a single element is added, it has a reference to itself in its next variable. When deleting the only element in the list, entry is set to null. Is there a need to set ListContainer.next to null as well for the garbage collector to free its memory, or does it handle such self-references automagically?
Garbage collectors which rely solely on reference counting are generally vulnerable to failing to collect self-referential structures such as this. These GCs rely on a count of the number of references to the object in order to decide whether it is reachable.
Non-reference-counting approaches apply a more comprehensive reachability test to determine whether an object is eligible to be collected. These systems define a set of objects (the roots) which are always assumed to be reachable. Any object reachable by following references from this root set is considered ineligible for collection; any object not reachable from it is eligible. Thus, cycles do not end up affecting reachability, and can be collected.
See also, the Wikipedia page on tracing garbage collectors.
Circular references are a (solvable) problem if you rely on counting references in order to decide whether an object is dead. No Java implementation uses reference counting, AFAIK. Newer Sun JREs use a mix of several types of GC, all mark-and-sweep or copying, I think.
You can read more about garbage collection in general at Wikipedia, and some articles about java GC here and here, for example.
The actual answer to this is implementation dependent. The Sun JVM keeps track of some set of root objects (threads and the like), and when it needs to do a garbage collection, traces out which objects are reachable from those and saves them, discarding the rest. It's actually more complicated than that to allow for some optimizations, but that is the basic principle. This version does not care about circular references: as long as no live object holds a reference to a dead one, it can be GCed.
Other JVMs can use a method known as reference counting. When a reference to an object is created, a counter is incremented, and when the reference goes out of scope, the counter is decremented. If the counter reaches zero, the object is finalized and garbage collected. This version, however, allows for the possibility of circular references that would never be garbage collected. As a safeguard, many such JVMs include a backup method for determining which objects actually are dead, which they run periodically to resolve self-references and defragment the heap.
As a non-answer aside (the existing answers more than suffice), you might want to check out a whitepaper on the JVM garbage collection system if you are at all interested in GC. (Any one will do; just google "JVM garbage collection".)
I was amazed at some of the techniques used, and when reading through some of the concepts like "Eden" I really realized for the first time that Java and the JVM actually could beat C/C++ in speed. (Whenever C/C++ frees an object/block of memory, code is involved... When Java frees an object, it actually doesn't do anything at all; since in good OO code, most objects are created and freed almost immediately, this is amazingly efficient.)
Modern GC's tend to be very efficient, managing older objects much differently than new objects, being able to control GCs to be short and half-assed or long and thorough, and a lot of GC options can be managed by command line switches so it's actually useful to know what all the terms actually refer to.
Note: I just realized this was misleading. C++'s STACK allocation is very fast--my point was about allocating objects that are able to exist after the current routine has finished (which I believe SHOULD be all objects--it's something you shouldn't have to think about if you are going to think in OO, but in C++ speed may make this impractical).
If you are only allocating C++ classes on the stack, their allocation will be at least as fast as Java's.
Java collects any objects that are not reachable. If nothing else has a reference to the entry, then it will be collected, even though it has a reference to itself.
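You can observe this for yourself with a WeakReference (a small sketch; note that System.gc() is only a hint, so in principle the final print could still show the object):

import java.lang.ref.WeakReference;

public class SelfReferenceDemo
{
    static class Node
    {
        Node next;
    }

    public static void main(String[] args)
    {
        Node node = new Node();
        node.next = node;                 // the node references itself
        WeakReference<Node> ref = new WeakReference<>(node);

        node = null;                      // drop the only external reference
        System.gc();                      // request a collection (just a hint)

        // The self-reference alone does not keep the node reachable,
        // so this very likely prints null.
        System.out.println(ref.get());
    }
}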
Yes, the Java garbage collector handles self-references!
How?
There are special objects called garbage-collection roots (GC roots). These are always reachable, and so is any object that can be reached from one of them.
A simple Java application has the following GC roots:
Local variables in the main method
The main thread
Static variables of the main class
To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows:
The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
So if an object is not reachable from the GC roots (even if it is self-referenced or part of a reference cycle), it will be subject to garbage collection.
Simply, Yes. :)
Check out http://www.ibm.com/developerworks/java/library/j-jtp10283/
All JDKs (from Sun) have a concept of "reach-ability". If the GC cannot "reach" an object, it goes away.
This isn't any "new" info (your first to respondents are great) but the link is useful, and brevity is something sweet. :)
