Well, each JVM implementation may have a different strategy to layout objects and arrays in memory.
The HotSpot JVM uses a data structure called Ordinary Object Pointers (OOPS) to represent pointers to objects.
Each oopDesc describes the pointer with the following information:
One mark word
One, possibly compressed, klass word
The mark word is the first part of the object header; the HotSpot JVM uses this word to store the identity hash code, the biased locking pattern, locking information, and GC metadata.
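For reference, the header layout can be inspected with OpenJDK's jol-core library; here is a rough sketch (it assumes jol-core is on the classpath, and the exact output varies by JVM version and flags):

    import org.openjdk.jol.info.ClassLayout;

    public class HeaderDump {
        public static void main(String[] args) {
            Object o = new Object();
            // Prints the mark word, the (possibly compressed) klass word,
            // and any field/padding layout for this instance.
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }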
But I can't understand where the wait set associated with an object is stored. Can anyone explain?
Related
When we have to save an object to persistent storage or transfer it across a network, we serialize the object (in this case, the object's class should implement the Serializable interface). My question is: why is there the restriction that the object's class should implement the Serializable interface? Can't we serialize any random object by taking its memory dump as it is?
My question is: why is there the restriction that the object's class should implement the Serializable interface?
Because certain kinds of objects are fundamentally not serializable in a HotSpot JVM: instances of classes like Thread, Process, Socket, and Class. (In some cases, serialization is conceivable but not practical. For others, it is inconceivable because the objects' behavior depends on state that is not accessible to Java.)
Can't we serialize any random object by taking its memory dump as it is?
A few reasons spring to mind (assuming that we are talking about a JVM based on HotSpot).
Instances of the above classes could not be properly serialized that way.
If you copy a memory snapshot, it has to be copied to the same address in the destination address space. Otherwise the pointers will be broken.
Each object has an object header which includes a special reference to the object's class; those class references are liable to be different in the receiving JVM.
To copy a block of memory containing objects, you would need to disable the GC, and all other application threads, at both the sending and receiving ends. It would be a stop-the-world event ... at both ends.
The objects that you want to copy are unlikely to be allocated in the same block of memory.
Some of the above could be addressed with a radically different JVM architecture, but there are some deeper problems as well. (Plus this approach could only ever work between "like" JVM implementations.)
It is just not practical as an implementation approach for serializing / deserializing Java objects.
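To illustrate the restriction itself, here is a small sketch (the class and variable names are made up): attempting to serialize an instance whose class does not implement Serializable fails at runtime rather than falling back to any kind of memory dump.

    import java.io.*;

    class Plain { int x = 42; }   // does NOT implement Serializable

    public class SerializationDemo {
        public static void main(String[] args) throws IOException {
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            try {
                out.writeObject(new Plain());
            } catch (NotSerializableException e) {
                System.out.println("Rejected: " + e.getMessage());  // prints the offending class name
            }
        }
    }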
What contributes to the size of a single object in memory?
I know that primitives and references would, but is there anything else?
Would the number of methods and the length of them matter?
This is completely implementation-dependent, but there are a few factors that influence object size in Java.
First, the number and types of the fields in the Java object definitely influence space usage, since you need to have at least as much storage space as is necessary to hold all of the object's fields. However, due to padding, alignment, and pointer compression optimizations, there is no direct formula you can use to compute precisely how much space is being used this way.
As for methods, typically speaking the number of methods in an object has no impact on its size. Methods are often implemented using a feature called virtual function tables (or "vtables") that make it possible to invoke methods through a base class reference in constant time. These tables are usually stored by having a single instance of the vtable shared across multiple objects, then having each object store a single pointer to the vtable.
Interface methods complicate this picture a bit, because there are several different implementations possible. One implementation adds a new vtable pointer for each interface, so the number of interfaces implemented may affect object size, while others do not. Again, it's implementation dependent how things are actually put together in memory, so you can't know for certain whether or not this will have a memory cost.
To the best of my knowledge there are no implementations of the JVM in existence today in which the length of a method influences the size of an object. Typically, only one copy of each method is stored in memory, and the code is then shared across all instances of a particular object. Having longer methods might require more total memory, but should not impact the per-object memory for instances of a class. That said, the JVM spec makes no promises that this must be the case, but I can't think of a reasonable implementation that would expend extra space per object for method code.
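As a rough illustration of the last two paragraphs, the jol-core library can be used to compare instance sizes; under a typical HotSpot layout, adding methods leaves the instance size unchanged while adding fields grows it (a sketch, assuming jol-core is available; the class names are made up):

    import org.openjdk.jol.info.ClassLayout;

    public class MethodsVsFields {
        static class NoMethods   { int x; }
        static class ManyMethods { int x; void a() {} void b() {} void c() {} }
        static class ExtraField  { int x; long y; }

        public static void main(String[] args) {
            System.out.println(ClassLayout.parseClass(NoMethods.class).instanceSize());
            System.out.println(ClassLayout.parseClass(ManyMethods.class).instanceSize()); // typically the same as above
            System.out.println(ClassLayout.parseClass(ExtraField.class).instanceSize());  // typically larger
        }
    }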
In addition to fields and methods, many other factors could contribute to the size of an object. Here are a few:
Depending on what type of garbage collector (or collectors) that the JVM is using, each object might have extra storage space to hold information about whether the object is live, dead, reachable, etc. This can increase storage space, but it's out of your control. In some cases, the JVM might optimize object sizes by trying to store the object on the stack instead of the heap. In this case, the overhead may not even be present for some types of objects.
If you use synchronization, the object might have extra space allocated for it so that it can be synchronized on. Some implementations of the JVM don't create a monitor for an object until it's necessary, so you may end up having smaller objects if you don't use synchronization, but you cannot guarantee that this will be the case.
Additionally, to support operators like instanceof and typecasting, each object may have some space reserved to hold type information. Typically, this is bundled with the object's vtable, but there's no guarantee that this will be true.
If you use assertions, some JVM implementations will create a field in your class that contains whether or not assertions are enabled. This is then used to disable or enable assertions at runtime. Again, this is implementation-specific, but it's good to keep in mind.
If your class is a nonstatic inner class, it may need to hold a reference to the class that contains it so that it can access its fields. However, the JVM might optimize this away if you never end up using this. (A small reflection check of this hidden reference appears after this list.)
If you use an anonymous inner class, the class may need to have extra space reserved to hold the final variables that are visible in its enclosing scope so that they can be referenced inside the class. It's implementation-specific whether this information is copied over into the class fields or just stored locally on the stack, but it can increase object size.
Some implementations of Object.hashCode() or System.identityHashCode(Object) may require extra information to be stored in each object that contains the value of that hash code if it can't compute it any other way (for example, if the object can be relocated in memory). This might increase the size of each object.
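Regarding the non-static inner class point above, a quick (illustrative) reflection check shows the hidden reference javac typically generates for the enclosing instance, usually as a synthetic field named this$0:

    import java.lang.reflect.Field;

    public class InnerFieldDemo {
        class Inner { }   // non-static, so it captures the enclosing instance

        public static void main(String[] args) {
            for (Field f : Inner.class.getDeclaredFields()) {
                System.out.println(f.getName() + " : " + f.getType().getSimpleName());
            }
            // Typically prints something like: this$0 : InnerFieldDemo
        }
    }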
To add a bit of (admittedly vague) data to templatetypedef's excellent answer. These numbers are for typical recent 32-bit JVMs, but they are implementation specific:
The header overhead for each object is typically 2 words for a regular object and 3 words for an array. The header includes GC-related flags and some kind of pointer to the object's actual class. For an array, an extra word is needed to hold the array size.
If you've called (directly or indirectly) System.identityHashCode() on an object, and it has survived a GC cycle, then add an extra word to store the hashcode value. (Modern JVMs use a clever trick to avoid reserving a hashcode header field for all objects ...)
The storage allocation granularity may be a multiple of words; e.g. 2.
Fields of an object are typically word aligned; i.e. they are not packed.
Elements of an array of a primitive type are packed, but booleans are typically represented by a byte in packed form.
References occupy 4 bytes both as fields and as array elements.
Things are a bit more complicated for 64-bit JVMs because of pointer compression (compressed oops) in some JVMs. Also, I'm not sure whether fields are 32- or 64-bit aligned.
(Note: the above are based on what I've heard / read in various places from various "knowledgeable people". There is no definitive source for this kind of information apart from Oracle / Sun, and (AFAIK) they haven't published anything.)
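Putting the figures above together, here is a rough back-of-the-envelope estimate for a small class on a typical 32-bit JVM (illustrative only; real numbers vary by JVM, version, and flags):

    class Example {
        int count;        // 4 bytes
        long timestamp;   // 8 bytes
        Object next;      // 4-byte reference on a 32-bit JVM
    }
    // Estimate: 8 bytes of header (2 words)
    //         + 4 + 8 + 4 = 16 bytes of fields
    //         = 24 bytes, already a multiple of an 8-byte allocation granularity.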
Check out java.sizeOf in sourceforge here: http://sizeof.sourceforge.net/
AFAIK, the HBase source code contains some calculation of object sizes based on commonly known rules for how different fields occupy space, and the result differs between 32-bit and 64-bit platforms, as the answers above note. I didn't look into the details of why they do it, but they really did it in the source code.
Besides, the java.lang.instrument.Instrumentation class can also do it via getObjectSize(). I guess the open source project above is based on it as well.
This link has details of how to use it:
In Java, what is the best way to determine the size of an object?
As a comment, I am actually also interested: if you do this in source code, what would be the most meaningful use case?
A friend of mine and I have the following bet going:
It is possible to get the object back from memory using the identity hash code obtained for that object via System.identityHashCode() in Java, with the restriction that it has not yet been cleaned up by the garbage collector.
I have been looking for an answer for quite some while now and am not able to find a definite one.
I think that it might be possible to do so using the JVMTI, but I haven't yet worked with it.
Does anyone of you have an answer to that? I'll buy you a coffee if I can do so on your site ;)
Thanks in advance,
Felix
P.S.: I am saying this behaviour can be achieved, and my friend says it is not possible.
In theory it is possible; however, you have some issues:
It is randomly generated, so it is not unique. Any number of objects could (however unlikely) have the same identity hash code.
It is not a memory location; it doesn't change when the object moves from Eden, around the survivor spaces, or into the tenured space.
You need to walk all the object roots to potentially find it.
If you can assume it is visible to a known object like a static collection, it should be easy to navigate via reflection.
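For example, if the candidates are reachable from a known collection, the lookup is just a linear scan (a sketch; the names are made up, and a match may still be a hash collision rather than the object you wanted):

    import java.util.List;

    public class IdentityScan {
        static Object findByIdentityHash(List<?> candidates, int targetHash) {
            for (Object o : candidates) {
                if (System.identityHashCode(o) == targetHash) {
                    return o;   // could still be a collision, not necessarily "the" object
                }
            }
            return null;
        }
    }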
BTW: On the 64-bit OpenJDK/Oracle JVM, the identity hash code is stored in the header starting at offset 1, which means you can read it, or even change it, using sun.misc.Unsafe. ;)
BTW2: The 31-bit hash code (not 32-bit) stored in the header is lazily set, and the same header bits are used for biased locking; i.e. once you call Object.hashCode() or System.identityHashCode() you disable biased locking for the object.
I think your friend is going to win this bet. Java/the JVM manages the memory for you and there is no way to access it once you drop all your references to something.
Phantom References, Weak References, etc are all designed to allow just what you are describing - so if you keep a Weak or Phantom reference to something you can. identityHashCode is neither though.
C and C++ might let you do this since you have more direct control of the memory, but even then you would need the memory location not a hash of it.
No, because the identityHashCodes are not necessarily unique. They are not pointers to the objects.
No. The identityHashCode is not necessarily a memory address: it is only the default implementation of hashCode. It is also not guaranteed to be unique for all objects (though different instances will usually have different identityHashCodes).
Even if the identityHashCode is derived from a memory address, the object may be reallocated (but the identityHashCode cannot change, by definition).
I was looking into this and just wondering does Java provide any construct to find out the size of an object?
Unfortunately not. It's relatively complex. e.g. if I create a String object, I have to consider:
the size of the fields of the objects. For primitives etc. that's simple
the size of objects referred to. Each member object is a reference, and not actually contained exclusively within the object under question. e.g. String contains a reference to a char array, but that char array can be shared across multiple Strings (see the source of substring() to understand how - this is known as the flyweight pattern)
the size of any native implementation details in the JVM
No. It goes against the concept of the language. In a real Object Oriented Programming language (not a hacked-together OOP support like C++), Objects are abstract concepts, not bits on the computer. Until and unless you serialize the object, it's treated like an actual object and not a sequence of bits.
Actually, I think you can get the size of an object with the help of the Instrumentation class, but the process is a little more complex (you have to specify the premain class in the manifest file, define an Instrumentation agent, etc.) compared to C++, where you have the sizeof() operator at your disposal.
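A minimal sketch of such an agent (the class name SizeAgent is made up; the jar's manifest would need a Premain-Class entry pointing at it, and the JVM would be started with -javaagent):

    import java.lang.instrument.Instrumentation;

    public class SizeAgent {
        private static volatile Instrumentation instrumentation;

        // Called by the JVM before main() when started with -javaagent:sizeagent.jar
        public static void premain(String args, Instrumentation inst) {
            instrumentation = inst;
        }

        // Shallow size of a single object; referenced objects are not included.
        public static long sizeOf(Object o) {
            if (instrumentation == null) {
                throw new IllegalStateException("Agent not loaded; run with -javaagent");
            }
            return instrumentation.getObjectSize(o);
        }
    }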
One disadvantage is that you get the size of the object but not the size of the objects it refers to (if it has any).
Another, more rudimentary, way would be to serialize the object under test, write it to a file, and take the size of the file (use ObjectOutputStream and get its size). For further documentation on this subject and beyond, read a little about Java agents in general and about probing the JVM. Probing the JVM is a very helpful technique, especially if you want to analyse performance (object sizes, running threads, memory leaks, etc.).
I wonder how weak references work internally, for example in .NET or in Java. My two general ideas are:
"Intrusive" - to add list of weak references to the most top class (object class). Then, when an object is destroyed, all the weak references can be iterated and set to null.
"Non-intrusive" - to maintain a hashtable of objects' pointers to lists of weak references. When a weak reference A is created to an object B, there would be an entry in the hashtable modified or created, whose key would be the pointer to B.
"Dirty" - to store a special hash-value with each object, which would be zeroed when the object is destroyed. Weak references would copy that hash-value and would compare it with the object's value to check if the object is alive. This would however cause access violation errors, when used directly, so there would need to be an additional object with that hash-value, I think.
None of these solutions seems clean or efficient. Does anyone know how it is actually done?
In .NET, when a WeakReference is created, the GC is asked for a handle/opaque token representing the reference. Then, when needed, WeakReference uses this handle to ask the GC if that handle is still valid (i.e. the original object still exists) - and if so, it can get the actual object reference.
So this is building a list of tokens/handles against object addresses (and presumably maintaining that list during defragmentation etc)
I'm not sure I 100% understand the three bullets, so I hesitate to guess which (if any) this is closest to.
Not sure I understood your question, but you can have a look at the implementation for the class WeakReference and its superclass Reference in Java. It is well commented and you can see it has a field treated specially by the GC and another one used directly by the VM.
Python's PEP 205 has a decent explanation of how weak references should behave in Python, and this gives some insight into how they can be implemented. Since a weak reference is immutable, you could have just one for each object, to which you pass out references as needed. Thus, when the object is destroyed, only one weak reference needs to be invalidated.
It seems that the implementation of weak references is a well-kept secret in the industry ;-). For example, as of now, the Wikipedia article lacks any implementation details. And look at the answers above (including the accepted one): "go look at the source" or "I think" ;-\ .
Of all the answers, only the one referencing Python's PEP 205 is insightful. As it says, for any single object, there can be at most one weak reference, if we treat weakref as an entity itself.
The rest describes the Squirrel language implementation. A weakref is itself an object: when you put a weak reference to an object in some container, you actually put a reference to the weakref object. Each ref-countable object has a field to store a pointer to its weakref, which is NULL until a weakref to that object is actually requested. Each object has a method to request a weakref, which either returns the existing (singleton) weakref from the field or creates it and caches it in the field.
Of course, the weakref points to the original object. You then just need to go through all the places where references to objects are handled and add transparent handling of weakrefs (i.e. automatically dereference them). (An alternative to "transparent" handling is to add a virtual "access" method which is the identity for most objects and an actual dereference for a weakref.)
And as the object has a pointer to its weakref, the object can NULL out the weakref in its own destructor.
This implementation is pretty clean (no magic "calls into the GC" and such) and has O(1) runtime cost. Of course, it's pretty greedy with memory: you need to add one extra pointer field to each object, even though for 90+% of objects it will typically be NULL. Then again, VHLLs already have a large per-object memory overhead, and there may be a chance to compact the various "extra" fields. For example, the object type is typically a small enumeration, so it may be possible to merge the type and some kind of weakref reference into a single machine word (say, keep weakref objects in a separate arena and use an index into it).
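For the curious, here is a very rough Java-flavoured sketch of the scheme just described (purely illustrative; in real Java you would use java.lang.ref.WeakReference and let the GC do the invalidation, and all names here are made up):

    class WeakRef<T> {
        T target;                     // nulled out when the target dies
        T get() { return target; }
    }

    class ManagedObject {
        private WeakRef<ManagedObject> weakRef;   // lazily created; at most one per object

        WeakRef<ManagedObject> weakRef() {
            if (weakRef == null) {
                weakRef = new WeakRef<>();
                weakRef.target = this;
            }
            return weakRef;
        }

        // Analogue of the destructor in the ref-counted scheme: the dying object
        // NULLs out its own weakref, invalidating all holders at once.
        void onDestroy() {
            if (weakRef != null) {
                weakRef.target = null;
            }
        }
    }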
The normal approach, I think, is for the system to maintain some sort of list of weak references. When the garbage collector executes, before dead objects are removed, the system iterates through the list of weak references and invalidates any reference whose target has not been tagged live. Depending upon the system, this may occur before or after the system temporarily resurrects objects which are eligible for immediate finalization (in the case of .net, there are two kinds of WeakReference--one of which is effectively processed before the system scans for finalizers, meaning that it will become invalid when its target becomes eligible for finalization, and one of which is processed after).
Incidentally, if I were designing a gc-based framework, I would add a couple of other goodies: (1) a means of declaring a reference-type storage location as holding a reference that's primarily of interest to someone else, and (2) a variety of WeakReference which could indicate that the only references to an object are in "of interest to someone else" storage locations. Although WeakReference is a useful type, the act of turning a weak reference into a strong reference may prevent the system from ever recognizing that nobody would mind if its target disappeared.