A friend of mine and I have the following bet going:
It is possible in Java to get the Object back from memory using the identity hash code obtained for that Object from System.identityHashCode(), with the restriction that it has not yet been cleaned up by the Garbage Collector.
I have been looking for an answer for quite some while now and am not able to find a definite one.
I think that it might be possible to do so using the JVMTI, but I haven't yet worked with it.
Does any one of you have an answer to that? I'll buy you a coffee if I can do so on your site ;)
Thanks in advance,
Felix
P.S.: I am saying this behaviour can be achieved, and my friend says it is not possible.
In theory it is possible; however, you have some issues:
It is randomly generated, so it is not unique. Any number of objects (though it's unlikely) could have the same identity hash code.
It is not a memory location; it doesn't change when the object is moved from Eden, around the Survivor spaces, or into the tenured space.
You need to find all the object roots to potentially find it.
If you can assume it is visible to a known object like a static collection, it should be easy to navigate via reflection.
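If that assumption holds, a reflection scan along the lines of the following sketch can surface candidates. The class and field names here are hypothetical, and a matching identity hash only identifies a candidate, never a guaranteed hit, because the hash is not unique:
import java.lang.reflect.Field;
import java.util.Collection;
public class IdentityHashSearch {
    static Object findByIdentityHash(Class<?> holder, String staticFieldName, int targetHash)
            throws ReflectiveOperationException {
        // Look up a known static collection field by reflection (names are hypothetical).
        Field field = holder.getDeclaredField(staticFieldName);
        field.setAccessible(true);
        Collection<?> root = (Collection<?>) field.get(null); // static field, so no receiver needed
        for (Object candidate : root) {
            if (System.identityHashCode(candidate) == targetHash) {
                return candidate; // only a candidate: identity hashes are not unique
            }
        }
        return null;
    }
}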
BTW On the 64-bit OpenJDK/Oracle JVM, the identity hash code is stored in the header at offset 1, which means you can read it, or even change it, using sun.misc.Unsafe. ;)
BTW2 The 31-bit hash code (not 32-bit) stored in the header is lazily set, and the same header bits are also used for biased locking; i.e. once you call Object.hashCode() or System.identityHashCode(), you disable biased locking for the object.
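As a fragile, implementation-specific illustration of that header trick, a sketch along these lines is sometimes shown. It assumes a little-endian 64-bit HotSpot build where the 31-bit hash sits at byte offset 1 of the mark word; on other JVMs or versions it may print garbage:
import java.lang.reflect.Field;
import sun.misc.Unsafe;
public class HeaderPeek {
    public static void main(String[] args) throws Exception {
        // sun.misc.Unsafe is not a supported API; grab the singleton reflectively.
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);
        Object o = new Object();
        int expected = System.identityHashCode(o); // forces the hash to be computed and stored
        // Read 4 bytes of the header starting at byte offset 1 and mask to 31 bits.
        // This relies on the layout of typical little-endian 64-bit HotSpot builds.
        int peeked = unsafe.getInt(o, 1L) & 0x7FFFFFFF;
        System.out.println(expected + " == " + peeked);
    }
}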
I think your friend is going to win this bet. Java/the JVM manages the memory for you and there is no way to access it once you drop all your references to something.
Phantom references, weak references, etc. are all designed to allow just what you are describing - so if you keep a weak or phantom reference to something, you can. identityHashCode is neither, though.
C and C++ might let you do this since you have more direct control of the memory, but even then you would need the memory location not a hash of it.
No, because the identityHashCodes are not necessarily unique. They are not pointers to the objects.
No. The identityHashCode is not necessarily a memory address: it is only the default implementation of hashCode. It is also not guaranteed to be unique for all objects (but different instances should have different identityHashCodes).
Even if the identityHashCode is derived from a memory address, the object may be reallocated (but the identityHashCode cannot change, by definition).
Related
What contributes to the size of a single object in memory?
I know that primitives and references would, but is there anything else?
Would the number of methods and the length of them matter?
This is completely implementation-dependent, but there are a few factors that influence object size in Java.
First, the number and types of the fields in the Java object definitely influence space usage, since you need to have at least as much storage space as is necessary to hold all of the object's fields. However, due to padding, alignment, and pointer compression optimizations, there is no direct formula you can use to compute precisely how much space is being used this way.
As for methods, typically speaking the number of methods in an object has no impact on its size. Methods are often implemented using a feature called virtual function tables (or "vtables") that make it possible to invoke methods through a base class reference in constant time. These tables are usually stored by having a single instance of the vtable shared across multiple objects, then having each object store a single pointer to the vtable.
Interface methods complicate this picture a bit, because there are several different implementations possible. One implementation adds a new vtable pointer for each interface, so the number of interfaces implemented may affect object size, while others do not. Again, it's implementation dependent how things are actually put together in memory, so you can't know for certain whether or not this will have a memory cost.
To the best of my knowledge there are no implementations of the JVM in existence today in which the length of a method influences the size of an object. Typically, only one copy of each method is stored in memory, and the code is then shared across all instances of a particular object. Having longer methods might require more total memory, but should not impact the per-object memory for instances of a class. That said, the JVM spec makes no promises that this must be the case, but I can't think of a reasonable implementation that would expend extra space per object for method code.
In addition to fields and methods, many other factors could contribute to the size of an object. Here are a few:
Depending on what type of garbage collector (or collectors) that the JVM is using, each object might have extra storage space to hold information about whether the object is live, dead, reachable, etc. This can increase storage space, but it's out of your control. In some cases, the JVM might optimize object sizes by trying to store the object on the stack instead of the heap. In this case, the overhead may not even be present for some types of objects.
If you use synchronization, the object might have extra space allocated for it so that it can be synchronized on. Some implementations of the JVM don't create a monitor for an object until it's necessary, so you may end up having smaller objects if you don't use synchronization, but you cannot guarantee that this will be the case.
Additionally, to support operators like instanceof and typecasting, each object may have some space reserved to hold type information. Typically, this is bundled with the object's vtable, but there's no guarantee that this will be true.
If you use assertions, some JVM implementations will create a field in your class that contains whether or not assertions are enabled. This is then used to disable or enable assertions at runtime. Again, this is implementation-specific, but it's good to keep in mind.
If your class is a nonstatic inner class, it may need to hold a reference to an instance of the class that contains it so that it can access that instance's fields. However, the JVM might optimize this away if you never end up using the enclosing instance.
If you use an anonymous inner class, the class may need to have extra space reserved to hold the final variables that are visible in its enclosing scope so that they can be referenced inside the class. It's implementation-specific whether this information is copied over into the class fields or just stored locally on the stack, but it can increase object size.
Some implementations of Object.hashCode() or System.identityHashCode(Object) may require extra information to be stored in each object that contains the value of that hash code if it can't compute it any other way (for example, if the object can be relocated in memory). This might increase the size of each object.
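One way to see the header, field offsets and padding discussed above for yourself (an assumption here, not something the answers rely on) is the OpenJDK JOL tool; this sketch assumes the org.openjdk.jol:jol-core dependency is on the classpath:
import org.openjdk.jol.info.ClassLayout;
public class LayoutDemo {
    static class Point {
        int x;
        int y;
        boolean visible;
    }
    public static void main(String[] args) {
        // Prints the object header, each field's offset and size, any alignment
        // padding, and the total instance size as this particular JVM lays it out.
        System.out.println(ClassLayout.parseInstance(new Point()).toPrintable());
    }
}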
To add a bit of (admittedly vague) data to templatetypedef's excellent answer. These numbers are for typical recent 32-bit JVMs, but they are implementation-specific:
The header overhead for each object is typically 2 words for a regular object and 3 words for an array. The header includes GC-related flags and some kind of pointer to the object's actual class. For an array, an extra word is needed to hold the array size.
If you've called (directly or indirectly) System.identityHashCode() on an object, and it has survived a GC cycle, then add an extra word to store the hashcode value. (Modern JVMs use a clever trick to avoid reserving a hashcode header field for all objects ...)
The storage allocation granularity may be a multiple of words; e.g. 2.
Fields of an object are typically word aligned; i.e. they are not packed.
Elements of an array of a primitive type are packed, but booleans are typically represented by a byte in packed form.
References occupy 4 bytes both as fields and as array elements.
Things are a bit more complicated for 64-bit JVMs because of pointer compression (compressed OOPs) in some JVMs. Also, I'm not sure whether fields are 32- or 64-bit aligned.
(Note: the above are based on what I've heard / read in various places from various "knowledgeable people". There is no definitive source for this kind of information apart from Oracle / Sun, and (AFAIK) they haven't published anything.)
Check out java.sizeOf in sourceforge here: http://sizeof.sourceforge.net/
AFAIK, the HBase source code contains some calculation of object sizes based on commonly known rules for how different fields occupy space (rules which at least the people above all know), and the result differs between 32-bit and 64-bit systems. I didn't look into the details of why they do it, but they really did do it in the source code.
Besides, the java.lang.instrument.Instrumentation class can also do it via getObjectSize(). I guess the open-source project is also based on it.
There are details of how to use it in this link:
In Java, what is the best way to determine the size of an object?
As a comment: I am actually also interested in this - if you do it in source code, what would be the most meaningful use case?
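For reference, a minimal sketch of the getObjectSize() approach mentioned above. It has to be packaged as a jar with a Premain-Class manifest entry and loaded with -javaagent, and it reports only the shallow size of the given object:
import java.lang.instrument.Instrumentation;
public class SizeAgent {
    private static volatile Instrumentation instrumentation;
    // Called by the JVM before main() when the jar is loaded with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }
    // Shallow size of this one object; objects it references are not included.
    public static long sizeOf(Object o) {
        if (instrumentation == null) {
            throw new IllegalStateException("Agent not loaded; run with -javaagent:sizeagent.jar");
        }
        return instrumentation.getObjectSize(o);
    }
}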
I was reading the JavaDoc for Object.hashCode method, it says that
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer [...])
But whatever its implementation is, the hashCode method always returns a (let's assume positive) integer, so given Integer.MAX+1 different objects, two of them are going to have the same hashcode.
Why is the JavaDoc here "denying" collisions? Is it a practical conclusion given that internal address is used and "come on, you're never going to have Integer.MAX+1 objects in memory at once, so we can say it's practically always unique"?
EDIT
This bug entry (thank you Sleiman Jneidi) gives an exact idea of what I mean (it seems to be a more that 10 years old discussion):
appears that many, perhaps majority, of programmers take this to mean that the default implementation, and hence System.identityHashCode, will produce unique hashcodes.
The qualification "As much as is reasonably practical," is, in practice, insufficient to make clear that hashcodes are not, in practice, distinct.
The docs are indeed misleading, and there is a bug opened ages ago that says so, especially since the implementation is JVM-dependent; in practice, especially with massive heap sizes, it is quite likely to get collisions when mapping object identities to 32-bit integers.
There is an interesting discussion of hashcode collisions here:
http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/
In particular, this highlights that your practical conclusion, "you're never going to have Integer.MAX+1 objects in memory at once, so we can say it's practically always unique" is a long way from accurate due to the birthday paradox.
The conclusion from the link is that, assuming a random distribution of hashCodes, we only need 77,163 objects before we have a 50/50 chance of hashCode collision.
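A quick, unscientific way to see that birthday effect is to allocate objects until two of them share an identity hash; on a typical HotSpot JVM this sketch usually stops after a few tens of thousands of allocations:
import java.util.HashSet;
import java.util.Set;
public class IdentityHashCollision {
    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        long allocated = 0;
        while (true) {
            allocated++;
            // Record each identity hash; add() returns false on the first duplicate.
            if (!seen.add(System.identityHashCode(new Object()))) {
                System.out.println("First collision after " + allocated + " objects");
                break;
            }
        }
    }
}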
When you read this carefully, you'll notice that this only means objects should try to avoid collisions ('as much as reasonably practical'), but also that you are not guaranteed to have different hashcodes for unequal objects.
So the promise is not very strong, but it is still very useful. For instance when using the hashcode as a quick indication of equality before doing the full check.
Take ConcurrentHashMap, for instance, which uses (a function of) the hash code to assign a location to an object in the map. In practice the hash code is used to find roughly where an object is located, and equals() is used to find the precise one.
A hash map could not use this optimization if objects didn't try to spread their hash codes as much as possible.
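A rough illustration of that "hash first, equals second" idea; this is a simplified sketch, not the actual collection internals:
public class HashThenEquals {
    // Cheap int comparison first; the potentially expensive equals() only runs
    // when the hashes already agree (a differing hash proves inequality).
    static boolean sameKey(Object stored, int storedHash, Object probe) {
        return storedHash == probe.hashCode()
                && (stored == probe || stored.equals(probe));
    }
    public static void main(String[] args) {
        String stored = "invoice-42";
        System.out.println(sameKey(stored, stored.hashCode(), "invoice-42")); // true
        System.out.println(sameKey(stored, stored.hashCode(), "invoice-43")); // false
    }
}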
I try to do my best to explain my question. Maybe it's a bit abstract.
I read some literature about not invoking the GC explicitly in Java code, the finalize method, pointing references to null, etc.
I have some large XML files (customer invoices). Using JAXB, each file is unmarshalled into a complex Java object. Its attributes are basic types (Integer, BigDecimal, String, etc.) but also instances of other complex classes, lists of other classes, lists of classes that have lists as attributes, etc.
When I'm done with the object, I need to remove it from memory. Some XMLs are very large, and I want to avoid a memory leak or an OutOfMemoryError situation.
So, my questions are:
Is it enough to assign the big object to null? I read that, if there are soft references, the GC will not free the object.
Should I do a deep clearing of the object, clearing all lists, assigning null to the attributes, etc.?
What about JAXB (I'm using Java 6, so JAXB is built in) and soft references? JAXB is faster than the old JiBX marshaller, but I don't know if it's worse in memory usage.
Should I wrap the mega-complex JAXB class in a WeakReference or something like that?
Excuse me for mixing Java memory-usage concepts, JAXB, etc. I'm studying the stability of a large running process, and the .hprof files show that the customer data of all invoices remains in memory.
Excuse me if it's a simple, basic or rare question.
Thanks in advance
Unless something else points to parts of your big object (graph), assigning null to the big-object reference is enough.
Safest though, would be to use a profiler after your application has been running for a while, and look at the object references, and see if there's something that isn't properly GC'ed.
Is it enough to assign the big object to null? I read that, if there are soft references, the GC will not free the object.
The short answer is yes. It is enough to assign (all strong references to) a big object to null - if you do this, the object will no longer be considered "strongly reachable" by the Garbage Collector.
Soft references will not be a problem in your case, because it's guaranteed that softly reachable objects will be garbage collected before an OutOfMemoryError is thrown. They might well prevent the garbage collector from collecting the object immediately (if they didn't, they'd act exactly the same as weak references). But this memory use would be "temporary", in that it would be freed up if it were needed to fulfil an allocation request.
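A tiny sketch of that soft-reference behaviour; whether the referent is still there when you call get() depends entirely on memory pressure:
import java.lang.ref.SoftReference;
public class SoftRefDemo {
    public static void main(String[] args) {
        // Once nothing else holds the byte[], it is only softly reachable.
        SoftReference<byte[]> cache = new SoftReference<>(new byte[16 * 1024 * 1024]);
        // get() returns the referent until the GC clears it, which is guaranteed
        // to happen before an OutOfMemoryError would otherwise be thrown.
        byte[] data = cache.get();
        System.out.println(data == null ? "already cleared" : "still cached: " + data.length + " bytes");
    }
}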
Should I do a deep clearing of the object, clearing all lists, assigning null to the attributes, etc.?
That would probably be a bad idea. If the field values are only referenced by the outer big object, then they will also be garbage collected when the big object is collected. And if they are not, then the other parts of the code that reference them will not be happy to see that you're removing members from a list they're using!
In the best case this does nothing, and in the worst case this will break your program. Don't let the lure of this distract you from addressing the sole actual issue of whether your object is strongly-reachable or not.
What about JAXB (I'm using Java 6, so JAXB is built in) and soft references? JAXB is faster than the old JiBX marshaller, but I don't know if it's worse in memory usage.
I'm not especially familiar with the relative time and space performance of those libraries. But in general, it's safe to assume a very strong "innocent until proven guilty" attitude with core libraries. If there were a memory leak bug, it would probably have been found, reported and fixed by now (unless you're doing something very niche).
If there's a memory leak, I'm 99.9% sure that it's your own code that's at fault.
Should I wrap the mega-complex JAXB class in a WeakReference or something like that?
This sounds like you may be throwing GC "fixes" at the problem without thinking through what is actually needed.
If the JaxB class ought to be weakly referenced, then by all means this is a good idea (and it should be there already). But if it shouldn't, then definitely don't do this. Weak referencing is more a question of the overall semantics, and shouldn't be something you introduce specifically to avoid memory issues.
If the outer code needs a reference to the object, then it needs a reference - there's no magic you can do to have the instance be garbage collected yet still available. If it doesn't need a reference (beyond a certain point), then it doesn't need one at all - better to just nullify a standard [strong] reference, or let it fall out of scope. Weak references are a specialist situation, and are generally used when you don't have full control over the point where an object ceases to be relevant. Which is probably not the case here.
the .hprof files show that the customer data of all invoices remains in memory.
This suggests that they are indeed being referenced longer than is necessary.
The good news is that the hprof file will contain details of exactly what is referencing them. Look at an invoice instance that you would expect to have been GCed, and see what is referencing it and preventing it from being GCed. Then look into the class in question to see how you expect that reference to be freed, and why it hasn't been in this case.
All good performance/memory tweaking is based on measurements. Taking heap dumps, and inspecting the instances and references to them, is your measurements. Do this and act on the results, rather than trying to wrap things in WeakReferences on the hope that it might help.
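Putting the advice above together, here is a hedged sketch of the "let it fall out of scope" approach; Invoice is a hypothetical stand-in for the real generated class:
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlRootElement;
public class InvoiceJob {
    // Hypothetical stand-in for the real JAXB-generated invoice class.
    @XmlRootElement
    public static class Invoice {
        public String customer;
    }
    static void handleFile(File xml) throws JAXBException {
        JAXBContext ctx = JAXBContext.newInstance(Invoice.class);
        Invoice invoice = (Invoice) ctx.createUnmarshaller().unmarshal(xml);
        // ... process the invoice here ...
        // When this method returns, 'invoice' goes out of scope. As long as nothing
        // else (a cache, a listener, a static field) kept a reference, the whole
        // unmarshalled object graph becomes unreachable and eligible for GC.
    }
}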
You wrote
the .hprof files show that the customer data of all invoices remains in memory.
You should analyse it using MAT (the Eclipse Memory Analyzer). There are some good notes at http://memoryanalyzer.blogspot.in/
I have a User object which strongly refers to a Data object.
If I create a Map<Data, User> (with Guava MapMaker) with weak keys, such a key would only be removed if it's not referenced anywhere else. However, it is always referred to by the User object that it maps to, which is in turn only removed from the map when the Data key is removed, i.e. never, unless the GC's circular reference detection also works when crossing a map (I hope you understand what I mean :P)
Will Users+Datas be garbage collected if they're no longer used elsewhere in the application, or do I need to specify weak values as well?
The GC doesn't detect circular references because it doesn't need to.
The approach it takes is to keep all the objects which are strongly reachable from root nodes, e.g. thread stacks. That way, objects that are not strongly reachable (whether they have circular references or not) are collected.
EDIT: This may help explain the "myth"
http://www.javacoffeebreak.com/articles/thinkinginjava/abitaboutgarbagecollection.html
Reference counting is commonly used to explain one kind of garbage collection but it doesn't seem to be used in any JVM implementations.
This is an interesting link http://www.ibm.com/developerworks/library/j-jtp10283/
In the documentation you see:
weakKeys()
Specifies that each key (not value) stored in the map should be wrapped in a WeakReference (by default, strong references are used).
Since it is weakly referenced, it will be collected.
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html
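For completeness, a minimal sketch of weakKeys() in the simple case where the value does not itself reference the key (note that weak keys use identity comparison):
import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;
public class WeakKeyDemo {
    public static void main(String[] args) throws InterruptedException {
        // Keys are held via WeakReference; values are held strongly by the map.
        ConcurrentMap<Object, String> cache = new MapMaker()
                .weakKeys()
                .makeMap();
        Object key = new Object();
        cache.put(key, "payload");
        key = null;        // drop the only outside strong reference to the key
        System.gc();       // only a hint; collection is never guaranteed
        Thread.sleep(100);
        // Once the key has been collected, the entry is cleaned up lazily
        // during subsequent map operations.
        System.out.println("entries: " + cache.size());
    }
}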
Would a hashtable/hashmap use a lot of memory if it only consists of object references and int's?
For a school project we had to map a database to objects (that's what's being done by ORM/Hibernate nowadays), but, eager to find a good way to avoid storing IDs in the objects just in order to save them again, we thought of putting all the objects we created in a hashmap/hashtable so we could easily retrieve an object's ID. My question is whether it would cost me performance to use this, in my opinion more elegant, way of solving the problem.
Would a hashtable/hashmap use a lot of memory if it only consists of object references and int's?
"a lot" depends on how many objects you have. For a few hundreds or a few thousands, you're not going to notice.
But typically the default Java collections are really incredibly inefficient when you're working with primitives (because of the constant boxing/unboxing from primitive to wrapper going on, like int to Integer), both from a performance and a memory standpoint (the two being related but not identical).
If you have a lot of entries, like hundreds of thousands or millions, I suggest using for example the Trove collections.
In your case, you'd use this:
TIntObjectHashMap<SomeJavaClass>
or this:
TObjectIntHashMap<SomeJavaClass>
In any case, that will run circles around the default Java collections performance-wise and CPU-wise (and it will trigger way less GC, etc.).
You're dodging the unnecessary automatic (un)boxing from/to int/Integer, the collections create way less garbage, they resize in a much smarter way, etc.
Don't even get me started on the default Java HashMap<Integer,Integer> compared to Trove's TIntIntHashMap or I'll go berserk ;)
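A small usage sketch for the school-project case; the package name assumes Trove 3's layout (gnu.trove.map.hash), and the surrounding class and method names are hypothetical:
import gnu.trove.map.hash.TObjectIntHashMap;
public class IdRegistry {
    // Maps each entity to its database id without boxing the int values.
    private final TObjectIntHashMap<Object> ids = new TObjectIntHashMap<Object>();
    public void register(Object entity, int databaseId) {
        ids.put(entity, databaseId);
    }
    public int idOf(Object entity) {
        // Returns the configured "no entry" value (0 by default) if the entity is unknown.
        return ids.get(entity);
    }
    public boolean isRegistered(Object entity) {
        return ids.containsKey(entity);
    }
}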
Minimally, you'd need an implementation of the Map.Entry interface with a reference to the key object and a reference to the value object. If either the key or the value is a primitive type, such as int, you'll need a wrapper type (e.g. Integer) to wrap it as well. The Map.Entry objects are stored in an array and allocated in blocks.
Take a look at this question for more information on how to measure your memory consumption in Java.
It's impossible to answer this without some figures. How many objects are you looking to store? Don't forget you're storing the objects already, so the key/object reference combination should be fairly small.
The only sensible thing to do is to try this and see if it works for you. Don't forget that the JVM will have a default maximum memory allocation and you can increase this (if you need) via -Xmx
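One rough way to "try this and see", as suggested above, is to compare heap usage before and after filling the map; the numbers are approximate at best:
import java.util.HashMap;
import java.util.Map;
public class MapFootprint {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        Map<Object, Integer> ids = new HashMap<>();
        for (int i = 0; i < 100000; i++) {
            ids.put(new Object(), i); // each int is boxed to an Integer here
        }
        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Entries: " + ids.size());
        System.out.println("Approximate bytes used: " + (after - before));
    }
}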