What contributes to the size of a single object in memory?
I know that primitives and references would, but is there anything else?
Would the number of methods, or their length, matter?
This is completely implementation-dependent, but there are a few factors that influence object size in Java.
First, the number and types of the fields in the Java object definitely influence space usage, since you need to have at least as much storage space as is necessary to hold all of the object's fields. However, due to padding, alignment, and pointer compression optimizations, there is no direct formula you can use to compute precisely how much space is being used this way.
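If you want to see the actual layout of a class, padding and all, one option is the OpenJDK JOL (Java Object Layout) tool. Here's a minimal sketch, assuming the org.openjdk.jol:jol-core dependency is available (the Example class is made up for illustration):

import org.openjdk.jol.info.ClassLayout;

public class LayoutDemo {
    // A made-up class whose layout we want to inspect.
    static class Example {
        byte b;
        int i;
        long l;
        Object ref;
    }

    public static void main(String[] args) {
        // Prints each field's offset, size, and any alignment padding.
        // The exact output depends on the JVM and its flags (e.g. compressed oops).
        System.out.println(ClassLayout.parseClass(Example.class).toPrintable());
    }
}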
As for methods, typically speaking the number of methods in an object has no impact on its size. Methods are often implemented using a feature called virtual function tables (or "vtables") that make it possible to invoke methods through a base class reference in constant time. These tables are usually stored by having a single instance of the vtable shared across multiple objects, then having each object store a single pointer to the vtable.
Interface methods complicate this picture a bit, because there are several different implementations possible. One implementation adds a new vtable pointer for each interface, so the number of interfaces implemented may affect object size, while others do not. Again, it's implementation dependent how things are actually put together in memory, so you can't know for certain whether or not this will have a memory cost.
To the best of my knowledge there are no implementations of the JVM in existence today in which the length of a method influences the size of an object. Typically, only one copy of each method is stored in memory, and the code is then shared across all instances of a particular class. Having longer methods might require more total memory, but should not impact the per-object memory for instances of a class. That said, the JVM spec makes no promises that this must be the case, but I can't think of a reasonable implementation that would expend extra space per object for method code.
In addition to fields and methods, many other factors could contribute to the size of an object. Here are a few:
Depending on what type of garbage collector (or collectors) that the JVM is using, each object might have extra storage space to hold information about whether the object is live, dead, reachable, etc. This can increase storage space, but it's out of your control. In some cases, the JVM might optimize object sizes by trying to store the object on the stack instead of the heap. In this case, the overhead may not even be present for some types of objects.
If you use synchronization, the object might have extra space allocated for it so that it can be synchronized on. Some implementations of the JVM don't create a monitor for an object until it's necessary, so you may end up having smaller objects if you don't use synchronization, but you cannot guarantee that this will be the case.
Additionally, to support operators like instanceof and typecasting, each object may have some space reserved to hold type information. Typically, this is bundled with the object's vtable, but there's no guarantee that this will be true.
If you use assertions, some JVM implementations will create a field in your class that contains whether or not assertions are enabled. This is then used to disable or enable assertions at runtime. Again, this is implementation-specific, but it's good to keep in mind.
If your class is a nonstatic inner class, it may need to hold a reference to the class that contains it so that it can access its fields. However, the JVM might optimize this away if you never end up using this.
If you use an anonymous inner class, the class may need to have extra space reserved to hold the final variables that are visible in its enclosing scope so that they can be referenced inside the class. It's implementation-specific whether this information is copied over into the class fields or just stored locally on the stack, but it can increase object size.
Some implementations of Object.hashCode() or System.identityHashCode(Object) may require extra information to be stored in each object that contains the value of that hash code if it can't compute it any other way (for example, if the object can be relocated in memory). This might increase the size of each object.
To add a bit of (admittedly vague) data to templatetypedef's excellent answer: these numbers are for typical recent 32-bit JVMs, but they are implementation-specific:
The header overhead for each object is typically 2 words for a regular object and 3 words for an array. The header includes GC-related flags and some kind of pointer to the object's actual class. For an array, an extra word is needed to hold the array size.
If you've called (directly or indirectly) System.identityHashCode() on an object, and it has survived a GC cycle, then add an extra word to store the hashcode value. (Modern JVMs use a clever trick to avoid reserving a hashcode header field for all objects ...)
The storage allocation granularity may be a multiple of words; e.g. 2.
Fields of an object are typically word aligned; i.e. they are not packed.
Elements of an array of a primitive type are packed, but booleans are typically represented by a byte in packed form.
References occupy 4 bytes both as fields and as array elements.
Things are a bit more complicated for 64-bit JVMs because of pointer compression (compressed oops) in some JVMs. Also, I'm not sure whether fields are 32- or 64-bit aligned.
(Note: the above are based on what I've heard / read in various places from various "knowledgeable people". There is no definitive source for this kind of information apart from Oracle / Sun, and (AFAIK) they haven't published anything.)
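To make the rules above concrete, here's a back-of-the-envelope estimate for a made-up class on a typical 32-bit JVM (subject to all the caveats just given):

class Point {
    int x;      // 4 bytes (one word)
    int y;      // 4 bytes (one word)
    byte flag;  // 1 byte of data, but word-aligned, so it occupies a full 4-byte slot
}
// header:  2 words                  =  8 bytes
// fields:  4 + 4 + 4 (padded byte)  = 12 bytes
// total:   20 bytes, rounded up to the 2-word (8-byte) allocation granularity
//          => roughly 24 bytes per instance (an estimate, not a guarantee)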
Check out java.sizeOf on SourceForge here: http://sizeof.sourceforge.net/
AFAIK, the HBase source code contains some calculation of object sizes, based on commonly known rules for how different fields occupy space. And it differs between 32-bit and 64-bit platforms, as the people above have noted. I didn't look into the details of why they do it that way, but they really did do it in the source code.
Besides, the java.lang.instrument.Instrumentation class can do it too, via getObjectSize(). I'd guess the open-source project above is based on it as well.
This link has details on how to use it:
In Java, what is the best way to determine the size of an object?
As a comment: I'm actually also curious what the most meaningful use case would be if you did this in your own source code.
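For reference, here is roughly what the getObjectSize() route looks like. This is a minimal sketch (the class name SizeAgent and the JAR name are made up for illustration), and note that it returns the shallow size only:

import java.lang.instrument.Instrumentation;

// Package this in a JAR whose manifest contains the line:
//   Premain-Class: SizeAgent
// and run your application with: java -javaagent:sizeagent.jar ...
public class SizeAgent {
    private static volatile Instrumentation inst;

    // Called by the JVM before main() when the agent is attached.
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        inst = instrumentation;
    }

    // Implementation-specific approximation of the object's shallow size, in bytes.
    public static long sizeOf(Object obj) {
        if (inst == null) {
            throw new IllegalStateException("Agent not loaded; run with -javaagent");
        }
        return inst.getObjectSize(obj);
    }
}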
Well, each JVM implementation may have a different strategy to lay out objects and arrays in memory.
The HotSpot JVM uses a data structure called ordinary object pointers (oops) to represent pointers to objects.
Each oopDesc describes the pointer with the following information:
One mark word
One, possibly compressed, klass word
A mark word describes the object header. The HotSpot JVM uses this word to store identity hashcode, biased locking pattern, locking information, and GC metadata.
But I can't understand where the wait set associated with an object is stored. Can anyone explain?
Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.
The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).
Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?
Some techniques I use to reduce memory:
Make your own IntArrayList (etc) class that prevents boxing
Make your own IntHashMap (etc) class where keys are primitives
Use NIO's ByteBuffer to store large arrays of data efficiently (and in native memory, outside the heap). It's like a byte array, but it has methods to store/retrieve all primitive types from the buffer at any arbitrary offset (trading memory for speed); see the sketch after this list.
Don't use pooling because pools keep unused instances explicitly alive.
Use threads sparingly; they're super memory-hungry (in native memory, outside the heap)
When making substrings of big strings and discarding the original, the substrings may still refer to the original's backing char array (this was the case before Java 7u6). Use new String(...) to let the old big string be collected.
A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings.
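Here's the ByteBuffer sketch referenced above. allocateDirect places the buffer in native memory, outside the heap; the offsets are byte offsets, so ints land every 4 bytes:

import java.nio.ByteBuffer;

public class OffHeapInts {
    public static void main(String[] args) {
        // Room for one million ints, with no per-element object header
        // or reference (unlike an Integer[]).
        ByteBuffer buf = ByteBuffer.allocateDirect(1_000_000 * Integer.BYTES);

        buf.putInt(0, 42);   // write an int at byte offset 0
        buf.putInt(4, 7);    // the next int lives 4 bytes further on

        System.out.println(buf.getInt(0));  // reads 42 back from offset 0
    }
}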
Depends on the application, but generally speaking
Lay out data structures in (parallel) arrays of primitives (sketched at the end of this answer)
Try to make big "flat" objects, inlining otherwise sensible sub-structures
Specialize collections of primitives
Reuse objects, use object pools, ThreadLocals
Go off-heap
I cannot say these practices are "best", because, unfortunately, they make you suffer: you lose much of the point of using Java, and you reduce the flexibility, supportability, reliability, testability and other "good" properties of the codebase.
But they certainly allow you to lower the memory footprint and GC pressure.
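Here is the parallel-arrays idea from the first bullet sketched out, trading one object per element for a few primitive arrays (the Particle example is made up):

// Instead of allocating one object per element like this...
class Particle {
    double x, y;
    int id;
}

// ...keep one primitive array per field. There are no per-element headers
// or references, and same-typed fields sit contiguously in memory.
class Particles {
    final double[] x;
    final double[] y;
    final int[] id;

    Particles(int n) {
        x = new double[n];
        y = new double[n];
        id = new int[n];
    }

    void move(int i, double dx, double dy) {
        x[i] += dx;
        y[i] += dy;
    }
}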
One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.
Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.
One example of a memory leak, if you are implementing, for instance, a stack:
Integer[] stack = new Integer[10];
int stackPtr = 0;

// a few push operations on our stack...
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);

// ...and pop from the stack again
--stackPtr;
--stackPtr;

// at this point the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are effectively leaked.
The correct solution would have been:
stack[--stackPtr] = null;
If you have high performance constraints and need to use collections for simple types, you might take a look at some implementations of Primitive Collections for Java.
Some are:
HPPC
GNU Trove
Apache Commons Primitives
Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?
Luís Bianchin already gave you a few libraries which implement optimal collections in Java.
Nevertheless, it seems that you are especially concerned about Java collections' memory allocation. In that case, there are a few alternatives which are quite straightforward.
Cache
You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only load the most frequently used entries into main memory, and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
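A minimal sketch of a size-bounded Guava cache (the Record value type, the loader, and the maximum size are made-up examples):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class BoundedCacheDemo {
    // Hypothetical value type and loader, just for illustration.
    static class Record { String data; }
    static Record fetchFromSource(String key) { return new Record(); }

    public static void main(String[] args) throws Exception {
        // At most 10,000 entries are kept; least-recently-used entries are
        // evicted, so memory stays bounded regardless of the data set's size.
        LoadingCache<String, Record> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .build(new CacheLoader<String, Record>() {
                    @Override
                    public Record load(String key) {
                        return fetchFromSource(key);  // called only on a cache miss
                    }
                });

        Record r = cache.get("some-key");
    }
}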
Persistent Collections
Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend going with persistent collections. These are disk-backed collections that are way faster than traditional databases but have nice Java APIs.
MapDB and PCollections are for me the best libraries.
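A small sketch of the MapDB idea, assuming MapDB 3.x's DBMaker API (check the current docs, as the API has changed between major versions; the file and map names are made up):

import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;

import java.util.concurrent.ConcurrentMap;

public class DiskBackedMapDemo {
    public static void main(String[] args) {
        // A map whose entries live in a file rather than on the Java heap.
        DB db = DBMaker.fileDB("entries.db").make();
        ConcurrentMap<String, Long> map = db
                .hashMap("entries", Serializer.STRING, Serializer.LONG)
                .createOrOpen();

        map.put("key", 42L);
        System.out.println(map.get("key"));

        db.close();  // flush and release the file
    }
}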
Profile memory usage
On top of that, if you really want to know the actual state of your program's memory allocation, I highly recommend using a profiler. This way you will not only know how much memory your collections occupy, but also how the GC behaves over time.
In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.
The JDK has a profiler called VisualVM which does a great job. Nevertheless, I recommend using a commercial profiler if you can afford it. Commercial profilers usually have a lower impact on the application's performance compared to VisualVM.
Memory-optimal data over the network
Finally, this is not strictly related to your question, but it's closely connected. In case you want to serialize your Java objects into an optimal binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using the least bandwidth possible, with really fast encoding/decoding.
Well, there are a lot of things you can do.
Here are a few problems and solutions:
When you change the value of a string in Java, the string is not actually overwritten. Instead, a new string is created to replace the old one; however, the old string still exists. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:
When using a string to specify something like the "state" of an object or anything else that can only have a specific set of possible values, don't use a string. Instead, use an enum. If you don't know what an enum is or how to use one yet, look up a tutorial on enums and how to use them.
If you are using a string as a variable whose value will change at some point in the program, don't build it the way you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder maintains a mutable character buffer: when you change its contents, it modifies that buffer in place rather than creating a duplicate string with a different value each time. Since you aren't creating a chain of intermediate strings, this saves RAM. See the StringBuilder class in the Java API documentation.
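A tiny sketch of both points (the state names are made up):

// A fixed set of states as an enum: one shared instance per constant,
// instead of a new String for every state change.
enum State { IDLE, RUNNING, STOPPED }

public class StateDemo {
    public static void main(String[] args) {
        State state = State.IDLE;
        state = State.RUNNING;  // no new objects allocated

        // StringBuilder mutates one internal buffer instead of creating
        // a new String on every change.
        StringBuilder sb = new StringBuilder("count: ");
        for (int i = 0; i < 3; i++) {
            sb.append(i).append(' ');
        }
        System.out.println(sb);  // "count: 0 1 2 "
    }
}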
Writer and reader objects such as FileWriters and FileReaders also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:
All reader and writer objects have a method called close(). As you can probably guess, it closes the writer or reader object, releasing the underlying system resources it holds. Whenever you reach the point in your code where you know you will never use a reader or writer object again, call this method; once you also drop your references to the object, it can be garbage collected and its RAM freed. In modern Java, try-with-resources does this for you, as sketched below.
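A minimal try-with-resources sketch (the file name is made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadDemo {
    public static void main(String[] args) throws IOException {
        // The reader is closed automatically when the block exits,
        // even if an exception is thrown part-way through.
        try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}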
Every object in Java takes up memory, so when you have an object that you won't use anymore, there's no point in keeping it reachable.
A word of caution here: the Object class has a method called finalize(), but it is not the equivalent of close() for arbitrary objects. It is invoked by the garbage collector, not by you, and calling it yourself does not free the object. When you're done with an object, simply drop all references to it (set fields to null, let locals go out of scope), and the garbage collector will reclaim its RAM.
Beware of early optimisation.
See When is optimisation premature?
While not knowing the exact requirements of your application or runtime environment, in my experience Java has been able to handle anything I threw at it. Doing some profiling on your demo/proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory leaks) is an issue.
I was looking into this and just wondering does Java provide any construct to find out the size of an object?
Unfortunately not. It's relatively complex. e.g. if I create a String object, I have to consider:
the size of the fields of the objects. For primitives etc. that's simple
the size of objects referred to. Each member object is a reference, and not actually contained exclusively within the object under question. e.g. String contains a reference to a char array, but that char array can be shared across multiple Strings (see the source of substring() to understand how - this is known as the flyweight pattern)
the size of any native implementation details in the JVM
No. It goes against the concept of the language. In a real Object Oriented Programming language (not a hacked-together OOP support like C++), Objects are abstract concepts, not bits on the computer. Until and unless you serialize the object, it's treated like an actual object and not a sequence of bits.
Actually, I think you can get the size of an object with the help of the Instrumentation class, but the process is a little more complex (you have to specify the premain class in the manifest file, define an instrumentation agent, etc.) compared to C++, where you have the sizeof() operator at your disposal.
One disadvantage is that you get the size of the object itself, but not the size of the objects it refers to (if it has any).
Another, more rudimentary, way would be to serialize your object under test to a file and take the size of the file (use an ObjectOutputStream). For further documentation on this subject and more, read a bit about Java agents in general and about probing the JVM. Probing the JVM is a very helpful technique, especially if you want to analyze performance (object sizes, running threads, memory leaks, etc.).
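A rough sketch of that serialization approach, writing to an in-memory buffer rather than a file. Keep in mind the serialized form includes class metadata and differs from the in-memory size, so treat the result as a crude estimate only:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedSizeEstimate {
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();  // size of the serialized form, in bytes
    }

    public static void main(String[] args) throws IOException {
        System.out.println(serializedSize("hello"));       // a small String
        System.out.println(serializedSize(new int[100]));  // an int array
    }
}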
Just a bit of idle curiosity here.
Basically, if I have an object that only has a few primitive data members, it takes up a small amount of memory and doesn't take very long at all to create. However, what happens if I have a lot of methods associated with that object? Does object instantiation have to take those into account at all?
For example, let's say I have a Class with (insert absurdly large number here) number of distinct methods I can call. Does the JVM take any longer to make an instance of that class than if I had no methods?
No. A class, with its methods, is stored once in a separate memory location (namely PermGen in older JVMs, Metaspace since Java 8), and each object of a given class has only a single reference to its type (Class).
Thus it doesn't matter how many methods your object has: two or two thousand - the object creation will take exactly the same amount of time.
BTW, the same holds true for method invocation - there is no performance hit when calling methods on an object that has plenty of them, compared to an object that has only a few.
See also
What's the method representation in memory?
I can't speak for Java, but in C++ etc. non-virtual methods can be stored as global functions (with appropriate name mangling) and don't need extra space at instantiation time. Virtual methods have to be filled into the VMT (virtual method table), which can probably be built at compile time, with a single pointer to it stored in the object at instantiation.
So no, I don't see any hit for large numbers of methods.
No, I don't believe there's a performance hit that'll be measurable or matter to you. I'd say no and defy you or anyone else to come back with meaningful data to the contrary.
If your object is that big, I'd say it's time to refactor.
Would a hashtable/hashmap use a lot of memory if it only consists of object references and int's?
For a school project we had to map a database to objects (which is what ORMs like Hibernate do nowadays). Eager to find a good way to avoid storing IDs in the objects themselves just to be able to save them again, we thought of putting all the objects we created in a hashmap/hashtable, so we could easily retrieve each one's ID. My question is whether this (in my opinion more elegant) way of solving the problem would cost me performance.
Would a hashtable/hashmap use a lot of memory if it only consists of object references and ints?
"a lot" depends on how many objects you have. For a few hundreds or a few thousands, you're not going to notice.
But typically the default Java collections are really incredibly inefficient when you're working with primitives (because of the constant boxing/unboxing from primitive to wrapper going on, like int to Integer), both from a performance and a memory standpoint (the two being related but not identical).
If you have a lot of entries, like hundreds of thousands or millions, I suggest using for example the Trove collections.
In your case, you'd use this:
TIntObjectHashMap<SomeJavaClass>
or this:
TObjectIntHashMap<SomeJavaClass>
In any case, these will run circles around the default Java collections performance-wise and CPU-wise (and they'll trigger way less GC, etc.).
You're dodging the unnecessary automatic (un)boxing from/to int/Integer, the collections create way less garbage, they resize in a much smarter way, etc.
Don't even get me started on the default Java HashMap<Integer,Integer> compared to Trove's TIntIntHashMap, or I'll go berserk ;)
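A quick sketch of the first map in action, assuming Trove 3's package layout (the keys and values are made up):

import gnu.trove.map.TIntObjectMap;
import gnu.trove.map.hash.TIntObjectHashMap;

public class TroveDemo {
    public static void main(String[] args) {
        // Keys are stored as raw ints: no Integer boxing, no per-key object.
        TIntObjectMap<String> idToName = new TIntObjectHashMap<>();
        idToName.put(1, "alice");
        idToName.put(2, "bob");

        System.out.println(idToName.get(1));          // "alice"
        System.out.println(idToName.containsKey(3));  // false
    }
}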
Minimally, you'd need an implementation of the Map.Entry interface with a reference to the key object and a reference to the value object. If either the key or the value is a primitive type, such as int, you'll need a wrapper type (e.g. Integer) to wrap it as well. The Map.Entry objects hang off a bucket array, with each entry allocated individually.
Take a look at this question for more information on how to measure your memory consumption in Java.
It's impossible to answer this without some figures. How many objects are you looking to store? Don't forget you're storing the objects already, so the key/object reference combination should be fairly small.
The only sensible thing to do is to try this and see if it works for you. Don't forget that the JVM will have a default maximum memory allocation and you can increase this (if you need) via -Xmx