There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java you are forced to use several primitive arrays (see below why) or a big byte array, which adds some CPU overhead for conversion.
Example: you have a class Point { float x; float y; }. Now you want to store N points in an array, which would take at least N * 8 bytes for the floats and N * 4 bytes for the references on a 32-bit JVM. So at least 1/3 is wasted (not counting the normal object overhead here). But if you stored this in two float arrays, all would be fine.
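As a rough illustration of the two-float-array layout and the byte counts above, here is a minimal sketch. PointStore and all of its method names are made up for this example, and the 4-byte reference size is the 32-bit assumption from the question, not something the JVM guarantees:

```java
// Hypothetical structure-of-arrays store for N points; not an existing API.
final class PointStore {
    private final float[] xs; // all x coordinates, stored contiguously
    private final float[] ys; // all y coordinates, stored contiguously

    PointStore(int capacity) {
        xs = new float[capacity];
        ys = new float[capacity];
    }

    void set(int index, float x, float y) {
        xs[index] = x;
        ys[index] = y;
    }

    float x(int index) { return xs[index]; }
    float y(int index) { return ys[index]; }
    int size() { return xs.length; }

    // Payload only: two 4-byte floats per point, ignoring the two array headers.
    static long payloadBytes(int n) {
        return (long) n * 2 * Float.BYTES;
    }

    // Payload plus one assumed 4-byte reference per point (32-bit JVM estimate);
    // per-object headers are left out, as in the question.
    static long referenceArrayBytes(int n) {
        return (long) n * (2 * Float.BYTES + 4);
    }

    public static void main(String[] args) {
        PointStore store = new PointStore(1000);
        store.set(0, 1.5f, 2.5f);
        System.out.println(store.x(0) + ", " + store.y(0));
        System.out.println(payloadBytes(1000) + " vs " + referenceArrayBytes(1000));
    }
}
```

The price is that there is no Point object to hand around any more; callers work with an index into the store.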
My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?
E.g. marking the class Point final should be sufficient for the JVM to see the maximum length of the data for the Point class. Or where would this be against the specification? Also this would save a lot of memory when handling large n-dimensional matrices etc
Update:
I would like to know whether the JVM could theoretically optimize it (e.g. behind the scenes) and under which conditions - not whether I can force the JVM somehow. I think the second point of the conclusion is the reason it cannot be done easily, if at all.
Conclusions what the JVM would need to know:
The class needs to be final to let the JVM guess the length of one array entry
The array needs to be read only. Of course you can change the values like Point p = arr[i]; p.setX(i) but you cannot write to the array via inlineArr[i] = new Point(). Or the JVM would have to introduce copy semantics which would be against the "Java way". See aroth's answer
How to initialize the array (calling the default constructor or leaving the members initialized to their default values)
Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this optimization in particular is implemented, but I do know that similar ones are: for example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
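To make the escape analysis case concrete, here is a small sketch of a method-local object. EscapeDemo and Vec are hypothetical names, and whether a given JVM actually removes the allocation is an assumption that depends on the JIT and its settings:

```java
final class EscapeDemo {
    // Small value holder; the instance created below never leaves its method.
    static final class Vec {
        final float x, y;
        Vec(float x, float y) { this.x = x; this.y = y; }
    }

    // 'v' is only read inside this method, so it does not escape. A JIT with
    // escape analysis may replace it with two float locals (scalar replacement);
    // whether it does so is up to the JVM.
    static float lengthSquared(float x, float y) {
        Vec v = new Vec(x, y);
        return v.x * v.x + v.y * v.y;
    }

    public static void main(String[] args) {
        float sum = 0;
        // Call the method often enough that the JIT is likely to compile it.
        for (int i = 0; i < 1_000_000; i++) {
            sum += lengthSquared(i % 10, 1f);
        }
        System.out.println(sum);
    }
}
```

Nothing in the source marks Vec as stack-allocated; the code reads like ordinary heap allocation and the optimization, if it happens, is invisible to the program.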
Update for updated question
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.
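Written by hand, the bit-vector idea for null elements might look roughly like the following. This is only a sketch of what such a layout could be; InlinePointArray is a hypothetical class and says nothing about how a JVM would really implement it:

```java
import java.util.BitSet;

// Hypothetical "embedded" array of points: fields live in one float[],
// and a bit per slot records whether the slot is logically null.
final class InlinePointArray {
    private final float[] data;   // x at 2*i, y at 2*i + 1
    private final BitSet present; // bit i set means slot i holds a point

    InlinePointArray(int length) {
        data = new float[2 * length];
        present = new BitSet(length); // all slots start out as null
    }

    void set(int i, float x, float y) {
        data[2 * i] = x;
        data[2 * i + 1] = y;
        present.set(i);
    }

    void setNull(int i) { present.clear(i); }

    boolean isNull(int i) { return !present.get(i); }

    float x(int i) {
        checkPresent(i);
        return data[2 * i];
    }

    float y(int i) {
        checkPresent(i);
        return data[2 * i + 1];
    }

    // Mimics what dereferencing a null array element would do.
    private void checkPresent(int i) {
        if (isNull(i)) throw new NullPointerException("slot " + i + " is null");
    }
}
```

The extra cost is one bit per element plus a check on every access, which is part of the bookkeeping mentioned above.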
The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).
So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
Point p = new Point(0, 0);
Point[] compressedA = {p}; //assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p}; //assuming 'p' is "optimally" stored as {0,0}
compressedA[0].setX(5);
compressedB[0].setX(1);
System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);
...you would get:
0
5
1
...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
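The difference can be checked with ordinary Java by simulating the "compressed" arrays with explicit copies. This is a sketch under that assumption; AliasingDemo and the copy constructor are illustrative and not part of the question's code:

```java
import java.util.Arrays;

final class AliasingDemo {
    static final class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        Point(Point other) { this(other.x, other.y); } // field-by-field copy
        void setX(int x) { this.x = x; }
    }

    // Real Java semantics: both arrays hold a reference to the same object.
    static int[] withReferences() {
        Point p = new Point(0, 0);
        Point[] a = {p};
        Point[] b = {p};
        a[0].setX(5);
        b[0].setX(1);
        return new int[] {p.x, a[0].x, b[0].x};
    }

    // Simulated embedded storage: each array slot gets its own copy of the fields.
    static int[] withCopies() {
        Point p = new Point(0, 0);
        Point[] a = {new Point(p)};
        Point[] b = {new Point(p)};
        a[0].setX(5);
        b[0].setX(1);
        return new int[] {p.x, a[0].x, b[0].x};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withReferences())); // [1, 1, 1]
        System.out.println(Arrays.toString(withCopies()));     // [0, 5, 1]
    }
}
```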
Isn't this tantamount to providing trivial classes such as the following?
class Fixed {
float hiddenArr[];
Point pointArray(int position) {
return new Point(hiddenArr[position*2], hiddenArr[position*2+1]);
}
}
Also, it's possible to implement this without making the programmer explicitly state that they'd like it; the JVM is already aware of "value types" (POD types in C++); ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision, no reason it couldn't do it for arrays too?
Related
Imagine that I define a class with dozens of reference fields (instead of using reference arrays such as Object[]), and instantiate this class pretty heavily in an application.
Is it going to affect the performance of garbage collector in Hotspot JVM, when it traverses the heap to calculate reachable objects? Or, maybe, it would lead to significant extra memory consumption, for some JVM's internal data structures or class metadata? Or, is it going to affect the efficiency of an application in some other way?
Are those aspects specific to each garbage collector algorithm in Hotspot, or those parts of Hotspot's mechanics are shared and used by all garbage collectors alike?
Let me rephrase the question. "Is it better to have class A or class B, below?"
class A {
Target[] array;
}
class B {
Target a, b, c, ..., z;
}
The usual maintainability issues notwithstanding... From the VM's point of view, given a resolved reference to class B, it requires one dereference to reach a Target field, while in class A it requires two dereferences, because we also need to read through the array.
The handling of object references in the two cases is subtly different: in class A, the VM knows there is a contiguous array of references, and so it does not need to know anything else. In class B, the VM has to know which fields are references (because there could be non-reference fields, for example), which requires maintaining the oop maps in the class metadata:
// InstanceKlass embedded field layout (after declared fields):
...
// [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
// The embedded nonstatic oop-map blocks are short pairs (offset, length)
// indicating where oops are located in instances of this
Note that while footprint overhead is there, it is unlikely to matter very much, unless you have lots of classes of this weird shape, but even then the cost would be per-class, not per-instance.
Oop-maps are built during class parsing, by the shared runtime code. The visitors that walk the "oop"-s for a particular object look into those oop-maps to find the offsets of references, and that code is also part of the shared runtime. So, this overhead is independent of the GC implementation.
Considerations for performance:
Oop-maps are chunked: the runs of adjacent reference fields would form a continuous oop-map block that would be visited pretty much like we would with continuous oop block in reference array.
The GC (marking) performance is dependent on the number of references it has to follow, and memory latency on dereferences would be the first-order effect. Note that in class A, we have to traverse more references.
The null-checks and array bounds checks would probably matter in class A case, if requested indices are not constant and array lengths are not known on critical code paths. In comparison, fields are bound statically, and their offsets are always known.
So, it probably makes little sense to ask about the difference in GC/runtime handling of separate fields vs arrays. Taking care of locality of reference quite probably gives a bigger bang for the buck. Which tips the scale to class B, with associated maintainability overheads -- as quite a few performance tricks do.
While I was thinking over the memory usage of various types, I became a bit confused about how Java uses memory for integers when they are passed to a method.
Say, I had the following code:
public static void main (String[] args){
int i = 4;
addUp(i);
}
public static int addUp(int i){
if(i == 0) return 0;
else return addUp(i - 1);
}
In this example, I am wondering whether my logic is correct:
I initially allocate memory for the integer i = 4. Then I pass it to a method. However, since primitives are not passed by pointer in Java, the call addUp(i == 4) creates another integer i = 4. After that come addUp(i == 3), addUp(i == 2), addUp(i == 1), addUp(i == 0), and each time, since the value is not a pointer, a new i is allocated in memory.
So for a single "int i" value, I have used memory for 6 integers.
However, if I were to always pass it through an array:
public static void main (String[] args){
int[] i = {4};
// int tempI = i[0];
addUp(i);
}
public static int addUp(int[] i){
if(i[0] == 0) return 0;
else return addUp(i[0] = i[0] - 1);
}
Since I create an integer array of size 1 and then pass that to addUp, which passes it on again for addUp(i[0] == 3), addUp(i[0] == 2), addUp(i[0] == 1), addUp(i[0] == 0), I have only had to use the memory of one integer array, which is far more cost-efficient. In addition, if I store the initial value of i[0] in an int beforehand, I still have my "original" value.
Then this leads me to the question, why do people pass primitives like int in Java methods? Isn't it far more memory efficient to just pass the array values of those primitives? Or is the first example somehow still just O(1) memory?
And on top of this question, I just wonder the memory differences of using int[] and int especially for a size of 1. Thank you in advance. I was simply wondering being more memory efficient with Java and this came to my head.
Thanks for all the answers! I'm just now quickly wondering if I were to "analyze" big-oh memory of each code, would they both be considered O(1) or would that be wrong to assume?
What you are missing here: the int values in your example go on the stack, not on the heap.
And it is much less overhead to deal with fixed size primitive values existing on the stack - compared to objects on the heap!
In other words: using a "pointer" means that you have to create a new object on the heap. All objects live on the heap; there is no stack for arrays! And objects become eligible for garbage collection once you stop using them. Stack frames, on the other hand, come and go as you invoke methods!
Beyond that: keep in mind that the abstractions that programming languages provide are created to help us write code that is easy to read, understand and maintain. Your approach is basically to do some sort of fine tuning that leads to more complicated code. And that is not how Java solves such problems.
Meaning: with Java, the real "performance magic" happens at runtime, when the just-in-time compiler kicks in! You see, the JIT can inline calls to small methods when the method is invoked "often enough". And then it becomes even more important to keep data "close" together. As in: when data lives on the heap, you might have to access memory to get a value. Whereas items living on the stack - might still be "close" (as in: in the processor cache). So your little idea to optimize memory usage could actually slow down program execution by orders of magnitude. Because even today, there are orders of magnitude between accessing the processor cache and reading main memory.
Long story short: avoid getting into such "micro-tuning" for either performance or memory usage: the JVM is optimized for the "normal, typical" use cases. Your attempts to introduce clever work-arounds can therefore easily result in "less good" results.
So, when you worry about performance: do what everybody else is doing. And if you really care, then learn how the JVM works. As it turns out, even my knowledge is slightly outdated, as the comments imply that a JIT can inline objects on the stack. In that sense: focus on writing clean, elegant code that solves the problem in a straightforward way!
Finally: this is subject to change at some point. There are ideas to introduce true value objects to Java, which would basically live on the stack, not the heap. But don't expect that to happen before Java 10. Or 11. Or ... (I think this would be relevant here).
Several things:
First, and this is splitting hairs: when you pass an int in Java you are allocating 4 bytes on the stack, and when you pass an array (because it is a reference) you are actually allocating 8 bytes (assuming an x64 architecture) on the stack, plus the additional 4 bytes that store the int on the heap.
More importantly, the data that lives in the array is allocated on the heap, whereas the reference to the array itself is allocated on the stack. When passing an integer, no heap allocation is required; the primitive is only allocated on the stack. Over time, reducing heap allocations means that the garbage collector has fewer things to clean up, whereas the cleanup of stack frames is trivial and doesn't require additional processing.
However, this is all moot (imho) because in practice when you have complicated collections of variables and objects you are likely going to end up grouping them together into a class. In general, you should be writing to promote readability and maintainability rather than trying to squeeze every last drop of performance out of the JVM. The JVM is pretty quick as it is, and there is always Moore's Law as a backstop.
It would be difficult to analyze the Big-O for each, because in order to get a true picture you would have to factor in the behavior of the garbage collector, and that behavior is highly dependent on both the JVM itself and any runtime (JIT) optimizations that the JVM has made to your code.
Please remember Donald Knuth's wise words that "premature optimization is the root of all evil".
Write code that avoids micro-tuning; code that promotes readability and maintainability will fare better over the long run.
If your assumption is that arguments passed to functions necessarily consume memory (which is false, by the way), then in your second example, which passes an array, a copy of the reference to the array is made. That reference may actually be larger than an int; it's unlikely to be smaller.
Whether these methods take O(1) or O(N) depends on the compiler. (Here N is the value of i or i[0], depending.) If the compiler uses tail-recursion optimization then the stack space for the parameters, local variables, and return address can be reused and the implementation will then be O(1) for space. Absent tail-recursion optimization the space complexity is the same as the time complexity, O(N).
Basically tail-recursion optimization amounts (in this case) to the compiler rewriting your code as
public static int addUp(int i){
while(i != 0) i = i-1 ;
return 0;
}
or
public static int addUp(int[] i){
while(i[0] != 0) i[0] = i[0] - 1 ;
return 0 ;
}
A good optimizer might further optimize away the loops.
As far as I know, no Java compilers implement tail-recursion optimization at present, but there is no technical reason that it can't be done in many cases.
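The practical effect can be sketched by putting the recursive form next to the loop a tail-call optimizer would produce. TailCallDemo and the overflows helper are names invented for this example, and the depth at which the recursive form fails (if it does) depends on the JVM and its stack size, so no result is claimed for it here:

```java
final class TailCallDemo {
    // Recursive form from the question: one stack frame per call, assuming
    // the JVM does not eliminate the tail call.
    static int addUpRecursive(int i) {
        if (i == 0) return 0;
        return addUpRecursive(i - 1);
    }

    // Hand-written equivalent of what tail-call elimination would produce.
    static int addUpLoop(int i) {
        while (i != 0) i = i - 1;
        return 0;
    }

    // Reports whether the recursive form exhausts the stack at this depth.
    static boolean overflows(int depth) {
        try {
            addUpRecursive(depth);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addUpLoop(50_000_000)); // prints 0, constant stack space
        System.out.println(overflows(50_000_000)); // depends on JVM and stack size
    }
}
```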
Actually, when you pass an array as a parameter to a method - a reference to this array is passed under the hood. The array itself is stored on the heap. And the reference can be 4 or 8 bytes in size (depending on CPU architecture, JVM implementation, etc.; even more, JLS doesn't say anything about how big a reference is in memory).
On the other hand, primitive int value always consumes only 4 bytes and resides on the stack.
When you pass an array, the content of the array may be modified by the method that receives the array. When you pass int primitives, those primitives may not be modified by the method that receives them. That's why sometimes you may use primitives and sometimes arrays.
Also in general, in Java programming you tend to favor readability and let this kind of memory optimizations be done by the JIT compiler.
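A short sketch of the difference between passing a primitive and passing an array reference; PassByValueDemo and its method names are made up for illustration:

```java
final class PassByValueDemo {
    // Receives a copy of the int; the caller's variable is untouched.
    static void decrementCopy(int i) {
        i = i - 1;
    }

    // Receives a copy of the reference; both copies point at the same array,
    // so the write is visible to the caller.
    static void decrementInPlace(int[] box) {
        box[0] = box[0] - 1;
    }

    // Rebinding the parameter only changes the local copy of the reference.
    static void rebind(int[] box) {
        box = new int[] {99};
    }

    public static void main(String[] args) {
        int i = 4;
        decrementCopy(i);
        System.out.println(i);      // 4

        int[] box = {4};
        decrementInPlace(box);
        System.out.println(box[0]); // 3

        rebind(box);
        System.out.println(box[0]); // still 3
    }
}
```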
The int array reference actually takes up more space in the stack frames than an int primitive (8 bytes vs 4). You're actually using more space.
But I think the primary reason people prefer the first way is because it's clearer and more legible.
People actually do do things a lot closer to the second when more ints are involved.
As we know, when memory is moved into the CPU's L caches it is moved in cache lines, hence the whole cache-striding performance optimization...
Well in java when we define an array jmm guarantees that memory for each element will be allocated sequentially. However if we have array of references, those references can point randomly to different places in the memory.
My question is does java allocate actual objects memory sequentially? What optimizations do we have under the hood for this?
For example if we declare int[] we are confident those are all actually sequential in memory, but if we define a NewType (like struct) that has two int fields in it, and declare NewType[] will java figure out and keep actual memory sequentially or not?
My question is does java allocate actual objects memory sequentially?
This is not guaranteed, but most of the time the OpenJDK/Oracle JVM does. Some of the cases where it doesn't are:
when you allocate a large object in tenured space,
when your TLAB is full and you need to get another one.
However, within the TLAB, it just allocates sequentially in memory.
declare NewType[] will java figure out and keep actual memory sequentially or not?
Java doesn't figure out anything, nor does it go out of its way to allocate objects randomly in memory. In general, each new object will be immediately after the last one.
but if we define a NewType (like struct) that has two int fields in it, and declare NewType[] will java figure out and keep actual memory sequentially or not?
In this scenario Java is not very cache-friendly because, apart from arrays of primitive types, Java arrays are not packed data structures; they are arrays of references pointing to objects allocated elsewhere in memory.
I.e. there will be at least one level of indirection from the array to the object itself. This problem is often referred to as "pointer chasing".
I.e. usually the memory layout will look like this:
HlRRRRRRRRRRRRRRRRRRRRRRRRR0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0
Array | Obj | Obj | Obj | Obj | Obj | Obj | Obj |
H = object header
l = array length
R = reference
i = int
0 = various types of padding
You can use jol to inspect the memory layout of objects.
The JDK devs are working on value types as part of Project Valhalla, which will eventually allow packed arrays to exist and which may be needed as part of Project Panama, but this is still far off in the future.
In the meantime there are 3rd-party projects that aim to provide similar features:
https://github.com/ObjectLayout/ObjectLayout
https://github.com/RichardWarburton/packed-objects-experiments
Other projects either use off-heap storage (e.g. via sun.misc.Unsafe) or views on ByteBuffer / byte[] arrays to create packed, cache-friendly data structures at the expense of more complicated APIs.
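As a minimal sketch of the ByteBuffer-view approach, the NewType { int, int } records from the question could be packed like this. PackedIntPairs is a hypothetical class written for this example and does not come from any of the projects listed above:

```java
import java.nio.ByteBuffer;

// Hypothetical packed array of records with two int fields each.
final class PackedIntPairs {
    private static final int RECORD_BYTES = 2 * Integer.BYTES;
    private final ByteBuffer buffer;
    private final int length;

    PackedIntPairs(int length) {
        this.length = length;
        // Heap-backed buffer; ByteBuffer.allocateDirect would place it off-heap.
        this.buffer = ByteBuffer.allocate(length * RECORD_BYTES);
    }

    void set(int index, int first, int second) {
        int base = offset(index);
        buffer.putInt(base, first);                  // absolute put, field 1
        buffer.putInt(base + Integer.BYTES, second); // absolute put, field 2
    }

    int first(int index)  { return buffer.getInt(offset(index)); }
    int second(int index) { return buffer.getInt(offset(index) + Integer.BYTES); }
    int length() { return length; }

    // Walks the records in memory order, with no pointer chasing.
    long sumOfFirst() {
        long sum = 0;
        for (int i = 0; i < length; i++) sum += first(i);
        return sum;
    }

    private int offset(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_BYTES;
    }
}
```

The more complicated API is visible even in this small case: there is no record object, only index-based getters and setters.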
I have a bottleneck method which attempts to add points (as x-y pairs) to a HashSet. The common case is that the set already contains the point in which case nothing happens. Should I use a separate point for adding from the one I use for checking if the set already contains it? It seems this would allow the JVM to allocate the checking-point on stack. Thus in the common case, this will require no heap allocation.
Ex. I'm considering changing
HashSet<Point> set;
public void addPoint(int x, int y) {
if(set.add(new Point(x,y))) {
//Do some stuff
}
}
to
HashSet<Point> set;
public void addPoint(int x, int y){
if(!set.contains(new Point(x,y))) {
set.add(new Point(x,y));
//Do some stuff
}
}
Is there a profiler which will tell me whether objects are allocated on heap or stack?
EDIT: To clarify why I think the second might be faster, in the first case the object may or may not be added to the collection, so it's not non-escaping and cannot be optimized. In the second case, the first object allocated is clearly non-escaping so it can be optimized by the JVM and put on stack. The second allocation only occurs in the rare case where it's not already contained.
Marko Topolnik properly answered your question; the space allocated for the first new Point may or may not be immediately freed and it is probably foolish to bank on it happening. But I want to expand on why you're currently in a deep state of sin:
You're trying to optimise this the wrong way.
You've identified object creation to be the bottleneck here. I'm going to assume that you're right about this. You're hoping that, if you create fewer objects, the code will run faster. That might be true, but it will never run very fast as you've designed it.
Every object in Java has a pretty fat header (16 bytes; an 8-byte "mark word" full of bit fields and an 8-byte pointer to the class type) and, depending on what's happened in your program thus far, possibly another pretty fat trailer. Your HashSet isn't storing just the contents of your objects; it's storing pointers to those fat-headers-followed-by-contents. (Actually, it's storing pointers to Entry classes that themselves store pointers to Points. Two levels of indirection there.)
A HashSet lookup, then, figures out which bucket it needs to look at and then chases one pointer per thing in the bucket to do the comparison. (As one great big chain in series.) There probably aren't very many of these objects, but they almost certainly aren't stored close together, making your cache angry. Note that object allocation in Java is extremely cheap---you just increment a pointer---and that this is quite probably a bigger source of slowness.
Java doesn't provide any abstraction like C++'s templates, so the only real way to make this fast and still provide the Set abstraction is to copy HashSet's code, change all of the data structures to represent your objects inline, modify the methods to work with the new data structures, and, if you're still worried, make copies of the relevant methods that take a list of parameters corresponding to object contents (i.e. contains(int, int)) that do the right thing without constructing a new object.
This approach is error-prone and time-consuming, but it's necessary unfortunately often when working on Java projects where performance matters. Take a look at the Trove library Marko mentioned and see if you can use it instead; Trove did exactly this for the primitive types.
With that out of the way, a monomorphic call site is one where only one method is called. Hotspot aggressively inlines calls from monomorphic call sites. You'll notice that HashSet.contains punts to HashMap.containsKey. You'd better pray for HashMap.containsKey to be inlined since you need the hashCode call and equals calls inside to be monomorphic. You can verify that your code is being compiled nicely by using the -XX:+PrintAssembly option and poring over the output, but it's probably not---and even if it is, it's probably still slow because of what a HashSet is.
As soon as you have written new Point(x,y), you are creating a new object. It may happen not to be placed on the heap, but that's just a bet you can lose. For example, the contains call should be inlined for the escape analysis to work, or at least it should be a monomorphic call site. All this means that you are optimizing against a quite erratic performance model.
If you want to avoid allocation the solid way, you can use Trove library's TLongHashSet and have your (int,int) pairs encoded as single long values.
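The encoding itself needs no library. The sketch below packs an (int,int) pair into a long and stores it in a tiny open-addressing set; PackedPointSet is a stand-in written for this example so that it stays self-contained, not Trove's TLongHashSet, and the hash mixing constant is just one reasonable choice:

```java
// Hypothetical allocation-free point set: (x, y) pairs packed into long keys.
final class PackedPointSet {
    private long[] keys = new long[16];      // capacity is always a power of two
    private boolean[] used = new boolean[16];
    private int size;

    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL); // x in high bits, y in low bits
    }
    static int unpackX(long key) { return (int) (key >> 32); }
    static int unpackY(long key) { return (int) key; }

    int size() { return size; }

    boolean contains(int x, int y) {
        long key = pack(x, y);
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {                    // linear probing
            if (keys[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    // Returns true if the point was not already present, like Set.add.
    boolean add(int x, int y) {
        if ((size + 1) * 2 > keys.length) grow(); // keep load factor at most 0.5
        boolean added = insert(keys, used, pack(x, y));
        if (added) size++;
        return added;
    }

    private static boolean insert(long[] keys, boolean[] used, long key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) return false;
            i = (i + 1) & mask;
        }
        used[i] = true;
        keys[i] = key;
        return true;
    }

    private void grow() {
        long[] newKeys = new long[keys.length * 2];
        boolean[] newUsed = new boolean[newKeys.length];
        for (int i = 0; i < keys.length; i++) {
            if (used[i]) insert(newKeys, newUsed, keys[i]);
        }
        keys = newKeys;
        used = newUsed;
    }

    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L;  // multiplicative mixing of the key
        return (int) (h >>> 32) & mask;
    }
}
```

With this shape, addPoint(x, y) becomes a single call to add(x, y) and the common "already present" case creates no objects at all.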
I'm beginning to learn C++ from a Java background, and one big difference is the fact that I'm no longer forced to:
dynamically allocate memory for objects
always use pointers to handle objects
as is the case in Java. But I'm confused as to when I should be doing what - can you advise?
Currently I'm tempted to start out doing everything Java-style like
Thing *thing = new Thing();
thing->whatever();
// etc etc
Don't use pointers unless you know why you need them. If you only need an object for a while, allocate it on stack:
Object object;
object.Method();
If you need to pass an object to a function use references:
int doStuff( Object& object )
{
object.Method();
return 0;
}
only use pointers if you need
graph-like complex data structures or
arrays of different object types or
returning a newly created object from a function or
in situations when you sometimes need to specify that "there's no object" - then you use a null pointer.
If you use pointers you need to deallocate objects when those objects are no longer needed and before the last pointer to the object becomes unreachable, since C++ has no built-in garbage collection. To simplify this, use smart pointers like std::auto_ptr or boost::shared_ptr.
That's bad. You're bound to forget to free it and if you're determined not to you'd have to handle exceptions because it won't get freed on stack unwinding automatically. Use shared_ptr at the very least.
shared_ptr<Thing> thing( new Thing() );
thing->whatever();
But it actually depends on the object size and the scope. If you're going to use it in one function and the object is not oversized, I'd suggest allocating it in stack frame.
Thing thing;
thing.whatever();
But the good thing is that you can decide whenever you want to allocate a new object ;-)
Do not use the new operator if you can otherwise avoid it; that way lie memory leaks and headaches remembering your object lifetimes.
The C++ way is to use stack-based objects that clean up after themselves when they leave scope, unless you copy them. This technique (called RAII) is a very powerful one where each object looks after itself, somewhat like how the GC looks after your memory for you in Java, but with the huge advantage of cleaning up as it goes along in a deterministic way (i.e. you know exactly when it will get cleaned).
However, if you prefer your way of doing objects, use a shared_ptr, which can give you the same semantics. Typically you'd use a shared_ptr only for very expensive objects or ones that are copied a lot.
One situation where you might need to allocate an instance on the heap is when it is only known at run-time which instance will be created in the first place (common with OOP):
Animal* animal = 0;
if (rand() % 2 == 0)
animal = new Dog("Lassie");
else
animal = new Monkey("Cheetah");
Another situation where you might need that is when you have a non-copyable class whose instances you have to store in a standard container (which requires that its contents be copyable). A variation of that is where you might want to store pointers to objects that are expensive to copy (this decision shouldn't be done off-hand, though).
In all cases, using smart pointers like shared_ptr and unique_ptr (which are being added to the standard library) is preferable, as they manage the object's lifetime for you.