Java Object[] and cache striding - java

As we know, when memory is moved into the CPU's caches it is moved in whole cache lines, hence the whole cache-striding performance optimization...
Well, in Java, when we define an array the JMM guarantees that memory for each element is allocated sequentially. However, if we have an array of references, those references can point to arbitrary places in memory.
My question is: does Java allocate the actual objects' memory sequentially? What optimizations do we have under the hood for this?
For example, if we declare an int[] we are confident the elements are actually sequential in memory, but if we define a NewType (like a struct) that has two int fields in it, and declare a NewType[], will Java figure this out and keep the actual memory sequential or not?

My question is does java allocate actual objects memory sequentially?
This is not guaranteed, but most of the time the OpenJDK/Oracle JVM does. Some of the times it doesn't are:
when you allocate a large object in tenured space,
when your TLAB is full and you need to get another one.
However, within a TLAB, allocation simply proceeds sequentially in memory.
declare NewType[] will java figure out and keep actual memory sequentially or not?
Java doesn't figure out anything, nor does it go out of its way to allocate objects randomly in memory. In general, each new object will be immediately after the last one.

but if we define a NewType (like struct) that has two int fields in it, and declare NewType[] will java figure out and keep actual memory sequentially or not?
In this scenario Java is not very cache-friendly because, apart from primitive types, Java arrays are not packed data structures; they are arrays of references pointing to objects allocated elsewhere in memory.
I.e. there will be at least one level of indirection from the array to the object itself. This problem is often referred to as "pointer chasing".
I.e. usually the memory layout will look like this:
HlRRRRRRRRRRRRRRRRRRRRRRRRR0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0HR0iii0
Array | Obj | Obj | Obj | Obj | Obj | Obj | Obj |
H = object header
l = array length
R = reference
i = int
0 = various types of padding
You can use jol to inspect the memory layout of objects.
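The pointer-chasing cost described above can be made concrete with a small sketch (the Point class and element count here are illustrative, not from the question): summing through an array of references dereferences a separately allocated object per element, while summing a primitive array walks one contiguous block.

```java
public class PointerChasing {
    static final class Point {
        final int x;
        Point(int x) { this.x = x; }
    }

    // each iteration dereferences a Point located elsewhere on the heap
    static long sumBoxed(Point[] pts) {
        long sum = 0;
        for (Point p : pts) sum += p.x;
        return sum;
    }

    // each iteration reads the next 4 bytes of one contiguous block
    static long sumPacked(int[] xs) {
        long sum = 0;
        for (int x : xs) sum += x;
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Point[] boxed = new Point[n];
        int[] packed = new int[n];
        for (int i = 0; i < n; i++) {
            boxed[i] = new Point(i);
            packed[i] = i;
        }
        // same numbers, different memory layout; the packed loop is the
        // cache-friendly one because it has no per-element indirection
        System.out.println(sumBoxed(boxed) == sumPacked(packed));  // true
    }
}
```

The observable result is identical; the difference only shows up in cache behaviour, which a benchmark harness such as JMH would measure properly.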
The JDK devs are working on value types as part of Project Valhalla, which will eventually allow packed arrays to exist; this may also be needed as part of Project Panama, but it is still far off in the future.
In the meantime there are 3rd-party projects that aim to provide similar features:
https://github.com/ObjectLayout/ObjectLayout
https://github.com/RichardWarburton/packed-objects-experiments
Other projects either use off-heap storage (e.g. via sun.misc.Unsafe) or views on ByteBuffer / byte[] arrays to create packed, cache-friendly data structures at the expense of more complicated APIs.
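As a sketch of the ByteBuffer approach mentioned above (the class and method names here are my own, not from any of the linked projects), two int fields per logical element can be packed into 8 contiguous bytes:

```java
import java.nio.ByteBuffer;

public class PackedPoints {
    private static final int STRIDE = 8;  // two 4-byte ints per element
    private final ByteBuffer buf;

    PackedPoints(int capacity) {
        // allocateDirect gives off-heap storage; allocate() would be on-heap
        this.buf = ByteBuffer.allocateDirect(capacity * STRIDE);
    }

    void set(int index, int x, int y) {
        buf.putInt(index * STRIDE, x);
        buf.putInt(index * STRIDE + 4, y);
    }

    int getX(int index) { return buf.getInt(index * STRIDE); }
    int getY(int index) { return buf.getInt(index * STRIDE + 4); }

    public static void main(String[] args) {
        PackedPoints pts = new PackedPoints(3);
        pts.set(0, 1, 2);
        pts.set(2, 5, 6);
        System.out.println(pts.getX(0) + " " + pts.getY(2));  // 1 6
    }
}
```

This is exactly the "more complicated API" trade-off the answer mentions: the data is contiguous, but you give up ordinary field access.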

Related

Understanding how to memcpy with TheUnsafe

I read stuff about TheUnsafe, but I get confused by the fact that, unlike C/C++, we have to work out the offset of things, and there's also the 32-bit VM vs the 64-bit VM, which may or may not have different pointer sizes depending on a particular VM setting being turned on or off (also, I'm assuming all offsets to data are actually based on pointer arithmetic, so this would influence them too).
Unfortunately, it seems everything ever written about how to use TheUnsafe stems from one article only (the one that happened to be first) and all the others copy-pasted from it to some degree. Not many of them exist, and some are unclear because the author apparently was not a native English speaker.
My question is:
How can I find the offset of a field plus the pointer to the instance that owns that field (or field of a field, or field of a field of a field...) using TheUnsafe?
How can I use it to perform a memcpy to another pointer + offset memory address?
Considering the data may have several GB in size, and considering the heap offers no direct control over data alignment and it may most certainly be fragmented because:
1) I don't think there's anything stopping the VM from allocating field1 at offset + 10 and field2 at offset sizeof(field1) + 32, is there?
2) I would also assume the GC would move big chunks of data around, leading to a field with 1GB in size being fragmented sometimes.
So is the memcpy operation as I described even possible?
If data is fragmented because of GC, of course the heap has a pointer to where the next chunk of data is, but using the simple process described above doesn't seem to cover that.
so must the data be off-heap for this to (maybe) work? If so, how to allocate off-heap data using TheUnsafe, making such data work as a field of an instance and of course freeing the allocated memory once done with it?
I encourage anyone who didn't quite understand the question to ask for any specifics they need to know.
I also urge people to refrain from answering if their whole idea is "put all objects you need to copy in an array and use System.arraycopy". I know it's common practice in this wonderful forum to, instead of answering what's been asked, offer a complete alternate solution that, in principle, has nothing to do with the original question apart from the fact that it gets the same job done.
Best regards.
First a big warning: “Unsafe must die” http://blog.takipi.com/still-unsafe-the-major-bug-in-java-6-that-turned-into-a-java-9-feature/
Some prerequisites
static class DataHolder {
    int i1;
    int i2;
    int i3;
    DataHolder d1;
    DataHolder d2;

    public DataHolder(int i1, int i2, int i3, DataHolder dh) {
        this.i1 = i1;
        this.i2 = i2;
        this.i3 = i3;
        this.d1 = dh;
        this.d2 = this;
    }
}
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
DataHolder dh1 = new DataHolder(11, 13, 17, null);
DataHolder dh2 = new DataHolder(23, 29, 31, dh1);
The basics
To get the offset of a field (i1), you can use the following code:
Field fi1 = DataHolder.class.getDeclaredField("i1");
long oi1 = unsafe.objectFieldOffset(fi1);
and to access the field value of instance dh1 you can write
System.out.println(unsafe.getInt(dh1, oi1)); // will print 11
You can use similar code to access an object reference (d1):
Field fd1 = DataHolder.class.getDeclaredField("d1");
long od1 = unsafe.objectFieldOffset(fd1);
and you can use it to get the reference to dh1 from dh2:
System.out.println(dh1 == unsafe.getObject(dh2, od1)); // will print true
Field ordering and alignment
To get the offsets of all declared fields of an object:
for (Field f : DataHolder.class.getDeclaredFields()) {
    if (!Modifier.isStatic(f.getModifiers())) {
        System.out.println(f.getName() + " " + unsafe.objectFieldOffset(f));
    }
}
In my tests it seems that the JVM reorders fields as it sees fit (i.e. adding a field can yield completely different offsets on the next run).
An object's address in native memory
It's important to understand that the following code is going to crash your JVM sooner or later, because the Garbage Collector will move your objects at random times, without you having any control over when and why it happens.
Also it's important to understand that the following code depends on the JVM type (32 bits versus 64 bits) and on some start parameters for the JVM (namely, usage of compressed oops on 64 bit JVMs).
On a 32 bit VM a reference to an object has the same size as an int. So what do you get if you call int addr = unsafe.getInt(dh2, od1); instead of unsafe.getObject(dh2, od1)? Could it be the native address of the object?
Let's try:
System.out.println(unsafe.getInt(null, unsafe.getInt(dh2, od1)+oi1));
will print out 11 as expected.
On a 64 bit VM without compressed oops (-XX:-UseCompressedOops), you will need to write
System.out.println(unsafe.getInt(null, unsafe.getLong(dh2, od1)+oi1));
On a 64 bit VM with compressed oops (-XX:+UseCompressedOops), things are a bit more complicated. This variant has 32 bit object references that are turned into 64 bit addresses by multiplying them with 8L:
System.out.println(unsafe.getInt(null, 8L * (0xffffffffL & unsafe.getInt(dh2, od1)) + oi1));
What is the problem with these accesses
The problem is the Garbage Collector together with this code. The Garbage Collector can move objects around as it pleases. Since the JVM knows about its object references (the local variables dh1 and dh2, and the fields d1 and d2 of these objects) it can adjust these references accordingly; your code will never notice.
By extracting object references into int/long variables you turn these object references into primitive values that happen to have the same bit pattern as an object reference, but the Garbage Collector does not know that these were object references (they could just as well have been produced by a random number generator) and therefore does not adjust these values while moving objects around. So as soon as a garbage collection cycle is triggered, your extracted addresses are no longer valid, and trying to access memory at those addresses might crash your JVM immediately (the good case) or silently trash your memory (the bad case).

Is an array in Java sequential in virtual memory, or physically sequential?

I am trying to find the difference between a primitive Java 'array' and a 'List' data structure (like ArrayList), and found articles and Q&As like this (Difference between List and Array). Many articles, including that link, point out that a primitive Java 'array' is 'sequential memory'. What does sequential mean here, exactly? Is it really sequential in physical memory, or sequential in virtual memory? My guess is sequential in virtual memory, because the OS assigns physical memory and the application (JVM) doesn't care about specific memory allocation. But I do not know the exact answer.
A Java array is sequential in virtual memory, not necessarily in physical memory.
A user-space application (such as a JVM) has no say over whether the physical pages that make up its virtual address space are contiguous in memory. And in fact, it has no way of even knowing this in typical modern operating systems. This is all hidden from a user-space application by the machine's virtual memory hardware and (user-space) instruction set architecture.
Looking at the JVM spec is not going to be instructive on the physical memory issue. It is simply not relevant / out of scope.
The JVM spec doesn't mandate that arrays are contiguous in virtual memory. However, (hypothetical) array implementations that involved non-contiguous virtual memory would lead to expensive array operations, so you are unlikely to find a mainstream JVM that does this.
References:
JVM Spec 2.7 says:
"The Java Virtual Machine does not mandate any particular internal structure for objects."
Other parts of the spec imply that "objects" refers here to BOTH instances of classes AND arrays.
JVM Spec 2.4 talks about arrays, but it doesn't mention how they are represented in memory.
The difference between arrays and ArrayLists is at a higher level: arrays have a fixed size, ArrayLists have a variable size. But under the hood, an ArrayList is implemented using a (single) array, which can be reallocated (i.e. replaced) if the list grows too big.
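A minimal illustration of that difference, using only the standard library:

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayVsList {
    public static void main(String[] args) {
        int[] fixed = new int[2];   // size chosen at creation, fixed forever
        fixed[0] = 1;
        fixed[1] = 2;
        // fixed[2] = 3;            // would throw ArrayIndexOutOfBoundsException

        List<Integer> growable = new ArrayList<>(2);  // 2 is only the initial capacity
        growable.add(1);
        growable.add(2);
        growable.add(3);            // triggers reallocation of the backing array
        System.out.println(fixed.length + " " + growable.size());  // 2 3
    }
}
```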
You would have to look at the JVM specs to see whether any such requirement is made (whether arrays need to be sequential in memory or not), but for efficiency purposes it makes sense that an array would be allocated in a malloc-like way.
As for virtual vs. physical, everything (above the OS) works with virtual memory. The JVM isn't low level enough to have access to something the kernel does at Ring-0.
And lastly, why are you interested, are you writing your own JVM?
The JVM gets virtually sequential memory from the OS. Only at the OS level is it possible to assign physical memory sequentially.
Also, it's important not to confuse sequential memory allocation with sequential access: sequential access means that a group of elements is accessed in a predetermined, ordered sequence. A data structure is said to have sequential access if one can only visit the values it contains in one particular order; the canonical example is the linked list.
Sequential memory, by contrast, means the memory itself is assigned sequentially (not necessarily physically sequential, but virtually sequential).
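The access-vs-allocation distinction can be sketched with a hand-rolled linked list next to a plain array (illustrative code, not from the answer):

```java
public class SequentialAccess {
    // a singly linked list node: access is inherently sequential
    static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        Node head = new Node(10);
        head.next = new Node(20);
        head.next.next = new Node(30);

        // reaching the third element requires visiting the first two
        Node n = head;
        int hops = 0;
        while (n.value != 30) { n = n.next; hops++; }

        // an array offers random access: one index computation, no traversal
        int[] arr = {10, 20, 30};
        System.out.println(hops + " " + arr[2]);  // 2 30
    }
}
```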
Besides the link you posted some major differences between Array and ArrayList are:
Array is fixed in size, ArrayList is dynamic in size
Array can store primitives; ArrayList can only store objects (wrapper types for primitives)
You can use generics with ArrayList
You can use the add() method to insert an element into an ArrayList, while you simply use the assignment operator to store an element into an array
References: Java67 article, Wikipedia
This might be an interesting article explaining your question.
Arrays are also objects in Java, so how an object looks in memory applies to arrays as well.
To summarise:
class A {
    int x;
    int y;
}

public void m1() {
    int i = 0;
    m2();
}

public void m2() {
    A a = new A();
}
When m1 is invoked, a new frame (Frame-1) is pushed onto the stack, and the local variable i is created in Frame-1.
Then m2 is invoked inside m1, and another new frame (Frame-2) is pushed onto the stack. In m2, an object of class A is created on the heap and the reference variable is stored in Frame-2.
Physical memory locations are out of your hands and will be assigned by the OS.
http://www.programcreek.com/2013/04/what-does-a-java-array-look-like-in-memory/

Are arrays of 'structs' theoretically possible in Java?

There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java you are forced to use several primitive arrays (see below why) or a big byte array, which incurs a bit of CPU overhead for converting.
Example: you have a class Point { float x; float y; }. Now you want to store N points in an array, which would take at least N * 8 bytes for the floats and N * 4 bytes for the references on a 32-bit JVM. So at least 1/3 is garbage (not counting the normal object overhead here). But if you stored this in two float arrays, all would be fine.
My question: Why does Java not optimize the memory usage for arrays of references? I mean, why not directly embed the object in the array, like it is done in C++?
E.g. marking the class Point final should be sufficient for the JVM to know the maximum length of the data for the Point class. Or where would this go against the specification? Also, this would save a lot of memory when handling large n-dimensional matrices etc.
Update:
I would like to know whether the JVM could theoretically optimize it (e.g. behind the scenes) and under which conditions - not whether I can force the JVM somehow. I think the second point of the conclusion is the reason it cannot be done easily, if at all.
Conclusions what the JVM would need to know:
The class needs to be final to let the JVM know the size of one array entry
The array needs to be read-only. Of course you can change the values, like Point p = arr[i]; p.setX(i), but you cannot write to the array via inlineArr[i] = new Point(). Otherwise the JVM would have to introduce copy semantics, which would be against the "Java way". See aroth's answer
How to initialize the array (calling the default constructor or leaving the members initialized to their default values)
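The "two float arrays" workaround from the question can be sketched as a structure-of-arrays wrapper (the class name and API here are illustrative):

```java
public class PointStore {
    // structure-of-arrays: no per-Point object header, no references
    private final float[] xs;
    private final float[] ys;

    PointStore(int n) {
        xs = new float[n];
        ys = new float[n];
    }

    void set(int i, float x, float y) { xs[i] = x; ys[i] = y; }
    float x(int i) { return xs[i]; }
    float y(int i) { return ys[i]; }

    public static void main(String[] args) {
        PointStore pts = new PointStore(100);
        pts.set(42, 1.5f, -2.5f);
        System.out.println(pts.x(42) + " " + pts.y(42));  // 1.5 -2.5
    }
}
```

The trade-off is exactly the one discussed below: the floats are contiguous and compact, but there is no Point object identity to share or mutate.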
Java doesn't provide a way to do this because it's not a language-level choice to make. C, C++, and the like expose ways to do this because they are system-level programming languages where you are expected to know system-level features and make decisions based on the specific architecture that you are using.
In Java, you are targeting the JVM. The JVM doesn't specify whether or not this is permissible (I'm making an assumption that this is true; I haven't combed the JLS thoroughly to prove that I'm right here). The idea is that when you write Java code, you trust the JIT to make intelligent decisions. That is where the reference types could be folded into an array or the like. So the "Java way" here would be that you cannot specify if it happens or not, but if the JIT can make that optimization and improve performance it could and should.
I am not sure whether this optimization in particular is implemented, but I do know that similar ones are: for example, objects allocated with new are conceptually on the "heap", but if the JVM notices (through a technique called escape analysis) that the object is method-local it can allocate the fields of the object on the stack or even directly in CPU registers, removing the "heap allocation" overhead entirely with no language change.
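As a sketch of the kind of allocation escape analysis can remove (whether the JIT actually scalar-replaces it depends on its heuristics and cannot be observed from the code itself):

```java
public class EscapeDemo {
    static final class Vec {
        final double dx, dy;
        Vec(double dx, double dy) { this.dx = dx; this.dy = dy; }
    }

    // 'v' never escapes this method, so the JIT may scalar-replace it:
    // dx and dy can live in registers and no heap allocation happens at all
    static double lengthSquared(double dx, double dy) {
        Vec v = new Vec(dx, dy);
        return v.dx * v.dx + v.dy * v.dy;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3.0, 4.0));  // 25.0
    }
}
```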
Update for updated question
If the question is "can this be done at all", I think the answer is yes. There are a few corner cases (such as null pointers) but you should be able to work around them. For null references, the JVM could convince itself that there will never be null elements, or keep a bit vector as mentioned previously. Both of these techniques would likely be predicated on escape analysis showing that the array reference never leaves the method, as I can see the bookkeeping becoming tricky if you try to e.g. store it in an object field.
The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).
So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.
Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:
Point p = new Point(0, 0);
Point[] compressedA = {p}; //assuming 'p' is "optimally" stored as {0,0}
Point[] compressedB = {p}; //assuming 'p' is "optimally" stored as {0,0}
compressedA[0].setX(5)
compressedB[0].setX(1)
System.out.println(p.x);
System.out.println(compressedA[0].x);
System.out.println(compressedB[0].x);
...you would get:
0
5
1
...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
Isn't this tantamount to providing trivial classes such as the following?
class Fixed {
    float[] hiddenArr;

    Point pointArray(int position) {
        return new Point(hiddenArr[position * 2], hiddenArr[position * 2 + 1]);
    }
}
Also, it's possible to implement this without making the programmer explicitly request it; the JVM is already aware of "value types" (POD types in C++): ones with only other plain-old-data types inside them. I believe HotSpot uses this information during stack elision; no reason it couldn't do it for arrays too.

How much memory does a Java object use when all its members are null?

Is it correct to assume a Java object only takes up the 8 bytes for the object reference as long as all its members are set to null, or does the definition of members already use up space in the instance for some reason?
In other words, if I have a large collection of objects that I want to be space efficient, can I count on leaving unused members set to null for reducing memory footprint?
No, you need either 4 or 8 bytes (depending on whether it's a 32- or 64-bit system) for each null you are storing in a field. How would the object know its field was null if there wasn't something stored somewhere to tell it so?
No, null is also information and has also to be stored.
The object references themselves will still take up some memory, so the only real memory savings will be from the heap objects that would have been referred to.
So, if you have
class MyClass {
    public Object obj;
    MyClass(Object o) { obj = o; }
}
MyClass a = new MyClass( new HashMap() );
MyClass b = new MyClass( null );
the two objects a and b themselves take up the same amount of memory (they both hold one reference), but the Object obj referred to by a takes up memory on the heap, so this memory could have been saved by setting the reference to null.
The Java Language Specification and the Java Virtual Machine Specification do not say anything about the size of Java objects in memory - it is deliberately left undefined, because it depends on the implementation of the JVM. This means that you cannot make any assumptions on how much memory a Java object uses in general or what the memory layout of a Java object in memory is.
As others have already said: A reference (variable) takes up memory, whether the reference is null or not. Your objects will not shrink and take up less memory when you set member variables to null.
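A small reflective sketch of why a null member still costs space: the field slots are part of the class layout, fixed at class-load time, regardless of what each instance stores in them (class names here are illustrative):

```java
import java.lang.reflect.Field;

public class NullFieldDemo {
    static class Holder {
        Object a;   // null or not, this slot exists in every Holder instance
        Object b;
    }

    public static void main(String[] args) {
        Holder h = new Holder();          // both fields are null
        // the class layout is fixed at class-load time, not per instance:
        Field[] fields = Holder.class.getDeclaredFields();
        System.out.println(fields.length + " fields, a = " + h.a);
    }
}
```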
What about class, object and method names? Since you can call, access or instantiate them using Java reflection, the names have to be stored somewhere too, right? But CMIIW.
Even when a Java object has all its members set to null it will still consume memory, because
it has additional memory requirements for headers, references and housekeeping.
The heap memory used by a Java object includes
memory for primitive fields, according to their size.
memory for reference fields (4 bytes each).
an object header, consisting of a few bytes of "housekeeping" information.
Objects in Java also require some "housekeeping" information, such as recording the object's class, ID and status flags (e.g. whether the object is currently reachable, currently synchronization-locked, etc.).
The Java object header size varies between 32-bit and 64-bit JVMs.
Although these are the main memory consumers, the JVM sometimes requires additional memory, e.g. for alignment, etc.
So this is why your Java object will consume memory even when all its members are null.

How should I choose where to store an object in C++?

Possible duplicate
Proper stack and heap usage in C++?
I'm beginning to learn C++ from a Java background, and one big difference is the fact that I'm no longer forced to:
dynamically allocate memory for objects
always use pointers to handle objects
as is the case in Java. But I'm confused as to when I should be doing what - can you advise?
Currently I'm tempted to start out doing everything Java-style like
Thing *thing = new Thing();
thing->whatever();
// etc etc
Don't use pointers unless you know why you need them. If you only need an object for a while, allocate it on the stack:
Object object;
object.Method();
If you need to pass an object to a function use references:
int doStuff( Object& object )
{
    object.Method();
    return 0;
}
Only use pointers if you need:
graph-like complex data structures or
arrays of different object types or
returning a newly created object from a function or
in situations when you sometimes need to specify that "there's no object" - then you use a null pointer.
If you use pointers you need to deallocate objects when those objects are no longer needed, and before the last pointer to the object becomes unreachable, since C++ has no built-in garbage collection. To simplify this, use smart pointers like std::auto_ptr or boost::shared_ptr.
That's bad. You're bound to forget to free it and if you're determined not to you'd have to handle exceptions because it won't get freed on stack unwinding automatically. Use shared_ptr at the very least.
shared_ptr<Thing> thing( new Thing() );
thing->whatever();
But it actually depends on the object's size and scope. If you're going to use it in one function and the object is not oversized, I'd suggest allocating it in the stack frame.
Thing thing;
thing.whatever();
But the good thing is that you can decide whenever you want to allocate a new object ;-)
Do not use the new operator if you can otherwise avoid it; that way lie memory leaks and headaches from remembering your object lifetimes.
The C++ way is to use stack-based objects, that cleanup after themselves when they leave scope, unless you copy them. This technique (called RAII) is a very powerful one where each object looks after itself, somewhat like how the GC looks after your memory for you in Java, but with the huge advantage of cleaning up as it goes along in a deterministic way (ie you know exactly when it will get cleaned).
However, if you prefer your way of doing objects, use a shared_ptr, which can give you the same semantics. Typically you'd use a shared_ptr only for very expensive objects or ones that are copied a lot.
One situation where you might need to allocate an instance on the heap is when it is only known at run-time which instance will be created in the first place (common with OOP):
Animal* animal = 0;
if (rand() % 2 == 0)
    animal = new Dog("Lassie");
else
    animal = new Monkey("Cheetah");
Another situation where you might need that is when you have a non-copyable class whose instances you have to store in a standard container (which requires that its contents be copyable). A variation of that is where you might want to store pointers to objects that are expensive to copy (this decision shouldn't be done off-hand, though).
In all cases, using smart pointers like shared_ptr and unique_ptr (which are being added to the standard library) is preferable, as they manage the object's lifetime for you.
