I was reading about data locality and want to use it to improve my game engine that I'm writing.
Let's say that I have created five objects at different times that are now all in different places in the memory not next to each other. If I add them all to an array, will that array only hold pointers to those objects and they will stay in the same place in the memory or will adding them all to an array rearrange them and make them contiguous.
I ask this because I thought that using arrays would be a good way to make them contiguous, but I don't know if an array will fix my problem!
tl;dr
Manipulating an array of references to objects has no effect on the objects, and has no effect on the objects’ location in memory.
Objects
An array of objects is really an array of references (pointers) to objects. A pointer is an address to another location in memory.
We speak of the array as holding objects, but that is not technically accurate. Because Java does not expose pointers themselves to us as programmers, we are generally unaware of their presence. When we access an element in the array, we are actually retrieving a pointer, but Java immediately follows that pointer to locate the object elsewhere in memory.
This automatic look-up, following the pointer to the object, makes the array of pointers feel like an array of objects. The Java programmer thinks of her array as holding her objects when in reality the objects are a hop-skip-and-a-jump away.
Arrays in Java are implemented as contiguous blocks of memory. For an array of objects, the pointers to those objects are being stored in contiguous memory. But when we access the elements, we are jumping to another location in memory to access the actual object that we want.
Adding elements may be “cheap” in that if memory happens to be available next door in memory, it can be allocated to the array to make room for more elements. In practice this is unlikely. Chances are a new array must be built elsewhere in memory, with all the pointers being copied over to the new array and then discarding the original array.
Such a new-array-and-copy-over is “expensive”. When feasible, we want to avoid this operation. If you know the likely maximum size of your array, specify that size when declaring the array. The entire block of contiguous memory is claimed immediately, with empty content in the array until you later assign a pointer to the elements.
Inserting into the middle of an array is also expensive. Either a new array is built and elements copied over, or all the elements after the insertion point must be moved down into their neighboring position.
None of these operations to the array affect the objects. The objects are floating around in the ether of memory. The objects know nothing of the array. Operations on the array do not affect the objects nor their position in memory. The only relationship is that if the reference held in the array is the last reference still pointing to the object, then when that array element is cleared or deleted, the object becomes a candidate for garbage-collection.
Primitives
In Java, the eight primitive types (byte, short, int, long, float, double, boolean, and char) are not objects/classes and are not Object-Oriented Programming. One advantage is that they are fast and take little memory, compared to objects.
An array of primitives hold the values within the array itself. So these values are stored next to one another, contiguous in memory. No references/pointers. No jumping around in memory.
As for adding or inserting, the same behavior discussed above applies. Except that instead of pointers being shuffled around, the actual primitive values are being shuffled around.
Tips
In business apps, it is generally best to use objects.
That means using the wrapper classes instead of primitives. For example, Integer instead of int. The auto-boxing facility in Java makes this easier by automatically converting between primitive values and their object wrapper.
And preferring objects means using a Collection instead of arrays, usually a List, specifically a ArrayList. Or for immutable use, a List implementation returned from the new List.of method.
In contrast to business apps, in extreme situations where speed and memory usage are paramount, such as your game engine, then make the most of arrays and primitives.
In the future, the distinction between objects and primitives may blur if the work done in Project Valhalla comes to fruition.
The data or the values are stored in the objects and the values are retrieved using the references of the objects. lemme clear one more thing arrays in Java are stored in the form of objects. so there is no doubt that objects stores values and accessed using reference variable of that particular object. Hope you got it.
Java deals with references to objects only. As such, there's no guarantee that the elements of an array will be contiguous in memory.
Edit: Guess this answer wasn't that clear. My bad. I meant that there's no guarantee that the objects themselves will be contiguous, in spite of the fact that the references will be, as 1-D arrays are stored contiguously. Still, Basil Bourque's answer perfectly explains how this works.
Related
Since interfaces only specify methods and not instance variables, how is storage allotted to something like:
Comparable[] aux = new Comparable[20];
How much per location storage (i.e. not counting array overhead) will be allocated?
The array is only allocating enough contiguous memory for the pointers to the objects, it doesn't need to allocate memory for the actual objects itself.
We can sometimes forget, Java still uses "pointers" (aka references), it just doesn't provide the same level of access to those pointers that other languages do
Objects are reference types, therefore every Object subtypes (including Comparator and every other interface) are reference types. It means that the size of every array item is the size of an object reference. It doesn't make a difference what kind of object it is.
What determines whether one should be used over the other?
I used to think that the deciding factor is whether you know the size of the things you want to store but I think there might be more to it than that.
Some more differences:
First and Major difference between Array and ArrayList in Java is that Array is a fixed length data structure while ArrayList is a variable length Collection class. You can not change length of Array once created in Java but ArrayList re-size itself when gets full depending upon capacity and load factor. Since ArrayList is internally backed by Array in Java, any resize operation in ArrayList will slow down performance as it involves creating new Array and copying content from old array to new array.
Another difference between Array and ArrayList in Java is that you can not use Generics along with Array, as Array instance knows about what kind of type it can hold and throws ArrayStoreException, if you try to store type which is not convertible into type of Array. ArrayList allows you to use Generics to ensure type-safety.
One more major difference between ArrayList and Array is that, you can not store primitives in ArrayList, it can only contain Objects. While Array can contain both primitives and Objects in Java. Though Autoboxing of Java 5 may give you an impression of storing primitives in ArrayList, it actually automatically converts primitives to Object.
Java provides add() method to insert element into ArrayList and you can simply use assignment operator to store element into Array e.g. In order to store Object to specified position.
One more difference on Array vs ArrayList is that you can create instance of ArrayList without specifying size, Java will create Array List with default size but its mandatory to provide size of Array while creating either directly or indirectly by initializing Array while creating it. By the way you can also initialize ArrayList while creating it.
Use array when you know the exact size of the collection and you don't expect to add/remove elements.
Use List (ArrayList) when you don't know the exact size of the collection and you expect to alter it at some point.
If you're using Java8, there is the Stream API, which helps to significantly reduce the boilerplate code when working with collections. This is another plus for ArrayList (and all Collections and Maps).
More info:
Arrays vs ArrayList in performance
Unless speed is critical (really critical, like every microsecond counts), use ArrayList whenever possible. It's so much easier to use, and that's usually the most important thing to consider.
Generally, I use ArrayList, not arrays, because they offer a lot of several methods that are very usefull. I think you can use array if performance is very important, in very special cases.
Array is fixed, ArrayList is growable.If the number of elements is fixed, use an array
Also one of the great benefits of collection implementations is they give you a lot of flexibility. So depending on your need, you can have a List behave as an ArrayList or as a LinkedList and so on. Also if you look at the Collection API, you'd see you have methods for almost everything you'd ever need to do.
We're trying to tweak some Oracle JVM garbage collection options and one developer tried to use -XX:PretenureSizeThreshold to make sure a large array of objects was put in Tenured right away. I'm pretty sure the assumption was that the array size equals or exceeds the total size of all the objects in it.
But in Java, aren't arrays of objects just arrays of references? I.e. each object in the array, as well as the array object itself, is separate in memory and treated as separate by the garbage collector? I think the array object can still get fairly large if there are millions of entries, but it shouldn't be anywhere near the total size of the objects it "contains" if each object is much bigger than a reference.
I think there's confusion because AFAIK, in C:
It's possible to have an array of structs that really does store the structs.
It's also possible to have an array of pointers to structs.
I'm pretty sure Java always uses 1. for arrays of primitive types and always uses 2. for arrays of objects, while C can use either for any type...?
What if I use an ArrayList with frequent append()s (as we are in the case at hand)? Is only the array copied, and not the objects in the array? Also, when the array is copied, even if the old array was in Tenured the new one starts in Eden, right?
But in Java, aren't arrays of objects just arrays of references?
Just references. All objects are allocated on the heap, never in arrays or on the stack (at least officially, the optimizer may use stack allocation if possible, but this is transparent).
it shouldn't be anywhere near the total size of the objects it "contains" if each object is much bigger than a reference.
Yes, in Java whenever you say "assign/store an object", you mean the reference (pointer in C terminology).
What if I use an ArrayList with frequent append()s (as we are in the case at hand)? Is only the array copied, and not the objects in the array?
The array gets only copied when resizing is needed, i.e., very rarely and the amortized cost is proportional to the number of inserts. The referenced objects gets never copied.
Also, when the array is copied, even if the old array was in Tenured the new one starts in Eden, right?
Yes!
Using -XX:PretenureSizeThreshold for tuning is unlikely to help you.
This parameter applies only to direct Eden allocation, while most allocation is happening in TLAB (Thread Local Allocation Buffer) and -XX:PretenureSizeThreshold is ignored.
TLAB could be quite large for thread actively allocating memory (few megabytes).
You can tweak TLAB sizing, to reduce this effect, but that would probably do more harm than good.
But in Java, aren't arrays of objects just arrays of references? I.e.
each object in the array, as well as the array object itself, is
separate in memory and treated as separate by the garbage collector?
Yes.
I think there's confusion because AFAIK, in C:
It's possible to have an array of structs that really does store the structs.
It's also possible to have an array of pointers to structs.
I'm pretty sure Java always uses 1. for arrays of primitive types and
always uses 2. for arrays of objects, while C can use either for any
type...?
Java, like C, typically stores arrays of primitive types as actual arrays with elements of those types. So an int[] array with 10 elements is typically going to reserve 10×4 bytes for the array, plus overhead for the entire array object.
Arrays of objects, however, are as you say, arrays of references. So an object[] of 10 elements is going to typically take up 10×4 bytes (or perhaps 10×8 bytes on 64-bit CPUs) for the array, plus overhead, plus space for each object that each non-null element references. This corresponds in C to an array of pointers.
(I use the term "typically", because even though that's how most JVMs do it, they are not required to allocate memory in any particular fashion.)
Also be aware that Java does not have true multi-dimensional arrays like C (or C#). An int[][] in Java is actually a one-dimensional array, where each element is a reference to its own int[] subarray. In C, an int[][] really is a two-dimensional array of integers (where the lengths of all but the first dimension must be known at compile time).
Addendum
Also note that, like you say, C can have true arrays of structs, which are neither primitive types nor pointers. Java does not have this capability.
Programing languages like C,C++ will not store array values in Heap rather it keeps the value in STACK. But in Java why there is a necessity to keep array values in heap?
In Java, arrays (just like all other objects) are passed around by reference: When you pass an array to a method, it will get a reference pointing to the same location in memory, no copy is being made. This means that the array needs to remain "alive" after the method that created it, and so cannot be stored in the stack frame for the method. It needs to managed by the garbage collector, just like all other objects.
There is some research going in to optimize JVM memory allocation using "escape analysis": If an object (such as an array) can be guaranteed to never leave the current scope, it becomes possible to in fact allocate it on the stack, which is more efficient.
A short answer is that an array in Java is a reference type, and reference types live on the heap. It's worth noting that in C#, one can switch to unsafe mode and initialise arrays with stackalloc which will create the array on the stack. It's therefore quite probable that the VM would allow you to make an array on the stack, and it's merely an implementation detail that means arrays all live on the heap.
int[]a={10,20,30};
this array stores on stack, but the following array stores in heap:
`int[]num=new int num[2];`//here we build the object of array , object always located in heap
always treat arrays like java object, so do not be confused by the fact that arrays don't store on ram when we have a declaration like the one above.
In Java, we can always use an array to store object reference. Then we have an ArrayList or HashTable which is automatically expandable to store objects. But does anyone know a native way to have an auto-expandable array of object references?
Edit: What I mean is I want to know if the Java API has some class with the ability to store references to objects (but not storing the actual object like XXXList or HashTable do) AND the ability of auto-expansion.
Java arrays are, by their definition, fixed size. If you need auto-growth, you use XXXList classes.
EDIT - question has been clarified a bit
When I was first starting to learn Java (coming from a C and C++ background), this was probably one of the first things that tripped me up. Hopefully I can shed some light.
Unlike C++, Object arrays in Java do not store objects. They store object references.
In C++, if you declared something similar to:
String myStrings[10];
You would get 10 String objects. At this point, it would be perfectly legal to do something like println(myStrings[5].length); - you'd get '0' - the default constructor for String creates an empty string with length 0.
In Java, when you construct a new array, you get an empty container that can hold 10 String references. So the call:
String[] myStrings = new String[10];
println(myStringsp[5].length);
would throw a null pointer exception, because you haven't actually placed a String reference into the array yet.
If you are coming from a C++ background, think of new String[10] as being equivalent to new (String *)[10] from C++.
So, with that in mind, it should be fairly clear why ArrayList is the solution for an auto expanding array of objects (and in fact, ArrayList is implemented using simple arrays, with a growth algorithm built in that allocates new expanded arrays as needed and copies the content from the old to the new).
In practice, there are actually relatively few situations where we use arrays. If you are writing a container (something akin to ArrayList, or a BTree), then they are useful, or if you are doing a lot of low level byte manipulation - but at the level that most development occurs, using one of the Collections classes is by far the preferred technique.
All the classes implementing Collection are expandable and store only references: you don't store objects, you create them in some data space and only manipulate references to them, until they go out of scope without reference on them.
You can put a reference to an object in two or more Collections. That's how you can have sorted hash tables and such...
What do you mean by "native" way? If you want an expandable list f objects then you can use the ArrayList. With List collections you have the get(index) method that allows you to access objects in the list by index which gives you similar functionality to an array. Internally the ArrayList is implemented with an array and the ArrayList handles expanding it automatically for you.
Straight from the Array Java Tutorials on the sun webpage:
-> An array is a container object that holds a fixed number of values of a single type.
Because the size of the array is declared when it is created, there is actually no way to expand it afterwards. The whole purpose of declaring an array of a certain size is to only allocate as much memory as will likely be used when the program is executed. What you could do is declare a second array that is a function based on the size of the original, copy all of the original elements into it, and then add the necessary new elements (although this isn't very 'automatic' :) ). Otherwise, as you and a few others have mentioned, the List Collections is the most efficient way to go.
In Java, all object variables are references. So
Foo myFoo = new Foo();
Foo anotherFoo = myFoo;
means that both variables are referring to the same object, not to two separate copies. Likewise, when you put an object in a Collection, you are only storing a reference to the object. Therefore using ArrayList or similar is the correct way to have an automatically expanding piece of storage.
There's no first-class language construct that does that that I'm aware of, if that's what you're looking for.
It's not very efficient, but if you're just appending to an array, you can use Apache Commons ArrayUtils.add(). It returns a copy of the original array with the additional element in it.
if you can write your code in javascript, yes, you can do that. javascript arrays are sparse arrays. it will expand whichever way you want.
you can write
a[0] = 4;
a[1000] = 434;
a[888] = "a string";