I'm trying to get a general idea of the memory cost difference between an Integer array and int array. While there seems to be a lot of information out there about the differences between a primitive int and Integer object, I'm still a little confused as to how to calculate the memory costs of an int[] and Integer[] array (overhead costs, padding, etc).
Any help would be appreciated. Thanks!
In addition to storing the length of the array, an array of ints needs space for N 4-byte elements, while an array of Integers needs space for N references, whose size is platform-dependent; commonly, that would be 4 bytes on 32-bit platforms or 8 bytes on 64-bit platforms.
As far as int[] goes, there is no additional memory required to store data. Integer[], on the other hand, needs objects of type Integer, which could be all distinct or shared (e.g. through interning of small numbers implemented by the Java platform itself). Therefore, Integer[] requires up to N additional objects, each one containing a 4-byte int.
Assuming that all Integers in an Integer[] array are distinct objects, the array and its contents will take two to three times the space of an int[] array. On the other hand, if all objects are shared, and the cost of the shared objects is accounted for elsewhere, there may be no additional overhead at all on 32-bit platforms, or a 2x overhead on 64-bit platforms.
Here is a comparison on jdk6u26 of the size of an array of 1024 Integers as opposed to 1024 ints. Note that in the case of an Integer[] array containing low-valued Integers, these can be shared with other uses of those Integers in the JVM through the auto-box cache.
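You can see that sharing in action with a quick sketch (this assumes default JVM settings, under which autoboxing shares cached objects for values in [-128, 127]):

```java
public class BoxingCache {
    public static void main(String[] args) {
        // Autoboxing goes through Integer.valueOf, which returns a shared,
        // cached object for values in [-128, 127] (default settings).
        Integer a = 100, b = 100;    // both refer to the same cached object
        Integer c = 1000, d = 1000;  // outside the default cache: distinct objects
        System.out.println(a == b);      // true
        System.out.println(c == d);      // false
        System.out.println(c.equals(d)); // true (equal values, different objects)
    }
}
```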
Related
I am parsing data where precision is not my main concern. I often get java.lang.OutOfMemoryError even if I use maximum Java heap size. So my main concern here is memory usage, and java heap space. Should I use double or float data type?
I consistently get OOM exceptions because I use a great number of ArrayLists with numbers.
Well that is your problem!
An ArrayList of N 32-bit floating point values takes at least 20 * N bytes in a 32-bit JVM¹ and 24 * N bytes in a 64-bit JVM².
An ArrayList of N 64-bit floating point values takes the same amount of space³.
The above only accounts for the backing array and the list elements. If you have huge numbers of small ArrayList objects, the overhead of the ArrayList object itself may be significant. (Add 16 or 24 bytes for each ArrayList object.)
If you make use of dynamic resizing, this may generate object churn as the backing array grows. At some points, the backing array may be as much as twice as large as it needs to be.
By contrast:
An array of 32-bit floating point values takes approximately 4 * N bytes⁴.
An array of 64-bit floating point values takes approximately 8 * N bytes⁴.
There is no wastage due to dynamic resizing.
Solutions:
ArrayList<Float> versus ArrayList<Double> makes no difference. It is NOT a solution.
For maximal saving, use float[] or double[] depending on your precision requirements. Preallocate the arrays to hold the exact number of elements required.
If you want the flexibility of dynamic resizing there are 3rd-party libraries that implement space efficient lists of primitive types. Alternatively implement your own. However, you won't be able to use the standard List<...> API because that forces you down the path of using Float OR Double.
1 - The actual space used depends on how the ArrayList was created and populated. If you pre-allocate an ArrayList with exactly the correct capacity, you will use the space I said above. If you build the array by repeatedly appending to an ArrayList with the default initial capacity, you will use on average N * 2 bytes extra space for a 32-bit JVM. This is due to the heuristic that ArrayList uses to grow the backing array when it is full.
2 - On a 64-bit JVM, a pointer occupies 8 bytes rather than 4 ... unless you are using compressed oops.
3 - The reason it takes the same amount of bytes is that on a typical JVM a Float and a Double are both 16 bytes due to heap node padding.
4 - There is a header overhead of (typically) 12 bytes per array, and the array's heap node size is padded to a multiple of 8 bytes.
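To make the footnote arithmetic concrete, here is a rough estimator using the 32-bit figures above. The class and method names are mine, and the constants are this answer's approximations (exact-capacity list, 4-byte reference plus ~16-byte Float object), not exact measurements:

```java
public class ListMemoryEstimate {
    // 32-bit JVM, exact-capacity ArrayList<Float>:
    // 4 bytes per reference slot + ~16 bytes per boxed Float object.
    static long arrayListOfFloatBytes(long n) { return 20 * n; }

    // float[]: ~4 bytes per element plus ~16 bytes of header and padding.
    static long floatArrayBytes(long n) { return 4 * n + 16; }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println("ArrayList<Float>: ~" + arrayListOfFloatBytes(n) / (1 << 20) + " MiB");
        System.out.println("float[]:          ~" + floatArrayBytes(n) / (1 << 20) + " MiB");
    }
}
```

For ten million values that works out to roughly 190 MiB boxed versus 38 MiB primitive, which is why the boxed version blows the heap first.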
If your memory usage is related to a huge amount (many millions) of floating-point numbers (which can be verified with a decent memory profiler), then you're most probably storing them in some data structures like arrays or lists.
Recommendations (I guess, you are already following most of them...):
Prefer float over double if number range and precision are sufficient, as that consumes only half the size.
Do not use the java.lang.Float or java.lang.Double classes for storage, as they have a considerable memory overhead compared to the naked scalar values.
Be sure to use arrays, not collections like java.util.List, as collections store boxed java.lang.Float instances instead of the naked numbers.
But above that, have a decent memory profiler show you which instances occupy most of your memory. Maybe there are other memory consumers besides the float/double data.
EDIT:
The OP's recent comment "I consistently get OOM exceptions because I use a great number of ArrayLists with numbers" makes it clear. ArrayList<Float> wastes a lot of memory when compared to float[] (Stephen C gave detailed numbers in his answer), but gives the benefit of dynamic resizing.
So, I see the following possibilities:
If you can tell the array size from the beginning, then immediately use float[] arrays.
If you need the dynamic size while initializing instances, use ArrayList<Float> while building one object (when size still increases), and then copy the contents to a float[] array for long-term storage. Then the wasteful ArrayLists exist only for a limited timespan.
If you need dynamic sizes over the whole lifespan of your data, create your own FloatArrayList class based on a float[] array, resembling the ArrayList<Float> as far as your code needs it (that can range from a very shallow implementation up to a full-featured List, maybe based on AbstractList).
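For the last option, a minimal sketch of such a class (the class is hypothetical, not part of the JDK or any library; a real implementation would add more of the List operations your code needs):

```java
import java.util.Arrays;

/** Minimal growable list backed by a float[], avoiding boxed Floats. */
class FloatArrayList {
    private float[] data = new float[10];
    private int size;

    void add(float f) {
        if (size == data.length)
            // grow by ~1.5x, similar to ArrayList's heuristic
            data = Arrays.copyOf(data, data.length + (data.length >> 1));
        data[size++] = f;
    }

    float get(int index) {
        if (index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return data[index];
    }

    int size() { return size; }

    /** Trimmed copy for long-term storage, like ArrayList.trimToSize. */
    float[] toArray() { return Arrays.copyOf(data, size); }
}
```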
So I'm working on a project for my algorithms class that I'm currently taking. I'm doing some research online and see that some people are using an ArrayList<Integer> and some people are using an int array[]. My question is: what's better to use for a min heap, and why? The project requires me to keep the 10000 largest numbers from a very large list of numbers.
If you know the array size ahead of time, using a bare int[] array is faster. Of course, the performance difference is probably negligible -- but the idea is that ArrayList is internally implemented as an Object[] array, so you're saving yourself that overhead, plus the overhead of dealing with Integer vs int.
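Since the size is fixed at 10000, a fixed-capacity min-heap over an int[] fits this problem well. A sketch (the class and method names are mine, not from any library):

```java
import java.util.Arrays;

/** Fixed-capacity min-heap over an int[] that keeps the K largest values seen. */
class TopK {
    private final int[] heap;
    private int size;

    TopK(int k) { heap = new int[k]; }

    void offer(int value) {
        if (size < heap.length) {          // not full yet: insert and sift up
            heap[size] = value;
            int i = size++;
            while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
                swap(i, (i - 1) / 2);
                i = (i - 1) / 2;
            }
        } else if (value > heap[0]) {      // bigger than the smallest kept: replace root
            heap[0] = value;
            int i = 0;
            while (true) {                 // sift down to restore the min-heap property
                int l = 2 * i + 1, r = l + 1, s = i;
                if (l < size && heap[l] < heap[s]) s = l;
                if (r < size && heap[r] < heap[s]) s = r;
                if (s == i) break;
                swap(i, s);
                i = s;
            }
        }
    }

    private void swap(int a, int b) { int t = heap[a]; heap[a] = heap[b]; heap[b] = t; }

    int[] result() { return Arrays.copyOf(heap, size); }
}
```

The root is always the smallest of the kept values, so each new number costs at most one O(log K) sift, and no boxing ever happens.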
An int[] will consume less memory than an ArrayList<Integer>. Part of that is the per-instance overhead of Integer, roughly 16 bytes each. This video goes through the memory impact of various objects and collections in 32-bit and 64-bit JVMs. At about the 9:30 mark it talks about the memory associated with each object. At about the 11:15 mark it talks about how much memory various types (including Object references) take.
For an int[], you have 1 Object (the int[]) and it will actually contain all of the individual int values as contiguous memory.
For an ArrayList<Integer>, you have the ArrayList object, the Object[] object and all of the Integer objects. Additionally, the Object[] doesn't actually contain the Integer objects in contiguous memory, rather it contains object references in contiguous memory. The Integer objects themselves are elsewhere on the heap.
So the end result is that an ArrayList<Integer> requires ~6x the amount of memory as an int[]. The backing Object[] and the int[] take the same amount of memory (~40,000 bytes). The 10k Integer objects take ~20 bytes each for a total of 200,000 bytes. So the ArrayList will be a minimum of 240,000 bytes compared to the int[] at approximately 40,000 bytes.
Would a boolean array of size 32 take more space than an integer variable, for example? If so, then why and by how much?
CLARIFICATION:
In java (if that is relevant, forgive me - I am not sure). Would this line:
boolean[] arr = new boolean[32];
take more space than this line:
int num;
An array of 32 booleans in Java takes about eight times the space of a Java int. This is because the JVM stores each element of a boolean[] in its own byte: in most computer architectures the smallest addressable unit of memory is an eight-bit byte, so a "packed" array of booleans would require additional masking overhead on every access.
If you would like to use one bit per boolean, use BitSet class instead of an array of booleans. Note that you would get some overhead in addition to the data itself, so using such data structures for only 32 bits may not be economical enough to justify switching away from a simple array.
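For comparison, a short sketch of BitSet usage (the values set here are arbitrary):

```java
import java.util.BitSet;

public class FlagsDemo {
    public static void main(String[] args) {
        // BitSet packs flags into long words: ~1 bit per flag plus
        // object overhead, versus ~1 byte per element for boolean[].
        BitSet flags = new BitSet(32);
        flags.set(3);
        flags.set(17);
        System.out.println(flags.get(3));        // true
        System.out.println(flags.get(4));        // false
        System.out.println(flags.cardinality()); // 2
    }
}
```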
If I create an array of bytes with byte[], what would be the size of each element? Can they be resized/merged?
Thanks,
Not sure what you meant by resized and merged
from the documentation:
byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matters. They can also be used in place of int where their limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.
Edit: If by resized/merged you are talking about the array itself, there's nothing special about a byte array compared to other arrays.
There are two ways to allocate an array.
A) allocate an empty array of a given size:
byte[] ba1 = new byte[18]; // 18 elements
B) allocate an array by specifying the contents
byte[] ba2 = {1,2,3,4,5}; // 5 elements
The size would be a byte per element.
They cannot be resized. However, you can merge them yourself by creating a new array and copying your source arrays into it with System.arraycopy().
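For example, a merge helper along those lines (the method name is mine):

```java
import java.util.Arrays;

public class MergeBytes {
    // Arrays cannot be resized, so merging means allocating a new
    // array and copying both sources into it with System.arraycopy.
    static byte[] merge(byte[] a, byte[] b) {
        byte[] merged = new byte[a.length + b.length];
        System.arraycopy(a, 0, merged, 0, a.length);
        System.arraycopy(b, 0, merged, a.length, b.length);
        return merged;
    }

    public static void main(String[] args) {
        byte[] ba1 = {1, 2, 3};
        byte[] ba2 = {4, 5};
        System.out.println(Arrays.toString(merge(ba1, ba2))); // [1, 2, 3, 4, 5]
    }
}
```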
Edit 1:
There is also an 8-byte overhead for the object header and a 4-byte overhead for the array length, for a total overhead of 12 bytes. So small arrays are relatively expensive.
Check out GNU Trove and Fastutil. They are libraries that make working with primitive collections easier.
Edit 2:
I read in one of your response that you're doing object serialization. You might be interested in ByteBuffers. Those make it easy to write out various primitive types to a wrapped array and get the resulting array. Also check out Google protocol buffers if you want easily serialized structured data types.
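A sketch of the ByteBuffer round trip (the values written here are arbitrary):

```java
import java.nio.ByteBuffer;

public class PackPrimitives {
    public static void main(String[] args) {
        // Pack mixed primitives into one byte[] (big-endian by default).
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 2); // int + double + short
        buf.putInt(42);
        buf.putDouble(3.14);
        buf.putShort((short) 7);
        byte[] bytes = buf.array();

        // Read them back in the same order.
        ByteBuffer in = ByteBuffer.wrap(bytes);
        System.out.println(in.getInt());    // 42
        System.out.println(in.getDouble()); // 3.14
        System.out.println(in.getShort());  // 7
    }
}
```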
An array of ints in Java is stored as a block of 32-bit values in memory. How is an array of Integer objects stored? i.e.
int[] vs. Integer[]
I'd imagine that each element in the Integer array is a reference to an Integer object, and that the Integer object has object storage overheads, just like any other object.
I'm hoping, however, that the JVM does some magical cleverness under the hood, given that Integers are immutable, and stores them just like an array of ints.
Is my hope woefully naive? Is an Integer array much slower than an int array in an application where every last ounce of performance matters?
No VM I know of will store an Integer[] array like an int[] array for the following reasons:
There can be null Integer entries in the array, and you have no bits left to indicate this in an int array. The VM could store this 1-bit information per array slot in a hidden bit-array, though.
You can synchronize on the elements of an Integer array. This is much harder to overcome than the first point, since you would have to store a monitor object for each array slot.
The elements of an Integer[] can be compared for identity. You could, for example, create two Integer objects with the value 1 via new, store them in different array slots, and later retrieve and compare them via ==. This must evaluate to false, so you would have to store that information somewhere. Alternatively, you could keep a reference to one of the Integer objects somewhere and use it for comparison, making sure one of the == comparisons is false and one true. Either way, the whole concept of object identity is quite hard to handle for an optimized Integer array.
You can cast an Integer[] to e.g. Object[] and pass it to methods expecting just an Object[]. This means all the code which handles Object[] must now be able to handle the special Integer[] object too, making it slower and larger.
Taking all this into account, it would probably be possible to make a special Integer[] which saves some space in comparison to a naive implementation, but the additional complexity will likely affect a lot of other code, making it slower in the end.
The overhead of using Integer[] instead of int[] can be quite large in space and time. On a typical 32-bit VM an Integer object will consume 16 bytes (8 bytes for the object header, 4 for the payload and 4 additional bytes for alignment), while a slot in the Integer[] uses as much space as one in an int[]. In 64-bit VMs (using uncompressed 64-bit pointers, which is not always the case) an Integer object will consume 24 bytes (16 for the header, 4 for the payload and 4 for alignment). In addition, a slot in the Integer[] will use 8 bytes instead of the 4 in an int[]. This means you can expect an overhead of 16 to 28 bytes per slot, which is a factor of 4 to 7 compared to plain int arrays.
The performance overhead can be significant too for mainly two reasons:
Since you use more memory, you put much more pressure on the memory subsystem, making cache misses more likely in the case of Integer[]. For example, if you traverse the contents of an int[] in a linear manner, the cache will have most of the entries already fetched when you need them (since the layout is linear too). But in the case of the Integer array, the Integer objects themselves might be scattered randomly across the heap, making it hard for the cache to guess where the next memory reference will point.
The garbage collector has to do much more work because of the additional memory used and because it has to scan and move each Integer object separately, while in the case of int[] it is just one object whose contents don't have to be scanned (ints contain no references to other objects).
To sum it up, using an int[] in performance-critical work will be both much faster and more memory-efficient than using an Integer array in current VMs, and it is unlikely this will change much in the near future.
John Rose is working on fixnums in the JVM to fix this problem.
I think your hope is woefully naive. Specifically, the VM needs to deal with the issue that an Integer can potentially be null, whereas an int cannot be. That alone is reason enough to store the object pointer.
That said, the actual object pointer will be to an immutable Integer instance, and for a small range of values those instances are shared via the autoboxing cache.
It won't be much slower, but because an Integer[] must accept "null" as an entry and int[] doesn't have to, there will be some amount of bookkeeping involved, even if Integer[] is backed by an int[].
So if every last ounce of performance matters, use int[].
The reason that Integer can be null, whereas int cannot, is because Integer is a full-fledged Java object, with all of the overhead that includes. There's value in this since you can write
Integer foo = null;
which is a way of saying that foo will have a value, but it doesn't have one yet. (Note that new Integer() doesn't compile; Integer has no no-argument constructor.)
Another difference is that int arithmetic performs no overflow checking. For instance,
int bar = Integer.MAX_VALUE;
bar++;
will merrily increment bar and you end up with a very negative number, which is probably not what you intended in the first place.
foo = Integer.MAX_VALUE;
foo++;
will not complain either: foo is auto-unboxed, incremented with the same silent wrap-around, and re-boxed, so it also ends up at Integer.MIN_VALUE. If you want overflow to be detected, use Math.addExact (Java 8+), which throws an ArithmeticException on overflow.
One last point is that Integer, being a Java object, carries with it the space overhead of an object. I think that someone else may need to chime in here, but I believe that every object consumes 12 bytes for overhead, and then the space for the data storage itself. If you're after performance and space, I wonder whether Integer is the right solution.