Admittedly, I could have figured it out through trial and error, but I would also like to know whether or not this number varies and how (computer performance, other data structures present, compiler type, etc.). Thanks!
You will need to increase your JVM heap size if you run out of memory; read this. If your matrix genuinely requires that much memory (and there are no memory leaks), there is nothing you can do other than increase the heap size.
You can make your matrix as large as you want (but not larger than the maximum value of an int, which is used as the index) if you have enough memory. An int is 32 bits, so that gives you the theoretical upper limit.
While the maximum array size is limited by a 32-bit signed value, i.e. 2^31-1 or about 2 billion, most matrices are implemented as two-dimensional arrays, so the maximum size is 2 billion * 2 billion cells. You could use float or double, but with a matrix that big the accumulated rounding error would be enormous: across ~2^62 cells it exceeds what the precision of a double can absorb, so you would have to use BigDecimal in any case. Say each cell took about 128 (2^7) bytes of memory; you would then need a total of 2^69 bytes, or 512 exabytes (32x the theoretical limit of the memory a 64-bit processor can address).
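For the curious, that arithmetic can be checked with a few lines of Java. This is only a sketch of the figures above; the 128 bytes per cell is the assumed value from the previous paragraph, and the class name is made up:

import java.math.BigInteger;

// Back-of-the-envelope check of the figures above.
public class MatrixMemoryEstimate {
    public static void main(String[] args) {
        BigInteger side = BigInteger.valueOf(2).pow(31).subtract(BigInteger.ONE); // max array length, ~2^31
        BigInteger cells = side.multiply(side);                                   // ~2^62 cells
        BigInteger bytesPerCell = BigInteger.valueOf(128);                        // 2^7 bytes per cell (assumed)
        BigInteger totalBytes = cells.multiply(bytesPerCell);                     // ~2^69 bytes
        System.out.println("cells       = " + cells);
        System.out.println("total bytes = " + totalBytes);                        // ~512 exabytes
    }
}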
It also depends upon the memory of your machine and how much memory you allocate to the process using -Xmx.
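For example, the heap limits are passed when the JVM is launched; the sizes and the class name MyMatrixApp below are placeholders, not values from the question:

java -Xms512m -Xmx2048m MyMatrixApp

Inside the program, Runtime.getRuntime().maxMemory() reports the maximum amount of memory the JVM will attempt to use.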
Related
I am a little surprised to see that on my machine, the maximum size of an array is only Integer.MAX_VALUE/7.
I know that arrays are indexed by integers, so the array size cannot be greater than Integer.MAX_VALUE. I also read some Stack Overflow discussions where I found that the limit varies by JVM, and that a few bytes (5-8) are used by the JVM itself.
Even in that case, the maximum value should be Integer.MAX_VALUE-8.
Any value between Integer.MAX_VALUE-2 and Integer.MAX_VALUE/7 gives me the error: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
int[] arr = new int[Integer.MAX_VALUE/7];
This is the largest array I can allocate on my machine. Are there specific reasons for this?
Update:
I am running the code from Eclipse, in which the default heap size is 1024 MB. Below are more details about my environment:
System.out.println(Runtime.getRuntime().totalMemory()/(1024*3));
System.out.println(Runtime.getRuntime().freeMemory()/(1024*3));
System.out.println(Runtime.getRuntime().maxMemory()/(1024*3));
give the output:
40618
40195
594773
As already mentioned by cloudworker, the real limits for arrays are explained here: Do Java arrays have a maximum size?
In your case, 1 GB is just not enough heap space for an array that huge.
I do not know exactly what else is running in the JVM, but from what I can calculate:
Integer.MAX_VALUE = ~2 billion
int = 4 bytes
2 billion * 4 bytes = 8 billion bytes = 8 GB of memory
With 1 GB of heap space you should therefore be able to hold roughly MAX_VALUE/8 elements. (I think the reason you can actually get a bit more than that, MAX_VALUE/7, is some optimization in the JVM.)
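A quick sketch of that arithmetic in code (it deliberately ignores object headers, the JVM's own objects and everything else on the heap; the class name is made up):

// Rough estimate of how many int elements fit in the current max heap.
public class MaxIntArrayEstimate {
    public static void main(String[] args) {
        long heapBytes = Runtime.getRuntime().maxMemory(); // e.g. ~1 GB when started with -Xmx1024m
        long bytesPerInt = 4;
        long roughCapacity = heapBytes / bytesPerInt;      // ignores headers and other live objects
        System.out.println("max heap bytes        = " + heapBytes);
        System.out.println("rough int[] capacity  = " + roughCapacity);
        System.out.println("Integer.MAX_VALUE / 8 = " + (Integer.MAX_VALUE / 8));
    }
}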
I am parsing data where precision is not my main concern. I often get java.lang.OutOfMemoryError even when I use the maximum Java heap size. So my main concern here is memory usage and Java heap space. Should I use the double or the float data type?
I consistently get OOM exceptions because I use a great number of ArrayLists with numbers.
Well that is your problem!
An ArrayList of N 32-bit floating point values takes at least [1] 20 * N bytes in a 32-bit JVM and 24 * N bytes in a 64-bit JVM [2].
An ArrayList of N 64-bit floating point values takes the same amount of space [3].
The above only accounts for the backing array and the list elements. If you have huge numbers of small ArrayList objects, the overhead of the ArrayList object itself may be significant. (Add 16 or 24 bytes for each ArrayList object.)
If you make use of dynamic resizing, this may generate object churn as the backing array grows. At some points, the backing array may be as much as twice as large as it needs to be.
By contrast:
An array of 32-bit floating point values takes approximately 4 * N bytes [4].
An array of 64-bit floating point values takes approximately 8 * N bytes [4].
There is no wastage due to dynamic resizing.
Solutions:
ArrayList<Float> versus ArrayList<Double> makes no difference. It is NOT a solution.
For maximal saving, use float[] or double[] depending on your precision requirements. Preallocate the arrays to hold the exact number of elements required.
If you want the flexibility of dynamic resizing there are 3rd-party libraries that implement space efficient lists of primitive types. Alternatively implement your own. However, you won't be able to use the standard List<...> API because that forces you down the path of using Float OR Double.
[1] - The actual space used depends on how the ArrayList was created and populated. If you pre-allocate an ArrayList with exactly the correct capacity, you will use the space I said above. If you build the list by repeatedly appending to an ArrayList with the default initial capacity, you will use on average N * 2 bytes of extra space on a 32-bit JVM. This is due to the heuristic that ArrayList uses to grow the backing array when it is full.
[2] - On a 64-bit JVM, a pointer occupies 8 bytes rather than 4 ... unless you are using compressed oops.
[3] - The reason it takes the same number of bytes is that on a typical JVM a Float and a Double are both 16 bytes due to heap node padding.
[4] - There is a header overhead of (typically) 12 bytes per array, and the array's heap node size is padded to a multiple of 8 bytes.
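To put rough numbers on the comparison above, here is a deliberately unscientific sketch that measures the approximate heap used by each representation; the exact figures depend on the JVM, the heap settings and GC timing, and the class name is made up:

import java.util.ArrayList;
import java.util.List;

// Crude comparison of ArrayList<Float> versus float[] heap usage.
public class ListVsArrayFootprint {
    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        final int n = 5_000_000;

        System.gc();
        long before = usedHeap();
        List<Float> boxed = new ArrayList<>(n);   // preallocated capacity
        for (int i = 0; i < n; i++) {
            boxed.add((float) i);                 // every element is a separate boxed Float
        }
        System.gc();
        long boxedBytes = usedHeap() - before;

        System.gc();
        before = usedHeap();
        float[] raw = new float[n];               // 4 bytes per element plus a small array header
        for (int i = 0; i < n; i++) {
            raw[i] = i;
        }
        System.gc();
        long rawBytes = usedHeap() - before;

        System.out.println("ArrayList<Float> of " + boxed.size() + " : ~" + boxedBytes / (1024 * 1024) + " MB");
        System.out.println("float[] of " + raw.length + " : ~" + rawBytes / (1024 * 1024) + " MB");
    }
}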
If your memory usage is related to a huge amount (many millions) of floating-point numbers (which can be verified with a decent memory profiler), then you're most probably storing them in some data structures like arrays or lists.
Recommendations (I guess, you are already following most of them...):
Prefer float over double if number range and precision are sufficient, as that consumes only half the size.
Do not use the java.lang.Float or java.lang.Double classes for storage, as they have a considerable memory overhead compared to the naked scalar values.
Be sure to use arrays, not collections like java.util.List, as the latter store boxed java.lang.Float instances instead of the naked numbers.
But above that, have a decent memory profiler show you which instances occupy most of your memory. Maybe there are other memory consumers besides the float/double data.
EDIT:
The OP's recent comment "I consistently get OOM exceptions because I use a great number of ArrayLists with numbers" makes it clear. ArrayList<Float> wastes a lot of memory when compared to float[] (Stephen C gave detailed numbers in his answer), but gives the benefit of dynamic resizing.
So, I see the following possibilities:
If you can tell the array size from the beginning, then immediately use float[] arrays.
If you need the dynamic size while initializing instances, use ArrayList<Float> while building one object (when size still increases), and then copy the contents to a float[] array for long-term storage. Then the wasteful ArrayLists exist only for a limited timespan.
If you need dynamic sizes over the whole lifespan of your data, create your own FloatArrayList class based on a float[] array, resembling the ArrayList<Float> as far as your code needs it (that can range from a very shallow implementation up to a full-featured List, maybe based on AbstractList).
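A very shallow version of such a FloatArrayList could look like the sketch below; the names and the growth policy are illustrative, and it is nowhere near a full List implementation:

import java.util.Arrays;

// Minimal growable list of primitive floats, avoiding boxed Float objects.
public class FloatArrayList {
    private float[] data;
    private int size;

    public FloatArrayList() {
        this(16);                                  // small default capacity
    }

    public FloatArrayList(int initialCapacity) {
        data = new float[initialCapacity];
    }

    public void add(float value) {
        if (size == data.length) {                 // grow by roughly 1.5x, similar to ArrayList
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
        }
        data[size++] = value;
    }

    public float get(int index) {
        if (index >= size) throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        return data[index];
    }

    public int size() {
        return size;
    }

    public float[] toArray() {
        return Arrays.copyOf(data, size);          // compact copy for long-term storage
    }
}

The toArray() copy also covers the second option above: build with the growable structure while loading, then keep only the compact float[] for long-term storage.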
I asked How do I make a large 3D array without running out of memory?, and I received many answers. All of the answers basically said the same thing, but with one minor difference. Each answer claimed a different size for my 10,000 x 10,000 x 10,000 array. (931 GB, 931 * 4 GB, 4 TB, 3725 GB, etc.)
How do you determine the memory that needs to be allocated for a 3D array?
As Java implements multi-dimensional arrays as arrays of arrays, there are relatively many variables to account for.
The innermost (last dimension) generally follows the formula:
sizeOf = (number of bytes per element) * (number of elements) + (fixed overhead)
Fixed overhead is approximately 12 bytes and the number may need rounding to account for memory granularity, too. For larger arrays both factors contribute very little to array size.
For all other dimensions it's the same formula, but the element type is always a reference. Bytes per element works out to either 4 (for 32-bit VMs and 64-bit VMs using compressed oops) or 8 (64-bit VMs without compressed oops).
From there it's simple math.
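As an illustration, here is that math for the 10,000 x 10,000 x 10,000 array from the linked question, assuming int elements, compressed oops (4-byte references) and roughly 16 bytes of header plus padding per array; all of these are assumptions that shift the result a little, and the class name is made up:

// Rough size of new int[10000][10000][10000] under the assumptions above.
public class Array3DSizeEstimate {
    public static void main(String[] args) {
        long n = 10_000;
        long overhead = 16;                        // assumed per-array header plus padding
        long refSize = 4;                          // reference size with compressed oops

        long innermost = n * 4 + overhead;         // one int[10000]
        long middle    = n * refSize + overhead;   // one int[10000][], an array of references
        long outer     = n * refSize + overhead;   // the top-level array of references

        long total = outer + n * middle + n * n * innermost;
        System.out.printf("total = %d bytes (~%.0f GB)%n", total, total / Math.pow(1024, 3));
    }
}

That works out to roughly 4 TB, which is in the same ballpark as the larger figures quoted in the question.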
I have seen that JVM memory parameters are almost always set to a multiple of 256 or some other round binary value - e.g. 256m, 512m, 1024m, etc. I know it may be linked to the fact that physical memory (RAM) usually comes in binary sizes, such as 256 MB, 1 GB, etc.
My question is: does it really help memory management in any way if the JVM memory is set to a multiple of 256, or to any round binary value for that matter? Does it hurt to set the JVM memory to a round decimal value, such as 1000m, instead of 1024m? (Though I have never seen any JVM using a value that is round only in decimal terms.)
The OS allocates the requested memory to the JVM when it is launched, so I guess it is more a question of whether the JVM can manage a round decimal memory size (e.g. 1000 MB) efficiently, or whether there would be any shortcomings.
EDIT: I know, we CAN use decimal values for JVM memory, but my question is SHOULD we use decimal values?
EDIT2: For opinions/guesses about the JVM being equally efficient with every memory size, please share any relevant links that led you to that conclusion. I have seen enough WAR on this topic among fellow developers, but I haven't seen much concrete reasoning to back either side - decimal or binary values.
It is not necessary to use a power of 2 for the JVM memory parameters. It is just common practice to double the value when the old one isn't enough.
If you increased the assigned memory in 1 MB steps, you would have to adjust the value several (hundred) times before the configuration matched your requirements, so it is just more convenient to double the old value.
This relies on the fact that memory is a cheap resource these days.
EDIT:
As you already mentioned, it is possible to assign values like 1000 MB or 381 MB. The JVM can handle any memory size that is big enough to host the PermGen space, the stacks and the heap.
It does not matter. There is no special treatment of rounded values.
You can specify the memory size with 1-byte accuracy; the JVM itself will round the size up to a value it is comfortable with. E.g. the heap size is rounded up to a 2 MB boundary. See my other answer: https://stackoverflow.com/a/24228242/3448419
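If you want to see what your own JVM does with a non-round value, start a trivial program with, say, -Xmx1000m (an arbitrary example; HeapProbe is just a placeholder name) and print the granted maximum:

// Run with e.g.: java -Xmx1000m HeapProbe
public class HeapProbe {
    public static void main(String[] args) {
        // maxMemory() reports the maximum amount of memory the JVM will attempt to use.
        System.out.println("maxMemory = " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}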
There is no real requirement that those values be a power of 2. It is just a convention; you can use whatever values you like.
-Xms1303m -Xmx2303m -XX:MaxPermSize=256m // my configs
I think it is mainly a way of thinking, something like: I have 1 GB of memory, I will give the JVM half of it, which is 512 MB.
It is just a way to ensure that the size you specify (as an argument) lines up with the memory actually allocated, because the machine allocates memory in blocks whose sizes are powers of 2.
I am solving this problem. It states a memory limit of 50000 bytes. So if I allocate a 2D array of int of size 1000 x 1000, shouldn't it exceed the memory bound?
PS: I saw this solution to the problem, and the programmer has allocated a 2D array of size m x m. If m is equal to 1000, then I think the memory bound will be exceeded, yet CodeChef has accepted the solution.
Is the CodeChef compiler faulty, or am I missing something?
From the site:
Source Limit: 50000 Bytes
This limit applies to the size of your source code, not to the amount of memory the program allocates. The two are completely unrelated.
50000 bytes is the maximum size that your source code can have; it is not at all related to the memory that your program uses. The 2D array of size 1000 x 1000 will be allocated to your program from RAM (primary memory).
By the way, on CodeChef the practical limit for the size of a single array is around 10^7 to 10^8 elements, as it is difficult to allocate larger blocks of contiguous memory.
You can refer to this discussion on codechef for further details.
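For completeness, the memory footprint of the 1000 x 1000 int matrix itself is easy to estimate (a sketch that ignores the per-array headers; the class name is made up):

// Rough memory used by new int[1000][1000], ignoring per-array headers.
public class MatrixFootprint {
    public static void main(String[] args) {
        long m = 1000;
        long bytes = m * m * 4;                     // 4 bytes per int
        System.out.println("approx " + bytes + " bytes");
        // roughly 4 MB, well within a typical online judge's memory limit;
        // the 50000-byte figure is the source-code size limit, not a memory limit.
    }
}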