I have a program where I will be using a very large short[] array:
import java.lang.Math;

public class HPTest {
    public static void main(String[] args) {
        int n = 30;
        short[] a = new short[(int) Math.pow(2, n)];
    }
}
As far as I know, a short[] array should use 2 bytes per element, and so an array with 2^30 elements should need about 2 GiB of RAM.
In order to run the program, I therefore tried
java -Xms2000m HPTest
but still got a heap space error. Even at 3000m I got the same error, but at 4000m it worked.
Any ideas as to why I had to go so far above the estimated limit of 2000m?
EDIT:
As has been pointed out by many users, I made a very embarrassing error in declaring that a short needs 1 byte rather than 2 bytes; the figures above have been corrected. The question then should be why 2000m doesn't suffice.
Something this large will be much happier outside the heap. You would be better off looking into NIO and using a direct byte buffer to back your large short array. That memory can be kept out of the heap and away from the mitts of the garbage collector (who may at times feel inclined to copy your buffer from one area to another).
See java.nio.ShortBuffer and start digging from there.
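For illustration, a minimal sketch of what that might look like (the class name is mine); note that a single direct buffer is capped at Integer.MAX_VALUE bytes, so the full 2^30-short array would have to span two buffers, and the JVM's direct-memory limit (-XX:MaxDirectMemorySize) may need raising:

import java.nio.ByteBuffer;
import java.nio.ShortBuffer;

public class OffHeapShorts {
    public static void main(String[] args) {
        int n = 29; // 2^30 shorts (2 GiB) would exceed one buffer's Integer.MAX_VALUE byte cap
        ByteBuffer bytes = ByteBuffer.allocateDirect((1 << n) * 2); // 2 bytes per short, off-heap
        ShortBuffer shorts = bytes.asShortBuffer();                 // view the same memory as shorts

        shorts.put(12345, (short) 7);
        System.out.println(shorts.get(12345)); // prints 7
    }
}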
Related
I am doing some experiments on memory. The first problem I met is how to allocate a given amount of memory at runtime, say 500MB, and have the program's process hold it until the program exits.
I guess there may be several ways to achieve this? I'd prefer a simple but practical one.
Well, Java hides memory management from you, so there are two answers to your question:
Create the data structures of this size that you are going to need, and hold a reference to them in some thread until the program exits, because once there is no reference to data on the heap from an active thread, it becomes garbage collectable. Since an int is 4 bytes, 500MB is roughly enough for an int array of 125 million cells, or 125 int arrays of 1,000,000 cells each.
If you just want the memory allocated and available, but not filled up, then start the virtual machine with -Xms512m. This makes the VM allocate 512 MB of heap for your program on startup, but it stays empty (just reserved) until you need it (see point 1). -Xmx sets the maximum memory allocatable by your program.
public static void main(String[] args) {
    final byte[] x = new byte[500 * 1024];        // 500 KB
    final byte[] y = new byte[500 * 1024 * 1024]; // 500 MB
    // ...
    System.out.println(x.length + y.length);
}
jmalloc lets you do it, but I wouldn't recommend it unless you're truly an expert. You're giving up something that's central to Java - garbage collection. You might as well be writing C.
Java NIO allocates byte buffers off heap this way. I think this is where Oracle is going for memory mapping JARs and getting rid of perm gen, too.
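As an illustration of that approach, a hedged sketch (class name mine) of pinning 500 MB off-heap until the process exits; depending on JVM defaults this may need -XX:MaxDirectMemorySize=512m:

import java.nio.ByteBuffer;

public class HoldMemory {
    public static void main(String[] args) throws InterruptedException {
        // 500 MB outside the Java heap; freed only when the buffer is
        // garbage collected or the process exits.
        ByteBuffer block = ByteBuffer.allocateDirect(500 * 1024 * 1024);
        System.out.println("Holding " + block.capacity() + " bytes off-heap");
        Thread.sleep(Long.MAX_VALUE); // keep the process (and the reference) alive
    }
}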
I wish to make a large int array that very nearly fills all of the memory available to the JVM. Take this code, for instance:
final int numBuffers = (int) ((runtime.freeMemory() - 200000L) / BUFFER_SIZE);
System.out.println(runtime.freeMemory());
System.out.println(numBuffers * (BUFFER_SIZE / 4) * 4);
buffers = new int[numBuffers * (BUFFER_SIZE / 4)];
When run with a heap size of 10M, this throws an OutOfMemoryError, despite the output from the printlns being:
9487176
9273344
I realise the array is going to have some overhead, but not 200k, surely? Why does Java fail to allocate memory for something it claims to have enough space for? I have to set the subtracted constant to around 4M before Java will run this (by which time the printlns look more like:
9487176
5472256
)
Even more bewilderingly, if I replace buffers with a 2D array:
buffers = new int[numBuffers][BUFFER_SIZE / 4];
then it runs without complaint using the 200k subtraction shown above, even though the number of integers being stored is the same in both cases. (And wouldn't the overhead of a 2D array be larger than that of a 1D array, since it has all those references to the row arrays to store?)
Any ideas?
The VM will divide the heap memory into different areas (mainly for the garbage collector), so you will run out of memory when you attempt to allocate a single object of nearly the entire heap size.
Also, some memory will already have been used up by the JRE. 200k is nothing with today's memory sizes, and a 10M heap is almost unrealistically small for most applications.
The actual overhead of an array is relatively small: on a 32-bit VM it's 12 bytes IIRC (plus whatever is wasted if the size is less than the minimal allocation granularity, which is AFAIK 8 bytes). So in the worst case you have something like 19 bytes of overhead per array.
Note that Java has no true 2D (multi-dimensional) arrays; it implements them internally as arrays of arrays.
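To make that concrete, a small sketch (illustrative only) showing that a Java "2D array" is really an outer array of references to independently allocated row arrays:

public class Jagged {
    public static void main(String[] args) {
        int[][] grid = new int[3][]; // the outer array holds 3 references, nothing more
        grid[0] = new int[1000];     // each row is its own, separately allocated heap object...
        grid[1] = new int[1000];
        grid[2] = new int[2000];     // ...and rows need not even share a length
        System.out.println(grid.length + " rows; row 2 holds " + grid[2].length + " ints");
    }
}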
In the 2D case, you are allocating more, smaller objects. The memory manager is objecting to the single large object taking up most of the heap. Why this is objectionable is a detail of the garbage collection scheme: most likely it can move the smaller objects between generations, while the heap cannot accommodate moving the single large object around.
This might be due to memory fragmentation and the JVM's inability to allocate an array of that size given the current heap.
Imagine your heap is 10 x long:
xxxxxxxxxx
Then you allocate an object 0 somewhere. This makes your heap look like:
xxxxxxx0xx
Now you can no longer allocate a block of 10 xs. You cannot even allocate a contiguous block of 8 xs, despite the fact that 9 xs worth of memory are free.
An array of arrays does not suffer from the same problem, because it does not need to be contiguous.
EDIT: Please note that the above is a very simplistic view of the problem. When in need of space in the heap, Java's garbage collector will try to collect as much memory as it can and, if really, really necessary, try to compact the heap. However, some objects might not be movable or collectible, creating heap fragmentation and putting you in the above situation.
There are also many other factors that you have to consider, some of which include: memory leaks either in the VM (not very likely) or your application (also not likely for a simple scenario), unreliability of using Runtime.freeMemory() (the GC might run right after the call and the available free memory could change), implementation details of each particular JVM, etc.
The point is, as a rule of thumb, don't always expect to have the full amount of Runtime.freeMemory() available to your application.
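As a rough illustration of that rule of thumb (a sketch; the 80% margin is an arbitrary assumption, not a JVM guarantee):

public class SafeAlloc {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Budget only part of what freeMemory() reports, leaving headroom for GC areas
        // and for allocations that happen between this query and the new int[].
        long budget = (long) (rt.freeMemory() * 0.8); // 0.8 is an arbitrary safety margin
        int cells = (int) Math.min(budget / 4, Integer.MAX_VALUE - 8); // 4 bytes per int
        int[] buffer = new int[cells];
        System.out.println("Allocated " + buffer.length + " ints");
    }
}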
Java has a tendency to create a large number of objects that need to be garbage collected when processing large data sets. This happens fairly frequently when streaming large amounts of data from the database, creating reports, etc. Is there a strategy to reduce the memory churn?
In this example, the object-based version spends a significant amount of time (2+ seconds) generating objects and performing garbage collection, whereas the boolean-array version completes in a fraction of a second without any garbage collection whatsoever.
How do I reduce the memory churn (the need for large number of garbage collections) when processing large data sets?
java -verbose:gc -Xmx500M UniqChars
...
----------------
[GC 495441K->444241K(505600K), 0.0019288 secs] x 45 times
70000007
================
70000007
import java.util.HashSet;
import java.util.Set;

public class UniqChars {
    static String a = null;

    public static void main(String[] args) {
        // Generate the data set
        StringBuffer sb = new StringBuffer("sfdisdf");
        for (int i = 0; i < 10000000; i++) {
            sb.append("sfdisdf");
        }
        a = sb.toString();
        sb = null; // free sb
        System.out.println("----------------");
        compareAsSet();
        System.out.println("================");
        compareAsAry();
    }

    public static void compareAsSet() {
        Set<String> uniqSet = new HashSet<String>();
        int n = 0;
        for (int i = 0; i < a.length(); i++) {
            String chr = a.substring(i, i + 1); // one-character String per iteration
            uniqSet.add(chr);
            n++;
        }
        System.out.println(n);
    }

    public static void compareAsAry() {
        boolean uniqSet[] = new boolean[65536];
        int n = 0;
        for (int i = 0; i < a.length(); i++) {
            int chr = (int) a.charAt(i);
            uniqSet[chr] = true;
            n++;
        }
        System.out.println(n);
    }
}
Well, as pointed out by one of the comments, it's your code, not Java, at fault for the memory churn. So let's see: you've written code that builds an insanely large String from a StringBuffer, calls toString() on it, then, in a loop, calls substring() on that insanely large String, creating a.length() new Strings. Then it does some in-place work on an array that really will perform pretty damn fast, since there is no object creation, but ultimately just writes true to the same 5-6 locations in a huge array. Waste much? So what did you think would happen? Ditch StringBuffer and use StringBuilder, since it's not synchronized, which will be a little faster.
Ok, so here's where your algorithm is probably spending its time. The StringBuffer allocates an internal character array to store things in each time you call append(). When that character array fills up entirely, it has to allocate a larger character array, copy all the junk you just wrote to it into the new array, then append what you originally called it with. So your code allocates, fills up, allocates a bigger chunk, copies that junk to the new array, then repeats that process until it has done so 10000000 times. You can speed that up by pre-allocating the character array for the StringBuffer. Roughly that's 10000000 * "sfdisdf".length(). That will keep Java from repeatedly allocating memory that it immediately turns into garbage.
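For instance, a sketch of the pre-sized version (the capacity is simply the known final length, 70000007 chars):

// Pre-size the builder so append() never has to grow and copy its internal char[].
StringBuilder sb = new StringBuilder(70000007);
sb.append("sfdisdf");
for (int i = 0; i < 10000000; i++) {
    sb.append("sfdisdf");
}
String a = sb.toString();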
Next is the compareAsSet() mess. Your line String chr = a.substring(i, i + 1); creates a NEW String a.length() times. Since each substring is only one character, you could just use charAt(i) and then no allocation happens at all. There's also the option of CharSequence, which doesn't create a new String with its own character array but simply points into the original underlying char[] with an offset and length: see String.subSequence().
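A sketch of the substring-free variant of compareAsSet() (hedged: adding a char to a Set<Character> still autoboxes, but the JVM caches Character values 0-127, so for ASCII input no new object is created per add):

Set<Character> uniq = new HashSet<Character>();
for (int i = 0; i < a.length(); i++) {
    uniq.add(a.charAt(i)); // no String allocated per character
}
System.out.println(uniq.size()); // number of distinct characters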
Plug this same code into any other language and it'll suck there too; in fact, I'd say far, far worse. Just try this in C++ and watch it be significantly worse than Java should you allocate and deallocate this much. Java memory allocation is much faster than C++'s because everything in Java is allocated from a memory pool, so creating objects is magnitudes faster. But there are limits. Furthermore, Java compacts its memory should it become too fragmented; C++ doesn't. So as you allocate memory and dump it, in just the same way, you'll run the risk of fragmenting the memory in C++. That could mean your StringBuffer might lose the ability to grow large enough to finish, and would crash.
In fact, that might also explain some of the performance issues with the GC, because it has to make room for a contiguous block big enough after lots of trash has been taken out. So Java is not only cleaning up the memory, it's also having to compact the memory address space so it can get a block big enough for your StringBuffer.
Anyway, I'm sure you're just kicking the tires, but testing with code like this isn't really smart because it'll never perform well: it's unrealistic memory allocation. You know the old adage: Garbage In, Garbage Out. And that's what you got: garbage.
In your example your two methods are doing very different things.
In compareAsSet() you are generating the same 4 Strings ("s", "d", "f" and "i") and calling String.hashCode() and String.equals(String) (HashSet does this when you try to add them) 70000007 times. What you end up with is a HashSet of size 4. While you are doing this, you are allocating a String object each time String.substring(int, int) returns, which forces a minor collection every time the 'new' generation of the garbage collector fills up.
In compareAsAry() you've allocated a single array 65536 elements wide, changed some values in it, and then let it go out of scope when the method returns. This is a single heap memory operation vs the 70000007 done in compareAsSet(). You do have a local int variable being changed 70000007 times, but that happens in stack memory, not heap memory. This method does not really generate much garbage in the heap compared to the other method (basically just the array).
Regarding churn, your options are to recycle objects or to tune the garbage collector.
Recycling is not really possible with Strings in general, as they are immutable; the VM may perform interning operations, but that only reduces the total memory footprint, not the garbage churn. A solution targeted at the above scenario that recycles could be written, but the implementation would be brittle and inflexible.
Tuning the garbage collector so that the 'new' generation is larger could reduce the total number of collections that have to be performed during your method call and thus increase the throughput of the call; you could also just increase the heap size in general, which would accomplish the same thing.
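For example (an illustrative invocation, not a tuned recommendation), -Xmn gives the young generation a fixed, larger share of the same 500M heap used above:
java -verbose:gc -Xmx500M -Xmn256M UniqChars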
For further reading on garbage collector tuning in Java 6, I recommend the Oracle white paper linked below.
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
For comparison, if you wrote this it would do the same thing.
public static void compareLength() {
    // All the loop does is count the length in a complex way.
    System.out.println(a.length());
}

// I assume you intended to write this.
public static void compareAsBitSet() {
    BitSet uniqSet = new BitSet();
    for (int i = 0; i < a.length(); i++)
        uniqSet.set(a.charAt(i));
    System.out.println(uniqSet.cardinality()); // number of distinct characters seen
}
Note: the BitSet uses 1 bit per element, rather than 1 byte per element. It also expands as required, so if you have ASCII text, say, the BitSet might use 128 bits, or 16 bytes (plus around 32 bytes of overhead). The boolean[] uses 64 KB, which is much higher. Ironically, using a boolean[] can be faster, as it involves less bit shifting and only the portion of the array actually used needs to be in memory.
As you can see, with either solution, you get a much more efficient result because you use a better algorithm for what needs to be done.
I'm trying to find a class for storing a vector of bytes in Java, which supports: random access (so I can get or set a byte anywhere), resizing (so I can either append stuff to the end or else manually change the size), reasonable efficiency (I might be storing megabytes of data in these things), all in memory (I don't have a file system). Any suggestions?
So far the candidates are:
byte[]. Not resizable.
java.util.Vector<Byte>. Evil. Also painfully inefficient.
java.io.ByteArrayOutputStream. Not random-access.
java.nio.ByteBuffer. Not resizable.
org.apache.commons.collections.primitives.ArrayByteList. Not resizable. Which is weird, because it'll automatically resize if you add stuff to it, you just can't change the size explicitly!
pure RAM implementation of java.nio.channels.FileChannel. Can't find one. (Remember I don't have a file system.)
Apache Commons VFS and the RAM filesystem. If I must I will, but I really like something lighter weight...
This seems like such an odd omission that I'm sure I must have missed something somewhere; I just can't figure out what. What am I missing?
I would consider a class that wraps lots of chunks of byte[] arrays as elements of an ArrayList or Vector.
Make each chunk a power of two, e.g. 1024 bytes, so your accessor functions can use index >> 10 to find the right element of the ArrayList and then index & 0x3ff to access the specific byte within that chunk.
This avoids the wastage of treating each byte as a Byte object, at the expense of wasting whatever's left over at the end of the last chunk.
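A minimal sketch of that idea (the class name and fixed 1024-byte chunk are illustrative; bounds checks omitted):

import java.util.ArrayList;

public class ChunkedByteVector {
    private static final int CHUNK = 1024;             // a power of two, to match the shift below
    private final ArrayList<byte[]> chunks = new ArrayList<byte[]>();
    private int size = 0;

    public byte get(int index) {
        return chunks.get(index >> 10)[index & 0x3ff]; // chunk number, then offset within it
    }

    public void set(int index, byte value) {
        chunks.get(index >> 10)[index & 0x3ff] = value;
    }

    public void append(byte value) {
        if (size == chunks.size() * CHUNK) {
            chunks.add(new byte[CHUNK]);               // grow by one chunk at a time
        }
        chunks.get(size >> 10)[size & 0x3ff] = value;
        size++;
    }

    public int size() {
        return size;
    }
}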
In those cases, I simply initialize a reference to an array of reasonable length, and when it gets too small, create a larger one and copy the contents over with Arrays.copyOf(), e.g.:
byte[] a = new byte[16];
if (condition) {
    a = Arrays.copyOf(a, a.length * 2); // double the capacity, copying the contents
}
And we may wrap that in a class if needed...
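For example, a minimal wrapper along those lines (a sketch; the class name is mine and bounds checks are omitted):

import java.util.Arrays;

public class GrowableByteArray {
    private byte[] data = new byte[16];
    private int size = 0;

    public void append(byte b) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // double when full
        }
        data[size++] = b;
    }

    public byte get(int index)         { return data[index]; }
    public void set(int index, byte b) { data[index] = b; }
    public int size()                  { return size; }
}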
ArrayList is more efficient than Vector because it's not synchronized.
To improve efficiency you can start with a decent initial capacity. See the constructor:
ArrayList(int initialCapacity)
I think the default is only 10, while you will probably need a much bigger size.
Despite appearances, ArrayList is very efficient even with a million records! I wrote a small test program that adds a million records to an ArrayList without declaring an initial capacity, and on my Linux, 5-year-old laptop (Core 2 T5200 CPU) it takes only around 100 milliseconds to fill the whole list. If I declare an initial capacity of 1 million it takes around 60-80 ms, but if I declare 10,000 items it can take around 130-160 ms, so maybe it's better not to declare anything unless you can make a really good guess of the space needed.
About the concern for memory usage: it takes around 8 MB of memory, which I consider totally reasonable, unless you're writing phone software.
import java.util.ArrayList;
import java.util.List;

import org.obliquid.util.StopWatch;

public class ArrayListTest {
    public static void main(String args[]) {
        System.out.println("tot=" + Runtime.getRuntime().totalMemory()
                + " free=" + Runtime.getRuntime().freeMemory());
        StopWatch watch = new StopWatch();
        List<Byte> bytes = new ArrayList<Byte>();
        for (int i = 0; i < 1000000; i++) {
            bytes.add((byte) (i % 256 - 128));
        }
        System.out.println("Elapsed: " + watch.computeElapsedMillisSeconds());
        System.out.println("tot=" + Runtime.getRuntime().totalMemory()
                + " free=" + Runtime.getRuntime().freeMemory());
    }
}
As expected, Vector performs a little worse in the range of 160-200ms.
Example output, without specifying a start size and with ArrayList implementation.
tot=31522816 free=31023176
Elapsed: 74
tot=31522816 free=22537648
I want to test how much memory a class (Foo) takes in Java. In the constructor of Foo I have the following allocations:
int[] a1 = new int[size];
int[] a2 = new int[size];
// ...
int[] a6 = new int[size];
The size begins at 100 and increases up to 4000.
So my code is:
Runtime r = Runtime.getRuntime();
for (int i = 0; i < 10; i++) {
    r.gc();
}
double before = r.totalMemory() - r.freeMemory();
Foo f = new Foo();
double after = r.totalMemory() - r.freeMemory();
double result = after - before;
The problem is that until 2000 I get nicely increasing results, but after 2000 I get a number smaller than the result for 2000. I guess the GC is triggered. And sometimes I get the same number, as if it doesn't see the difference. I ran with -Xms2024m -Xmx2024m, which is my PC's full memory, but I get the same behaviour. I also ran with -Xmn2023m -Xmx2024m and got some strange results such as 3.1819152E7.
Please help me with this. Thanks in advance.
All these “I need to know how much memory object A takes” questions are usually a symptom of premature optimization.
If you are optimizing prematurely (and I assume that much) please stop what you’re doing right now and get back to what you really should be doing: completing the application you’re currently working on (another assumption by me).
If you are not optimizing prematurely you probably still need to stop right now and start using a profiler that will tell you which objects actually use memory. Only then can you start cutting down memory requirements for objects or checking for objects you have forgotten to remove from some collection.
Garbage collectors are clever beasts. They don't need to collect everything every time; they can defer shuffling things around. You could read about generational garbage collection.
If you want to know how much memory your class is taking, why introduce uncertainty by asking for garbage collection? Hold on to the successively bigger objects and examine how big your app gets. Look at the increments in size.
List<int[]> myListOfBigObjects = new ArrayList<int[]>();
for (int size = 100; size <= 4000; size += 100) {
    myListOfBigObjects.add(new int[size]); // make an object of the current size
    // now how big are we? (watch the app's footprint grow)
}
Or you could just say "an int is so many bytes and we have n of them", so the arrays account for n times that many bytes; there's some constant overhead per object, but just increasing the array size will surely increase the object by a predictable amount.
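As a back-of-the-envelope sketch of that arithmetic (assuming a 16-byte array header and 4-byte ints, typical for 64-bit HotSpot with compressed oops; exact numbers vary by JVM):

// Rough expected footprint of Foo, which holds six int[size] arrays.
static long expectedFooBytes(int size) {
    long perArray = 16 + 4L * size; // header + payload, ignoring 8-byte alignment padding
    return 6 * perArray;            // e.g. size 4000 -> about 96 KB
}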