In my project, I get entries of a form from two servers and keep them in a HashMap.
The key is the server name and the value is a 2D ArrayList (ArrayList<ArrayList<Object>>).
In the ArrayList, I keep the values of the fields that belong to the form on that server.
I compare these values across the two servers and print them to an Excel file.
My problem is that when I get a form with 12000 entries and 100 fields, this map uses around 400M of memory. I don't want my program to use this much memory. Can you suggest anything?
I doubt it's the HashMap that is causing you problems, but rather the ArrayLists, since an ArrayList allocates room for 10 entries by default. If you're only storing one or two values at each index, that will be wasteful.
You could try setting the initial capacity to 1 or 2 to see if that helps. A potential downside is that if the capacity is too small, it will cause frequent reallocation. But you will see for yourself whether that causes any significant slowdown.
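A minimal sketch of that suggestion, using the dimensions from the question (the names and the capacity of 2 are made up for illustration):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SizedLists {
        public static void main(String[] args) {
            // Assumed layout from the question: server name -> rows of field values.
            Map<String, List<List<Object>>> formData = new HashMap<>();
            List<List<Object>> rows = new ArrayList<>(12_000); // row count is known up front
            for (int i = 0; i < 12_000; i++) {
                rows.add(new ArrayList<>(2)); // capacity 2 instead of the default 10
            }
            formData.put("server1", rows);
        }
    }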
The HashMap is not at all the problem here. What objects are actually contained in the ArrayList<ArrayList<Object>>?
You really should use VisualVM and do some heap profiling to see what actually takes up your memory. That's much better than the guesswork here, and you may be surprised by the result.
I suppose that much of the memory waste results from using a lot of ArrayLists. They are designed for dynamic use (additions and removals), so they usually have many unused slots. If your matrix is static, consider using a 2D array instead of a list of lists, as sketched below. Otherwise, try to set the capacity of each ArrayList to an estimated value instead of the default.
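A sketch of the 2D-array alternative for the static case, with the dimensions taken from the question:

    public class MatrixDemo {
        public static void main(String[] args) {
            // 12,000 entries x 100 fields: no spare capacity and no
            // per-row ArrayList overhead, unlike a list of lists.
            Object[][] fields = new Object[12_000][100];
            fields[0][0] = "some value";
        }
    }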
The problem is obviously not the HashMap itself, because it has no more than two entries (the keys are your two server names). You just have to handle a large amount of data (2 x 12000 x 100 values, if I get it right, plus the result, which is an Excel file). That simply needs memory. The big objects are the two 2D ArrayLists; the map just holds references to those data structures.
Usually I wouldn't care and would just increase the max heap size to 512M or even 1G.
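For example, when launching the program (the class name here is hypothetical):

    java -Xmx1g FormComparator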
Related
I have to count the number of repeats for different strings in Java. Those strings can be huge, come from several data sources, and a large number of the strings are repeated.
I need to get only the 20 most frequent of those strings for every hour.
I considered counting the occurrences of each string and storing them in a huge HashMap, with a PriorityQueue to keep the top strings by occurrence, but that would also consume a lot of memory. At the start of every hour, the old HashMap would be dropped and a new one created to count the new hour's 20 most frequent strings. This could make the JVM spend a long time garbage collecting that memory.
String#intern could help a little, but the HashMap is also a problem for memory. In the future I also want to store the aggregated data off-heap, but the unknown number of distinct strings makes it hard to estimate the off-heap memory needed and to decide how to store those strings. Is there any advice on avoiding a map off-heap?
I'm also interested in cardinality estimation, but it seems hard to use it to count the number of repetitions of each string.
A HashMap is the answer. It uses less memory than you think, because the map holds references to the unique Strings and uses O(1) space per entry. There's no getting around storing one copy of each distinct string, so the map won't cost much more memory than the (unique) strings themselves. Just accumulate the total occurrences of each string and use that to find the top 20.
If you run out of memory, you'll have to implement the map on disk, e.g. in a relational database or a NoSQL store, or something else. Either way, a map (or map-like structure) is the way to go.
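A minimal sketch of that approach, assuming the strings arrive as an Iterable:

    import java.util.*;

    public class TopStrings {
        public static List<String> top20(Iterable<String> strings) {
            Map<String, Long> counts = new HashMap<>();
            for (String s : strings) {
                counts.merge(s, 1L, Long::sum); // accumulate occurrences
            }
            // Min-heap of size 20: the entry with the smallest count is evicted first.
            Comparator<Map.Entry<String, Long>> byCount =
                Comparator.comparingLong(Map.Entry::getValue);
            PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                heap.offer(e);
                if (heap.size() > 20) {
                    heap.poll();
                }
            }
            List<String> result = new ArrayList<>();
            for (Map.Entry<String, Long> e : heap) {
                result.add(e.getKey()); // note: not sorted by count
            }
            return result;
        }
    }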
I argue that a SortedMultiset from Guava would be easier to use in this case. You can pass it a custom Comparator so that you can easily grab the first 20 entries (the most frequent strings). It uses the same amount of memory as a Map implementation, and it automatically handles the accumulation for you.
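A sketch of that with Guava; note it uses Multisets.copyHighestCountFirst to order entries by frequency, rather than a custom Comparator:

    import com.google.common.collect.HashMultiset;
    import com.google.common.collect.Multiset;
    import com.google.common.collect.Multisets;

    public class TopStringsGuava {
        public static void printTop20(Iterable<String> strings) {
            Multiset<String> counts = HashMultiset.create();
            for (String s : strings) {
                counts.add(s); // the multiset handles the accumulation
            }
            int printed = 0;
            // Entries come back ordered by count, highest first.
            for (Multiset.Entry<String> e : Multisets.copyHighestCountFirst(counts).entrySet()) {
                System.out.println(e.getElement() + ": " + e.getCount());
                if (++printed == 20) break;
            }
        }
    }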
Generally, people say that we have moved from arrays to ArrayList for the following reason:
Arrays are fixed size, whereas ArrayLists are not.
One of the disadvantages of ArrayList is:
When it reaches its capacity, an ArrayList grows to 3/2 of its current size. As a result, memory can be wasted if we do not utilize the space properly. In this scenario, arrays are preferred.
If we use ArrayList.trimToSize(), will that make ArrayList the unanimous choice, eliminating the only advantage (fixed size) arrays have over it?
One short answer would be: trimToSize() doesn't solve everything, because shrinking an array after it has grown is not the same as preventing growth in the first place; the former has the cost of copying plus garbage collection.
The longer answer would be: int[] is low level, ArrayList is high level, which means it's more convenient but gives you less control over the details. Thus in business-oriented code (e.g. manipulating a short list of "Products") I'll prefer ArrayList, so that I can forget about the technicalities and focus on the business. In mathematically-oriented code I'll probably go for int[].
There are additional subtle differences, but I'm not sure how relevant they are to you. E.g. concurrency: if you change the data of an ArrayList from several threads simultaneously, its iterators will deliberately fail fast, because that's the intuitive requirement for most business code. An int[] will allow you to do whatever you want, leaving it up to you to make sure it makes sense. Again, this can all be summarized as "low level"...
If you are developing an extremely memory-critical application, need resizability as well, and can trade off some performance, then trimming the ArrayList is your best bet. That is the only time an ArrayList with trimming will be the unanimous choice.
In other situations, what you are actually doing is:
1. You create an ArrayList. The default capacity of the list is 10.
2. You add an element and apply the trim operation, so both size and capacity are now 1. How does trimToSize() work? It creates a new array with the actual size of the list and copies the old array's data into it. The old array is left for garbage collection.
3. You add another element. Since the list is full, it is reallocated with 50% more space, following a procedure similar to step 2.
4. You call trimToSize() again, and it follows the same procedure as step 2.
5. And so on...
So you see, we incur a lot of performance overhead just to keep the list's capacity and size the same. A fixed size offers nothing advantageous here except saving a few extra slots, which is hardly an issue on modern machines.
In a nutshell, if you want resizability without writing lots of boilerplate code, then ArrayList is the unanimous choice. But if the size never changes and you don't need any dynamic functionality such as removal, then an array is the better choice. A few extra bytes are hardly an issue.
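To make the trade-off concrete, a small sketch where trimToSize() is called once, after population, so the copy cost is paid only once:

    import java.util.ArrayList;

    public class TrimDemo {
        public static void main(String[] args) {
            ArrayList<Integer> list = new ArrayList<>(); // default capacity 10
            for (int i = 0; i < 1_000; i++) {
                list.add(i); // grows by ~50% whenever capacity runs out
            }
            list.trimToSize(); // one final copy: capacity now equals size
        }
    }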
My first question: I want to select 100000 elements from a database; can a List store that many elements?
My second question: I want to fetch all the elements from the database in minimum time. Is a List the best way to store them, or is there another way that can improve performance?
1) Yes, a List can store 100000+ elements. The maximum capacity of a List is limited only by the amount of memory the JVM has available.
2) For performance, it depends on the type of data to be stored and how you access it. When records need to be looked up by a key, a HashMap is commonly used.
I normally use a List for even more elements than that, and a List is a good way. It works really well with Strings, but what about primitive types?
You can store at most Integer.MAX_VALUE elements in a List, I suspect, since an index cannot hold more than that value.
A List can store more than 100000 elements. The list capacity is bound only by the JVM memory capacity or Integer.MAX_VALUE, whichever is less.
However, if you know the number of elements that will be retrieved, a simple array gives far better performance.
The maximum size of a List is limited by the maximum value of a Java integer, because integers are used to index the list and to return its size from the method int size(). The maximum value of an int in Java is Integer.MAX_VALUE, which is 2147483647.
A particular implementation of List could have a lower limit, but for java.util.ArrayList, that is the limit.
Of course, you could run out of memory long before that; it really depends on the memory of your computer and whether you are using a 64-bit or a 32-bit JVM.
For your second question: the time it takes to transfer data from the database is almost always far greater than the time it takes to store the data in memory, so if your only worry is the time it takes to store the data in the list, you should not worry.
If, however, you are thinking about the time it takes to retrieve the data, then it really depends on how you retrieve it from the collection (using a particular key, for example).
In many cases, an implementation of java.util.Map such as java.util.HashMap will have better performance when you are retrieving data by a particular key.
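As an illustration, a sketch that loads rows into a HashMap keyed by id; the table name, columns, and JDBC URL are made up:

    import java.sql.*;
    import java.util.HashMap;
    import java.util.Map;

    public class LoadById {
        public static Map<Integer, String> load(String jdbcUrl) throws SQLException {
            Map<Integer, String> byId = new HashMap<>();
            try (Connection con = DriverManager.getConnection(jdbcUrl);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name FROM items")) {
                while (rs.next()) {
                    byId.put(rs.getInt("id"), rs.getString("name"));
                }
            }
            return byId; // byId.get(someId) is O(1), unlike scanning a List
        }
    }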
can a List store that many elements?
Many implementations of java.util.List do not restrict the number of elements, i.e. the number of elements is only limited by available heap memory.
The most commonly used List implementation, ArrayList, is limited to about 2 billion elements (Integer.MAX_VALUE), because that is the maximum length of a Java array.
Other List implementations, such as the Lists returned by Arrays.asList(), Collections.emptyList(), or Collections.singletonList(), have a fixed size, and can not be added to.
Is a List the best way to store elements, or is there another way which can improve performance?
If all you need is to store the elements for later iteration, an ArrayList is probably the best choice. Compared to the cost of communicating with a database, the overhead of any collection implementation will be insignificant: the database generally has to perform disk I/O, which is far slower than writing to main memory, and fetching the actual data (the objects in the list) will take longer than building the collection itself.
I want to select 100000 elements from a database; can a List store that many elements?
Yes. There is an upper limit on the size of an ArrayList (2^31 - 1), but you are a long way off that. And some other List implementations don't have that limit.
I want to fetch all the elements from the database in minimum time. Is a List the best way to store elements, or is there another way which can improve performance?
Most of the CPU time will be spent performing the query and reading from the resultset rather than appending to the list.
The performance of the collection will depend on the element type (object or primitive) and on whether or not you know how many elements there will be. A bare array will give you the best performance if you know the element count beforehand, and an ArrayList if you don't1. For primitive element types, consider using the "trove" list types instead to avoid the overhead of primitive wrappers.
1 - That is ... unless you are prepared to implement an ArrayList-like expansion algorithm for your array based collection.
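For example, a sketch of the known-count case (the count and values here are stand-ins):

    public class PrimitiveLoad {
        public static void main(String[] args) {
            int expected = 100_000; // e.g. obtained from a SELECT COUNT(*) first
            int[] values = new int[expected]; // no resizing, no Integer boxing
            for (int i = 0; i < expected; i++) {
                values[i] = i; // stand-in for the real database read
            }
        }
    }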
I want to read some XML files and convert them to a graph (no graphics, just a model). But because the files are very large (2.2 GB), my model object, which holds all the information, becomes even larger (4x the size of the file...).
Googling through the net, I tried to find ways to reduce the object size. I tried different collection types but would like to stick with a HashMap (because I need random access). The actual keys and values make up just a small amount of the allocated memory; most of the hash table is empty...
If I'm not totally wrong, a garbage collection doesn't help me free the allocated memory and shrink the HashMap. Is there another way to release unused memory and shrink the HashMap? Or is there a way to do perfect hashing? Or should I just use another collection?
Thanks in advance,
Sebastian
A HashMap is typically just a large array of references filled to a certain percentage of capacity. If only 80% of the map is filled, the remaining 20% of the array cells are unused (i.e., null). The extra overhead is really just those empty (null) cells.
On a 32-bit CPU, each array cell is usually 4 bytes in size (although some JVM implementations may allocate 8 bytes). That's not really that much unused space overall.
Once your map is filled, you can copy it to another HashMap with a more appropriate (smaller) size, giving a larger fill percentage.
Your question seems to imply that there are more allocated but unused objects that you're worried about. But how is that the case?
Addendum
Once a map is filled past its load factor (by default, 75% of capacity), a larger array is allocated, the old array's contents are copied to the new array, and the smaller array is left to be garbage collected. This is obviously an expensive operation, so choosing a reasonably large initial size for the map is key to improving performance.
If you can (over)estimate the number of cells needed, preallocating a map can reduce or even eliminate the resizing operations.
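A sketch of preallocating, assuming an estimate of one million entries (the estimate and value type are made up):

    import java.util.HashMap;
    import java.util.Map;

    public class Preallocate {
        public static void main(String[] args) {
            int expected = 1_000_000; // assumed estimate of entries from the XML
            // Capacity chosen so 'expected' entries fit under the default
            // 0.75 load factor without triggering a resize.
            Map<String, Object> index = new HashMap<>((int) (expected / 0.75f) + 1);
        }
    }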
What you are asking is not so clear; it is not clear whether the memory is taken by the objects that you put inside the HashMap or by the HashMap itself, which shouldn't be the case, since it only holds references.
In any case, take a look at WeakHashMap; maybe it is what you are looking for. It is a HashMap which doesn't guarantee that keys are kept inside it; it should be used as a sort of cache, but from your description I don't really know whether that is your case or not.
If you get nowhere with reducing the memory footprint of your hashmap, you could always put the data in a database. Depending on how the data is accessed, you might still get reasonable performance if you introduce a cache in front of the db.
One thing that might come into play is that you might have substrings referencing old, larger strings, and those substrings then make it impossible for the GC to collect the char arrays that are too big.
This happens with some XML parsers that return attributes/values as substrings of a larger string. (A substring is only a limited view of the larger string.)
Try to put your strings in the map by doing something like this:
map.put(new String(key), new String(value));
Note that the GC then might get more work to do when you are populating the map, and this might not help you if you don't have that many substrings that are referencing larger strings.
If you're really serious about this and you have time to spare, you can make your own implementation of the Map interface based on minimal perfect hashing.
If your keys are Strings, then there apparently is a map available for you here.
I haven't tried it myself but it brags about reduced memory usage.
You might give the Trove collections a shot. They advertise it as a more time and space efficient drop-in replacement for the java.util Collections.
Say I instantiate 100,000 Vectors:
a[0..100k] = new Vector<Integer>();
If I do this:
a[0..100k] = new Vector<Integer>(1);
Will they take less memory? That is, ignoring whether they have anything in them and the overhead of expanding them when there has to be more than one element.
According to the Javadoc, the default capacity for a vector is 10, so I would expect it to take more memory than a vector of capacity 1.
In practice, you should probably use an ArrayList unless you need to work with another API that requires vectors.
When you create a Vector, you either specify the size you want it to have at the start or leave the default. But it should be noted that in any case everything stored in a Vector is just a bunch of references, which take up very little space compared to the objects they actually point at.
So yes, you will save space initially, but only by an amount equal to the difference between the default capacity and the specified one, multiplied by the size of a reference. If you create a really large number of vectors, as in your case, the initial size does matter.
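For illustration, the second form from the question with the loop written out:

    import java.util.Vector;

    public class SmallVectors {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) {
            Vector<Integer>[] a = new Vector[100_000];
            for (int i = 0; i < a.length; i++) {
                a[i] = new Vector<>(1); // capacity 1: nine fewer slots than the default 10
            }
        }
    }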
Well, sort of, yes. Vector internally allocates room for 10 elements by default, which means that, together with byte alignment and other things done by the underlying VM, you'll save a considerable amount of memory initially.
What are you trying to accomplish, though?
Yes, they will. Putting in reasonable "initial sizes" for collections is one of the first things I do when confronted with a need to radically improve memory consumption of my program.
Yes, it will. By default, Vector allocates space for 10 elements.
Vector()
Constructs an empty vector so that its internal data array has size 10 and its standard capacity increment is zero.
Therefore, it reserves memory for 10 object references.
That being said, in real-life situations this is rarely a concern. If you are truly generating 100,000 Vectors, you need to rethink your design.