Which data structure uses more memory? - java

Which type of data structure uses more memory?
Hashtable
Hashmap
ArrayList
Could you please give me a brief explanation which one is less prone to memory leakage?

...which one to use for avoiding the memory leakage
The answer is all of them and none of them.
Memory leakage is not related to the data structure, but the way you use them.
The amount of memory used by each one is irrelevant when you aim to avoid "memory leakage".
The best thing you can do is: When you detect an object won't be used any longer in the application, you must remove it from the collection ( not only those you've listed, but any other you might use; List, Map, Set or even arrays ).
That way the garbage collector will be able to release the memory used by that object.
You can take a look at this article "How does garbage collector works" for further explanation on how Java release memory from the objects it uses.
Edit:
There are others data structures in Java which help for the references management such as WeakHashMap, but this may be considered as "advanced topics".

Most likely you should really just use a Collection that suits your current need. In the most common cases, if you need a List, use ArrayList, and if you need a Map, use HashMap. For a tutorial, see e.g. http://java.sun.com/docs/books/tutorial/collections/
When your profiler shows you there is an actual memory leak related to the use of Java Collections, then do something about it.

Your question is woefully underspecified because the concrete data structures you specify are not of comparable structure.
The HashMap/HashTable are comparable since they both function as maps (key -> value lookups).
ArrayLists (and lists in general) do not.
The HashMap/HashTable part is easy to answer as they are largely identical (the major difference is null keys) but the former is not synchronized and the latter is, thus HashMap will generally be faster (assuming the synchronization is not required) Modern JVM's are reasonably fast at uncontended locks though so the difference will be small in a micro benchmark.

Well, I've actually been, recently, in a situation where I had to hold onto large collections of custom objects, where the size of the collections was one of the applications limiting factors. If that's your situation, a few suggestions -
there are a few implementations of
collections using primitives (list
here). Played around a bit with
trove4j, and found a somewhat smaller
memory footprint (as long as you're
dealing with primitives, of course).
If you're dealing with large
collections, you'll probably get more
bang for your buck, in terms of
reducing memory footprint, by
optimizing the objects you're
holding. After all, you've got a lot
more of them, otherwise you wouldn't
need a collection, right?
Some collections are naturally smaller (e.g. LinkedList will be a bit smaller than an ArrayList) but the difference will probably be swamped by the differences in how they're used)
Most of the java collections can be manually sized - you can set your arraylist of 100 elements to be initialized to 100 elements, and you can set your maps to keep less open space at the cost of slower performance. All in the javadocs.
Ultimately the simplest thing to do is to test for yourself.

You're not comparing like with like: HashMap and Hashtable implement the Map interface, and ArrayList implements the List interface.
In a direct comparison between Hashtable and HashMap, HashMap will probably offer better performance because Hashtable is synchronized.
If you give some indication about what you're using the collections for, you might get a more insightful answer.

Hashtables (be it HashMap or HashTable) would take a little more memory than what they use to actually store the information.
Hashing performance comes at a price.

A java.util.Collection stores references to objects.
A java.util.Map stores references to Map.Entry instances, which hold references to keys and objects.
So a java.util.Collection holding N references to objects will require less memory than a java.util.Map holding onto the same N references, because the Map has to point to the keys as well.
Performance for reading and writing differs depending on the implementation of each of these interfaces.
I don't see any java.util.Collection analogous to WeakHashMap. I'd read about that class if you're worried about garbage collection and memory leaks.

As others have pointed out, your question is underspecified.
Still, sometimes an ArrayList based implementation can replace a HashMap based implementation (I would not consider Hashtable at all, it's obsolete). You might need to search the ArrayList linearly but for small Lists that might still be fast enough and the ArrayList will need less memory (for the same data), because it has less overhead.

In most languages it depends on how good you are at picking up your toys after you're done with them.
In Java, it doesn't matter since garbage collection is done automatically and you don't really need to worry about memory leaks.

Related

Lightweight Map implementation Java (little memory overhead)

I am currently writing some code in java meant to be a little framework for a project which revolves around a database with some billions of entries. I want to keep it high-level and the data retriueved from the database shoud be easily usable for statistic inference. I am resolved to use the Map interface in this project.
a core concept is mapping the attributes ("columns in the database") to values ("cells") when handling single datasets (with which I mean a columns in the database) for readable code: I use enum objects (named "Attribute") for the attribute types, which means mapping <Attribute, String>, because the data elements are all String (also not very large, maximum 40 characters or so).
There are 15 columns, so there are 15 enums, and the maps will have only so much entries, or less.
So it appears, I will be having a very large number of Map objects floating around, at times, but with comparatively little payload (15-). My goal is to not make the memory explode due to the implementation memory overhead, compared to the actual payload. (Stretch goal: do the same with cpu usage ;] )
I was not really familiar with all the different implementations of Java Collections to date, and when the problem dawned at me today, I looked into my to-date all-time favorite 'HashMap', and was not happy with how much memory overhead there was declared. I am sure, that additonal to the standard implementations, there are a number of implementations not shipped with Java. Googling my case brought not up much of a result, So I am asking you:
Do you know a good implementation of Map for my use case (low entry count, low value size, enumerable keys, ...)
I hope I made my use case clear, and am anxious for your input =)
Thanks a lot!
Stretch answer goal, absolutely optional and only if you got the time and knowledge:
What other implementations of collections are suitable for:
handling attribute (the String things) vectors, and matrices for inference data (counts/probabilities) (Matrices: here I am really clueless for now, Did really no serious math work with java to date)
math libraries for statistical inference, see above
Use EnumMap, this is the best map implementation if you have enums as key, for both performance and memory usage.
The trick is that this map implementation is the only one that that does not store the keys, it only needs a single array with the values (similar to an ArrayList of the values). There is only a little bit of overhead if there are keys that are not mapped to a value, but in most cases this won't be a problem because enums usually do not have too many instances.
Compared to HashMap, you additionally get a predictable iteration order for free.
Since you start off saying you want to store lots of data, eventually, you'll also want to access/modify that data. There are many high performance libraries out there.
Look at
Trove4j : https://bitbucket.org/robeden/trove/
HPPC: http://labs.carrotsearch.com/hppc.html
FastUtil: http://fastutil.di.unimi.it/
When you find a bottleneck, you can switch to using a lower level API (more efficient)
You'll many more choices if look a bit more: What is the most efficient Java Collections library?
EDIT: if your strings are not unique, you could save significant amounts of memory using String.intern() : Is it good practice to use java.lang.String.intern()?
You can squeeze out a bit of memory with a simple map implementation that uses two array lists (keys and values). For larger maps, that is going to mean insertion and look up speeds become much slower because you have to scan the entire list. However, for small maps it is actually faster this way since you don't have to calculate any hashcodes and only have to look at a small number of entries.
If you need an implementation, take a look at my SimpleMap in my jsonj project: https://github.com/jillesvangurp/jsonj/blob/master/src/main/java/com/github/jsonj/SimpleMap.java

Initializing ArrayList with a new operator in Java?

What is the best practice for initializing an ArrayList in Java?
If I initialize a ArrayList using the new operator then the ArrayList will by default have memory allocated for 10 buckets. Which is a performance hit.
I don't know, maybe I am wrong, but it seems to me that I should create a ArrayList by mentioning the size, if I am sure about the size!
Which is a performance hit.
I wouldn't worry about the "performance hit". Object creation in Java is very fast. The performance difference is unlikely to be measurable by you.
By all means use a size if you know it. If you don't, there's nothing to be done about it anyway.
The kind of thinking that you're doing here is called "premature optimization". Donald Knuth says it's the root of all evil.
A better approach is to make your code work before you make it fast. Optimize with data in hand that tells you where your code is slow. Don't guess - you're likely to be wrong. You'll find that you rarely know where the bottlenecks are.
If you know how many elements you will add, initialize the ArrayList with correct number of objects. If you don't, don't worry about it. The performance difference is probably insignificant.
This is the best advice I can give you:
Don't worry about it. Yes, you have several options to create an ArrayList, but using the new, the default option provided by the library, isn't a BAD choice, otherwise it'd be stupid to make it the default choice for everyone without clarifying what's better.
If it turns out that this is a problem, you'll quickly discover it when you profile. That's the proper place to find problems, when you profile your application for performance/memory problems. When you first write the code, you don't worry about this stuff -- that's premature optimization -- you just worry about writing good, clean code, with good design.
If your design is good, you should be able to fix this problem in no time, with little impact to the rest of the system. Effective Java 2nd Edition, Item 52: Refer to objects by their interfaces. You may even be able to switch to a LinkedList, or any other kind of List out there, if that turns out to be a better data structure. Design for this kinds of flexibility.
Finally, Effective Java 2nd Edition, Item 1: Consider static factory methods instead of constructors. You may even be able to combine this with Item 5: Avoid creating unnecessary objects, if in fact no new instances are actually needed (e.g. Integer.valueOf doesn't always create a new instance).
Related questions
Java Generics Syntax - in-depth about type inferring static factory methods (also in Guava)
On ArrayList micromanagement
Here are some specific tips if you need to micromanage an ArrayList:
You can use ArrayList(int initialCapacity) to set the initial capacity of a list. The list will automatically grow beyond this capacity if needed.
When you're about to populate/add to an ArrayList, and you know what the total number of elements will be, you can use ensureCapacity(int minCapacity) (or the constructor above directly) to reduce the number of intermediate growth. Each add will run in amortized constant time regardless of whether or not you do this (as guaranteed in the API), so this can only reduce the cost by a constant factor.
You can trimToSize() to minimize the storage usage.
This kind of micromanagement is generally unnecessary, but should you decide (justified by conclusive profiling results) that it's worth the hassle, you may choose to do so.
See also
Collections.singletonList - Returns an immutable list containing only the specified object.
If you already know the size of your ArrayList (approximately) you should use the constructor with capacity. But most of the time developers don't really know what will be in the List, so with a capacity of 10 it should be sufficient for most of the cases.
10 buckets is an approximation and isn't a performance hit unless you already know that your ArrayList contains tons of elements, in this case, the need to resize your array all the time will be the performance hit.
You don't need to tell initial size of ArrayList. You can always add/remove any element from it easily.
If this is a performance matter, please keep in mind following things :
Initialization of ArrayList is very fast. Don't worry about it.
Adding/removing element from ArrayList is also very fast. Don't worry about it.
If you find your code runs too slow. The first to blame is your algorithm, no offense. Machine specs, OS and Language indeed participate too. But their participation is considered insignificant compared to your algorithm participation.
If you don't know the size of theArrayList, then you're probably better off using a LinkedList, since the LinkedList.add() operation is constant speed.
However as most people here have said you should not worry about speed before you do some kind of profiling.
You can use this old, but good (in my opinion) article for reference.
http://chaoticjava.com/posts/linkedlist-vs-arraylist/
Since ArrayList is implemented by array underlying, we have to choose a initial size for the array.
If you really care you can call trimToSize() once you have constructed and populated the object. The javadoc states that the capacity will be at least as large as the list size. As previously stated, its unlikely you will find that the memory allocated to an ArrayList is a performance bottlekneck, and if it were, I would recommend you use an array instead.

How much memory does a hashtable use?

Would a hashtable/hashmap use a lot of memory if it only consists of object references and int's?
As for a school project we had to map a database to objects (that's what being done by orm/hibernate nowadays) but eager to find a good way not to store id's in objects in order to save them again we thought of putting all objects we created in a hashmap/hashtable, so we could easily retrieve it's ID. My question is if it would cost me performance using this, in my opinion more elegant way to solve this problem.
Would a hashtable/hashmap use a lot of
memory if it only consists of object
references and int's?
"a lot" depends on how many objects you have. For a few hundreds or a few thousands, you're not going to notice.
But typically the default Java collections are really incredibly inefficient when you're working with primitives (because of the constant boxing/unboxing from "primitive to wrapper" going on, like say "int to Integer") , both from a performances and memory standpoint (the two being related but not identical).
If you have a lot of entries, like hundreds of thousands or millions, I suggest using for example the Trove collections.
In your case, you'd use this:
TIntObjectHashMap<SomeJavaClass>
or this:
TObjectIntHashMap<SomeJavaClass>
In any case, that shall run around circle the default Java collections perf-wise and cpu-wise (and it shall trigger way less GC, etc.).
You're dodging the unnecessary automatic (un)boxing from/to int/Integer, the collections are creating way less garbage, resizing in a much smarter way, etc.
Don't even get me started on the default Java HashMap<Integer,Integer> compared to Trove's TIntIntHashMap or I'll go berzerk ;)
Minimally, you'd need an implementation of the Map.Entry interface with a reference to the key object and a reference to the value object. If either the the key or value are primitive types, such as int, you'll need a wrapper type (e.g. Integer) to wrap it as well. The Map.Entrys are stored in an array and allocated in blocks.
Take a look at this question for more information on how to measure your memory consumption in Java.
It's impossible to answer this without some figures. How many objects are you looking to store? Don't forget you're storing the objects already, so the key/object reference combination should be fairly small.
The only sensible thing to do is to try this and see if it works for you. Don't forget that the JVM will have a default maximum memory allocation and you can increase this (if you need) via -Xmx

Why not always use ArrayLists in Java, instead of plain ol' arrays?

Quick question here: why not ALWAYS use ArrayLists in Java? They apparently have equal access speed as arrays, in addition to extra useful functionality. I understand the limitation in that it cannot hold primitives, but this is easily mitigated by use of wrappers.
Plenty of projects do just use ArrayList or HashMap or whatever to handle all their collection needs. However, let me put one caveat on that. Whenever you are creating classes and using them throughout your code, if possible refer to the interfaces they implement rather than the concrete classes you are using to implement them.
For example, rather than this:
ArrayList insuranceClaims = new ArrayList();
do this:
List insuranceClaims = new ArrayList();
or even:
Collection insuranceClaims = new ArrayList();
If the rest of your code only knows it by the interface it implements (List or Collection) then swapping it out for another implementation becomes much easier down the road if you find you need a different one. I saw this happen just a month ago when I needed to swap out a regular HashMap for an implementation that would return the items to me in the same order I put them in when it came time to iterate over all of them. Fortunately just such a thing was available in the Jakarta Commons Collections and I just swapped out A for B with only a one line code change because both implemented Map.
If you need a collection of primitives, then an array may well be the best tool for the job. Boxing is a comparatively expensive operation. For a collection (not including maps) of primitives that will be used as primitives, I almost always use an array to avoid repeated boxing and unboxing.
I rarely worry about the performance difference between an array and an ArrayList, however. If a List will provide better, cleaner, more maintainable code, then I will always use a List (or Collection or Set, etc, as appropriate, but your question was about ArrayList) unless there is some compelling reason not to. Performance is rarely that compelling reason.
Using Collections almost always results in better code, in part because arrays don't play nice with generics, as Johannes Weiß already pointed out in a comment, but also because of so many other reasons:
Collections have a very rich API and a large variety of implementations that can (in most cases) be trivially swapped in and out for each other
A Collection can be trivially converted to an array, if occasional use of an array version is useful
Many Collections grow more gracefully than an array grows, which can be a performance concern
Collections work very well with generics, arrays fairly badly
As TofuBeer pointed out, array covariance is strange and can act in unexected ways that no object will act in. Collections handle covariance in expected ways.
arrays need to be manually sized to their task, and if an array is not full you need to keep track of that yourself. If an array needs to be resized, you have to do that yourself.
All of this together, I rarely use arrays and only a little more often use an ArrayList. However, I do use Lists very often (or just Collection or Set). My most frequent use of arrays is when the item being stored is a primitive and will be inserted and accessed and used as a primitive. If boxing and unboxing every become so fast that it becomes a trivial consideration, I may revisit this decision, but it is more convenient to work with something, to store it, in the form in which it is always referenced. (That is, 'int' instead of 'Integer'.)
This is a case of premature unoptimization :-). You should never do something because you think it will be better/faster/make you happier.
ArrayList has extra overhead, if you have no need of the extra features of ArrayList then it is wasteful to use an ArrayList.
Also for some of the things you can do with a List there is the Arrays class, which means that the ArrayList provided more functionality than Arrays is less true. Now using those might be slower than using an ArrayList, but it would have to be profiled to be sure.
You should never try to make something faster without being sure that it is slow to begin with... which would imply that you should go ahead and use ArrayList until you find out that they are a problem and slow the program down. However there should be common sense involved too - ArrayList has overhead, the overhead will be small but cumulative. It will not be easy to spot in a profiler, as all it is is a little overhead here, and a little overhead there. So common sense would say, unless you need the features of ArrayList you should not make use of it, unless you want to die by a thousands cuts (performance wise).
For internal code, if you find that you do need to change from arrays to ArrayList the chance is pretty straight forward in most cases ([i] becomes get(i), that will be 99% of the changes).
If you are using the for-each look (for( value : items) { }) then there is no code to change for that as well.
Also, going with what you said:
1) equal access speed, depending on your environment. For instance the Android VM doesn't inline methods (it is just a straight interpreter as far as I know) so the access on that will be much slower. There are other operations on an ArrayList that can cause slowdowns, depends on what you are doing, regardless of the VM (which could be faster with a stright array, again you would have to profile or examine the source to be sure).
2) Wrappers increase the amount of memory being used.
You should not worry about speed/memory before you profile something, on the other hand you shouldn't choose what you know to be a slower option unless you have a good reason to.
Performance should not be your primary concern.
Use List interface where possible, choose concrete implementation based on actual requirements (ArrayList for random access, LinkedList for structural modifications, ...).
You should be concerned about performance.
Use arrays, System.arraycopy, java.util.Arrays and other low-level stuff to squeeze out every last drop of performance.
Well don't always blindly use something that is not right for the job. Always start off using Lists, choose ArrayList as your implementation. This is a more OO approach. If you don't know that you specifically need an array, you'll find that not tying yourself to a particular implementation of List will be much better for you in the long run. Get it working first, optimize later.

Using Small (1-10 Items) Instance-Level Collections in Java

While creating classes in Java I often find myself creating instance-level collections that I know ahead of time will be very small - less than 10 items in the collection. But I don't know the number of items ahead of time so I typically opt for a dynamic collection (ArrayList, Vector, etc).
class Foo
{
ArrayList<Bar> bars = new ArrayList<Bar>(10);
}
A part of me keeps nagging at me that it's wasteful to use complex dynamic collections for something this small in size. Is there a better way of implementing something like this? Or is this the norm?
Note, I'm not hit with any (noticeable) performance penalties or anything like that. This is just me wondering if there isn't a better way to do things.
The ArrayList class in Java has only two data members, a reference to an Object[] array and a size—which you need anyway if you don't use an ArrayList. So the only advantage to not using an ArrayList is saving one object allocation, which is unlikely ever to be a big deal.
If you're creating and disposing of many, many instances of your container class (and by extension your ArrayList instance) every second, you might have a slight problem with garbage collection churn—but that's something to worry about if it ever occurs. Garbage collection is typically the least of your worries.
For the sake of keeping things simple, I think this is pretty much a non-issue. Your implementation is flexible enough that if the requirements change in the future, you aren't forced into a refactoring. Also, adding more logic to your code for a hybrid solution just isn't worth it taking into account your small data set and the high-quality of Java's Collection API.
Google Collections has collections optimized for immutable/small number of elements. See Lists.asList API as an example.
The overhead is very small. It is possible to write a hybrid array list that has fields for the first few items, and then falls back to using an array for longer list.
You can avoid the overhead of the list object entirely by using an array. To go even further hardcore, you can declare the field as Object, and avoid the array altogether for a single item.
If memory really is a problem, you might want to forget about using object instances at the low-level. Instead use a larger data structure at a larger level of granularity.

Categories