Basically i have an enum with an id.
Would it be faster to create a hashmap that would take the id and give you the enum or iterate over all the enums testing if the id provided is equal to the id of the enum, and if so returning it.
If it matters then there are 5 enums.
HashMaps are specifically designed for retrieving objects via a key, and are fast even with many entries. Usually, a sequential scan of a list or similar collection would be much slower.
But if you have only 5 items, there's no real difference.
Edit: on second thoughts, with so few objects you might be better off with a sequential scan because the extra work in calculating the hash codes might outweigh the advantage. But the difference is so small it's not worth bothering about.
A hashmap has a lookup complexity of O(1), while iterating naturally has O(n). That said, for only 5 enum values the iteration will probably be faster on average, and it will not need an extra data structure when you just iterate over .values() anyway.
If you have an Enum as a key, you should use an EnumMap. This basically wraps an array of values and is faster than using a HashMap.
Related
Case 1 :
One HashMap with 1,00,000 entries
Case 2 :
Two HashMaps with 50,000 entries each.
Which of the above cases will take more execution time and more memory? Or is there a significant difference between the two?
Is it feasible to replace one HashMap of large number of entries with two HashMaps of lesser number of entries?
You're better off with a single hash map.
Look-ups are very efficient in a hash map, and they're designed to have lots of elements in. It'll be slower overall if you have to put something in place to search one map and then look in the other one if you don't find it in the first one.
(There won't be much difference in memory usage either way.)
If it's currently too slow, have a check that your .hashCode() and .equals() are not inefficient.
The memory requirements should be similar for the two cases (since the HashMap storage is an array of Entries whose size is the capacity of the map, so two arrays of 50K would take the same space as one array of 100K).
The get() and put() methods should also have similar performance in both cases, since the calculation of the hash code of the key and the matching index is not affected by the size of the map. The only thing that may affect the performance of these operations is the
average number of keys mapped to the same index, and that should also be independent of the size of the map, since as the size grows, the Entries array grows, so the average number of Entries per index should stay the same.
On the other hand, if you use two smaller maps, you have to add logic to decide which map to use. You can try to search for a key in the first map and if not found, search in the second map. Or you can have a criteria that determines which map is used (for example, String keys starting in A to M will be stored in the first map, and the rest in the second map). Therefore, each operation on the map will have an additional overhead.
Therefore, I would suggest using a single HashMap.
The performance difference between using one or two HashMaps should not matter much, so go for the simpler solution, the single HashMap.
But since you ask this question, I assume that you have performance problems. Often, using a HashMap as a cache is a bad idea, because it keeps a reference to the cached objects alive, thus basically disabling garbage collection. Consider redesigning your cache, for example using SoftReferences (a class in the standard Java API), which allows the garbage collector to collect your cached objects while still being able to reuse the objects as long a they are not garbage collected yet.
as everyone mentioned, you should be using one single hash map. if you having trouble with 100k entries then there is a major problem with your code.
here is some heads up on hash map:
Don't use too complex objects as key (in my opinion using object string as key is as far as you should go for HashMap.
if you try to use some complex object as key make sure your equals and hashCode method are as efficient as possible as too much calculation within these method can reduce the efficiency of hashmap greatly
I have a list of module which has two fields: time and size and I need to find one module according to given time/size.
I have two solutions:
for(Module module: myModuleList)
I create a Map and use Map.get().
and I wonder which would be faster or consume less ressources? because this manipulation would be raise periodically with a more and more large module list.
Iterating through the list is O(myModuleList.size()) whereas using Map.get() is O(1) for a HashMap or O(log(myModuleList.size())) for a TreeMap. So, if you are optimizing for performance, then using a HashMap would be the best solution. In almost all cases, you should reach for a HashMap by default unless you also need to iterate over the elements in key order, in which case it would make sense to use a TreeMap. So, the short answer is that you should use a HashMap.
For a very large number of elements (which is not the case that you are describing), it's possible to improve the space usage of the HashMap (albeit at the expense of speed) by tuning the load factor and initial capacity (though you will not need to do this in ordinary usages). There are also alternative map datastructures that give different tradeoffs in performance and space, depending on your needs.
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.
Depends on your data;
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n) but you can control the order of the entries.
Array if you need to insert anything in between, worst case can be O(n), since you will have to move the data down and make room for the insertion. In Set, you can directly use SortedSet which too has O(n) too but with flexible operations.
I believe Set is more flexible.
The choice greatly depends on what do you want to do with it.
If it is what mentioned in your question:
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
If this is what you need to do, the you need neither of them. There is a size() method in Collection for which you can get the size of it, which mean how many of them there are in the collection.
If what you mean for "collection of object" is not really a collection, and you need to choose a type of collection to store your objects for further processing, then you need to know, for different kind of collections, there are different capabilities and characteristic.
First, I believe to have a fair comparison, you should consider using ArrayList instead Array, for which you don't need to deal with the reallocation.
Then it become the choice of ArrayList vs HashSet, which is quite straight-forward:
Do you need a List or Set? They are for different purpose: Lists provide you indexed access, and iteration is in order of index. While Sets are mainly for you to keep a distinct set of data, and given its nature, you won't have indexed access.
After you made your decision of List or Set to use, then it is a choice of List/Set implementation, normally for Lists, you choose from ArrayList and LinkedList, while for Sets, you choose between HashSet and TreeSet.
All the choice depends on what you would want to do with that collection of data. They performs differently on different action.
For example, an indexed access in ArrayList is O(1), in HashSet (though not meaningful) is O(n), (just for your interest, in LinkedList is O(n), in TreeSet is O(nlogn) )
For adding new element, both ArrayList and HashSet is O(1) operation. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense in HashSet. Both will suffer from reallocation, and both of them need O(n) for the reallocation (HashSet is normally slower in reallocation, because it involve calculation of hash for each element again).
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.
theoretically, and as SCJP6 Study guide says :D
arrays are faster than collections, and as said, most of the collections depend mainly on arrays (Maps are not considered Collection, but they are included in the Collections framework)
if you guarantee that the size of your elements wont change, why get stuck in Objects built on Objects (Collections built on Arrays) while you can use the root objects directly (arrays)
It looks like you will want an HashMap that maps id's to counts. Particularly,
HashMap<Integer,Integer> counts=new HashMap<Integer,Integer>();
counts.put(uniqueID,counts.get(uniqueID)+1);
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique id's associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself AND constant access time.
This question already has answers here:
Difference between HashMap, LinkedHashMap and TreeMap
(17 answers)
What is the difference between a HashMap and a TreeMap? [duplicate]
(8 answers)
Closed 8 years ago.
When to use hashmaps or treemaps?
I know that I can use TreeMap to iterate over the elements when I need them to be sorted.
But is just that? There is no optimization when I just want to consult the maps, or some optimal specific uses?
TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately.
Unless you need the entries to be sorted, I'd stick with HashMap. Or there's ConcurrentHashMap of course. I can't remember the details of the differences between all of them, but HashMap is a perfectly reasonable "default" option :)
For completeness, I should point out that there was a discussion on Stack Overflow a month or so ago about the internals of various maps. See the comments in this question, which I will copy into this answer if bestsss is happy for me to do so.
Hashtables (usually) perform search operations (look up) bounded within the complexity of O(n)<=T(n)<=O(1), with an average case complexity of O(1 + n/k); however, binary search trees, (BST's), perform search operations (lookup) bounded within the complexity of O(n)<=T(n)<=O(log_2(n)), with an average case complexity of O(log_2(n)). The implementation for each (and every) data structure should be known (by you), to understand the advantages, drawbacks, time complexity of operations, and code complexity.
For example, the number of entries in a hashtable often have some fixed number of entries (some part of which may not be filled at all) with lists of collisions. Trees, on the other hand, usually have two pointers (references) per node, but this can be more if the implementation allows more than two child nodes per node, and this allows the tree to grow as nodes are added, but may not allow duplicates. (The default implementation of a Java TreeMap does not allow for duplicates)
There are special cases to consider as well, for example, what if the number of elements in a particular data structure increases without bound or approaches the limit of an underlying part of the data structure? What about amortized operations that perform some rebalancing or cleanup operation?
For example, in a hashtable, when the number of elements in the table become sufficiently large, and arbitrary number of collisions can occur. On the other hand, trees usually require come re-balancing procedure after an insertion (or deletion).
So, if you have something like a cache (Ex. the number of elements in bounded, or size is known) then a hashtable is probably your best bet; however, if you have something more like a dictionary (Ex. populated once and looked up many times) then I'd use a tree.
This is only in the general case, however, (no information was given). You have to understand process that happen how they happen to make the right choice in deciding which data structure to use.
When I need a multi-map (ranged lookup) or sorted flattening of a collection, then it can't be a hashtable.
The largest difference between the two is the underlying structure used in the implementation.
HashMaps use an array and a hashing function to store elements. When you try to insert or delete an item in the array the hashing function converts the key into an index on the array where the object is/should be stored (ignoring conflicts). While hashmaps are generally very fast because they don't need to iterate over large amounts of data, they slow down when they're filled because they need to copy all the key/values into a new array.
TreeMaps store a the data in a sorted tree structure. While this means that they'll never have to allocate more space and copy over to it, operations require that part of the data already stored be iterated over. Sometimes changing large amounts of the structure.
Out of the two Hashmaps will generally have better performance when you don't need sorting.
Inserting new elements into a HashMap will, on average, be a good deal faster than inserting elements into a TreeMap. Unless you need your elements sorted, I'd go with the HashMap.
Don't forget there is also LinkedHashMap which is nearly as fast as HashMap for add/contains/remove operations but also maintains the insertion order.
Someone knows a nice solution for EnumSet + List
I mean I need to store enum values and I also need to preserve the order , and to be able to access its index of the enum value in the collection in O(1) time.
The closest thing I can come to think of, present in the API is the LinkedHashSet:
From http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashSet.html:
Hash table and linked list implementation of the Set interface, with predictable iteration order.
I doubt it's possible to do what you want. Basically, you want to look up indexes in constant time, even after modifying the order of the list. Unless you allow remove / reorder operations to take O(n) time, I believe you can't get away with lower than O(log n) (which can be achieved by a heap structure).
The only way I can see to satisfy ordering and O(1) access is to duplicate the data in a List and an array of indexes (wrapped in a nice little OrderedEnumSet, of course).