I have a list of module which has two fields: time and size and I need to find one module according to given time/size.
I have two solutions:
for(Module module: myModuleList)
I create a Map and use Map.get().
and I wonder which would be faster or consume less ressources? because this manipulation would be raise periodically with a more and more large module list.
Iterating through the list is O(myModuleList.size()) whereas using Map.get() is O(1) for a HashMap or O(log(myModuleList.size())) for a TreeMap. So, if you are optimizing for performance, then using a HashMap would be the best solution. In almost all cases, you should reach for a HashMap by default unless you also need to iterate over the elements in key order, in which case it would make sense to use a TreeMap. So, the short answer is that you should use a HashMap.
For a very large number of elements (which is not the case that you are describing), it's possible to improve the space usage of the HashMap (albeit at the expense of speed) by tuning the load factor and initial capacity (though you will not need to do this in ordinary usages). There are also alternative map datastructures that give different tradeoffs in performance and space, depending on your needs.
Related
Case 1 :
One HashMap with 1,00,000 entries
Case 2 :
Two HashMaps with 50,000 entries each.
Which of the above cases will take more execution time and more memory? Or is there a significant difference between the two?
Is it feasible to replace one HashMap of large number of entries with two HashMaps of lesser number of entries?
You're better off with a single hash map.
Look-ups are very efficient in a hash map, and they're designed to have lots of elements in. It'll be slower overall if you have to put something in place to search one map and then look in the other one if you don't find it in the first one.
(There won't be much difference in memory usage either way.)
If it's currently too slow, have a check that your .hashCode() and .equals() are not inefficient.
The memory requirements should be similar for the two cases (since the HashMap storage is an array of Entries whose size is the capacity of the map, so two arrays of 50K would take the same space as one array of 100K).
The get() and put() methods should also have similar performance in both cases, since the calculation of the hash code of the key and the matching index is not affected by the size of the map. The only thing that may affect the performance of these operations is the
average number of keys mapped to the same index, and that should also be independent of the size of the map, since as the size grows, the Entries array grows, so the average number of Entries per index should stay the same.
On the other hand, if you use two smaller maps, you have to add logic to decide which map to use. You can try to search for a key in the first map and if not found, search in the second map. Or you can have a criteria that determines which map is used (for example, String keys starting in A to M will be stored in the first map, and the rest in the second map). Therefore, each operation on the map will have an additional overhead.
Therefore, I would suggest using a single HashMap.
The performance difference between using one or two HashMaps should not matter much, so go for the simpler solution, the single HashMap.
But since you ask this question, I assume that you have performance problems. Often, using a HashMap as a cache is a bad idea, because it keeps a reference to the cached objects alive, thus basically disabling garbage collection. Consider redesigning your cache, for example using SoftReferences (a class in the standard Java API), which allows the garbage collector to collect your cached objects while still being able to reuse the objects as long a they are not garbage collected yet.
as everyone mentioned, you should be using one single hash map. if you having trouble with 100k entries then there is a major problem with your code.
here is some heads up on hash map:
Don't use too complex objects as key (in my opinion using object string as key is as far as you should go for HashMap.
if you try to use some complex object as key make sure your equals and hashCode method are as efficient as possible as too much calculation within these method can reduce the efficiency of hashmap greatly
I have two classes Foo and Bar.
class Foo
{
Set<Integer> bars; // Foo objects have collection of bars.
Set<Integer> adjacents; // Adjacency list of Foos.
}
class Bar
{
int foo; // ID of foo of which this object belongs to
Ipsum ipsum; // This an arbitrary class. But it must be present
Map<Integer, Float> adjacents; // Adjacency list of Bars
}
Number of Bars are predefined (up to 1000). Hence, I may use an array.
But number of Foos are undefined (at most #ofBars/4).
When you consider addition, deletion and get(), I need the one which is faster and takes less space (because I'm going to use serialization).
Here are my options (as far as I have thought)
Option 1: Don't define a class for Foo. Instead, use List<Set<Integer>> foo; and another map for Map> fooAdjacencies;
Option 2: Use Map<Integer, Set<Integer> foo if I want to get bars of i, I simply write foo.get(i).
Option 3: Dont define classes. Instead, use option 2 and for Bar class:
Map<Integer, Ipsum> bar;
Map<Integer, Map<Integer, Floar>> barAdjacencies;
Which option should I choose in terms of space and time efficiency?
This sounds like it'd be very helpful for you (specifically the Data Structures section): http://bigocheatsheet.com/
You say
I need my structure to be efficient while adding, removing and finding elements. No other behavior.
The problem is that Lists and Maps are usually used in totally different cases. Their names describe their use cases fairly well -- you use a List if you need to list something (probably in some sequential order), while a Map would be used if you need to map an input to an output. You can use a Map as a List by mapping Integers to your elements, but that's overcomplicating things a bit. However, even within List and Map you can have different implementations that differ wildly in asymptotic performance.
With few exceptions, data structures will take O(n) space, which makes sense. If memory serves, anything other than an ArrayList (or other collections backed only by a primitive array) will have a decent amount of space overhead as they use other objects (e.g. Nodes for LinkedLists and Entry objects for Maps) to organize the underlying structure. I wouldn't worry too much about this overhead though unless space really is at a premium.
For best-performance addition, deletion, and search, you want to look at how the data structure is implemented.
LinkedList-style implementation will net you O(1) addition and deletion (and with a good constant factor, too!), but will have a pretty expensive get() with O(n) time, because the list will have to be traversed every time you want to get something. Java's LinkedList implementation, though, removes in O(n) time; while the actual act of deletion is O(1), that's only if you have a reference to the actual node that you're removing. Because you don't, removals in Java's LinkedList are O(n) -- O(n) for searching for the node to remove, and O(1) for removal.
Data structures backed with a plain array will have O(1) get() because it's an array, but takes O(n) to add, and delete, because any addition/deletion other than at the last element requires all other elements to be shuffled (in Java's implementation at least). Searching for something using an object instead of an index is done in O(n) time because you have to iterate over the array to find the object.
The following two structures are usually Maps, and so usually require you to implement equals() (and hashCode() for HashMaps):
Data structures backed by a tree (e.g. TreeMap) will have amortized (I think) O(lg n) add/remove, as a good implementation should be self-balancing, making worst-case addition/deletions only have to go through the height of the tree at most. get() operations are O(lg n). Using a tree requires that your elements be sortable/comparable in some way, which could be a bonus or hinderance, depending on your usage.
Hash-based data structures have amortized (average) O(1) everything, albeit with a slightly higher constant factor due to the overhead of hashing (and following any chains if the hash spread is poor). HashMaps could start sucking if you write a bad hashCode() function, though, so you want to be careful with that, although the implementers of Java's HashMap did do some magic behind the scenes to try to at least partially negate the effect of bad hashCode() implementations.
Hope that rundown helped. If you clear up how your program is structured, I might be able to give a recommendation. Until then, the best I can do is show you the options and let you pick.
I find this problem description a little hard to follow, but I think you're just looking for general collections/data structures advice.
A list (say, an array list) easily allows you to add and iterate over elements. When it is expanded beyond the size of the underlying array, a one-off costly resize operation is executed to add more space; but that is fine because it happens rarely and the amortized time is not bad. Searching for a specific element in a list is slow because you need to traverse it in order; there is no implied ordering in most lists. Deleting elements depends on the underlying list implementation. An array list could be slow in this regard; but I'm guessing that they optimized it just by marking the underlying element as deleted and skipping it during iteration. When using lists you also have to consider where you are adding elements. Linked lists are slower to iterate but can easily add and remove elements at any position. Array lists cannot easily add an element anywhere but the end.
Per your requirements, if you are required to execute a "get" or find on an element, then you need some kind of searching functionality to speed it up. This would make a map better as you can locate elements in log(n) time instead of linear time as when searching an unordered list. Adding and removing elements in a list is also relatively fast, so that's probably your best option.
Most importantly, implement it more than one way and profile it yourself to learn more :) Lists are rarely a good choice when searching is required though.
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.
Depends on your data;
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n) but you can control the order of the entries.
Array if you need to insert anything in between, worst case can be O(n), since you will have to move the data down and make room for the insertion. In Set, you can directly use SortedSet which too has O(n) too but with flexible operations.
I believe Set is more flexible.
The choice greatly depends on what do you want to do with it.
If it is what mentioned in your question:
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
If this is what you need to do, the you need neither of them. There is a size() method in Collection for which you can get the size of it, which mean how many of them there are in the collection.
If what you mean for "collection of object" is not really a collection, and you need to choose a type of collection to store your objects for further processing, then you need to know, for different kind of collections, there are different capabilities and characteristic.
First, I believe to have a fair comparison, you should consider using ArrayList instead Array, for which you don't need to deal with the reallocation.
Then it become the choice of ArrayList vs HashSet, which is quite straight-forward:
Do you need a List or Set? They are for different purpose: Lists provide you indexed access, and iteration is in order of index. While Sets are mainly for you to keep a distinct set of data, and given its nature, you won't have indexed access.
After you made your decision of List or Set to use, then it is a choice of List/Set implementation, normally for Lists, you choose from ArrayList and LinkedList, while for Sets, you choose between HashSet and TreeSet.
All the choice depends on what you would want to do with that collection of data. They performs differently on different action.
For example, an indexed access in ArrayList is O(1), in HashSet (though not meaningful) is O(n), (just for your interest, in LinkedList is O(n), in TreeSet is O(nlogn) )
For adding new element, both ArrayList and HashSet is O(1) operation. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense in HashSet. Both will suffer from reallocation, and both of them need O(n) for the reallocation (HashSet is normally slower in reallocation, because it involve calculation of hash for each element again).
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.
theoretically, and as SCJP6 Study guide says :D
arrays are faster than collections, and as said, most of the collections depend mainly on arrays (Maps are not considered Collection, but they are included in the Collections framework)
if you guarantee that the size of your elements wont change, why get stuck in Objects built on Objects (Collections built on Arrays) while you can use the root objects directly (arrays)
It looks like you will want an HashMap that maps id's to counts. Particularly,
HashMap<Integer,Integer> counts=new HashMap<Integer,Integer>();
counts.put(uniqueID,counts.get(uniqueID)+1);
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique id's associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself AND constant access time.
Basically i have an enum with an id.
Would it be faster to create a hashmap that would take the id and give you the enum or iterate over all the enums testing if the id provided is equal to the id of the enum, and if so returning it.
If it matters then there are 5 enums.
HashMaps are specifically designed for retrieving objects via a key, and are fast even with many entries. Usually, a sequential scan of a list or similar collection would be much slower.
But if you have only 5 items, there's no real difference.
Edit: on second thoughts, with so few objects you might be better off with a sequential scan because the extra work in calculating the hash codes might outweigh the advantage. But the difference is so small it's not worth bothering about.
A hashmap has a lookup complexity of O(1), while iterating naturally has O(n). That said, for only 5 enum values the iteration will probably be faster on average, and it will not need an extra data structure when you just iterate over .values() anyway.
If you have an Enum as a key, you should use an EnumMap. This basically wraps an array of values and is faster than using a HashMap.
This question already has answers here:
Difference between HashMap, LinkedHashMap and TreeMap
(17 answers)
What is the difference between a HashMap and a TreeMap? [duplicate]
(8 answers)
Closed 8 years ago.
When to use hashmaps or treemaps?
I know that I can use TreeMap to iterate over the elements when I need them to be sorted.
But is just that? There is no optimization when I just want to consult the maps, or some optimal specific uses?
TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately.
Unless you need the entries to be sorted, I'd stick with HashMap. Or there's ConcurrentHashMap of course. I can't remember the details of the differences between all of them, but HashMap is a perfectly reasonable "default" option :)
For completeness, I should point out that there was a discussion on Stack Overflow a month or so ago about the internals of various maps. See the comments in this question, which I will copy into this answer if bestsss is happy for me to do so.
Hashtables (usually) perform search operations (look up) bounded within the complexity of O(n)<=T(n)<=O(1), with an average case complexity of O(1 + n/k); however, binary search trees, (BST's), perform search operations (lookup) bounded within the complexity of O(n)<=T(n)<=O(log_2(n)), with an average case complexity of O(log_2(n)). The implementation for each (and every) data structure should be known (by you), to understand the advantages, drawbacks, time complexity of operations, and code complexity.
For example, the number of entries in a hashtable often have some fixed number of entries (some part of which may not be filled at all) with lists of collisions. Trees, on the other hand, usually have two pointers (references) per node, but this can be more if the implementation allows more than two child nodes per node, and this allows the tree to grow as nodes are added, but may not allow duplicates. (The default implementation of a Java TreeMap does not allow for duplicates)
There are special cases to consider as well, for example, what if the number of elements in a particular data structure increases without bound or approaches the limit of an underlying part of the data structure? What about amortized operations that perform some rebalancing or cleanup operation?
For example, in a hashtable, when the number of elements in the table become sufficiently large, and arbitrary number of collisions can occur. On the other hand, trees usually require come re-balancing procedure after an insertion (or deletion).
So, if you have something like a cache (Ex. the number of elements in bounded, or size is known) then a hashtable is probably your best bet; however, if you have something more like a dictionary (Ex. populated once and looked up many times) then I'd use a tree.
This is only in the general case, however, (no information was given). You have to understand process that happen how they happen to make the right choice in deciding which data structure to use.
When I need a multi-map (ranged lookup) or sorted flattening of a collection, then it can't be a hashtable.
The largest difference between the two is the underlying structure used in the implementation.
HashMaps use an array and a hashing function to store elements. When you try to insert or delete an item in the array the hashing function converts the key into an index on the array where the object is/should be stored (ignoring conflicts). While hashmaps are generally very fast because they don't need to iterate over large amounts of data, they slow down when they're filled because they need to copy all the key/values into a new array.
TreeMaps store a the data in a sorted tree structure. While this means that they'll never have to allocate more space and copy over to it, operations require that part of the data already stored be iterated over. Sometimes changing large amounts of the structure.
Out of the two Hashmaps will generally have better performance when you don't need sorting.
Inserting new elements into a HashMap will, on average, be a good deal faster than inserting elements into a TreeMap. Unless you need your elements sorted, I'd go with the HashMap.
Don't forget there is also LinkedHashMap which is nearly as fast as HashMap for add/contains/remove operations but also maintains the insertion order.