In my Code I maintain a data heap whose basic component is a Map.The Map would be somewhat like:
Key -TableName\Field\Attribute1
Value-3
I used to retrieve my value by :
map.get(key)
Now I required a List instead value.The Map would be somewhat like:
Key -TableName\Field\Attribute1
Value-[3,30,300]
Now I need to retrieve my value by :
map.get(key).get(index)
How much would this change affect the performance of my Code?
I don't think there exist single answer to your question, its broad and you can consider all the answers. Basically, you need to understand time complexity of HashMap and ArrayList and that would give you an idea on how your code is going to perform.
The following image (source) gives you an idea of time complexity O(1), O(log n), O(n), ...
Now Hashmap have the following time complexity (see here for more details on how hash map work in java)
get() and put() - usually O(1) but O(n) worst case scenario
Best case scenario, get() and put() have O(1) cost in time complexity. If you have a inefficient hash function then the data is not distributed correctly across buckets hence, you might end with slow get() and put() methods. An efficient hash function distributes the data in all buckets in a balanced manner, if there is no balance between the buckets containing entries then you will have slower get() and put(). See here and here for details on how to design your hash function.
Note the HashMap performance improvement in Java 8.
ArrayList on the other hand have the following time complexity:
add() - O(1)
remove() - O(n)
get() - O(1)
contains() - O(n) (traversal)
Basically you perform O(1) (O (n) worst case) to get the list and then you are doing another O(1) to get the item from list. So both operations are in constant time provided your hash function is efficient.
As mentioned in other answers, there are a variety of other ways to improve the performance of your code by using array instead of list if possible, and so on.
Although it is nearly impossible to tell the impact on performance of code without seeing any code, performance overhead of storing a list instead of storing a single Integer could be relatively small: in both cases you would end up paying for unboxing the int from Integer and for the has look-up on the key, so the only additional operation is the get(index) on the list.
If the list inside the map is an ArrayList, the operation is fast O(1) lookup
If the list inside the map is a LinkedList, the operation is O(n), where n is the number of elements on the individual list.
It's worth noting that if the list has a small fixed size (say, three or four elements, as shown in your example) you may be better off defining a custom class for the four elements, and make these elements ints. This would save memory and reduce the overhead.
For lists of fixed large size you may want to consider int[] arrays, because they let you avoid boxing for a reduction in memory overhead.
best case for Lookup of hashmap will be O(1) and worst case will depend on hash collision and JDK8 has some nice improvement to handle that scenario, so from map perspective lookup cost will be same no matter if you put single value or list of values associated to key.
cost for look up on list of value by index depends on type of list you are using , if it is array based(i.e ArrayList) then it is constant but it is linked list then cost is O(N).
So it is really choosing why type of list you want to put depending on your performance goal.
Related
Of my knowledge, there are the following implementations:
ArrayList
LinkedList
Vector
Stack
(based on http://tutorials.jenkov.com/java-collections/list.html pls correct if wrong)
ArrayList is a dynamic array implementation, so, as array, get is O(1), LinkedList has O(1) for get from Head, Vector and Stack are based on ArrayList, hence, O(1).
So in EVERY case get(0) on any built-in (cause you can make your own, for a specific purpose on making get(0) TS of O(n!)) implementation of List if O(1)?
Is get(0) on java.util.List always O(1)?
Let us assume that there is a parameter N which stands for the length of the list1.
For the 4 implementations of List that you mentioned, get(0) is indeed an O(1) operation:
ArrayList, Vector and Stack all implement get(i) using array subscripting and that is an O(1) operation.
LinkedList.get(i) involves i link traversals which is O(i). But if i is a constant, that reduces to O(1).
However there are other "built in" implementations of List. Indeed, there are a considerable number of them if you include the various non-public implementations, such as the List classes that implement sublists, unmodifiable lists, and so on. Generalizing from those 4 to "all of them" is not sound2.
But get(0) won't be O(1) for all possible implementations of List.
Consider a simple linked list where the elements are chained in the reverse order. Since get(0) needs to traverse to the end of the list, which is N link traversals: O(N).
Consider a list that is fully populated from the rows in a database query's result set the first time that you attempt to retrieve a list element. The first get call will be at least O(N) because you are fetching N rows. (It could be worse than O(N) if the database query is not O(N).) So the worst case complexity for any call to get is O(N) ... or worse.
Indeed, with a some ingenuity, one could invent a custom list where get(0) has any Big-O complexity that you care to propose.
1 - I am being deliberately vague here. On the one hand, we need to identify a variable N denoting the "problem" size for complexity analysis to make sense. (The length of the list is the obvious choice.) On the other hand, the length of a List is a surprisingly "rubbery" concept when you consider all of the possible ways to implement the interface.
2 - I assume that you are asking this question because you want to write some library code that relies on List.get(0) being O(1). Since you can't prevent someone from using your library with a non-builtin list implementation, your "assume it is builtin" constraint in your question doesn't really help ... even if we could check all possible (past, current or future) builtin List implementations for you.
Ignoring custom implementations, and only looking at built-in implementations, like suggested at the end of the question, you still cannot say that get(0) will be O(1) regardless of list size.
As an example, calling get(0) on a sublist based on a LinkedList will be O(n):
List<Integer> list = new LinkedList<>(Arrays.asList(1,2,3,4,5,6,7,8,9));
List<Integer> subList = list.subList(4, 8);
Integer num = subList.get(0); // <===== O(n), not O(1)
In that code, subList.get(0) internally calls list.get(4), which has O(n) time complexity.
Yes, for all implementations of List you mentioned get(0) is O(1).
I have two classes Foo and Bar.
class Foo
{
Set<Integer> bars; // Foo objects have collection of bars.
Set<Integer> adjacents; // Adjacency list of Foos.
}
class Bar
{
int foo; // ID of foo of which this object belongs to
Ipsum ipsum; // This an arbitrary class. But it must be present
Map<Integer, Float> adjacents; // Adjacency list of Bars
}
Number of Bars are predefined (up to 1000). Hence, I may use an array.
But number of Foos are undefined (at most #ofBars/4).
When you consider addition, deletion and get(), I need the one which is faster and takes less space (because I'm going to use serialization).
Here are my options (as far as I have thought)
Option 1: Don't define a class for Foo. Instead, use List<Set<Integer>> foo; and another map for Map> fooAdjacencies;
Option 2: Use Map<Integer, Set<Integer> foo if I want to get bars of i, I simply write foo.get(i).
Option 3: Dont define classes. Instead, use option 2 and for Bar class:
Map<Integer, Ipsum> bar;
Map<Integer, Map<Integer, Floar>> barAdjacencies;
Which option should I choose in terms of space and time efficiency?
This sounds like it'd be very helpful for you (specifically the Data Structures section): http://bigocheatsheet.com/
You say
I need my structure to be efficient while adding, removing and finding elements. No other behavior.
The problem is that Lists and Maps are usually used in totally different cases. Their names describe their use cases fairly well -- you use a List if you need to list something (probably in some sequential order), while a Map would be used if you need to map an input to an output. You can use a Map as a List by mapping Integers to your elements, but that's overcomplicating things a bit. However, even within List and Map you can have different implementations that differ wildly in asymptotic performance.
With few exceptions, data structures will take O(n) space, which makes sense. If memory serves, anything other than an ArrayList (or other collections backed only by a primitive array) will have a decent amount of space overhead as they use other objects (e.g. Nodes for LinkedLists and Entry objects for Maps) to organize the underlying structure. I wouldn't worry too much about this overhead though unless space really is at a premium.
For best-performance addition, deletion, and search, you want to look at how the data structure is implemented.
LinkedList-style implementation will net you O(1) addition and deletion (and with a good constant factor, too!), but will have a pretty expensive get() with O(n) time, because the list will have to be traversed every time you want to get something. Java's LinkedList implementation, though, removes in O(n) time; while the actual act of deletion is O(1), that's only if you have a reference to the actual node that you're removing. Because you don't, removals in Java's LinkedList are O(n) -- O(n) for searching for the node to remove, and O(1) for removal.
Data structures backed with a plain array will have O(1) get() because it's an array, but takes O(n) to add, and delete, because any addition/deletion other than at the last element requires all other elements to be shuffled (in Java's implementation at least). Searching for something using an object instead of an index is done in O(n) time because you have to iterate over the array to find the object.
The following two structures are usually Maps, and so usually require you to implement equals() (and hashCode() for HashMaps):
Data structures backed by a tree (e.g. TreeMap) will have amortized (I think) O(lg n) add/remove, as a good implementation should be self-balancing, making worst-case addition/deletions only have to go through the height of the tree at most. get() operations are O(lg n). Using a tree requires that your elements be sortable/comparable in some way, which could be a bonus or hinderance, depending on your usage.
Hash-based data structures have amortized (average) O(1) everything, albeit with a slightly higher constant factor due to the overhead of hashing (and following any chains if the hash spread is poor). HashMaps could start sucking if you write a bad hashCode() function, though, so you want to be careful with that, although the implementers of Java's HashMap did do some magic behind the scenes to try to at least partially negate the effect of bad hashCode() implementations.
Hope that rundown helped. If you clear up how your program is structured, I might be able to give a recommendation. Until then, the best I can do is show you the options and let you pick.
I find this problem description a little hard to follow, but I think you're just looking for general collections/data structures advice.
A list (say, an array list) easily allows you to add and iterate over elements. When it is expanded beyond the size of the underlying array, a one-off costly resize operation is executed to add more space; but that is fine because it happens rarely and the amortized time is not bad. Searching for a specific element in a list is slow because you need to traverse it in order; there is no implied ordering in most lists. Deleting elements depends on the underlying list implementation. An array list could be slow in this regard; but I'm guessing that they optimized it just by marking the underlying element as deleted and skipping it during iteration. When using lists you also have to consider where you are adding elements. Linked lists are slower to iterate but can easily add and remove elements at any position. Array lists cannot easily add an element anywhere but the end.
Per your requirements, if you are required to execute a "get" or find on an element, then you need some kind of searching functionality to speed it up. This would make a map better as you can locate elements in log(n) time instead of linear time as when searching an unordered list. Adding and removing elements in a list is also relatively fast, so that's probably your best option.
Most importantly, implement it more than one way and profile it yourself to learn more :) Lists are rarely a good choice when searching is required though.
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.
Depends on your data;
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n) but you can control the order of the entries.
Array if you need to insert anything in between, worst case can be O(n), since you will have to move the data down and make room for the insertion. In Set, you can directly use SortedSet which too has O(n) too but with flexible operations.
I believe Set is more flexible.
The choice greatly depends on what do you want to do with it.
If it is what mentioned in your question:
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
If this is what you need to do, the you need neither of them. There is a size() method in Collection for which you can get the size of it, which mean how many of them there are in the collection.
If what you mean for "collection of object" is not really a collection, and you need to choose a type of collection to store your objects for further processing, then you need to know, for different kind of collections, there are different capabilities and characteristic.
First, I believe to have a fair comparison, you should consider using ArrayList instead Array, for which you don't need to deal with the reallocation.
Then it become the choice of ArrayList vs HashSet, which is quite straight-forward:
Do you need a List or Set? They are for different purpose: Lists provide you indexed access, and iteration is in order of index. While Sets are mainly for you to keep a distinct set of data, and given its nature, you won't have indexed access.
After you made your decision of List or Set to use, then it is a choice of List/Set implementation, normally for Lists, you choose from ArrayList and LinkedList, while for Sets, you choose between HashSet and TreeSet.
All the choice depends on what you would want to do with that collection of data. They performs differently on different action.
For example, an indexed access in ArrayList is O(1), in HashSet (though not meaningful) is O(n), (just for your interest, in LinkedList is O(n), in TreeSet is O(nlogn) )
For adding new element, both ArrayList and HashSet is O(1) operation. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense in HashSet. Both will suffer from reallocation, and both of them need O(n) for the reallocation (HashSet is normally slower in reallocation, because it involve calculation of hash for each element again).
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.
theoretically, and as SCJP6 Study guide says :D
arrays are faster than collections, and as said, most of the collections depend mainly on arrays (Maps are not considered Collection, but they are included in the Collections framework)
if you guarantee that the size of your elements wont change, why get stuck in Objects built on Objects (Collections built on Arrays) while you can use the root objects directly (arrays)
It looks like you will want an HashMap that maps id's to counts. Particularly,
HashMap<Integer,Integer> counts=new HashMap<Integer,Integer>();
counts.put(uniqueID,counts.get(uniqueID)+1);
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique id's associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself AND constant access time.
I have a web application that uses ArrayList's extensively to store and operate on data within itself. However,recently I understood that HashMap's may have been a better choice. Could anyone tell me what exactly is the algorithmic cost(Big O(n)) of adding, accessing and removing an element from both and whether it is wise to go into the code and change them for the sake of efficency?
For ArrayList:
The size, isEmpty, get, set, iterator, and listIterator operations run
in constant time. The add operation runs in amortized constant time,
that is, adding n elements requires O(n) time. All of the other
operations run in linear time (roughly speaking). The constant factor
is low compared to that for the LinkedList implementation.
From the documentation:
http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html
For HashMap:
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
From the documentation:
http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html
The documentation for ArrayList has a discussion of the performance of the performance of these operations. As for HashMap, it will be O(1) for all three. This does assume, however, that your hashCode method is well-implemented, and is also O(1).
Please have a look at http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
O(1) to get/add
O(n) for contains
O(1) for next
This question already has answers here:
Difference between HashMap, LinkedHashMap and TreeMap
(17 answers)
What is the difference between a HashMap and a TreeMap? [duplicate]
(8 answers)
Closed 8 years ago.
When to use hashmaps or treemaps?
I know that I can use TreeMap to iterate over the elements when I need them to be sorted.
But is just that? There is no optimization when I just want to consult the maps, or some optimal specific uses?
TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately.
Unless you need the entries to be sorted, I'd stick with HashMap. Or there's ConcurrentHashMap of course. I can't remember the details of the differences between all of them, but HashMap is a perfectly reasonable "default" option :)
For completeness, I should point out that there was a discussion on Stack Overflow a month or so ago about the internals of various maps. See the comments in this question, which I will copy into this answer if bestsss is happy for me to do so.
Hashtables (usually) perform search operations (look up) bounded within the complexity of O(n)<=T(n)<=O(1), with an average case complexity of O(1 + n/k); however, binary search trees, (BST's), perform search operations (lookup) bounded within the complexity of O(n)<=T(n)<=O(log_2(n)), with an average case complexity of O(log_2(n)). The implementation for each (and every) data structure should be known (by you), to understand the advantages, drawbacks, time complexity of operations, and code complexity.
For example, the number of entries in a hashtable often have some fixed number of entries (some part of which may not be filled at all) with lists of collisions. Trees, on the other hand, usually have two pointers (references) per node, but this can be more if the implementation allows more than two child nodes per node, and this allows the tree to grow as nodes are added, but may not allow duplicates. (The default implementation of a Java TreeMap does not allow for duplicates)
There are special cases to consider as well, for example, what if the number of elements in a particular data structure increases without bound or approaches the limit of an underlying part of the data structure? What about amortized operations that perform some rebalancing or cleanup operation?
For example, in a hashtable, when the number of elements in the table become sufficiently large, and arbitrary number of collisions can occur. On the other hand, trees usually require come re-balancing procedure after an insertion (or deletion).
So, if you have something like a cache (Ex. the number of elements in bounded, or size is known) then a hashtable is probably your best bet; however, if you have something more like a dictionary (Ex. populated once and looked up many times) then I'd use a tree.
This is only in the general case, however, (no information was given). You have to understand process that happen how they happen to make the right choice in deciding which data structure to use.
When I need a multi-map (ranged lookup) or sorted flattening of a collection, then it can't be a hashtable.
The largest difference between the two is the underlying structure used in the implementation.
HashMaps use an array and a hashing function to store elements. When you try to insert or delete an item in the array the hashing function converts the key into an index on the array where the object is/should be stored (ignoring conflicts). While hashmaps are generally very fast because they don't need to iterate over large amounts of data, they slow down when they're filled because they need to copy all the key/values into a new array.
TreeMaps store a the data in a sorted tree structure. While this means that they'll never have to allocate more space and copy over to it, operations require that part of the data already stored be iterated over. Sometimes changing large amounts of the structure.
Out of the two Hashmaps will generally have better performance when you don't need sorting.
Inserting new elements into a HashMap will, on average, be a good deal faster than inserting elements into a TreeMap. Unless you need your elements sorted, I'd go with the HashMap.
Don't forget there is also LinkedHashMap which is nearly as fast as HashMap for add/contains/remove operations but also maintains the insertion order.