Based on this post,
Time complexity of TreeMap operations- subMap, headMap, tailMap
subMap() itself is O(1), and O(n) comes from iterating the sub map.
So, why use get(key) then?
We can use subMap(key, true, key, true) instead,
which is O(1) and iterating this sub map is also O(1).
Faster than get(key), which is O(log(n)). Something wrong here...
We can use subMap(key, true, key, true) instead, which is O(1)
This is correct
and iterating this sub map is also O(1).
O(n) comes from the question. The answer says nothing to imply this, which is good, because it's not true.
Time complexity of iterating a sub-map is O(log n + k), where n is the number of elements in the whole map, and k is the number of elements in the sub-map. In other words, it still takes O(log n) to get to the first position when you start iterating. Look up the getFirstEntry() implementation to see how it is done.
This brings the overall complexity of your approach to O(log n), but it is bound to be slower than a simple get, because an intermediate object is created and discarded in the process.
The answer is a bit confusing. Technically it's true that creating the submap is a constant-time operation. But that's only because it does nothing apart from setting the low and high keys, and it still shares the tree structure with the original tree.
As a result, any operation on the tree is postponed until a specific method is invoked. A get() on the submap still searches the original tree and only checks that the key lies within the low and high bounds. Put simply, get() is still O(log n), where n comes from the original map, not from the submap.
The construction of subMap takes O(1) time. However, all retrieval operations take the same O(log n) time as in the original map, because the SubMap just wraps the original object: it performs a range check and then delegates the get() call to the source map.
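To make the point above concrete, here is a minimal sketch comparing a plain get(key) with a lookup through a single-key sub-map view. The class name SubMapLookup and the helper getViaSubMap are made up for illustration; only TreeMap.subMap, firstEntry and get are real API calls.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class SubMapLookup {
    // Hypothetical helper: looks a key up through a one-key sub-map view.
    // Creating the view is cheap, but reading from it still descends the
    // original tree, and an extra view object is allocated along the way.
    static <K, V> V getViaSubMap(TreeMap<K, V> map, K key) {
        NavigableMap<K, V> view = map.subMap(key, true, key, true);
        Map.Entry<K, V> entry = view.firstEntry(); // null when the view is empty
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(map.get("b"));           // direct lookup
        System.out.println(getViaSubMap(map, "b")); // same value through the view
        System.out.println(getViaSubMap(map, "x")); // absent key
    }
}
```

Both routes return the same value; the sub-map route simply does more work to get there.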
I have to choose between two data structures, ArrayList and LinkedList.
I have two operations op_one, op_two.
If I choose ArrayList -
I will end up with
for op_one ------ O(n), and at maximum n re-allocations
for op_two ------ O(1), and at maximum n re-allocations
If I choose LinkedList -
I will end up with
for op_one ------ O(n), and zero re-allocations
for op_two ------ O(n), and zero re-allocations
I will be storing millions of comparable elements. And I will be doing both the operations equally likely. Which one should I choose.
I suggest you time them both in a realistic way and see which is faster. If they are not significantly different, I would use the approach you believe is simplest.
While the order of ArrayList and LinkedList is the same for space, the ArrayList is much smaller.
All the same clarity is usually the most important unless you know you have a performance issue.
(Question first understood as time complexity disregarding space. Request for clarifications made in comments)
I would use ArrayList (over LinkedList), not only because of time complexity, but because it is simpler and it doesn't build a new node for each item.
Note overall complexity will be O(n) in either case: For ArrayList an O(n) + O(1), or prob*O(n) + (1-prob)*O(1) will be in the order of O(n).
Given equal complexity ( O(n) ), you should measure actual execution time, or choose whichever is easier to implement or maintain.
Consider that on modern architectures memory bandwidth is often the bottleneck. Sometimes computing a result is hence faster than storing and reading pre-computed values. In other words, computational complexity should be considered together with memory complexity. I would keep the memory usage small if the computations are not expensive and can work within the cache.
But in the end, you will have to test...
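A rough way to run such a test is sketched below. Since op_one and op_two are not specified in the question, the workload here (random positional reads) is only a hypothetical stand-in, and the class and method names are invented for the example. A serious measurement would use a proper benchmarking harness and realistic data sizes.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

class ListTiming {
    // Stand-in workload (an assumption, not the asker's real operations):
    // read `reads` pseudo-random positions and sum the values found there.
    static long sumRandomReads(List<Integer> list, int reads, long seed) {
        Random random = new Random(seed);
        long sum = 0;
        for (int i = 0; i < reads; i++) {
            sum += list.get(random.nextInt(list.size()));
        }
        return sum;
    }

    // Elapsed nanoseconds for one run; a rough figure, not a rigorous benchmark.
    static long timeNanos(List<Integer> list, int reads) {
        long start = System.nanoTime();
        sumRandomReads(list, reads, 42L);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        System.out.println("ArrayList  ns: " + timeNanos(arrayList, 1_000));
        System.out.println("LinkedList ns: " + timeNanos(linkedList, 1_000));
    }
}
```

Swap in your real operations and element counts before drawing any conclusion from the numbers.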
I am comparing 2 HashMaps, and I am trying to figure out the time complexity of the comparison loop.
The code is as follows:
// map1 is a HashMap and contains m elements and keys
// map2 is a HashMap and contains n elements and keys
List<myObject> myList = new ArrayList<myObject>();
for (String key : map1.keySet()) {
    // keep the values of map1 whose keys are missing from map2
    if (!map2.containsKey(key)) {
        myList.add(map1.get(key));
    }
}
The first for loop will be O(m). I found on some other forum that containsKey() takes lg(n) time. Can someone please confirm that? I couldn't find it in the JavaDocs.
If so, then the total time complexity would be O(m lg n).
Also any ideas on how to do this comparison in a better way would be helpful.
Depends on your hashcode algorithm and collisions.
Using a perfect hashcode, map lookup is theoretically O(1), constant time. If there are collisions, it might be up to O(n).
So in your case, if you have good hash algorithms, it would be O(m).
If you look at the wiki, you can get a better understanding of the concept. You can also look at the Map source code.
The Java HashMap implementation should constantly be resizing the internal data structure to be larger than the number of elements in the map by a certain amount and the hashing algorithm is good so I would assume collisions are minimal and that you will get much closer to O(1) than O(n).
What HashMap are you using? The one that comes with Java? Your own?
You're right about the time complexity of the outer loop: O(m). The asymptotic complexity of HashMap.containsKey() is O(1) unless you've done something ridiculous in your implementation of myObject.hashCode(). So your method should run in O(m) time. An optimization would be to ensure you're looping over the smaller of the two maps.
Note that TreeMap.containsKey() has O(log n) complexity, not HashMap... Stop looking at those forums :)
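For reference, here is one possible way to write the comparison so that it does a single expected O(1) containsKey() per entry of map1. This is only a sketch: the class name MapDifference and the method valuesMissingFrom are hypothetical, and iterating entrySet() instead of keySet() is just a way to avoid the second lookup done by get().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapDifference {
    // Collects the values of map1 whose keys do not appear in map2.
    // With HashMaps and a reasonable hashCode(), this is expected O(m)
    // for a map1 holding m entries.
    static <K, V> List<V> valuesMissingFrom(Map<K, V> map1, Map<K, ?> map2) {
        List<V> result = new ArrayList<>();
        for (Map.Entry<K, V> entry : map1.entrySet()) {
            if (!map2.containsKey(entry.getKey())) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map1 = new HashMap<>();
        map1.put("a", 1);
        map1.put("b", 2);
        Map<String, Integer> map2 = new HashMap<>();
        map2.put("a", 9);
        System.out.println(valuesMissingFrom(map1, map2)); // only the value stored under "b"
    }
}
```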
I found other entries for this question that dealt with specific methods, but nothing comprehensive. I'd like to verify my own understanding of the most often used methods of this data structure:
O(1) - Constant Time:
isEmpty()
add(x)
add(x, i)
set(x, i)
size()
get(i)
remove(i)
O(N) - Linear Time:
indexOf(x)
clear()
remove(x)
remove(i)
Is this correct? Thanks for your help.
The best resource is straight from the official API:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
I've seen some interesting claims on SO re Java hashmaps and their O(1) lookup time. Can someone explain why this is so? Unless these hashmaps are vastly different from any of the hashing algorithms I was brought up on, there must always exist a dataset that contains collisions.
In which case, the lookup would be O(n) rather than O(1).
Can someone explain whether they are O(1) and, if so, how they achieve this?
A particular feature of a HashMap is that unlike, say, balanced trees, its behavior is probabilistic. In these cases it's usually most helpful to talk about complexity in terms of the probability of a worst-case event occurring. For a hash map, that is of course a collision, and its likelihood depends on how full the map happens to be. The probability of a collision is pretty easy to estimate.
p_collision = n / capacity
So a hash map with even a modest number of elements is pretty likely to experience at least one collision. Big O notation allows us to do something more compelling. Observe that for any arbitrary, fixed constant k:
O(n) = O(k * n)
We can use this feature to improve the performance of the hash map. We could instead think about the probability of at most 2 collisions.
p_{collision x 2} = (n / capacity)^2
This is much lower. Since the cost of handling one extra collision is irrelevant to Big O performance, we've found a way to improve performance without actually changing the algorithm! We can generalize this to
p_{collision x k} = (n / capacity)^k
And now we can disregard some arbitrary number of collisions and end up with vanishingly tiny likelihood of more collisions than we are accounting for. You could get the probability to an arbitrarily tiny level by choosing the correct k, all without altering the actual implementation of the algorithm.
We talk about this by saying that the hash map has O(1) access with high probability.
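To get a feel for how quickly that estimate shrinks, the sketch below evaluates (n / capacity)^k for a few values of k. It only reproduces the simplified estimate used in this answer, not a rigorous collision model; the class and method names are made up for the example, and the 12-of-16 fill level is an arbitrary choice.

```java
class CollisionBound {
    // The answer's simplified estimate: (n / capacity)^k.
    static double estimate(int n, int capacity, int k) {
        return Math.pow((double) n / capacity, k);
    }

    public static void main(String[] args) {
        int n = 12;        // elements stored (illustrative)
        int capacity = 16; // number of buckets (illustrative)
        for (int k : new int[] {1, 2, 5, 10, 20}) {
            System.out.println("k=" + k + " -> " + estimate(n, capacity, k));
        }
    }
}
```

With the table three quarters full, the estimate starts at 0.75 for k = 1 and keeps shrinking as k grows.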
You seem to mix up worst-case behaviour with average-case (expected) runtime. The former is indeed O(n) for hash tables in general (i.e. not using a perfect hashing) but this is rarely relevant in practice.
Any dependable hash table implementation, coupled with a half decent hash, has a retrieval performance of O(1) with a very small factor (2, in fact) in the expected case, within a very narrow margin of variance.
In Java, how does HashMap work?
Using hashCode to locate the corresponding bucket [inside buckets container model].
Each bucket is a LinkedList (or a Balanced Red-Black Binary Tree under some conditions starting from Java 8) of items residing in that bucket.
The items are scanned one by one, using equals for comparison.
When adding more items, the HashMap is resized (doubling the size) once a certain load percentage is reached.
So, sometimes it will have to compare against a few items, but generally, it's much closer to O(1) than O(n) / O(log n).
For practical purposes, that's all you should need to know.
Remember that O(1) does not mean that each lookup only examines a single item - it means that the average number of items checked remains constant w.r.t. the number of items in the container. So if it takes on average 4 comparisons to find an item in a container with 100 items, it should also take an average of 4 comparisons to find an item in a container with 10000 items, and for any other number of items (there's always a bit of variance, especially around the points at which the hash table rehashes, and when there's a very small number of items).
So collisions don't prevent the container from having O(1) operations, as long as the average number of keys per bucket remains within a fixed bound.
I know this is an old question, but there's actually a new answer to it.
You're right that a hash map isn't really O(1), strictly speaking, because as the number of elements gets arbitrarily large, eventually you will not be able to search in constant time (and O-notation is defined in terms of numbers that can get arbitrarily large).
But it doesn't follow that the real time complexity is O(n)--because there's no rule that says that the buckets have to be implemented as a linear list.
In fact, Java 8 turns a bucket into a balanced tree (a red-black tree, as used by TreeMap) once it exceeds a threshold, which makes the actual worst-case time O(log n).
O(1+n/k) where k is the number of buckets.
If the implementation sets k = n/alpha, then it is O(1+alpha) = O(1), since alpha is a constant.
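The O(1 + n/k) bound can be checked with a small simulation. The sketch below assumes keys land in buckets uniformly at random (an idealised hash function, not Java's actual one) and counts the average number of comparisons for a successful lookup in a chained table. All names are hypothetical.

```java
import java.util.Random;

class ChainLength {
    // Average comparisons for a successful lookup when n keys are spread
    // uniformly at random over `buckets` chains.
    static double averageComparisons(int n, int buckets, long seed) {
        int[] chainLength = new int[buckets];
        Random random = new Random(seed);
        for (int i = 0; i < n; i++) {
            chainLength[random.nextInt(buckets)]++;
        }
        long total = 0;
        for (int length : chainLength) {
            // Finding the j-th item of a chain costs j comparisons,
            // so a chain of this length contributes 1 + 2 + ... + length.
            total += (long) length * (length + 1) / 2;
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 100_000, 1_000_000}) {
            int scaled = n * 4 / 3; // buckets grow with n, so alpha stays near 0.75
            System.out.println("n=" + n
                    + " scaled buckets: " + averageComparisons(n, scaled, 42L)
                    + " fixed 16 buckets: " + averageComparisons(n, 16, 42L));
        }
    }
}
```

When the bucket count grows with n, the average stays roughly flat; with a fixed bucket count it grows in proportion to n.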
If the number of buckets (call it b) is held constant (the usual case), then lookup is actually O(n).
As n gets large, the number of elements in each bucket averages n/b. If collision resolution is done in one of the usual ways (linked list for example), then lookup is O(n/b) = O(n).
The O notation is about what happens when n gets larger and larger. It can be misleading when applied to certain algorithms, and hash tables are a case in point. We choose the number of buckets based on how many elements we're expecting to deal with. When n is about the same size as b, then lookup is roughly constant-time, but we can't call it O(1) because O is defined in terms of a limit as n → ∞.
Elements inside the HashMap are stored in an array of linked lists (of nodes). Each linked list in the array represents a bucket for the hash value shared by one or more keys.
While adding an entry in the HashMap, the hashcode of the key is used to determine the location of the bucket in the array, something like:
location = (arraylength - 1) & keyhashcode
Here the & represents bitwise AND operator.
For example: 100 & "ABC".hashCode() = 64 (location of the bucket for the key "ABC")
The get operation determines the bucket location for the key in the same way. In the best case each key has a unique hashcode and ends up in its own bucket, so get only spends time computing the bucket location and retrieving the value, which is constant: O(1).
In the worst case, all the keys have the same hashcode and are stored in the same bucket, so get has to traverse the entire list, which leads to O(n).
In Java 8, a linked-list bucket is replaced with a balanced tree (a red-black tree, as used by TreeMap) if its size grows to more than 8, which improves the worst-case search time to O(log n).
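The bucket computation described above can be tried out directly. This sketch reproduces the masking step and the "ABC" example from this answer; the class and method names are made up. The real HashMap also mixes the high bits of the hash code into the low bits before masking, and uses power-of-two array lengths, both of which are left out here for simplicity.

```java
class BucketIndex {
    // The masking step from the answer: (arrayLength - 1) & hash.
    static int indexFor(int hash, int arrayLength) {
        return (arrayLength - 1) & hash;
    }

    public static void main(String[] args) {
        int hash = "ABC".hashCode();
        System.out.println(hash);               // 64578
        System.out.println(100 & hash);         // 64, the example from the text
        System.out.println(indexFor(hash, 16)); // 2, with a 16-slot table
    }
}
```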
We've established that the standard description of hash table lookups being O(1) refers to the average-case expected time, not the strict worst-case performance. For a hash table resolving collisions with chaining (like Java's HashMap) this is technically O(1+α) with a good hash function, where α is the table's load factor. That is still constant as long as the number of objects you're storing is no more than a constant factor larger than the table size.
It's also been explained that strictly speaking it's possible to construct input that requires O(n) lookups for any deterministic hash function. But it's also interesting to consider the worst-case expected time, which is different than average search time. Using chaining this is O(1 + the length of the longest chain), for example Θ(log n / log log n) when α=1.
If you're interested in theoretical ways to achieve constant time expected worst-case lookups, you can read about dynamic perfect hashing which resolves collisions recursively with another hash table!
It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions.
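As an illustration of what a bad hash function looks like, the sketch below uses a hypothetical key class, BadKey, whose hashCode() always returns the same number. The map still gives correct answers, but every entry lands in the same bucket, so lookups can no longer be expected to take constant time.

```java
import java.util.HashMap;
import java.util.Map;

class BadHashDemo {
    // Hypothetical key type: every instance reports the same hash code.
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately terrible: all keys collide
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).id == id;
        }
    }

    // Fills a HashMap with n colliding keys, each mapped to its own id.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(999)));
    }
}
```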
Whether you need to grow the table when you add items or not is not relevant to the question because it is about lookup time.
This basically goes for most hash table implementations in most programming languages, as the algorithm itself doesn't really change.
If there are no collisions present in the table, you only have to do a single look-up, therefore the running time is O(1). If there are collisions present, you have to do more than one look-up, which drives down the performance towards O(n).
It depends on the algorithm you choose to resolve collisions. If your implementation uses separate chaining, then the worst-case scenario is the one where every data element is hashed to the same value (because of a poor choice of hash function, for example). In that case, data lookup is no different from a linear search on a linked list, i.e. O(n). However, the probability of that happening is negligible, and the best and average cases for lookups remain constant, i.e. O(1).
Lookup is strictly O(1) only in the theoretical case where every key has a different hashcode and every hashcode maps to a different bucket. Otherwise, it is still of constant order, i.e. as the hashmap grows, the order of the search cost remains constant.
Academics aside, from a practical perspective, HashMaps should be accepted as having an inconsequential performance impact (unless your profiler tells you otherwise.)
Of course the performance of the hashmap will depend on the quality of the hashCode() function for the given object. However, if the function is implemented such that the possibility of collisions is very low, it will have very good performance (this is not strictly O(1) in every possible case, but it is in most cases).
For example, the default hashCode() implementation in the Oracle JRE is to use a random number (which is stored in the object instance so that it doesn't change - but it also disables biased locking, but that's another discussion), so the chance of collisions is very low.