HashMap comparison running time - Java

I am comparing 2 HashMaps, and I am trying to figure out the time complexity of the comparison loop.
The code is as follows:
//map1 is a HashMap and contains m entries
//map2 is a HashMap and contains n entries
List<myObject> myList = new ArrayList<myObject>();
for (String key : map1.keySet()) {
    if (!map2.containsKey(key)) {
        myList.add(map1.get(key));
    }
}
The outer for loop is O(m). I found on some other forum that containsKey() takes O(lg n) time. Can someone please confirm that? I couldn't find it in the Javadocs.
If so, then the total time complexity would be O(m lg n).
Also, any ideas on how to do this comparison in a better way would be helpful.

It depends on your hashCode() implementation and on collisions.
With a good hash function, a map lookup is theoretically O(1), constant time; if there are many collisions, it can degrade to as much as O(n).
So in your case, with a good hash function, the whole loop is O(m).
The Wikipedia article on hash tables gives more background on the concept. You can also look at the HashMap source code.

The Java HashMap implementation resizes its internal table so that it always stays a certain amount larger than the number of elements in the map, and its hashing algorithm is good, so I would assume collisions are minimal and that lookups will be much closer to O(1) than O(n).
What HashMap are you using? The one that comes with Java? Your own?

You're right about the time complexity of the outer loop: O(m). The asymptotic complexity of HashMap.containsKey() is expected O(1), unless you've done something ridiculous in your implementation of myObject.hashCode(). So your method should run in O(m) time. An optimization would be to ensure you're looping over the smaller of the two maps.
Note that it is TreeMap.containsKey() that has O(log n) complexity, not HashMap's... stop looking at those forums :)
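As a sketch (the generic method and class names are mine), the whole comparison can be written as a single expected-O(m) pass, iterating entrySet() so each value is available without a second get():

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class MapDiff {
    // Collect values from `first` whose keys are absent from `second`:
    // m iterations, each doing one expected-O(1) containsKey lookup.
    static <K, V> List<V> valuesOnlyInFirst(Map<K, V> first, Map<K, ?> second) {
        List<V> result = new ArrayList<>();
        for (Map.Entry<K, V> e : first.entrySet()) {
            if (!second.containsKey(e.getKey())) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map1 = Map.of("a", 1, "b", 2, "c", 3);
        Map<String, Integer> map2 = Map.of("b", 20);
        // Values for keys "a" and "c" (Map.of iteration order is unspecified).
        System.out.println(new HashSet<>(valuesOnlyInFirst(map1, map2)));
    }
}
```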

Related

Can we reduce Time complexity O(n) of preparing a Java HashSet from an ArrayList?

There is an ArrayList of 10,000+ items. I am trying to make them unique through a HashSet, which is an O(n) operation. Is there any other algorithm or data structure which can make a collection unique with lower complexity than O(n)?
No, this is literally impossible. O(n) is the minimum complexity to so much as read through the ArrayList, let alone do anything with the elements.
Without going through all the elements once, it is not possible to confirm that your set has all unique values. Hence, O(n) is minimum possible.
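A minimal sketch of the O(n) dedup pass (variable names are mine); a LinkedHashSet additionally preserves first-seen order:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class Dedup {
    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>(List.of(3, 1, 3, 2, 1));
        // One pass over the list; each add() is expected O(1), so O(n) overall.
        Set<Integer> unique = new LinkedHashSet<>(items);
        System.out.println(unique); // prints [3, 1, 2]
    }
}
```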

Running time of insertion into 2 hashtables with iteration and printing

I have a program that does the following:
Iterates through a string, placing words into a HashMap<String, Integer> where the key represents the unique word, and the value represents a running total occurrences (incremented each time the word is found).
I believe up to this point we are O(n) since each of the insertions is constant time.
Then, I iterate through that HashMap and insert its entries into a new HashMap<Integer, List<String>>, appending each String to the List stored under its count. I think that we are still at O(n) because the operations used on HashMaps and Lists are constant time.
Then, I iterate through the HashMap and print the Strings in each List.
Does anything in this program cause me to go above O(n) complexity?
That is O(n), unless your word-parsing algorithm is not linear (but it should be).
You're correct, with a caveat. In a hash table, insertions and lookups take expected O(1) time each, so the expected runtime of your algorithm is O(n). If you have a bad hash function, there's a chance it will take longer than that, usually (for most reasonable hash table implementations) O(n²) in the worst case.
Additionally, as @Paul Draper pointed out, this assumes that computing the hash code for each string takes O(1) time and that comparing the strings in the table takes O(1) time. If you have strings whose lengths aren't bounded from above by some constant, it might take longer to compute the hash codes. In fact, a more accurate analysis would be that the runtime is O(n + L), where L is the total length of all the strings.
Hope this helps!
Beyond the two issues that Paul Draper and templatetypedef point out, there's another potential one. You write that your second map is a HashMap<Integer, List<String>>. This allows a total linear complexity only if the implementation you choose for the list supports (amortized) constant-time appends. That is the case if you use an ArrayList and add entries at the end, or if you choose a LinkedList and add entries at either end.
I think this covers the default choices for most developers, so it's not really an obstacle.
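The two passes described above can be sketched like this (the sample text and names are mine):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordGroups {
    public static void main(String[] args) {
        String text = "the cat saw the dog saw the cat";
        // Pass 1: count occurrences -- one expected-O(1) update per word.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        // Pass 2: invert counts -> words; ArrayList append is amortized O(1),
        // so this pass is linear in the number of distinct words.
        Map<Integer, List<String>> byCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                   .add(e.getKey());
        }
        System.out.println(byCount.get(3)); // prints [the]
    }
}
```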

Set time and speed complexity

I am brushing up algorithms and data structures and have a few questions as well as statements I would like you to check.
ArrayList - O(1) (size, get, set, ...), O(n) - add operation.
LinkedList - all operation O(1) (including add() ), except for retrieving n-th element which is O(n). I assume size() operation runs in O(1) as well, right?
TreeSet - all operations O(lg(N)). size() operation takes O(lg(n)), right?
HashSet - all operations O(1) if proper hash function is applied.
HashMap - all operations O(1), analogous to HashSet.
Any further explanations are highly welcome. Thank you in advance.
ArrayList.add() is amortized O(1). If the operation doesn't require a resize, it's O(1). If it does require a resize, it's O(n), but the size is then increased such that the next resize won't occur for a while.
From the Javadoc:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
The documentation is generally pretty good for Java collections, in terms of performance analysis.
The O(1) for hash algorithms isn't a matter of just applying a "proper" hash function - even with a very good hash function, you could still happen to get hash collisions. The usual complexity is O(1), but of course it can be O(n) if all the hashes happen to collide.
(Additionally, that's counting the cost of hashing as O(1) - in reality, if you're hashing strings for example, each call to hashCode may be O(k) in the length of the string.)
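To make the collision point concrete, here is a sketch (class name mine) of a legal but terrible hashCode() that forces every key into the same bucket; the map stays correct, but lookups degrade from expected O(1) toward O(n) (Java 8+ mitigates this somewhat by treeifying overfull buckets):

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKey {
    final int id;
    CollidingKey(int id) { this.id = id; }

    // Legal but terrible: every key lands in the same bucket.
    @Override public int hashCode() { return 42; }
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new CollidingKey(i), i);
        }
        // Still correct, just slow: every lookup searches one huge bucket.
        System.out.println(map.size() + " " + map.containsKey(new CollidingKey(500)));
        // prints 1000 true
    }
}
```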
Visit the following links; they should help clear up your doubts.
Data structures & their complexity
Java standard data structures Big O notation

Loop Through a set of Objects

I have a map of key-value pairs of huge size, approximately 10^7 entries, and I have to loop through it 15 times a second in order to update its contents.
Is there any class or structure that offers good complexity and reduces the time needed to loop through?
Currently I am using a TreeMap, but its complexity is O(log n) only for contains, put, get and remove; looping through the elements is O(n).
Do you know of any structure, or do you have any idea, that may reduce the complexity below O(n)?
If you have to loop over the entire collection, you will not get better than O(n). If all you ever do is loop the whole collection, you could use a simple ArrayList; but if you also need to access specific entries by key, TreeMap will be fine.
You can't beat the O(n) bound on any sequential (or finitely parallel) computer, if your problem is just to look at all of O(n) values.
If you have a finitely parallel machine and depending on exactly how you're updating the elements, you could achieve speedup. For instance, using CUDA and a GPU or OpenMP/MPI and a cluster/multi-core workstation, you could compute A[i] = A[i]^3 or some such with good speedup. Of course, then there's the question of communication... but this might be something to look at.
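A minimal sketch of that kind of data-parallel, per-element update using Java's parallel streams (the array and the cube operation are mine, purely illustrative); total work stays O(n), but wall-clock time can shrink with the number of cores because each output slot depends on only one input slot:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelUpdate {
    public static void main(String[] args) {
        long[] a = new long[10];
        Arrays.setAll(a, i -> i + 1); // a = 1..10
        // The common fork/join pool splits the index range across cores.
        long[] cubes = IntStream.range(0, a.length)
                .parallel()
                .mapToLong(i -> a[i] * a[i] * a[i])
                .toArray();
        System.out.println(Arrays.toString(cubes));
        // prints [1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
    }
}
```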

Computational complexity of TreeSet operations in Java?

I am trying to clear up some things regarding complexity in some of the operations of TreeSet. On the javadoc it says:
"This implementation provides
guaranteed log(n) time cost for the
basic operations (add, remove and
contains)."
So far so good. My question is what happens on addAll(), removeAll() etc. Here the javadoc for Set says:
"If the specified collection is also a
set, the addAll operation effectively
modifies this set so that its value is
the union of the two sets."
Is it just explaining the logical outcome of the operation or is it giving a hint about the complexity? I mean, if the two sets are represented by e.g. red-black trees it would be better to somehow join the trees than to "add" each element of one to the other.
In any case, is there a way to combine two TreeSets into one with O(logn) complexity?
Thank you in advance. :-)
You could imagine how it would be possible to optimize special cases to O(log n), but the worst case has got to be O(m log n) where m and n are the number of elements in each tree.
Edit:
http://net.pku.edu.cn/~course/cs101/resource/Intro2Algorithm/book6/chap14.htm
Describes a special case algorithm that can join trees in O(log(m + n)) but note the restriction: all members of S1 must be less than all members of S2. This is what I meant that there are special optimizations for special cases.
Looking at the Java source for TreeSet, it looks like if the passed-in collection is a SortedSet, it uses an O(n)-time algorithm. Otherwise it calls super.addAll, which I'm guessing will result in O(n log n).
EDIT - I guess I read the code too fast: TreeSet can only use the O(n) algorithm if its backing map is empty.
According to this blog post:
http://rgrig.blogspot.com/2008/06/java-api-complexity-guarantees.html
it's O(n log n). Because the documentation gives no hints about the complexity, you might want to write your own algorithm if the performance is critical for you.
It is not possible to merge the trees or join the sets as in a disjoint-set data structure, because you don't know whether the elements of the two trees are disjoint. Since neither structure knows anything about the other's contents, it is necessary to check whether each element already exists in the other tree before adding it, or at least to attempt the add and abort it if you find the element along the way.
So it should be O(M log N).
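For completeness, the general-purpose way remains addAll(), which on a non-empty TreeSet degrades to m individual O(log n) inserts; a quick sketch (names mine):

```java
import java.util.List;
import java.util.TreeSet;

public class MergeSets {
    public static void main(String[] args) {
        TreeSet<Integer> s1 = new TreeSet<>(List.of(1, 3, 5));
        TreeSet<Integer> s2 = new TreeSet<>(List.of(2, 3, 6));
        // addAll on a non-empty TreeSet is m add() calls of O(log n) each:
        // O(m log n) total. The linear-time bulk build only applies when
        // constructing a TreeSet directly from a SortedSet.
        TreeSet<Integer> union = new TreeSet<>(s1);
        union.addAll(s2);
        System.out.println(union); // prints [1, 2, 3, 5, 6]
    }
}
```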
