Why was a Pair class needed when HashMap can do the same job?
I see that Pair was introduced in Java 8.
Your choice of which class to use is not just a message to your computer. It's also a message to future developers: the people who will maintain your code, or even you yourself in a few months' time.
By choosing whether to declare a particular variable as either HashMap or Pair, you're telling those future developers something. It's EITHER
This variable references some kind of map, which uses a hash algorithm for fast retrieval.
OR
This variable references a pair of values.
That will help the future developers to understand what your code is doing. Whereas you can certainly use a HashMap with a single entry instead of a Pair, it would be a very strange thing to do, and it would send entirely the wrong message to the future maintainers of your code.
A pair is basically a convenient way of associating a single key with a value. A map also stores key-value pairs, but it stores a collection of them and operates on them as a whole.
Often a key-value pair needs to exist on its own, for instance:
A key-value pair needs to be passed to a method as an argument, or
A method needs to return just two values in the form of a pair.
A Map complicates things when we need just a single key-value pair.
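To make the contrast concrete, here is a small sketch. It assumes JavaFX may not be on the classpath, so it uses java.util.AbstractMap.SimpleImmutableEntry as a stand-in for Pair; the class and method names (PairVersusMap, minMax, minMaxAsMap) are made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

class PairVersusMap {
    // Returns the smallest and largest value as one pair (key = min, value = max).
    static Map.Entry<Integer, Integer> minMax(int[] values) {
        int min = values[0];
        int max = values[0];
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new SimpleImmutableEntry<>(min, max);
    }

    // The same result squeezed into a map: legal, but the declared type
    // suggests "many entries" and the caller has to dig the single entry out.
    static Map<Integer, Integer> minMaxAsMap(int[] values) {
        Map.Entry<Integer, Integer> pair = minMax(values);
        Map<Integer, Integer> map = new HashMap<>();
        map.put(pair.getKey(), pair.getValue());
        return map;
    }

    public static void main(String[] args) {
        int[] data = {4, 9, 1, 7};

        Map.Entry<Integer, Integer> pair = minMax(data);
        System.out.println(pair.getKey() + " .. " + pair.getValue());

        Map.Entry<Integer, Integer> only = minMaxAsMap(data).entrySet().iterator().next();
        System.out.println(only.getKey() + " .. " + only.getValue());
    }
}
```

Both calls carry the same two numbers, but only the first signature tells the reader that exactly one pair comes back.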
Pair<K, V> is part of JavaFX, whereas HashMap is in the core API. You could most likely use Pair to build a hash map implementation (I have not tested this, but I see no reason why not), but a Pair is not the same as a HashMap.
I want to write my own Map in Java. I know how a map works, but I don't really know where to keep the keys and values. Can I keep them in a List, for example? So the keys would be stored in one list and the values in another?
It would be best to first check out some of the concepts behind HashMap, TreeMap and the other existing implementations.
Once you understand those concepts, you're far better prepared for writing your own map when it comes to speed.
In other words: unless you know the concepts of all available implementations, it is very unlikely your wheel-re-invention will be a better solution.
Also be sure to test your implementation very thoroughly, as collections are the backbone and heart of any good application.
Two very very simple (but slow) solutions are these:
1) As suggested above, you can use an ArrayList<Pair> and add your custom getItemByKey() (in Java commonly named 'get') method.
2) You can use two arrays, both keeping the same size, and keeping keys and values matched by their respective indices.
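A minimal sketch of the second idea, using two ArrayLists instead of raw arrays so that growth is handled for us (that substitution is my assumption, as is the class name ParallelListMap). Every operation scans the key list, so it is O(n), which is exactly the "simple but slow" trade-off mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately simple map: keys.get(i) belongs to values.get(i).
class ParallelListMap<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    // Stores the value and returns the previous one, or null if the key is new.
    V put(K key, V value) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            return values.set(i, value);
        }
        keys.add(key);
        values.add(value);
        return null;
    }

    // Returns the value for the key, or null if the key is absent.
    V get(K key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    // Removes the key and its value; both lists shrink at the same index.
    V remove(K key) {
        int i = keys.indexOf(key);
        if (i < 0) {
            return null;
        }
        keys.remove(i);           // remove(int index)
        return values.remove(i);  // remove(int index)
    }

    int size() {
        return keys.size();
    }
}
```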
When choosing the data structure, there is usually (not always, but almost) nothing better than an array of entries (key/value), because the main goal of a map is to map objects to objects, that is, keys to values.
Arrays give fast, constant-time O(1) access by index, but there is one small problem: when the array is full, you have to create a new, larger array and copy the old entries into it.
Note: HashMap works in the same way.
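Here is a rough sketch of that approach. It is not how java.util.HashMap is actually written; the collision handling by chaining, the 3/4 load threshold, the doubling on resize and all names (TinyHashMap, Node, indexFor) are assumptions chosen to keep the example short.

```java
import java.util.Objects;

// Minimal hash table: an array of buckets, each bucket a singly linked chain.
class TinyHashMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table = newTable(4);
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    private static int indexFor(Object key, int capacity) {
        return Math.floorMod(Objects.hashCode(key), capacity);
    }

    V get(K key) {
        for (Node<K, V> n = table[indexFor(key, table.length)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    V put(K key, V value) {
        int i = indexFor(key, table.length);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;   // existing key: replace the value
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]);
        if (++size > table.length * 3 / 4) {
            resize();              // the array is "full enough": grow and copy
        }
        return null;
    }

    // Creates an array twice as large and re-links every entry into it.
    private void resize() {
        Node<K, V>[] old = table;
        table = newTable(old.length * 2);
        for (Node<K, V> head : old) {
            Node<K, V> n = head;
            while (n != null) {
                Node<K, V> next = n.next;
                int i = indexFor(n.key, table.length);
                n.next = table[i];
                table[i] = n;
                n = next;
            }
        }
    }

    int size() {
        return size;
    }

    int capacity() {
        return table.length;
    }
}
```

The index has to be recomputed for every entry on resize because it depends on the array length.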
In one place I have to use a map with many values mapped to a single key, so I was wondering whether there is any significant performance difference between using a HashMap of key to list and a Multimap of key to values in Java.
You can try it but I doubt there is much difference as it does much the same thing.
IMHO the advantage is simpler, clearer code, which is usually more important than performance.
I'd recommend using Google Collections if you want a more convenient implementation of a Multimap. If you don't want to introduce a new dependency, HashMap<Key, Collection<Value>> should do the trick, which is pretty much what the Apache Commons Collections MultiHashMap does.
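For the dependency-free route, a minimal sketch could look like this. The wrapper class SimpleListMultimap and its methods are hypothetical names of mine, not part of any library; it only uses java.util.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A thin multimap built on HashMap<K, List<V>>.
class SimpleListMultimap<K, V> {
    private final Map<K, List<V>> map = new HashMap<>();

    // Creates the list for a key on first use, then appends to it.
    void put(K key, V value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns a read-only view of the values for the key (empty if none).
    List<V> get(K key) {
        return Collections.unmodifiableList(map.getOrDefault(key, Collections.emptyList()));
    }

    // Removes one occurrence of the value; drops the key once its list is empty.
    boolean remove(K key, V value) {
        List<V> list = map.get(key);
        if (list == null || !list.remove(value)) {
            return false;
        }
        if (list.isEmpty()) {
            map.remove(key);
        }
        return true;
    }

    int keyCount() {
        return map.size();
    }
}
```

Swapping ArrayList for HashSet in computeIfAbsent would give set semantics for the values instead of list semantics.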
If it is a mapping Key -> Values, use a Map implementation.
As you will have several Values with the same Key, use http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/collect/HashMultiset.html from Google Collections (now the Guava library, http://code.google.com/p/guava-libraries/ ) for your task.
Hashing provides O(1) lookup, which is fast and does not depend on the number of elements.
Regarding Multimap, you can put the values in a dependent collection (List, Set). Different collection implementations provide different performance.
EDIT: As I commented on Sebastian's answer, you could use Guava, which provides different value collection implementations: HashMultimap (HashMap<KEY, HashSet<VALUE>>), ArrayListMultimap (HashMap<KEY, ArrayList<VALUE>>)...
I was surprised by the fact that Map<?,?> is not a Collection<?>.
I thought it'd make a LOT of sense if it was declared as such:
public interface Map<K,V> extends Collection<Map.Entry<K,V>>
After all, a Map<K,V> is a collection of Map.Entry<K,V>, isn't it?
So is there a good reason why it's not implemented as such?
Thanks to Cletus for a most authoritative answer, but I'm still wondering why, if you can already view a Map<K,V> as a Set<Map.Entry<K,V>> (via entrySet()), it doesn't just extend that interface instead.
If a Map is a Collection, what are the elements? The only reasonable answer is "Key-value pairs"
Exactly, interface Map<K,V> extends Set<Map.Entry<K,V>> would be great!
but this provides a very limited (and not particularly useful) Map abstraction.
But if that's the case then why is entrySet specified by the interface? It must be useful somehow (and I think it's easy to argue for that position!).
You can't ask what value a given key maps to, nor can you delete the entry for a given key without knowing what value it maps to.
I'm not saying that that's all there is to it to Map! It can and should keep all the other methods (except entrySet, which is redundant now)!
From the Java Collections API Design FAQ:
Why doesn't Map extend Collection?
This was by design. We feel that
mappings are not collections and
collections are not mappings. Thus, it
makes little sense for Map to extend
the Collection interface (or vice
versa).
If a Map is a Collection, what are the
elements? The only reasonable answer
is "Key-value pairs", but this
provides a very limited (and not
particularly useful) Map abstraction.
You can't ask what value a given key
maps to, nor can you delete the entry
for a given key without knowing what
value it maps to.
Collection could be made to extend
Map, but this raises the question:
what are the keys? There's no really
satisfactory answer, and forcing one
leads to an unnatural interface.
Maps can be viewed as Collections (of
keys, values, or pairs), and this fact
is reflected in the three "Collection
view operations" on Maps (keySet,
entrySet, and values). While it is, in
principle, possible to view a List as
a Map mapping indices to elements,
this has the nasty property that
deleting an element from the List
changes the Key associated with every
element before the deleted element.
That's why we don't have a map view
operation on Lists.
Update: I think the quote answers most of the questions. It's worth stressing the part about a collection of entries not being a particularly useful abstraction. For example:
Set<Map.Entry<String,String>>
would allow:
set.add(entry("hello", "world"));
set.add(entry("hello", "world 2"));
(assuming an entry() method that creates a Map.Entry instance)
Maps require unique keys so this would violate this. Or if you impose unique keys on a Set of entries, it's not really a Set in the general sense. It's a Set with further restrictions.
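A runnable version of that example, as a sketch: the entry() helper above is hypothetical, so here I use java.util.AbstractMap.SimpleEntry, whose equals() compares both key and value. The class and method names are mine.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class EntrySetVersusMap {
    // A plain Set of entries keeps both, because the entries differ in value.
    static int sizeOfSetAfterTwoAdds() {
        Set<Map.Entry<String, String>> set = new HashSet<>();
        set.add(new SimpleEntry<>("hello", "world"));
        set.add(new SimpleEntry<>("hello", "world 2"));
        return set.size();
    }

    // A Map keeps one entry per key: the second put replaces the first value.
    static int sizeOfMapAfterTwoPuts() {
        Map<String, String> map = new HashMap<>();
        map.put("hello", "world");
        map.put("hello", "world 2");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("set size: " + sizeOfSetAfterTwoAdds());
        System.out.println("map size: " + sizeOfMapAfterTwoPuts());
    }
}
```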
Arguably you could say the equals()/hashCode() relationship for Map.Entry was purely on the key but even that has issues. More importantly, does it really add any value? You may find this abstraction breaks down once you start looking at the corner cases.
It's worth noting that the HashSet is actually implemented as a HashMap, not the other way around. This is purely an implementation detail but is interesting nonetheless.
The main reason for entrySet() to exist is to simplify traversal so you don't have to traverse the keys and then do a lookup of the key. Don't take it as prima facie evidence that a Map should be a Set of entries (imho).
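A small sketch of the two traversal styles (the class and method names are made up for illustration). Both compute the same thing; the second one pays for an extra get() per key.

```java
import java.util.HashMap;
import java.util.Map;

class EntryTraversal {
    // One pass over the entries: key and value arrive together.
    static int weightedSumViaEntrySet(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getKey().length() * e.getValue();
        }
        return sum;
    }

    // Same result, but every key needs a separate lookup.
    static int weightedSumViaKeySet(Map<String, Integer> map) {
        int sum = 0;
        for (String key : map.keySet()) {
            sum += key.length() * map.get(key);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("ab", 3);
        map.put("c", 5);
        System.out.println(weightedSumViaEntrySet(map));
        System.out.println(weightedSumViaKeySet(map));
    }
}
```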
While you've gotten a number of answers that cover your question fairly directly, I think it might be useful to step back a bit and look at the question more generally. That is, not to look specifically at how the Java library happens to be written, but to look at why it's written that way.
The problem here is that inheritance only models one type of commonality. If you pick out two things that both seem "collection-like", you can probably pick out 8 or 10 things they have in common. If you pick out a different pair of "collection-like" things, they'll also have 8 or 10 things in common -- but they won't be the same 8 or 10 things as the first pair.
If you look at a dozen or so different "collection-like" things, virtually every one of them will probably have something like 8 or 10 characteristics in common with at least one other one -- but if you look at what's shared across every one of them, you're left with practically nothing.
This is a situation that inheritance (especially single inheritance) just doesn't model well. There's no clean dividing line between which of those are really collections and which aren't -- but if you want to define a meaningful Collection class, you're stuck with leaving some of them out. If you leave only a few of them out, your Collection class will only be able to provide quite a sparse interface. If you leave more out, you'll be able to give it a richer interface.
Some also take the option of basically saying: "this type of collection supports operation X, but you're not allowed to use it", by deriving from a base class that defines X, where attempting to use the derived class's X fails (e.g., by throwing an exception).
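Java's own collections do this with "optional operations". As a sketch (the helper addIsRejected is a name I made up): an unmodifiable list still has an add() method because List declares it, but calling it throws UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OptionalOperationDemo {
    // Returns true if the list refuses add() by throwing.
    static boolean addIsRejected(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> normal = new ArrayList<>();
        List<String> readOnly = Collections.unmodifiableList(new ArrayList<String>());

        System.out.println(addIsRejected(normal));
        System.out.println(addIsRejected(readOnly));
    }
}
```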
That still leaves one problem: almost regardless of which you leave out and which you put in, you're going to have to draw a hard line between what classes are in and what are out. No matter where you draw that line, you're going to be left with a clear, rather artificial, division between some things that are quite similar.
I guess the why is subjective.
In C#, I think Dictionary extends or at least implements a collection:
public class Dictionary<TKey, TValue> : IDictionary<TKey, TValue>,
ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>,
IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback
In Pharo Smalltalk as well:
Collection subclass: #Set
Set subclass: #Dictionary
But there is an asymmetry in some methods. For instance, collect: takes an association (the equivalent of an entry), while do: takes the values. They provide another method, keysAndValuesDo:, to iterate over the dictionary by entry. add: takes an association, but remove: has been "suppressed":
remove: anObject
self shouldNotImplement
So it's definitely doable, but it leads to some other issues regarding the class hierarchy.
What is better is subjective.
Cletus's answer is good, but I want to add a semantic argument. Combining the two makes no sense: think of the case where you add a key-value pair via the Collection interface and the key already exists. The Map interface allows only one value to be associated with a key. But if you automatically remove the existing entry with the same key, the collection has the same size after the add as before, which is very unexpected for a collection.
Java collections are broken. There is a missing interface, that of Relation. Hence, Map extends Relation extends Set. Relations (also called multi-maps) have unique name-value pairs. Maps (aka "Functions"), have unique names (or keys) which of course map to values. Sequences extend Maps (where each key is an integer > 0). Bags (or multi-sets) extend Maps (where each key is an element and each value is the number of times the element appears in the bag).
This structure would allow intersection, union etc. of a range of "collections". Hence, the hierarchy should be:
       Set
        |
    Relation
        |
       Map
      /   \
   Bag   Sequence
Sun/Oracle/Java ppl - please get it right next time. Thanks.
Map<K,V> should not extend Set<Map.Entry<K,V>> since:
You can't add different Map.Entrys with the same key to the same Map, but
You can add different Map.Entrys with the same key to the same Set<Map.Entry>.
If you look at the respective data structures, you can easily guess why Map is not a Collection. Each element of a Collection is a single value, whereas a Map stores key-value pairs. So the methods in the Collection interface are incompatible with the Map interface. For example, Collection has add(Object o). What would the implementation of that be in a Map? It doesn't make sense to have such a method in Map. Instead, Map has a put(key, value) method.
Same argument goes for addAll(), remove(), and removeAll() methods. So the main reason is the difference in the way data is stored in Map and Collection.
Also, recall that the Collection interface extends Iterable, i.e. every Collection must provide an iterator() method that returns an iterator over the values stored in it. Now what would such a method return for a Map? A key iterator or a value iterator? This does not make sense either.
There are ways to iterate over the keys and values stored in a Map (keySet(), values() and entrySet()), and that is how Map is part of the Collections framework.
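A short sketch of those views (the class name and sample data are made up). Note that the views are backed by the map, so removing through a view also removes from the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapViews {
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new LinkedHashMap<>(); // keeps insertion order
        m.put("one", 1);
        m.put("two", 2);
        m.put("three", 3);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();

        // keySet() is a Set<K>, values() is a Collection<V>: both are iterable.
        List<String> keys = new ArrayList<>(map.keySet());
        List<Integer> values = new ArrayList<>(map.values());
        System.out.println(keys);
        System.out.println(values);

        // The key view is live: removing a key here removes the mapping.
        map.keySet().remove("two");
        System.out.println(map.containsKey("two"));
    }
}
```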
Exactly, interface Map<K,V> extends
Set<Map.Entry<K,V>> would be great!
Actually, if it were implements Map<K,V>, Set<Map.Entry<K,V>>, then I would tend to agree. It even seems natural. But that doesn't work very well, right? Let's say we have HashMap implements Map<K,V>, Set<Map.Entry<K,V>>, LinkedHashMap implements Map<K,V>, Set<Map.Entry<K,V>>, etc. That is all good, but with entrySet() in the Map interface nobody can forget to implement that method, and you can be sure that you can get the entry set of any Map, whereas you can't be sure if you are hoping that the implementor has implemented both interfaces.
The reason I don't want interface Map<K,V> extends Set<Map.Entry<K,V>> is simply that there would be more methods. And after all, they are different things, right? Also, very practically, if I type map. in an IDE, I don't want to see both .remove(Object obj) and .remove(Map.Entry<K,V> entry), because then I can't just hit ctrl+space, r, return and be done with it.
Straight and simple.
Collection is an interface which is expecting only one Object, whereas Map requires Two.
Collection(Object o);
Map<Object,Object>
The hash structures I am aware of are Hashtable, HashSet and HashMap.
Do they all use the bucket structure, i.e. when two hashcodes are exactly the same, one element does not overwrite the other, and instead they are placed in the same bucket associated with that hashcode?
In Sun's current implementation of the Java library, IdentityHashMap and the internal implementation in ThreadLocal use probing structures.
The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive. Therefore you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check matches, then you can check many references without a performance problem.
IIRC, Azul had a fast concurrent quadratic probing hash map.
A linked list is used at each bucket for dealing with hash collisions. Note that the java HashSet is actually implemented by a HashMap underneath (all keys being mapped to the same singleton value across all HashSets) and hence uses the same bucket structure.
If an element is added, its equality is checked against all items in the linked list (via .equals) before it is added at the end. Hence having hash collisions is particularly bad, as this could be an expensive check as the linked list becomes larger.
I believe Java hash structures all use a form of chaining to deal with collisions when performing the hashing, which places the items that have the same hash into a list.
I do not believe that Java uses open addressing for its hash-based data structures (open addressing recomputes hashes based on retry sequences until it finds an open slot in the table).
No -- open addressing is an alternate method of representing hash tables, where objects are stored directly in the table, instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.
When adding an object for which another object already resides at the same index, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated, since if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.
Open addressing is preferable when the objects being stored are small and will rarely be deleted. Open addressing has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.
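To illustrate probing and the "there used to be an object here" marker, here is a minimal fixed-capacity sketch using linear probing. Linear probing, the lack of resizing and every name in it (ProbingSet, TOMBSTONE, start) are my assumptions for the example; it is not taken from any JDK class.

```java
import java.util.Objects;

// Fixed-capacity set using open addressing with linear probing.
class ProbingSet<E> {
    private static final Object TOMBSTONE = new Object(); // marks a removed slot
    private final Object[] slots;
    private int size;

    ProbingSet(int capacity) {
        slots = new Object[capacity];
    }

    // First slot to try for an element.
    private int start(Object e) {
        return Math.floorMod(e.hashCode(), slots.length);
    }

    boolean contains(E e) {
        int i = start(e);
        for (int probes = 0; probes < slots.length; probes++) {
            Object s = slots[i];
            if (s == null) {
                return false;                 // a never-used slot ends the search
            }
            if (s != TOMBSTONE && s.equals(e)) {
                return true;
            }
            i = (i + 1) % slots.length;       // tombstone or other element: keep probing
        }
        return false;
    }

    boolean add(E e) {
        Objects.requireNonNull(e);
        if (contains(e)) {
            return false;
        }
        int i = start(e);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i] == TOMBSTONE) {
                slots[i] = e;                 // free or reusable slot found
                size++;
                return true;
            }
            i = (i + 1) % slots.length;
        }
        throw new IllegalStateException("table is full");
    }

    boolean remove(E e) {
        int i = start(e);
        for (int probes = 0; probes < slots.length; probes++) {
            Object s = slots[i];
            if (s == null) {
                return false;
            }
            if (s != TOMBSTONE && s.equals(e)) {
                slots[i] = TOMBSTONE;         // leave a marker so later probes continue
                size--;
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

If remove() simply set the slot back to null, a later contains() for an element stored further along the same probe sequence would stop too early and report it missing.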
The classes you mentioned (Hashtable, HashSet and HashMap) don't use open addressing, but you could easily create new classes that implement open addressing and provide the same APIs as those classes.
The APIs define the behaviour; the internals of how hash collisions are managed don't affect the guarantees of the API ... the performance impact of bad hash value computation is another story. Let's just hash everything to 42 and see how it behaves.
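Taking the "hash everything to 42" suggestion literally, here is a small sketch (the Key class and build() method are made up for the demo). The map still behaves correctly; only the lookups degrade, because every key lands in the same bucket.

```java
import java.util.HashMap;
import java.util.Map;

class Hash42Demo {
    // A key whose hashCode is deliberately useless; equals() is still correct.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, Integer> build() {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2); // same hash, different key: stored side by side
        map.put(new Key("a"), 3); // equal key: the value is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build();
        System.out.println(map.size());
        System.out.println(map.get(new Key("a")));
        System.out.println(map.get(new Key("b")));
    }
}
```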
Maps and Sets are the interfaces that determine the behavior of a HashSet or HashMap. A HashSet is a Set, and so it behaves like a Set (ie duplicates are not allowed). A HashMap acts like a Map - it will not overwrite a key with a similar hashcode, but it will overwrite a key, if the same exact key is used again. This will be the same regardless of what data structure is backing the Map internally. See the javadoc for Sets and HashMaps for more.
Did you mean to ask something about the specific implementation of one of these structures?
Except the HashSet. Set is by definition unique elements.
This was a mistake. Please see the comments below.