Null keys in Java collections that are not sorted

Null keys in Java collections that are not sorted - java

Can please someone explain me the following:
it is clear for me why null keys (values) are prohibited in Java sorted collections
But why they are not allowed in HashTable, ConcurrentHashMap and ... Properties
Does it relates somehow to thread-safety?
Is null value allowed/prohibited in CopyOnWriteArrayList? Why?
Thank you

There are 2 questions here. Both can be answered by reading the docs.
1 - Why HashTable, ConcurrentHashMap and Properties cannot receive "null" as a key?
HashTable:
https://docs.oracle.com/javase/8/docs/api/java/util/Hashtable.html
To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.
ConcurrentHashMap:
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
Conversely, because keys and values in the map are never null, null serves as a reliable atomic indicator of the current lack of any result.
The Properties class extends HashTable, so you already got the reason.
2 - Is null value allowed/prohibited in CopyOnWriteArrayList? Why?
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArrayList.html
All elements are permitted, including null.
There is no further explanation on why is that, but I believe it's a design decision.

But why they are not allowed in HashTable
From the Doc:
To successfully store and retrieve objects from a Hashtable, the
objects used as keys must implement the hashCode method and the equals method.
of a null referenced object is impossible to get that information(because is null)
Is null value allowed/prohibited in CopyOnWriteArrayList??
from the source code
public class CopyOnWriteArrayList<E>
A thread-safe variant of java.util.ArrayList in which all mutative
operations (add, set, and so on) are implemented by making a fresh
copy of the underlying array. ...
....
.....
Element-changing operations on iterators themselves (remove,
set, and add) are not supported. These methods throw
UnsupportedOperationException.
All elements are permitted, including null.
some times it helps reading the doc:

Related

Why does Hashmap allow a null key?

Actually I read lots of posts for this question, but did not get the exact reason/answer for "Why does Hashmap allow a null key?". Please can anyone give me the exact answer with an example?

One interpretation of your question:
why hashmap allowed [only] one null key?
Ask yourself: if HashMap allowed more than one null key, how would the map object distinguish between them?
Hint: there is only one null value.
Alternative interpretation of your question
why hashmap allowed [a] null key?
Because it is useful in some circumstances, and because there is no real semantic need to disallow it1, 2.
By contrast, with TreeMap null keys are disallowed because supporting them would be difficult given the implications of orderings involving null.
Given that the specified semantics for Comparable is to throw NPE.
Comparator is allowed to order null, but it is not required to. And many common implementations don't.
So if null was allowed with TreeMap, then the behavior of a map could be different depending on whether a Comparator or a Comparable is used. Messy.
1 - At least, that was the view when they specified HashMap in Java 1.2 back in 1998. Some of the designers may have changed their minds since then, but since the behavior is clearly specified it cannot be changed without messing up compatibility. It won't happen ...
2 - Support for null keys requires some special case code in HashMap which at least adds complexity to the implementation. It is not clear if it is a performance overhead for HashMap, since there would still need to be an implicit test for a null keys even if null keys were not allowed. This is most likely down in the noise.

Java engineers must have realized that having a null key and values has its uses like using them for default cases. So, they provided HashMap class with collection framework in Java 5 with capability of storing null key and values.
The put method to insert key value pair in HashMap checks for null key and stores it at the first location of the internal table array. It isn’t afraid of the null values and does not throw NullPointerException like Hashtable does.
Now, there can be only one null key as keys have to be unique although we can have multiple null values associated with different keys.
this link can answer more hashmap null key explained

The javadoc for HashMap.put clearly states:
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
It clearly states what happens when you do a put with a key which was already in the map. The specific case of key == null behaves in the same way: you can't have two different mappings for the null key (just like you can't for any other key).

This is still viewed as a mistake done in early implementations of HashMap that unfortunately people where (are?) relying on (as well as some particular order until java-8). The thing is, having this null key you could always pass some metadata along with your actual data "for free". Notice that all new collections ConcurrentHashMap, Map.of (in java-9), etc - all prohibit nulls to start with.

Why does Map.of not allow null keys and values?

With Java 9, new factory methods have been introduced for the List, Set and Map interfaces. These methods allow quickly instantiating a Map object with values in one line. Now, if we consider:
Map<Integer, String> map1 = new HashMap<Integer, String>(Map.of(1, "value1", 2, "value2", 3, "value3"));
map1.put(4, null);
The above is allowed without any exception while if we do:
Map<Integer, String> map2 = Map.of(1, "value1", 2, "value2", 3, "value3", 4, null );
It throws:
Exception in thread "main" java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:221)
..
I am not able to get, why null is not allowed in second case.
I know HashMap can take null as a key as well as a value but why was that restricted in the case of Map.of?
The same thing happens in the case of java.util.Set.of("v1", "v2", null) and java.util.List.of("v1", "v2", null).

As others pointed out, the Map contract allows rejecting nulls...
[S]ome implementations prohibit null keys and values [...]. Attempting to insert an ineligible key or value throws an unchecked exception, typically NullPointerException or ClassCastException.
... and the collection factories (not just on maps) make use of that.
They disallow null keys and values. Attempts to create them with null keys or values result in NullPointerException.
But why?
Allowing null in collections is by now seen as a design error. This has a variety of reasons. A good one is usability, where the most prominent trouble maker is Map::get. If it returns null, it is unclear whether the key is missing or the value was null. Generally speaking, collections that are guaranteed null free are easier to use. Implementation-wise, they also require less special casing, making the code easier to maintain and more performant.
You can listen to Stuart Marks explain it in this talk but JEP 269 (the one that introduced the factory methods) summarizes it as well:
Null elements, keys, and values will be disallowed. (No recently introduced collections have supported nulls.) In addition, prohibiting nulls offers opportunities for a more compact internal representation, faster access, and fewer special cases.
Since HashMap was already out in the wild when this was slowly discovered, it was too late to change it without breaking existing code but most recent implementations of those interfaces (e.g. ConcurrentHashMap) do not allow null anymore and the new collections for the factory methods are no exception.
(I thought another reason was that explicitly using null values was seen as a likely implementation error but I got that wrong. That was about to duplicate keys, which are illegal as well.)
So disallowing null had some technical reason but it was also done to improve the robustness of the code using the created collections.

Exactly - a HashMap is allowed to store null, not the Map returned by the static factory methods. Not all maps are the same.
Generally as far as I know it has a mistake in the first place to allow nulls in the HashMap as keys, newer collections ban that possibility to start with.
Think of the case when you have an Entry in your HashMap that has a certain key and value == null. You do get, it returns null. What does the mean? It has a mapping of null or it is not present?
Same goes for a Key - hashcode from such a null key has to treated specially all the time. Banning nulls to start with - make this easier.

In my opinion non-nullability makes sense for keys, but not for values.
The Map interface becomes a weak contract, you can't trust it's behaviour without looking at the implementation, and that is assuming you have access to see it. Java11's Map.of() does not allow null values while HashMap does, but they both implement the same Map contract - how come?
Having a null value or no value at all could not, and arguably should not be considered as a distinct scenario. When you get the value for a key and you get null, who cares if it was explicitly put there or not? there's simply no value for the supplied key, the map does not contain such key, simple as that. Null is null, there's no nullnull or null^2
Existing libraries make extensive use of map as a means to pass properties to them, many of them optional, this makes Map.of() useless as a construct for such structures.
Kotlin enforces nullability at compile time and allows for maps with null values with no known issues.
The reason behind not allowing null values is probably just due to implementation detail and convenience of the creator.

While HashMap does allow null values, Map.of does not use a HashMap and throws an exception if one is used either as key or value, as documented:
The Map.of() and Map.ofEntries() static factory methods provide a convenient way to create immutable maps. The Map instances created by these methods have the following characteristics:
...
They disallow null keys and values. Attempts to create them with null keys or values result in NullPointerException.

Allowing nulls in maps has been an error. We can see it now, but I guess it wasn't clear when HashMap was introduced. NullpointerException is the most common bug seen in production code.
I would say that the JDK goes in the direction of helping developers fight the NPE plague. Some examples:
Introduction of Optional
Collectors.toMap(keyMapper, valueMapper) doesn't allow neither the keyMapper nor the valueMapper function to return a null value
Stream.findFirst() and Stream.findAny() throw NPE if the value found is null
So, I think that disallowing null in the new JDK9 immutable collections (and maps) goes in the same direction.

The major difference is: when you build your own Map the "option 1" way ... then you are implicitly saying: "I want to have full freedom in what I am doing".
So, when you decide that your map should have a null key or value (maybe for the reasons listed here) then you are free to do so.
But "option 2" is about a convenience thing - probably intended to be used for constants. And the people behind Java simply decided: "when you use these convenience methods, then the resulting map shall be null-free".
Allowing for null values means that
if (map.contains(key))
is not the same as
if (map.get(key) != null)
which might be a problem sometimes. Or more precisely: it is something to remember when dealing with that very map object.
And just an anecdotal hint why this seems to be a reasonable approach: our team implemented similar convenience methods ourselves. And guess what: without knowing anything about plans about future Java standard methods - our methods do the exact same thing: they return an immutable copy of the incoming data; and they throw up when you provide null elements. We are even so strict that when you pass in empty lists/maps/... we complain as well.

The documentation does not say why null is not allowed:
They disallow null keys and values. Attempts to create them with null keys or values result in NullPointerException.
In my opinion, the Map.of() and Map.ofEntries() static factory methods, which are going to produce a constant, are mostly formed by a developer at the compile type. Then, what is the reason to keep a null as the key or the value?
Whereas, the Map#put is usually used by filling a map at runtime where null keys/values could occur.

Not all Maps allow null as key
The reason mentioned in the docs of Map.
Some map implementations have restrictions on the keys and values they may contain. For example, some implementations prohibit null keys and values, and some have restrictions on the types of their keys. Attempting to insert an ineligible key or value throws an unchecked exception, typically NullPointerException or ClassCastException.

how does hashing in java works?

I am trying to figure something out about hashing in java.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.

HashMap is basically implemented internally as an array of Entry[]. If you understand what is linkedList, this Entry type is nothing but a linkedlist implementation. This type actually stores both key and value.
To insert an element into the array, you need index. How do you calculate index? This is where hashing function(hashFunction) comes into picture. Here, you pass an integer to this hashfunction. Now to get this integer, java gives a call to hashCode method of the object which is being added as a key in the map. This concept is called preHashing.
Now once the index is known, you place the element on this index. This is basically called as BUCKET , so if element is inserted at Entry[0], you say that it falls under bucket 0.
Now assume that the hashFunction returns you same index say 0, for another object that you wanted to insert as a key in the map. This is where equals method is called and if even equals returns true, it simple means that there is a hashCollision. So under this case, since Entry is a linkedlist implmentation, on this index itself, on the already available entry at this index, you add one more node(Entry) to this linkedlist. So bottomline, on hashColission, there are more than one elements at a perticular index through linkedlist.
The same case is applied when you are talking about getting a key from map. Based on index returned by hashFunction, if there is only one entry, that entry is returned otherwise on linkedlist of entries, equals method is called.
Hope this helps with the internals of how it works :)

Hash values in Java are provided by objects through the implementation of public int hashCode() which is declared in Object class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.
A note: implementing that method requires also to have public boolean equals(Object o) implemented in a consistent manner.

If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
A HashMap is a form of hash table (and HashTable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMaps key type. Depending on how you want you keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and HashTable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)
A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
hashcode = (..((field1.hashcode * prime) + field2.hashcode) * prime + ...)
where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
When you implement the hashcode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:
1. O1.equals(o2) => o1.hashcode() == o2.hashcode()
2. o2.equals(o2) == o2.equals(o1)
3. The hashcode of an object doesn't change while it is a key in a hash table.
It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
"But where is the hash values stored then? It is not a part of the HashMap, so is there an array assosiated to the HashMap?"
The hash values are typically not stored. Rather they are calculated as required.
In the case of the HashMap class, the hashcode for each key is actually cached in the entry's Node.hash field. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.

This is the most fundamental contract in Java: the .equals()/.hashCode() contract.
The most important part of it here is that two objects which are considered .equals() should return the same .hashCode().
The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following .hashCode() implementation, which, while perfectly legal, is as broken an implementation as can exist:
#Override
public int hashCode() { return 42; } // legal!!
While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
Now: the Set contract stipulates that a Set should not contain duplicate elements; however, the strategy of a Set implementation is left... Well, to the implementation. You will notice, if you look at the javadoc of Map, that its keys can be retrieved by a method called .keySet(). Therefore, Map and Set are very closely related in this regard.
If we take the case of a HashSet (and, ultimately, HashMap), it relies on .equals() and .hashCode(): when adding an item, it first calculates this item's hash code, and according to this hash code, attemps to insert the item into a given bucket. In contrast, a TreeSet (and TreeMap) relies on the natural ordering of elements (see Comparable).
However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken, .hashCode() implementation above), then .equals() is used to determine whether that object is really unique.
Note that, internally, a HashSet is a HashMap...

Hashing is a way to assign a unique code for any variable/object after applying any function/algorithm on its properties.

HashMap stores key-value pair in Map.Entry static nested class implementation.
HashMap works on hashing algorithm and uses hashCode() and equals() method in put and get methods.
When we call put method by passing key-value pair, HashMap uses Key hashCode() with hashing to find out
the index to store the key-value pair. The Entry is stored in the LinkedList, so if there are already
existing entry, it uses equals() method to check if the passed key already exists, if yes it overwrites
the value else it creates a new entry and store this key-value Entry.
When we call get method by passing Key, again it uses the hashCode() to find the index
in the array and then use equals() method to find the correct Entry and return it’s value.
Below image will explain these detail clearly.
The other important things to know about HashMap are capacity, load factor, threshold resizing.
HashMap initial default capacity is 16 and load factor is 0.75. Threshold is capacity multiplied
by load factor and whenever we try to add an entry, if map size is greater than threshold,
HashMap rehashes the contents of map into a new array with a larger capacity.
The capacity is always power of 2, so if you know that you need to store a large number of key-value pairs,
for example in caching data from database, it’s good idea to initialize the HashMap with correct capacity
and load factor.

Why is HashMap faster than HashSet?

I have been reading/researching the reason why HashMapis faster than HashSet.
I am not quite understanding the following statements:
HashMap is faster than HashSet because the values are associated to a unique key.
In HashSet, member object is used for calculating hashcode value which can be same for two objects so equals() method is used to check for equality. If it returns false, that means the two objects are different. In HashMap, the hashcode value is calculated using the key object.
The HashMap hashcode value is calculated using the key object. Here, the member object is used to calculate the hashcode, which can be the same for two objects, so equals() method is used to check for equality. If it returns false, that means the two objects are different.
To conclude my question:
I thought HashMap and HashSet calculate the hashcode in the same way. Why are they different?
Can you provide a concrete example how HashSet and HashMap calculating the hashcode differently?
I know what a "key object" is, but what does it mean by "member object"?
HashMap can do the same things as HashSet, and faster. Why do we need HashSet? Example:
HashMap <Object1, Boolean>= new HashMap<Object1, boolean>();
map.put("obj1",true); => exist
map.get("obj1"); =>if null = not exist, else exist

Performance:
If you look at the source code of HashSet (at least JDK 6, 7 and 8), it uses HashMap internally, so it basically does exactly what you are doing with sample code.
So, if you need a Set implementation, you use HashSet, if you need a Map - HashMap. Code using HashMap instead of HashSet will have exactly the same performance as using HashSet directly.
Choosing the right collection
Map - maps keys to values (associative array) - http://en.wikipedia.org/wiki/Associative_array.
Set - a collection that contains no duplicate elements - http://en.wikipedia.org/wiki/Set_(computer_science).
If the only thing you need your collection for is to check if an element is present in there - use Set. Your code will be cleaner and more understandable to others.
If you need to store some data for your elements - use Map.

None of these answers really explain why HashMap is faster than HashSet. They both have to calculate the hashcode, but think about the nature of the key of a HashMap - it is typically a simple String or even a number. Calculating the hashcode of that is much faster than the default hashcode calculation of an entire object. If the key of the HashMap was the same object as that stored in a HashSet, there would be no real difference in performance. The difference comes in the what sort of object is the HashMap's key.

Why is it useful to have null values or null keys in hash maps?

Hashtable does not allow null keys or values, while HashMap allows null values and 1 null key.
Questions:
Why is this so?
How is it useful to have such a key and values in HashMap?

1. Why is this so?
HashMap is newer than Hashtable and fixes some of its limitations.
I can only guess what the designers were thinking, but here are my guesses:
Hashtable calculates a hash for each key by calling hashCode on each key. This would fail if the key were null, so this could be a reason for disallowing nulls as keys.
The method Hashtable.get returns null if the key is not present. If null were a valid value it would be ambiguous as to whether null meant that the key was present but had value null, or if the key was absent. Ambiguity is bad, so this could be a reason for disallowing nulls as values.
However it turns out that sometimes you do actually want to store nulls so the restrictions were removed in HashMap. The following warning was also included in the documentation for HashMap.get:
A return value of null does not necessarily indicate that the map contains no mapping for the key; it is also possible that the map explicitly maps the key to null.
2. How is it useful to have such a key and values in HashMap?
It is useful to explicitly store null to distinguish between a key that you know exists but doesn't have an associated value and a key that doesn't exist. An example is a list of registered users and their birthdays. If you ask for a specific user's birthday you want to be able to distinguish between that user not existing and the user existing but they haven't entered their birthday.
I can't think of any (good) reason for wanting to store null as a key, and in general I'd advise against using null as a key, but presumably there is at least one person somewhere that needs that keys that can be null.

Well, I think Mark Byers answered perfectly, so just a simple example where null values and keys can be useful:
Imagine you have an expensive function that always returns the same result for the same input. A map is a simple way for caching its results. Maybe sometimes the function will return null, but you need to save it anyway, because the execution is expensive. So, null values must be stored. The same applies to null key if it's an accepted input for the function.

HashTable is very old class , from JDK 1.0.The classes which are in place from JDK 1.0 are called Legacy classes and by default they are synchronized.
To understand this, first of all you need to understand comments written on this class by author.
“This class implements a hashtable, which maps keys to values. Any non-null object can be used as a key or as a value. To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.”
HashTable class is implemented on hashing mechanism, means to store any key-value pair, its required hash code of key object. HashTable calculates a hash for each key by calling hashCode on each key. This would fail If key would be null, it will not be able to give hash for null key, it will throw NullPointerException and similar is the case for value, throwing null if the value is null.
But later on it was realised that null key and value has its own importance, then revised implementations of HashTable were introduced like HashMap which allow one null key and multiple null values.
For HashMap, it allows one null key and there is a null check for keys, if the key is null then that element will be stored in a zero location in Entry array.
We cannot have more than one Null key in HashMap because Keys are unique therefor only one Null key and many Null values are allowed.
USE - Null key we can use for some default value.
Modified and better implementation of HashTable was later introduced as ConcurrentHashMap.

In addition to what answered by Mark Bayers,,
Null is considered as data and it has to be stored as a value for further checking. In many cases null as value can be used to check if key entry is there but no value is assigned to it, so some action can be taken accordingly. This can be done by first checking if key is there, and then getting value.
There is one more case in which just put whatever data is coming(without any check). All the checks are applied to it after getting it.
Whereas null as a key, i think can be used to define some default data. Usually null as a key does not make much sense.

Sir HashMap is also internally uses hashCode() method for inserting an element in HashMap, so I think this will be not the proper reason for "why HashTable allow null key"

It will make the Map interface easier to use / less verbose. null is a legitimate value for reference types. Making the map able to handle null keys and values will eliminate the need for null checking before calling the api. So the map api create less "surprises" during runtime.
For example, it is common that map will be used to categorize a collection of homogeneous objects based a single field. When map is compatible with null, the code will be more concise as it is just a simple loop without any if statement (of course you will need to make sure the collection does not have null elements). Fewer lines of code with no branch / exception handling will be more likely to be logically correct.
On the other hand, not allowing null will not make the map interface better/safer/easier to use. It is not practical to rely on the map to reject nulls - that means exception will be thrown and you have to catch and handle it. Or, to get rid of the exception you will have to ensure nothing is null before calling the map methods - in which case you don't care if map accepts null since you filtered the input anyway.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.