Storing more objects in HashMap than range of int [duplicate] - java

This question already has answers here:
Theoretical limit for number of keys (objects) that can be stored in a HashMap?
(4 answers)
Closed 5 years ago.
I was reading about HashMap. hashCode() returns an int value. What if I have a huge HashMap that needs to store more objects than the int range allows? Assume that hashCode() returns a unique value for every object. What will happen in this case?
Is an exception thrown? Or
Will it behave randomly?

You mean storing more than 2 billion entries? Java collections or maps can't do this, their size is always an int value.
There are 3rd party libraries for huge maps.
Are you sure you can store that many objects in memory anyway? One object takes at least 24 bytes (you will be out of the range of Compressed OOPS), so you will be using more than 100 gigabytes of RAM, and that is with very small objects stored in the HashMap.
PS: I don't understand what you mean by "hashCode returning a unique value". Hash codes don't have to be unique. For a hash map with 2+ billion entries, a 32 bit hash code is a bit weak, but it still works in theory.
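To illustrate the point that hash codes don't have to be unique, here is a minimal sketch. BadKey and CollidingKeysDemo are made-up names for this example; the key class deliberately returns the same hash code for every instance, and the map still keeps the keys apart because it falls back on equals().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names here are illustrative only.
class CollidingKeysDemo {

    // A key whose hashCode() is deliberately the same for every instance.
    static final class BadKey {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; } // every key collides

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Inserts n distinct keys that all share one hash code.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1_000);
        System.out.println(map.size());             // 1000
        System.out.println(map.get(new BadKey(7))); // 7
    }
}
```

Lookups get slower when everything lands in one bucket, but nothing is lost and no exception is thrown.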

Related

How does Java Hashtable calculate where to place an element based on hashcode? [duplicate]

This question already has answers here:
How does a hash table work?
(17 answers)
Closed 2 years ago.
In Java, Hashtable has buckets whose quantity is equal to its capacity. Now how does it determine that it has to store an object in a particular bucket? I know it uses hashcode of the object but hashcode is a weird long string, what does hashtable do to the hashcode to determine place of entry?
Implementation-dependent (as in, if you rely on it working this way, your code is broken; the things HashMap guarantees are spelled out in its javadoc, and none of what I'm about to type is in there):
Hashes are just numbers, between about -2 billion and +2 billion. That 'long weird string' you see is just a more convenient way to show it to you.
First, the higher digits of that number are mixed into the lower digits (actually, the higher bits are XORed into the lower ones): 12340005 is turned into 12341239.
Then, that number is divided by the current number of buckets, but the quotient is tossed out; it's the remainder we are interested in. Once the sign has been dealt with, this remainder is always 0 or higher and always less than the number of buckets, so it points at exactly one of the buckets.
That's the bucket that the object goes into.
If buckets grow too large, a resizing is done.
For more, well, HashMap is open source, as is HashSet - just have a look.
For behavior as of jdk7 see:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/util/Hashtable.java#L358
int index = (hash & 0x7FFFFFFF) % tab.length;
This is a common technique for hash tables. The sign bit is cleared (to make the value non-negative), and the index is the remainder after division by the table size.
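A small sketch of that line in isolation, so you can see why the sign bit has to go. HashtableIndexDemo and bucketIndex are names invented for this example; only the expression itself comes from the JDK 7 source quoted above.

```java
// Hypothetical demo class wrapping the quoted JDK 7 expression.
class HashtableIndexDemo {

    // Clear the sign bit, then take the remainder by the table length.
    static int bucketIndex(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int tableLength = 11; // Hashtable's default initial capacity

        // Plain % keeps the sign of the dividend, so it cannot be used directly.
        System.out.println(-1 % tableLength);             // -1
        System.out.println(bucketIndex(-1, tableLength)); // 1

        // Any hash code ends up in the range [0, tableLength).
        System.out.println(bucketIndex("S".hashCode(), tableLength));
    }
}
```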
I know it uses hashcode of the object but hashcode is a weird long string, what does hashtable do to the hashcode to determine place of entry?
A hashcode is not a "weird long string". It is a 32 bit signed integer.
(I think you are confusing the hashcode and what you get when you call Object::toString ... which is a string consisting of a hashcode AND a Java internal type name.)
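For reference, the javadoc of Object::toString describes the default string as the class name, an "@", and the hash code in hexadecimal. A quick sketch (ToStringDemo and defaultToString are names made up for this example):

```java
// Hypothetical demo class showing where the "weird long string" comes from.
class ToStringDemo {

    // Rebuilds what Object.toString() is documented to return.
    static String defaultToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(o.hashCode());       // a plain int
        System.out.println(o);                  // java.lang.Object@<hex digits>
        System.out.println(defaultToString(o)); // same text as the line above
    }
}
```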
So what HashMap and Hashtable (and HashSet and LinkedHashMap) actually do is:
call hashCode() to get the 32 bit integer,
perform some implementation-specific mangling [1] on the integer,
convert the mangled integer to a non-negative integer by removing the sign bit,
compute an array index (for the bucket) as value % array.length where array is the hash table's current array of hash chains (or trees). (HashMap, whose array length is always a power of 2, gets the same effect in one step with value & (array.length - 1).)
[1] - Some implementations of HashMap / Hashtable perform some simple / cheap bitwise mangling. The purpose is to reduce clustering in the case that the low few bits of the hashcode values are not evenly distributed.

Algorithm used for bucket lookup for hashcodes [duplicate]

This question already has answers here:
What hashing function does Java use to implement Hashtable class?
(6 answers)
Closed 8 years ago.
In most cases, HashSet has lookup complexity O(1). I understand that this is because objects are kept in buckets corresponding to hashcodes of the object.
When lookup is done, it directly goes to the bucket and finds (using equals if many objects are present in same bucket) the element.
I always wonder, how it directly goes to the required bucket? Which algorithm is used for bucket lookup? Does that add nothing to total lookup time?
I always wonder, how it directly goes to the required bucket?
The hash code is treated and used as an index in to an array.
The index is determined by hash & (array.length - 1) because the length of the Java HashMap's internal array is always a power of 2. (This is a cheaper way to compute hash % array.length.)
Each "bucket" is actually a linked list (and now, possibly a tree) where entries with colliding hashes are grouped. If there are collisions, then a linear search through the bucket is performed.
Does that add nothing to total lookup time?
It incurs the cost of a few loads from memory.
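A minimal sketch of the masking trick, assuming a table length that is a power of 2. MaskVsModuloDemo and maskIndex are names invented here. For non-negative hashes the mask gives the same result as %, and for negative hashes it matches Math.floorMod rather than %.

```java
// Hypothetical demo class comparing the bit mask with the remainder operator.
class MaskVsModuloDemo {

    // Only valid when length is a power of 2.
    static int maskIndex(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;

        System.out.println(maskIndex(123, length));     // 11
        System.out.println(123 % length);               // 11

        // With a negative hash, % goes negative but the mask does not.
        System.out.println(-5 % length);                // -5
        System.out.println(maskIndex(-5, length));      // 11
        System.out.println(Math.floorMod(-5, length));  // 11
    }
}
```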
Often, the algorithm is simply
hash = hashFunction(key)
index = hash % arraySize
See the wikipedia article on Hash Table for details.
From memory: the HashSet is actually backed by a HashMap and the basic look up process is:
Get the key
hash it (hashcode())
hashcode % the number of buckets
Go to that bucket and evaluate equals()
For a Set there would only be unique elements. I would suggest reading the source for HashSet and it should be able to answer your queries.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#HashMap.containsKey%28java.lang.Object%29
Also note that the Java 8 code has been updated and this explanation covers pre Java 8 codebase. I have not examined in detail the Java 8 implementation except to figure out that it is different.
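The four steps above can be sketched as a toy class. This is not the JDK code: ToyHashSet is a made-up name, it never resizes, it does not accept null, and it uses Math.floorMod to keep the index non-negative, which is my own simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy set: fixed number of buckets, each bucket a plain list.
class ToyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Steps 1 to 3: take the key, hash it, reduce the hash to a bucket index.
    private List<E> bucketFor(Object key) {
        int index = Math.floorMod(key.hashCode(), buckets.size());
        return buckets.get(index);
    }

    // Step 4: only the chosen bucket is searched, using equals().
    boolean contains(Object key) {
        return bucketFor(key).contains(key);
    }

    // Returns false if an equal element is already present.
    boolean add(E element) {
        List<E> bucket = bucketFor(element);
        if (bucket.contains(element)) {
            return false;
        }
        bucket.add(element);
        return true;
    }
}
```

Finding the bucket is one hashCode() call plus a little arithmetic, so the cost of a lookup is dominated by how many elements share that bucket.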

How does HashMap determine where to put things? [duplicate]

This question already has answers here:
Internals of how the HashMap put() and get() methods work (basic logic only )
(3 answers)
Closed 9 years ago.
How does the add method in HashMap determine where a key goes in a HashMap? Like, if I was trying to put "S","T","A","C","K" into the HashMap of size 10, how does it determine where each letter goes?
The least significant bits of the object's hash code are used to select a bucket. Note there is no such thing as a java.util.HashMap with 10 buckets; the number of buckets must be a power of 2 so that the bits can be masked to choose a bucket. If you pass 10 to the constructor, you will get a HashMap with 16 buckets back.
So, reducing to 8 bits for clarity, if "S" returns hashcode 123 java will do
01111011 & 00001111 -> 00001011
and put S in bucket 11.
The real HashMap also applies a secondary hash function that shifts bits rightward to make sure there is data with some entropy in the least significant bits, so that things have a good chance of being distributed evenly even if their hashCode function isn't that great.
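Putting the pieces together as a rough sketch. The class and method names are invented for this example. The spread step, h ^ (h >>> 16), is what OpenJDK 8's HashMap does, but it is an implementation detail and older versions used a different mix. The rounding helper is my own and is only meant for small capacities.

```java
// Hypothetical demo class; mimics the steps described above.
class HashMapBucketDemo {

    // Rounds a requested capacity up to the next power of 2 (small values only).
    static int roundUpToPowerOfTwo(int requested) {
        return requested <= 1 ? 1 : Integer.highestOneBit(requested - 1) << 1;
    }

    // Secondary hash: XOR the high 16 bits into the low 16 bits.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Mask off the low bits to pick a bucket; tableLength must be a power of 2.
    static int bucketIndex(int hashCode, int tableLength) {
        return spread(hashCode) & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = roundUpToPowerOfTwo(10);
        System.out.println(tableLength);                   // 16

        // The made-up hash code 123 from the example above.
        System.out.println(bucketIndex(123, tableLength)); // 11

        // Real hash codes of the five letters.
        for (String key : new String[] {"S", "T", "A", "C", "K"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key.hashCode(), tableLength));
        }
    }
}
```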

Size of a list in Java [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How many data a list can hold at the maximum
What is the maximum number of elements a list can hold?
Assuming you mean implementations of the java.util.List interface, methods like get() and size() use int, so the upper theoretical boundary would be Integer.MAX_VALUE entries. You might run out of memory before you reach this limit though!
The index type in Java arrays is int too, so you're definitely limited to Integer.MAX_VALUE entries for regular arrays.
If you're talking about an ArrayList, they're indexed using integers (which are always signed in Java), so they can theoretically hold 2^31 - 1 elements (not 2^32). At that point you're probably going to have memory issues anyway.
LinkedList is also an implementation of List which stores the elements as a Linked List. So theoretically its size is equivalent to the amount of memory you can allocate.
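A tiny sketch of where the limit comes from: List.size() returns an int, and the largest int is 2^31 - 1. ListLimitDemo and maxListSize are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it only inspects the int-based API.
class ListLimitDemo {

    // The largest value List.size() can report.
    static long maxListSize() {
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        int size = list.size();                 // size() is declared to return int
        System.out.println(size);               // 0
        System.out.println(maxListSize());      // 2147483647
        System.out.println((1L << 31) - 1);     // 2147483647
    }
}
```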

How does the content of a Hashtable affect its size in memory?

If I have Hashtable A that has 5 million keys mapped to 5 million unique values, and I have Hashtable B that has 5 million keys mapped to 20 unique values, then approximately how much more memory would Hashtable A use compared to Hashtable B?
All of the keys and values are Strings that are approximately 20-50 characters in length.
My initial guess is that Hashtable A would take up roughly double the space as Hashtable B, but if you include the mappings then Hashtable B would use:
(5 million keys + 5 million mappings + 20 values) / (5 million keys + 5 million mappings + 5 million values) = .66
66.6% of the memory Hashtable A uses. However I don't know if a mapping would use as much space as a key or value if the keys and values are Strings.
Comments?
I don't think this has to do much with the hash table, since the "values" of the hash table are merely references to what I assume are the existing values. The increase in total cost would be based primarily on the size of a value. After all, you could have every key mapped to null.
Also, depending on the size of your keys, this may or may not have an impact. For example, a mapping from 5 million heavy objects (like strings) to 5 million lighter objects like Integers would not be that different from mapping 5 million heavy objects to 20 different values of Integer.
If you're storing literal strings, then the JVM may intern them, in which case the 20-value version would use significantly less memory (just how much less I don't know how to calculate). But for a standard hash table implementation that isn't subject to such magic, they would both use the same amount of memory, since each "bucket" will store a value, regardless of whether that value is also stored in other buckets.
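To make the "values are merely references" point concrete, here is a scaled-down sketch (1,000 keys instead of 5 million). SharedValuesDemo and its methods are names invented for this example. It counts how many distinct value objects a table actually points at, using identity rather than equals().

```java
import java.util.Hashtable;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; sizes are scaled down from the question.
class SharedValuesDemo {

    // Builds a table where the keys share a pool of distinctValues value objects.
    static Map<String, String> build(int keys, int distinctValues) {
        String[] pool = new String[distinctValues];
        for (int i = 0; i < distinctValues; i++) {
            pool[i] = "value-" + i;
        }
        Map<String, String> table = new Hashtable<>();
        for (int k = 0; k < keys; k++) {
            table.put("key-" + k, pool[k % distinctValues]); // stores a reference only
        }
        return table;
    }

    // Counts value objects by identity (==), not by equals().
    static int countDistinctValueObjects(Map<String, String> table) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String value : table.values()) {
            seen.put(value, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Map<String, String> a = build(1_000, 1_000);
        Map<String, String> b = build(1_000, 20);
        System.out.println(countDistinctValueObjects(a)); // 1000
        System.out.println(countDistinctValueObjects(b)); // 20
    }
}
```

Both tables hold the same number of entries, so the table structure itself costs about the same. Any difference comes from how many separate String objects exist on the heap, which is only 20 in the second case if the values really are shared instances.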
