It's usually said that inserting and finding a string in a hash table is O(1). But how is the hash key of a string computed? Why isn't it considered O(L), where L is the length of the string?
It is clear to me why it is O(1) for integers, but not for strings.
I do understand why, in general, inserting into a hash table is O(1), but I am confused about the step before inserting into the table: computing the hash value.
Also, is there any difference between how hash keys for strings are generated in Java and in unordered_map in C++?
Thanks.
Inserting etc. into a hashtable is O(1) in the sense that it is constant (or more precisely, bounded) with regard to the number of elements in the table.
The "O(1)" in this context makes no claim about how fast you can compute your hashes. If the effort for this grows in some way, that is the way it is. However, I find it unlikely that the complexity of a decent (i.e. "fit for this application") hash function will ever be worse than linear in the "size" (i.e., the length, in our string example) of the object being hashed.
It's usually said that inserting and finding a string in a hashtable is O(1). But how is the hash key of a string made? Why is it not O(L), the length of the string? It's clear to me why for integers it's O(1), but not for strings.
The O(1) commonly quoted means the time doesn't grow with the number of elements in the container. As you say, the time to generate a hash value from a string might not itself be O(1) in the length of the string - though for some implementations it is: for example Microsoft's C++ std::hash<std::string> has:
size_t _Val = 2166136261U;          // FNV offset basis
size_t _First = 0;
size_t _Last = _Keyval.size();
size_t _Stride = 1 + _Last / 10;    // sample roughly every tenth character

if (_Stride < _Last)
    _Last -= _Stride;

for (; _First < _Last; _First += _Stride)
    _Val = 16777619U * _Val ^ (size_t)_Keyval[_First];  // FNV-style multiply/XOR step

return (_Val);
The _Stride is about a tenth of the string length, so a fixed number of characters spaced that far apart are incorporated into the hash value. Such a hash function is O(1) in the length of the string.
GCC's C++ Standard library takes a different approach: in v4.7.2 at least, it calls down through a _Hash_impl support class to the static non-member function _Hash_bytes, which does a Murmur hash incorporating every byte. GCC's hash<std::string> is therefore O(N) in the length of the string.
GCC's higher prioritisation of collision minimisation is also evident in its use of prime numbers of buckets for std::unordered_set and std::unordered_map, which MS's implementation doesn't do - at least up until VS2013/VC12. In summary, MS's approach will be lighter-weight/faster for keys that aren't collision prone, and at lower load factors, but degrades earlier and more dramatically otherwise.
And is there any difference between how hash keys for strings are produced by Hashtable in Java and by unordered_map in C++?
How strings are hashed is not specified by the C++ Standard - it's left to the individual compiler implementations. Consequently, different compromises are struck by different compilers - even different versions of the same compiler.
The documentation David Pérez Cabrera's answer links to explains the hashCode function in Java:
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
That's clearly O(N) in the length of the string.
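Written out as code, that formula is just a single pass over the characters - a sketch (the real java.lang.String additionally caches the result after the first call):

// Sketch of the hashCode formula above: one multiply-and-add per character, O(N).
static int stringHash(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = 31 * h + s.charAt(i);  // accumulates s[0]*31^(n-1) + ... + s[n-1]
    }
    return h;
}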
Returning quickly to...
It's usually said that inserting and finding a string in a hashtable is O(1).
...a "key" ;-P insight is that in many problem domains, the real-world lengths of the strings is known not to vary significantly, or hashing for the worst-case length is still plenty fast enough. Consider a person's or company's name, a street address, an identifier from some source code, a programming-language keyword, a product/book/CD etc name: you can expect a billion keys to take roughly a million times more memory to store than the first thousand. With a hash table, most operations on the entire data set can be expected to take a million times longer. And this will be as true in 100 years' time as it is today. Importantly, if some request comes in related to a single key, it shouldn't take much longer to perform than it used to with a thousand keys (assuming sufficient RAM, and ignoring CPU caching effects) - though sure, if it's a long key it may take longer than for a short key, and if you have ultra-low-latency or hard-realtime requirements, you may care. But, the average throughput for requests with random keys will be constant despite having a million times more data.
Only when you have a problem domain with massive variance in key size and the key-hashing time is significant given your performance needs, or where you expect the average key size to increase over time (e.g. if the keys are video streams, and every few years people are bumping up resolutions and frame rates creating an exponential growth in key size), will you need to pay close attention to the hashing (and key comparison) costs.
According to the Java implementation, Hashtable uses the hashCode method of the key (String or Integer):
Hashtable
String.hashCode
Integer.hashCode
And C++ uses std::hash<std::string> or std::hash<int>, according to http://en.cppreference.com/w/cpp/utility/hash, and the implementation is in the functional header (/path/to/c++... /include/c++/4.8/functional).
The complexity of hashing a string is not O(1): if the length of the string is n, then computing its hash is O(n). However, if you compute all the hashes in a given array up front, you won't have to calculate them a second time, and you can then compare two strings in O(1) time by comparing the precalculated hashes (keeping in mind that equal hashes don't by themselves prove the strings are equal; unequal hashes do prove they differ).
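For example, a minimal sketch of using precomputed hashes as a cheap inequality filter (the strings here are just illustrative values):

public class HashCompare {
    public static void main(String[] args) {
        String a = "alpha", b = "beta";
        int ha = a.hashCode();  // O(length of a), paid once
        int hb = b.hashCode();  // O(length of b), paid once

        if (ha != hb) {
            System.out.println("definitely different (O(1) check)");
        } else if (a.equals(b)) {
            System.out.println("equal");
        } else {
            System.out.println("hash collision: equal hashes, different strings");
        }
    }
}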
Related
Many books and tutorials say that the size of a hash table must be a prime to evenly distribute the keys across all the buckets. But Java's HashMap always uses a size that is a power of two. Shouldn't it be using a prime? Which is better, a "prime" or a "power of two" as the hash table size?
Using a power of two effectively masks out top bits of the hash code. Thus a poor-quality hash function might perform particularly badly in this scenario.
Java's HashMap mitigates this by mistrusting the object's hashCode() implementation and applying a second level of hashing to its result:
Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.
If you have a good hash function, or do something similar to what HashMap does, it does not matter whether you use prime numbers etc as the table size.
If, on the other hand, the hash function is of unknown or poor quality, then using a prime number would be a safer bet. It will, however, make dynamically-sized tables trickier to implement, since all of a sudden you need to be able to produce prime numbers instead of just multiplying the size by a constant factor.
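To make that concrete, here's a minimal sketch (names and constants are mine, not from any particular library) contrasting the two growth strategies:

// Sketch: growing a power-of-two table vs. a prime-sized table.
class Resize {
    static int nextPowerOfTwo(int capacity) {
        return capacity << 1;                 // just double it (overflow checks omitted)
    }

    static int nextPrime(int capacity) {
        int candidate = capacity * 2 + 1;     // start near double the size
        while (!isPrime(candidate)) candidate += 2;
        return candidate;
    }

    static boolean isPrime(int n) {
        if (n < 2 || n % 2 == 0) return n == 2;
        for (int f = 3; (long) f * f <= n; f += 2)
            if (n % f == 0) return false;
        return true;
    }
}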
The standard HashMap implementation has a hash method which rehashes your object's hashcode to avoid that pitfall. The comment before the hash() method reads:
/**
* Retrieve object hash code and applies a supplemental hash function to the
* result hash, which defends against poor quality hash functions. This is
* critical because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
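For reference, the body of hash() is just a few shift-and-XOR lines - this is roughly how it looked in the JDK 1.6 HashMap source (the exact bit-mixing has varied between releases):

// Supplemental hash, approximately as in JDK 1.6 HashMap; spreads higher
// bits downward so a power-of-two mask sees more of the hash.
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}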
The only way to know which is better between prime and power-of-two is to benchmark it.
Many years ago, when writing an assembler whose performance depended strongly on symbol table lookup, I tested this using a large block of generated identifiers. Even with a naive mapping, I found that power-of-two, as expected, had a less even distribution and longer chains than a similarly sized prime number of buckets. It still ran faster, because of the speed of bucket selection by bit masking.
I strongly suspect the java.util developers would not have resorted to the extra hashing and power-of-two without benchmarking it against using a prime number of buckets. It is a really obvious thing to do when designing a hashed data structure.
For that reason, I'm sure the rehash and power-of-two size gives better performance for typical Java hash maps than a prime number of buckets.
From a performance/calculation-time point of view, power-of-two sizes let the bucket index be computed with a simple bit mask, which is faster than the integer modulo operation that would otherwise be required.
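As a sketch (method names are mine), the two index computations look like this:

// The two ways of reducing a hash to a bucket index.
static int indexByMask(int hash, int powerOfTwoBuckets) {
    return hash & (powerOfTwoBuckets - 1);     // a single AND; requires a power-of-two size
}

static int indexByMod(int hash, int buckets) {
    return Math.floorMod(hash, buckets);       // integer division; works for any size, e.g. a prime
}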
You should probably use prime-sized hash tables if you use quadratic probing for collision resolution. With a prime-sized table, quadratic probing is guaranteed to probe at least half of the entries; with a non-prime size it may probe fewer, so you might not find a suitable place to store your entry even when your hash table is less than half full. Since Java hash maps don't use quadratic probing, there is no need to use primes as the size.
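For illustration, a hedged sketch of the probe sequence (my naming; with a prime tableSize, the first (tableSize + 1) / 2 probes are guaranteed to be distinct slots):

// Quadratic probing: the i-th probe position for hash h.
static int probe(int h, int i, int tableSize) {
    return (int) Math.floorMod((long) h + (long) i * i, (long) tableSize);
}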
I have come across situations in interviews where I needed to use a hash function for integer numbers or for strings. In such situations, which ones should we choose? I've been wrong in these situations because I ended up choosing ones which generate a lot of collisions, but hash functions tend to be so mathematical that you cannot recall them in an interview. Are there any general recommendations, so that at least the interviewer is satisfied with your approach for integer or string inputs? Which functions would be adequate for both inputs in an "interview situation"?
Here is a simple recipe from Effective Java, page 33:
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
   a. Compute an int hash code c for the field:
      i. If the field is a boolean, compute (f ? 1 : 0).
      ii. If the field is a byte, char, short, or int, compute (int) f.
      iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
      iv. If the field is a float, compute Float.floatToIntBits(f).
      v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
      vi. If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a "canonical representation" for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
      vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
   b. Combine the hash code c computed in step 2.a into result as follows: result = 31 * result + c;
3. Return result.

When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
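A minimal sketch of the recipe applied to a small hypothetical class (much like the book's own PhoneNumber example):

public final class PhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;

    public PhoneNumber(short areaCode, short prefix, short lineNumber) {
        this.areaCode = areaCode;
        this.prefix = prefix;
        this.lineNumber = lineNumber;
    }

    @Override
    public int hashCode() {
        int result = 17;                   // step 1: nonzero constant
        result = 31 * result + areaCode;   // step 2: combine each significant field
        result = 31 * result + prefix;
        result = 31 * result + lineNumber;
        return result;                     // step 3
    }
}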
You should ask the interviewer what the hash function is for - the answer to this question will determine what kind of hash function is appropriate.
If it's for use in hashed data structures like hashmaps, you want it to be as simple as possible (fast to execute) and to avoid collisions (most common values map to different hash values). A good example is an integer hashing to itself - this is the standard hashCode() implementation in java.lang.Integer.
If it's for security purposes, you will want to use a cryptographic hash function. These are primarily designed so that it is hard to reverse the hash function or find collisions.
If you want fast pseudo-random-ish hash values (e.g. for a simulation) then you can usually modify a pseudo-random number generator to create these. My personal favourite is:
public static final int hash(int a) {
    a ^= (a << 13);   // XOR-shift mixing: each step spreads input bits across the word
    a ^= (a >>> 17);
    a ^= (a << 5);
    return a;
}
If you are computing a hash for some form of composite structure (e.g. a string with multiple characters, or an array, or an object with multiple fields), then there are various techniques you can use to create a combined hash function. I'd suggest something that XORs the rotated hash values of the constituent parts, e.g.:
public static <T> int hashCode(T[] data) {
    int result = 0;
    for (int i = 0; i < data.length; i++) {
        result ^= data[i].hashCode();             // mix in each element's hash
        result = Integer.rotateRight(result, 1);  // rotate so element order matters
    }
    return result;
}
Note the above is not cryptographically secure, but will do for most other purposes. You will obviously get collisions, but that's unavoidable when hashing a large structure to an integer :-)
For integers, I usually go with k % p, where p is the size of the hash table and a prime number, and for strings I use the hashCode from the String class. Is this sufficient for an interview with a major tech company? – phoenix 2 days ago
Maybe not. It's not uncommon to need to provide a hash function to a hash table whose implementation is unknown to you. Further, if you hash in a way that depends on the implementation using a prime number of buckets, then your performance may degrade if the implementation changes due to a new library, compiler, OS port etc..
Personally, I think the important thing at interview is a clear understanding of the ideal characteristics of a general-purpose hash algorithm, which is basically that for any two input keys with values varying by as little as one bit, each and every bit in the output has about a 50/50 chance of flipping. I found that quite counter-intuitive, because a lot of the hashing functions I first saw used bit-shifts and XOR, and a flipped input bit usually flipped one output bit (usually in another bit position), so 1-input-bit-affects-many-output-bits was a little revelation moment when I read it in one of Knuth's books. With this knowledge you're at least capable of testing and assessing specific implementations regardless of how they're implemented.
One approach I'll mention because it achieves this ideal and is easy to remember, though the memory usage may make it slower than mathematical approaches (it could be faster too, depending on hardware), is to simply use each byte in the input to look up a table of random ints. For example, given a 24-bit RGB value and int table[3][256], table[0][r] ^ table[1][g] ^ table[2][b] is a great int-sized hash value - indeed "perfect" if inputs are randomly scattered through the int values (rather than, say, incrementing - see below). This approach isn't ideal for long or arbitrary-length keys, though you can start revisiting tables and bit-shifting the values etc..
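A hedged, self-contained sketch of that table-lookup ("tabulation") idea for the RGB case (the fixed seed is just for reproducibility):

import java.util.Random;

public final class TabulationHash {
    private static final int[][] TABLE = new int[3][256];
    static {
        Random rng = new Random(42);            // fill the tables with random ints once
        for (int[] row : TABLE)
            for (int i = 0; i < row.length; i++)
                row[i] = rng.nextInt();
    }

    // One table lookup per input byte, XORed together.
    public static int hash(int r, int g, int b) {
        return TABLE[0][r & 0xFF] ^ TABLE[1][g & 0xFF] ^ TABLE[2][b & 0xFF];
    }
}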
All that said, you can sometimes do better than this randomising approach for specific cases where you are aware of the patterns in the input keys and/or the number of buckets involved (for example, you may know the input keys are contiguous from 1 to 100 and there are 128 buckets, so you can pass the keys through without any collisions). If, however, the input ceases to meet your expectations, you can get horrible collision problems, while a "randomising" approach should never get much worse than the load factor (size() / buckets) implies. Another interesting insight is that when you want a quick-and-mediocre hash, you don't necessarily have to incorporate all the input data when generating the hash: e.g. last time I looked at Visual C++'s string hashing code, it picked ten letters evenly spaced along the text to use as inputs...
This might sound like a very vague question upfront, but it is not. I have gone through the Hash Function description on wiki, but it is not very helpful to understand.
I am looking for simple answers to rather complex topics like hashing. Here are my questions:
What do we mean by hashing? How does it work internally?
What algorithm does it follow?
What is the difference between HashMap, HashTable and HashList?
What do we mean by 'constant time complexity' and why do different implementations of the hash give constant-time operations?
Lastly, why are Hash and LinkedList asked in most interview questions - is there any specific logic for it when testing an interviewee's knowledge?
I know my question list is big, but I would really appreciate clear answers to these questions, as I really want to understand the topic.
Here is a good explanation of hashing. Suppose you want to store the string "Rachel". You apply a hash function to that string to get a memory location: myHashFunction(key: "Rachel", value: "Rachel") --> 10. The function may return 10 for the input "Rachel", so assuming you have an array of size 100, you store "Rachel" at index 10. If you want to retrieve that element, you just call myHashFunction("Rachel") and it will return 10 again. Note that in this example the key is "Rachel" and the value is "Rachel", but you could use another value for that key, for example a birth date or an object. Your hash function may return the same memory location for two different inputs; in this case you have a collision, and if you are implementing your own hash table you have to take care of it, perhaps using a linked list or other techniques.
Here are some common hash functions used. A good hash function satisfies the property that each key is equally likely to hash to any of the n memory slots, independently of where any other key has hashed to. One method is called the division method: we map a key k into one of n slots by taking the remainder of k divided by n, i.e. h(k) = k mod n. For example, if your array size is n = 100 and your key is the integer k = 115, then h(k) = 15.
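Written out, the division method is a one-liner (floorMod rather than % so negative keys still land in range):

// Division-method hash: h(k) = k mod n.
static int divisionHash(int key, int slots) {
    return Math.floorMod(key, slots);  // e.g. divisionHash(115, 100) == 15
}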
Hashtable is synchronised and HashMap is not.
HashMap allows null keys and values, but Hashtable does not.
The purpose of a hash table is to have O(1) constant time complexity for adding and getting elements. In a linked list of size N, if you want to get the last element you have to traverse the whole list until you reach it, so the complexity is O(N). With a hash table, if you want to retrieve an element you just pass the key, and the hash function tells you where to look. If the hash function is well implemented this takes constant time, O(1). This means you don't have to traverse all the elements stored in the hash table - you get the element "instantly".
Of course a programmer/developer/computer scientist needs to know about data structures and complexity =)
Hashing means generating a (hopefully) unique number that represents a value.
Different types of values (Integer, String, etc) use different algorithms to compute a hashcode.
HashMap and HashTable are maps; they are a collection of unique keys, each of which is associated with a value.
Java doesn't have a HashList class. A HashSet is a set of unique values.
Getting an item from a hashtable is constant-time with regard to the size of the table.
Computing a hash is not necessarily constant-time with regard to the value being hashed.
For example, computing the hash of a string involves iterating the string, and isn't constant-time with regard to the size of the string.
These are things that people ought to know.
Hashing means transforming a given entity (in Java terms, an object) into some number (or sequence). The hash function is not reversible - i.e. you can't obtain the original object from the hash. Internally it is implemented (for java.lang.Object) by the JVM, deriving the value from some memory address.
The JVM address thing is an unimportant detail. Each class can override the hashCode() method with its own algorithm. Modern Java IDEs allow generating good hashCode methods.
Hashtable and hashmap are conceptually the same thing. They store key-value pairs, where the keys are hashed. Hash lists and hash sets don't store values - only keys.
Constant-time means that no matter how many entries there are in the hashtable (or any other collection), the number of operations needed to find a given object by its key is constant - that is, 1, or close to 1.
This is basic computer-science material, and everyone is supposed to be familiar with it. I think Google has stated that the hashtable is the most important data structure in computer science.
I'll try to give simple explanations of hashing and of its purpose.
First, consider a simple list. Each operation (insert, find, delete) on such a list has O(n) complexity, meaning that you have to scan the whole list (or half of it, on average) to perform the operation.
Hashing is a very simple and effective way of speeding this up: consider splitting the whole list into a set of small lists. Items in one such small list would have something in common, and this something can be deduced from the key. For example, with a list of names, we could use the first letter to choose which small list to search. In this way, by partitioning the data by the first letter of the key, we obtain a simple hash that splits the whole list into ~30 smaller lists, so that each operation takes about n/30 steps on average - still O(n), but 30 times faster in practice.
However, we can note that the results are not that perfect. First, there are only 30 buckets, and we can't change that. Second, some letters are used more often than others, so the set for Y or Z will be much smaller than the set for A. For better results, it's better to find a way to partition the items into sets of roughly the same size. How could we solve that? This is where hash functions come in: a hash function is a function able to create an arbitrary number of partitions with roughly the same number of items in each. In our example with names, we could use something like
#include <cstring>

const int NUMBER_OF_PARTITIONS = 32;   // number of buckets; pick to taste

int hash(const char* str) {
    unsigned int rez = 0;              // unsigned so overflow wraps instead of going negative
    size_t len = std::strlen(str);     // hoisted out of the loop to avoid O(n^2) scanning
    for (size_t i = 0; i < len; i++)
        rez = rez * 37 + str[i];
    return rez % NUMBER_OF_PARTITIONS;
}
This would assure a quite even distribution and configurable number of sets (also called buckets).
What do we mean by hashing? How does it work internally?
Hashing is the transformation of a string into a shorter fixed-length value or key that represents the original string. It is not indexing. The heart of hashing is the hash table, which contains an array of items. Hash tables compute an index from the data item's key and use this index to place the data into the array.
What algorithm does it follow ?
In simple words, most hash algorithms work on the logic index = f(key, arrayLength).
Lastly, why are Hash and LinkedList asked in most interview questions - is there any specific logic for it when testing an interviewee's knowledge?
It's about how good you are at logical reasoning. These are among the most important data structures, and every programmer should know them.
why not:
public native long hashCode();
instead of:
public native int hashCode();
for a higher chance of achieving unique hash codes?
Because the maximum length of an array is Integer.MAX_VALUE.
Since the primary use of hashCode() is to determine which slot an object goes into in the backing array of a HashMap/Hashtable, a hashcode greater than Integer.MAX_VALUE could not be used to index the array.
Anyway, the hash code value is used to determine a row number in a table, which is a relatively small value.
In HashMap, for instance, the default table contains only 16 rows (Sun JDK 1.6.0_17). This means that the row number is determined like this:
int rowNumber = obj.hashCode() % rowsCount;
So, the real distribution is from 0 to rowsCount - 1.
UPD: I recall the implementation of ConcurrentHashMap. In a nutshell, ConcurrentHashMap contains many relatively small tables. At first the hashCode function is used to determine the table number, and after that the same function is used to determine a row in the selected table.
This approach removes the limitation on array size (and even allows building a distributed hash table).
So, I incline to the conclusion that hashCode returns an int because it covers the vast majority of use cases.
I'd assume it's a balance of computation cost vs. hash range. Hashcodes are referenced so frequently that pushing around twice as much data every time you need a hash would be expensive, especially if you consider the more common use cases -
for example, if you create a small hash with 10, or 100, or 1000 values, the difference in the number of hash collisions you're going to see is extremely negligible. For larger hashes... well, think of how big a hash would need to be before 10**32 values start having frequent collisions, and whether that's even possible to do in a JVM given the amount of memory you'd need.
I've seen some interesting claims on SO re Java hashmaps and their O(1) lookup time. Can someone explain why this is so? Unless these hashmaps are vastly different from any of the hashing algorithms I was brought up on, there must always exist a dataset that contains collisions.
In which case, the lookup would be O(n) rather than O(1).
Can someone explain whether they are O(1) and, if so, how they achieve this?
A particular feature of a HashMap is that unlike, say, balanced trees, its behavior is probabilistic. In such cases it's usually most helpful to talk about complexity in terms of the probability of a worst-case event occurring. For a hash map, that is of course a collision, with respect to how full the map happens to be. A collision is pretty easy to estimate:
p(collision) = n / capacity
So a hash map with even a modest number of elements is pretty likely to experience at least one collision. Big O notation allows us to do something more compelling. Observe that for any arbitrary, fixed constant k:
O(n) = O(k * n)
We can use this feature to improve the performance of the hash map. We could instead think about the probability of at most 2 collisions:
p(2 collisions) = (n / capacity)^2
This is much lower. Since the cost of handling one extra collision is irrelevant to Big O performance, we've found a way to improve performance without actually changing the algorithm! We can generalize this to
p(k collisions) = (n / capacity)^k
And now we can disregard some arbitrary number of collisions and end up with a vanishingly tiny likelihood of more collisions than we are accounting for (for example, with n / capacity = 0.1, two collisions have probability 0.01, and k collisions have probability 10^-k). You could get the probability to an arbitrarily tiny level by choosing the correct k, all without altering the actual implementation of the algorithm.
We talk about this by saying that the hash map has O(1) access with high probability.
You seem to mix up worst-case behaviour with average-case (expected) runtime. The former is indeed O(n) for hash tables in general (i.e. not using a perfect hashing) but this is rarely relevant in practice.
Any dependable hash table implementation, coupled with a half decent hash, has a retrieval performance of O(1) with a very small factor (2, in fact) in the expected case, within a very narrow margin of variance.
In Java, how does a HashMap work?
Using hashCode to locate the corresponding bucket [inside buckets container model].
Each bucket is a LinkedList (or a Balanced Red-Black Binary Tree under some conditions starting from Java 8) of items residing in that bucket.
The items are scanned one by one, using equals for comparison.
When adding more items, the HashMap is resized (doubling the size) once a certain load percentage is reached.
So, sometimes it will have to compare against a few items, but generally, it's much closer to O(1) than O(n) / O(log n).
For practical purposes, that's all you should need to know.
Remember that O(1) does not mean that each lookup only examines a single item - it means that the average number of items checked remains constant with respect to the number of items in the container. So if it takes on average 4 comparisons to find an item in a container with 100 items, it should also take an average of 4 comparisons to find an item in a container with 10000 items, and for any other number of items (there's always a bit of variance, especially around the points at which the hash table rehashes, and when there's a very small number of items).
So collisions don't prevent the container from having O(1) operations, as long as the average number of keys per bucket remains within a fixed bound.
I know this is an old question, but there's actually a new answer to it.
You're right that a hash map isn't really O(1), strictly speaking, because as the number of elements gets arbitrarily large, eventually you will not be able to search in constant time (and O-notation is defined in terms of numbers that can get arbitrarily large).
But it doesn't follow that the real time complexity is O(n)--because there's no rule that says that the buckets have to be implemented as a linear list.
In fact, Java 8 implements the buckets as TreeMaps once they exceed a threshold, which makes the actual time O(log n).
O(1 + n/k), where k is the number of buckets.
If the implementation sets k = n/α, then it is O(1 + α) = O(1), since α is a constant.
If the number of buckets (call it b) is held constant (the usual case), then lookup is actually O(n).
As n gets large, the number of elements in each bucket averages n/b. If collision resolution is done in one of the usual ways (linked list for example), then lookup is O(n/b) = O(n).
The O notation is about what happens when n gets larger and larger. It can be misleading when applied to certain algorithms, and hash tables are a case in point. We choose the number of buckets based on how many elements we're expecting to deal with. When n is about the same size as b, then lookup is roughly constant-time, but we can't call it O(1) because O is defined in terms of a limit as n → ∞.
Elements inside the HashMap are stored as an array of linked lists (nodes); each linked list in the array represents a bucket for the unique hash value of one or more keys.
While adding an entry to the HashMap, the hashcode of the key is used to determine the location of the bucket in the array, something like:
location = (arraylength - 1) & keyhashcode
Here the & represents bitwise AND operator.
For example: 100 & "ABC".hashCode() = 64 (location of the bucket for the key "ABC")
During the get operation, it uses the same method to determine the bucket location for the key. In the best case, each key has a unique hashcode resulting in a unique bucket for each key; in this case the get method spends time only to determine the bucket location and retrieve the value, which is constant: O(1).
In the worst case, all the keys have the same hashcode and are stored in the same bucket; this results in traversing the entire list, which leads to O(n).
In the case of Java 8, the linked-list bucket is replaced with a TreeMap if its size grows beyond 8; this improves the worst-case search efficiency to O(log n).
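Putting the pieces above together, here is a hedged, self-contained sketch of such a chained hash map (not the actual JDK code - resizing and Java 8's treeification are omitted):

// Minimal chained hash map: array of buckets, each bucket a linked list of nodes.
public class ChainedMap<K, V> {
    private static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // power-of-two capacity

    public void put(K key, V value) {
        int h = key.hashCode();
        int i = (table.length - 1) & h;                // bucket location via bit mask
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && key.equals(n.key)) { n.value = value; return; }
        }
        table[i] = new Node<>(h, key, value, table[i]); // prepend a new node to the chain
    }

    public V get(K key) {
        int h = key.hashCode();
        for (Node<K, V> n = table[(table.length - 1) & h]; n != null; n = n.next) {
            if (n.hash == h && key.equals(n.key)) return n.value; // scan chain with equals
        }
        return null;
    }
}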
We've established that the standard description of hash table lookups being O(1) refers to the average-case expected time, not the strict worst-case performance. For a hash table resolving collisions with chaining (like Java's hashmap) this is technically O(1+α) with a good hash function, where α is the table's load factor. Still constant as long as the number of objects you're storing is no more than a constant factor larger than the table size.
It's also been explained that strictly speaking it's possible to construct input that requires O(n) lookups for any deterministic hash function. But it's also interesting to consider the worst-case expected time, which is different than average search time. Using chaining this is O(1 + the length of the longest chain), for example Θ(log n / log log n) when α=1.
If you're interested in theoretical ways to achieve constant time expected worst-case lookups, you can read about dynamic perfect hashing which resolves collisions recursively with another hash table!
It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions.
Whether you need to grow the table when you add items or not is not relevant to the question because it is about lookup time.
This basically goes for most hash table implementations in most programming languages, as the algorithm itself doesn't really change.
If there are no collisions present in the table, you only have to do a single look-up, so the running time is O(1). If there are collisions present, you have to do more than one look-up, which drags the performance down towards O(n).
It depends on the algorithm you choose to avoid collisions. If your implementation uses separate chaining, then the worst-case scenario happens when every data element is hashed to the same value (a poor choice of hash function, for example). In that case, data lookup is no different from a linear search on a linked list, i.e. O(n). However, the probability of that happening is negligible, and the best and average cases of lookup remain constant, i.e. O(1).
Only in the theoretical case where hashcodes are always distinct and the bucket for every hash code is also distinct will the strict O(1) exist. Otherwise, the time is still of constant order: as the hashmap grows, its order of search remains constant.
Academics aside, from a practical perspective, HashMaps should be accepted as having an inconsequential performance impact (unless your profiler tells you otherwise.)
Of course the performance of the hashmap will depend on the quality of the hashCode() function for the given object. However, if the function is implemented such that the possibility of collisions is very low, it will have very good performance (this is not strictly O(1) in every possible case, but it is in most cases).
For example, the default implementation in the Oracle JRE is to use a random number (which is stored in the object instance so that it doesn't change - but it also disables biased locking, but that's another discussion), so the chance of collisions is very low.