I want to know how hashtable orders its values after using Put method.
For example:
a b c d e
Normal 2 weeks Next Save and Finish Go to Cases
hashtable.put("a","Normal"); ...
The order of the values will be different and not in the same order we put.
I think the order will be like this:
b a e c d
2 weeks Normal Go to Cases Next Save and Finish
Please suggest data structures that solve the problem.
Thanks.
As very often in these cases, the answer is in the documentation:
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
Similar to HashMap, HashTable also does not guarantee insertion order for the elements.
Reason
HashTable is optimized for fast look up. This is achieved by calculating hash for key values stored. This ensures that searching for any value in HashTable is O(1), takes the same time irrespective the number entries in HashTable.
Thus the entries are stored based on the hash generated for key. This is the reason why HashTable does not guarantee the order of the elements in which they were inserted.
A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.
http://www.webopedia.com/TERM/H/hashing.html
http://interactivepython.org/runestone/static/pythonds/SortSearch/Hashing.html
As previously explained, hashtable iterate order is just casual. If you want to preserve inserted order use LinkedHashMap. If you want to obtain natural order or a predefined one, use TreeMap. As natural order I mean the order of key, for example String, Integer, Long and so on, as implement Comparable interface, are automatically sorted as any other class that implements Comparable. Predefined order can be supplied by a Comparator too, creating the TreeMap.
Related
I am learning Java now and I am learning about different kinds of collections, so far I learned about LinkedList, ArrayList and Array[].
Now I've been introduced to Hash types of collections, HashSet and HashMap, and I didn't quite understand why there are useful, because the list of commands that they support is quietly limited, also, they are sorted in a random order and I need to Override the equal and HashKey methods in order to make it work right with class.
Now, what I don't understand is the benefits over the hassle of using these types instead of ArrayList of a costume class.
I mean, what Map is doing is connecting 2 objects as 1, but wouldn't it just be better to create a class that contains this 2 objects as parameters, and have getters to modify and use them?
If the benefit is that this Hash objects can only contain 1 object of the same name, wouldn't it just be easier to make the ArrayList check that the type is not already there before adding it?
So far I learned to choose when to use LinkedList, ArrayList or Array[] by the rule of "if it's really simple, use Array[], if it's a bit more complex use ArrayList (for example to hold collection of certain class), and if the list is dynamic with a lot of objects inside that need to change order according to removing or adding a new one in the middle or go back and forth within the list then use LinkedList.
But I couldn't understand when to prefer HashMap or HashSet, and I would be really glad if you could explain it to me.
Let me help you out here...
Hashed collections are the most efficient to add, search and remove data, since they hash the key (in HashMap) or the element (in HashSet) to find the place where they belong in a single step.
The concept of hashing is really simple. It is the process of representing an object as a number that can work as it´s id.
For example, if you have a string in Java like String name = "Jeremy";, and you print its hashcode: System.out.println(name.hashCode());, you will see a big number there (-2079637766), that was created using that string object values (in this string object, it's characters), that way, that number can be used as an Id for that object.
So the Hashed collections like the ones mentioned above, use this number to use it as an array index to find the elements in no-time. But obviously is too big to use it as an array index for a possible small array. So they need to reduce that number so it fits in the range of the array size. (HashMap and HashSet use arrays to store their elements).
The operation that they use to reduce that number is called hashing, and is something like this: Math.abs(-2079637766 % arrayLength);.
It's not like that exactly, it's a bit more complex, but this is to simplify.
Let's say that arrayLength = 16;
The % operator will reduce that big number to a number smaller than 16, so that it can be fit in the array.
That is why a Hashed collection will not allow duplicate, because if you try to add the same object or an equivalent one (like 2 strings with the same characters), it will produce the same hashcode and will override whatever value is in the result index.
In your question, you mentioned that if you are worried about duplicates items in an ArrayList, we can just check if the item is there before inserting it, so this way we don't need to use a HashSet. But that is not a good idea, because if you call the method list.contains(elem); in an ArrayList, it needs to go one by one comparing the elements to see if it's there. If you have 1 million elements in the ArrayList, and you check if an element is there, but it is not there, the ArrayList iterated over 1 million elements, that is not good. But with a HashSet, it would only hashed the object and go directly where it is supposed to be in the array and check, doing it in just 1 step, instead of 1 million. So you see how efficient a HashSet is compared to an ArrayList.
The same happens with a HashMap of size 1 million, that it will only take 1 single step to check if a key is there, and not 1 million.
The same thing happens when you need to add, find and remove an element, with the hashed collections it will do all that in a single step (constant time, doesn't depend on the size of the map), but that varies for other structures.
That's why it is really efficient and widely used.
Main Difference between an ArrayList and a LinkedList:
If you want to find the element at place 500 in an ArrayList of size 1000, you do: list.get(500); and it will do that in a single step, because an ArrayList is implemented with an array, so with that 500, it goes directly where the element is in the array.
But a LinkedList is not implemented with an array, but with objects pointing to each other. This way, they need to go linearly and counting from 0, one by one until they get to the 500, which is not really efficient compared to the 1 single step of the ArrayList.
But when you need to add and remove elements in an ArrayList, sometimes the Array will need to be recreated so more elements fit in it, increasing the overhead.
But that doesn't happen with the LinkedList, since no array has to be recreated, only the objects (nodes) have to be re-referenced, which is done in a single step.
So an ArrayList is good when you won't be deleting or adding a lot of elements on the structure, but you are going to read a lot from it.
If you are going to add and remove a lot of elements, then is better a linked list since it has less work to do with those operations.
Why you need to implement the equals(), hashCode() methods for user-defined classes when you want to use those objects in HashMaps, and implement Comparable interface when you want to use those objects with TreeMaps?
Based on what I mentioned earlier for HashMaps, is possible that 2 different objects produce the same hash, if that happens, Java will not override the previous one or remove it, but it will keep them both in the same index. That is why you need to implement hashCode(), so you make sure that your objects will not have a really simple hashCode that can be easily duplicated.
And the reason why is recommended to override the equals() method is that if there is a collision (2 or more objects sharing the same hash in a HashMap), then how do you tell them apart? Well, asking the equals() method of those 2 objects if they are the same. So if you ask the map if it contains a certain key, and in that index, it finds 3 elements, it asks the equals() methods of those elements if its equals() to the key that was passed, if so, it returns that one. If you don't override the equals() method properly and specify what things you want to check for equality (like the properties name, age, etc.), then some unwanted overrides inside the HashMap will happen and you will not like it.
If you create your own classes, say, Person, and has properties like name, age, lastName and email, you can use those properties in the equals() method and if 2 different objects are passed but have the same values in your selected properties for equality, then you return true to indicate that they are the same, or false otherwise. Like the class String, that if you do s1.equals(s2); if s1 = new String("John"); and s2 = new String("John");, even though they are different objects in Java Heap Memory, the implementation of String.equals method uses the characters to determine if the objects are equals, and it returns true for this example.
To use a TreeMap with user-defined classes, you need to implement the Comparable interface, since the TreeMap will compare and sort the objects based on some properties, you need to specify by which properties your objects will be sorted. Will your objects be sorted by age? By name? By id? Or by any other property that you would like. Then, when you implement the Comparable interface and override the compareTo(UserDefinedClass o) method, you do your logic and return a positive number if the current object is greater than the o object passed, 0 if they are the same and a negative number if the current object is smaller. That way, the TreeMap will know how to sort them, based on the number returned.
First HashSet. In HashSet, you can easily get whether it contains given element. Let's have a set of people in your class and you want to ask whether a guy is in your class. You can make an array list of strings. And if you want to ask if a guy is in your class, you have to iterate through whole the list until you find him, which might be too slow for longer lists. If you use HashSet instead, the operation is much faster. You calculate the hash of the searched string and then you go directly to the hash, so you don't need to pass so many elements to answer your question. Well, you can also make a workaround to make the ArrayList faster to access for this purpose but this is already prepared.
And now HashMap. Now imagine that you also want to store a score for each person. So now you can use HashMap. You enter the name and you get his score in a short time, without the need of iterating through whole the data structure.
Does it make sense?
Concerning your question:
"But I couldn't understand when to prefer HashMap or HashSet, and I
would be really glad if you could explain it to me"
The HashMap implement the Map interface, to be used for mapping a Key (K) to a value (V) in constant time, and where order doesn't matter, so you can put and retrieve those data efficiently if you now the key.
And HashSet implement the Set interface, but is internanly using and HashMap, its role is to be used as a Set, meaning you're not supposed to retrieve an element, you just check that is in the set or not (mostly).
In HashMap, you can have identical value, while you can't in a Set (because its a property of a Set).
Concerning this question :
If the benefit is that this Hash objects can only contain 1 object of the same name, >wouldn't it just be easier to make the ArrayList check that the type is not already >there before adding it?
When dealing with collection, you have may base you choice of a particular one on the data representation but also on the way you want to access and store those data, how do you access it ? Do you need to sort them ? Because each implemenation may have different complexity (https://en.wikipedia.org/wiki/Time_complexity), it become important.
Using the doc,
For ArrayList:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).
For HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So it's about the time complexity.
You may choose even more untypical collection for certain problems :).
This has little to do with Java specifically, and the choice depends mostly on performance requirements, but there's a fundamental difference that must be highlighted. Conceptually, Lists are types of collections that keep the order of insertion and may have duplicates, Sets are more like bags of items that have no specific order and no duplicates. Of course, different implementations may find a way around it (like a TreeSet).
First, let's check the difference between ArrayList and LinkedList. A linked list is a set of nodes, where each node contains a value and a link to the next and previous nodes. This makes inserting an element to a linked list a matter of appending a node to the end of the list, which is a quick operation since the memory does not have to be contiguous, as long as a node keeps a reference to the next node. On the other side, accessing a specific element requires transversing the entire list until finding it.
An array list, as the name implies, wraps an array. Accessing elements in an array by using its index is direct access, but inserting an element implies resizing the array to include the new element, so the memory it occupies is contiguous, making writes a bit heavier in this case.
A HashMap works like a dictionary, where for each key there's a value. The behavior of the insertion will mostly depend on how the hashCode and equals functions of the object used as a key are implemented. If the hashCode of two keys is the same, there's a hash collision, so equals will be used to understand if it's the same key or not. If equals is the same, then it's the same key, so the value is replaced. If not, the new value is added to the collection. Accessing and Writing values depends mostly on calculating the hash of the key followed by direct access to the value, making both operations really quick, O(1).
A set is pretty much like a hash map, without the "values" part, thus, it follows the same rules regarding the implementation of hashCode and equals operations for the added value.
It might be handy to study a bit about the Big-O notation and complexity of algorithms. If you are starting with Java, I'd strongly recommend the book Effective Java, by Joshua Bloch.
Hope it helps you dig further.
In the piece of code similar to
//something before
Iteration<String> iterator = hashMap.keySet().iterator();// HashMap<String, Document>
while(iterator.hasNext()){
System.out.println(iterator.next());
}
//something after
I know that the order of print can be different by the order of insertion of entry key, value; all right.
But if I call this piece in another moment, with re-create the variable hashMap and putting them the equal elements, can the second-moment time print be different from the first-time print?
My question was born by a problem with a web-app: I have a list of String in a JSP, but, after some years, the customer call because the order of the String was different in the morning, but it shows the usual order at the afternoon.
The problem is happened in only one day: the web-app uses the explained piece of code for take a Map and populate an ArrayList.
This ArrayList does'nt any explicit changement of order (no Comparator or similar classes).
I think (hope) that the cause of different order of print derives by a different sequence of iteration in the same HashMap at run-time and I looking for a validation by other people.
In the web, I read that the iteration order by a HashMap changes if the HashMap receives a modification: but what happens if the HashMap remains the same?
Hash map document says HashMap makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
that explains though the hashmap is same it can not guaranatee on order. for Ordered map you can use TreeMap or LinkedHashMap
TreeMap API says The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
HashMap API documentation states that
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
For a Map that keeps its keys in original insertion order, use LinkedHashMap.
For a Map that keeps its keys in sorted order (either natural order or by you passing a Comparator), use either TreeMap or ConcurrentSkipListMap. If multi-threaded, use the second.
For a Map where the key an enum, use EnumMap if you want the entries ordered by the definition order of the enum's objects.
The other six Map implementations bundled with Java 11 do not promise any order to their entries.
See this graphic table of mine as an overview.
Use a LinkedHashMap instead, to preserve insertion order. From the javadoc: "Hash table and linked list implementation of the Map interface, with predictable iteration order."
If you just want a Map with predictable ordering, then you can also use TreeMap. However, a LinkedHashMap is faster, as seen here: "TreeMap has O(log n) performance for containsKey, get, put, and remove, according to the Javadocs, while LinkedHashMap is O(1) for each."
As Octopus mentioned, HashMap "makes no guarantees as to the order of the map," and you shouldn't use it if order must remain consistent.
I am wondering if there is a more efficient method for getting objects out of my LinkedHashMap with timestamps greater than a specified time. I.e. something better than the following:
Iterator<Foo> it = foo_map.values().iterator();
Foo foo;
while(it.hasNext()){
foo = it.next();
if(foo.get_timestamp() < minStamp) continue;
break;
}
In my implementation, each of my objects has essentially three values: an "id," "timestamp," and "data." The objects are insterted in order of their timestamps, so when I call an iterator over the set, I get ordered results (as required by the linked hashmap contract). The map is keyed to the object's id, so I can quickly lookup them up by id.
When I look them up by a timestamp condition, however, I get an iterator with sorted results. This is an improvement over a generic hashmap, but I still need to iterate sequentially over much of the range until I find the next entry with a higher timestamp than the specified one.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential? If I went with a treemap as an alternative, would it offer overall speed advantages, or is it doing essentially the same thing in the background? Since the collection is sorted by insertion order already, I'm thinking tree map has a lot more overhead I don't need?
There is no faster way ... if you just use a LinkedHashMap.
If you want faster access, you need to use a different data structure. For example, a TreeSet with an appropriate comparator might be a better solution for this aspect of your problem. For example if your TreeSet is ordered by date, then calling tailSet with an appropriate dummy value can give you all elements greater or equal to a given date.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential?
Not for a LinkedHashMap.
However, if the ordered list was an ArrayList instead, then you could use "binary search" on the list ... provided that you could lock it to prevent concurrent modifications while you are searching. (Actually, concurrency is a potential issue to consider no matter how you implement this ... including your current linear search.)
If you want to keep the ability to do id lookups, then you need two data structures; e.g. a TreeSet and a HashMap which share their element objects. A TreeSet will probably be more efficient than trying to maintain an ArrayList in order assuming that there are random insertions and/or random deletions.
I am trying to figure something out about hashing in java.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
HashMap is basically implemented internally as an array of Entry[]. If you understand what is linkedList, this Entry type is nothing but a linkedlist implementation. This type actually stores both key and value.
To insert an element into the array, you need index. How do you calculate index? This is where hashing function(hashFunction) comes into picture. Here, you pass an integer to this hashfunction. Now to get this integer, java gives a call to hashCode method of the object which is being added as a key in the map. This concept is called preHashing.
Now once the index is known, you place the element on this index. This is basically called as BUCKET , so if element is inserted at Entry[0], you say that it falls under bucket 0.
Now assume that the hashFunction returns you same index say 0, for another object that you wanted to insert as a key in the map. This is where equals method is called and if even equals returns true, it simple means that there is a hashCollision. So under this case, since Entry is a linkedlist implmentation, on this index itself, on the already available entry at this index, you add one more node(Entry) to this linkedlist. So bottomline, on hashColission, there are more than one elements at a perticular index through linkedlist.
The same case is applied when you are talking about getting a key from map. Based on index returned by hashFunction, if there is only one entry, that entry is returned otherwise on linkedlist of entries, equals method is called.
Hope this helps with the internals of how it works :)
Hash values in Java are provided by objects through the implementation of public int hashCode() which is declared in Object class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.
A note: implementing that method requires also to have public boolean equals(Object o) implemented in a consistent manner.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
A HashMap is a form of hash table (and HashTable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMaps key type. Depending on how you want you keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and HashTable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)
A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
hashcode = (..((field1.hashcode * prime) + field2.hashcode) * prime + ...)
where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
When you implement the hashcode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:
1. O1.equals(o2) => o1.hashcode() == o2.hashcode()
2. o2.equals(o2) == o2.equals(o1)
3. The hashcode of an object doesn't change while it is a key in a hash table.
It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
"But where is the hash values stored then? It is not a part of the HashMap, so is there an array assosiated to the HashMap?"
The hash values are typically not stored. Rather they are calculated as required.
In the case of the HashMap class, the hashcode for each key is actually cached in the entry's Node.hash field. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.
This is the most fundamental contract in Java: the .equals()/.hashCode() contract.
The most important part of it here is that two objects which are considered .equals() should return the same .hashCode().
The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following .hashCode() implementation, which, while perfectly legal, is as broken an implementation as can exist:
#Override
public int hashCode() { return 42; } // legal!!
While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
Now: the Set contract stipulates that a Set should not contain duplicate elements; however, the strategy of a Set implementation is left... Well, to the implementation. You will notice, if you look at the javadoc of Map, that its keys can be retrieved by a method called .keySet(). Therefore, Map and Set are very closely related in this regard.
If we take the case of a HashSet (and, ultimately, HashMap), it relies on .equals() and .hashCode(): when adding an item, it first calculates this item's hash code, and according to this hash code, attemps to insert the item into a given bucket. In contrast, a TreeSet (and TreeMap) relies on the natural ordering of elements (see Comparable).
However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken, .hashCode() implementation above), then .equals() is used to determine whether that object is really unique.
Note that, internally, a HashSet is a HashMap...
Hashing is a way to assign a unique code for any variable/object after applying any function/algorithm on its properties.
HashMap stores key-value pair in Map.Entry static nested class implementation.
HashMap works on hashing algorithm and uses hashCode() and equals() method in put and get methods.
When we call put method by passing key-value pair, HashMap uses Key hashCode() with hashing to find out
the index to store the key-value pair. The Entry is stored in the LinkedList, so if there are already
existing entry, it uses equals() method to check if the passed key already exists, if yes it overwrites
the value else it creates a new entry and store this key-value Entry.
When we call get method by passing Key, again it uses the hashCode() to find the index
in the array and then use equals() method to find the correct Entry and return it’s value.
Below image will explain these detail clearly.
The other important things to know about HashMap are capacity, load factor, threshold resizing.
HashMap initial default capacity is 16 and load factor is 0.75. Threshold is capacity multiplied
by load factor and whenever we try to add an entry, if map size is greater than threshold,
HashMap rehashes the contents of map into a new array with a larger capacity.
The capacity is always power of 2, so if you know that you need to store a large number of key-value pairs,
for example in caching data from database, it’s good idea to initialize the HashMap with correct capacity
and load factor.
I have two arraylists A1, A2. Each element of A1 would be the key and each element of A2 the corresponding value. So the solution that I found is looping on A1 ( A1 and A2 have the same size) making hashmap.add(A1[i],A2[i]) but is there a way to directly send the key value pairs as two sets? I want to avoid loops it is going to slow my code..Thank you in advance!
"I want to avoid loops it is going to slow my code." Anytime code does anything, it slows down your code. The key is to avoid doing things you don't need to do. Something needs to iterate through your lists.
You have a list of key value pairs. The only way to take that list and add it into the Hashmap is to iterate the list. If you had your key value pairs stored in some other type of Map object then you could use the Hashmap constructor HashMap(Map<? extends K,? extends V> m) but not with your ArrayList.
Don't fear the iteration in your code if it is the right approach. Remember Polya: Find a solution, then see if you can find a better solution.
If you happen to be able to control the order in which you insert the items into the arrays, and you can do so in sorted order, you may not need the HashMap after all. While the hashed lookup will exhibit amortized constant time, you can get lookup in O(log n) time with binary search over a sorted random-access sequence like an array. Function Arrays#binarySearch() allows you to figure out which element, if any, matches your key in the first array and, given that position, you could the access the corresponding value in the parallel array.
This approach is most beneficial when you build up the data only once and look up entries frequently, and don't do any subsequent adding or removing of entries.
Maybe you would be able to avoid the iteration to insert the data into arrays doing it directly into the hashmap.
Good luck!