I am relatively new to Java and am puzzled about the following thing: I usually add objects to an ArrayList before setting its content. I.e.,
List<Bla> list = new ArrayList<>();
Bla bla = new Bla();
list.add(bla);
bla.setContent(); // content influences hashCode
This approach works great. I am concerned whether this approach will give me trouble when used with HashSets or HashMaps. The internal hash table gets set at the time the object is added. What will happen if setContent() gets called after the object was added to HashSet or HashMap (and its hashCode changes)?
Should I fully set the (hashCode influencing) content before adding / putting into HashSets or HashMaps? Is it generally recommended to finish building objects before adding them?
Thank you very much for your insights.
What will happen if setContent() gets called after the object was added to HashSet or HashMap (and its hashCode changes)?
Disaster.
Should I fully set the (hashCode influencing) content before adding / putting into HashSets or HashMaps? Is it generally recommended to finish building objects before adding them?
Yes.
The relevant line of documentation is on java.util.Set:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Generally speaking, this sort of error will manifest itself with elements being both "in" and "not in" your collection, with different methods disagreeing. You may get lucky and your elements may appear to still be in the collection, or they may not; this may happen essentially at random.
This is one of the many, many reasons why it's excellent practice for most of your objects to be immutable -- completely impossible to modify in the first place after construction.
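To make the "in" and "not in" effect concrete, here is a minimal sketch. The Bla class below is a hypothetical stand-in for the one in the question (its content field and setContent(String) signature are my assumptions). On the OpenJDK HashSet I would expect the two calls in main to print false and then true, but the documentation quoted above leaves the behavior unspecified, so don't rely on either result.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class MutatedElementDemo {
    // Hypothetical stand-in for the question's Bla: content drives equals() and hashCode().
    static final class Bla {
        private String content;

        void setContent(String content) { this.content = content; }

        @Override public boolean equals(Object o) {
            return o instanceof Bla && Objects.equals(content, ((Bla) o).content);
        }

        @Override public int hashCode() { return Objects.hashCode(content); }
    }

    // Adds an element, mutates it, then asks the set whether it contains it.
    static boolean containsAfterMutation() {
        Set<Bla> set = new HashSet<>();
        Bla bla = new Bla();
        set.add(bla);              // stored under the hash of the empty content
        bla.setContent("hello");   // hashCode changes while bla is in the set
        return set.contains(bla);  // the lookup uses the new hash
    }

    // Iteration does not go through the hash, so it still reaches the element.
    static boolean foundByIteration() {
        Set<Bla> set = new HashSet<>();
        Bla bla = new Bla();
        set.add(bla);
        bla.setContent("hello");
        for (Bla b : set) {
            if (b == bla) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsAfterMutation());
        System.out.println(foundByIteration());
    }
}
```

Setting the content before calling add(), or making Bla immutable, avoids the mismatch.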
I am learning Java now and I am learning about different kinds of collections, so far I learned about LinkedList, ArrayList and Array[].
Now I've been introduced to the hash types of collections, HashSet and HashMap, and I didn't quite understand why they are useful, because the list of operations that they support is quite limited. Also, they are ordered seemingly at random, and I need to override the equals and hashCode methods in order to make them work right with my class.
Now, what I don't understand is what benefit justifies the hassle of using these types instead of an ArrayList of a custom class.
I mean, what a Map does is connect 2 objects as 1, but wouldn't it just be better to create a class that contains these 2 objects as fields, and have getters and setters to use and modify them?
If the benefit is that these Hash objects can only contain 1 object of the same name, wouldn't it just be easier to make the ArrayList check that the type is not already there before adding it?
So far I learned to choose when to use LinkedList, ArrayList or Array[] by the rule of "if it's really simple, use Array[]; if it's a bit more complex, use ArrayList (for example to hold a collection of a certain class); and if the list is dynamic, with a lot of objects inside that need to change order as elements are removed or added in the middle, or if you go back and forth within the list, then use LinkedList".
But I couldn't understand when to prefer HashMap or HashSet, and I would be really glad if you could explain it to me.
Let me help you out here...
Hashed collections are the most efficient to add, search and remove data, since they hash the key (in HashMap) or the element (in HashSet) to find the place where they belong in a single step.
The concept of hashing is really simple. It is the process of representing an object as a number that can work as its id.
For example, if you have a string in Java like String name = "Jeremy";, and you print its hash code with System.out.println(name.hashCode());, you will see a big number (-2079637766) that was computed from the object's values (for a string, its characters). That number can then be used as an id for that object.
Hashed collections like the ones mentioned above use this number as an array index to find elements in no time. But that number is obviously too big to index a possibly small array, so they need to reduce it to fit the range of the array size. (HashMap and HashSet use arrays to store their elements.)
The operation they use to reduce that number is a remainder calculation, something like this: Math.abs(-2079637766 % arrayLength);.
It's not like that exactly, it's a bit more complex, but this is to simplify.
Let's say that arrayLength = 16;
The % operator reduces that big number to one whose magnitude is smaller than 16, so that it fits in the array.
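Here is that simplified reduction as a small runnable sketch. The method names simplifiedIndex and floorModIndex are mine, and this is only the simplification from the text above, not the code HashMap actually runs. Math.floorMod is shown as an alternative that never produces a negative result.

```java
final class BucketIndexDemo {
    // The simplified reduction from the text: fold a hash code into [0, arrayLength).
    static int simplifiedIndex(int hash, int arrayLength) {
        return Math.abs(hash % arrayLength);
    }

    // Alternative reduction: floorMod always returns a value in [0, arrayLength).
    static int floorModIndex(int hash, int arrayLength) {
        return Math.floorMod(hash, arrayLength);
    }

    public static void main(String[] args) {
        int hash = "Jeremy".hashCode();
        System.out.println(hash);                       // -2079637766
        System.out.println(simplifiedIndex(hash, 16));  // 6
        System.out.println(floorModIndex(hash, 16));    // 10
    }
}
```

The two reductions pick different slots for a negative hash, which is fine as long as a collection uses the same one consistently for storing and for looking up.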
That is why a hashed collection will not allow duplicates: if you try to add the same object or an equivalent one (like 2 strings with the same characters), it produces the same hashcode, lands on the same index, and is recognised through equals() as already present. A HashSet then leaves the existing element in place, and a HashMap overwrites the value stored for that key.
In your question, you mentioned that if we are worried about duplicate items in an ArrayList, we can just check whether the item is there before inserting it, so we don't need a HashSet. But that is not a good idea, because when you call list.contains(elem); on an ArrayList, it has to go through the elements one by one, comparing each, to see if it's there. If you have 1 million elements in the ArrayList and you check for an element that is not there, the ArrayList iterates over all 1 million elements, which is not good. A HashSet would only hash the object, go directly to where it is supposed to be in the array and check there, doing it in roughly 1 step instead of 1 million. So you see how efficient a HashSet is compared to an ArrayList.
The same happens with a HashMap of size 1 million: it takes roughly 1 step to check whether a key is there, not 1 million.
The same thing happens when you need to add, find or remove an element: the hashed collections do all of that in about a single step (constant time on average, independent of the size of the map), whereas the cost varies for other structures.
That's why it is really efficient and widely used.
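One way to see the difference without a stopwatch is to count how many equals() calls a failed lookup needs. The sketch below uses a hypothetical Key class with a static counter (all names are mine); the exact count for the HashSet depends on the implementation, so the code only reports it rather than promising a number.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ContainsCostDemo {
    // Hypothetical key type that counts how often equals() is invoked.
    static final class Key {
        static int equalsCalls = 0;
        final int id;

        Key(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    // equals() calls needed to find out that a key is absent from an ArrayList of n keys.
    static int listComparisons(int n) {
        List<Key> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(new Key(i));
        Key.equalsCalls = 0;
        list.contains(new Key(-1));   // compares against every element
        return Key.equalsCalls;
    }

    // equals() calls needed to find out that a key is absent from a HashSet of n keys.
    static int setComparisons(int n) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new Key(i));
        Key.equalsCalls = 0;
        set.contains(new Key(-1));    // only inspects the one slot the hash points to
        return Key.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("ArrayList: " + listComparisons(1000) + " equals() calls");
        System.out.println("HashSet:   " + setComparisons(1000) + " equals() calls");
    }
}
```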
Main Difference between an ArrayList and a LinkedList:
If you want to find the element at place 500 in an ArrayList of size 1000, you do: list.get(500); and it will do that in a single step, because an ArrayList is implemented with an array, so with that 500, it goes directly where the element is in the array.
But a LinkedList is not implemented with an array, but with objects (nodes) pointing to each other. To reach place 500 it has to walk the nodes one by one, counting from one end, which is not really efficient compared to the single step of the ArrayList.
But when you add or remove elements in an ArrayList, the elements after that position have to be shifted, and sometimes the backing array has to be recreated so that more elements fit in it, which increases the overhead.
That doesn't happen with a LinkedList: no array has to be recreated, and only the neighbouring nodes have to be re-referenced, which is a single step once you are at the right position.
So an ArrayList is good when you won't be adding or deleting a lot of elements, but you are going to read from it a lot.
If you are going to add and remove a lot of elements, a LinkedList can be the better choice, since it has less work to do for those operations.
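As a rough illustration of the two usage patterns (class and method names are mine): indexed reads on an ArrayList, and insertion through a ListIterator on a LinkedList, where the iterator is already standing at the position to change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

final class ListChoiceDemo {
    // Random access: the ArrayList goes straight to the slot in its backing array.
    static int readMiddle() {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) list.add(i);
        return list.get(500);
    }

    // Insertion while walking: the LinkedList only relinks the neighbouring nodes.
    static List<String> insertWhileIterating() {
        List<String> list = new LinkedList<>(Arrays.asList("a", "c"));
        ListIterator<String> it = list.listIterator();
        it.next();      // the cursor now sits between "a" and "c"
        it.add("b");    // links a new node in place, nothing is shifted
        return list;
    }

    public static void main(String[] args) {
        System.out.println(readMiddle());            // 500
        System.out.println(insertWhileIterating());  // [a, b, c]
    }
}
```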
Why do you need to implement the equals() and hashCode() methods for user-defined classes when you want to use those objects in HashMaps, and implement the Comparable interface when you want to use them with TreeMaps?
Based on what I mentioned earlier for HashMaps, it is possible for 2 different objects to produce the same hash. If that happens, Java will not overwrite or remove the previous one; it keeps both at the same index. That is why you need to implement hashCode(): so that equal objects always produce the same hash, and so that your objects don't have a really simple hashCode that is easily duplicated.
And the reason why it is recommended to override the equals() method is that if there is a collision (2 or more objects sharing the same hash in a HashMap), how do you tell them apart? By asking the equals() method whether they are the same. So if you ask the map whether it contains a certain key and it finds 3 elements at that index, it asks equals() whether each of those elements equals the key that was passed, and if so, it returns that one. If you don't override equals() properly and specify what you want to check for equality (like the properties name, age, etc.), the HashMap will treat keys that you consider the same as different (or the other way around), so you get unexpected duplicates or overwrites, and you will not like it.
If you create your own class, say Person, with properties like name, age, lastName and email, you can use those properties in the equals() method: if 2 different objects are passed but have the same values in the properties you selected for equality, you return true to indicate that they are the same, and false otherwise. It works like the String class: if you do s1.equals(s2); with s1 = new String("John"); and s2 = new String("John");, then even though they are different objects in Java heap memory, the implementation of String.equals uses the characters to determine whether the objects are equal, and it returns true for this example.
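A minimal sketch of such a class, assuming a Person with just name and age (the field choice is mine, for brevity). equals() and hashCode() are built from the same fields, so two equal persons always hash alike.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class PersonEqualityDemo {
    // Hypothetical Person: equality is defined by name and age.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return age == other.age && name.equals(other.name);
        }

        // Built from the same fields as equals(), so equal objects share a hash code.
        @Override public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    static int distinctCount() {
        Set<Person> people = new HashSet<>();
        people.add(new Person("John", 30));
        people.add(new Person("John", 30));  // equal to the first one, so not added again
        people.add(new Person("Jane", 30));
        return people.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());  // 2
    }
}
```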
To use a TreeMap with user-defined classes as keys, you need to implement the Comparable interface (or pass a Comparator to the TreeMap constructor). Since the TreeMap compares and sorts the objects based on some properties, you need to specify which properties your objects will be sorted by. Will they be sorted by age? By name? By id? Or by any other property you like? Then, when you implement the Comparable interface and override the compareTo(UserDefinedClass o) method, you do your logic and return a positive number if the current object is greater than the o object passed, 0 if they are the same, and a negative number if the current object is smaller. That way, the TreeMap knows how to sort them, based on the number returned.
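For instance, a sketch of a Person sorted by age (the class, its fields and the example.com addresses are made up for illustration). Note that with this compareTo, two people of the same age count as the same key for the TreeMap.

```java
import java.util.TreeMap;

final class SortedPeopleDemo {
    // Hypothetical Person ordered by age only.
    static final class Person implements Comparable<Person> {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Negative if this is younger, 0 if same age, positive if older.
        @Override public int compareTo(Person o) {
            return Integer.compare(age, o.age);
        }
    }

    static String youngest() {
        TreeMap<Person, String> emails = new TreeMap<>();
        emails.put(new Person("Carol", 41), "carol@example.com");
        emails.put(new Person("Bob", 19), "bob@example.com");
        emails.put(new Person("Alice", 33), "alice@example.com");
        return emails.firstKey().name;  // keys are kept sorted by compareTo()
    }

    public static void main(String[] args) {
        System.out.println(youngest());  // Bob
    }
}
```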
First, HashSet. With a HashSet, you can easily find out whether it contains a given element. Say you have a set of people in your class and you want to ask whether a guy is in your class. You could make an array list of strings, but then to ask whether a guy is in your class you have to iterate through the whole list until you find him, which might be too slow for longer lists. If you use a HashSet instead, the operation is much faster: you calculate the hash of the searched string and go directly to the place for that hash, so you don't need to pass over so many elements to answer your question. Well, you could also build a workaround to make the ArrayList faster to access for this purpose, but HashSet gives you this ready-made.
And now HashMap. Imagine that you also want to store a score for each person. Now you can use a HashMap: you enter the name and you get the score in a short time, without the need to iterate through the whole data structure.
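In code, the two cases could look roughly like this (the names and scores are invented sample data):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ClassRosterDemo {
    // Membership question: is this person in the class?
    static boolean isInClass(String name) {
        Set<String> roster = new HashSet<>(Arrays.asList("Alice", "Bob", "Carol"));
        return roster.contains(name);
    }

    // Lookup question: what is this person's score? Returns null for unknown names.
    static Integer scoreOf(String name) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("Alice", 91);
        scores.put("Bob", 72);
        scores.put("Carol", 85);
        return scores.get(name);
    }

    public static void main(String[] args) {
        System.out.println(isInClass("Alice"));  // true
        System.out.println(isInClass("Zed"));    // false
        System.out.println(scoreOf("Bob"));      // 72
        System.out.println(scoreOf("Zed"));      // null
    }
}
```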
Does it make sense?
Concerning your question:
"But I couldn't understand when to prefer HashMap or HashSet, and I
would be really glad if you could explain it to me"
HashMap implements the Map interface. It maps a key (K) to a value (V) in constant time, and order doesn't matter, so you can put and retrieve data efficiently if you know the key.
HashSet implements the Set interface, but internally uses a HashMap. Its role is to be used as a set, meaning you're not supposed to retrieve an element; you (mostly) just check whether it is in the set or not.
In a HashMap you can have identical values (though not identical keys), while in a Set you can't have identical elements (because that is a defining property of a set).
Concerning this question :
If the benefit is that these Hash objects can only contain 1 object of the same name, wouldn't it just be easier to make the ArrayList check that the type is not already there before adding it?
When dealing with collections, you may base your choice of a particular one on the data representation, but also on how you want to access and store the data: How do you access it? Do you need to sort it? Because each implementation may have a different time complexity (https://en.wikipedia.org/wiki/Time_complexity), this becomes important.
Using the doc,
For ArrayList:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).
For HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So it's about the time complexity.
You may even choose a less typical collection for certain problems :).
This has little to do with Java specifically, and the choice depends mostly on performance requirements, but there's a fundamental difference that must be highlighted. Conceptually, Lists are types of collections that keep the order of insertion and may have duplicates, Sets are more like bags of items that have no specific order and no duplicates. Of course, different implementations may find a way around it (like a TreeSet).
First, let's check the difference between ArrayList and LinkedList. A linked list is a chain of nodes, where each node contains a value and a link to the next and previous nodes. This makes inserting an element a matter of linking in a new node (for example at the end of the list), which is a quick operation since the memory does not have to be contiguous, as long as each node keeps a reference to its neighbours. On the other hand, accessing a specific element requires traversing the list until it is found.
An array list, as the name implies, wraps an array. Accessing an element by its index is direct access, but because the memory it occupies is contiguous, inserting an element may mean shifting later elements or resizing the array to make room, which makes writes a bit heavier in this case.
A HashMap works like a dictionary, where for each key there's a value. The behavior of an insertion mostly depends on how the hashCode and equals functions of the object used as a key are implemented. If the hashCode of two keys is the same, there's a hash collision, so equals is used to decide whether it's the same key or not. If equals returns true, it's the same key, so the value is replaced. If not, the new entry is added to the collection. Reading and writing values comes down to calculating the hash of the key followed by direct access to the value, making both operations really quick, O(1) on average.
A hash set is pretty much like a hash map without the "values" part; thus, it follows the same rules regarding the implementation of the hashCode and equals operations for the added element.
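The collision rule for HashMap can be tried out with the strings "Aa" and "BB", which happen to share the hash code 2112. This is just a small sketch (the class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // "Aa" and "BB" are different strings with the same hash code.
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    // Same hash, but equals() says the keys differ: both entries are kept.
    static int sizeAfterCollidingPuts() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map.size();
    }

    // Same hash and equals() returns true: the value is replaced and put() returns the old one.
    static Integer replacedValue() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        return map.put("Aa", 3);
    }

    public static void main(String[] args) {
        System.out.println(sameHash());                // true
        System.out.println(sizeAfterCollidingPuts());  // 2
        System.out.println(replacedValue());           // 1
    }
}
```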
It might be handy to study a bit about the Big-O notation and complexity of algorithms. If you are starting with Java, I'd strongly recommend the book Effective Java, by Joshua Bloch.
Hope it helps you dig further.
I'd like to be able to determine whether I've encountered an object before - I have a graph implementation and I want to see if I've created a cycle, probably by iterating through the Node objects with Floyd's tortoise/hare algorithm.
But I want to avoid a linear search through my list of "seen" nodes each time. This would be great if I had a hash table for just keys. Can I somehow hash an object? Aren't java objects just references to places in memory anyway? I wonder how much of a problem collisions would be if so..
The simple answer is to create a HashSet and add each node to the set the first time you encounter it.
The only case where this won't work is if you've overridden hashCode() and equals(Object) for the node class to implement equality based on node contents (or whatever). Then you'll need to:
use the IdentityHashMap class, which uses == and System.identityHashCode rather than equals(Object) and hashCode(), or
build a hashtable yourself using your own flavour of object identity.
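As a sketch of the first option: Collections.newSetFromMap can turn an IdentityHashMap into an identity-based "seen" set. The Node class below is hypothetical (a single next link and content-based equals(), to show the case where a plain HashSet would be misleading); a real graph node would have several neighbours and need a proper traversal.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CycleCheckDemo {
    // Hypothetical node with one outgoing edge and content-based equality.
    static final class Node {
        final String label;
        Node next;

        Node(String label) { this.label = label; }

        @Override public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).label.equals(label);
        }

        @Override public int hashCode() { return label.hashCode(); }
    }

    // Follows next links and reports whether the same object is reached twice.
    static boolean hasCycle(Node start) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Node n = start; n != null; n = n.next) {
            if (!seen.add(n)) return true;  // add() returns false if this exact object was seen
        }
        return false;
    }

    // Two distinct nodes that are equal by content: not a cycle.
    static boolean chainWithEqualLabels() {
        Node a = new Node("x");
        Node b = new Node("x");
        a.next = b;
        return hasCycle(a);
    }

    // Two nodes pointing at each other: a real cycle.
    static boolean realLoop() {
        Node a = new Node("x");
        Node b = new Node("y");
        a.next = b;
        b.next = a;
        return hasCycle(a);
    }

    public static void main(String[] args) {
        System.out.println(chainWithEqualLabels());  // false
        System.out.println(realLoop());              // true
    }
}
```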
Aren't java objects just references to places in memory anyway?
Yes and no. Yes, the reference is represented by a memory address (on most JVMs). The problem is that 1) you can't get hold of the address, and 2) it can change when the GC relocates the object. This means that you can't use the object address as a hashcode.
The identityHashCode method deals with this by returning a value that is initially based on the memory address. If you then call identityHashCode again for the same object, you are guaranteed to get the same value as before ... even if the object has been relocated.
I wonder how much of a problem collisions would be if so..
The hash values produced by the identityHashCode method can collide. (That is, two distinct objects can have the same identity hashcode value.) Anything that uses these values has to deal with this. (The standard HashSet and IdentityHashMap classes take care of these collisions ... if you choose to use them.)
I'd like to be able to determine whether I've encountered an object
before
Use an IdentityHashMap. It is ideal for your job since it compares keys with == rather than equals().
Take a look at HashSet. Note that in order for objects to work with HashSet, they need to provide correct implementations of hashCode and equals methods of the java.lang.Object class.
You'll need to implement a hash function for your objects. This is done by overriding hashCode() defined in java.lang.Object. This method is used by HashMap, HashSet etc to store objects. In hashCode() it's up to you to calculate a hash for the object. Don't forget to also implement the equals()-method!
Take a look at Java collection framework (http://docs.oracle.com/javase/tutorial/collections/)
Am I correct in assuming that if you have an object that is contained inside a Java Set<> (or as a key in a Map<> for that matter), any fields that are used to determine identity or relation (via hashCode(), equals(), compareTo() etc.) cannot be changed without causing unspecified behavior for operations on the collection? (edit: as alluded to in this other question)
(In other words, these fields should either be immutable, or you should require the object to be removed from the collection, then changed, then reinserted.)
The reason I ask is that I was reading the Hibernate Annotations reference guide and it has an example where there is a HashSet<Toy> but the Toy class has fields name and serial that are mutable and are also used in the hashCode() calculation... a red flag went off in my head and I just wanted to make sure I understood the implications of it.
The javadoc for Set says
Note: Great care must be exercised if
mutable objects are used as set
elements. The behavior of a set is not
specified if the value of an object is
changed in a manner that affects
equals comparisons while the object is
an element in the set. A special case
of this prohibition is that it is not
permissible for a set to contain
itself as an element.
This simply means you can use mutable objects in a set, and even change them. You just should make sure the change doesn't impact the way the Set finds the items. For HashSet, that would require not changing the fields used for calculating hashCode().
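A small sketch of the safe case, using a hypothetical Toy whose equals() and hashCode() depend only on an unchanging serial field, while a separate note field (my invention for the example) is free to change:

```java
import java.util.HashSet;
import java.util.Set;

final class SafeMutationDemo {
    static final class Toy {
        private final String serial;  // identity: used by equals() and hashCode()
        private String note = "";     // mutable, deliberately ignored by equals()/hashCode()

        Toy(String serial) { this.serial = serial; }

        void setNote(String note) { this.note = note; }

        @Override public boolean equals(Object o) {
            return o instanceof Toy && ((Toy) o).serial.equals(serial);
        }

        @Override public int hashCode() { return serial.hashCode(); }
    }

    static boolean stillFoundAfterNoteChange() {
        Set<Toy> toys = new HashSet<>();
        Toy toy = new Toy("SN-1");
        toys.add(toy);
        toy.setNote("repainted");   // does not affect equals() or hashCode()
        return toys.contains(toy);  // the set can still locate the element
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterNoteChange());  // true
    }
}
```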
That is correct, it can cause some problems locating the map entry. Officially the behavior is undefined, so if you add it to a HashSet or as a key in a HashMap, you should not be changing it.
Yes, that will cause bad things to happen.
// Given that the Toy class has a mutable field called 'name' which is used
// in equals() and hashCode():
Set<Toy> toys = new HashSet<Toy>();
Toy toy = new Toy("Fire engine", ToyType.WHEELED_VEHICLE, Color.RED);
toys.add(toy);
System.out.println(toys.contains(toy)); // true
toy.setName("Fast truck");
System.out.println(toys.contains(toy)); // false
In a HashSet/HashMap, you could mutate a contained object in a way that changes the result of its compareTo() operation -- relative comparison isn't used to locate objects. But it'd be fatal inside a TreeSet/TreeMap.
You can also mutate objects that are inside an IdentityHashMap -- nothing other than object identity is used to locate contents.
Even though you can do these things with these qualifications, they make your code more fragile. What if someone wants to change to a TreeSet later, or add that mutable field to the hashCode/equality test?
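To illustrate the TreeSet case mentioned above, here is a sketch with a hypothetical Box class ordered by a mutable value. With the OpenJDK TreeSet I would expect the lookup to fail after the mutation, because the search walks toward where the new value would belong; the behavior isn't specified, so treat the printed result as an observation, not a guarantee.

```java
import java.util.TreeSet;

final class TreeSetMutationDemo {
    // Hypothetical element whose ordering depends on a mutable field.
    static final class Box implements Comparable<Box> {
        int value;

        Box(int value) { this.value = value; }

        @Override public int compareTo(Box o) {
            return Integer.compare(value, o.value);
        }
    }

    static boolean containsAfterMutation() {
        TreeSet<Box> set = new TreeSet<>();
        Box small = new Box(1);
        set.add(new Box(5));
        set.add(small);
        set.add(new Box(10));
        small.value = 20;            // ordering changes while the element is in the tree
        return set.contains(small);  // the search compares by the new value
    }

    public static void main(String[] args) {
        System.out.println(containsAfterMutation());
    }
}
```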
Note that I'm not actually doing anything with a database here, so ORM tools are probably not what I'm looking for.
I want to have some containers that each hold a number of objects, with all objects in one container being of the same class. The container should show some of the behaviour of a database table, namely:
allow one of the object's fields to be used as a unique key, i. e. other objects that have the same value in that field are not added to the container.
upon accepting a new object, the container should issue a numeric id that is returned to the caller of the insertion method.
Instead of throwing an error when a "duplicate entry" is being requested, the container should just skip insertion and return the key of the already existing object.
Now, I would write a generic container class that accepts objects which implement an interface to get the value of the key field and use a HashMap keyed with those values as the actual storage class. Is there a better approach using existing built-in classes? I was looking through HashSet and the like, but they didn't seem to fit.
None of the Collections classes will do what you need. You'll have to write your own!
P.S. You'll also need to decide whether your class will be thread-safe or not.
P.P.S. ConcurrentHashMap is close, but not exactly the same. If you can subclass or wrap it or wrap the objects that enter your map such that you're relying only on that class for thread-safety, you'll have an efficient and thread-safe implementation.
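For what it's worth, a hand-written version doesn't have to be big. Below is a minimal, non-thread-safe sketch of the container described in the question. The name KeyedContainer and its methods are hypothetical, and it takes a key-extractor Function instead of the interface the question mentions, which is just one possible design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical container; not thread-safe.
final class KeyedContainer<K, V> {
    private final Function<V, K> keyOf;                       // extracts the unique key field
    private final Map<K, Integer> idByKey = new HashMap<>();  // key -> issued id
    private final List<V> items = new ArrayList<>();          // list index doubles as the id

    KeyedContainer(Function<V, K> keyOf) {
        this.keyOf = keyOf;
    }

    // Inserts the item unless one with the same key exists; returns the id either way.
    int insert(V item) {
        K key = keyOf.apply(item);
        Integer existing = idByKey.get(key);
        if (existing != null) return existing;  // duplicate key: skip insertion
        int id = items.size();
        items.add(item);
        idByKey.put(key, id);
        return id;
    }

    V get(int id) { return items.get(id); }

    int size() { return items.size(); }

    public static void main(String[] args) {
        // Sample usage: strings keyed by their lower-case form.
        KeyedContainer<String, String> names = new KeyedContainer<>(s -> s.toLowerCase());
        System.out.println(names.insert("Alpha"));  // 0
        System.out.println(names.insert("Beta"));   // 1
        System.out.println(names.insert("ALPHA"));  // 0, same key as "Alpha"
        System.out.println(names.size());           // 2
    }
}
```

Removal is left out on purpose: with list indices as ids, deleting an item would need a different id scheme.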
You can simulate this behavior with a HashSet. If the objects you're adding to the collection have a field that you can use as a unique ID, just have that field returned by the object's hashCode() method (or use a calculated hash code value, either way should work), and make equals() compare that same field.
HashSet won't throw an error when you add a duplicate entry, it just returns false. You could wrap (or extend) HashSet so that your add method returns the unique ID that you want as a return value.
I was thinking you could do it with ArrayList, using the current location in the array as the "id", but that doesn't prevent you from making an insert at an existing location, plus when you insert at that location, it will move everything up. But you might base your own class on ArrayList, returning the current value of .size() after a .add.
Is there a reason why the object's hash code couldn't be used as a "numeric id"?
If not, then all you'd need to do is wrap a ConcurrentHashMap, return the object's hashCode, and use the putIfAbsent(K key, V value) method to ensure you don't add duplicates.
putIfAbsent also returns the existing value, so you could get its hashCode to return to your user.
See ConcurrentHashMap
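A quick sketch of what putIfAbsent returns (the key and values are placeholders). Keep in mind that two different objects can share a hash code, so whether hashCode is unique enough to serve as the id depends on your key type.

```java
import java.util.concurrent.ConcurrentHashMap;

final class PutIfAbsentDemo {
    // First insertion for a key: nothing was there, so putIfAbsent returns null.
    static String firstInsert() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        return map.putIfAbsent("key", "first");
    }

    // Second insertion for the same key: the map is unchanged and the existing value comes back.
    static String secondInsert() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        map.putIfAbsent("key", "first");
        return map.putIfAbsent("key", "second");
    }

    public static void main(String[] args) {
        System.out.println(firstInsert());   // null
        System.out.println(secondInsert());  // first
    }
}
```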