Java HashSet contains duplicates if contained element is modified - java

Let's say you have a class and you create a HashSet which can store instances of this class. If you try to add instances which are equal, only one instance is kept in the collection, and that is fine.
However if you have two different instances in the HashSet, and you take one and make it an exact copy of the other (by copying the fields), the HashSet will then contain two duplicate instances.
Here is the code which demonstrates this:
public static void main(String[] args)
{
HashSet<GraphEdge> set = new HashSet<>();
GraphEdge edge1 = new GraphEdge(1, "a");
GraphEdge edge2 = new GraphEdge(2, "b");
GraphEdge edge3 = new GraphEdge(3, "c");
set.add(edge1);
set.add(edge2);
set.add(edge3);
edge2.setId(1);
edge2.setName("a");
for(GraphEdge edge: set)
{
System.out.println(edge.toString());
}
if(edge2.equals(edge1))
{
System.out.println("Equals");
}
else
{
System.out.println("Not Equals");
}
}
public class GraphEdge
{
private int id;
private String name;
//Constructor ...
//Getters & Setters...
public int hashCode()
{
int hash = 7;
hash = 47 * hash + this.id;
hash = 47 * hash + Objects.hashCode(this.name);
return hash;
}
public boolean equals(Object o)
{
if(o == this)
{
return true;
}
if(o instanceof GraphEdge)
{
GraphEdge anotherGraphEdge = (GraphEdge) o;
if(anotherGraphEdge.getId() == this.id && anotherGraphEdge.getName().equals(this.name))
{
return true;
}
}
return false;
}
}
The output from the above code:
1 a
1 a
3 c
Equals
Is there a way to force the HashSet to validate its contents so that possible duplicate entries created as in the above scenario get removed?
A possible solution could be to create a new HashSet and copy the contents from one HashSet to the other, so that the new HashSet won't contain duplicates. However, I don't like this solution.

The situation you describe is invalid. See the Javadoc: "The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set."

To add to @EJP's answer, what will happen in practice if you mutate objects in a HashSet to make them duplicates (in the sense of the equals / hashCode contract) is that the hash table data structure will break.
Depending on the exact details of the mutation and the state of the hash table, one or both of the instances will become invisible to lookup (e.g. contains and other operations), either because it is on the wrong hash chain or because the other instance appears before it on the hash chain. And it is hard to predict which instance will be visible ... and whether it will remain visible.
If you iterate the set, both instances will still be present ... in violation of the Set contract.
Of course, this is very broken from the application perspective.
You can avoid this problem by either:
using an immutable type for your set elements,
making a copy of the objects as you put them into the set and / or pull them out of the set,
writing your code so that it "knows" not to change the objects for the duration ...
From the perspective of correctness and robustness, the first option is clearly best.
Incidentally, it would be really difficult to "fix" this in a general way. There is no pervasive mechanism in Java for knowing ... or being notified ... that some element has changed. You can implement such a mechanism on a class by class basis, but it has to be coded explicitly (and it won't be cheap). Even if you did have such a mechanism, what would you do? Clearly one of the objects should now be removed from the set ... but which one?
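To make the failure mode concrete, here is a minimal sketch. Edge is a hypothetical stand-in for the question's GraphEdge (with public fields for brevity), not the asker's actual class. It shows that the set still reports two elements after one is mutated into a copy of the other, and that copying into a fresh HashSet re-hashes everything so the equal elements collapse. Which of the two instances survives the copy is not something this sketch controls.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutatedElementDemo {
    // Hypothetical mutable element, loosely modelled on GraphEdge from the question.
    static final class Edge {
        int id;
        String name;

        Edge(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Edge)) return false;
            Edge other = (Edge) o;
            return id == other.id && Objects.equals(name, other.name);
        }
    }

    // Builds a set of two distinct edges, then mutates the second into a copy of the first.
    static Set<Edge> buildBrokenSet() {
        Set<Edge> set = new HashSet<>();
        Edge edge1 = new Edge(1, "a");
        Edge edge2 = new Edge(2, "b");
        set.add(edge1);
        set.add(edge2);
        edge2.id = 1;
        edge2.name = "a"; // edge2 now equals edge1, but the set is not notified
        return set;
    }

    public static void main(String[] args) {
        Set<Edge> broken = buildBrokenSet();
        System.out.println("size after mutation: " + broken.size());
        // Copying re-hashes every element, so equal elements collapse into one.
        Set<Edge> rebuilt = new HashSet<>(broken);
        System.out.println("size after rebuild: " + rebuilt.size());
    }
}
```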

You are correct and I don't think there is any way to protect against the case you discuss. All collections which use hashing and equals are subject to this problem. The collection has no notification that the object has changed since it was added to the collection. I think the solution you outline is good.
If you are so concerned with this issue, perhaps you need to rethink your data structures. You could use immutable objects for instance. With immutable objects you would not have this problem.

HashSet is not aware of its member's properties changing after the object has been added. If this is a problem for you, then you may want to consider making GraphEdge immutable. For example:
GraphEdge edge4 = edge2.changeName("new_name");
In the case where GraphEdge is immutable, changing a value results in a new instance being returned rather than the existing instance being changed.
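A minimal sketch of what such an immutable type could look like. The class name ImmutableGraphEdge, its accessors, and the remove-then-add replacement step are assumptions made for illustration; only the changeName idea comes from the answer above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical immutable variant of GraphEdge; names are assumptions.
final class ImmutableGraphEdge {
    private final int id;
    private final String name;

    ImmutableGraphEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }

    String getName() { return name; }

    // "Changing" the name returns a new instance; this one is left untouched.
    ImmutableGraphEdge changeName(String newName) {
        return new ImmutableGraphEdge(id, newName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableGraphEdge)) return false;
        ImmutableGraphEdge other = (ImmutableGraphEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public String toString() {
        return id + " " + name;
    }
}

class ImmutableGraphEdgeDemo {
    static Set<ImmutableGraphEdge> demo() {
        Set<ImmutableGraphEdge> set = new HashSet<>();
        ImmutableGraphEdge edge1 = new ImmutableGraphEdge(1, "a");
        ImmutableGraphEdge edge2 = new ImmutableGraphEdge(2, "b");
        set.add(edge1);
        set.add(edge2);
        // To "update" an element, swap the old instance for the new one explicitly.
        ImmutableGraphEdge edge4 = edge2.changeName("new_name");
        set.remove(edge2);
        set.add(edge4);
        return set;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because the set only ever holds instances whose hash code cannot change, the explicit remove and add are the only way its contents can be altered, and the set stays consistent.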

Here is a method that can be used to print the elements of a LinkedList of String objects, without any duplicate elements. The method takes a LinkedList object as an input, and then creates a new HashSet object. The method then iterates through the elements of the input LinkedList, and adds each element to the HashSet. Since a HashSet does not allow duplicate elements, this ensures that only unique elements are added to the HashSet.
Then, the method iterates through the HashSet and prints each element to the console, separated by a space. Unlike the printList method, this method does not add any newlines before or after the list of elements. It simply prints the string "Non-duplicates are: " followed by the elements of the HashSet.
public static void printSetList(LinkedList<String> list) {
Set<String> hashSet = new HashSet<>();
for (String v : list) {
hashSet.add(v);
}
System.out.print("Non-duplicates are: ");
for (String v : hashSet) {
System.out.print(v + " ");
}
}

java.util.Objects.hashCode takes a single object and returns its hash code (or 0 for null); the method meant for generating a hash code from several values is Objects.hash. You are using Objects.hashCode as one step of a hand-written hash code calculation.
Try replacing your implementation of hashCode with the following:
public int hashCode()
{
return Objects.hash(this.id, this.name);
}

You will need to do the unique detection at the time you iterate your list. Making a new HashSet might not seem the right way to go, but why not try this... And maybe not use a HashSet to start with...
public class TestIterator {
public static void main(String[] args) {
List<String> list = new ArrayList<String>();
list.add("1");
list.add("1");
list.add("2");
list.add("3");
for (String s : new UniqueIterator<String>(list)) {
System.out.println(s);
}
}
}
public class UniqueIterator<T> implements Iterable<T> {
private Set<T> hashSet = new HashSet<T>();
public UniqueIterator(Iterable<T> iterable) {
for (T t : iterable) {
hashSet.add(t);
}
}
public Iterator<T> iterator() {
return hashSet.iterator();
}
}

Related

Accessing an element in a Set [duplicate]

Why doesn't Set provide an operation to get an element that equals another element?
Set<Foo> set = ...;
...
Foo foo = new Foo(1, 2, 3);
Foo bar = set.get(foo); // get the Foo element from the Set that equals foo
I can ask whether the Set contains an element equal to bar, so why can't I get that element? :(
To clarify, the equals method is overridden, but it only checks one of the fields, not all. So two Foo objects that are considered equal can actually have different values, that's why I can't just use foo.
To answer the precise question "Why doesn't Set provide an operation to get an element that equals another element?", the answer would be: because the designers of the collection framework were not very forward looking. They didn't anticipate your very legitimate use case, naively tried to "model the mathematical set abstraction" (from the javadoc) and simply forgot to add the useful get() method.
Now to the implied question "how do you get the element then": I think the best solution is to use a Map<E,E> instead of a Set<E>, to map the elements to themselves. In that way, you can efficiently retrieve an element from the "set", because the get() method of the Map will find the element using an efficient hash table or tree algorithm. If you wanted, you could write your own implementation of Set that offers the additional get() method, encapsulating the Map.
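A minimal sketch of the Map<E,E> idea, assuming a hypothetical Foo whose equals and hashCode look only at a key field, as in the question. The payload field stands for the state that differs between equal objects.

```java
import java.util.HashMap;
import java.util.Map;

class SelfMapDemo {
    // Hypothetical element: equality is based on 'key' only, 'payload' is extra state.
    static final class Foo {
        final int key;
        final String payload;

        Foo(int key, String payload) {
            this.key = key;
            this.payload = payload;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(key);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Foo && ((Foo) o).key == key;
        }
    }

    static String lookupPayload() {
        // Each element is mapped to itself, so get() returns the stored instance.
        Map<Foo, Foo> elements = new HashMap<>();
        Foo stored = new Foo(1, "stored");
        elements.put(stored, stored);

        Foo query = new Foo(1, "query"); // equal to 'stored', but different state
        Foo found = elements.get(query);
        return found.payload;
    }

    public static void main(String[] args) {
        System.out.println(lookupPayload());
    }
}
```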
The following answers are in my opinion bad or wrong:
"You don't need to get the element, because you already have an equal object": the assertion is wrong, as you already showed in the question. Two objects that are equal still can have different state that is not relevant to the object equality. The goal is to get access to this state of the element contained in the Set, not the state of the object used as a "query".
"You have no other option but to use the iterator": that is a linear search over a collection which is totally inefficient for large sets (ironically, internally the Set is organized as hash map or tree that could be queried efficiently). Don't do it! I have seen severe performance problems in real-life systems by using that approach. In my opinion what is terrible about the missing get() method is not so much that it is a bit cumbersome to work around it, but that most programmers will use the linear search approach without thinking of the implications.
There would be no point in getting the element if it is equal. A Map is better suited for this use case.
If you still want to find the element you have no other option but to use the iterator:
public static void main(String[] args) {
Set<Foo> set = new HashSet<Foo>();
set.add(new Foo("Hello"));
for (Iterator<Foo> it = set.iterator(); it.hasNext(); ) {
Foo f = it.next();
if (f.equals(new Foo("Hello")))
System.out.println("foo found");
}
}
static class Foo {
String string;
Foo(String string) {
this.string = string;
}
@Override
public int hashCode() {
return string.hashCode();
}
@Override
public boolean equals(Object obj) {
return string.equals(((Foo) obj).string);
}
}
If you have an equal object, why do you need the one from the set? If it is "equal" only by a key, a Map would be a better choice.
Anyway, the following will do it:
Foo getEqual(Foo sample, Set<Foo> all) {
for (Foo one : all) {
if (one.equals(sample)) {
return one;
}
}
return null;
}
With Java 8 this can become a one liner:
return all.stream().filter(sample::equals).findAny().orElse(null);
Default Set in Java is, unfortunately, not designed to provide a "get" operation, as jschreiner accurately explained.
The solutions of using an iterator to find the element of interest (suggested by dacwe) or to remove the element and re-add it with its values updated (suggested by KyleM), could work, but can be very inefficient.
Overriding the implementation of equals so that non-equal objects are "equal", as stated correctly by David Ogren, can easily cause maintenance problems.
And using a Map as an explicit replacement (as suggested by many), imho, makes the code less elegant.
If the goal is to get access to the original instance of the element contained in the set (hope I understood correctly your use case), here is another possible solution.
I personally had your same need while developing a client-server videogame with Java. In my case, each client had copies of the components stored in the server and the problem was whenever a client needed to modify an object of the server.
Passing an object through the internet meant that the client had different instances of that object anyway. In order to match this "copied" instance with the original one, I decided to use Java UUIDs.
So I created an abstract class UniqueItem, which automatically gives a random unique id to each instance of its subclasses.
This UUID is shared between the client and the server instance, so this way it could be easy to match them by simply using a Map.
However, directly using a Map in a similar use case was still inelegant. Someone might argue that using a Map might be more complicated to maintain and handle.
For these reasons I implemented a library called MagicSet, that makes the usage of a Map "transparent" to the developer.
https://github.com/ricpacca/magicset
Like the original Java HashSet, a MagicHashSet (which is one of the implementations of MagicSet provided in the library) uses a backing HashMap, but instead of having elements as keys and a dummy value as values, it uses the UUID of the element as key and the element itself as value. This does not cause overhead in the memory use compared to a normal HashSet.
Moreover, a MagicSet can be used exactly as a Set, but with some more methods providing additional functionalities, like getFromId(), popFromId(), removeFromId(), etc.
The only requirement to use it is that any element that you want to store in a MagicSet needs to extend the abstract class UniqueItem.
Here is a code example that retrieves the original instance of a city from a MagicSet, given another instance of that city with the same UUID (or even just its UUID).
class City extends UniqueItem {
// Somewhere in this class
public void doSomething() {
// Whatever
}
}
public class GameMap {
private MagicSet<City> cities;
public GameMap(Collection<City> cities) {
this.cities = new MagicHashSet<>(cities);
}
/*
* cityId is the UUID of the city you want to retrieve.
* If you have a copied instance of that city, you can simply
* call copiedCity.getId() and pass the return value to this method.
*/
public void doSomethingInCity(UUID cityId) {
City city = cities.getFromId(cityId);
city.doSomething();
}
// Other methods can be called on a MagicSet too
}
If your set is in fact a NavigableSet<Foo> (such as a TreeSet), and Foo implements Comparable<Foo>, you can use
Foo bar = set.floor(foo); // or .ceiling
if (foo.equals(bar)) {
// use bar…
}
(Thanks to @eliran-malka’s comment for the hint.)
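A small self-contained sketch of the floor() lookup, assuming a hypothetical Foo that is ordered and compared by a key field only. The equality check after floor() matters, because floor() returns the greatest element less than or equal to the query, which may be a different element.

```java
import java.util.TreeSet;

class FloorLookupDemo {
    // Hypothetical element ordered by 'key' only; 'payload' is the state we want back.
    static final class Foo implements Comparable<Foo> {
        final int key;
        final String payload;

        Foo(int key, String payload) {
            this.key = key;
            this.payload = payload;
        }

        @Override
        public int compareTo(Foo other) {
            return Integer.compare(key, other.key);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Foo && ((Foo) o).key == key;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(key);
        }
    }

    // Returns the payload of the stored element equal to the query, or null if none.
    static String lookup(int key) {
        TreeSet<Foo> set = new TreeSet<>();
        set.add(new Foo(1, "one"));
        set.add(new Foo(3, "three"));

        Foo query = new Foo(key, "query");
        Foo bar = set.floor(query); // may be null or a smaller, non-equal element
        return query.equals(bar) ? bar.payload : null;
    }

    public static void main(String[] args) {
        System.out.println(lookup(3));
        System.out.println(lookup(2));
    }
}
```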
With Java 8 you can do:
Foo foo = set.stream().filter(item->item.equals(theItemYouAreLookingFor)).findFirst().get();
But be careful: .get() throws a NoSuchElementException if nothing matches. Alternatively, you can work with the Optional directly.
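As a sketch of the Optional-based variant, the helper below (a hypothetical find method, not part of any library) returns the Optional itself and lets the caller decide what to do when nothing matches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

class OptionalLookupDemo {
    // Returns the stored element equal to 'wanted', or an empty Optional.
    static <T> Optional<T> find(Set<T> set, T wanted) {
        return set.stream().filter(item -> item.equals(wanted)).findFirst();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "c"));
        // orElse supplies a fallback instead of throwing NoSuchElementException
        System.out.println(find(set, "b").orElse("not found"));
        System.out.println(find(set, "z").orElse("not found"));
    }
}
```

This is still a linear scan over the set, so the performance caveat raised in the other answers applies.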
Convert set to list, and then use get method of list
Set<Foo> set = ...;
List<Foo> list = new ArrayList<Foo>(set);
Foo obj = list.get(0);
Why:
It seems that Set plays a useful role in providing a means of comparison. It is designed not to store duplicate elements.
Because of this intention/design, if one were to get() a reference to the stored object, then mutate it, it is possible that the design intentions of Set could be thwarted and could cause unexpected behavior.
From the JavaDocs
Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.
How:
Now that Streams have been introduced one can do the following
mySet.stream()
.filter(object -> object.property.equals(myProperty))
.findFirst().get();
Object objectToGet = ...
Map<Object, Object> map = new HashMap<Object, Object>(set.size());
for (Object o : set) {
map.put(o, o);
}
Object objectFromSet = map.get(objectToGet);
If you only do one get, this will not perform very well, because you loop over all your elements to build the map; but when performing multiple retrievals on a big set you will notice the difference.
You can use the Iterator class:
import java.util.Iterator;
import java.util.HashSet;
public class MyClass {
public static void main(String[ ] args) {
HashSet<String> animals = new HashSet<String>();
animals.add("fox");
animals.add("cat");
animals.add("dog");
animals.add("rabbit");
Iterator<String> it = animals.iterator();
while(it.hasNext()) {
String value = it.next();
System.out.println(value);
}
}
}
It looks like the proper object to use is the Interner from Guava:
Provides equivalent behavior to String.intern() for other immutable
types. Common implementations are available from the Interners
class.
It also has a few very interesting levers, like concurrencyLevel, or the type of references used (it might be worth noting that it doesn't offer a SoftInterner which I could see as more useful than a WeakInterner).
I know, this has been asked and answered long ago, however if anyone is interested, here is my solution - custom set class backed by HashMap:
http://pastebin.com/Qv6S91n9
You can easily implement all other Set methods.
Been there done that!! If you are using Guava a quick way to convert it to a map is:
Map<Integer,Foo> map = Maps.uniqueIndex(fooSet, Foo::getKey);
If you want the nth element from a HashSet, you can go with the solution below.
Here I have added objects of ModelClass to the HashSet.
ModelClass m1 = null;
int nth=scanner.nextInt();
// iterator over the set; assumes hashset1 holds ModelClass objects
Iterator<ModelClass> itr = hashset1.iterator();
for(int index=0;index<hashset1.size();index++){
m1 = itr.next();
if(nth == index) {
System.out.println(m1);
break;
}
}
If you look at the first few lines of the implementation of java.util.HashSet you will see:
public class HashSet<E>
....
private transient HashMap<E,Object> map;
So HashSet uses HashMap internally anyway, which means that if you just use a HashMap directly and use the same value as the key and the value you will get the effect you want and save yourself some memory.
Because any particular implementation of Set may or may not be random access.
You can always get an iterator and step through the Set, using the iterator's next() method to return the result you want once you find the equal element. This works regardless of the implementation. If the implementation is NOT random access (picture a linked-list backed Set), a get(E element) method in the interface would be deceptive: it would have to iterate the collection to find the element to return, while the name get(E element) would seem to imply that this is not necessary and that the Set could jump directly to the element to get.
contains() may or may not have to do the same thing, of course, depending on the implementation, but the name doesn't seem to lend itself to the same sort of misunderstandings.
Yes, use HashMap ... but in a specialised way: the trap I foresee in trying to use a HashMap as a pseudo-Set is the possible confusion between "actual" elements of the Map/Set, and "candidate" elements, i.e. elements used to test whether an equal element is already present. This is far from foolproof, but nudges you away from the trap:
class SelfMappingHashMap<V> extends HashMap<V, V>{
@Override
public String toString(){
// otherwise you get lots of "... object1=object1, object2=object2..." stuff
return keySet().toString();
}
@Override
public V get( Object key ){
throw new UnsupportedOperationException( "use tryToGetRealFromCandidate()");
}
@Override
public V put( V key, V value ){
// thorny issue here: if you were inadvertently to `put`
// a "candidate instance" with the element already in the `Map/Set`:
// these will obviously be considered equivalent
assert key.equals( value );
return super.put( key, value );
}
public V tryToGetRealFromCandidate( V key ){
return super.get(key);
}
}
Then do this:
SelfMappingHashMap<SomeClass> selfMap = new SelfMappingHashMap<SomeClass>();
...
SomeClass candidate = new SomeClass();
if( selfMap.containsKey( candidate ) ){
SomeClass realThing = selfMap.tryToGetRealFromCandidate( candidate );
...
realThing.useInSomeWay()...
}
But... you now want the candidate to self-destruct in some way unless the programmer actually immediately puts it in the Map/Set... you'd want contains to "taint" the candidate so that any use of it unless it joins the Map makes it "anathema". Perhaps you could make SomeClass implement a new Taintable interface.
A more satisfactory solution is a GettableSet, as below. However, for this to work you have either to be in charge of the design of SomeClass in order to make all constructors non-visible (or... able and willing to design and use a wrapper class for it):
public interface NoVisibleConstructor {
// again, this is a "nudge" technique, in the sense that there is no known method of
// making an interface enforce "no visible constructor" in its implementing classes
// - of course when Java finally implements full multiple inheritance some reflection
// technique might be used...
NoVisibleConstructor addOrGetExisting( GettableSet<? extends NoVisibleConstructor> gettableSet );
};
public interface GettableSet<V extends NoVisibleConstructor> extends Set<V> {
V getGenuineFromImpostor( V impostor ); // see below for naming
}
Implementation:
public class GettableHashSet<V extends NoVisibleConstructor> implements GettableSet<V> {
private Map<V, V> map = new HashMap<V, V>();
@Override
public V getGenuineFromImpostor(V impostor ) {
return map.get( impostor );
}
@Override
public int size() {
return map.size();
}
@Override
public boolean contains(Object o) {
return map.containsKey( o );
}
@Override
public boolean add(V e) {
assert e != null;
V result = map.put( e, e );
// Set.add must return true only if the element was not already present
return result == null;
}
@Override
public boolean remove(Object o) {
V result = map.remove( o );
return result != null;
}
@Override
public boolean addAll(Collection<? extends V> c) {
// for example:
throw new UnsupportedOperationException();
}
@Override
public void clear() {
map.clear();
}
// implement the other methods from Set ...
}
Your NoVisibleConstructor classes then look like this:
class SomeClass implements NoVisibleConstructor {
private SomeClass( Object param1, Object param2 ){
// ...
}
static SomeClass getOrCreate( GettableSet<SomeClass> gettableSet, Object param1, Object param2 ) {
SomeClass candidate = new SomeClass( param1, param2 );
if (gettableSet.contains(candidate)) {
// obviously this then means that the candidate "fails" (or is revealed
// to be an "impostor" if you will). Return the existing element:
return gettableSet.getGenuineFromImpostor(candidate);
}
gettableSet.add( candidate );
return candidate;
}
@Override
public NoVisibleConstructor addOrGetExisting( GettableSet<? extends NoVisibleConstructor> gettableSet ){
// more elegant implementation-hiding: see below
}
}
PS one technical issue with such a NoVisibleConstructor class: it may be objected that such a class is inherently final, which may be undesirable. Actually you could always add a dummy parameterless protected constructor:
protected SomeClass(){
throw new UnsupportedOperationException();
}
... which would at least let a subclass compile. You'd then have to think about whether you need to include another getOrCreate() factory method in the subclass.
Final step is an abstract base class (NB "element" for a list, "member" for a set) like this for your set members (when possible - again, scope for using a wrapper class where the class is not under your control, or already has a base class, etc.), for maximum implementation-hiding:
public abstract class AbstractSetMember implements NoVisibleConstructor {
@Override
public NoVisibleConstructor
addOrGetExisting(GettableSet<? extends NoVisibleConstructor> gettableSet) {
AbstractSetMember member = this;
@SuppressWarnings("unchecked") // unavoidable!
GettableSet<AbstractSetMember> set = (GettableSet<AbstractSetMember>) gettableSet;
if (gettableSet.contains( member )) {
member = set.getGenuineFromImpostor( member );
cleanUpAfterFindingGenuine( set );
} else {
addNewToSet( set );
}
return member;
}
abstract public void addNewToSet(GettableSet<? extends AbstractSetMember> gettableSet );
abstract public void cleanUpAfterFindingGenuine(GettableSet<? extends AbstractSetMember> gettableSet );
}
... usage is fairly obvious (inside your SomeClass's static factory method):
SomeClass setMember = new SomeClass( param1, param2 ).addOrGetExisting( set );
The contract of the hash code makes clear that:
"If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
So your assumption:
"To clarify, the equals method is overridden, but it only checks one of
the fields, not all. So two Foo objects that are considered equal can
actually have different values, that's why I can't just use foo."
is wrong and you are breaking the contract. If we look at the "contains" method of Set interface, we have that:
boolean contains(Object o);
Returns true if this set contains the specified element. More
formally, returns true if and only if this set contains an element
"e" such that o==null ? e==null : o.equals(e)
To accomplish what you want, you can use a Map where you define the key and store your element with the key that defines how objects are different or equal to each other.
Here's what you can do if you have a NavigableSet (e.g. a TreeSet):
public static <E> E get(NavigableSet<E> set, E key) {
return set.tailSet(key, true).floor(key);
}
The things are slightly trickier for HashSet and its descendants like LinkedHashSet:
import java.util.*;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
public class Test {
private static final Field mapField;
private static final Method hashMethod;
private static final Method getNodeMethod;
private static final Field keyField;
static {
try {
mapField = HashSet.class.getDeclaredField("map");
mapField.setAccessible(true);
hashMethod = HashMap.class.getDeclaredMethod("hash", Object.class);
hashMethod.setAccessible(true);
getNodeMethod = HashMap.class.getDeclaredMethod("getNode",
Integer.TYPE, Object.class);
getNodeMethod.setAccessible(true);
keyField = Class.forName("java.util.HashMap$Node").getDeclaredField("key");
keyField.setAccessible(true);
} catch (ReflectiveOperationException e) {
throw new RuntimeException(e);
}
}
public static <E> E get(HashSet<E> set, E key) {
try {
Object map = mapField.get(set);
Object hash = hashMethod.invoke(null, key);
Object node = getNodeMethod.invoke(map, hash, key);
if (node == null)
return null;
@SuppressWarnings("unchecked")
E result = (E)keyField.get(node);
return result;
} catch (ReflectiveOperationException e) {
throw new RuntimeException(e);
}
}
public static <E> E get(NavigableSet<E> set, E key) {
return set.tailSet(key, true).floor(key);
}
public static void main(String[] args) {
HashSet<Integer> s = new HashSet<>();
// HashSet<Integer> s = new LinkedHashSet<>();
// TreeSet<Integer> s = new TreeSet<>();
for (int i = 0; i < 100_000; i++)
s.add(i);
Integer key = java.awt.event.KeyEvent.VK_FIND;
Integer hidden = get(s, key);
System.out.println(key);
System.out.println(hidden);
System.out.println(key.equals(hidden));
System.out.println(key == hidden);
}
}
Quick helper method that might address this situation:
<T> T onlyItem(Collection<T> items) {
if (items.size() != 1)
throw new IllegalArgumentException("Collection must have single item; instead it has " + items.size());
return items.iterator().next();
}
Try using an array:
ObjectClass[] arrayName = setOfObjects.toArray(new ObjectClass[setOfObjects.size()]);
Following can be an approach
SharedPreferences se_get = getSharedPreferences("points",MODE_PRIVATE);
Set<String> main = se_get.getStringSet("mydata",null);
for(int jk = 0 ; jk < main.size();jk++)
{
Log.i("data",String.valueOf(main.toArray()[jk]));
}

HashSet turns unreliable when modifying a field of a contained object. Why/When or how should I use a HashSet?

When I edit an object which is contained within a HashSet, the hash of the object changes, but the HashSet is not updated internally. Therefore, I can effectively add the same object twice:
TestObject testObject = new TestObject(1, "hello");
Set<TestObject> set = new HashSet<>();
set.add(testObject);
testObject.number = 2;
set.add(testObject);
set.forEach(System.out::println);
//will print
//{number:2, string:hello}
//{number:2, string:hello}
Full working code example:
import java.util.*;
public class Main {
public static void main(String[] args) {
TestObject testObject = new TestObject(1, "hello");
Set<TestObject> set = new HashSet<>();
// add initial object
set.add(testObject);
// modify object
testObject.number = 2;
testObject.string = "Bye";
// re-add same object
set.add(testObject);
set.forEach(System.out::println);
}
}
class TestObject {
public int number;
public String string;
public TestObject(int number, String string) {
this.number = number;
this.string = string;
}
@Override
public int hashCode() {
return Objects.hash(number, string);
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof TestObject)) {
return false;
}
TestObject o = (TestObject) obj;
return number == o.number && string.equals(o.string);
}
@Override
public String toString() {
return "{number:" + number + ", string:" + string + "}";
}
}
This means that after modifying an object which is already contained in a HashSet, the HashSet becomes unreliable or invalid.
Modifying an object that is contained somewhere in a Set (probably even without knowing it) seems like a regular use case to me, and something I have probably already done a lot.
This throws me back and brings one basic question to me: When or why should I use a HashSet if it has such a behaviour?
Well, if you have a look at the HashSet source you'll see that it's basically a HashMap<E, Object> with the elements being the keys - and modifying keys of a hashmap is never a good idea. The map/set will not be updated if the hash would change, in fact the map/set wouldn't even know about that change.
In general keys of a HashMap or elements in a HashSet should be immutable in that their hash and equality doesn't change. In most cases the hash and equality are based on those object's (business) identity, so if number and string are both part of that object's identity then you shouldn't be able to change those.
Modifying an object that is contained somewhere in a Set (probably even without knowing it) seems like a regular use case to me, and something I have probably already done a lot.
It's probably true that objects contained in sets are modified quite often but that normally would mean that data that's not used to generate the hashcode or to check equality are modified. As an example let's say a person's hashcode is based on their ID number. That would mean that hashCode() and equals() should only be based on that number and that everything else could be modified safely.
So you could modify elements in a HashSet as long as you're not modifying their "id".
When or why should I use a HashSet if it has such a behaviour?
If you need to store mutable objects in a HashSet you have a few options which basically revolve around using only the immutable parts for hashCode() and equals(). For sets that could be done by using a wrapper object that provides a customized implementation for those methods. Alternatively you could extract one or more immutable properties and use those as the key into a map (in case of multiple properties you'd need to build some sort of key object out of those)
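A minimal sketch of the "only the immutable parts" approach, using a hypothetical Person whose identity is a final id. The name can change freely because it plays no part in hashCode() or equals().

```java
import java.util.HashSet;
import java.util.Set;

class IdentityKeyDemo {
    // Hypothetical element: 'id' is the identity, 'name' is ordinary mutable state.
    static final class Person {
        private final long id; // never changes, so the hash code never changes
        private String name;   // not used by equals/hashCode

        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }

        void setName(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Person && ((Person) o).id == id;
        }
    }

    // Returns true if the set still behaves correctly after the element is renamed.
    static boolean stillFoundAfterRename() {
        Set<Person> people = new HashSet<>();
        Person p = new Person(42L, "Alice");
        people.add(p);
        p.setName("Bob"); // safe: does not affect equals or hashCode
        boolean found = people.contains(p);
        boolean addedAgain = people.add(p); // false, already present
        return found && !addedAgain && people.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterRename());
    }
}
```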
You’re never supposed to compare strings with ==; use .equals instead.
Adding an element that is already present, as you said, won't override the element that is already in the HashSet. Call remove() before calling add() to ensure the new value is actually inserted.
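One way to read this advice, sketched under the assumption that the remove() happens before the fields are changed: at that point the object's hash code still matches the bucket it was stored in, so the removal succeeds. Item is a hypothetical stand-in for TestObject.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class RemoveMutateAddDemo {
    // Hypothetical mutable element whose hash code depends on both fields.
    static final class Item {
        int number;
        String string;

        Item(int number, String string) {
            this.number = number;
            this.string = string;
        }

        @Override
        public int hashCode() {
            return Objects.hash(number, string);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Item)) return false;
            Item other = (Item) o;
            return number == other.number && Objects.equals(string, other.string);
        }
    }

    static Set<Item> update() {
        Set<Item> set = new HashSet<>();
        Item item = new Item(1, "hello");
        set.add(item);

        set.remove(item); // remove while the hash code still matches the stored entry
        item.number = 2;  // mutate while the object is outside the set
        set.add(item);    // re-insert under the new hash code
        return set;
    }

    public static void main(String[] args) {
        System.out.println(update().size());
    }
}
```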
Side note: as some users have noted, pay attention to the Strings' comparisons in your test.

Is there a data structure that only stores hash codes and not the actual objects?

My use-case is that I'm looking for a data structure in Java that will let me see if an object with the same hash code is inside (by calling contains()), but I will never need to iterate through the elements or retrieve the actual objects. A HashSet is close, but from my understanding, it still contains references to the actual objects, and that would be a waste of memory since I won't ever need the contents of the actual objects. The best option I can think of is a HashSet of type Integer storing only the hash codes, but I'm wondering if there is a built-in data structure that would accomplish the same thing (and only accept one type as opposed to HashSet of type Integer which will accept the hash code of any object).
A Bloom filter can tell whether an object might be a member, or is definitely not a member. You can control the likelihood of false positives. Each hash value maps to a single bit.
The Guava library provides an implementation in Java.
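To illustrate the idea without a dependency, here is a deliberately tiny hand-rolled sketch built on java.util.BitSet. It is not Guava's BloomFilter; the class name TinyBloomFilter, the sizing parameters, and the way bit positions are derived from hashCode() are all assumptions chosen for brevity, and a real implementation would use stronger hash functions.

```java
import java.util.BitSet;

// Toy Bloom filter: may report false positives, never false negatives.
class TinyBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions set per element

    TinyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from the object's hash code.
    private int index(Object o, int i) {
        int h1 = o.hashCode();
        int h2 = (h1 >>> 16) | 1; // second, odd-valued hash derived from the first
        return Math.floorMod(h1 + i * h2, size); // floorMod keeps the index non-negative
    }

    void add(Object o) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(o, i));
        }
    }

    // false means "definitely not added"; true means "possibly added".
    boolean mightContain(Object o) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(o, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1024, 3);
        filter.add("apple");
        System.out.println(filter.mightContain("apple"));
    }
}
```

Note that two objects with the same hash code are indistinguishable to this sketch, which matches the use case in the question.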
You could use a primitive collection implementation like IntSet to store values of hash codes. Obviously as others have mentioned this assumes collisions aren't a problem.
If you want to track whether a hash code is already present, and to do so memory-efficiently, a BitSet may suit your requirements.
Look at the following example:
public static void main(String[] args) {
BitSet hashCodes = new BitSet();
hashCodes.set("1".hashCode());
System.out.println(hashCodes.get("1".hashCode())); // true
System.out.println(hashCodes.get("2".hashCode())); // false
}
The BitSet "implements a vector of bits that grows as needed.". It's a JDK "built-in data structure" which doesn't contain "references to the actual objects". It stores only if "the same hash code is inside".
EDIT:
As @Steve mentioned in his comment, the implementation of the BitSet isn't the most memory-efficient one. But there are more memory-efficient implementations of a bit set, though not built-in.
There is no such built-in data structure, because such a data structure is rarely needed. It's easy to build one, though.
import java.util.HashSet;

public class HashCodeSet<T> {
    private final HashSet<Integer> hashCodes;

    public HashCodeSet() {
        hashCodes = new HashSet<>();
    }

    public HashCodeSet(int initialCapacity) {
        hashCodes = new HashSet<>(initialCapacity);
    }

    public HashCodeSet(HashCodeSet<? extends T> toCopy) {
        hashCodes = new HashSet<>(toCopy.hashCodes);
    }

    // Stores only the element's hash code, not a reference to the element
    public void add(T element) {
        hashCodes.add(element.hashCode());
    }

    // True if some added element had the same hash code (collisions included)
    public boolean containsHashCodeOf(T element) {
        return hashCodes.contains(element.hashCode());
    }

    @Override
    public boolean equals(Object o) {
        return o == this || o instanceof HashCodeSet &&
                ((HashCodeSet<?>) o).hashCodes.equals(hashCodes);
    }

    @Override
    public int hashCode() {
        return hashCodes.hashCode(); // hash-ception
    }

    @Override
    public String toString() {
        return hashCodes.toString();
    }
}

Why does the HashSet contain multiple identical objects? [duplicate]

Let's say you have a class and you create a HashSet which can store instances of this class. If you try to add instances which are equal, only one instance is kept in the collection, and that is fine.
However if you have two different instances in the HashSet, and you take one and make it an exact copy of the other (by copying the fields), the HashSet will then contain two duplicate instances.
Here is the code which demonstrates this:
public static void main(String[] args)
{
HashSet<GraphEdge> set = new HashSet<>();
GraphEdge edge1 = new GraphEdge(1, "a");
GraphEdge edge2 = new GraphEdge(2, "b");
GraphEdge edge3 = new GraphEdge(3, "c");
set.add(edge1);
set.add(edge2);
set.add(edge3);
edge2.setId(1);
edge2.setName("a");
for(GraphEdge edge: set)
{
System.out.println(edge.toString());
}
if(edge2.equals(edge1))
{
System.out.println("Equals");
}
else
{
System.out.println("Not Equals");
}
}
public class GraphEdge
{
private int id;
private String name;
//Constructor ...
//Getters & Setters...
public int hashCode()
{
int hash = 7;
hash = 47 * hash + this.id;
hash = 47 * hash + Objects.hashCode(this.name);
return hash;
}
public boolean equals(Object o)
{
if(o == this)
{
return true;
}
if(o instanceof GraphEdge)
{
GraphEdge anotherGraphEdge = (GraphEdge) o;
if(anotherGraphEdge.getId() == this.id && anotherGraphEdge.getName().equals(this.name))
{
return true;
}
}
return false;
}
}
The output from the above code:
1 a
1 a
3 c
Equals
Is there a way to force the HashSet to validate its contents so that possible duplicate entries created as in the above scenario get removed?
A possible solution could be to create a new HashSet and copy the contents from one HashSet to the other, so that the new HashSet won't contain duplicates. However, I don't like this solution.
The situation you describe is invalid. See the Javadoc: "The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set."
To add to @EJP's answer, what will happen in practice if you mutate objects in a HashSet to make them duplicates (in the sense of the equals / hashCode contract) is that the hash table data structure will break.
Depending on the exact details of the mutation and the state of the hash table, one or both of the instances will become invisible to lookup (e.g. contains and other operations), either because it is on the wrong hash chain or because the other instance appears before it on the hash chain. And it is hard to predict which instance will be visible ... and whether it will remain visible.
If you iterate the set, both instances will still be present ... in violation of the Set contract.
Of course, this is very broken from the application perspective.
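The "invisible to lookup" effect can be reproduced with a small sketch. MutableKey is a hypothetical class whose hashCode() depends on a mutable field; the printed results are what the standard OpenJDK HashSet gives for this particular case, since the Javadoc leaves the behaviour unspecified.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable element whose hashCode() and equals() depend on a mutable field.
class MutableKey {
    private int id;

    MutableKey(int id) { this.id = id; }
    void setId(int id) { this.id = id; }

    @Override
    public int hashCode() { return id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
}

class MutationDemo {
    public static void main(String[] args) {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(2);
        set.add(key);

        key.setId(5); // changes hashCode() while the object is in the set

        System.out.println(set.contains(key));            // false: lookup uses the new hash
        System.out.println(set.remove(key));              // false: same reason
        System.out.println(set.size());                   // 1: the element is still stored
        System.out.println(set.iterator().next() == key); // true: iteration still finds it
    }
}
```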
You can avoid this problem by either:
using an immutable type for your set elements,
making a copy of the objects as you put them into the set and / or pull them out of the set,
writing your code so that it "knows" not to change the objects for the duration ...
From the perspective of correctness and robustness, the first option is clearly best.
Incidentally, it would be really difficult to "fix" this in a general way. There is no pervasive mechanism in Java for knowing ... or being notified ... that some element has changed. You can implement such a mechanism on a class by class basis, but it has to be coded explicitly (and it won't be cheap). Even if you did have such a mechanism, what would you do? Clearly one of the objects should now be removed from the set ... but which one?
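If the elements have to stay mutable, one hand-coded discipline is to take an element out of the set before changing it and put it back afterwards, letting add() decide whether it has become a duplicate. The sketch below assumes you control every place where elements are modified; MutableEdge and SafeUpdate are hypothetical names, and here the changed object is the one that gets dropped, which is only one possible answer to "which one?".

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable element, similar in shape to the GraphEdge in the question.
class MutableEdge {
    private int id;
    private String name;

    MutableEdge(int id, String name) { this.id = id; this.name = name; }

    void set(int id, String name) { this.id = id; this.name = name; }

    @Override
    public int hashCode() { return Objects.hash(id, name); }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutableEdge)) {
            return false;
        }
        MutableEdge other = (MutableEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }
}

class SafeUpdate {
    // Removes the element, applies the change, then re-adds it.
    // Returns false if the changed element now equals another element,
    // in which case it is left out of the set.
    static boolean update(Set<MutableEdge> set, MutableEdge e, int newId, String newName) {
        if (!set.remove(e)) {
            throw new IllegalArgumentException("element is not in the set");
        }
        e.set(newId, newName);
        return set.add(e);
    }

    public static void main(String[] args) {
        Set<MutableEdge> set = new HashSet<>();
        MutableEdge edge1 = new MutableEdge(1, "a");
        MutableEdge edge2 = new MutableEdge(2, "b");
        set.add(edge1);
        set.add(edge2);
        System.out.println(update(set, edge2, 1, "a")); // false: edge2 became a duplicate of edge1
        System.out.println(set.size());                 // 1
    }
}
```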
You are correct, and I don't think there is any way to protect against the case you discuss. All collections that use hashing and equals are subject to this problem. The collection receives no notification that the object has changed since it was added. I think the solution you outline is good.
If you are so concerned with this issue, perhaps you need to rethink your data structures. You could use immutable objects for instance. With immutable objects you would not have this problem.
HashSet is not aware of its members' properties changing after the object has been added. If this is a problem for you, then you may want to consider making GraphEdge immutable. For example:
GraphEdge edge4 = edge2.changeName("new_name");
When GraphEdge is immutable, changing a value results in a new instance being returned rather than the existing instance being changed.
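A minimal sketch of what such an immutable GraphEdge could look like. Only changeName comes from the example above; the final fields, the getters and the use of Objects.hash are assumptions about how the rest of the class might be written.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch of an immutable GraphEdge: all fields are final and there are no setters.
final class GraphEdge {
    private final int id;
    private final String name;

    GraphEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }

    // Returns a new instance; the receiver is left unchanged.
    GraphEdge changeName(String newName) {
        return new GraphEdge(id, newName);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GraphEdge)) {
            return false;
        }
        GraphEdge other = (GraphEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() { return Objects.hash(id, name); }

    @Override
    public String toString() { return id + " " + name; }
}

class ImmutableEdgeDemo {
    public static void main(String[] args) {
        Set<GraphEdge> set = new HashSet<>();
        GraphEdge edge2 = new GraphEdge(2, "b");
        set.add(edge2);

        GraphEdge edge4 = edge2.changeName("new_name");
        System.out.println(set.contains(edge2)); // true: edge2 itself never changed
        System.out.println(set.contains(edge4)); // false: edge4 is a different value
    }
}
```

Because an element can no longer change after insertion, the set never has to be re-validated; replacing an element becomes an explicit remove followed by an add of the new instance.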
Here is a method that can be used to print the elements of a LinkedList of String objects without any duplicate elements. The method takes a LinkedList object as input and creates a new HashSet object. It then iterates through the elements of the input LinkedList and adds each element to the HashSet. Since a HashSet does not allow duplicate elements, this ensures that only unique elements are added to the HashSet.
Then, the method iterates through the HashSet and prints each element to the console, separated by a space. Unlike the printList method, this method does not add any newlines before or after the list of elements. It simply prints the string "Non-duplicates are: " followed by the elements of the HashSet.
public static void printSetList(LinkedList<String> list) {
Set<String> hashSet = new HashSet<>();
for (String v : list) {
hashSet.add(v);
}
System.out.print("Non-duplicates are: ");
for (String v : hashSet) {
System.out.print(v + " ");
}
}
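java.util.Objects.hashCode(Object) returns the hash code of a single object (or 0 for null). To build a hash code from several fields, java.util.Objects.hash(Object...) is more concise.
You could replace your implementation of hashCode with the following, although this only simplifies the code and does not change what happens when an element is mutated while it is in the set:
public int hashCode()
{
    return Objects.hash(this.id, this.name);
}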
You will need to do the uniqueness detection at the time you iterate your list. Making a new HashSet might not seem the right way to go, but why not try this... And maybe don't use a HashSet to start with...
public class TestIterator {
public static void main(String[] args) {
List<String> list = new ArrayList<String>();
list.add("1");
list.add("1");
list.add("2");
list.add("3");
for (String s : new UniqueIterator<String>(list)) {
System.out.println(s);
}
}
}
public class UniqueIterator<T> implements Iterable<T> {
private Set<T> hashSet = new HashSet<T>();
public UniqueIterator(Iterable<T> iterable) {
for (T t : iterable) {
hashSet.add(t);
}
}
public Iterator<T> iterator() {
return hashSet.iterator();
}
}

Remove what's common to both lists

How can I remove what's common to both lists, based on an object attribute? Below I am trying to remove all values from testList2 that contain the same str1 parameter as testList1.
I think I can override the equals method in the class that is being compared as equals method is used under the hood when using removeAll ?
testList1 & testList2 are of type ArrayList and both contain a List of Test objects.
testList1.removeAll(testList2);
public class Test{
private String str1;
private String str2;
public Test(String str1 , String str2){
this.str1 = str1;
this.str2 = str2;
}
public String getStr1() {
return str1;
}
public String getStr2() {
return str2;
}
public boolean equals(Object o){
Test t = (Test)o;
return this.getStr1().equalsIgnoreCase(t.getStr2());
}
}
Yes, overriding equals(...) should work with removeAll(...), since ArrayList will use that for equality checks.
Under the hood, the removeAll(...) method in AbstractCollection (which is a super class of ArrayList) will call contains(entry) on the collection that is passed to removeAll(...). contains(...) in ArrayList will then get the index of the element using indexOf(...) which in turn loops through all elements and calls equals(...) on those.
That said, the removeAll() implementation using two lists has O(n * m) complexity (it loops through the source list and, for each entry, loops through the parameter list), which might get quite slow for bigger lists.
Thus you might want to pass a set of the objects that you want removed to removeAll(...). With a HashSet, the loop over the source list remains, but each contains call takes expected constant time, so the whole operation is roughly O(n + m); with a TreeSet, each lookup is O(log(m)). Note that a HashSet only works correctly if hashCode() is overridden consistently with equals().
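As an illustration of passing a set to removeAll(...), here is a small sketch. Row is a hypothetical stand-in for the Test class from the question, with equality assumed to be based on str1 only (case-sensitive, to keep equals() and hashCode() trivially consistent).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for Test: two rows are equal when their str1 values are equal.
class Row {
    private final String str1;
    private final String str2;

    Row(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    String getStr1() { return str1; }
    String getStr2() { return str2; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Row && str1.equals(((Row) o).str1);
    }

    // Must agree with equals(): equal rows need equal hash codes.
    @Override
    public int hashCode() { return str1.hashCode(); }
}

class RemoveAllDemo {
    // Removes from target every element that equals some element of toRemove.
    static void removeCommon(List<Row> target, List<Row> toRemove) {
        target.removeAll(new HashSet<>(toRemove)); // hash lookups instead of list scans
    }

    public static void main(String[] args) {
        List<Row> list1 = new ArrayList<>(Arrays.asList(
                new Row("a", "1"), new Row("b", "2"), new Row("c", "3")));
        List<Row> list2 = new ArrayList<>(Arrays.asList(
                new Row("b", "x"), new Row("z", "y")));

        removeCommon(list1, list2);
        for (Row r : list1) {
            System.out.println(r.getStr1()); // prints a, then c
        }
    }
}
```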
If you want all objects from both lists without repeated ones (at least that was what I understood before your edits):
Set<Test> both = new HashSet<Test>();
both.addAll(testList1);
both.addAll(testList2);
//and if you really need to use a List instead of a Set
List<Test> result = new ArrayList<Test>(both);
Of course, you'll still have to override equals() so the collections can understand what you mean by it.
First of all equals() should determine whether two objects are logically equal. If these objects are logically equal when their str1 fields are equal then you may go with equals and use methods defined for the collections. In this case equals() contract (defined in the java.lang.Object) is worth reading.
If I were working with your code I would prefer if you solve your problem with iteration instead of defining incorrect equals() method (Warning: not tested code):
Set<String> strings = new HashSet<String>(listOne.size());
for(Test t : listOne){
strings.add(t.getStr1());
}
Iterator<Test> it = listTwo.iterator();
while(it.hasNext()){
Test t = it.next();
if(strings.contains(t.getStr1()))
it.remove();
}
I think I can override the equals method in the class that is being compared as equals method is used under the hood when using removeAll ?
// you need to compare the current values to the values in t
public boolean equals(Object o){
    if(!(o instanceof Test)){
        return false;
    }
    Test t = (Test)o;
    return this.getStr1().equalsIgnoreCase(t.getStr1())
        && this.getStr2().equalsIgnoreCase(t.getStr2());
}
I would also make the fields final if you can.
If str1 is present in both testList1 and testList2, you want to remove it from both lists, right?
// Collect the str1 values that occur in both lists
Set<String> common = new HashSet<String>();
for (Test t : testList1) {
    common.add(t.getStr1());
}
Set<String> inSecond = new HashSet<String>();
for (Test t : testList2) {
    inSecond.add(t.getStr1());
}
common.retainAll(inSecond);
// Remove every element whose str1 is common, from both lists
for (Iterator<Test> it = testList1.iterator(); it.hasNext(); ) {
    if (common.contains(it.next().getStr1())) it.remove();
}
for (Iterator<Test> it = testList2.iterator(); it.hasNext(); ) {
    if (common.contains(it.next().getStr1())) it.remove();
}
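If you want to override equals in the Test class, it must take a single Object parameter (and hashCode() should be overridden to match if the objects go into hash-based collections):
// This compares the current instance with the passed instance by str1
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof Test)) {
        return false;
    }
    return this.getStr1().equals(((Test) obj).getStr1());
}
Hope this helps you. Use whichever is convenient for you.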
