Remove what's common to both lists - java

How can I remove what's common to both lists based on an object attribute? Below I am trying to remove all values from testList2 that contain the same str1 parameter as testList1.
I think I can override the equals method in the class that is being compared, as the equals method is used under the hood by removeAll?
testList1 and testList2 are of type ArrayList and both contain Test objects.
testList1.removeAll(testList2);
public class Test {
    private String str1;
    private String str2;
    public Test(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }
    public String getStr1() {
        return str1;
    }
    public String getStr2() {
        return str2;
    }
    // Two Test objects count as equal when their str1 values match, ignoring case
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Test)) return false;
        Test t = (Test) o;
        return this.getStr1().equalsIgnoreCase(t.getStr1());
    }
}

Yes, overriding equals(...) should work with removeAll(...), since ArrayList uses it for equality checks.
Under the hood, removeAll(...) calls contains(entry) on the collection that is passed to it (depending on the JDK version, ArrayList either inherits removeAll(...) from its superclass AbstractCollection or provides its own implementation, but both rely on contains(...)). contains(...) in ArrayList then looks up the index of the element using indexOf(...), which in turn loops through all elements and calls equals(...) on them.
That makes it obvious that removeAll() with two lists has O(n²) complexity (loop through the source list and, for each entry, loop through the parameter list), which might get quite slow for bigger lists.
Thus you might want to pass a set of the objects that you want removed to removeAll(...). The loop over the source list remains, but with a HashSet the contains call is O(1) on average (provided hashCode() is overridden consistently with equals()), giving roughly O(n) overall. With a TreeSet the contains call is O(log(n)), giving O(n * log(n)).
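A minimal sketch of that suggestion, assuming equality is defined on str1 alone and ignores case, as in the question. The class Item (a stand-in for Test), its key field and the helper removeCommon are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the question's Test class.
class Item {
    private final String str1;
    private final String str2;
    private final String key; // assumption: str1 lower-cased is the identity

    Item(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
        this.key = str1.toLowerCase(Locale.ROOT);
    }

    String getStr1() { return str1; }
    String getStr2() { return str2; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Item)) return false;
        return key.equals(((Item) o).key);
    }

    // Derived from the same key as equals(), so hash lookups find equal items.
    @Override
    public int hashCode() {
        return key.hashCode();
    }
}

class RemoveAllSketch {
    // Returns a copy of 'source' without the elements that equal some element of 'toRemove'.
    static List<Item> removeCommon(List<Item> source, List<Item> toRemove) {
        List<Item> result = new ArrayList<>(source);
        Set<Item> lookup = new HashSet<>(toRemove); // contains() is O(1) on average
        result.removeAll(lookup);
        return result;
    }

    public static void main(String[] args) {
        List<Item> list1 = Arrays.asList(new Item("a", "x"), new Item("b", "y"), new Item("c", "z"));
        List<Item> list2 = Arrays.asList(new Item("A", "p"), new Item("c", "q"));
        for (Item item : removeCommon(list1, list2)) {
            System.out.println(item.getStr1() + " " + item.getStr2()); // prints "b y"
        }
    }
}
```

Working on a copy keeps the caller's list untouched; calling removeAll directly on the original list would work the same way if in-place removal is wanted.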

If you want all objects from both lists without repeated ones (at least that was what I understood before your edits):
Set<Test> both = new HashSet<Test>();
both.addAll(testList1);
both.addAll(testList2);
//and if you really need to use a List instead of a Set
List<Test> result = new ArrayList<Test>(both);
Of course, you'll still have to override equals() (and hashCode(), since a HashSet is involved) so the collections can understand what you mean by "repeated".

First of all, equals() should determine whether two objects are logically equal. If these objects are logically equal when their str1 fields are equal, then you may go with equals and use the methods defined for the collections. In this case the equals() contract (defined in java.lang.Object) is worth reading.
If I were working with your code, I would prefer that you solve your problem with iteration instead of defining an incorrect equals() method (warning: untested code):
// Collect the str1 values of the first list
Set<String> strings = new HashSet<String>(listOne.size());
for (Test t : listOne) {
    strings.add(t.getStr1());
}
// Remove every element of the second list whose str1 was seen in the first
Iterator<Test> it = listTwo.iterator();
while (it.hasNext()) {
    Test t = it.next();
    if (strings.contains(t.getStr1())) {
        it.remove();
    }
}

"I think I can override the equals method in the class that is being compared as equals method is used under the hood when using removeAll ?"
// you need to compare the current object's values to the values in t
@Override
public boolean equals(Object o) {
    if (!(o instanceof Test)) return false;
    Test t = (Test) o;
    return this.getStr1().equalsIgnoreCase(t.getStr1())
        && this.getStr2().equalsIgnoreCase(t.getStr2());
}
I would also make the fields final if you can.

If the same str1 is present in both testList1 and testList2, you want to remove it from both lists, right?
// Collect the str1 values that occur in both lists
Set<String> common = new HashSet<String>();
for (Test t : testList1) {
    common.add(t.getStr1());
}
Set<String> second = new HashSet<String>();
for (Test t : testList2) {
    second.add(t.getStr1());
}
common.retainAll(second);
// Remove the elements carrying one of those values from both lists
for (List<Test> list : Arrays.asList(testList1, testList2)) {
    for (Iterator<Test> it = list.iterator(); it.hasNext();) {
        if (common.contains(it.next().getStr1())) {
            it.remove();
        }
    }
}
If you want to override equals in the Test object instead:
// This will compare the current instance and the passed instance by str1
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof Test)) return false;
    return this.getStr1().equals(((Test) obj).getStr1());
}
Hope this helps you. Use whichever is convenient for you; both should work.

Related

HashSet turns unreliable when modifying a field of a contained object. Why/When or how should I use a HashSet?

When I edit an object that is contained in a HashSet, the hash of the object changes, but the HashSet is not updated internally. Therefore, I can effectively add the same object twice:
TestObject testObject = new TestObject(1, "hello");
Set<TestObject> set = new HashSet<>();
set.add(testObject);
testObject.number = 2;
set.add(testObject);
set.forEach(System.out::println);
//will print
//{number:2, string:hello}
//{number:2, string:hello}
Full working code example:
import java.util.*;
public class Main {
    public static void main(String[] args) {
        TestObject testObject = new TestObject(1, "hello");
        Set<TestObject> set = new HashSet<>();
        // add initial object
        set.add(testObject);
        // modify object
        testObject.number = 2;
        testObject.string = "Bye";
        // re-add same object
        set.add(testObject);
        set.forEach(System.out::println);
    }
}
class TestObject {
    public int number;
    public String string;
    public TestObject(int number, String string) {
        this.number = number;
        this.string = string;
    }
    @Override
    public int hashCode() {
        return Objects.hash(number, string);
    }
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof TestObject)) {
            return false;
        }
        TestObject o = (TestObject) obj;
        return number == o.number && string.equals(o.string);
    }
    @Override
    public String toString() {
        return "{number:" + number + ", string:" + string + "}";
    }
}
This means that after modifying an object which is already contained in a HashSet, the HashSet becomes unreliable or invalid.
Modifying an object that is contained somewhere in a Set (probably even without knowing it) seems like a regular use case to me, and something I have probably already done a lot.
This throws me back and raises one basic question: when or why should I use a HashSet if it behaves like this?

Well, if you have a look at the HashSet source you'll see that it's basically a HashMap<E, Object> with the elements being the keys - and modifying keys of a hashmap is never a good idea. The map/set will not be updated if the hash would change, in fact the map/set wouldn't even know about that change.
In general keys of a HashMap or elements in a HashSet should be immutable in that their hash and equality doesn't change. In most cases the hash and equality are based on those object's (business) identity, so if number and string are both part of that object's identity then you shouldn't be able to change those.
"Modifying an object that is somewhere contained in a Set (probably even without knowing) seems a regular use case to me . And something which I probably already have done a lot."
It's probably true that objects contained in sets are modified quite often but that normally would mean that data that's not used to generate the hashcode or to check equality are modified. As an example let's say a person's hashcode is based on their ID number. That would mean that hashCode() and equals() should only be based on that number and that everything else could be modified safely.
So you could modify elements in a HashSet as long as you're not modifying their "id".
"When or why should I use a HashSet if it has such a behaviour?"
If you need to store mutable objects in a HashSet you have a few options which basically revolve around using only the immutable parts for hashCode() and equals(). For sets that could be done by using a wrapper object that provides a customized implementation for those methods. Alternatively you could extract one or more immutable properties and use those as the key into a map (in case of multiple properties you'd need to build some sort of key object out of those)
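One way to read the map suggestion, as a rough sketch: the immutable property becomes the map key, and everything else on the object stays freely mutable. Person, its id and name fields, and the demo method are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable object with one immutable identifying property.
class Person {
    private final long id; // identity: never changes after construction
    private String name;   // ordinary mutable data

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long getId() { return id; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

class KeyedByIdSketch {
    static Map<Long, Person> demo() {
        Map<Long, Person> byId = new HashMap<>();
        Person p = new Person(42L, "Alice");
        byId.put(p.getId(), p);

        // Changing non-key data does not disturb the map.
        p.setName("Bob");

        // Storing the same person again replaces the entry instead of duplicating it.
        byId.put(p.getId(), p);
        return byId;
    }

    public static void main(String[] args) {
        Map<Long, Person> byId = demo();
        System.out.println(byId.size());             // 1
        System.out.println(byId.get(42L).getName()); // Bob
    }
}
```

Because the key is a Long copied out of the object, nothing the caller does to the Person afterwards can change where the entry lives in the table.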

You're never supposed to compare strings with ==; use .equals instead.

Adding an element that is already present, as you said, won't override the element that is already in the HashSet. Call remove() before modifying the object and add() afterwards, to ensure that the element is stored under its new hash.
Side note: as some users have noted, pay attention to the String comparisons in your test.
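A small sketch of the remove-then-add idea, under the assumption that the removal happens before the fields are changed (afterwards the set would search under the new hash and miss the stored entry). Counter and update are hypothetical names.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable element whose hash depends on its fields.
class Counter {
    int number;
    String label;

    Counter(int number, String label) {
        this.number = number;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Counter)) return false;
        Counter other = (Counter) o;
        return number == other.number && Objects.equals(label, other.label);
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, label);
    }
}

class ReAddSketch {
    // Changes 'element' safely: take it out, modify it, put it back.
    static void update(Set<Counter> set, Counter element, int newNumber) {
        boolean wasPresent = set.remove(element); // must happen BEFORE the change
        element.number = newNumber;
        if (wasPresent) {
            set.add(element); // stored again under the hash of the new state
        }
    }

    public static void main(String[] args) {
        Set<Counter> set = new HashSet<>();
        Counter c = new Counter(1, "hello");
        set.add(c);
        update(set, c, 2);
        System.out.println(set.size());      // 1
        System.out.println(set.contains(c)); // true
    }
}
```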

Why does the HashSet contains multiple the same objects? [duplicate]

Let's say you have a class and you create a HashSet which can store instances of this class. If you try to add instances which are equal, only one instance is kept in the collection, and that is fine.
However if you have two different instances in the HashSet, and you take one and make it an exact copy of the other (by copying the fields), the HashSet will then contain two duplicate instances.
Here is the code which demonstrates this:
public static void main(String[] args)
{
    HashSet<GraphEdge> set = new HashSet<>();
    GraphEdge edge1 = new GraphEdge(1, "a");
    GraphEdge edge2 = new GraphEdge(2, "b");
    GraphEdge edge3 = new GraphEdge(3, "c");
    set.add(edge1);
    set.add(edge2);
    set.add(edge3);
    edge2.setId(1);
    edge2.setName("a");
    for(GraphEdge edge: set)
    {
        System.out.println(edge.toString());
    }
    if(edge2.equals(edge1))
    {
        System.out.println("Equals");
    }
    else
    {
        System.out.println("Not Equals");
    }
}
public class GraphEdge
{
    private int id;
    private String name;
    //Constructor ...
    //Getters & Setters...
    public int hashCode()
    {
        int hash = 7;
        hash = 47 * hash + this.id;
        hash = 47 * hash + Objects.hashCode(this.name);
        return hash;
    }
    public boolean equals(Object o)
    {
        if(o == this)
        {
            return true;
        }
        if(o instanceof GraphEdge)
        {
            GraphEdge anotherGraphEdge = (GraphEdge) o;
            if(anotherGraphEdge.getId() == this.id && anotherGraphEdge.getName().equals(this.name))
            {
                return true;
            }
        }
        return false;
    }
}
The output from the above code:
1 a
1 a
3 c
Equals
Is there a way to force the HashSet to validate its contents so that possible duplicate entries created as in the above scenario get removed?
A possible solution could be to create a new HashSet and copy the contents from one hashset to another so that the new hashset won't contain duplicates however I don't like this solution.

The situation you describe is invalid. See the Javadoc: "The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set."

To add to @EJP's answer, what will happen in practice if you mutate objects in a HashSet to make them duplicates (in the sense of the equals / hashCode contract) is that the hash table data structure will break.
Depending on the exact details of the mutation, and the state of the hash table, one or both of the instances will become invisible to lookup (e.g. contains and other operations). Either an instance is on the wrong hash chain, or the other instance appears before it on the hash chain. And it is hard to predict which instance will be visible ... and whether it will remain visible.
If you iterate the set, both instances will still be present ... in violation of the Set contract.
Of course, this is very broken from the application perspective.
You can avoid this problem by either:
using an immutable type for your set elements,
making a copy of the objects as you put them into the set and / or pull them out of the set,
writing your code so that it "knows" not to change the objects for the duration ...
From the perspective of correctness and robustness, the first option is clearly best.
Incidentally, it would be really difficult to "fix" this in a general way. There is no pervasive mechanism in Java for knowing ... or being notified ... that some element has changed. You can implement such a mechanism on a class by class basis, but it has to be coded explicitly (and it won't be cheap). Even if you did have such a mechanism, what would you do? Clearly one of the objects should now be removed from the set ... but which one?
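The breakage described above, and the copy-into-a-new-set workaround mentioned in the question, can be sketched roughly as follows. MutableEdge is a hypothetical simplification of GraphEdge with public fields; it is not the asker's actual class.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable element, modelled loosely on the question's GraphEdge.
class MutableEdge {
    int id;
    String name;

    MutableEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutableEdge)) return false;
        MutableEdge other = (MutableEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}

class BrokenSetSketch {
    static Set<MutableEdge> buildBrokenSet() {
        Set<MutableEdge> set = new HashSet<>();
        MutableEdge e1 = new MutableEdge(1, "a");
        MutableEdge e2 = new MutableEdge(2, "b");
        set.add(e1);
        set.add(e2);
        // Mutating e2 while it is in the set: the set is never told about it.
        e2.id = 1;
        e2.name = "a";
        return set;
    }

    public static void main(String[] args) {
        Set<MutableEdge> broken = buildBrokenSet();
        System.out.println(broken.size()); // 2, although both elements are now equal

        // The old value is gone, yet its slot is still occupied by the mutated object.
        System.out.println(broken.contains(new MutableEdge(2, "b"))); // false

        // Copying re-hashes every element using its current state.
        Set<MutableEdge> rebuilt = new HashSet<>(broken);
        System.out.println(rebuilt.size()); // 1
    }
}
```

Rebuilding only repairs the table at that moment; any later mutation breaks it again, which is why the immutable option is usually the more robust one.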

You are correct, and I don't think there is any way to protect against the case you discuss. All collections which use hashing and equals are subject to this problem. The collection receives no notification that the object has changed since it was added to the collection. I think the solution you outline is good.
If you are so concerned with this issue, perhaps you need to rethink your data structures. You could use immutable objects for instance. With immutable objects you would not have this problem.

HashSet is not aware of its members' properties changing after the object has been added. If this is a problem for you, then you may want to consider making GraphEdge immutable. For example:
GraphEdge edge4 = edge2.changeName("new_name");
In the case where GraphEdge is immutable, changing a value results in returning a new instance rather than changing the existing instance.
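As an illustration only, an immutable variant might look like the sketch below. The class is called ImmutableEdge here to keep it apart from the question's GraphEdge; changeName follows the call shown above, while the constructor and getters are assumptions.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch of an immutable edge: all fields are final and there are no setters.
final class ImmutableEdge {
    private final int id;
    private final String name;

    ImmutableEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }

    // Returns a new instance; 'this' is left untouched.
    ImmutableEdge changeName(String newName) {
        return new ImmutableEdge(id, newName);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableEdge)) return false;
        ImmutableEdge other = (ImmutableEdge) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}

class ImmutableEdgeSketch {
    public static void main(String[] args) {
        Set<ImmutableEdge> set = new HashSet<>();
        ImmutableEdge edge2 = new ImmutableEdge(2, "b");
        set.add(edge2);

        ImmutableEdge edge4 = edge2.changeName("new_name");

        System.out.println(set.contains(edge2)); // true: the stored element never changed
        System.out.println(set.contains(edge4)); // false: edge4 is a different value
    }
}
```

To "rename" an element that lives in a set, the caller would remove the old instance and add the new one explicitly, so the set always sees the change.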

The following method can be used to print the elements of a LinkedList of String objects without any duplicate elements. The method takes a LinkedList object as input and creates a new HashSet object. It then iterates through the elements of the input LinkedList and adds each element to the HashSet. Since a HashSet does not allow duplicate elements, this ensures that only unique elements end up in the HashSet.
Then the method iterates through the HashSet and prints each element to the console, separated by a space. Unlike the printList method, this method does not add any newlines before or after the list of elements. It simply prints the string "Non-duplicates are: " followed by the elements of the HashSet.
public static void printSetList(LinkedList<String> list) {
    Set<String> hashSet = new HashSet<>();
    for (String v : list) {
        hashSet.add(v);
    }
    System.out.print("Non-duplicates are: ");
    for (String v : hashSet) {
        System.out.print(v + " ");
    }
}

Objects.hashCode(Object) returns the hash code of a single (possibly null) object, so using it for the name field inside your calculation is fine. To combine several fields in one call you can use Objects.hash(...) instead. This shortens the method, but it does not change how the set behaves when an element is mutated:
public int hashCode()
{
    return Objects.hash(this.id, this.name);
}

You will need to do the unique detection at the time you iterate your list. Making a new HashSet might not seem the right way to go, but why not try this... And maybe not use a HashSet to start with...
public class TestIterator {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("1");
        list.add("1");
        list.add("2");
        list.add("3");
        for (String s : new UniqueIterator<String>(list)) {
            System.out.println(s);
        }
    }
}
public class UniqueIterator<T> implements Iterable<T> {
    private Set<T> hashSet = new HashSet<T>();
    public UniqueIterator(Iterable<T> iterable) {
        for (T t : iterable) {
            hashSet.add(t);
        }
    }
    public Iterator<T> iterator() {
        return hashSet.iterator();
    }
}

Java (ArrayList check with object's int)

I have to make an ArrayList that contains objects; each object has an int for the year, let's say 1,
and I don't want another object with the same year 1.
If one object has the int = 1, I don't want another object with that int (1) in my list.
I want to reject it.
Should I try using equals?
Something like
@Override
public boolean equals(Object o){
    Object object = (Object)o;
    return this.getInt.equals(object.getInt());
}

Either use a Set, which explicitly disallows duplicates, or check whether the list already contains the element on insertion.
@Override
public boolean add(T element) {
    if(contains(element)) {
        return false;
    } else {
        return super.add(element);
    }
}
Overriding equals wouldn't get you very far, as you'd be overriding it for the List itself (i.e. you'd be checking if two lists were equal).

Perhaps you can try using a HashMap that links that "int" with the object. That could be:
// MyObject is a placeholder for your own class
Map<Integer, MyObject> map = new HashMap<>();
map.put(object.getInt(), object);
...
//Each time you put a new object you could try this:
if(!map.containsKey(object.getInt()))
    map.put(object.getInt(), object);
//And you can retrieve your object by an int
int a = 1;
MyObject obj = map.get(a);

In this case, as the value is of type int, you can use the == operator.
// YourClass is a placeholder for your own class
public boolean equals(Object o){
    if (!(o instanceof YourClass)) return false;
    YourClass object = (YourClass)o;
    return (this.getInt()==object.getInt());
}
For this kind of requirement, an ArrayList is not a good fit. As mentioned in the other answers, try using a HashMap.

Yes, you can. When you call
myArrayList.contains(myObject);
the ArrayList will invoke myObject's equals method. So you can tell whether the object is already in your list.
And I think you can change your method a little,
@Override
public boolean equals(Object o){
    if (!(o instanceof YourClass))
        return false;
    YourClass object = (YourClass)o;
    return this.getInt() == object.getInt();
}
because with a plain Object the call to getInt() does not even compile, and an unchecked cast to YourClass would throw a ClassCastException for objects of any other type.

Well, that is one way to approach the problem.
Your equals will probably work provided that you change Object object = (Object)o; to cast to the real class.
However, equals ought to cope with the case where o is not of the expected type. The contract requires that you return false rather than throw a ClassCastException ...
You would then use list.contains(o) to test if an object with the same int value exists in the list. For example:
if (!list.contains(o)) {
    list.add(o);
}
But when you override equals, it is best practice to also override hashCode ... so that your class continues to satisfy the equals / hashCode invariants. (If you neglect to do that, hash-based data structures will break for your class.)
However, this won't scale well, because the contains operation on an ArrayList has to test each element in the list, one at a time. As the list gets longer, the contains call takes longer ... in direct proportion; i.e. O(N) ... using Big O complexity notation.
So it may be better to use a Set implementation of some kind instead of ArrayList. Depending on which set implementation you choose, you will get complexity of O(1) or O(logN). But the catch is that you will either have to implement hashCode (for a HashSet or LinkedHashSet), or implement either Comparable or a Comparator (for a TreeSet).
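A minimal sketch of the Set-based variant, assuming the year alone decides whether two objects are duplicates. YearEntry and its note field are hypothetical; LinkedHashSet is chosen only because it keeps insertion order like a list would.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class standing in for the question's object with an int "year".
class YearEntry {
    private final int year;
    private final String note;

    YearEntry(int year, String note) {
        this.year = year;
        this.note = note;
    }

    int getYear() { return year; }
    String getNote() { return note; }

    // Assumption: two entries are duplicates when their years match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof YearEntry)) return false;
        return year == ((YearEntry) o).year;
    }

    // Uses the same field as equals(), as the contract requires.
    @Override
    public int hashCode() {
        return Integer.hashCode(year);
    }
}

class UniqueYearSketch {
    public static void main(String[] args) {
        Set<YearEntry> entries = new LinkedHashSet<>(); // keeps insertion order

        System.out.println(entries.add(new YearEntry(1, "first")));  // true
        System.out.println(entries.add(new YearEntry(1, "second"))); // false: year 1 is taken
        System.out.println(entries.add(new YearEntry(2, "third")));  // true
        System.out.println(entries.size());                          // 2
    }
}
```

The boolean returned by add tells the caller whether the element was rejected, so no separate contains check is needed.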

How to create an equals method that compares two object arrays of a class?

I am trying to create an equals method that compares two objects. The thing is, I'm a bit new to this stuff, so I'll try to explain my goal as simply as possible.
public class A {
    ...
}
public class B {
    private A[] arr = new A[10];
    public boolean equals(A[] temp) {
        //compare
    }
}
Assume the code above is a summary of what I have. Now, assume I had: arr.equals(Obj)
Obj being another A[] object. Now in my equals statement, I want to reference the original arr array, how do I go about doing that?
For example, let's say I wanted to compare arr's length to temp's length (aka Obj's length), how would I do that? I know it would be something like (temp.length == arr.length) but how do I access arr when I pass it through by doing arr.equals(obj)?
EDIT: Just to clarify, assume the objects aren't simple arrays. So for instance, class A could have a Name, a Type (Both Strings) and possibly a Quantity (an int), so I wouldn't be able to simply compare them like they're two normal arrays.
Thanks!

You can use java.util.Arrays.equals(Object[] a, Object[] a2), which tests whether the two specified arrays of Objects are equal to one another.

Use the keyword this, which always represents the object you are applying the method to (immediately before the dot). For example:
public boolean equals(A[] temp) {
    return this.length == temp.length;
}
Now, in the particular case of your code, you are not defining method equals as part of class A, but of a class B whose instances contain arr. Then, the solution would be:
public boolean equals(A[] temp) {
    return this.arr.length == temp.length;
}

Write an equals method in your class A
public class A {
    ...
    //Override equals method.
}
Now if you want to compare 2 arrays of class A you can simply use java.util.Arrays.equals(A[] a1, A[] a2);
You have to override the equals method in class A because java.util.Arrays.equals internally uses class A's equals.
Here is an example, go through it.
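A rough sketch of how this could fit together, assuming A carries the name, type and quantity fields mentioned in the question's edit. The constructors and the equals(Object) signature on B are assumptions made for this example, not the asker's code.

```java
import java.util.Arrays;
import java.util.Objects;

// Assumed shape of A: two Strings and an int, as described in the question.
class A {
    private final String name;
    private final String type;
    private final int quantity;

    A(String name, String type, int quantity) {
        this.name = name;
        this.type = type;
        this.quantity = quantity;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof A)) return false;
        A other = (A) o;
        return quantity == other.quantity
            && Objects.equals(name, other.name)
            && Objects.equals(type, other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, type, quantity);
    }
}

class B {
    private final A[] arr;

    B(A[] arr) {
        this.arr = arr;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof B)) return false;
        // Arrays.equals compares the lengths, then calls A.equals() pairwise.
        return Arrays.equals(this.arr, ((B) o).arr);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(arr);
    }
}

class ArrayEqualsSketch {
    public static void main(String[] args) {
        A[] first = { new A("bolt", "metal", 3) };
        A[] second = { new A("bolt", "metal", 3) };

        System.out.println(first.equals(second));               // false: arrays compare by identity
        System.out.println(Arrays.equals(first, second));       // true
        System.out.println(new B(first).equals(new B(second))); // true
    }
}
```

Taking Object as the parameter type makes B.equals a real override, so collections and Arrays.equals on B[] would pick it up as well.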
