Removing duplicates in TreeSet - java

I've been using an ArrayList in my project to store a cricket team's players and order them.
I started thinking about using a TreeSet because of its advantage of removing duplicates.
However, the problem I'm having is that if, for example, I create the following two players:
P p1 = new P("Jack", "Daniel", 33 /* age */, 180 /* height */, 78 /* weight */, 41 /* games played */,
2300 /* runs scored */, 41 /* dismisses */);
P p2 = new P("Jack", "Daniel", 37 /* age */, 185 /* height */, 79 /* weight */, 45 /* games played */,
2560 /* runs scored */, 45 /* dismisses */);
Notice that the two players have the same first and last name, but everything else is different. When I try to add these two players to the TreeSet, it considers them duplicates because the names match and drops the second one. Obviously I don't want this to happen: I want the Set to reject a player only if every field is the same as another player's, not just the first and last names.
Is there a way of achieving this?
Also my TreeSet takes a Player object.

Originally, this answer neglected the fact that a TreeSet does its comparisons based on compareTo(), rather than equals(). Edits have been made to address this.
You need to define equals(), hashCode() and compareTo() for your Player object correctly. (Since it's a TreeSet and not a HashSet, implementing hashCode() isn't so important - but it's good practice.)
equals() and hashCode() need to take all of the fields into account. Eclipse can auto-generate them for you (Source > Generate hashCode() and equals()), and the result will look similar to the examples below.
If you already have a natural sort order that doesn't use all of the fields, then you could supply a custom comparator to your TreeSet. However, even if you really only want to sort by a subset of the fields, there's nothing stopping you from sorting by all fields (with the uninteresting fields only playing a part if the interesting fields are identical). The important thing to note here is that a TreeSet determines equality not by the equals() method, but by compareTo() == 0.
Here's an example equals():
@Override
public boolean equals(Object obj)
{
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
Player that = (Player) obj;
return this.age == that.age &&
this.height == that.height &&
this.weight == that.weight &&
this.games == that.games &&
this.runs == that.runs &&
this.dismisses == that.dismisses &&
this.given.equals(that.given) &&
this.family.equals(that.family);
}
And here's hashcode:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + this.age;
result = prime * result + this.dismisses;
result = prime * result + this.family.hashCode();
result = prime * result + this.games;
result = prime * result + this.given.hashCode();
result = prime * result + this.height;
result = prime * result + this.runs;
result = prime * result + this.weight;
return result;
}
Finally, here's a compareTo:
public int compareTo(Player that)
{
int result;
result = this.family.compareTo(that.family);
if (result != 0) // is the family name different?
{
return result; // yes ... use it to discriminate
}
result = this.given.compareTo(that.given);
if (result != 0) // is the given name different?
{
return result; // yes ... use it to discriminate
}
result = this.age - that.age; // is the age different?
if (result != 0)
{
return result; // yes ... use it to discriminate
}
// ... (and so on) ...
// ... with the final one ...
return this.dismisses - that.dismisses; // only thing left to discriminate by
}
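As a rough sketch of the same idea on Java 8 or later, the field-by-field comparison can also be supplied as a Comparator. The `Player` class here is a hypothetical, trimmed-down stand-in with only three of the question's fields, and `PlayerSetDemo`, `BY_ALL_FIELDS` and `sizeAfterAdding` are names made up for illustration.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, trimmed-down Player: only three of the question's fields are shown.
final class Player {
    final String given;
    final String family;
    final int age;

    Player(String given, String family, int age) {
        this.given = given;
        this.family = family;
        this.age = age;
    }
}

class PlayerSetDemo {
    // Orders by family name, then given name, then age. compare() returns 0
    // only when all three fields match, so only true duplicates are dropped.
    static final Comparator<Player> BY_ALL_FIELDS =
            Comparator.comparing((Player p) -> p.family)
                      .thenComparing(p -> p.given)
                      .thenComparingInt(p -> p.age);

    // Adds the players to a TreeSet using the comparator and reports how many were kept.
    static int sizeAfterAdding(Player... players) {
        Set<Player> team = new TreeSet<>(BY_ALL_FIELDS);
        for (Player p : players) {
            team.add(p);
        }
        return team.size();
    }

    public static void main(String[] args) {
        Player p1 = new Player("Jack", "Daniel", 33);
        Player p2 = new Player("Jack", "Daniel", 37); // same name, different age
        Player p3 = new Player("Jack", "Daniel", 33); // identical to p1
        System.out.println(sizeAfterAdding(p1, p2, p3)); // prints 2
    }
}
```

With the real class, the chain would simply continue through the remaining fields (height, weight, and so on) in the same way.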

a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
From the TreeSet section of the Java Platform, Standard Edition 8 documentation.
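A small illustration of that passage, using only the JDK: with `String.CASE_INSENSITIVE_ORDER` as the comparator, two strings that are not `equals()` still count as the same element. The class and method names are made up for this sketch.

```java
import java.util.Set;
import java.util.TreeSet;

class TreeSetEqualityDemo {
    // Builds a TreeSet whose notion of "same element" is compare() == 0, ignoring case.
    static Set<String> caseInsensitiveSet(String... values) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> names = caseInsensitiveSet("Daniel", "DANIEL");
        System.out.println("Daniel".equals("DANIEL")); // false: equals() sees two different strings
        System.out.println(names.size());              // 1: the comparator sees one element
        System.out.println(names.contains("daniel"));  // true: lookups also go through the comparator
    }
}
```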

class Student implements Comparable<Student> {
String name;
public Student(String name) {
this.name=name;
}
public String toString(){
return name;
}
public int compareTo(Student gStudent) {
// Delegate to String.compareTo so the ordering is symmetric and returns 0 only for equal names.
return this.name.compareTo(gStudent.getName());
}
private String getName() {
return name;
}
}

Related

How to sort sortedset by value that can be duplicate?

In Java 1.7, I have a Post class that holds the post ID and the number of votes for each post. I want to create a sorted set of Posts that is always sorted by the number of votes. Note that different Posts can have the same number of votes.
The problem is that when I create two Post objects with the same ID but different numbers of votes, the sorted set treats them as different Posts and adds both instead of replacing the existing Post with the new vote count. See the example below.
Post Class:
public class Post implements Comparable<Post> {
protected int id;
protected int votes;
public Post(int id) {
this.id = id;
this.votes = 0;
}
public Post(int id, int votes) {
this.id = id;
this.votes = votes;
}
@Override
public boolean equals(Object o) {
if (o == null || getClass() != o.getClass()) {
return false;
}
Post post = (Post) o;
return id == post.id;
}
@Override
public int hashCode() {
return Objects.hash(this.id);
}
@Override
public int compareTo(Post t) {
int diff = ((Integer) t.votes).compareTo(this.votes);
if (diff == 0) {
return ((Integer) t.id).compareTo(this.id);
}
return diff;
}
}
Run Method:
public void run() {
SortedSet<Post> set = new TreeSet<Post>();
Post t1 = new Post(1, 30);
Post t2 = new Post(1, 40);
Post t3 = new Post(2, 100);
set.add(t1);
set.add(t2);
set.add(t3);
for (Post t : set) {
System.err.println(t.id + " >> " + t.votes);
}
}
Expected Output:
2 >> 100
1 >> 40
Actual Output
2 >> 100
1 >> 40
1 >> 30
As you can see the problem is that the same Post appeared twice in the set which is not the desired output.
I also tried to avoid using Comparable interface and instead I used Comparator, yet, I got the same result.
Comparator Class:
class CompareByVotes implements Comparator<Post> {
@Override
public int compare(Post t1, Post t2) {
int diff = ((Integer) t2.votes).compareTo(t1.votes);
if (diff == 0) {
return ((Integer) t2.id).compareTo(t1.id);
}
return diff;
}
}
Question:
Are any changes required to get it to work as desired?
Your compareTo() method doesn't return 0 when the objects you compare are equal based on the equals() method. However, this is required by the SortedSet interface:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
So your compareTo() method must return 0 when they are equal. One possible solution would be something like this:
public int compareTo(Post t) {
if (equals(t)) {
return 0;
}
int diff = ((Integer) t.votes).compareTo(this.votes);
if (diff == 0) {
return ((Integer) t.id).compareTo(this.id);
}
return diff;
}
Also, keep in mind that add() does not "overwrite" the object when an equal object is already in the set. See the documentation of add():
[...] If this set already contains the element, the call leaves the set unchanged and returns false.
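Because add() leaves an existing element in place, one way to get "replace on update" behaviour is to remove the old entry before adding the new one. The following is only a sketch under assumptions: it uses Java 8 lambdas (the question targets 1.7), a simplified immutable `Post`, and a hypothetical `PostRanking` helper that tracks the current Post per id in a side map.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Simplified, immutable stand-in for the question's Post class.
final class Post {
    final int id;
    final int votes;

    Post(int id, int votes) {
        this.id = id;
        this.votes = votes;
    }
}

// Hypothetical helper: keeps one Post per id, ordered by votes (highest first), then by id.
class PostRanking {
    private final Map<Integer, Post> byId = new HashMap<>();
    private final TreeSet<Post> ranked = new TreeSet<>(
            Comparator.comparingInt((Post p) -> p.votes).reversed()
                      .thenComparingInt(p -> p.id));

    // Inserts the post, first removing any previous entry that has the same id.
    void upsert(Post post) {
        Post old = byId.put(post.id, post);
        if (old != null) {
            ranked.remove(old); // located via the comparator, using the old vote count
        }
        ranked.add(post);
    }

    // Renders the ranking as "id >> votes" entries, highest votes first.
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Post p : ranked) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(p.id).append(" >> ").append(p.votes);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PostRanking ranking = new PostRanking();
        ranking.upsert(new Post(1, 30));
        ranking.upsert(new Post(1, 40)); // replaces the entry for id 1
        ranking.upsert(new Post(2, 100));
        System.out.println(ranking.describe()); // 2 >> 100, 1 >> 40
    }
}
```

The side map is a design choice of this sketch: the set is ordered by votes, so it cannot look up "the current Post with id 1" on its own.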

Compare two Java Collections using Comparator instead of equals()

Problem Statement
I have two Collections of the same type of object that I want to compare. In this case, I want to compare them based on an attribute that does not factor into equals() for the Objects. In my example, I'm using ranked collections of Names for instance:
public class Name {
private String name;
private int weightedRank;
//getters & setters
@Override
public boolean equals(Object obj) {
// Naive implementation just to show that equals is based on the name field.
return obj instanceof Name && this.name.equals(((Name) obj).name);
}
}
I want to compare the two Collections to assert that, for position i in each Collection, the weightedRank of each Name at that position is the same value. I did some Googling but didn't find a suitable method in Commons Collections or any other API so I came up with the following:
public <T> boolean comparatorEquals(Collection<T> col1, Collection<T> col2,
Comparator<T> c)
{
if (col1 == null)
return col2 == null;
if (col2 == null)
return false;
if (col1.size() != col2.size())
return false;
Iterator<T> i1 = col1.iterator(), i2 = col2.iterator();
while(i1.hasNext() && i2.hasNext()) {
if (c.compare(i1.next(), i2.next()) != 0) {
return false;
}
}
return true;
}
Question
Is there another way to do this? Did I miss an obvious method from Commons Collections?
Related
I also spotted this question on SO which is similar though in that case I'm thinking overriding equals() makes a little more sense.
Edit
Something very similar to this will be going into a release of Apache Commons Collections in the near future (at the time of this writing). See https://issues.apache.org/jira/browse/COLLECTIONS-446.
You could use the Guava Equivalence class in order to decouple the notions of "comparing" and "equivalence". You would still have to write your comparing method (AFAIK Guava does not have it) that accepts an Equivalence subclass instead of the Comparator, but at least your code would be less confusing, and you could compare your collections based on any equivalence criteria.
Using a collection of equivalence-wrapped objects (see the wrap method in Equivalence) would be similar to the Adapter-based solution proposed by sharakan, but the equivalence implementation would be decoupled from the adapter implementation, allowing you to easily use multiple Equivalence criteria.
You can use the isEqualCollection method added to CollectionUtils in version 4. This method uses an external comparison mechanism provided by an implementation of the Equator interface. See the Javadocs for CollectionUtils.isEqualCollection(...) and Equator.
I'm not sure this way is actually better, but it is "another way"...
Take your original two collections, and create new ones containing an Adapter for each base object. The Adapter should have .equals() and .hashCode() implemented as being based on Name.calculateWeightedRank(). Then you can use normal Collection equality to compare the collections of Adapters.
* Edit *
Using Eclipse's standard hashCode/equals generation for the Adapter. Your code would just call adaptCollection on each of your base collections, then List.equals() the two results.
public class Adapter {
public static List<Adapter> adaptCollection(List<Name> names) {
List<Adapter> adapters = new ArrayList<Adapter>(names.size());
for (Name name : names) {
adapters.add(new Adapter(name));
}
return adapters;
}
private final int name;
public Adapter(Name name) {
this.name = name.getWeightedRank();
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + name;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Adapter other = (Adapter) obj;
if (name != other.name)
return false;
return true;
}
}
EDIT: Removed old answer.
Another option that you have is creating an interface called Weighted that could look like this:
public interface Weighted {
int getWeightedRank();
}
Then have your Name class implement this interface. Then you could change your method to look like this:
public <T extends Weighted> boolean weightedEquals(Collection<T> col1, Collection<T> col2)
{
if (col1 == null)
return col2 == null;
if (col2 == null)
return false;
if (col1.size() != col2.size())
return false;
Iterator<T> i1 = col1.iterator(), i2 = col2.iterator();
while(i1.hasNext() && i2.hasNext()) {
if (i1.next().getWeightedRank() != i2.next().getWeightedRank()) {
return false;
}
}
return true;
}
Then as you find additional classes that need to be weighted and compared you can put them in your collection and they could be compared with each other as well. Just an idea.

Need proper hashCode when comparing Object with unordered pair of integers as variables

I have a class
final class BuildingPair {
int mBA;
int mBB;
public BuildingPair(int pBuildingA,int pBuildingB) {
mBA = pBuildingA;
mBB = pBuildingB;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + mBA;
result = prime * result + mBB;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
BuildingPair other = (BuildingPair) obj;
if ((mBA==other.mBA&&mBB==other.mBB)||(mBA==other.mBB&&mBB==other.mBA)) return true;
return false;
}
}
I want to compare two objects and treat them as equal when both contain the same building IDs,
so they need to be equal regardless of order, for example:
BuildingPair(1,2) vs BuildingPair(2,1)
BuildingPair(1,2) vs BuildingPair(1,2)
BuildingPair(2,1) vs BuildingPair(1,2)
I think the equals method is OK, but hashCode is wrong.
You need something that computes the same result whether passed A,B or B,A. There may be far more subtle solutions, but I'd probably just go for:
@Override
public int hashCode() {
return mBA * mBB;
}
Or anything else which uses an operator that is commutative.
Alternatively, you could change your constructor so that it always stores min(a,b) in mBA and max(a,b) in mBB - you can then simplify your comparison code and keep your hash code as it currently is.
You need a symmetric hashcode (hashcode(a,b) == hashcode(b,a)), for example:
return mBB ^ mBA;
(your current code is not symmetric, for example: hashcode(2,1) = 1024 but hashcode(1,2) = 994)
Note: this is inspired from the hashcode of Long:
return (int)(value ^ (value >>> 32));
If they are unordered you can use an arbitrary order which simplifies the rest of the code.
public BuildingPair(int pBuildingA,int pBuildingB) {
mBA = Math.min(pBuildingA, pBuildingB);
mBB = Math.max(pBuildingA, pBuildingB);
}
Code the rest of the methods as normal and BuildingPair(2,1) will be exactly the same as BuildingPair(1,2).
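Put together, the normalizing-constructor approach might look roughly like this. It is a sketch rather than the asker's final class: the field names follow the question, while `BuildingPairDemo` and `distinctPairs` are hypothetical names added to show the effect in a HashSet.

```java
import java.util.HashSet;
import java.util.Set;

final class BuildingPair {
    private final int mBA; // always the smaller id
    private final int mBB; // always the larger id

    BuildingPair(int pBuildingA, int pBuildingB) {
        // Store the ids in a fixed order so (1,2) and (2,1) hold identical state.
        mBA = Math.min(pBuildingA, pBuildingB);
        mBB = Math.max(pBuildingA, pBuildingB);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof BuildingPair)) {
            return false;
        }
        BuildingPair other = (BuildingPair) obj;
        return mBA == other.mBA && mBB == other.mBB; // plain field-by-field comparison
    }

    @Override
    public int hashCode() {
        return 31 * mBA + mBB; // an ordinary ordered hash is fine after normalization
    }
}

class BuildingPairDemo {
    // Adds (1,2), (2,1) and (1,3) to a HashSet and reports how many distinct pairs remain.
    static int distinctPairs() {
        Set<BuildingPair> pairs = new HashSet<>();
        pairs.add(new BuildingPair(1, 2));
        pairs.add(new BuildingPair(2, 1)); // same pair, reversed
        pairs.add(new BuildingPair(1, 3));
        return pairs.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPairs()); // prints 2
    }
}
```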

java: Object1 & Object2 same as Object2 & Object1

I'm trying to draw lines between different GridPositions(x,y). Every GridPos has 4 connections: North, East, South, West. The problem is that if I paint a line from GridPos(1,1) to GridPos(2,2), the program will later also paint a line in the reverse direction, from GridPos(2,2) to GridPos(1,1).
I tried to solve the problem with this class (WarpGate is the same as GridPos):
public class GateConnection {
private WarpGate gate1 = null;
private WarpGate gate2 = null;
public GateConnection(WarpGate gate1, WarpGate gate2) {
super();
this.gate1 = gate1;
this.gate2 = gate2;
}
@Override
public int hashCode() {
final int prime = 31;
int result = prime * ((gate1 == null) ? 0 : gate1.hashCode());
result += prime * ((gate2 == null) ? 0 : gate2.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
GateConnection other = (GateConnection) obj;
if ((gate1.equals(other.gate1) || gate1.equals(other.gate2)) && (gate2.equals(other.gate2) || gate2.equals(other.gate1))) {
return true;
}
return false;
}
}
This class could be added to a HashSet and the double painting would be gone, but I don't know if the hash value is always unique.
HashCode of WarpGate (auto-generated by eclipse):
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + gridX;
result = prime * result + gridY;
return result;
}
For now I use an ArrayList: I check whether the GateConnection exists and add it if it does not. But this version takes many more resources than using a HashSet.
EDIT:
The white rectangles are the connections which are painted, the numbers are the GridPositions(x|y) and the red Arrows are the two directions the rectangle is painted because GridPos(2|2) has a connection to GridPos(4|2) and (4|2) to (2|2)
A TreeSet neither uses hashCode() nor equals(). It uses compareTo(), though you should ensure it is consistent with equals() to respect Set semantics.
For a HashSet, the hashCode() of a stored object does not have to be unique. In fact, you can return the same code for every item if you want and they will still be stored without losing any items, if your equals() is implemented correctly. A good hashCode() will improve performance only.
The only critical rule is that two equal items must generate the same hash code.
Your implementation looks OK as long as you can guarantee that gate1 and gate2 are never equal within the same GateConnection object. If they are equal, two GateConnection objects could have different hash codes but be reported as equal. That would lead to unpredictable behaviour if they are stored in a HashSet.
E.g. GateConnection((1,1), (1,1)) equals GateConnection((1,1), (7,9)) but the hash codes are different.
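One way to avoid that edge case is to test the two possible pairings explicitly instead of combining independent "matches either side" checks. The generic `UnorderedPair` below is a hypothetical stand-in for GateConnection, assuming non-null endpoints; it is a sketch, not the asker's class.

```java
import java.util.Objects;

// Hypothetical generic stand-in for GateConnection: a pair whose order does not matter.
final class UnorderedPair<T> {
    private final T first;
    private final T second;

    UnorderedPair(T first, T second) {
        this.first = Objects.requireNonNull(first);
        this.second = Objects.requireNonNull(second);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof UnorderedPair)) {
            return false;
        }
        UnorderedPair<?> other = (UnorderedPair<?>) obj;
        // Either both ends match in order, or both match crosswise.
        boolean sameOrder = first.equals(other.first) && second.equals(other.second);
        boolean swapped = first.equals(other.second) && second.equals(other.first);
        return sameOrder || swapped;
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so (a,b) and (b,a) hash identically.
        return first.hashCode() + second.hashCode();
    }

    public static void main(String[] args) {
        UnorderedPair<String> ab = new UnorderedPair<>("a", "b");
        UnorderedPair<String> ba = new UnorderedPair<>("b", "a");
        UnorderedPair<String> aa = new UnorderedPair<>("a", "a");
        System.out.println(ab.equals(ba));                  // true
        System.out.println(ab.hashCode() == ba.hashCode()); // true
        System.out.println(aa.equals(ab));                  // false
    }
}
```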

Sorting custom data structure on Key in TreeMap

I am trying to sort a TreeMap on key. Key is some custom DataStructure having int, List, String, etc.
The member on which I am expecting a sort has some duplicates. Let's say that member is Rank. More than 1 object can have same rank.
Simplified version example:
NOTE: in the CompareTo method below 0 is not returned intentionally to NOT ignore duplicates.(Please correct me if this is not the right way to avoid duplicates)
import java.util.TreeMap;
public class TreeTest {
public static void main(String[] args) {
TreeMap<Custom,String> t = new TreeMap<Custom,String>();
Custom c1 = new Custom();
c1.setName("a");
c1.setRank(0);
Custom c2 = new Custom();
c2.setName("b");
c2.setRank(1);
Custom c3 = new Custom();
c3.setName("c");
c3.setRank(0);
t.put(c1, "first");
t.put(c2, "Second");
t.put(c3, "Third");
System.out.println(t.keySet());
for(Custom c:t.keySet()){
System.out.println(t.get(c));
}
}
}
And Custom Object
package com.example.ui;
public class Custom implements Comparable<Custom>{
int rank;
String name;
public int getRank() {
return rank;
}
public void setRank(int rank) {
this.rank = rank;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((name == null) ? 0 : name.hashCode());
result = prime * result + rank;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Custom other = (Custom) obj;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
if (rank != other.rank)
return false;
return true;
}
// 0 is not returned intentionally to NOT ignore duplicates.
public int compareTo(Custom o) {
if(o.rank>this.rank)
return 1;
if(o.rank==this.rank)
return -1;
return -1;
}
}
Output::
[com.example.ui.Custom@fa0, com.example.ui.Custom@fbe, com.example.ui.Custom@f80]
null
null
null
Expected:
First, Second, Third based on Rank 0,1,0 respectively.
I looked at a couple of examples on Google. Most of them covered basic TreeMap sorting by keys or values with primitive data types, but none dealt with duplicates when the sort member is part of a custom key data structure.
Please help?
The problem is that your implementation of compareTo is not consistent with equals, which is required by TreeMap. From the API docs:
Note that the ordering maintained by a sorted map (whether or not an
explicit comparator is provided) must be consistent with equals if
this sorted map is to correctly implement the Map interface.
One possible consistent implementation would be to first compare by rank and then by name if the rank values are equal. For two instances of Custom with equal ranks and identical names you should not expect to be able to store them both as keys within the same Map - This violates the contract of Map.
public int compareTo(Custom o) {
int ret = this.rank - o.rank;
// Equal rank so fall back to comparing by name.
if (ret == 0) {
ret = this.name.compareTo(o.name);
}
return ret;
}
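To show the effect end to end, here is a sketch with a hypothetical immutable key class `RankedKey` (standing in for Custom) whose compareTo, equals and hashCode all use rank and name. It uses `Integer.compare` instead of subtraction, which also avoids overflow for extreme values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical immutable stand-in for Custom; ordering and equality use the same two fields.
final class RankedKey implements Comparable<RankedKey> {
    final int rank;
    final String name;

    RankedKey(int rank, String name) {
        this.rank = rank;
        this.name = name;
    }

    @Override
    public int compareTo(RankedKey o) {
        int ret = Integer.compare(this.rank, o.rank);
        // Equal rank, so fall back to comparing by name.
        return ret != 0 ? ret : this.name.compareTo(o.name);
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof RankedKey && compareTo((RankedKey) obj) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * rank + name.hashCode();
    }
}

class RankedKeyDemo {
    // Inserts the three entries from the question and returns the values in key order.
    static List<String> valuesInKeyOrder() {
        TreeMap<RankedKey, String> t = new TreeMap<>();
        t.put(new RankedKey(0, "a"), "first");
        t.put(new RankedKey(1, "b"), "Second");
        t.put(new RankedKey(0, "c"), "Third");
        return new ArrayList<>(t.values());
    }

    public static void main(String[] args) {
        // Both rank-0 keys survive, and every lookup succeeds.
        System.out.println(valuesInKeyOrder()); // [first, Third, Second]
    }
}
```

Note that iteration follows key order (rank, then name), not insertion order.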
As mentioned, your implementations of equals and compareTo are not consistent with each other. If I read your question correctly, what you require is to preserve duplicates that have the same key. I'd recommend looking into TreeMultimap from the Google Guava collections. It keeps a set of values for each key, so that different values having the same key are preserved.
e.g.
treeMultimap.put("rank1", "Joe");
treeMultimap.put("rank1", "Jane");
treeMultimap.get("rank1"); // Set("Joe","Jane");
The constraint in this data structure is that K,V pairs must be unique. That is, you can't insert ("rank1", "Joe") twice in the Multimap.
One important note: The reason why you see so many examples of Map using simple types and, in particular, strings, is that keys in a map must be immutable. The equals and hashCode values of an object must not change while it is used as a key in a map. Translated to your example, you cannot call customObject.setRank(...) to update the rank while the object is in use as a key. To change it, you first need to remove the key and its value, update the key, and then insert it again.
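The remove, update, reinsert sequence could be sketched as follows. `RekeyDemo`, its nested `Key` class and the `changeRank` helper are hypothetical names, and the key is made immutable here, so "updating" means building a new key.

```java
import java.util.Comparator;
import java.util.TreeMap;

class RekeyDemo {
    // Hypothetical immutable key: rank and name cannot change after construction.
    static final class Key {
        final int rank;
        final String name;

        Key(int rank, String name) {
            this.rank = rank;
            this.name = name;
        }
    }

    static final Comparator<Key> BY_RANK_THEN_NAME =
            Comparator.comparingInt((Key k) -> k.rank).thenComparing(k -> k.name);

    // Moves the value stored under oldKey to a new key with the given rank.
    static <V> Key changeRank(TreeMap<Key, V> map, Key oldKey, int newRank) {
        if (!map.containsKey(oldKey)) {
            return oldKey; // nothing to move
        }
        V value = map.remove(oldKey);               // 1. remove while the old ordering still applies
        Key newKey = new Key(newRank, oldKey.name); // 2. build the updated key
        map.put(newKey, value);                     // 3. reinsert at the new position
        return newKey;
    }

    // Runs the sequence on a two-entry map and reports "<moved value>:<name of last key>".
    static String demo() {
        TreeMap<Key, String> map = new TreeMap<>(BY_RANK_THEN_NAME);
        Key a = new Key(0, "a");
        map.put(a, "first");
        map.put(new Key(1, "b"), "Second");
        Key moved = changeRank(map, a, 2);
        return map.get(moved) + ":" + map.lastKey().name;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // first:a
    }
}
```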
You can also do it by implementing Comparator as anonymous inner type and override compare() to return desired comparison.
public class TreeMaps
{
public static void main(String[] args)
{
Custom c1 = new Custom(1,"A");
Custom c2 = new Custom(3,"C");
Custom c3 = new Custom(2,"B");
TreeMap<Custom , Integer > tree = new TreeMap<Custom, Integer> (new Comparator<Custom>() {
@Override
public int compare(Custom o1, Custom o2) {
return o1.rank - o2.rank;
}
});
tree.put(c1, 1);
tree.put(c2, 2);
tree.put(c3, 3);
System.out.println(tree);
}
}
class Custom
{
int rank ;
String name ;
public Custom(int rank , String name) {
this.rank = rank ;
this.name = name ;
}
@Override
public String toString()
{
return "Custom[" + this.rank + "-" + this.name + "]" ;
}
}
