I want to remove duplicates from a list but what I am doing is not working:
List<Customer> listCustomer = new ArrayList<Customer>();
for (Customer customer: tmpListCustomer)
{
if (!listCustomer.contains(customer))
{
listCustomer.add(customer);
}
}
Assuming you want to keep the current order and don't want a Set, perhaps the easiest is:
List<Customer> dedupeCustomers =
new ArrayList<>(new LinkedHashSet<>(customers));
If you want to change the original list:
Set<Customer> dedupeCustomers = new LinkedHashSet<>(customers);
customers.clear();
customers.addAll(dedupeCustomers);
If the code in your question doesn't work, you probably have not implemented equals(Object) on the Customer class appropriately.
Presumably there is some key (let us call it customerId) that uniquely identifies a customer; e.g.
class Customer {
private String customerId;
...
An appropriate definition of equals(Object) would look like this:
public boolean equals(Object obj) {
if (obj == this) {
return true;
}
if (!(obj instanceof Customer)) {
return false;
}
Customer other = (Customer) obj;
return this.customerId.equals(other.customerId);
}
For completeness, you should also implement hashCode so that two Customer objects that are equal will return the same hash value. A matching hashCode for the above definition of equals would be:
public int hashCode() {
return customerId.hashCode();
}
It is also worth noting that this is not an efficient way to remove duplicates if the list is large. (For a list with N customers, you will need to perform N*(N-1)/2 comparisons in the worst case; i.e. when there are no duplicates.) For a more efficient solution you could use a HashSet to do the duplicate checking. Another option would be to use a LinkedHashSet as explained in Tom Hawtin's answer.
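As a rough sketch of the HashSet-based duplicate check mentioned above (the class and method names here are made up for illustration), you can keep a "seen" set next to the result list. This keeps the first occurrence of each element in its original order, and it only works as intended when the element type has consistent equals() and hashCode():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the question's code.
class SeenSetDedupe {
    // Returns a new list holding the first occurrence of each element.
    static <T> List<T> dedupe(List<T> input) {
        Set<T> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T item : input) {
            // Set.add returns false when an equal element is already present.
            if (seen.add(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(dedupe(Arrays.asList("a", "b", "a", "c", "b")));
    }
}
```

Each element costs one hash lookup instead of a scan of the whole result list, which is where the saving over List.contains() comes from.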
Java 8 update
You can stream the array and drop duplicates as below:
Arrays.stream(yourArray).distinct()
.collect(Collectors.toList());
Does Customer implement the equals() contract?
If it doesn't implement equals() and hashCode(), then listCustomer.contains(customer) will check whether the exact same instance already exists in the list (by instance I mean the exact same object: memory address, etc.). If what you want is to test whether the same Customer is already in the list (perhaps it's the same customer if they have the same customer name, or customer number), then you need to override equals() so that it checks whether the relevant fields (e.g. customer names) match.
Note: Don't forget to override hashCode() if you are going to override equals()! Otherwise, you might run into trouble with your HashMaps and other data structures. For good coverage of why this is and what pitfalls to avoid, consider having a look at Josh Bloch's Effective Java chapters on equals() and hashCode() (the link only contains information about why you must implement hashCode() when you implement equals(), but there is good coverage of how to override equals() too).
By the way, is there an ordering restriction on your set? If there isn't, a slightly easier way to solve this problem is use a Set<Customer> like so:
Set<Customer> noDups = new HashSet<Customer>();
noDups.addAll(tmpListCustomer);
return new ArrayList<Customer>(noDups);
Which will nicely remove duplicates for you, since Sets don't allow duplicates. However, this will lose any ordering that was applied to tmpListCustomer, since HashSet has no explicit ordering (You can get around that by using a TreeSet, but that's not exactly related to your question). This can simplify your code a little bit.
List → Set → List (distinct)
Just add all your elements to a Set: it does not allow its elements to be repeated. If you need a list afterwards, use the new ArrayList(theSet) constructor (where theSet is your resulting set).
I suspect you might not have Customer.equals() implemented properly (or at all).
List.contains() uses equals() to verify whether any of its elements is equal to the object passed as parameter. However, the default implementation of equals tests for physical identity, not value identity. So if you have not overridden it in Customer, it will return false for two distinct Customer objects having identical state.
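To make the identity-versus-value point concrete, here is a small sketch with two made-up classes (PlainCustomer and KeyedCustomer are assumptions for illustration, not the asker's Customer). The only difference between them is whether equals() and hashCode() are overridden:

```java
import java.util.ArrayList;
import java.util.List;

class ContainsDemo {
    // Hypothetical class that does NOT override equals().
    static class PlainCustomer {
        final String id;
        PlainCustomer(String id) { this.id = id; }
    }

    // Hypothetical class that compares by id.
    static class KeyedCustomer {
        final String id;
        KeyedCustomer(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof KeyedCustomer && id.equals(((KeyedCustomer) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    static boolean plainContains() {
        List<PlainCustomer> list = new ArrayList<>();
        list.add(new PlainCustomer("42"));
        // Object.equals compares references, so a second instance is not found.
        return list.contains(new PlainCustomer("42"));
    }

    static boolean keyedContains() {
        List<KeyedCustomer> list = new ArrayList<>();
        list.add(new KeyedCustomer("42"));
        // The overridden equals compares ids, so the second instance is found.
        return list.contains(new KeyedCustomer("42"));
    }

    public static void main(String[] args) {
        System.out.println(plainContains()); // prints false
        System.out.println(keyedContains()); // prints true
    }
}
```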
Here are the nitty-gritty details of how to implement equals (and hashCode, which is its pair - you must practically always implement both if you need to implement either of them). Since you haven't shown us the Customer class, it is difficult to give more concrete advice.
As others have noted, you are better off using a Set rather than doing the job by hand, but even for that, you still need to implement those methods.
private void removeTheDuplicates(List<Customer> myList) {
for (ListIterator<Customer> iterator = myList.listIterator(); iterator.hasNext();) {
Customer customer = iterator.next();
if(Collections.frequency(myList, customer) > 1) {
iterator.remove();
}
}
System.out.println(myList.toString());
}
The "contains" method searches for whether the list contains an entry that returns true from Customer.equals(Object o). If you have not overridden equals(Object) in Customer or one of its parents then it will only search for an existing occurrence of the same object. It may be that this was what you wanted, in which case your code should work. But if you wanted to avoid having two objects that both represent the same customer, then you need to override equals(Object) to return true when that is the case.
It is also true that using one of the implementations of Set instead of List would give you duplicate removal automatically, and faster (for anything other than very small Lists). You will still need to provide code for equals.
You should also override hashCode() when you override equals().
Nearly all of the above answers are right, but I suggest using a Map or Set while building the list, not afterwards, to gain performance. Converting a list to a Set or Map and then back to a List again is unnecessary work.
Sample Code:
Set<String> stringsSet = new LinkedHashSet<String>(); // A LinkedHashSet
// preserves the insertion order of the elements
for (String string: stringsList) {
stringsSet.add(string);
}
return new ArrayList<String>(stringsSet);
Two suggestions:
Use a HashSet instead of an ArrayList. This will speed up the contains() checks considerably if you have a long list
Make sure Customer.equals() and Customer.hashCode() are implemented properly, i.e. they should be based on the combined values of the underlying fields in the customer object.
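For the second suggestion, a minimal sketch of field-based equals() and hashCode() might look like the following. The class name and its two fields (name and number) are assumptions, since the question does not show the real Customer class:

```java
import java.util.Objects;

// Hypothetical customer type with two identifying fields.
class FieldBasedCustomer {
    private final String name;
    private final long number;

    FieldBasedCustomer(String name, long number) {
        this.name = name;
        this.number = number;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof FieldBasedCustomer)) {
            return false;
        }
        FieldBasedCustomer other = (FieldBasedCustomer) obj;
        // Objects.equals tolerates a null name on either side.
        return number == other.number && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(name, number);
    }
}
```

Using the same fields in both methods is what keeps equal objects in the same hash bucket.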
As others have mentioned, you are probably not implementing equals() correctly.
However, you should also note that this code is considered quite inefficient, since the runtime could be the number of elements squared.
You might want to consider using a Set structure instead of a List, or building a Set first and then turning it into a list.
The cleanest way is:
List<XXX> lstConsultada = dao.findByPropertyList(YYY);
List<XXX> lstFinal = new ArrayList<XXX>(new LinkedHashSet<XXX>(lstConsultada));
and override hashCode and equals based on the ID properties of each entity.
IMHO the best way to do it these days:
Suppose you have a Collection "dups" and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.
Collection<collectionType> noDups = new HashSet<collectionType>(dups);
It works by creating a Set which, by definition, cannot contain duplicates.
Based on oracle doc.
The correct answer for Java is to use a Set. If you already have a List<Customer> and want to de-duplicate it:
Set<Customer> s = new HashSet<Customer>(listCustomer);
Otherwise, just use a Set implementation (HashSet, TreeSet) directly and skip the List construction phase.
You will also need to override hashCode() and equals() on the domain classes that are put in the Set, to make sure that the behavior you want is actually what you get. equals() can be as simple as comparing the unique ids of the objects or as complex as comparing every field. hashCode() can be as simple as returning the hashCode() of the unique id's String representation.
Using java 8 stream api.
List<String> list = new ArrayList<>();
list.add("one");
list.add("one");
list.add("two");
System.out.println("Before values : " + list);
Collection<String> c = list.stream().collect(Collectors.toSet());
System.out.println("After Values : " + c);
Output:
Before values : [one, one, two]
After Values : [one, two]
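If the original order matters, a possible variation is to use distinct() and collect back into a List instead of a Set. For an ordered stream, distinct() keeps the first occurrence of each element. The class and method names below are placeholders:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper showing an order-preserving stream dedupe.
class DistinctStreamSketch {
    static List<String> distinctInOrder(List<String> input) {
        // distinct() relies on equals()/hashCode() of the elements.
        return input.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [two, one]
        System.out.println(distinctInOrder(Arrays.asList("two", "one", "two", "one")));
    }
}
```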
Related
I have following scenario (modified one than actual business purpose).
I have a program which predicts how many calories a person will
lose for the next 13 weeks based on certain attributes.
I want to cache this result in the database so that I don't call the
prediction again for the same combination.
I have class person
class Person { int personId; String weekStartDate; }
I have HashMap<List<Person>, Integer> - The key is 13 weeks data of a person and the value is the prediction
I will keep the hashvalue in the database for caching purpose
Is there a better way to handle above scenario? Any design pattern to support such scenarios
Depends: the implementation of hashCode() uses the elements of your list. So adding elements later on changes the result of that operation:
public int hashCode() {
int hashCode = 1;
for (E e : this)
hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
return hashCode;
}
Maps aren't built for keys that can change their hash values! And of course, it doesn't really make sense to implement that method differently.
So: it can work when your lists are all immutable, meaning that neither the list nor any of its members is modified after the list was used as key. But there is a certain risk: if you forget about that contract later on, and these lists see modifications, then you will run into interesting issues.
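The risk can be reproduced in a few lines. This is only an illustration with String elements standing in for Person objects; the class name is made up:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo of a list key that is modified after insertion.
class MutableKeyHazard {
    static boolean lookupAfterMutation() {
        Map<List<String>, Integer> cache = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("week1");
        cache.put(key, 100);

        // Changes key.hashCode() while the entry still sits in the bucket
        // chosen for the old hash value.
        key.add("week2");

        return cache.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // prints false
    }
}
```

The entry is still in the map (size() is 1 and iteration finds it), but lookups by key no longer reach it.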
This works because the hashcode of the standard List implementations is computed with the hashcodes of the contents. You need to make sure, however, to also implement hashCode and equals in the Person class, otherwise you will get the same problem this guy had. See also my answer on that question.
I would suggest you define a class (say Data) and use it as a key in your hashmap. Override equals/hashcode accordingly with knowledge of data over weeks.
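One way such a key class could look is sketched below. The name PredictionKey and its fields are assumptions made for the example; the point is only that the key copies its input and never changes afterwards:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable key: one person plus the 13 week start dates.
final class PredictionKey {
    private final int personId;
    private final List<String> weekStartDates;

    PredictionKey(int personId, List<String> weekStartDates) {
        this.personId = personId;
        // Defensive copy, so later changes to the caller's list cannot alter the key.
        this.weekStartDates = Collections.unmodifiableList(new ArrayList<>(weekStartDates));
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PredictionKey)) {
            return false;
        }
        PredictionKey other = (PredictionKey) obj;
        return personId == other.personId && weekStartDates.equals(other.weekStartDates);
    }

    @Override
    public int hashCode() {
        return 31 * personId + weekStartDates.hashCode();
    }

    public static void main(String[] args) {
        Map<PredictionKey, Integer> cache = new HashMap<>();
        cache.put(new PredictionKey(7, Arrays.asList("2020-01-06", "2020-01-13")), 1200);
        // An equal key built later finds the cached prediction; prints 1200
        System.out.println(cache.get(new PredictionKey(7, Arrays.asList("2020-01-06", "2020-01-13"))));
    }
}
```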
I want to handle a set of objects of class (MyClass) in a HashSet. When I try to add an object that already exists (relying on equals and hashCode of MyClass), the method returns false. Is there a way/method to get in return the actual object that already exists?
Please give me any advice to handle that collection of object be able to get the existing object in return when add returns false?
Just check if the hashset contains your object:
if (hashSet.contains(obj)) {
doWhateverWith(obj);
}
Short of iterating over the set, no, there is no way to get the existing member of the set that is equal to the value just added. The best way to do that would be to write a set wrapper around HashMap that maps each added value to itself.
If equals(..) returns true, then the objects are the same, so you can use the one you are trying to add to the set.
Why would you let it return the object which you're trying to add? You already have it there!
Just do something like:
if (!set.add(item)) {
// It already contains the item.
doSomethingWith(item);
}
If that does not achieve the desired result, then it simply means that the item's equals() is poorly implemented.
One possible way would be:
MyClass[] myArray = mySet.toArray(new MyClass[mySet.size()]);
List<MyClass> myList = Arrays.asList(myArray);
MyClass existing = myList.get(myList.indexOf(myObject));
But of course as some people pointed out if it failed to get inserted, then that element is the element you are looking for, unless of course that you want what is stored in that memory location, and not what the equals and hashCode tells you, i.e. not the logically equal object, but the == object.
When using a HashSet, no, as far as I know you can't do that except by iterating over the whole thing and calling equals() on each one. You could, however, use a HashMap and just map every object to itself. Then call put(), which will return the previously mapped value, if any.
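A minimal sketch of the map-to-itself idea, assuming a small wrapper class (CanonicalSet and addOrGet are made-up names). It uses get() followed by put() rather than relying on the return value of put(), so the stored instance is never replaced:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical set-like wrapper that can hand back the stored instance.
class CanonicalSet<T> {
    private final Map<T, T> items = new HashMap<>();

    // Returns the instance already stored if an equal one exists;
    // otherwise stores the candidate and returns it.
    T addOrGet(T candidate) {
        T existing = items.get(candidate);
        if (existing != null) {
            return existing;
        }
        items.put(candidate, candidate);
        return candidate;
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        CanonicalSet<String> set = new CanonicalSet<>();
        String first = new String("x");
        String second = new String("x"); // equal to first, but a different object
        set.addOrGet(first);
        System.out.println(set.addOrGet(second) == first); // prints true
    }
}
```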
http://download.oracle.com/javase/6/docs/api/java/util/HashSet.html#contains(java.lang.Object)
You can use that method to check if you like; but the thing you have to keep in mind is that you already have the object that exists.
I've been working all day and I somehow can't get this probably easy task figured out - probably a lack of coffee...
I have a synchronizedList where some Objects are being stored. Those objects have a field which is something like an ID. These objects carry information about a user and his current state (simplified).
The point is, that I only want one object for each user. So when the state of this user changes, I'd like to remove the "old" entry and store a new one in the List.
protected static class Objects{
...
long time;
Object ID;
...
}
...
if (Objects.contains(ID)) {
Objects.remove(ID);
Objects.add(newObject);
} else {
Objects.add(newObject);
}
Obviously this is not the way to go but should illustrate what I'm looking for...
Maybe the data structure is not the best for this purpose but any help is welcome!
EDIT:
Added some information...
A Set does not really seem to fit my purpose. The Objects store some other fields besides the ID which change all the time. The purpose is, that the list will somehow represent the latest activities of a user. I only need to track the last state and only keep that object which describes this situation.
I think I will try out re-arranging my code with a Map and see if that works...
You could use a HashMap (or LinkedHashMap/TreeMap if order is important) with a key of ID and a value of Objects. With generics that would be HashMap<Object, Objects>();
Then you can use
if (map.containsKey(ID)) {
map.remove(ID);
}
map.put(newID, newObject);
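Note that put() already replaces whatever was stored under the same key, so the containsKey/remove step is optional. A small sketch, with UserState as a made-up stand-in for the question's Objects class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LatestStateSketch {
    // Hypothetical stand-in for the question's Objects class.
    static class UserState {
        final Object id;
        final long time;

        UserState(Object id, long time) {
            this.id = id;
            this.time = time;
        }
    }

    // put() replaces any value already stored under the same key and
    // returns the previous value, or null if there was none.
    static UserState record(Map<Object, UserState> latest, UserState state) {
        return latest.put(state.id, state);
    }

    public static void main(String[] args) {
        Map<Object, UserState> latest = new LinkedHashMap<>();
        record(latest, new UserState("alice", 1L));
        UserState previous = record(latest, new UserState("alice", 2L));
        System.out.println(previous.time);            // prints 1
        System.out.println(latest.get("alice").time); // prints 2
        System.out.println(latest.size());            // prints 1
    }
}
```

Since the question uses a synchronizedList, a map shared between threads would need similar care, for example Collections.synchronizedMap or ConcurrentHashMap.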
Alternatively, you could continue to use a List, but we can't just modify the collection while iterating, so instead we can use an iterator to remove the existing item, and then add the new item outside the loop (now that you're sure the old item is gone):
List<Objects> syncList = ...
for (Iterator<Objects> iterator = syncList.iterator(); iterator.hasNext();) {
Objects current = iterator.next();
if (current.getID().equals(ID)) {
iterator.remove();
}
}
syncList.add(newObject);
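On Java 8 or later, the iterator loop can probably be shortened with removeIf. The sketch below assumes a hypothetical UserState class with a String id, and holds the list's lock so that the remove-then-add pair is not interleaved with other threads that synchronize on the same list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReplaceInSyncList {
    // Hypothetical stand-in for the question's Objects class.
    static class UserState {
        final String id;
        final long time;

        UserState(String id, long time) {
            this.id = id;
            this.time = time;
        }
    }

    static void replace(List<UserState> syncList, UserState newState) {
        // Lists from Collections.synchronizedList lock on themselves, so this
        // block makes the two calls atomic with respect to other synchronized users.
        synchronized (syncList) {
            syncList.removeIf(s -> s.id.equals(newState.id));
            syncList.add(newState);
        }
    }

    public static void main(String[] args) {
        List<UserState> states = Collections.synchronizedList(new ArrayList<>());
        replace(states, new UserState("u1", 1L));
        replace(states, new UserState("u1", 2L));
        System.out.println(states.size());      // prints 1
        System.out.println(states.get(0).time); // prints 2
    }
}
```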
Can't you use a Set so that only the first one is stored? It is basically precisely what you require.
You could use a HashSet to store the objects and then override the hashCode method in the class that the HashSet will contain to return the hashcode of your identifying field.
A Map is easiest, but a Set reflects your logic better. In that case I'd advise a Set.
There are 2 ways to use a set, depending on the equals and hashCode of your data object.
If YourObject already uses the ID object to determine equals (and hashCode obeys the contract) you can use any Set you want, a HashSet is probably best then.
If YourObjects business logic requires a different equals, taking into account multiple fields beside the ID field, then a custom comparator should be used. A TreeSet is a Set which can use such a Comparator.
An example:
Comparator<MyObject> comp = new Comparator<MyObject>() {
public int compare(MyObject o1, MyObject o2) {
// NOTE: this compare obeys the Comparator contract but is not consistent
// with equals: compare() == 0 does not imply equals() == true here,
// because two different ids can share a hash code.
// Better to use some more fields.
return Integer.compare(o1.getId().hashCode(), o2.getId().hashCode());
}
};
Set<MyObject> myObjects = new TreeSet<MyObject>(comp);
EDIT
I have updated the code above to reflect that id is not an int, as suggested by the question.
My first option would be a HashSet, this would require that you override the hashCode and equals methods (don't forget: if you override one, override consistently the other !) so that objects with the same ID field are considered equal.
But this might break something if this assumption is NOT to be made in other parts of your application. In that case you might opt for using a HashMap (with the ID as key) or implement your own MyHashSet class (backed by such a HashMap).
I have a list/collection of objects that may or may not have the same property values. What's the easiest way to get a distinct list of the objects with equal properties? Is one collection type best suited for this purpose? For example, in C# I could do something like the following with LINQ.
var recipients = (from recipient in recipientList
select recipient).Distinct();
My initial thought was to use lambdaj (link text), but it doesn't appear to support this.
return new ArrayList(new HashSet(recipients));
Use an implementation of the interface Set<T> (class T may need a custom .equals() method, and you may have to implement that .equals() yourself). Typically a HashSet does it out of the box : it uses Object.hashCode() and Object.equals() method to compare objects. That should be unique enough for simple objects. If not, you'll have to implement T.equals() and T.hashCode() accordingly.
See Gaurav Saini's comment below for libraries helping to implement equals and hashcode.
Place them in a TreeSet which holds a custom Comparator, which checks the properties you need:
SortedSet<MyObject> set = new TreeSet<MyObject>(new Comparator<MyObject>(){
public int compare(MyObject o1, MyObject o2) {
// return 0 if objects are equal in terms of your properties
}
});
set.addAll(myList); // eliminate duplicates
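A filled-in version of that idea could look like this. The Recipient class and the choice of the email address as the distinguishing property are assumptions for the sake of the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDistinctSketch {
    // Hypothetical element type; it does not override equals() or hashCode().
    static class Recipient {
        final String name;
        final String email;

        Recipient(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    static List<Recipient> distinctByEmail(List<Recipient> input) {
        // Two recipients count as duplicates when the comparator returns 0,
        // i.e. when their email addresses are equal.
        Set<Recipient> set = new TreeSet<>(Comparator.comparing((Recipient r) -> r.email));
        set.addAll(input);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Recipient> result = distinctByEmail(Arrays.asList(
                new Recipient("Bob", "b@example.com"),
                new Recipient("Al", "a@example.com"),
                new Recipient("Bobby", "b@example.com")));
        for (Recipient r : result) {
            System.out.println(r.name); // prints Al, then Bob
        }
    }
}
```

The result comes back sorted by the comparator rather than in the original list order, and the first element seen for each email address is the one that is kept.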
Java 8:
recipients = recipients.stream()
.distinct()
.collect(Collectors.toList());
See java.util.stream.Stream#distinct.
order preserving version of the above response
return new ArrayList(new LinkedHashSet(recipients));
If you're using Eclipse Collections, you can use the method distinct().
ListIterable<Integer> integers = Lists.mutable.with(1, 3, 1, 2, 2, 1);
Assert.assertEquals(
Lists.mutable.with(1, 3, 2),
integers.distinct());
The advantage of using distinct() instead of converting to a Set and then back to a List is that distinct() preserves the order of the original List, retaining the first occurrence of each element. It's implemented by using both a Set and a List.
MutableSet<T> seenSoFar = Sets.mutable.with();
int size = list.size();
for (int i = 0; i < size; i++)
{
T item = list.get(i);
if (seenSoFar.add(item))
{
targetCollection.add(item);
}
}
return targetCollection;
If you cannot convert your original List into an Eclipse Collections type, you can use ListAdapter to get the same API.
MutableList<Integer> distinct = ListAdapter.adapt(integers).distinct();
Note: I am a committer for Eclipse Collections.
You can use a Set. There are a couple of implementations:
HashSet uses an object's hashCode and equals.
TreeSet uses compareTo (defined by Comparable) or compare (defined by Comparator). Keep in mind that the comparison must be consistent with equals. See TreeSet JavaDocs for more info.
Also keep in mind that if you override equals you must override hashCode such that two equal objects have the same hash code.
The ordinary way of doing this would be to convert to a Set, then back to a List. But you can get fancy with Functional Java. If you liked lambdaj, you'll love FJ.
recipients = recipients
.sort(recipientOrd)
.group(recipientOrd.equal())
.map(List.<Recipient>head_());
You'll need to have defined an ordering for recipients, recipientOrd. Something like:
Ord<Recipient> recipientOrd = ord(new F2<Recipient, Recipient, Ordering>() {
public Ordering f(Recipient r1, Recipient r2) {
return stringOrd.compare(r1.getEmailAddress(), r2.getEmailAddress());
}
});
Works even if you don't have control of equals() and hashCode() on the Recipient class.
Actually lambdaj implements this feature through the selectDistinctArgument method
http://lambdaj.googlecode.com/svn/trunk/html/apidocs/ch/lambdaj/Lambda.html#selectDistinctArgument(java.lang.Object,%20A)
I'm having problems with Iterator.remove() called on a HashSet.
I've a Set of time stamped objects. Before adding a new item to the Set, I loop through the set, identify an old version of that data object and remove it (before adding the new object). The timestamp is included in hashCode and equals(), but not equalsData().
for (Iterator<DataResult> i = allResults.iterator(); i.hasNext();)
{
DataResult oldData = i.next();
if (data.equalsData(oldData))
{
i.remove();
break;
}
}
allResults.add(data);
The odd thing is that i.remove() silently fails (no exception) for some of the items in the set. I've verified
The line i.remove() is actually called. I can call it from the debugger directly at the breakpoint in Eclipse and it still fails to change the state of Set
DataResult is an immutable object so it can't have changed after being added to the set originally.
The equals and hashCode() methods use @Override to ensure they are the correct methods. Unit tests verify these work.
This also fails if I just use a for statement and Set.remove instead. (e.g. loop through the items, find the item in the list, then call Set.remove(oldData) after the loop).
I've tested in JDK 5 and JDK 6.
I thought I must be missing something basic, but after spending some significant time on this my colleague and I are stumped. Any suggestions for things to check?
EDIT:
There have been questions - is DataResult truly immutable. Yes. There are no setters. And when the Date object is retrieved (which is a mutable object), it is done by creating a copy.
public Date getEntryTime()
{
return DateUtil.copyDate(entryTime);
}
public static Date copyDate(Date date)
{
return (date == null) ? null : new Date(date.getTime());
}
FURTHER EDIT (some time later):
For the record -- DataResult was not immutable! It referenced an object which had a hashcode which changed when persisted to the database (bad practice, I know). It turned out that if a DataResult was created with a transient subobject, and the subobject was persisted, the DataResult hashcode was changed.
Very subtle -- I looked at this many times and didn't notice the lack of immutability.
I was very curious about this one still, and wrote the following test:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;
public class HashCodeTest {
private int hashCode = 0;
@Override public int hashCode() {
return hashCode++;
}
public static void main(String[] args) {
Set<HashCodeTest> set = new HashSet<HashCodeTest>();
set.add(new HashCodeTest());
System.out.println(set.size());
for (Iterator<HashCodeTest> iter = set.iterator();
iter.hasNext();) {
iter.next();
iter.remove();
}
System.out.println(set.size());
}
}
which results in:
1
1
If the hashCode() value of an object has changed since it was added to the HashSet, it seems to render the object unremovable.
I'm not sure if that's the problem you're running into, but it's something to look into if you decide to re-visit this.
Under the covers, HashSet uses HashMap, which calls HashMap.removeEntryForKey(Object) when either HashSet.remove(Object) or Iterator.remove() is called. This method uses both hashCode() and equals() to validate that it is removing the proper object from the collection.
If both Iterator.remove() and HashSet.remove(Object) are not working, then something is definitely wrong with your equals() or hashCode() methods. Posting the code for these would be helpful in diagnosis of your issue.
Are you absolutely certain that DataResult is immutable? What is the type of the timestamp? If it's a java.util.Date are you making copies of it when you're initializing the DataResult? Keep in mind that java.util.Date is mutable.
For instance:
Date timestamp = new Date();
DataResult d = new DataResult(timestamp);
System.out.println(d.getTimestamp());
timestamp.setTime(System.currentTimeMillis());
System.out.println(d.getTimestamp());
Would print two different times.
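If the constructor stores the caller's Date directly, a defensive copy closes that hole. The sketch below uses a made-up class name (TimestampedResult) because the real DataResult source was not posted:

```java
import java.util.Date;

// Hypothetical immutable holder for a timestamp.
final class TimestampedResult {
    private final Date entryTime;

    TimestampedResult(Date entryTime) {
        // Copy on the way in, so the caller's Date can change without affecting this object.
        this.entryTime = new Date(entryTime.getTime());
    }

    Date getEntryTime() {
        // Copy on the way out, as the question's getter already does.
        return new Date(entryTime.getTime());
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof TimestampedResult
                && entryTime.equals(((TimestampedResult) obj).entryTime);
    }

    @Override
    public int hashCode() {
        return entryTime.hashCode();
    }

    public static void main(String[] args) {
        Date timestamp = new Date(1000L);
        TimestampedResult result = new TimestampedResult(timestamp);
        timestamp.setTime(2000L);
        System.out.println(result.getEntryTime().getTime()); // prints 1000
    }
}
```

With both copies in place, the hash code cannot change after the object has been added to a HashSet.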
It would also help if you could post some source code.
You should all be careful of any Java Collection that fetches its children by hashcode, in the case that its child type's hashcode depends on its mutable state. An example:
HashSet<HashSet<?>> or HashSet<AbstractSet<?>> or HashMap variant:
HashSet retrieves an item by its hashCode, but its item type
is a HashSet, and hashSet.hashCode depends on its item's state.
Code for that matter:
HashSet<HashSet<String>> coll = new HashSet<HashSet<String>>();
HashSet<String> set1 = new HashSet<String>();
set1.add("1");
coll.add(set1);
print(set1.hashCode()); //---> will output X
set1.add("2");
print(set1.hashCode()); //---> will output Y
coll.remove(set1); // WILL FAIL TO REMOVE (SILENTLY)
The reason is that HashSet's remove method uses a HashMap, which locates keys by hashCode, while AbstractSet's hashCode is dynamic and depends on the set's own mutable contents.
Thanks for all the help. I suspect the problem must be with equals() and hashCode() as suggested by spencerk. I did check those in my debugger and with unit tests, but I've got to be missing something.
I ended up doing a workaround-- copying all the items except one to a new Set. For kicks, I used Apache Commons CollectionUtils.
Set<DataResult> tempResults = new HashSet<DataResult>();
CollectionUtils.select(allResults,
new Predicate()
{
public boolean evaluate(Object oldData)
{
return !data.equalsData((DataResult) oldData);
}
}
, tempResults);
allResults = tempResults;
I'm going to stop here; it's too much work to simplify this down to a simple test case. But the help is much appreciated.
It's almost certainly the case the hashcodes don't match for the old and new data that are "equals()". I've run into this kind of thing before and you essentially end up spewing hashcodes for every object and the string representation and trying to figure out why the mismatch is happening.
If you're comparing items pre/post database, sometimes it loses the nanoseconds (depending on your DB column type) which can cause hashcodes to change.
Have you tried something like
boolean removed = allResults.remove(oldData);
if (!removed) // COMPLAIN BITTERLY!
In other words, remove the object from the Set and break the loop. That won't cause the Iterator to complain. I don't think this is a long term solution but would probably give you some information about the hashCode, equals and equalsData methods
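Another diagnostic along the same lines, sketched here with made-up names: iterate over the set and ask it whether it still contains each element it hands out. Any element that fails this check has most likely changed its hashCode() (or equals()) since it was added:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for spotting elements whose hash went stale.
class StaleHashCheck {
    // Returns the elements that the set can iterate over but can no longer look up.
    static <T> List<T> findUnreachable(Set<T> set) {
        List<T> unreachable = new ArrayList<>();
        for (T element : set) {
            if (!set.contains(element)) {
                unreachable.add(element);
            }
        }
        return unreachable;
    }

    public static void main(String[] args) {
        Set<List<String>> set = new HashSet<>();
        List<String> element = new ArrayList<>();
        element.add("a");
        set.add(element);
        element.add("b"); // the list's hashCode changes while it is inside the set

        System.out.println(findUnreachable(set).size());                // prints 1
        // Copying into a fresh HashSet re-hashes every element.
        System.out.println(findUnreachable(new HashSet<>(set)).size()); // prints 0
    }
}
```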
The Java HashSet has an issue in "remove()" method. Check the link below. I switched to TreeSet and it works fine. But I need the O(1) time complexity.
https://bugs.openjdk.java.net/browse/JDK-8154740
If there are two entries with the same data, only one of them is replaced... have you accounted for that? And just in case, have you tried another collection data structure that doesn't use a hashcode, say a List?
I'm not up to speed on my Java, but I know that you can't remove an item from a collection when you are iterating over that collection in .NET, although .NET will throw an exception if it catches this. Could this be the problem?