Java - Distinct List of Objects


I have a list/collection of objects that may or may not have the same property values. What's the easiest way to get a distinct list of the objects with equal properties? Is one collection type best suited for this purpose? For example, in C# I could do something like the following with LINQ.
var recipients = (from recipient in recipientList
select recipient).Distinct();
My initial thought was to use lambdaj, but it doesn't appear to support this.

return new ArrayList<>(new HashSet<>(recipients));

Use an implementation of the interface Set<T>. A HashSet does this out of the box: it compares elements using their hashCode() and equals() methods. Classes such as String and Integer already override both, but the versions inherited from Object compare identity, so two distinct objects with the same property values are both kept. In that case you'll have to implement T.equals() and T.hashCode() accordingly.
See Gaurav Saini's comment below for libraries that help implement equals() and hashCode().
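To make the point about identity concrete, here is a minimal sketch with two hypothetical classes (PlainRecipient and Recipient, each holding an email string; neither comes from the question): one keeps Object's equals/hashCode, the other overrides both in terms of the email.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class HashSetDedupe {
    // Hypothetical class relying on Object's identity-based equals/hashCode.
    static class PlainRecipient {
        final String email;
        PlainRecipient(String email) { this.email = email; }
    }

    // Hypothetical class whose equality is defined by its email property.
    static class Recipient {
        final String email;
        Recipient(String email) { this.email = email; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Recipient)) return false;
            return Objects.equals(email, ((Recipient) o).email);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(email); // uses the same field as equals
        }
    }

    static int distinctPlain() {
        Set<PlainRecipient> set = new HashSet<>(List.of(
                new PlainRecipient("a@example.com"), new PlainRecipient("a@example.com")));
        return set.size(); // two separate instances, so both are kept
    }

    static int distinctWithEquals() {
        Set<Recipient> set = new HashSet<>(List.of(
                new Recipient("a@example.com"), new Recipient("a@example.com")));
        return set.size(); // equal by email, so only one is kept
    }

    public static void main(String[] args) {
        System.out.println(distinctPlain());      // 2
        System.out.println(distinctWithEquals()); // 1
    }
}
```

Objects.equals and Objects.hashCode keep the sketch null-safe; any combination of fields works as long as equals and hashCode use the same ones.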

Place them in a TreeSet with a custom Comparator that checks the properties you need:
SortedSet<MyObject> set = new TreeSet<MyObject>(new Comparator<MyObject>() {
@Override
public int compare(MyObject o1, MyObject o2) {
// return 0 if the objects are equal in terms of your properties,
// e.g. for a hypothetical getName() property:
return o1.getName().compareTo(o2.getName());
}
});
set.addAll(myList); // elements that compare as 0 are dropped as duplicates
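A sketch of the same TreeSet approach using Comparator.comparing instead of an anonymous class (Java 16+ for the record). MyObject and its name/weight components are hypothetical stand-ins for your own class; only name decides what counts as a duplicate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetDedupe {
    // Hypothetical element type; only 'name' is compared.
    record MyObject(String name, int weight) {}

    static List<MyObject> distinctByName(List<MyObject> input) {
        // When the comparator returns 0, TreeSet.add leaves the existing element in place,
        // so the first element added for each name is the one that is kept.
        SortedSet<MyObject> set = new TreeSet<>(Comparator.comparing(MyObject::name));
        set.addAll(input);
        return new ArrayList<>(set); // sorted by name, not in the original order
    }

    public static void main(String[] args) {
        List<MyObject> input = List.of(
                new MyObject("b", 1), new MyObject("a", 2), new MyObject("b", 3));
        System.out.println(distinctByName(input));
        // [MyObject[name=a, weight=2], MyObject[name=b, weight=1]]
    }
}
```

Because a TreeSet sorts its elements, the result comes back ordered by the compared property rather than in the original list order.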

Java 8:
recipients = recipients.stream()
.distinct()
.collect(Collectors.toList());
See java.util.stream.Stream#distinct.

Order-preserving version of the above answer:
return new ArrayList<>(new LinkedHashSet<>(recipients));
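For comparison, a small sketch of the two order-preserving options mentioned so far, using the same integers as the Eclipse Collections example. Both keep the first occurrence of each element and both rely on the elements' equals/hashCode; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

public class OrderPreservingDistinct {
    static <T> List<T> viaLinkedHashSet(List<T> input) {
        // LinkedHashSet remembers insertion order; later duplicates are ignored.
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    static <T> List<T> viaStream(List<T> input) {
        // On an ordered stream, distinct() keeps the first occurrence in encounter order.
        return input.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 3, 1, 2, 2, 1);
        System.out.println(viaLinkedHashSet(input)); // [1, 3, 2]
        System.out.println(viaStream(input));        // [1, 3, 2]
    }
}
```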

If you're using Eclipse Collections, you can use the method distinct().
ListIterable<Integer> integers = Lists.mutable.with(1, 3, 1, 2, 2, 1);
Assert.assertEquals(
Lists.mutable.with(1, 3, 2),
integers.distinct());
The advantage of using distinct() instead of converting to a Set and then back to a List is that distinct() preserves the order of the original List, retaining the first occurrence of each element. It's implemented by using both a Set and a List.
MutableSet<T> seenSoFar = Sets.mutable.with();
int size = list.size();
for (int i = 0; i < size; i++)
{
T item = list.get(i);
if (seenSoFar.add(item))
{
targetCollection.add(item);
}
}
return targetCollection;
If you cannot convert your original List into an Eclipse Collections type, you can use ListAdapter to get the same API.
MutableList<Integer> distinct = ListAdapter.adapt(integers).distinct();
Note: I am a committer for Eclipse Collections.

You can use a Set. There are a couple of implementations:
HashSet uses an object's hashCode and equals.
TreeSet uses compareTo (defined by Comparable) or compare (defined by Comparator). Keep in mind that the comparison must be consistent with equals. See the TreeSet Javadoc for more info.
Also keep in mind that if you override equals you must override hashCode such that two equal objects have the same hash code.
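A short sketch of why consistency with equals matters: String.CASE_INSENSITIVE_ORDER disagrees with String.equals, so a TreeSet using it and a HashSet disagree about what a duplicate is. The class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ComparatorConsistency {
    static int hashSetSize(List<String> words) {
        // HashSet uses String.equals/hashCode: "Java" and "java" are different.
        return new HashSet<>(words).size();
    }

    static int treeSetSize(List<String> words) {
        // This comparator is NOT consistent with String.equals: it returns 0 for "Java" vs "java".
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        return set.size();
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "java", "JAVA");
        System.out.println(hashSetSize(words)); // 3
        System.out.println(treeSetSize(words)); // 1
    }
}
```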

The ordinary way of doing this would be to convert to a Set, then back to a List. But you can get fancy with Functional Java. If you liked lambdaj, you'll love FJ.
recipients = recipients
.sort(recipientOrd)
.group(recipientOrd.equal())
.map(List.<Recipient>head_());
You'll need to have defined an ordering for recipients, recipientOrd. Something like:
Ord<Recipient> recipientOrd = ord(new F2<Recipient, Recipient, Ordering>() {
public Ordering f(Recipient r1, Recipient r2) {
return stringOrd.compare(r1.getEmailAddress(), r2.getEmailAddress());
}
});
Works even if you don't have control of equals() and hashCode() on the Recipient class.
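If you prefer to stay within the JDK, a different technique (not the Functional Java one above) that also avoids touching equals() and hashCode() is to key a LinkedHashMap on the property that defines a duplicate. A minimal sketch, assuming a hypothetical Recipient record whose emailAddress component plays the role of getEmailAddress():

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DistinctByKey {
    // Hypothetical type; its equals/hashCode are not used for de-duplication below.
    record Recipient(String emailAddress, String name) {}

    static <T, K> List<T> distinctBy(List<T> input, Function<T, K> key) {
        Map<K, T> firstByKey = input.stream().collect(Collectors.toMap(
                key,                      // map key: the property that defines a duplicate
                Function.identity(),      // map value: the element itself
                (first, second) -> first, // on a key collision, keep the earlier element
                LinkedHashMap::new));     // keeps keys in first-seen order
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Recipient> recipients = List.of(
                new Recipient("a@example.com", "Ann"),
                new Recipient("b@example.com", "Bob"),
                new Recipient("a@example.com", "Ann (duplicate)"));
        System.out.println(distinctBy(recipients, Recipient::emailAddress).size()); // 2
    }
}
```

The merge function (first, second) -> first is what makes the earliest element win; (first, second) -> second would keep the last one instead.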

Actually lambdaj implements this feature through the selectDistinctArgument method
http://lambdaj.googlecode.com/svn/trunk/html/apidocs/ch/lambdaj/Lambda.html#selectDistinctArgument(java.lang.Object,%20A)

Related

implant a comparator to string array in java

Suppose I have a method: private static void sort(String[] arr)
Now I want to sort this array using the comparator String.CASE_INSENSITIVE_ORDER.
Is it possible to set this as the default comparator of the String array before passing it to this sort function? That way I won't have to change anything inside the sort function to modify its behavior; it's as if the comparator gets implanted into the arr.

Arrays are meant to be basic, simple containers.
If you want more features, use a class from the Java Collections Framework.
If your elements are distinct (no duplicates need be tracked), use a NavigableSet implementation. Java bundles two: TreeSet and ConcurrentSkipListSet. Here we use TreeSet.
NavigableSet< String > navSet = new TreeSet<>() ;
By default, the navigable set uses the “natural order” of your objects, by calling their compareTo method. Alternatively, you can pass a Comparator to the constructor, to be used for sorting.
In your case the String class carries a Comparator implementation as a constant: String.CASE_INSENSITIVE_ORDER .
NavigableSet< String > navSet = new TreeSet<>( String.CASE_INSENSITIVE_ORDER ) ;
Caveat: That comparator does not account for locale in sorting your strings. For locale-savvy sorting, use a Collator instead.
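Following that caveat, a sketch of a locale-aware variant, assuming PRIMARY strength is what you want: it treats case differences (and also accent differences) as equal, so "apple" and "Apple" collapse into one element, much as with CASE_INSENSITIVE_ORDER. The class and method names are illustrative.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

public class CollatorSetDemo {
    static List<String> sortedDistinct(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // PRIMARY: only base letters matter; case and accents are ignored when comparing.
        collator.setStrength(Collator.PRIMARY);
        // Collator implements Comparator<Object>, so it can order a set of Strings.
        NavigableSet<String> navSet = new TreeSet<>(collator);
        navSet.addAll(words);
        return new ArrayList<>(navSet);
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct(List.of("banana", "Apple", "apple", "cherry"), Locale.US));
        // [Apple, banana, cherry]
    }
}
```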
You said:
It's like I won't have to change anything inside the sort function to modify its behavior. It's like the comparator gets implanted into the arr
That is exactly what you get by specifying a Comparator for a collection class.

Check all values of object in list are unique [duplicate]

I want to remove duplicates from a list but what I am doing is not working:
List<Customer> listCustomer = new ArrayList<Customer>();
for (Customer customer: tmpListCustomer)
{
if (!listCustomer.contains(customer))
{
listCustomer.add(customer);
}
}

Assuming you want to keep the current order and don't want a Set, perhaps the easiest is:
List<Customer> dedupeCustomers =
new ArrayList<>(new LinkedHashSet<>(customers));
If you want to change the original list:
Set<Customer> dedupeCustomers = new LinkedHashSet<>(customers);
customers.clear();
customers.addAll(dedupeCustomers);

If the code in your question doesn't work, you probably have not implemented equals(Object) on the Customer class appropriately.
Presumably there is some key (let us call it customerId) that uniquely identifies a customer; e.g.
class Customer {
private String customerId;
...
An appropriate definition of equals(Object) would look like this:
@Override
public boolean equals(Object obj) {
if (obj == this) {
return true;
}
if (!(obj instanceof Customer)) {
return false;
}
Customer other = (Customer) obj;
return this.customerId.equals(other.customerId);
}
For completeness, you should also implement hashCode so that two Customer objects that are equal will return the same hash value. A matching hashCode for the above definition of equals would be:
@Override
public int hashCode() {
return customerId.hashCode();
}
It is also worth noting that this is not an efficient way to remove duplicates if the list is large. (For a list with N customers, you will need to perform N*(N-1)/2 comparisons in the worst case; i.e. when there are no duplicates.) For a more efficient solution you could use a HashSet to do the duplicate checking. Another option would be to use a LinkedHashSet as explained in Tom Hawtin's answer.
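As a sketch of the HashSet suggestion: the loop from the question, with membership tracked in a HashSet instead of List.contains, so each check is expected O(1) rather than O(n). It keeps the first occurrence in the original order. It is written generically, and the "c1"/"c2" strings stand in for Customer objects, which would need the equals/hashCode shown above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeenSetDedupe {
    static <T> List<T> dedupe(List<T> input) {
        Set<T> seen = new HashSet<>();     // elements encountered so far
        List<T> result = new ArrayList<>(); // first occurrences, in input order
        for (T item : input) {
            if (seen.add(item)) { // add() returns false if an equal element was already present
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedupe(List.of("c1", "c2", "c1", "c3", "c2"))); // [c1, c2, c3]
    }
}
```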

Java 8 update
For an array of objects, you can use a stream:
Arrays.stream(yourArray).distinct()
.collect(Collectors.toList());
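One caveat worth showing: Arrays.stream has overloads for object arrays and for primitive arrays, and the primitive ones return IntStream/LongStream/DoubleStream, which have no collect(Collector) method. A sketch with illustrative names; for an int[] the stream needs boxed() before it can be collected into a List.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ArrayDistinct {
    static List<String> distinctObjects(String[] values) {
        // Arrays.stream(T[]) gives a Stream<T>, which can be collected directly.
        return Arrays.stream(values).distinct().collect(Collectors.toList());
    }

    static List<Integer> distinctInts(int[] values) {
        // Arrays.stream(int[]) gives an IntStream; box it before collecting to a List.
        return Arrays.stream(values).distinct().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(new String[] {"a", "b", "a"})); // [a, b]
        System.out.println(distinctInts(new int[] {3, 1, 3, 2}));        // [3, 1, 2]
    }
}
```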

Does Customer implement the equals() contract?
If it doesn't implement equals() and hashCode(), then listCustomer.contains(customer) will check whether the exact same instance already exists in the list (by instance I mean the exact same object, i.e. the same reference). If what you are looking for is to test whether the same customer (perhaps two customers are the same if they have the same customer name or customer number) is already in the list, then you need to override equals() so that it checks whether the relevant fields (e.g. customer names) match.
Note: Don't forget to override hashCode() if you are going to override equals()! Otherwise, you might get into trouble with your HashMaps and other hash-based data structures. For good coverage of why this is and what pitfalls to avoid, have a look at the chapters on equals() and hashCode() in Josh Bloch's Effective Java, which also cover how to override equals() correctly.
By the way, is there an ordering restriction on your list? If there isn't, a slightly easier way to solve this problem is to use a Set<Customer> like so:
Set<Customer> noDups = new HashSet<Customer>();
noDups.addAll(tmpListCustomer);
return new ArrayList<Customer>(noDups);
This will nicely remove duplicates for you, since Sets don't allow duplicates. However, it will lose any ordering that was applied to tmpListCustomer, since HashSet has no defined iteration order (a LinkedHashSet keeps insertion order and a TreeSet imposes a sorted order, but that's not exactly related to your question). This can simplify your code a little bit.

List → Set → List (distinct)
Just add all your elements to a Set: it does not allow its elements to be repeated. If you need a list afterwards, use the new ArrayList<>(theSet) constructor (where theSet is your resulting set).

I suspect you might not have Customer.equals() implemented properly (or at all).
List.contains() uses equals() to check whether any of its elements is equal to the object passed as parameter. However, the default implementation of equals tests for reference identity, not value equality. So if you have not overridden it in Customer, it will return false for two distinct Customer objects having identical state.
Implementing equals correctly (together with hashCode, its pair: you practically always need to implement both if you implement either of them) involves some nitty-gritty details. Since you haven't shown us the Customer class, it is difficult to give more concrete advice.
As others have noted, you are better off using a Set rather than doing the job by hand, but even for that, you still need to implement those methods.

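private void removeTheDuplicates(List<Customer> myList) {
// Removes an element whenever an equal one is still in the list, so the last
// occurrence of each customer survives; Collections.frequency makes this O(n^2).
for (ListIterator<Customer> iterator = myList.listIterator(); iterator.hasNext();) {
Customer customer = iterator.next();
if (Collections.frequency(myList, customer) > 1) {
iterator.remove();
}
}
System.out.println(myList);
}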

The "contains" method searches for whether the list contains an entry that returns true from Customer.equals(Object o). If you have not overridden equals(Object) in Customer or one of its parents, then it will only search for an existing occurrence of the same object. It may be that this is what you wanted, in which case your code should work. But if you were looking to avoid having two objects that both represent the same customer, then you need to override equals(Object) to return true when that is the case.
It is also true that using one of the implementations of Set instead of List would give you duplicate removal automatically, and faster (for anything other than very small Lists). You will still need to provide code for equals.
You should also override hashCode() when you override equals().

Nearly all of the above answers are right, but what I suggest is to use a Map or Set while creating the list in the first place, not afterwards, to gain performance: converting a list to a Set or Map and then back to a List again is extra work you can avoid.
Sample Code:
Set<String> stringsSet = new LinkedHashSet<String>(); // a LinkedHashSet
// preserves the insertion order of the elements
for (String string: stringsList) {
stringsSet.add(string);
}
return new ArrayList<String>(stringsSet);

Two suggestions:
Use a HashSet instead of an ArrayList. This will speed up the contains() checks considerably if you have a long list
Make sure Customer.equals() and Customer.hashCode() are implemented properly, i.e. they should be based on the combined values of the underlying fields in the customer object.

As others have mentioned, you are probably not implementing equals() correctly.
However, you should also note that this code is quite inefficient, since its running time grows with the square of the number of elements.
You might want to consider using a Set structure instead of a List, or building a Set first and then turning it into a List.

The cleanest way is:
List<XXX> lstConsultada = dao.findByPropertyList(YYY);
List<XXX> lstFinal = new ArrayList<XXX>(new LinkedHashSet<XXX>(lstConsultada));
and override hashCode and equals based on the id properties of each entity.

IMHO the best way to do it these days:
Suppose you have a Collection "dups" and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.
Collection<collectionType> noDups = new HashSet<collectionType>(dups);
It works by creating a Set which, by definition, cannot contain duplicates.
(Based on the Oracle documentation.)

The correct answer for Java is to use a Set. If you already have a List<Customer> and want to de-duplicate it:
Set<Customer> s = new HashSet<Customer>(listCustomer);
Otherwise, just use a Set implementation (HashSet, TreeSet) directly and skip the List construction phase.
You will also need to override hashCode() and equals() on the domain classes you put in the Set, to make sure that the behavior you want is actually what you get. equals() can be as simple as comparing the unique ids of the objects or as complex as comparing every field. hashCode() can be as simple as returning the hashCode() of the unique id (or of its String representation).

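Using the Java 8 stream API:
List<String> list = new ArrayList<>();
list.add("one");
list.add("one");
list.add("two");
System.out.println("Before values : " + list);
Collection<String> c = list.stream().collect(Collectors.toSet());
System.out.println("After values : " + c);
Output:
Before values : [one, one, two]
After values : [one, two]
Note that Collectors.toSet() makes no guarantee about iteration order; use distinct() or a LinkedHashSet if the original order matters.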

Is there an alternative to NavigableSet when a "Set" is not appropriate?

The NavigableSet interface offers a number of useful methods that a normal Set does not (specifically I'm thinking about methods like headSet and tailSet for instance). However, being a Set, it does not support duplicate elements. Also, being a SortedSet, the ordering must be consistent with equals and hashCode to avoid violating the contract of the Set interface.
Is there any good alternative data structure for when there might be duplicate elements or multiple elements that are "equal" according to the natural ordering or Comparator but not "equal" according to the equals method? As a motivating example, consider the following code that shows why a NavigableSet is not appropriate:
import java.util.NavigableSet;
import java.util.TreeSet;
public class Foo implements Comparable<Foo> {
double x;
double y;
@Override
public int compareTo(Foo o) {
return Double.compare(x, o.x); // only x matters for sort order
}
public static void main(String... args) {
Foo a = new Foo();
a.x = 1;
a.y = 2;
Foo b = new Foo();
b.x = 1;
b.y = 42;
Foo c = new Foo();
c.x = 2;
c.y = 12.34;
NavigableSet<Foo> set = new TreeSet<Foo>();
set.add(a);
set.add(a);
set.add(b);
set.add(c);
System.out.println(set.size()); // prints 2
}
}
Notice that element a only gets added once (of course, since this is a Set). Also, notice that b does not get added, since there is already an element for which the comparison returns 0.
I felt like this was probably a fairly common thing, so I hoped to find an existing implementation rather than rolling my own. Is there a good, widely-used data structure for my purposes?
I'll add that while writing this question I did come across the Biscotti Project, but a) I'm not convinced it solves the comparison/equals issue and b) the FAQ explicitly says it's not really safe to use.

Let me reformulate your question to make sure I understand it well.
The need for headSet and tailSet implies the collection has to be sorted, which somewhat conflicts with the need to allow members that are duplicates according to compareTo.
The conflict comes from how such a collection is used efficiently. Adding a member to a sorted collection uses the compareTo method in O(log n): a kind of binary search followed by an insert. TreeSet is implemented using TreeMap, which cannot hold two members that are equal according to compareTo.
What you are looking for won't be efficient.
You may try to use a simple ArrayList, sort it with Collections.sort and then use the subList method. The problem with this is that it doesn't deal with duplicates at all.
You may also use a LinkedHashSet, which deals with duplicates (according to equals(), and is unaffected by compareTo()), but it is not sorted. Of course, you may convert the LinkedHashSet instance to a SortedSet by passing it to a TreeSet constructor, although the TreeSet will again drop elements that compare as 0.
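A sketch of the sorted-ArrayList suggestion, assuming the Foo fields from the question (modelled here as a hypothetical record, Java 16+). headList and tailList are illustrative helpers, not JDK methods; they use a hand-written lower-bound binary search because Collections.binarySearch makes no guarantee about which of several equal elements it finds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedListDemo {
    // Hypothetical stand-in for Foo: only x is used for ordering.
    record Foo(double x, double y) {}

    static final Comparator<Foo> BY_X = Comparator.comparingDouble(Foo::x);

    // Index of the first element whose x is >= bound; the list must be sorted by BY_X.
    static int lowerBound(List<Foo> sorted, double bound) {
        int lo = 0, hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).x() < bound) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Rough analogue of headSet(bound): elements with x < bound (a view, like subList).
    static List<Foo> headList(List<Foo> sorted, double bound) {
        return sorted.subList(0, lowerBound(sorted, bound));
    }

    // Rough analogue of tailSet(bound, true): elements with x >= bound.
    static List<Foo> tailList(List<Foo> sorted, double bound) {
        return sorted.subList(lowerBound(sorted, bound), sorted.size());
    }

    public static void main(String[] args) {
        List<Foo> list = new ArrayList<>(List.of(
                new Foo(2, 12.34), new Foo(1, 2), new Foo(1, 2), new Foo(1, 42)));
        list.sort(BY_X); // stable sort; ties on x and exact duplicates are all kept
        System.out.println(list.size());                    // 4
        System.out.println(headList(list, 2).size());      // 3
        System.out.println(tailList(list, 2).size());      // 1
    }
}
```

Unlike the TreeSet in the question, the list keeps a, its copy and b, so its size is 4; the trade-off is O(n) insertion if elements are added after sorting.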

Sort a ArrayList<String> by number value

I have an ArrayList of video resolutions that looks like this:
"1024x768", "800x600", "1280x1024", etc
I want to sort it based on numeric value in first part of string.
Ie, the above would sort out to look like this:
"800x600","1024x768","1280x1024"
Is there a quick and dirty way to do this, i.e. in fewer than 2-3 lines of code? If not, what would be the proper way? The values I get come from an object that is not my own. It does have getWidth() and getHeight() methods that return ints.

If the objects in the array are Resolution instances with getWidth methods then you can use a Comparator to sort on those:
Collections.sort(resolutions, new Comparator<Resolution>() {
@Override
public int compare(Resolution r1, Resolution r2) {
return Integer.compare(r1.getWidth(), r2.getWidth());
}
});

The proper way is to write a Comparator implementation that operates on Strings, except that it parses up to the first non-numeric character. It then creates an int out of that and compares the ints.
You can then pass an instance of that Comparator into Collections.sort() along with your List.
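A sketch of such a Comparator, assuming every value has the form "<width>x<height>". Instead of scanning for the first non-numeric character it simply splits at the 'x', which is a simplification. ResolutionSort and width are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ResolutionSort {
    // Parses the width part of strings like "1024x768"; assumes every value contains an 'x'.
    static int width(String resolution) {
        return Integer.parseInt(resolution.substring(0, resolution.indexOf('x')));
    }

    static List<String> sortByWidth(List<String> resolutions) {
        List<String> sorted = new ArrayList<>(resolutions); // copy; input may be immutable
        Collections.sort(sorted, Comparator.comparingInt(ResolutionSort::width));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByWidth(List.of("1024x768", "800x600", "1280x1024")));
        // [800x600, 1024x768, 1280x1024]
    }
}
```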

Use a custom Comparator to sort the ArrayList via the Collections api.
Collections.sort(resolutionArrayList, new ResolutionComparator());

Using a Comparator will work, but will get slow if you have lots of values because it will parse each String more than once.
An alternative is to add each value to a TreeMap, with the number you want as the key, i.e., Integer.valueOf(s.substring(0,s.indexOf('x'))), then create a new ArrayList from the sorted values in treeMap.values().
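A sketch of the TreeMap idea that parses each string only once and, as an adjustment to the answer, stores a list per width so that resolutions sharing a width (e.g. "1280x1024" and "1280x720") do not overwrite each other. Names are illustrative; the same "<width>x<height>" format is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ResolutionTreeMap {
    static List<String> sortByWidth(List<String> resolutions) {
        // Key: width, parsed once per value. Value: every resolution with that width.
        Map<Integer, List<String>> byWidth = new TreeMap<>();
        for (String s : resolutions) {
            int width = Integer.parseInt(s.substring(0, s.indexOf('x')));
            byWidth.computeIfAbsent(width, w -> new ArrayList<>()).add(s);
        }
        List<String> sorted = new ArrayList<>();
        for (List<String> group : byWidth.values()) { // TreeMap iterates keys in ascending order
            sorted.addAll(group);
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByWidth(List.of("1280x1024", "800x600", "1280x720", "1024x768")));
        // [800x600, 1024x768, 1280x1024, 1280x720]
    }
}
```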

The solution suggested by Shadwell in his answer is correct and idiomatic.
But if you're looking for a more concise solution, then I'd advise using lambdaj, which will enable you to write code like:
List<Resolution> sortedResolutions = sort(resolutions, on(Resolution.class).getWidth());

Java HashMap with Int Array

I am using this code to check that array is present in the HashMap:
public class Test {
public static void main(String[] arg) {
HashMap<int[], String> map = new HashMap<int[], String>();
map.put(new int[]{1, 2}, "sun");
System.out.println(map.containsKey((new int[]{1, 2})));
}
}
But this prints false. How can I check whether the array is present in the HashMap?

The problem is because the two int[] aren't equal.
System.out.println(
(new int[] { 1, 2 }).equals(new int[] { 1, 2 })
); // prints "false"
Map and other Java Collections Framework classes define their interfaces in terms of equals. From the Map API:
Many methods in Collections Framework interfaces are defined in terms of the equals method. For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))."
Note that they don't have to be the same object; they just have to be equal. Arrays in Java extend Object, whose default implementation of equals returns true only on object identity; hence why the above snippet prints false.
You can solve your problem in one of many ways:
Define your own wrapper class for arrays whose equals uses java.util.Arrays equals/deepEquals method.
And don't forget that when you @Override equals(Object), you must also @Override hashCode
Use something like List<Integer> that does define equals in terms of the values they contain
Or, if you can work with reference equality for equals, you can just stick with what you have. Just as you shouldn't expect the above snippet to ever print true, you shouldn't ever expect to be able to find your arrays by their values alone; you must hang on to and use the original references every time.
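A minimal sketch of the wrapper-class option, which also follows the immutability advice in a later answer: IntArrayKey is a hypothetical name, and it copies the array so later changes to the caller's array cannot corrupt the map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public final class IntArrayKey {
    private final int[] values;

    public IntArrayKey(int... values) {
        this.values = values.clone(); // defensive copy: the key cannot change while it is in a map
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IntArrayKey)) return false;
        return Arrays.equals(values, ((IntArrayKey) o).values); // compare contents, not references
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values); // content-based, consistent with equals
    }

    public static void main(String[] args) {
        Map<IntArrayKey, String> map = new HashMap<>();
        map.put(new IntArrayKey(1, 2), "sun");
        System.out.println(map.containsKey(new IntArrayKey(1, 2))); // true
    }
}
```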
See also:
Overriding equals and hashCode in Java
How to ensure hashCode() is consistent with equals()?
Understanding the workings of equals and hashCode in a HashMap
API
Object.equals and Object.hashCode
It's essential for a Java programmer to be aware of these contracts and how to make them work with/for the rest of the system

You are comparing two different references.
Something like this will work:
public class Test {
public static void main(String[] arg)
{
HashMap<int[],String> map= new HashMap<int[],String>();
int[] a = new int[]{1,2};
map.put(a, "sun");
System.out.println(map.containsKey(a));
}
}
Since a is the same reference, you will receive true as expected. If your application has no option of passing references to do the comparison, I would make a new object type which contains the int[] and override the equals() method (don't forget to override hashCode() at the same time), so that will reflect in the containsKey() call.

I would use a different approach. As mentioned before, the problem is with arrays equality, which is based on reference equality and makes your map useless for your needs.
Another potential problem, assuming that you use ArrayList instead, is consistency: if you change a list after it has been added to the map, the map will be corrupted, since the bucket the list sits in will no longer match its hash code.
In order to solve these two problems, I would use some kind of immutable list. You may want to make an immutable wrapper on int array for example, and implement equals() and hashCode() yourself.

I think the problem is your array is doing an '==' comparison, i.e. it's checking the reference. When you do containsKey(new int[] { ... }), it's creating a new object and thus the reference is not the same.
If you change the array type to something like ArrayList<Integer> that should work, however I would tend to avoid using Lists as map keys as this is not going to be very efficient.

The hashCode() implementation for arrays is inherited from Object.hashCode(), so it is based on object identity (conceptually, the array's memory location) rather than its contents. Since the two arrays are instantiated separately, they are different objects and generally have different hash codes. If you use one array, it works:
int[] arr = {1, 2};
map.put(arr, "sun");
System.out.println(map.containsKey(arr));

You've got two different objects that happen to contain the same values, because you've called new twice.
One approach you might use is to create a "holder" class of your own, and define that class's equals and hashCode methods.

Are you sure you don't want to map Strings to arrays instead of the other way around?
Anyway, to answer your question, the problem is that you are creating a new array when you call containsKey(). This returns false because you have two separately created arrays that happen to have the same elements and length. See Yuval's answer for the correct way of checking whether an array is contained as a key.
An alternative, more advanced approach is to create your own class that wraps an array and overrides equals() and hashCode(), so that two arrays with the same length and elements are equal and have equal hash codes.

The two instances of int[] are different and not equal.
A nice approach would be to convert the int array to String using Arrays.toString(arr):
HashMap<String, String> h = new HashMap<>();
int[] a = new int[]{1, 2};
h.put(Arrays.toString(a), "sun");
h.get(Arrays.toString(new int[]{1, 2})); // returns sun
