I have two collections of the same object, Collection<Foo> oldSet and Collection<Foo> newSet. The required logic is as follows:
if foo is in(*) oldSet but not newSet, call doRemove(foo)
else if foo is not in oldSet but in newSet, call doAdd(foo)
else if foo is in both collections but modified, call doUpdate(oldFoo, newFoo)
else if !foo.activated && foo.startDate >= now, call doStart(foo)
else if foo.activated && foo.endDate <= now, call doEnd(foo)
(*) "in" means the unique identifier matches, not necessarily the content.
The current (legacy) code does many comparisons to figure out removeSet, addSet, updateSet, startSet and endSet, and then loops over each set to act on every item.
The code is quite messy (partly because I have left out some spaghetti logic already) and I am trying to refactor it. Some more background info:
As far as I know, the oldSet and newSet are actually backed by ArrayList
Each set contains fewer than 100 items, most likely maxing out at 20
This code is called frequently (measured in millions/day), although the sets seldom differ
My questions:
If I convert oldSet and newSet into a HashMap (order is not of concern here), with the IDs as keys, would it make the code easier to read and easier to compare? How much time and memory performance is lost in the conversion?
Would iterating over the two sets and performing the appropriate operation be more efficient and concise?
Apache's commons.collections library has a CollectionUtils class that provides easy-to-use methods for Collection manipulation/checking, such as intersection, difference, and union.
The org.apache.commons.collections.CollectionUtils API docs are here.
You can use Java 8 streams, for example
set1.stream().filter(s -> set2.contains(s)).collect(Collectors.toSet());
or Sets class from Guava:
Set<String> intersection = Sets.intersection(set1, set2);
Set<String> difference = Sets.difference(set1, set2);
Set<String> symmetricDifference = Sets.symmetricDifference(set1, set2);
Set<String> union = Sets.union(set1, set2);
I have created an approximation of what I think you are looking for using just the Collections Framework in Java. Frankly, I think it is probably overkill, as @Mike Deck points out. For such a small set of items to compare and process I think arrays would be a better choice from a procedural standpoint, but here is my pseudo-coded (because I'm lazy) solution. I am assuming that the Foo class is comparable based on its unique id and not on all of the data in its contents:
Collection<Foo> oldSet = ...;
Collection<Foo> newSet = ...;

// Elements of a that are not in b. Relies on Foo.equals()/hashCode() comparing the unique id only.
private Collection<Foo> difference(Collection<Foo> a, Collection<Foo> b) {
    Collection<Foo> result = new ArrayList<Foo>(a); // copy, so the input is left untouched
    result.removeAll(b);
    return result;
}

// Elements of a that are also in b.
private Collection<Foo> intersection(Collection<Foo> a, Collection<Foo> b) {
    Collection<Foo> result = new ArrayList<Foo>(a);
    result.retainAll(b);
    return result;
}

// The element of c with the same unique id as key, or null if there is none.
private Foo find(Collection<Foo> c, Foo key) {
    for (Foo candidate : c) {
        if (candidate.equals(key)) return candidate;
    }
    return null;
}

public void doWork() {
    // if foo is in(*) oldSet but not newSet, call doRemove(foo)
    for (Foo foo : difference(oldSet, newSet)) {
        doRemove(foo);
    }
    // else if foo is not in oldSet but in newSet, call doAdd(foo)
    for (Foo foo : difference(newSet, oldSet)) {
        doAdd(foo);
    }
    // else if foo is in both collections but modified, call doUpdate(oldFoo, newFoo)
    Comparator<Foo> comp = new Comparator<Foo>() {
        public int compare(Foo f1, Foo f2) {
            if (f1.activated != f2.activated) return f1.activated ? 1 : -1;
            int byStartDate = f1.startDate.compareTo(f2.startDate);
            if (byStartDate != 0) return byStartDate;
            return ...; // compare the remaining fields the same way
        }
    };
    for (Foo oldFoo : intersection(oldSet, newSet)) {
        Foo newFoo = find(newSet, oldFoo);
        if (comp.compare(oldFoo, newFoo) != 0) {
            doUpdate(oldFoo, newFoo);
        } else {
            // else if !foo.activated && foo.startDate >= now, call doStart(foo)
            if (!newFoo.activated && newFoo.startDate.compareTo(now) >= 0) doStart(newFoo);
            // else if foo.activated && foo.endDate <= now, call doEnd(foo)
            if (newFoo.activated && newFoo.endDate.compareTo(now) <= 0) doEnd(newFoo);
        }
    }
}
As far as your questions:
If I convert oldSet and newSet into HashMap (order is not of concern here), with the IDs as keys, would it made the code easier to read and easier to compare? How much of time & memory performance is loss on the conversion?
I think that you would probably make the code more readable by using a Map BUT...you would probably use more memory and time during the conversion.
Would iterating the two sets and perform the appropriate operation be more efficient and concise?
Yes, this would be the best of both worlds, especially if you followed @Mike Sharek's advice of rolling your own List with the specialized methods, or used something like the Visitor design pattern to run through your collection and process each item.
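For what it's worth, here is a minimal sketch of the Map-keyed variant the question asks about. Everything in it is an assumption made for illustration: the Foo class here is a hypothetical stand-in with an int id and one content field, and the actions are recorded as strings rather than calling the real doRemove/doAdd/doUpdate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FooDiff {
    // Hypothetical stand-in for the real Foo: a unique id plus some content.
    static final class Foo {
        final int id;
        final String content;
        Foo(int id, String content) { this.id = id; this.content = content; }
    }

    // Index a collection by unique id (LinkedHashMap keeps the original order).
    static Map<Integer, Foo> byId(Collection<Foo> foos) {
        Map<Integer, Foo> map = new LinkedHashMap<>();
        for (Foo foo : foos) {
            map.put(foo.id, foo);
        }
        return map;
    }

    // Returns the actions that would be taken (e.g. "remove 1") instead of calling doRemove etc.
    static List<String> diff(Collection<Foo> oldSet, Collection<Foo> newSet) {
        Map<Integer, Foo> oldById = byId(oldSet);
        Map<Integer, Foo> newById = byId(newSet);
        List<String> actions = new ArrayList<>();
        for (Foo oldFoo : oldById.values()) {
            Foo newFoo = newById.get(oldFoo.id);
            if (newFoo == null) {
                actions.add("remove " + oldFoo.id);   // only in oldSet
            } else if (!oldFoo.content.equals(newFoo.content)) {
                actions.add("update " + oldFoo.id);   // in both, but modified
            }
        }
        for (Foo newFoo : newById.values()) {
            if (!oldById.containsKey(newFoo.id)) {
                actions.add("add " + newFoo.id);      // only in newSet
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Foo> oldSet = Arrays.asList(new Foo(1, "a"), new Foo(2, "b"), new Foo(3, "c"));
        List<Foo> newSet = Arrays.asList(new Foo(2, "b"), new Foo(3, "changed"), new Foo(4, "d"));
        System.out.println(diff(oldSet, newSet)); // prints [remove 1, update 3, add 4]
    }
}
```

With at most a few dozen items per set the two temporary maps are small, but they are allocated on every call, so whether this beats the sorted-list walk is something to measure rather than assume.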
I think the easiest way to do that is by using the Apache Commons Collections API: CollectionUtils.subtract(list1, list2), as long as the lists are of the same type.
I'd move to lists and solve it this way:
Sort both lists by id in ascending order, using a custom Comparator if the objects in the lists aren't Comparable
Iterate over the elements of both lists as in the merge phase of the merge sort algorithm, but instead of merging the lists, apply your logic.
The code would be more or less like this:
/* Main method */
private void execute(Collection<Foo> oldSet, Collection<Foo> newSet) {
List<Foo> oldList = asSortedList(oldSet);
List<Foo> newList = asSortedList(newSet);
int oldIndex = 0;
int newIndex = 0;
// Iterate over both collections but not always in the same pace
while( oldIndex < oldList.size()
&& newIndex < newList.size()) {
Foo oldObject = oldList.get(oldIndex);
Foo newObject = newList.get(newIndex);
// Your logic here
if(oldObject.getId() < newObject.getId()) {
doRemove(oldObject);
oldIndex++;
} else if( oldObject.getId() > newObject.getId() ) {
doAdd(newObject);
newIndex++;
} else if( oldObject.getId() == newObject.getId()
&& isModified(oldObject, newObject) ) {
doUpdate(oldObject, newObject);
oldIndex++;
newIndex++;
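} else {
// Same id and not modified: the remaining checks go here.
...
// Advance both indexes, otherwise the loop never terminates.
oldIndex++;
newIndex++;
}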
}// while
// Check if there are any objects left in *oldList* or *newList*
for(; oldIndex < oldList.size(); oldIndex++ ) {
doRemove( oldList.get(oldIndex) );
}// for( oldIndex )
for(; newIndex < newList.size(); newIndex++ ) {
doAdd( newList.get(newIndex) );
}// for( newIndex )
}// execute( oldSet, newSet )
/** Create sorted list from collection
If you actually perform any actions on the input collections, then you should
always return a new list instance to keep the algorithm simple.
*/
private List<Foo> asSortedList(Collection<Foo> data) {
List<Foo> resultList;
if(data instanceof List) {
resultList = (List<Foo>)data;
} else {
resultList = new ArrayList<Foo>(data);
}
Collections.sort(resultList);
return resultList;
}
public static boolean doCollectionsContainSameElements(
Collection<Integer> c1, Collection<Integer> c2){
if (c1 == null || c2 == null) {
return false;
}
else if (c1.size() != c2.size()) {
return false;
} else {
return c1.containsAll(c2) && c2.containsAll(c1);
}
}
For a set that small it is generally not worth converting from an array to a HashMap/Set. In fact, you're probably best off keeping the items in an array, sorting them by key, and iterating over both lists simultaneously to do the comparison.
For comparing a list or set we can use Arrays.equals(Object[], Object[]). It checks the values only, element by element and in order. To get the Object[] we can use the Collection.toArray() method.
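A small sketch of that idea; the class and method names are made up for the example. Note that the comparison is positional, so the same values in a different order compare as not equal.

```java
import java.util.Arrays;
import java.util.List;

class ArrayCompare {
    // True if both lists hold equal elements in the same positions.
    static boolean sameValues(List<?> a, List<?> b) {
        return Arrays.equals(a.toArray(), b.toArray());
    }

    public static void main(String[] args) {
        System.out.println(sameValues(Arrays.asList(1, 2, 3), Arrays.asList(1, 2, 3))); // true
        System.out.println(sameValues(Arrays.asList(1, 2, 3), Arrays.asList(3, 2, 1))); // false: order matters
    }
}
```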
I have a complicated requirement where a list of records has comments in it. We have a reporting feature where each and every change should be logged and reported. Hence, as per our design, we create a whole new record even if only a single field has been updated.
Now we want to get the history of comments (reverse-sorted by timestamp) stored in our DB. After running the query I got the list of comments, but it contains duplicate entries because some other field was changed. It also contains null entries.
I wrote the following code to remove duplicate and null entries.
List<Comment> toRet = new ArrayList<>();
dbCommentHistory.forEach(ele -> {
//Directly copy if toRet is empty.
if (!toRet.isEmpty()) {
int lastIndex = toRet.size() - 1;
Comment lastAppended = toRet.get(lastIndex);
// If comment is null don't proceed
if (ele.getComment() == null) {
return;
}
// remove if we have same comment as last time
if (StringUtils.compare(ele.getComment(), lastAppended.getComment()) == 0) {
toRet.remove(lastIndex);
}
}
//add element to new list
toRet.add(ele);
});
This logic works fine and has been tested, but I want to convert this code to use lambdas, streams and other Java 8 features.
You can use the following snippet:
Collection<Comment> result = dbCommentHistory.stream()
.filter(c -> c.getComment() != null)
.collect(Collectors.toMap(Comment::getComment, Function.identity(), (first, second) -> second, LinkedHashMap::new))
.values();
If you need a List instead of a Collection you can use new ArrayList<>(result).
If you have implemented the equals() method in your Comment class like the following
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
return Objects.equals(comment, ((Comment) o).comment);
}
you can just use this snippet:
List<Comment> result = dbCommentHistory.stream()
.filter(c -> c.getComment() != null)
.distinct()
.collect(Collectors.toList());
But this would keep the first comment, not the last.
If I'm understanding the logic in the question's code, you want to remove consecutive repeated comments but keep duplicates if there is some different comment in between in the input list.
In this case, simply using .distinct() (once equals and hashCode have been properly defined) won't work as intended, as non-consecutive duplicates will be eliminated as well.
The more "streamy" solution here is to use a custom Collector that when folding elements into the accumulator removes the consecutive duplicates only.
static final Collector<Comment, ArrayDeque<Comment>, List<Comment>> COMMENT_COLLECTOR = Collector.of(
        ArrayDeque::new, // supplier
        (deque, comment) -> { // accumulator: append only if the text differs from the last kept comment
            if (deque.isEmpty() || !Objects.equals(deque.getLast().getComment(), comment.getComment())) {
                deque.addLast(comment);
            }
        },
        (deque1, deque2) -> { // combiner: discard deque2's first element if it repeats deque1's last
            if (deque1.isEmpty()) {
                return deque2;
            }
            if (!deque2.isEmpty()
                    && Objects.equals(deque1.getLast().getComment(), deque2.getFirst().getComment())) {
                deque2.removeFirst();
            }
            deque1.addAll(deque2);
            return deque1;
        },
        ArrayList::new); // finisher: copy the deque into a List
Notice that Deque (in java.util) is not a List: it extends Queue and adds convenient operations to access the first and last elements. ArrayDeque is the plain array-based implementation (the counterpart of ArrayList for List), which is why the collector accumulates into an ArrayDeque and copies the result into an ArrayList in the finisher.
By default the collector will always receive the elements in the input stream order, so this must work. I know it is not much less code, but it is as good as it gets. If you define a static helper that compares two Comments and gracefully handles null elements or null comment strings, you can make it a bit more compact:
static boolean sameComment(final Comment a, final Comment b) {
    if (a == b) {
        return true;
    } else if (a == null || b == null) {
        return false;
    } else {
        return Objects.equals(a.getComment(), b.getComment());
    }
}
static final Collector<Comment, ArrayDeque<Comment>, List<Comment>> COMMENT_COLLECTOR = Collector.of(
        ArrayDeque::new, // supplier
        (deque, comment) -> { // accumulator
            if (!sameComment(deque.peekLast(), comment)) {
                deque.addLast(comment);
            }
        },
        (deque1, deque2) -> { // combiner: discard deque2's first element if it repeats deque1's last
            if (deque1.isEmpty()) {
                return deque2;
            }
            if (sameComment(deque1.peekLast(), deque2.peekFirst())) {
                deque2.pollFirst();
            }
            deque1.addAll(deque2);
            return deque1;
        },
        ArrayList::new); // finisher: ArrayDeque is not a List, so copy it into one
----------
Perhaps you would prefer to declare a proper (named) class that implements Collector, to make it clearer and avoid defining a lambda for each Collector action, or at least implement the lambdas passed to Collector.of as static methods to improve readability.
Now the code to do the actual work is rather trivial:
List<Comment> unique = dbCommentHistory.stream()
.collect(COMMENT_COLLECTOR);
That is it. However, it may become a bit more involved if you want to handle null Comment instances (elements). The code above already handles the comment's string being null by considering it equal to another null string:
List<Comment> unique = dbCommentHistory.stream()
.filter(Objects::nonNull)
.collect(COMMENT_COLLECTOR);
Your code can be simplified a bit. Notice that this solution does not use stream/lambdas but it seems to be the most succinct option:
List<Comment> toRet = new ArrayList<>(dbCommentHistory.size());
Comment last = null;
for (final Comment ele : dbCommentHistory) {
if (ele != null && (last == null || !Objects.equals(last.getComment(), ele.getComment()))) {
toRet.add(last = ele);
}
}
The outcome is not exactly the same as with the question's code, since in the latter null elements might be added to toRet, but it seems to me that you actually want to remove them completely. It is easy to modify the code (making it a bit longer) to get the same output though.
If you insist on using .forEach, that would not be that difficult; in that case last would need to be calculated at the beginning of the lambda. Here you may want to use an ArrayDeque so that you can conveniently use peekLast:
Deque<Comment> toRet = new ArrayDeque<>(dbCommentHistory.size());
dbCommentHistory.forEach( ele -> {
if (ele != null) {
final Comment last = toRet.peekLast();
if (last == null || !Objects.equals(last.getComment(), ele.getComment())) {
toRet.addLast(ele);
}
}
});
The goal here is to insert elements into a TreeSet where the ordering is determined by a custom object's internal List.
public class CustomObject implements Comparable<CustomObject> {
List<Integer> innerObjectList = new ArrayList<>();
@Override
public int compareTo(CustomObject o) {
// compare natural ordering of elements in array.
}
}
This is what I'm currently doing, it works but I was hoping to find a native way to do it.
@Override
public int compareTo(@NotNull AisleMap o) {
if (o.aisles.size() > aisles.size())
return 1;
if (o.aisles.size() < aisles.size())
return -1;
for (int i = 0; i<aisles.size(); i++) {
int compare = o.aisles.get(i).compareTo(aisles.get(i));
if( compare != 0) {
return compare;
}
}
return 0;
}
I'm trying to figure out how to do something like o.innerObjectList.compare(toInnerObjectList); However, I'm not entirely sure how to do this so that the TreeSet orders the objects by the array properly. Is there an easy way to compare the "natural" order of the elements of two arrays? The only way I could think of was to check the sizes and then, if they are equal, compare the contents index by index until an element of one array is larger than the element at the same index of the other.
ex: the ordering would be like so {1,2,5} -> {1,2,4} -> {1,2,3}
Suppose I have the following List, which contains items of type ENTITY. ENTITY has an integer field which determines its natural ordering. I want to get the ENTITY which is the maximum or minimum based on that field's value. How can I implement this in Java?
List<ENTITY> lt = new ArrayList<ENTITY>();
class ENTITY
{
int field;
/* Constructor, getters, setters... */
}
Use Collections.sort with a Comparator to sort your list. Depending on whether you sort ascending or descending, the positions of the max and min elements will differ. In either case, they will be at opposite ends, one at the top of the list and one at the bottom. To sort in ascending order (smallest element at the beginning of the list, largest at the end) you can use something like this:
Collections.sort(lt, new Comparator<ENTITY>() {
public int compare(ENTITY o1, ENTITY o2) {
// Sort null elements first.
if (o1 == null) {
if (o2 == null) {
return 0;
}
return -1;
}
else if (o2 == null) {
return 1;
}
// If field is Comparable, use this instead:
// return o1.getField().compareTo(o2.getField());
// If field is an int:
return o1.getField() < o2.getField() ? -1 : (o1.getField() > o2.getField() ? 1 : 0);
}
});
// stream the elements and take the maximum, comparing by field
return lt.stream().max((e1, e2) -> Integer.compare(e1.field, e2.field)).orElse(/* default */);
Just one of the many applications of Java 8's stream api.
Though I would suggest working on some coding conventions first.
Is there a tool or library to find duplicate entries in a Collection according to specific criteria that can be implemented?
To make myself clear: I want to compare the entries to each other according to specific criteria. So I think a Predicate returning just true or false isn't enough.
I can't use equals.
It depends on the semantics of the criterion:
If your criterion is always the same for a given class, and is inherent to the underlying concept, you should just implement equals and hashCode and use a set.
If your criterion depends on the context, org.apache.commons.collections.CollectionUtils.select(java.util.Collection, org.apache.commons.collections.Predicate) might be the right solution for you.
If you want to find duplicates, rather than just removing them, one approach would be to throw the Collection into an array, sort the array via a Comparator that implements your criteria, then linearly walk through the array, looking for adjacent duplicates.
Here's a sketch (not tested):
MyComparator myComparator = new MyComparator();
MyType[] myArray = myList.toArray(new MyType[0]);
Arrays.sort( myArray, myComparator );
for ( int i = 1; i < myArray.length; ++i ) {
if ( 0 == myComparator.compare( myArray[i - 1], myArray[i] )) {
// Found a duplicate!
}
}
Edit: From your comment, you just want to know if there are duplicates. The approach above works for this too. But you could more simply just create a java.util.SortedSet with a custom Comparator. Here's a sketch:
MyComparator myComparator = new MyComparator();
TreeSet treeSet = new TreeSet( myComparator );
treeSet.addAll( myCollection );
boolean containsDuplicates = (treeSet.size() != myCollection.size());
You can adapt a Java set to search for duplicates among objects of an arbitrary type: wrap your target class in a private wrapper that evaluates equality based on your criteria, and construct a set of wrappers.
Here is a somewhat lengthy example that illustrates the technique. It considers two people with the same first name to be equal, and so it detects three duplicates in the array of five objects.
import java.util.*;
import java.lang.*;
class Main {
static class Person {
private String first;
private String last;
public String getFirst() {return first;}
public String getLast() {return last;}
public Person(String f, String l) {
first = f;
last = l;
}
public String toString() {
return first+" "+last;
}
}
public static void main (String[] args) throws java.lang.Exception {
List<Person> people = new ArrayList<Person>();
people.add(new Person("John", "Smith"));
people.add(new Person("John", "Scott"));
people.add(new Person("Jack", "First"));
people.add(new Person("John", "Walker"));
people.add(new Person("Jack", "Black"));
Set<Object> seen = new HashSet<Object>();
for (Person p : people) {
final Person thisPerson = p;
class Wrap {
public int hashCode() { return thisPerson.getFirst().hashCode(); }
public boolean equals(Object o) {
Wrap other = (Wrap)o;
return other.wrapped().getFirst().equals(thisPerson.getFirst());
}
public Person wrapped() { return thisPerson; }
};
Wrap wrap = new Wrap();
if (seen.add(wrap)) {
System.out.println(p + " is new");
} else {
System.out.println(p + " is a duplicate");
}
}
}
}
You can play with this example on ideone [link].
You could use a map and while iterating over the collection put the elements into the map (the predicates would form the key) and if there's already an entry you've found a duplicate.
For more information see here: Finding duplicates in a collection
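A rough sketch of that map-based approach, under the assumption that the comparison criterion can be expressed as a key function (the DuplicateFinder class and the keyOf parameter are names invented for this example). Two elements count as duplicates when they produce equal keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DuplicateFinder {
    // Returns every (non-null) element whose key was already produced by an earlier element.
    static <T, K> List<T> findDuplicates(Iterable<T> items, Function<? super T, ? extends K> keyOf) {
        Map<K, T> firstSeen = new HashMap<>();
        List<T> duplicates = new ArrayList<>();
        for (T item : items) {
            // putIfAbsent returns the existing value, or null if the key was not yet mapped
            if (firstSeen.putIfAbsent(keyOf.apply(item), item) != null) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // Criterion for this example: same first letter.
        System.out.println(findDuplicates(words, (String s) -> s.charAt(0))); // prints [avocado, blueberry]
    }
}
```

This only works for criteria that behave like an equivalence (every element maps to exactly one key); a pairwise rule that cannot be reduced to a key still needs a comparator-style approach.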
I've created a new interface akin to the IEqualityComparer<T> interface in .NET.
I then pass such an EqualsComparator<T> to the following method, which detects duplicates.
public static <T> boolean hasDuplicates(Collection<T> collection,
EqualsComparator<T> equalsComparator) {
List<T> list = new ArrayList<>(collection);
for (int i = 0; i < list.size(); i++) {
T object1 = list.get(i);
for (int j = (i + 1); j < list.size(); j++) {
T object2 = list.get(j);
if (object1 == object2
|| equalsComparator.equals(object1, object2)) {
return true;
}
}
}
return false;
}
This way I can customise the comparison to my needs.
TreeSet allows you to do this easily:
Set uniqueItems = new TreeSet<>(yourComparator);
List<?> duplicates = objects.stream().filter(o -> !uniqueItems.add(o)).collect(Collectors.toList());
yourComparator is used when calling uniqueItems.add(o), which adds the item to the set and returns true if the item is unique. If the comparator considers the item a duplicate, add(o) will return false.
Note that the item's equals method must be consistent with yourComparator, as per the TreeSet documentation, for this to work.
Iterate the ArrayList which contains duplicates and add them to the HashSet. When the add method returns false in the HashSet just log the duplicate to the console.
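As a sketch of what that looks like (class and method names are invented here; the duplicates are collected into a list rather than logged so the result is easy to inspect). It relies on the elements' own equals and hashCode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetDuplicates {
    // Collects every element that equals one already seen earlier in the list.
    static <T> List<T> duplicatesOf(List<T> items) {
        Set<T> seen = new HashSet<>();
        List<T> duplicates = new ArrayList<>();
        for (T item : items) {
            if (!seen.add(item)) { // add returns false when an equal element is already present
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicatesOf(Arrays.asList("a", "b", "a", "c", "b", "a"))); // prints [a, b, a]
    }
}
```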
Assuming I have
final Iterable<String> unsorted = asList("FOO", "BAR", "PREFA", "ZOO", "PREFZ", "PREFOO");
What can I do to transform this unsorted list into this:
[PREFZ, PREFA, BAR, FOO, PREFOO, ZOO]
(a list which begins with known values that must appear first (here "PREFZ" and "PREFA"), with the rest sorted alphabetically)
I think there are some useful classes in Guava that can do the job (Ordering, Predicates...), but I have not yet found a solution...
I would keep separate lists.
One for known values and one for unknown values. Sort them separately, and when you need them in one list you can just concatenate them.
knownSorted.addAll(unknownSorted);
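To make the two-list idea concrete, here is one possible sketch. The class, method and parameter names are assumptions for the example; the known values are ordered by their position in knownInOrder and the rest alphabetically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PrefixFirstSort {
    // Splits the input into known and unknown values, sorts each part, then concatenates.
    static List<String> sortWithKnownFirst(Iterable<String> unsorted, List<String> knownInOrder) {
        List<String> known = new ArrayList<>();
        List<String> unknown = new ArrayList<>();
        for (String s : unsorted) {
            if (knownInOrder.contains(s)) {
                known.add(s);
            } else {
                unknown.add(s);
            }
        }
        // Known values keep the order given by knownInOrder.
        known.sort(Comparator.comparingInt((String s) -> knownInOrder.indexOf(s)));
        // Everything else is sorted alphabetically.
        Collections.sort(unknown);
        known.addAll(unknown);
        return known;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("FOO", "BAR", "PREFA", "ZOO", "PREFZ", "PREFOO");
        System.out.println(sortWithKnownFirst(unsorted, Arrays.asList("PREFZ", "PREFA")));
        // prints [PREFZ, PREFA, BAR, FOO, PREFOO, ZOO]
    }
}
```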
I suggest filling List with your values and using Collections.sort(...).
Something like
Collections.sort(myList, new FunkyComparator());
using this:
class FunkyComparator implements Comparator {
private static Map<String,Integer> orderedExceptions =
new HashMap<String,Integer>(){{
put("PREFZ", Integer.valueOf(1));
put("PREFA", Integer.valueOf(2));
}};
public int compare(Object o1, Object o2) {
String s1 = (String) o1;
String s2 = (String) o2;
Integer i1 = orderedExceptions.get(s1);
Integer i2 = orderedExceptions.get(s2);
if (i1 != null && i2 != null) {
return i1 - i2;
}
if (i1 != null) {
return -1;
}
if (i2 != null) {
return +1;
}
return s1.compareTo(s2);
}
}
Note: This is not the most efficient solution. It is just a simple, straightforward solution that gets the job done.
I would first use Collections.sort(list) to sort the list.
Then, I would remove the known items, and add them to the front.
String special = "PREFA";
if (list.remove(special))
list.add(0, special);
Or, if you have a list or array of the values you need at the front, you could do:
String[] knownValues = {};
for (String s: knownValues) {
if (list.remove(s))
list.add(0, s);
}
Since I'm a fan of the Guava lib, I wanted to find a solution using it. I don't know whether it's efficient, or whether you'll find it as simple as the other solutions, but here it is:
final Iterable<String> all = asList("FOO", "BAR", "PREFA", "ZOO", "PREFOO", "PREFZ");
final List<String> mustAppearFirst = asList("PREFZ", "PREFA");
final Iterable<String> sorted =
concat(
Ordering.explicit(mustAppearFirst).sortedCopy(filter(all, in(mustAppearFirst))),
Ordering.<String>natural().sortedCopy(filter(all, not(in(mustAppearFirst)))));
You specifically mentioned guava; along with Sylvain M's answer, here's another way (more as an academic exercise and demonstration of guava's flexibility than anything else)
// List is not efficient here; for large problems, something like SkipList
// is more suitable
private static final List<String> KNOWN_INDEXES = asList("PREFZ", "PREFA");
private static final Function<Object, Integer> POSITION_IN_KNOWN_INDEXES
= new Function<Object, Integer>() {
public Integer apply(Object in) {
int index = KNOWN_INDEXES.indexOf(in);
return index == -1 ? null : index;
}
};
...
List<String> values = asList("FOO", "BAR", "PREFA", "ZOO", "PREFZ", "PREFOO");
Collections.sort(values,
Ordering.natural().nullsLast().onResultOf(POSITION_IN_KNOWN_INDEXES).compound(Ordering.natural())
);
So, in other words, sort on natural order of the Integer returned by List.indexOf(), then break ties with natural order of the object itself.
Messy, perhaps, but fun.
I would also use Collections.sort(list) but I think I would use a Comparator and within the comparator you could define your own rules, e.g.
class MyComparator implements Comparator<String> {
public int compare(String o1, String o2) {
// Now you can define the behaviour for your sorting.
// For example your special cases should always come first,
// but if it is not a special case then just use the normal string comparison.
if (o1.equals(SPECIAL_CASE)) {
// Do something special
}
// etc.
return o1.compareTo(o2);
}
}
Then sort by doing:
Collections.sort(list, new MyComparator());