Comparing iterables for same content, but not regarding order - java

I'm attempting to compare two Iterables in Java of same size. I only need to know that the contents are the same. However, something like [1, 2] and [1, 2, 2] should not be equal, while [1, 2, 2, 4] should equal [1, 2, 4, 2].
boolean functionName() {
boolean pvk;
... setup ...
for(Edge e : pMST.edges()) {
pvk = false;
for(Edge f : kMST.edges()) {
if(e == f) {
pvk = true;
System.out.println("True.");
}
}
if(!pvk) return false;
}
return true;
}
There's my initial lousy attempt, but not only does this always return false, it doesn't account for duplicates properly.

You could sort the items and compare the resulting lists, but this is potentially slow O(n lg n) and it relies on the items either being Comparable or having a total order imposed on them by a Comparator. This might be infeasible.
This other answer suggests using a Guava Multiset. This makes sense, as it keeps track of the elements and the count of occurrences, which is significant for your question. It should be O(n) for reasonable implementations such as a HashMultiset. Other libraries such as Apache Commons (MultiSet) and Eclipse Collections (Bag) have collection implementations that are functionally equivalent to Guava’s Multiset.
If you don't want to include a dependency on any of these libraries, you can do this in the JDK by itself. Unfortunately Java doesn't have a Bag implementation, but for this purpose it's easy to emulate it using a Map from your item type to a count, either Integer or Long.
If you have Lists, you can do this:
boolean unorderedEquals(List<Item> list1, List<Item> list2) {
Map<Item, Long> freq1 = list1.stream().collect(groupingBy(i -> i, counting()));
Map<Item, Long> freq2 = list2.stream().collect(groupingBy(i -> i, counting()));
return freq1.equals(freq2);
}
If you have Iterables, you need to build up the maps using forEach instead:
boolean unorderedEquals(Iterable<Item> iter1, Iterable<Item> iter2) {
Map<Item, Integer> freq1 = new HashMap<>();
iter1.forEach(it -> freq1.merge(it, 1, (a, b) -> a + b));
Map<Item, Integer> freq2 = new HashMap<>();
iter2.forEach(it -> freq2.merge(it, 1, (a, b) -> a + b));
return freq1.equals(freq2);
}

Combining this answer with ideas from this thread, notably this answer to create an efficient but readable solution, you may use
static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
if(coll1.size() != coll2.size()) return false;
Map<Object, Integer> freq = new HashMap<>();
for(Object o: coll1) freq.merge(o, 1, Integer::sum);
for(Object o: coll2)
if(freq.merge(o, -1, Integer::sum) < 0) return false;
return true;
}
The first loop creates a frequency map like in the linked answer, but instead of building a second map, to perform an expensive comparison, the second loop decreases the counts on each occurrence, returning immediately, if a count became negative. The merge method smoothly handles the case of absent keys.
Since it has been checked right at the beginning of the method that both lists have the same size, after increasing and decreasing, the total count must be zero. Since we have proven that there are no negative numbers, as we returned immediately for them, there can’t be positive non-zero values either. So we can return true after the second loop without further checks.
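As a quick sanity check of the reasoning above, the following self-contained sketch repeats the count-up/count-down method and runs it on the inputs from the question. The class name UnorderedEqualsDemo is only an illustrative assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UnorderedEqualsDemo {
    // Same idea as above: count up for the first collection, count down for the second.
    static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
        if (coll1.size() != coll2.size()) return false;
        Map<Object, Integer> freq = new HashMap<>();
        for (Object o : coll1) freq.merge(o, 1, Integer::sum);
        for (Object o : coll2)
            if (freq.merge(o, -1, Integer::sum) < 0) return false;
        return true;
    }

    public static void main(String[] args) {
        // Different sizes, rejected by the pre-check
        System.out.println(unorderedEquals(List.of(1, 2), List.of(1, 2, 2)));       // false
        // Same elements with the same counts, different order
        System.out.println(unorderedEquals(List.of(1, 2, 2, 4), List.of(1, 2, 4, 2))); // true
        // Same size, but the count for 1 goes negative in the second loop
        System.out.println(unorderedEquals(List.of(1, 2, 2, 4), List.of(1, 1, 2, 4))); // false
    }
}
```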
Supporting arbitrary Iterables, which differ from Collection in not necessarily having a size() method, is a bit trickier, as we can’t do the pre-check then and hence, have to maintain the count:
static boolean unorderedEquals(Iterable<?> iter1, Iterable<?> iter2) {
Map<Object, Integer> freq = new HashMap<>();
int size = 0;
for(Object o: iter1) {
freq.merge(o, 1, Integer::sum);
size++;
}
for(Object o: iter2)
if(--size < 0 || freq.merge(o, -1, Integer::sum) < 0) return false;
return size == 0;
}
If we want to avoid the boxing overhead, we have to resort to a mutable value for the map, e.g.
static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
if(coll1.size() != coll2.size()) return false;
Map<Object, int[]> freq = new HashMap<>();
for(Object o: coll1) freq.computeIfAbsent(o, x -> new int[1])[0]++;
int[] absent = { 0 };
for(Object o: coll2) if(freq.getOrDefault(o, absent)[0]-- == 0) return false;
return true;
}
But I don’t think that this will pay off. For small numbers of occurrences, boxing will reuse the Integer instances whereas we need a distinct int[] object for each distinct element when using mutable values.
But using compute might be interesting for the Iterable solution, when using it like
static boolean unorderedEquals(Iterable<?> coll1, Iterable<?> coll2) {
Map<Object, int[]> freq = new HashMap<>();
for(Object o: coll1) freq.computeIfAbsent(o, x -> new int[1])[0]++;
int[] absent = {};
for(Object o: coll2)
if(freq.compute(o, (key,c) -> c == null || c[0] == 0? absent:
--c[0] == 0? null: c) == absent) return false;
return freq.isEmpty();
}
which removes entries from the map when their count reaches zero, so we only have to check the map for emptiness at the end.

I would sort them. But first I would compare the sizes before doing the sort. You would need to provide a Comparator<T> to be used by the sort method. If you're sorting Integers, you could use:
List<Integer> a = new ArrayList<>(List.of(1, 2, 3, 3, 3, 3, 4, 5, 6));
List<Integer> b = new ArrayList<>(List.of(2, 3, 1, 3, 4, 5, 6, 3, 3));
System.out.println(compareList(a, b, Comparator.naturalOrder()));
public static <T> boolean compareList(List<T> list1, List<T> list2,
Comparator<T> comp) {
if (list1 == list2) {
return true;
}
if (list1.size() != list2.size()) {
return false;
}
Collections.sort(list1, comp);
Collections.sort(list2, comp);
return list1.equals(list2);
}

Counting each distinct array occurrence in a list of arrays with duplicates

PROBLEM
I have a list of arrays and I want to count the occurrences of duplicates.
For example, if I have this :
{{1,2,3},
{1,0,3},
{1,2,3},
{5,2,6},
{5,2,6},
{5,2,6}}
I want a map (or any relevant collection) like this :
{ {1,2,3} -> 2,
{1,0,3} -> 1,
{5,2,6} -> 3 }
I can even lose the arrays values, I'm only interested in cardinals (e.g. 2, 1 and 3 here).
MY SOLUTION
I use the following algorithm :
First hash the arrays, and check if each hash is in a HashMap<Integer, ArrayList<int[]>>, let's name it distinctHash, where the key is the hash and the value is an ArrayList, let's name it rowList, containing the different arrays for this hash (to avoid collisions).
If the hash is not in distinctHash, put the array with the value 1 in another HashMap<int[], Long> that counts each occurrence, let's call it distinctElements.
Then if the hash is in distinctHash, check if the corresponding array is contained in rowList. If it is, increment the value in distinctElements associated with the identical array found in rowList. (If you use the new array as a key you will create another key, since the references are different.)
Here is the code, the boolean returned tells if a new distinct array was found, I apply this function sequentially on all of my arrays :
HashMap<int[], Long> distinctElements;
HashMap<Integer, ArrayList<int[]>> distinctHash;
private boolean addRow(int[] row) {
if (distinctHash.containsKey(hash)) {
int[] indexRow = distinctHash.get(hash).get(0);
for (int[] previousRow: distinctHash.get(hash)) {
if (Arrays.equals(previousRow, row)) {
distinctElements.put(
indexRow,
distinctElements.get(indexRow) + 1
);
return false;
}
}
distinctElements.put(row, 1L);
ArrayList<int[]> rowList = distinctHash.get(hash);
rowList.add(row);
distinctHash.put(hash, rowList);
return true;
} else {
distinctElements.put(row, 1L);
ArrayList<int[]> newValue = new ArrayList<>();
newValue.add(row);
distinctHash.put(hash, newValue);
return true;
}
}
QUESTION
The problem is that my algorithm is too slow for my needs (40s for 5,000,000 arrays, and 2h-3h for 20,000,000 arrays). Profiling with NetBeans told me that the hashing takes 70% of runtime (using Google Guava murmur3_128 hash function).
Is there another algorithm that could be faster? As I said I'm not interested in arrays values, only in the number of their occurrences. I am ready to sacrifice precision for speed so a probabilistic algorithm is fine.
Wrap the int[] in a class that implements equals and hashCode, then build Map of the wrapper class to instance count.
class IntArray {
    private final int[] array;
    public IntArray(int[] array) {
        this.array = array;
    }
    @Override
    public int hashCode() {
        return Arrays.hashCode(this.array);
    }
    @Override
    public boolean equals(Object obj) {
        return (obj instanceof IntArray && Arrays.equals(this.array, ((IntArray) obj).array));
    }
    @Override
    public String toString() {
        return Arrays.toString(this.array);
    }
}
Test
int[][] input = {{1,2,3},
{1,0,3},
{1,2,3},
{5,2,6},
{5,2,6},
{5,2,6}};
Map<IntArray, Long> map = Arrays.stream(input).map(IntArray::new)
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
map.entrySet().forEach(System.out::println);
Output
[1, 2, 3]=2
[1, 0, 3]=1
[5, 2, 6]=3
Note: The above solution is faster and uses less memory than the solution by Ravindra Ranwala, but it does require the creation of an extra class, so it is debatable which is better.
For smaller arrays, use the simpler solution below by Ravindra Ranwala.
For larger arrays, the above solution is likely better.
Map<List<Integer>, Long> map = Stream.of(input)
.map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
You may do it like so,
Map<List<Integer>, Long> result = Stream.of(source)
.map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
And here's the output,
{[1, 2, 3]=2, [1, 0, 3]=1, [5, 2, 6]=3}
If all duplicates of an array have their elements in the same order and the arrays are short, you can map each array to a single int and then reuse the last part of your method with that number as the key. This reduces the time spent hashing, but it relies on assumptions which might not hold in your case.
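A minimal sketch of that idea, under loudly stated assumptions that are not in the question: every row has exactly 3 elements and every element lies in 0..1023, so 10 bits per element fit into one int. The names PackedRowCounter, pack and countRows are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PackedRowCounter {
    // Assumption: exactly 3 elements per row, each in the range 0..1023.
    // Under that assumption the packed int identifies the row uniquely.
    static int pack(int[] row) {
        return (row[0] << 20) | (row[1] << 10) | row[2];
    }

    // Counts how often each packed row occurs.
    static Map<Integer, Long> countRows(int[][] rows) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int[] row : rows) {
            counts.merge(pack(row), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 2, 3}, {1, 0, 3}, {1, 2, 3}, {5, 2, 6}, {5, 2, 6}, {5, 2, 6}};
        // Only the cardinalities matter for the question; map order is unspecified.
        System.out.println(countRows(input).values());
    }
}
```

If the value range or row length is larger, a long key or a different encoding would be needed, and the approach stops being applicable once the rows no longer fit into a primitive.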

Java 8: How to turn a list into a list of lists using lambda

I'm trying to split a list into a list of list where each list has a maximum size of 4.
I would like to know how this is possible to do using lambdas.
Currently the way I'm doing it is as follow:
List<List<Object>> listOfList = new ArrayList<>();
final int MAX_ROW_LENGTH = 4;
int startIndex =0;
while(startIndex < listToSplit.size() )
{
int endIndex = ( ( startIndex+MAX_ROW_LENGTH ) < listToSplit.size() ) ? startIndex+MAX_ROW_LENGTH : listToSplit.size();
listOfList.add(new ArrayList<>(listToSplit.subList(startIndex, endIndex)));
startIndex = startIndex+MAX_ROW_LENGTH;
}
UPDATE
It seems that there isn't a simple way to use lambdas to split lists. While all of the answers are much appreciated, they're also a wonderful example of when lambdas do not simplify things.
Try this approach:
static <T> List<List<T>> listSplitter(List<T> incoming, int size) {
// add validation if needed
return incoming.stream()
.collect(Collector.of(
ArrayList::new,
(accumulator, item) -> {
if(accumulator.isEmpty()) {
accumulator.add(new ArrayList<>(singletonList(item)));
} else {
List<T> last = accumulator.get(accumulator.size() - 1);
if(last.size() == size) {
accumulator.add(new ArrayList<>(singletonList(item)));
} else {
last.add(item);
}
}
},
(li1, li2) -> {
li1.addAll(li2);
return li1;
}
));
}
System.out.println(
listSplitter(
Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
4
)
);
Also note that this code could be optimized, instead of:
new ArrayList<>(Collections.singletonList(item))
use this one:
List<T> newList = new ArrayList<>(size);
newList.add(item);
accumulator.add(newList);
If you REALLY need a lambda it can be done like this. Otherwise the previous answers are better.
List<List<Object>> lists = new ArrayList<>();
AtomicInteger counter = new AtomicInteger();
final int MAX_ROW_LENGTH = 4;
listToSplit.forEach(pO -> {
if(counter.getAndIncrement() % MAX_ROW_LENGTH == 0) {
lists.add(new ArrayList<>());
}
lists.get(lists.size()-1).add(pO);
});
Surely the below is sufficient
final List<List<Object>> listOfList = new ArrayList<>(
listToSplit.stream()
.collect(Collectors.groupingBy(el -> listToSplit.indexOf(el) / MAX_ROW_LENGTH))
.values()
);
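Stream it and collect with a grouping: this gives a Map of Integer -> List (chunk index to chunk). Pull the values of the map and pass them directly into whatever constructor you like (map.values() gives a Collection, not a List). Note that indexOf returns the position of the first matching element, so this only groups correctly when the list contains no duplicates, and every indexOf call is a linear scan.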
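One way to sidestep the indexOf lookup is to stream over chunk indices rather than elements. The following is only a sketch of that alternative, not the method proposed above; the names IndexPartition and partition are hypothetical, and the sublists are views of the input (wrap them in new ArrayList<>(...) if you need copies).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexPartition {
    // Splits the list into consecutive sublist views of at most `size` elements.
    static <T> List<List<T>> partition(List<T> list, int size) {
        int chunks = (list.size() + size - 1) / size; // ceiling division
        return IntStream.range(0, chunks)
                .mapToObj(i -> list.subList(i * size, Math.min(list.size(), (i + 1) * size)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 4));
        // prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    }
}
```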
Perhaps you can use something like that
BiFunction<List,Integer,List> splitter= (list2, count)->{
//temporary list of lists
List<List> listOfLists=new ArrayList<>();
//helper implicit recursive function
BiConsumer<Integer,BiConsumer> splitterHelper = (offset, func) -> {
if(list2.size()> offset+count){
listOfLists.add(list2.subList(offset,offset+count));
//implicit self call
func.accept(offset+count,func);
}
else if(list2.size()>offset){
listOfLists.add(list2.subList(offset,list2.size()));
//implicit self call
func.accept(offset+count,func);
}
};
//pass self reference
splitterHelper.accept(0,splitterHelper);
return listOfLists;
};
Usage example
List<Integer> list=new ArrayList<Integer>(){{
add(1);
add(2);
add(3);
add(4);
add(5);
add(6);
add(7);
add(8);
add(8);
}};
//calling splitter function
List listOfLists = splitter.apply(list, 3 /*max sublist size*/);
System.out.println(listOfLists);
And as a result we have
[[1, 2, 3], [4, 5, 6], [7, 8, 8]]
The requirement is a bit odd, but you could do:
final int[] counter = new int[] {0};
List<List<Object>> listOfLists = in.stream()
.collect(Collectors.groupingBy( x -> counter[0]++ / MAX_ROW_LENGTH ))
.entrySet().stream()
.sorted(Map.Entry.comparingByKey())
.map(Map.Entry::getValue)
.collect(Collectors.toList());
You could probably streamline this by using the variant of groupingBy that takes a mapSupplier lambda, and supplying a SortedMap. This should return an EntrySet that iterates in order. I leave it as an exercise.
What we're doing here is:
Collecting your list items into a Map<Integer,List<Object>> using a counter to group. The counter is held in a single-element array because the lambda can only use local variables if they're final or effectively final.
Getting the map entries as a stream, and sorting by the Integer key.
Using Stream::map() to convert the stream of Map.Entry<Integer,List<Object>> into a stream of List<Object> values.
Collecting this into a list.
This doesn't benefit from any "free" parallelisation. It has a memory overhead in the intermediate Map. It's not particularly easy to read.
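The streamlining mentioned earlier, using the groupingBy overload that takes a map supplier, might look roughly like the sketch below. The class and method names (SortedGrouping, split) are hypothetical, and the mutable counter means this is only safe on a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedGrouping {
    static <T> List<List<T>> split(List<T> in, int maxRowLength) {
        int[] counter = {0}; // mutable holder; not safe for parallel streams
        return new ArrayList<>(
                in.stream()
                  .collect(Collectors.groupingBy(
                          x -> counter[0]++ / maxRowLength, // chunk index
                          TreeMap::new,                     // keeps chunk indices sorted
                          Collectors.toList()))
                  .values());                               // values() iterates in key order
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9), 4));
        // prints [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
    }
}
```

Because the TreeMap already iterates in key order, the separate entrySet/sorted/map steps are no longer needed.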
However, I wouldn't do this, just for the sake of using a lambda. I would do something like:
for(int i=0; i<in.size(); i += MAX_ROW_LENGTH) {
    listOfList.add(
        in.subList(i, Math.min(i + MAX_ROW_LENGTH, in.size())));
}
(Yours had a defensive copy new ArrayList<>(listToSplit.subList(...)). I've not duplicated it because it's not always necessary - for example if the input list is unmodifiable and the output lists aren't intended to be modifiable. But do put it back in if you decide you need it in your case.)
This will be extremely fast on any in-memory list. You're very unlikely to want to parallelise it.
Alternatively, you could write your own (unmodifiable) implementation of List that's a view over the underlying List<Object>:
public class PartitionedList<T> extends AbstractList<List<T>> {
    private final List<T> source;
    private final int sublistSize;
    public PartitionedList(List<T> source, int sublistSize) {
        this.source = source;
        this.sublistSize = sublistSize;
    }
    @Override
    public int size() {
        // Round up so that a shorter final sublist is counted too
        return (source.size() + sublistSize - 1) / sublistSize;
    }
    @Override
    public List<T> get(int index) {
        int sourceIndex = index * sublistSize;
        return source.subList(sourceIndex,
            Math.min(sourceIndex + sublistSize, source.size()));
    }
}
Again, it's up to you whether you want to make defensive copies here.
This will have equivalent big-O access time to the underlying list.
You can use:
ListUtils.partition(List list, int size)
OR
List<List> partition(List list, int size)
Both return consecutive sublists of a list, each of the same size (the final list may be smaller).

Using java8 Streams merge internal lists within a list

I want to merge the inner Lists using Java 8 streams, like this:
When
List<List<Integer>> mainList = new ArrayList<List<Integer>>();
mainList.add(Arrays.asList(0,1));
mainList.add(Arrays.asList(0,1,2));
mainList.add(Arrays.asList(1,2));
mainList.add(Arrays.asList(3));
should be merged into
[[0,1,2],[3]];
And When
List<List<Integer>> mainList = new ArrayList<List<Integer>>();
mainList.add(Arrays.asList(0,2));
mainList.add(Arrays.asList(1,4));
mainList.add(Arrays.asList(0,2,4));
mainList.add(Arrays.asList(3,4));
mainList.add(Arrays.asList(1,3,4));
should be merged into
[[0,1,2,3,4]];
This is so far what I have done
static void mergeCollections(List<List<Integer>> collectionTomerge) {
boolean isMerge = false;
List<List<Integer>> mergeCollection = new ArrayList<List<Integer>>();
for (List<Integer> listInner : collectionTomerge) {
List<Integer> mergeAny = mergeCollection.stream().map(
lc -> lc.stream().filter(listInner::contains)
).findFirst()
.orElse(null)
.collect(Collectors.toList());
}
}
but I am getting this exception:
Exception in thread "main" java.lang.NullPointerException
at linqArraysOperations.LinqOperations.mergeCollections(LinqOperations.java:87)
Updated with my version of the answer
This is what I wanted to achieve, although Tagir's great answer does it without recursion.
I changed Mikhaal's answer a bit to achieve this, using the logic from Tagir's answer without flatMap:
public static <T> List<List<T>> combineList(List<List<T>> argList) {
boolean isMerge = false;
List<List<T>> result = new ArrayList<>();
for (List<T> list : argList) {
List<List<T>> mergedFound =
result.stream()
.filter(mt->list.stream().anyMatch(mt::contains))
.map(
t -> Stream.concat(t.stream(),list.stream()).distinct()
.collect(Collectors.toList())
)
.collect(Collectors.toList());
//if(mergedFound !=null && ( mergedFound.size() > 0 && mergedFound.stream().findFirst().get().size() > 0 )){
if(mergedFound !=null && mergedFound.size() > 0){
result = Stream.concat(result.stream().filter(t->list.stream().noneMatch(t::contains)),mergedFound.stream()).distinct().collect(Collectors.toList());
isMerge = true;
}
else
result.add(list);
}
if(isMerge && result.size() > 1)
return combineList(result);
return result;
}
Here's very simple, yet not very efficient solution:
static List<List<Integer>> mergeCollections(List<List<Integer>> input) {
List<List<Integer>> result = Collections.emptyList();
for (List<Integer> listInner : input) {
List<Integer> merged = Stream.concat(
// read current results and select only those which contain
// numbers from current list
result.stream()
.filter(list -> list.stream().anyMatch(listInner::contains))
// flatten them into single stream
.flatMap(List::stream),
// concatenate current list, remove repeating numbers and collect
listInner.stream()).distinct().collect(Collectors.toList());
// Now we need to remove used lists from the result and add the newly created
// merged list
result = Stream.concat(
result.stream()
// filter out used lists
.filter(list -> list.stream().noneMatch(merged::contains)),
Stream.of(merged)).collect(Collectors.toList());
}
return result;
}
The tricky part is that the next listInner may merge several lists which were already added. For example, if we have a partial result like [[1, 2], [4, 5], [7, 8]] and process a new listInner whose content is [2, 3, 5, 7], then the partial result should become [[1, 2, 3, 4, 5, 7, 8]] (that is, all lists are merged together). So on every iteration we look for existing partial results which have numbers in common with the current listInner, flatten them, concatenate them with the current listInner and collect everything into the new merged list. Next we filter out of the current result the lists which were used in merged, and add merged there.
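To make the example concrete, here is a single iteration of that loop extracted into a helper and applied to the partial result from the paragraph above. The names MergeStepDemo and mergeStep are hypothetical; the logic is copied from the solution above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeStepDemo {
    // One iteration of the loop: fold listInner into the partial result.
    static List<List<Integer>> mergeStep(List<List<Integer>> result, List<Integer> listInner) {
        // Flatten every existing list that shares a number with listInner, then append listInner.
        List<Integer> merged = Stream.concat(
                result.stream()
                      .filter(list -> list.stream().anyMatch(listInner::contains))
                      .flatMap(List::stream),
                listInner.stream()).distinct().collect(Collectors.toList());
        // Keep the untouched lists and add the merged one.
        return Stream.concat(
                result.stream().filter(list -> list.stream().noneMatch(merged::contains)),
                Stream.of(merged)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> partial = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(4, 5), Arrays.asList(7, 8));
        System.out.println(mergeStep(partial, Arrays.asList(2, 3, 5, 7)));
        // prints [[1, 2, 4, 5, 7, 8, 3]]
    }
}
```

The merged list contains the expected numbers, but in encounter order rather than sorted order.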
You can make the solution somewhat more efficient using partitioningBy collector to perform both filtering steps at once:
static List<List<Integer>> mergeCollections(List<List<Integer>> input) {
List<List<Integer>> result = Collections.emptyList();
for (List<Integer> listInner : input) {
// partition current results by condition: whether they contain
// numbers from listInner
Map<Boolean, List<List<Integer>>> map = result.stream().collect(
Collectors.partitioningBy(
list -> list.stream().anyMatch(listInner::contains)));
// now map.get(true) contains lists which intersect with current
// and should be merged with current
// and map.get(false) contains other lists which should be preserved
// in result as is
List<Integer> merged = Stream.concat(
map.get(true).stream().flatMap(List::stream),
listInner.stream()).distinct().collect(Collectors.toList());
result = Stream.concat(map.get(false).stream(), Stream.of(merged))
.collect(Collectors.toList());
}
return result;
}
Here map.get(true) contains the lists which have elements from listInner and map.get(false) contains other lists which should be preserved from the previous result.
The order of elements is probably not what you would expect, but you could easily sort nested lists or use List<TreeSet<Integer>> as resulting data structure if you want.
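For the exception you're getting: mergeCollection is still empty when the stream runs, so findFirst() yields an empty Optional, orElse(null) returns null, and calling collect on that null throws the NullPointerException. (Null elements inside the List<List<Integer>> passed to mergeCollections would also throw, at listInner::contains.)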
Second, if I understand your problem correctly, you want an algorithm that can merge lists that share a common element. I came up with this to solve your problem:
public class Combiner {
public static void main(String[] args) {
List<List<Integer>> mainList = new ArrayList<>();
mainList.add(Arrays.asList(1, 2));
mainList.add(Arrays.asList(4, 5));
mainList.add(Arrays.asList(7, 8));
mainList.add(Arrays.asList(6, 19));
mainList.add(Arrays.asList(2, 3, 5, 7));
System.out.println(combineList(new ArrayList<>(mainList)));
List<List<Integer>> result = mergeCollections(new ArrayList<>(mainList));
System.out.println(result);
}
public static <T> List<List<T>> combineList(List<List<T>> argList) {
List<List<T>> result = new ArrayList<>();
for (List<T> list : argList) {
//Copy the given list
List<T> addedList = new ArrayList<>(list);
result.add(addedList);
for (List<T> otherList : argList) {
if (list.equals(otherList)) continue;
//If at least one element is shared between the two lists
if (list.stream().anyMatch(otherList::contains)) {
//Add all elements that are exclusive to the second list
addedList.addAll(otherList.stream().map(t -> addedList.contains(t) ? null : t)
.filter(t -> t != null).collect(Collectors.toList()));
}
}
}
List<List<T>> del = new ArrayList<>();
for (int i = 0; i < result.size(); i++) {
for (int j = i + 1; j < result.size(); j++) {
List<T> list = result.get(j);
if (listEqualsUnOrdered(list, result.get(i))) {
//Modified this
del.add(result.get(i));
}
}
//Can't use listIterator here because of iterating starting at j + 1
result.removeAll(del);
}
//Recursion
if (!result.equals(argList)) {
result = combineList(result);
}
return result;
}
private static <T> boolean listEqualsUnOrdered(List<T> list1, List<T> list2) {
if (list1.size() != list2.size()) return false;
List<T> testOne = new ArrayList<>(list1);
testOne.removeAll(list2);
boolean testOnePassed = (testOne.size() == 0);
List<T> testTwo = new ArrayList<>(list2);
testTwo.removeAll(list1);
if (testTwo.size() == 0 && testOnePassed) return true;
return false;
}
}
The algorithm is fairly simple: it uses recursion to rerun itself until the output is "clean", i.e. until the lists have been completely merged. I haven't done any optimizing, but it does what it's supposed to do.
Note that this method will also merge your existing list.

How to force max to return ALL maximum values in a Java Stream?

I've tested the max function on Java 8 lambdas and streams a bit, and it seems that when max is executed and more than one object compares as equal (compare returns 0), it returns an arbitrary element among the tied candidates without further consideration.
Is there an obvious trick or function for the behavior I expect from max, so that all max values are returned? I don't see anything in the API, but I am sure there must be something better than comparing manually.
For instance:
// myComparator is an IntegerComparator
Stream.of(1, 3, 5, 3, 2, 3, 5)
.max(myComparator)
.forEach(System.out::println);
// Would print 5, 5 in any order.
I believe the OP is using a Comparator to partition the input into equivalence classes, and the desired result is a list of members of the equivalence class that is the maximum according to that Comparator.
Unfortunately, using int values as a sample problem is a terrible example. All equal int values are fungible, so there is no notion of preserving the ordering of equivalent values. Perhaps a better example is using string lengths, where the desired result is to return a list of strings from an input that all have the longest length within that input.
I don't know of any way to do this without storing at least partial results in a collection.
Given an input collection, say
List<String> list = ... ;
...it's simple enough to do this in two passes, the first to get the longest length, and the second to filter the strings that have that length:
int longest = list.stream()
.mapToInt(String::length)
.max()
.orElse(-1);
List<String> result = list.stream()
.filter(s -> s.length() == longest)
.collect(toList());
If the input is a stream, which cannot be traversed more than once, it is possible to compute the result in only a single pass using a collector. Writing such a collector isn't difficult, but it is a bit tedious as there are several cases to be handled. A helper function that generates such a collector, given a Comparator, is as follows:
static <T> Collector<T,?,List<T>> maxList(Comparator<? super T> comp) {
return Collector.of(
ArrayList::new,
(list, t) -> {
int c;
if (list.isEmpty() || (c = comp.compare(t, list.get(0))) == 0) {
list.add(t);
} else if (c > 0) {
list.clear();
list.add(t);
}
},
(list1, list2) -> {
if (list1.isEmpty()) {
return list2;
}
if (list2.isEmpty()) {
return list1;
}
int r = comp.compare(list1.get(0), list2.get(0));
if (r < 0) {
return list2;
} else if (r > 0) {
return list1;
} else {
list1.addAll(list2);
return list1;
}
});
}
This stores intermediate results in an ArrayList. The invariant is that all elements within any such list are equivalent in terms of the Comparator. When adding an element, if it's less than the elements in the list, it's ignored; if it's equal, it's added; and if it's greater, the list is emptied and the new element is added. Merging isn't too difficult either: the list with the greater elements is returned, but if their elements are equal the lists are appended.
Given an input stream, this is pretty easy to use:
Stream<String> input = ... ;
List<String> result = input.collect(maxList(comparing(String::length)));
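For a runnable illustration, the sketch below repeats the collector in slightly condensed form and applies it to a small stream of strings. The class name MaxListDemo and the sample data are assumptions for the demo only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class MaxListDemo {
    // Condensed copy of the maxList collector described above.
    static <T> Collector<T, ?, List<T>> maxList(Comparator<? super T> comp) {
        return Collector.of(
            ArrayList::new,
            (list, t) -> {
                int c;
                if (list.isEmpty() || (c = comp.compare(t, list.get(0))) == 0) {
                    list.add(t);                 // first element, or a tie with the current max
                } else if (c > 0) {
                    list.clear();                // strictly greater: start a new list
                    list.add(t);
                }
            },
            (left, right) -> {
                if (left.isEmpty()) return right;
                if (right.isEmpty()) return left;
                int r = comp.compare(left.get(0), right.get(0));
                if (r < 0) return right;
                if (r == 0) left.addAll(right);  // equal maxima: keep both
                return left;
            });
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        List<String> result = Stream.of("a", "bbb", "cc", "ddd", "e").collect(maxList(byLength));
        System.out.println(result); // prints [bbb, ddd]
    }
}
```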
I would group by value and store the values in a TreeMap so that they are sorted, then get the max value by taking the last entry, as follows:
Stream.of(1, 3, 5, 3, 2, 3, 5)
.collect(groupingBy(Function.identity(), TreeMap::new, toList()))
.lastEntry()
.getValue()
.forEach(System.out::println);
Output:
5
5
I implemented more generic collector solution with custom downstream collector. Probably some readers might find it useful:
public static <T, A, D> Collector<T, ?, D> maxAll(Comparator<? super T> comparator,
Collector<? super T, A, D> downstream) {
Supplier<A> downstreamSupplier = downstream.supplier();
BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
BinaryOperator<A> downstreamCombiner = downstream.combiner();
class Container {
A acc;
T obj;
boolean hasAny;
Container(A acc) {
this.acc = acc;
}
}
Supplier<Container> supplier = () -> new Container(downstreamSupplier.get());
BiConsumer<Container, T> accumulator = (acc, t) -> {
if(!acc.hasAny) {
downstreamAccumulator.accept(acc.acc, t);
acc.obj = t;
acc.hasAny = true;
} else {
int cmp = comparator.compare(t, acc.obj);
if (cmp > 0) {
acc.acc = downstreamSupplier.get();
acc.obj = t;
}
if (cmp >= 0)
downstreamAccumulator.accept(acc.acc, t);
}
};
BinaryOperator<Container> combiner = (acc1, acc2) -> {
if (!acc2.hasAny) {
return acc1;
}
if (!acc1.hasAny) {
return acc2;
}
int cmp = comparator.compare(acc1.obj, acc2.obj);
if (cmp > 0) {
return acc1;
}
if (cmp < 0) {
return acc2;
}
acc1.acc = downstreamCombiner.apply(acc1.acc, acc2.acc);
return acc1;
};
Function<Container, D> finisher = acc -> downstream.finisher().apply(acc.acc);
return Collector.of(supplier, accumulator, combiner, finisher);
}
So by default it can be collected to a list using:
public static <T> Collector<T, ?, List<T>> maxAll(Comparator<? super T> comparator) {
return maxAll(comparator, Collectors.toList());
}
But you can use other downstream collectors as well:
public static String joinLongestStrings(Collection<String> input) {
return input.stream().collect(
maxAll(Comparator.comparingInt(String::length), Collectors.joining(",")));
}
If I understood well, you want the frequency of the max value in the Stream.
One way to achieve that would be to store the results in a TreeMap<Integer, List<Integer>> when you collect elements from the Stream. Then you grab the last key (or the first, depending on the comparator you give) to get the value, which will contain the list of max values.
List<Integer> maxValues = st.collect(toMap(i -> i,
Arrays::asList,
(l1, l2) -> Stream.concat(l1.stream(), l2.stream()).collect(toList()),
TreeMap::new))
.lastEntry()
.getValue();
Collecting it from the Stream(4, 5, -2, 5, 5) will give you a List [5, 5, 5].
Another approach in the same spirit would be to use a group by operation combined with the counting() collector:
Entry<Integer, Long> maxValues = st.collect(groupingBy(i -> i,
TreeMap::new,
counting())).lastEntry(); //5=3 -> 5 appears 3 times
Conceptually you first get a Map<Integer, List<Integer>>. Then the downstream counting() collector returns the number of elements in each list mapped by its key, resulting in a Map<Integer, Long>. From there you grab the max entry.
The first approach requires storing all the elements from the stream. The second one is better (see Holger's comment) as the intermediate List is not built. In both approaches, the result is computed in a single pass.
If you get the source from a collection, you may want to use Collections.max one time to find the maximum value followed by Collections.frequency to find how many times this value appears.
It requires two passes but uses less memory as you don't have to build the data-structure.
The stream equivalent would be coll.stream().max(...).get(...) followed by coll.stream().filter(...).count().
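A minimal sketch of both two-pass variants, assuming a non-empty List<Integer> as the source; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class MaxFrequencyDemo {
    // Collections-based variant: one pass for the max, one pass for its frequency.
    // Collections.max throws NoSuchElementException on an empty collection.
    static int countOfMax(List<Integer> coll) {
        Integer max = Collections.max(coll);
        return Collections.frequency(coll, max);
    }

    // Stream-based equivalent; get() likewise throws on an empty stream.
    static long countOfMaxWithStreams(List<Integer> coll) {
        int max = coll.stream().max(Comparator.naturalOrder()).get();
        return coll.stream().filter(i -> i == max).count();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(4, 5, -2, 5, 5);
        System.out.println(countOfMax(values));            // prints 3
        System.out.println(countOfMaxWithStreams(values)); // prints 3
    }
}
```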
I'm not really sure whether you are trying to
(a) find the number of occurrences of the maximum item, or
(b) Find all the maximum values in the case of a Comparator that is not consistent with equals.
An example of (a) would be [1, 5, 4, 5, 1, 1] -> [5, 5].
An example of (b) would be:
Stream.of("Bar", "FOO", "foo", "BAR", "Foo")
.max((s, t) -> s.toLowerCase().compareTo(t.toLowerCase()));
which you want to give [FOO, foo, Foo], rather than just FOO or Optional[FOO].
In both cases, there are clever ways to do it in just one pass. But these approaches are of dubious value because you would need to keep track of unnecessary information along the way. For example, if you start with [2, 0, 2, 2, 1, 6, 2], it would only be when you reach 6 that you would realise it was not necessary to track all the 2s.
I think the best approach is the obvious one; use max, and then iterate the items again putting all the ties into a collection of your choice. This will work for both (a) and (b).
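That two-pass approach could be sketched as follows; allMax is a hypothetical helper name, and String.CASE_INSENSITIVE_ORDER stands in for the case-insensitive comparator from example (b).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class AllMaxTwoPass {
    // First pass finds one maximum; second pass collects everything that ties with it.
    static <T> List<T> allMax(List<T> items, Comparator<? super T> comp) {
        Optional<T> max = items.stream().max(comp);
        if (!max.isPresent()) {
            return new ArrayList<>(); // empty input, nothing to return
        }
        T m = max.get();
        return items.stream()
                .filter(t -> comp.compare(t, m) == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Bar", "FOO", "foo", "BAR", "Foo");
        System.out.println(allMax(words, String.CASE_INSENSITIVE_ORDER));
        // prints [FOO, foo, Foo]
    }
}
```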
If you'd rather rely on a library than the other answers here, StreamEx has a collector to do this.
Stream.of(1, 3, 5, 3, 2, 3, 5)
.collect(MoreCollectors.maxAll())
.forEach(System.out::println);
There's a version which takes a Comparator too for streams of items which don't have a natural ordering (i.e. don't implement Comparable).
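System.out.println(Arrays.toString(
    Stream.of(1, 3, 5, 3, 2, 3, 5)
        .map(a -> new Integer[]{a})
        .reduce((a, b) ->
            // equals() compares values; == on Integer would compare references
            a[0].equals(b[0])
                ? Stream.concat(Stream.of(a), Stream.of(b)).toArray(Integer[]::new)
                : a[0] > b[0] ? a : b
        ).get()
));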
I was searching for an answer to a slightly more complex version of this question and couldn't find anything until I figured it out myself, so I'm posting it in case it helps anybody.
I have a list of Kittens.
Kitten is an object which has a name, age and gender. I had to return a list of all the youngest kittens.
For example:
So kitten list would contain kitten objects (k1, k2, k3, k4) and their ages would be (1, 2, 3, 1) accordingly. We want to return [k1, k4], because they are both the youngest. If only one youngest exists, the function should return [k1(youngest)].
Find the min value of the list (if it exists):
Optional<Kitten> minKitten = kittens.stream().min(Comparator.comparingInt(Kitten::getAge));
Filter the list by the min value:
return minKitten.map(value -> kittens.stream().filter(kitten -> kitten.getAge() == value.getAge())
.collect(Collectors.toList())).orElse(Collections.emptyList());
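Put together, the two steps above might look like the sketch below. The Kitten class here is a stripped-down stand-in (name and age only) and the method name youngest is made up for illustration.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class Kittens {
    // Hypothetical stand-in for the Kitten class described above (gender omitted).
    static final class Kitten {
        private final String name;
        private final int age;
        Kitten(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    // Returns every kitten whose age equals the minimum age, or an empty list if there are none.
    static List<Kitten> youngest(List<Kitten> kittens) {
        Optional<Kitten> minKitten = kittens.stream().min(Comparator.comparingInt(Kitten::getAge));
        return minKitten
                .map(min -> kittens.stream()
                        .filter(kitten -> kitten.getAge() == min.getAge())
                        .collect(Collectors.toList()))
                .orElse(Collections.emptyList());
    }
}
```

With ages (1, 2, 3, 1) for (k1, k2, k3, k4), youngest returns k1 and k4 in their original order.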
The following lines will do it without implementing a separate comparator:
List<Integer> list = List.of(1, 3, 5, 3, 2, 3, 5);
int max = list.stream().max(Comparator.naturalOrder()).get(); // compute the maximum once
list.stream().filter(i -> i == max).forEach(System.out::println); // max is an int, so == compares values

Intersection and union of ArrayLists in Java

Are there any methods to do so? I was looking but couldn't find any.
Another question: I need these methods so I can filter files.
Some are AND filters and some are OR filters (as in set theory), so I need to filter all the files and then unite or intersect the ArrayLists that hold those files.
Should I use a different data structure to hold the files? Is there anything else that would offer a better runtime?
Here's a plain implementation without using any third-party library. The main advantage over retainAll, removeAll and addAll is that these methods don't modify the original lists passed to them.
public class Test {
public static void main(String... args) throws Exception {
List<String> list1 = new ArrayList<String>(Arrays.asList("A", "B", "C"));
List<String> list2 = new ArrayList<String>(Arrays.asList("B", "C", "D", "E", "F"));
System.out.println(new Test().intersection(list1, list2));
System.out.println(new Test().union(list1, list2));
}
public <T> List<T> union(List<T> list1, List<T> list2) {
Set<T> set = new HashSet<T>();
set.addAll(list1);
set.addAll(list2);
return new ArrayList<T>(set);
}
public <T> List<T> intersection(List<T> list1, List<T> list2) {
List<T> list = new ArrayList<T>();
for (T t : list1) {
if(list2.contains(t)) {
list.add(t);
}
}
return list;
}
}
Collection (and therefore ArrayList) has:
col.retainAll(otherCol) // for intersection
col.addAll(otherCol) // for union
Use a List implementation if you accept repetitions, a Set implementation if you don't:
Collection<String> col1 = new ArrayList<String>(); // {a, b, c}
// Collection<String> col1 = new TreeSet<String>();
col1.add("a");
col1.add("b");
col1.add("c");
Collection<String> col2 = new ArrayList<String>(); // {b, c, d, e}
// Collection<String> col2 = new TreeSet<String>();
col2.add("b");
col2.add("c");
col2.add("d");
col2.add("e");
col1.addAll(col2);
System.out.println(col1);
//output for ArrayList: [a, b, c, b, c, d, e]
//output for TreeSet: [a, b, c, d, e]
This post is fairly old, but it was nevertheless the first one that came up on Google when I searched for this topic.
I want to give an update using Java 8 streams doing (basically) the same thing in a single line:
List<T> intersect = list1.stream()
.filter(list2::contains)
.collect(Collectors.toList());
List<T> union = Stream.concat(list1.stream(), list2.stream())
.distinct()
.collect(Collectors.toList());
If anyone has a better/faster solution let me know, but this is a nice one-liner that can easily be included in a method without adding an unnecessary helper class/method, and it stays readable.
list1.retainAll(list2) is the intersection.
The union is removeAll followed by addAll.
Find more in the documentation of Collection (ArrayList is a Collection):
http://download.oracle.com/javase/1.5.0/docs/api/java/util/Collection.html
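A minimal sketch of both operations, assuming you want to leave the input lists untouched (retainAll, removeAll and addAll all modify the list they are called on, so each method works on a copy). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ListOps {
    // Elements of list1 that also occur in list2.
    static <T> List<T> intersection(List<T> list1, List<T> list2) {
        List<T> result = new ArrayList<>(list1); // copy so list1 is not modified
        result.retainAll(list2);
        return result;
    }

    // Elements of list1 that are not in list2, followed by all of list2.
    static <T> List<T> union(List<T> list1, List<T> list2) {
        List<T> result = new ArrayList<>(list1); // copy so list1 is not modified
        result.removeAll(list2);                 // drop the shared elements first...
        result.addAll(list2);                    // ...so they are not added twice
        return result;
    }
}
```

For [A, B, C] and [B, C, D, E, F] this gives [B, C] and [A, B, C, D, E, F]. Duplicates that already exist inside a single list are not removed.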
Unions and intersections are defined only for sets, not lists, as you mentioned.
Check the Guava library for filters. Guava also provides real intersections and unions:
static <E> Sets.SetView<E> union(Set<? extends E> set1, Set<? extends E> set2)
static <E> Sets.SetView<E> intersection(Set<E> set1, Set<?> set2)
You can use CollectionUtils from apache commons.
The marked solution is not efficient. It has O(n^2) time complexity. What we can do instead is sort both lists and then execute an intersection algorithm such as the one below.
private static ArrayList<Integer> intersect(ArrayList<Integer> f, ArrayList<Integer> s) {
ArrayList<Integer> res = new ArrayList<Integer>();
int i = 0, j = 0;
while (i != f.size() && j != s.size()) {
if (f.get(i) < s.get(j)) {
i ++;
} else if (f.get(i) > s.get(j)) {
j ++;
} else {
res.add(f.get(i));
i ++; j ++;
}
}
return res;
}
This one has a complexity of O(n log n + n) which is in O(n log n).
The union is done in a similar manner. Just make sure you make the suitable modifications on the if-elseif-else statements.
You can also use iterators if you want (I know they are more efficient in C++; I don't know if this is true in Java as well).
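A sketch of what the union variant of that merge could look like. It assumes both lists are already sorted in ascending order and contain no duplicates of their own; the class name SortedUnion is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SortedUnion {
    // Merges two sorted, duplicate-free lists into one sorted list without duplicates.
    static List<Integer> union(List<Integer> f, List<Integer> s) {
        List<Integer> res = new ArrayList<>();
        int i = 0, j = 0;
        while (i < f.size() && j < s.size()) {
            int cmp = f.get(i).compareTo(s.get(j));
            if (cmp < 0) {
                res.add(f.get(i++));   // smaller element comes from f
            } else if (cmp > 0) {
                res.add(s.get(j++));   // smaller element comes from s
            } else {
                res.add(f.get(i));     // equal: keep one copy, advance both
                i++;
                j++;
            }
        }
        // One list is exhausted; append whatever is left of the other.
        while (i < f.size()) res.add(f.get(i++));
        while (j < s.size()) res.add(s.get(j++));
        return res;
    }
}
```

The only differences from the intersection are that the two "not equal" branches also emit an element, and that the leftovers are appended after the main loop.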
One-liners since Java 8
Union
if there are no duplicates:
return concat(a.stream(), b.stream()).collect(toList());
union and distinct:
return concat(a.stream(), b.stream()).distinct().collect(toList());
union and distinct if Collection/Set return type:
return concat(a.stream(), b.stream()).collect(toSet());
Intersect
if no duplicates:
return a.stream().filter(b::contains).collect(toList());
PERFORMANCE: If collection b is huge and its contains() is not O(1), speed up the filter by adding one line before the return that copies b into a Set (import java.util.Set; Set.copyOf requires Java 10 or later):
... b = Set.copyOf(b);
intersect and distinct:
return a.stream().distinct().filter(b::contains).collect(toList());
- imports
import static java.util.stream.Stream.concat;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;
I think you should use a Set to hold the files if you want to do intersection and union on them. Then you can use Guava's Sets class to do union, intersection and filtering by a Predicate as well. The difference between these methods and the other suggestions is that all of these methods create lazy views of the union, intersection, etc. of the two sets. Apache Commons creates a new collection and copies data to it. retainAll changes one of your collections by removing elements from it.
Here is how you can do an intersection with streams (remember that streams require Java 8):
List<foo> fooList1 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<foo> fooList2 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
fooList1.stream().filter(f -> fooList2.contains(f)).collect(Collectors.toList());
An example for lists of different types. If there is a relation between foo and bar, and you can get a bar object from a foo, then you can modify your stream:
List<foo> fooList = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<bar> barList = new ArrayList<>(Arrays.asList(new bar(), new bar()));
fooList.stream().filter(f -> barList.contains(f.getBar())).collect(Collectors.toList());
You can use commons-collections4 CollectionUtils
Collection<Integer> collection1 = Arrays.asList(1, 2, 4, 5, 7, 8);
Collection<Integer> collection2 = Arrays.asList(2, 3, 4, 6, 8);
Collection<Integer> intersection = CollectionUtils.intersection(collection1, collection2);
System.out.println(intersection); // [2, 4, 8]
Collection<Integer> union = CollectionUtils.union(collection1, collection2);
System.out.println(union); // [1, 2, 3, 4, 5, 6, 7, 8]
Collection<Integer> subtract = CollectionUtils.subtract(collection1, collection2);
System.out.println(subtract); // [1, 5, 7]
retainAll will modify your list
Guava doesn't have APIs for List (only for set)
I found ListUtils very useful for this use case.
Use ListUtils from org.apache.commons.collections if you do not want to modify existing list.
ListUtils.intersection(list1, list2)
In Java 8, I use simple helper methods like this:
public static <T> Collection<T> getIntersection(Collection<T> coll1, Collection<T> coll2){
return Stream.concat(coll1.stream(), coll2.stream())
.filter(coll1::contains)
.filter(coll2::contains)
.collect(Collectors.toSet());
}
public static <T> Collection<T> getMinus(Collection<T> coll1, Collection<T> coll2){
return coll1.stream().filter(not(coll2::contains)).collect(Collectors.toSet());
}
public static <T> Predicate<T> not(Predicate<T> t) {
return t.negate();
}
If the objects in the list are hashable (i.e. have a decent hashCode and equals implementation), the fastest approach for lists larger than roughly 20 elements is to construct a HashSet for the larger of the two lists.
public static <T> ArrayList<T> intersection(Collection<T> a, Collection<T> b) {
if (b.size() > a.size()) {
return intersection(b, a);
} else {
if (b.size() > 20 && !(a instanceof HashSet)) {
a = new HashSet<>(a);
}
ArrayList<T> result = new ArrayList<>();
for (T objb : b) {
if (a.contains(objb)) {
result.add(objb);
}
}
return result;
}
}
I was working on a similar problem and landed here while searching for help. I ended up writing my own solution for the difference of two lists (Array1 - Array2), and I'm posting it in case it helps someone else who reaches this page.
ArrayList<String> AbsentDates = new ArrayList<String>(); // will store the difference
public void AbsentDays() {
    findDates("April", "2017"); // fills Dates with the dates in April 2017
    findPresentDays(); // fills PresentDates with a subset of the dates in April 2017
    // Removing from Dates by index while looping over it skips elements,
    // so copy the list and remove all present dates in one call instead.
    AbsentDates = new ArrayList<String>(Dates);
    AbsentDates.removeAll(PresentDates);
    System.out.println(AbsentDates);
}
Intersection of two lists of different object types based on a common key (Java 8):
private List<User> intersection(List<User> users, List<OtherUser> list) {
    return list.stream()
        // for each OtherUser, keep the users whose id matches (ignoring case)
        .flatMap(other -> users.stream()
            .filter(user -> user.getId()
                .equalsIgnoreCase(other.getId())))
        .collect(Collectors.toList());
}
public static <T> Set<T> intersectCollections(Collection<T> col1, Collection<T> col2) {
Set<T> set1, set2;
if (col1 instanceof Set) {
set1 = (Set<T>) col1;
} else {
set1 = new HashSet<>(col1);
}
if (col2 instanceof Set) {
set2 = (Set<T>) col2;
} else {
set2 = new HashSet<>(col2);
}
Set<T> intersection = new HashSet<>(Math.min(set1.size(), set2.size()));
for (T t : set1) {
if (set2.contains(t)) {
intersection.add(t);
}
}
return intersection;
}
JDK8+ (Probably Best Performance)
public static <T> Set<T> intersectCollections(Collection<T> col1, Collection<T> col2) {
boolean isCol1Larger = col1.size() > col2.size();
Set<T> largerSet;
Collection<T> smallerCol;
if (isCol1Larger) {
if (col1 instanceof Set) {
largerSet = (Set<T>) col1;
} else {
largerSet = new HashSet<>(col1);
}
smallerCol = col2;
} else {
if (col2 instanceof Set) {
largerSet = (Set<T>) col2;
} else {
largerSet = new HashSet<>(col2);
}
smallerCol = col1;
}
return smallerCol.stream()
.filter(largerSet::contains)
.collect(Collectors.toSet());
}
If you don't care about performance and prefer smaller code just use:
col1.stream().filter(col2::contains).collect(Collectors.toList());
First, I copy all values from both arrays into a single array, then I remove the duplicate values from that array. Line 12 handles a number that occurs more than once by overwriting the later copy at position "j" with a garbage sentinel value. At the end, I traverse the array from start to end and skip every entry equal to that sentinel.
import java.util.Arrays;
public class Union {
public static void main(String[] args){
int arr1[]={1,3,3,2,4,2,3,3,5,2,1,99};
int arr2[]={1,3,2,1,3,2,4,6,3,4};
int arr3[]=new int[arr1.length+arr2.length];
for(int i=0;i<arr1.length;i++)
arr3[i]=arr1[i];
for(int i=0;i<arr2.length;i++)
arr3[arr1.length+i]=arr2[i];
System.out.println(Arrays.toString(arr3));
for(int i=0;i<arr3.length;i++)
{
for(int j=i+1;j<arr3.length;j++)
{
if(arr3[i]==arr3[j])
arr3[j]=99999999; //line 12
}
}
for(int i=0;i<arr3.length;i++)
{
if(arr3[i]!=99999999)
System.out.print(arr3[i]+" ");
}
}
}
After testing, here is my best intersection approach.
It is faster than the pure HashSet approach. The HashSet and HashMap versions below have similar performance for arrays with more than 1 million records.
The Java 8 Stream approach is quite slow for array sizes larger than 10k.
Hope this helps.
public static List<String> hashMapIntersection(List<String> target, List<String> support) {
List<String> r = new ArrayList<String>();
Map<String, Integer> map = new HashMap<String, Integer>();
for (String s : support) {
map.put(s, 0);
}
for (String s : target) {
if (map.containsKey(s)) {
r.add(s);
}
}
return r;
}
public static List<String> hashSetIntersection(List<String> a, List<String> b) {
Long start = System.currentTimeMillis();
List<String> r = new ArrayList<String>();
Set<String> set = new HashSet<String>(b);
for (String s : a) {
if (set.contains(s)) {
r.add(s);
}
}
print("intersection:" + r.size() + "-" + String.valueOf(System.currentTimeMillis() - start));
return r;
}
public static void union(List<String> a, List<String> b) {
Long start = System.currentTimeMillis();
Set<String> r= new HashSet<String>(a);
r.addAll(b);
print("union:" + r.size() + "-" + String.valueOf(System.currentTimeMillis() - start));
}
The retainAll() method is used for finding the common elements, i.e. the intersection:
list1.retainAll(list2)
You can use the methods:
CollectionUtils.containsAny and CollectionUtils.containsAll
from Apache Commons.
Final solution:
//all distinct items from both (a HashSet does not guarantee any order)
public <T> List<T> getListReunion(List<T> list1, List<T> list2) {
Set<T> set = new HashSet<T>();
set.addAll(list1);
set.addAll(list2);
return new ArrayList<T>(set);
}
//common items from both (note: modifies list1 in place)
public <T> List<T> getListIntersection(List<T> list1, List<T> list2) {
list1.retainAll(list2);
return list1;
}
//items from list1 not present in list2 (note: modifies list1 in place)
public <T> List<T> getListDifference(List<T> list1, List<T> list2) {
list1.removeAll(list2);
return list1;
}
If you had your data in Sets you could use Guava's Sets class.
When a number from the first array matches one in the second, I use indexOf() to check whether it has been seen before. On the first match the number is printed and recorded in a string, so that on later matches the indexOf() condition is false and it is not printed again. Note that the cast to char keeps only the low 16 bits of the int, so two different values that share those bits are treated as the same.
class Intersection
{
public static void main(String[] args)
{
String s="";
int[] array1 = {1, 2, 5, 5, 8, 9, 7,2,3512451,4,4,5 ,10};
int[] array2 = {1, 0, 6, 15, 6, 5,4, 1,7, 0,5,4,5,2,3,8,5,3512451};
for (int i = 0; i < array1.length; i++)
{
for (int j = 0; j < array2.length; j++)
{
char c=(char)(array1[i]);
if(array1[i] == (array2[j])&&s.indexOf(c)==-1)
{
System.out.println("Common element is : "+(array1[i]));
s+=c;
}
}
}
}
}
