Java 8 stream.collect( ... groupingBy ( ... mapping( ... reducing ))) reducing BinaryOperator-usage - java

I played around with a solution using groupingBy, mapping and reducing to the following question: Elegantly create map with object fields as key/value from object stream in Java 8. In short, the goal was to get a map with the age as key and the hobbies of the persons of that age as a Set.
One of the solutions I came up with (not nice, but that's not the point) behaved strangely.
With the following list as input:
List<Person> personList = Arrays.asList(
new Person(/* name */ "A", /* age */ 23, /* hobbies */ asList("a")),
new Person("BC", 24, asList("b", "c")),
new Person("D", 23, asList("d")),
new Person("E", 23, asList("e"))
);
and the following solution:
Collector<List<String>, ?, Set<String>> listToSetReducer = Collectors.reducing(new HashSet<>(), HashSet::new, (strings, strings2) -> {
strings.addAll(strings2);
return strings;
});
Map<Integer, Set<String>> map = personList.stream()
.collect(Collectors.groupingBy(o -> o.age,
Collectors.mapping(o -> o.hobbies, listToSetReducer)));
System.out.println("map = " + map);
I got:
map = {23=[a, b, c, d, e], 24=[a, b, c, d, e]}
which is clearly not what I was expecting. I expected this instead:
map = {23=[a, d, e], 24=[b, c]}
Now if I just swap the parameters of the binary operator (of the reducing collector) from (strings, strings2) to (strings2, strings), I get the expected result. So what did I miss here?
Did I misinterpret the reducing-collector? Or which documentation piece did I miss that makes it obvious that my usage was not working as expected?
Java version is 1.8.0_121 if that matters.

Reduction should never modify the incoming objects. In your case, you are modifying the incoming HashSet that is supposed to be the identity value and returning it, so all groups end up with the same HashSet instance as their result, containing all values.
What you need is a Mutable Reduction, which can be implemented via Collector.of(…) like it has been already implemented with the prebuilt collectors Collectors.toList(), Collectors.toSet(), etc.
Map<Integer, Set<String>> map = personList.stream()
.collect(Collectors.groupingBy(o -> o.age,
Collector.of(HashSet::new, (s,p) -> s.addAll(p.hobbies), (s1,s2) -> {
s1.addAll(s2);
return s1;
})));
The reason we need a custom collector at all is that Java 8 doesn't have the flatMapping collector, which was added in Java 9. With that, the solution looks like:
Map<Integer, Set<String>> map = personList.stream()
.collect(Collectors.groupingBy(o -> o.age,
Collectors.flatMapping(p -> p.hobbies.stream(), Collectors.toSet())));
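To make the shared identity visible, here is a minimal sketch. It assumes a stripped-down Person class (a stand-in for the one in the question) and made-up class and method names. It contrasts the mutating operator from the question with an operator that copies before merging, which keeps the reduction free of side effects:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ReducingIdentityDemo {
    // Hypothetical stand-in for the Person class used in the question.
    static class Person {
        final String name;
        final int age;
        final List<String> hobbies;

        Person(String name, int age, List<String> hobbies) {
            this.name = name;
            this.age = age;
            this.hobbies = hobbies;
        }
    }

    static List<Person> people() {
        return Arrays.asList(
            new Person("A", 23, Arrays.asList("a")),
            new Person("BC", 24, Arrays.asList("b", "c")),
            new Person("D", 23, Arrays.asList("d")),
            new Person("E", 23, Arrays.asList("e")));
    }

    // Groups the hobby lists by age and reduces each group with the given collector.
    static Map<Integer, Set<String>> group(Collector<List<String>, ?, Set<String>> reducer) {
        return people().stream().collect(
            Collectors.groupingBy(p -> p.age, Collectors.mapping(p -> p.hobbies, reducer)));
    }

    // Broken: the operator mutates its left argument, which starts out as the shared identity set.
    static Map<Integer, Set<String>> mutating() {
        Collector<List<String>, ?, Set<String>> reducer = Collectors.reducing(
            new HashSet<>(), HashSet::new,
            (left, right) -> { left.addAll(right); return left; });
        return group(reducer);
    }

    // Side-effect free: every merge allocates a fresh set, so the identity is never modified.
    static Map<Integer, Set<String>> copying() {
        Collector<List<String>, ?, Set<String>> reducer = Collectors.reducing(
            Collections.<String>emptySet(), HashSet::new,
            (left, right) -> {
                Set<String> merged = new HashSet<>(left);
                merged.addAll(right);
                return merged;
            });
        return group(reducer);
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> broken = mutating();
        // Both keys point at the very same HashSet instance.
        System.out.println("same instance: " + (broken.get(23) == broken.get(24)));
        System.out.println("copying = " + copying());
    }
}
```

The copying variant allocates a new set for every merge, which is one reason a mutable reduction via Collector.of is usually the better fit here.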

Related

Have Java Streams GroupingBy result Map include a key for each value of an enum, even if value is an empty List

This question is about Java Streams' groupingBy capability.
Suppose I have a class, WorldCup:
public class WorldCup {
int year;
Country champion;
// all-arg constructor, getter/setters, etc
}
and an enum, Country:
public enum Country {
Brazil, France, USA
}
and the following code snippet:
WorldCup wc94 = new WorldCup(1994, Country.Brazil);
WorldCup wc98 = new WorldCup(1998, Country.France);
List<WorldCup> wcList = new ArrayList<WorldCup>();
wcList.add(wc94);
wcList.add(wc98);
Map<Country, List<Integer>> championsMap = wcList.stream()
.collect(Collectors.groupingBy(WorldCup::getCountry, Collectors.mapping(WorldCup::getYear, Collectors.toList())));
After running this code, championsMap will contain:
Brazil: [1994]
France: [1998]
Is there a succinct way to have this list include an entry for all of the values of the enum? What I'm looking for is:
Brazil: [1994]
France: [1998]
USA: []
There are several approaches you can take.
The map used for accumulating the stream data can be prepopulated with an entry for every enum member. To access all existing enum members you can use the values() method or EnumSet.allOf().
This can be achieved using the three-argument version of collect() or through a custom collector created via Collector.of().
Map<Country, List<Integer>> championsMap = wcList.stream()
.collect(
() -> EnumSet.allOf(Country.class).stream() // supplier
.collect(Collectors.toMap(
Function.identity(),
c -> new ArrayList<>()
)),
(Map<Country, List<Integer>> map, WorldCup next) -> // accumulator
map.get(next.getCountry()).add(next.getYear()),
(left, right) -> // combiner
right.forEach((k, v) -> left.get(k).addAll(v))
);
Another option is to add missing entries to the map after reduction of the stream has been finished.
For that we can use built-in collector collectingAndThen().
Map<Country, List<Integer>> championsMap = wcList.stream()
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(WorldCup::getCountry,
Collectors.mapping(WorldCup::getYear,
Collectors.toList())),
map -> {
EnumSet.allOf(Country.class)
.forEach(country -> map.computeIfAbsent(country, k -> new ArrayList<>())); // if you're not going to mutate these lists - use Collections.emptyList()
return map;
}
));
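As a variation on the second approach, the sketch below collects into an EnumMap (through the groupingBy overload that takes a map factory) and then fills in the missing constants, so the keys also iterate in enum declaration order. This is a different technique from the two snippets above, not a replacement for them. The WorldCup class here is a hypothetical minimal version whose getCountry() is assumed to return the champion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ChampionsDemo {
    enum Country { Brazil, France, USA }

    // Hypothetical minimal WorldCup; getCountry() is assumed to return the champion.
    static class WorldCup {
        private final int year;
        private final Country champion;

        WorldCup(int year, Country champion) {
            this.year = year;
            this.champion = champion;
        }

        int getYear() { return year; }
        Country getCountry() { return champion; }
    }

    static Map<Country, List<Integer>> championsByCountry(List<WorldCup> cups) {
        // Group into an EnumMap so that iteration follows the enum declaration order.
        Map<Country, List<Integer>> byCountry = cups.stream().collect(
            Collectors.groupingBy(WorldCup::getCountry,
                () -> new EnumMap<Country, List<Integer>>(Country.class),
                Collectors.mapping(WorldCup::getYear, Collectors.toList())));
        // Add an empty list for every constant that never appeared in the stream.
        for (Country country : Country.values()) {
            byCountry.putIfAbsent(country, new ArrayList<>());
        }
        return byCountry;
    }

    public static void main(String[] args) {
        List<WorldCup> cups = Arrays.asList(
            new WorldCup(1994, Country.Brazil),
            new WorldCup(1998, Country.France));
        System.out.println(championsByCountry(cups));
        // prints {Brazil=[1994], France=[1998], USA=[]}
    }
}
```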

Group the List by element occurrence count, sort the Map by key and value [closed]

I have an input array as
["Setra","Mercedes","Volvo","Mercedes","Skoda","Iveco","MAN",null,"Skoda","Iveco"]
expected output should be
{Iveco=2, Mercedes=2, Skoda=2, MAN=1, Setra=1, Volvo=1}
that is, a map with the vehicle brands as keys and their occurrence counts as values, sorted by count in descending order, with brands that have the same count sorted alphabetically by key.
I have tried like
public static String busRanking(List<String> busModels) {
Map<String, Long> counters = busModels.stream().skip(1).filter(Objects::nonNull)
.filter(bus -> !bus.startsWith("V"))
.collect(Collectors.groupingBy(bus-> bus, Collectors.counting()));
Map<String, Long> finalMap = new LinkedHashMap<>();
counters.entrySet().stream()
.sorted(Map.Entry.comparingByValue(
Comparator.reverseOrder()))
.forEachOrdered(
e -> finalMap.put(e.getKey(),
e.getValue()));
return finalMap.toString();
}
public static void main(String[] args) {
List<String> busModels = Arrays.asList( "Setra","Mercedes","Volvo","Mercedes","Skoda","Iveco","MAN",null,"Skoda","Iveco");
String busRanking = busRanking(busModels);
System.out.println(busRanking);
}
And the output I am getting
{Skoda=2, Mercedes=2, Iveco=2, Setra=1, MAN=1}
Any suggestion? And the output has to be obtained using single stream()
I think a nice solution would be:
public static void main(String... args) {
List<String> busModels = Arrays.asList( "Setra","Mercedes","Volvo","Mercedes","Skoda","Iveco","MAN",null,"Skoda","Iveco");
Map<String, Long> collect = busModels.stream()
.filter(Objects::nonNull)
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
TreeMap<String, Long> stringLongTreeMap = new TreeMap<>(collect);
Set<Map.Entry<String, Long>> entries = stringLongTreeMap.entrySet();
ArrayList<Map.Entry<String, Long>> list = new ArrayList<>(entries);
list.sort((o1, o2) -> o2.getValue().compareTo(o1.getValue()));
String busRanking = list.toString();
System.out.println(busRanking);
}
If you can use a third party library, this should work using Streams with Eclipse Collections:
Comparator<ObjectIntPair<String>> comparator =
Comparators.fromFunctions(each -> -each.getTwo(), ObjectIntPair::getOne);
String[] strings = {"Setra", "Mercedes", "Volvo", "Mercedes", "Skoda", "Iveco",
"MAN", null, "Skoda", "Iveco"};
List<ObjectIntPair<String>> pairs = Stream.of(strings).collect(Collectors2.toBag())
.select(Predicates.notNull())
.collectWithOccurrences(PrimitiveTuples::pair, Lists.mutable.empty())
.sortThis(comparator);
System.out.println(pairs);
Outputs:
[Iveco:2, Mercedes:2, Skoda:2, MAN:1, Setra:1, Volvo:1]
This can also be simplified slightly by not using Streams.
List<ObjectIntPair<String>> pairs = Bags.mutable.with(strings)
.select(Predicates.notNull())
.collectWithOccurrences(PrimitiveTuples::pair, Lists.mutable.empty())
.sortThis(comparator);
Note: I am a committer for Eclipse Collections
'One-liner' using the standard Java API:
Map<String, Long> map = busModels.stream()
.filter(Objects::nonNull)
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.sorted(Comparator
.comparing((Entry<String, Long> e) -> e.getValue()).reversed()
.thenComparing(e -> e.getKey()))
.collect(Collectors.toMap(Entry::getKey, Entry::getValue, (left, right) -> left + right, LinkedHashMap::new));
Here's what happens:
It first filters out nulls.
Then it groups the elements by name and counts the occurrences of each, collecting the result into a map.
Then it walks over the entries of the resulting map and sorts them by value first (descending) and then by key (so if two occurrence counts are equal, Iveco comes before MAN).
At last, we're extracting the entries into a LinkedHashMap, which preserves the insertion order.
Output:
{Iveco=2, Mercedes=2, Skoda=2, MAN=1, Setra=1, Volvo=1}
This uses a single, method-chaining statement. I wouldn't, however, call this a single stream operation, since in fact two streams are created, one to collect the occurrences, and a new one of the entry set to sort.
Streams are normally not designed to be traversed more than once, at least not in the standard Java API.
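For reference, the steps described above can be put into a small runnable sketch. The class and method names are made up for illustration, and the comparator is written with Map.Entry.comparingByValue() and comparingByKey() instead of the lambdas, which should sort the same way:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

class BusRankingDemo {
    static Map<String, Long> rank(List<String> busModels) {
        return busModels.stream()
            .filter(Objects::nonNull)
            // First stream: count the occurrences of every brand.
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            // Second stream: sort by count (descending), then by brand name.
            .entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.comparingByKey()))
            // A LinkedHashMap keeps the sorted insertion order; keys are unique here,
            // so the merge function is never actually invoked.
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (left, right) -> left, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> busModels = Arrays.asList("Setra", "Mercedes", "Volvo", "Mercedes",
            "Skoda", "Iveco", "MAN", null, "Skoda", "Iveco");
        System.out.println(rank(busModels));
        // prints {Iveco=2, Mercedes=2, Skoda=2, MAN=1, Setra=1, Volvo=1}
    }
}
```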
Can it be done with StreamEx?
There exists a library called StreamEx, which perhaps contains a method which allows us to do it with a single stream operation.
Map<String, Long> map = StreamEx.of(busModels)
.filter(Objects::nonNull)
.map(t -> new AbstractMap.SimpleEntry<>(t, 1L))
.sorted(Comparator.comparing(Map.Entry::getKey))
.collapse(Objects::equals, (left, right) -> new AbstractMap.SimpleEntry<>(left.getKey(), left.getValue() + right.getValue()))
.sorted(Comparator
.comparing((Entry<String, Long> e) -> e.getValue()).reversed()
.thenComparing(e -> e.getKey()))
.collect(Collectors.toMap(Entry::getKey, Entry::getValue, (left, right) -> left + right, LinkedHashMap::new));
Here's what happens:
We filter out nulls.
We map it to a stream of Entrys, each key being the bus name, and each value being the initial number of occurrences, which is 1.
Then we sort the stream by each key, making sure that all the same buses are adjacent in the stream.
That allows us to use the collapse method, which merges series of adjacent elements which satisfy the given predicate using the merger function. So Mercedes => 1 + Mercedes => 1 becomes Mercedes => 2.
Then we sort and collect as described above.
This at least seems to do it in a single stream operation.

Java 8 Streams reduce remove duplicates keeping the most recent entry

I have a Java bean, like
class EmployeeContract {
Long id;
Date date;
// getters/setters
}
If I have a long list of these, in which we have duplicates by id but with different dates, such as:
1, 2015/07/07
1, 2018/07/08
2, 2015/07/08
2, 2018/07/09
How can I reduce such a list keeping only the entries with the most recent date, such as:
1, 2018/07/08
2, 2018/07/09
?
Preferably using Java 8...
I've started with something like:
contract.stream()
.collect(Collectors.groupingBy(EmployeeContract::getId, Collectors.mapping(EmployeeContract::getId, Collectors.toList())))
.entrySet().stream().findFirst();
That gives me the mapping within individual groups, but I'm stuck as to how to collect that into a result list - my streams are not too strong I'm afraid...
Well, I am just going to put my comment here in the shape of an answer:
yourList.stream()
.collect(Collectors.toMap(
EmployeeContract::getId,
Function.identity(),
BinaryOperator.maxBy(Comparator.comparing(EmployeeContract::getDate)))
)
.values();
This will give you a Collection instead of a List, if you really care about this.
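A self-contained sketch of this approach follows. The EmployeeContract class is a hypothetical stand-in for the bean in the question, and it uses LocalDate instead of Date purely to keep the sample data short (that substitution is an assumption, not part of the question). The result is wrapped in an ArrayList to get a List.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class LatestContractDemo {
    // Hypothetical stand-in for the bean; LocalDate replaces Date only for brevity.
    static class EmployeeContract {
        private final Long id;
        private final LocalDate date;

        EmployeeContract(Long id, LocalDate date) {
            this.id = id;
            this.date = date;
        }

        Long getId() { return id; }
        LocalDate getDate() { return date; }

        @Override
        public String toString() { return id + ", " + date; }
    }

    static List<EmployeeContract> latestPerId(List<EmployeeContract> contracts) {
        // toMap keeps one contract per id; on a collision maxBy keeps the later date.
        return new ArrayList<>(contracts.stream()
            .collect(Collectors.toMap(
                EmployeeContract::getId,
                Function.identity(),
                BinaryOperator.maxBy(Comparator.comparing(EmployeeContract::getDate))))
            .values());
    }

    public static void main(String[] args) {
        List<EmployeeContract> contracts = Arrays.asList(
            new EmployeeContract(1L, LocalDate.of(2015, 7, 7)),
            new EmployeeContract(1L, LocalDate.of(2018, 7, 8)),
            new EmployeeContract(2L, LocalDate.of(2015, 7, 8)),
            new EmployeeContract(2L, LocalDate.of(2018, 7, 9)));
        // The iteration order of the result is that of the underlying map and is not guaranteed.
        latestPerId(contracts).forEach(System.out::println);
    }
}
```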
You can do it in two steps as follows :
List<EmployeeContract> finalContract = contract.stream() // Stream<EmployeeContract>
.collect(Collectors.toMap(EmployeeContract::getId,
EmployeeContract::getDate, (a, b) -> a.after(b) ? a : b)) // Map<Long, Date> (Step 1)
.entrySet().stream() // Stream<Entry<Long, Date>>
.map(a -> new EmployeeContract(a.getKey(), a.getValue())) // Stream<EmployeeContract>
.collect(Collectors.toList()); // Step 2
First step: ensures the comparison of dates with the most recent one mapped to an id.
Second step: maps these key, value pairs to a final List<EmployeeContract> as a result.
Just to complement the existing answers, as you're asking:
how to collect that into a result list
Here are some options:
Wrap the values() into an ArrayList:
List<EmployeeContract> list1 =
new ArrayList<>(list.stream()
.collect(toMap(EmployeeContract::getId,
identity(),
maxBy(comparing(EmployeeContract::getDate))))
.values());
Wrap the toMap collector into collectingAndThen:
List<EmployeeContract> list2 =
list.stream()
.collect(collectingAndThen(toMap(EmployeeContract::getId,
identity(),
maxBy(comparing(EmployeeContract::getDate))),
c -> new ArrayList<>(c.values())));
Collect the values to a new List using another stream:
List<EmployeeContract> list3 =
list.stream()
.collect(toMap(EmployeeContract::getId,
identity(),
maxBy(comparing(EmployeeContract::getDate))))
.values()
.stream()
.collect(toList());
With vavr.io you can do it like this:
var finalContract = Stream.ofAll(contract) //create io.vavr.collection.Stream
.groupBy(EmployeeContract::getId)
.map(tuple -> tuple._2.maxBy(EmployeeContract::getDate))
.collect(Collectors.toList()); //result is list from java.util package

Java-Stream, toMap with duplicate keys

So there might be one abc for several payments, now I have:
//find abc id for each payment id
Map<Long, Integer> abcIdToPmtId = paymentController.findPaymentsByIds(pmtIds)
.stream()
.collect(Collectors.toMap(Payment::getAbcId, Payment::getPaymentId));
But then I realized this could have duplicate keys, so I want it to return a
Map<Long, List<Integer>> abcIdToPmtIds
which an entry will contain one abc and his several payments.
I know I might be able to use groupingBy, but then I think I can only get a Map<Long, List<Payment>>.
Use the other groupingBy overload.
paymentController.findPaymentsByIds(pmtIds)
.stream()
.collect(
groupingBy(Payment::getAbcId, mapping(Payment::getPaymentId, toList())));
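A runnable sketch of that overload is shown below. The Payment class is a hypothetical one that only has the two accessors used in the question, and a plain list stands in for the result of the controller call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PaymentGroupingDemo {
    // Hypothetical Payment with only the two accessors used in the question.
    static class Payment {
        private final Long abcId;
        private final Integer paymentId;

        Payment(Long abcId, Integer paymentId) {
            this.abcId = abcId;
            this.paymentId = paymentId;
        }

        Long getAbcId() { return abcId; }
        Integer getPaymentId() { return paymentId; }
    }

    static Map<Long, List<Integer>> paymentIdsByAbcId(List<Payment> payments) {
        // groupingBy picks the key; the mapping downstream collector reduces
        // each Payment to its id before the ids are gathered into a list.
        return payments.stream().collect(
            Collectors.groupingBy(Payment::getAbcId,
                Collectors.mapping(Payment::getPaymentId, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Payment> payments = Arrays.asList(
            new Payment(1L, 10), new Payment(1L, 11), new Payment(2L, 20));
        Map<Long, List<Integer>> abcIdToPmtIds = paymentIdsByAbcId(payments);
        System.out.println(abcIdToPmtIds.get(1L)); // prints [10, 11]
        System.out.println(abcIdToPmtIds.get(2L)); // prints [20]
    }
}
```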
Problem statement: Converting SimpleImmutableEntry<String, List<String>> -> Map<String, List<String>>.
For instance, you have SimpleImmutableEntry objects of the form [A,[1]], [B,[2]], [A,[3]] and you want your map to look like this: A -> [1,3], B -> [2].
This can be done with Collectors.toMap, but Collectors.toMap only works with unique keys unless you provide a merge function to resolve the collision, as stated in the Java docs.
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-
So the example code looks like this:
.map(returnSimpleImmutableEntries)
.collect(Collectors.toMap(SimpleImmutableEntry::getKey,
SimpleImmutableEntry::getValue,
(oldList, newList) -> { oldList.addAll(newList); return oldList; } ));
returnSimpleImmutableEntries method returns you entries of the form [A,[1]], [B,[2]], [A, [3]] on which you can use your collectors.
With Collectors.toMap:
Map<Long, List<Integer>> abcIdToPmtId = paymentController.findPaymentsByIds(pmtIds)
.stream()
.collect(Collectors.toMap(
Payment::getAbcId,
p -> new ArrayList<>(Arrays.asList(p.getPaymentId())),
(o, n) -> { o.addAll(n); return o; }));
Though it's more clear and readable to use Collectors.groupingBy along with Collectors.mapping.
You don't need streams to do it though:
Map<Long, List<Integer>> abcIdToPmtId = new HashMap<>();
paymentController.findPaymentsByIds(pmtIds).forEach(p ->
abcIdToPmtId.computeIfAbsent(
p.getAbcId(),
k -> new ArrayList<>())
.add(p.getPaymentId()));

Flatten a Map<Integer, List<String>> to Map<String, Integer> with stream and lambda

I would like to flatten a Map which associates an Integer key to a list of String, without losing the key mapping.
I am curious as to whether it is possible and useful to do so with streams and lambdas.
We start with something like this:
Map<Integer, List<String>> mapFrom = new HashMap<>();
Let's assume that mapFrom is populated somewhere, and looks like:
1: a,b,c
2: d,e,f
etc.
Let's also assume that the values in the lists are unique.
Now, I want to "unfold" it to get a second map like:
a: 1
b: 1
c: 1
d: 2
e: 2
f: 2
etc.
I could do it like this (or very similarly, using foreach):
Map<String, Integer> mapTo = new HashMap<>();
for (Map.Entry<Integer, List<String>> entry: mapFrom.entrySet()) {
for (String s: entry.getValue()) {
mapTo.put(s, entry.getKey());
}
}
Now let's assume that I want to use lambda instead of nested for loops. I would probably do something like this:
Map<String, Integer> mapTo = mapFrom.entrySet().stream().map(e -> {
e.getValue().stream().?
// Here I can iterate on each List,
// but my best try would only give me a flat map for each key,
// that I wouldn't know how to flatten.
}).collect(Collectors.toMap(/*A String value*/,/*An Integer key*/))
I also gave a try to flatMap, but I don't think that it is the right way to go, because although it helps me get rid of the dimensionality issue, I lose the key in the process.
In a nutshell, my two questions are :
Is it possible to use streams and lambda to achieve this?
Is it useful (performance, readability) to do so?
You need to use flatMap to flatten the values into a new stream, but since you still need the original keys for collecting into a Map, you have to map to a temporary object holding key and value, e.g.
Map<String, Integer> mapTo = mapFrom.entrySet().stream()
.flatMap(e->e.getValue().stream()
.map(v->new AbstractMap.SimpleImmutableEntry<>(e.getKey(), v)))
.collect(Collectors.toMap(Map.Entry::getValue, Map.Entry::getKey));
The Map.Entry is a stand-in for the nonexistent tuple type, any other type capable of holding two objects of different type is sufficient.
An alternative not requiring these temporary objects, is a custom collector:
Map<String, Integer> mapTo = mapFrom.entrySet().stream().collect(
HashMap::new, (m,e)->e.getValue().forEach(v->m.put(v, e.getKey())), Map::putAll);
This differs from toMap in overwriting duplicate keys silently, whereas toMap without a merger function will throw an exception, if there is a duplicate key. Basically, this custom collector is a parallel capable variant of
Map<String, Integer> mapTo = new HashMap<>();
mapFrom.forEach((k, l) -> l.forEach(v -> mapTo.put(v, k)));
But note that this task wouldn't benefit from parallel processing, even with a very large input map. Only if there were additional computationally intensive tasks within the stream pipeline that could benefit from SMP would there be a chance of getting a benefit from parallel streams. So perhaps the concise, sequential Collection API solution is preferable.
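The difference in duplicate handling mentioned above can be seen with a small sketch. The sample data deliberately violates the question's uniqueness assumption by putting "b" under two keys, and the class and method names are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlattenDuplicatesDemo {
    // Sample input in which "b" appears under two keys, on purpose.
    static Map<Integer, List<String>> sample() {
        Map<Integer, List<String>> mapFrom = new HashMap<>();
        mapFrom.put(1, Arrays.asList("a", "b"));
        mapFrom.put(2, Arrays.asList("b", "c"));
        return mapFrom;
    }

    // flatMap plus toMap without a merge function: fails on a duplicate key.
    static Map<String, Integer> withToMap(Map<Integer, List<String>> mapFrom) {
        return mapFrom.entrySet().stream()
            .flatMap(e -> e.getValue().stream()
                .map(v -> new AbstractMap.SimpleImmutableEntry<>(e.getKey(), v)))
            .collect(Collectors.toMap(Map.Entry::getValue, Map.Entry::getKey));
    }

    // Three-argument collect with put: a later mapping silently replaces an earlier one.
    static Map<String, Integer> withCustomCollect(Map<Integer, List<String>> mapFrom) {
        return mapFrom.entrySet().stream().collect(
            HashMap::new,
            (m, e) -> e.getValue().forEach(v -> m.put(v, e.getKey())),
            Map::putAll);
    }

    public static void main(String[] args) {
        try {
            withToMap(sample());
        } catch (IllegalStateException ex) {
            System.out.println("toMap rejected the duplicate key");
        }
        // Contains a, b and c; which of the two keys "b" maps to depends on iteration order.
        System.out.println(withCustomCollect(sample()));
    }
}
```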
You should use flatMap as follows:
entrySet.stream()
.flatMap(e -> e.getValue().stream()
.map(s -> new SimpleImmutableEntry<>(e.getKey(), s)));
SimpleImmutableEntry is a nested class in AbstractMap.
Hope this would do it in simplest way. :))
mapFrom.forEach((key, values) -> values.forEach(value -> mapTo.put(value, key)));
This should work. Please notice that you lost some keys from List.
Map<Integer, List<String>> mapFrom = new HashMap<>();
Map<String, Integer> mapTo = mapFrom.entrySet().stream()
.flatMap(integerListEntry -> integerListEntry.getValue()
.stream()
.map(listItem -> new AbstractMap.SimpleEntry<>(listItem, integerListEntry.getKey())))
.collect(Collectors.toMap(AbstractMap.SimpleEntry::getKey, AbstractMap.SimpleEntry::getValue));
Same as the previous answers with Java 9:
Map<String, Integer> mapTo = mapFrom.entrySet()
.stream()
.flatMap(entry -> entry.getValue()
.stream()
.map(s -> Map.entry(s, entry.getKey())))
.collect(toMap(Entry::getKey, Entry::getValue));
