Java 8 Streams reduce remove duplicates keeping the most recent entry - java

I have a Java bean, like
class EmployeeContract {
Long id;
Date date;
getter/setter
}
If I have a long list of these, in which we have duplicates by id but with different dates, such as:
1, 2015/07/07
1, 2018/07/08
2, 2015/07/08
2, 2018/07/09
How can I reduce such a list keeping only the entries with the most recent date, such as:
1, 2018/07/08
2, 2018/07/09
?
Preferably using Java 8...
I've started with something like:
contract.stream()
.collect(Collectors.groupingBy(EmployeeContract::getId, Collectors.mapping(EmployeeContract::getId, Collectors.toList())))
.entrySet().stream().findFirst();
That gives me the mapping within individual groups, but I'm stuck as to how to collect that into a result list - my streams are not too strong I'm afraid...

Well, I am just going to put my comment here in the shape of an answer:
yourList.stream()
.collect(Collectors.toMap(
EmployeeContract::getId,
Function.identity(),
BinaryOperator.maxBy(Comparator.comparing(EmployeeContract::getDate)))
)
.values();
This will give you a Collection instead of a List, if you really care about this.
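As a minimal sketch, the program below runs this approach on the sample data from the question and copies the values into a List. The EmployeeContract class here is a stand-in written for illustration, and it uses java.time.LocalDate instead of Date only to keep the sample data short; both are assumptions, not the asker's actual code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's bean; LocalDate is an assumption.
class EmployeeContract {
    private final Long id;
    private final LocalDate date;

    EmployeeContract(Long id, LocalDate date) {
        this.id = id;
        this.date = date;
    }

    Long getId() { return id; }

    LocalDate getDate() { return date; }

    @Override
    public String toString() { return id + ", " + date; }
}

class LatestContractDemo {
    // For every id, keep only the contract with the most recent date.
    static List<EmployeeContract> latestPerId(List<EmployeeContract> contracts) {
        return new ArrayList<>(contracts.stream()
                .collect(Collectors.toMap(
                        EmployeeContract::getId,
                        Function.identity(),
                        BinaryOperator.maxBy(Comparator.comparing(EmployeeContract::getDate))))
                .values());
    }

    public static void main(String[] args) {
        List<EmployeeContract> contracts = Arrays.asList(
                new EmployeeContract(1L, LocalDate.of(2015, 7, 7)),
                new EmployeeContract(1L, LocalDate.of(2018, 7, 8)),
                new EmployeeContract(2L, LocalDate.of(2015, 7, 8)),
                new EmployeeContract(2L, LocalDate.of(2018, 7, 9)));
        // One line per id, each carrying that id's latest date.
        latestPerId(contracts).forEach(System.out::println);
    }
}
```

The order of the resulting list follows the HashMap that toMap creates, so it should not be relied upon; sort the list afterwards if order matters.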

You can do it in two steps as follows :
List<EmployeeContract> finalContract = contract.stream() // Stream<EmployeeContract>
.collect(Collectors.toMap(EmployeeContract::getId,
EmployeeContract::getDate, (a, b) -> a.after(b) ? a : b)) // Map<Long, Date> (Step 1)
.entrySet().stream() // Stream<Entry<Long, Date>>
.map(a -> new EmployeeContract(a.getKey(), a.getValue())) // Stream<EmployeeContract>
.collect(Collectors.toList()); // Step 2
First step: ensures the comparison of dates with the most recent one mapped to an id.
Second step: maps these key, value pairs to a final List<EmployeeContract> as a result.

Just to complement the existing answers, as you're asking:
how to collect that into a result list
Here are some options:
Wrap the values() into an ArrayList:
List<EmployeeContract> list1 =
new ArrayList<>(list.stream()
.collect(toMap(EmployeeContract::getId,
identity(),
maxBy(comparing(EmployeeContract::getDate))))
.values());
Wrap the toMap collector into collectingAndThen:
List<EmployeeContract> list2 =
list.stream()
.collect(collectingAndThen(toMap(EmployeeContract::getId,
identity(),
maxBy(comparing(EmployeeContract::getDate))),
c -> new ArrayList<>(c.values())));
Collect the values to a new List using another stream:
List<EmployeeContract> list3 =
list.stream()
.collect(toMap(EmployeeContract::getId,
identity(),
maxBy(comparing(EmployeeContract::getDate))))
.values()
.stream()
.collect(toList());

With vavr.io you can do it like this:
var finalContract = Stream.ofAll(contract) //create io.vavr.collection.Stream
.groupBy(EmployeeContract::getId)
.map(tuple -> tuple._2.maxBy(EmployeeContract::getDate))
.collect(Collectors.toList()); //result is list from java.util package


Have Java Streams GroupingBy result Map include a key for each value of an enum, even if value is an empty List

This question is about Java Streams' groupingBy capability.
Suppose I have a class, WorldCup:
public class WorldCup {
int year;
Country champion;
// all-arg constructor, getter/setters, etc
}
and an enum, Country:
public enum Country {
Brazil, France, USA
}
and the following code snippet:
WorldCup wc94 = new WorldCup(1994, Country.Brazil);
WorldCup wc98 = new WorldCup(1998, Country.France);
List<WorldCup> wcList = new ArrayList<WorldCup>();
wcList.add(wc94);
wcList.add(wc98);
Map<Country, List<Integer>> championsMap = wcList.stream()
.collect(Collectors.groupingBy(WorldCup::getCountry, Collectors.mapping(WorldCup::getYear, Collectors.toList())));
After running this code, championsMap will contain:
Brazil: [1994]
France: [1998]
Is there a succinct way to have this list include an entry for all of the values of the enum? What I'm looking for is:
Brazil: [1994]
France: [1998]
USA: []
There are several approaches you can take.
The map which would be used for accumulating the stream data can be prepopulated with entries corresponding to every enum-member. To access all existing enum-members you can use values() method or EnumSet.allOf().
It can be achieved using three-args version of collect() or through a custom collector created via Collector.of().
Map<Country, List<Integer>> championsMap = wcList.stream()
.collect(
() -> EnumSet.allOf(Country.class).stream() // supplier
.collect(Collectors.toMap(
Function.identity(),
c -> new ArrayList<>()
)),
(Map<Country, List<Integer>> map, WorldCup next) -> // accumulator
map.get(next.getCountry()).add(next.getYear()),
(left, right) -> // combiner
right.forEach((k, v) -> left.get(k).addAll(v))
);
Another option is to add missing entries to the map after reduction of the stream has been finished.
For that we can use built-in collector collectingAndThen().
Map<Country, List<Integer>> championsMap = wcList.stream()
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(WorldCup::getCountry,
Collectors.mapping(WorldCup::getYear,
Collectors.toList())),
map -> {
EnumSet.allOf(Country.class)
.forEach(country -> map.computeIfAbsent(country, k -> new ArrayList<>())); // if you're not going to mutate these lists - use Collections.emptyList()
return map;
}
));
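The second option can be sketched as a self-contained program. This is a variation rather than the exact code above: it swaps in an EnumMap as the map factory (my assumption, chosen so the entries come out in enum declaration order) and fills the missing keys in a plain loop. The WorldCup class and the names ChampionsDemo and yearsByCountry are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

enum Country { Brazil, France, USA }

// Hypothetical minimal version of the question's class.
class WorldCup {
    private final int year;
    private final Country country;

    WorldCup(int year, Country country) {
        this.year = year;
        this.country = country;
    }

    int getYear() { return year; }

    Country getCountry() { return country; }
}

class ChampionsDemo {
    static Map<Country, List<Integer>> yearsByCountry(List<WorldCup> cups) {
        // Group the years by country into an EnumMap (keys iterate in ordinal order).
        Map<Country, List<Integer>> result = cups.stream()
                .collect(Collectors.groupingBy(
                        WorldCup::getCountry,
                        () -> new EnumMap<>(Country.class),
                        Collectors.mapping(WorldCup::getYear, Collectors.toList())));
        // Add an empty list for every enum constant that never occurred.
        for (Country c : Country.values()) {
            result.computeIfAbsent(c, k -> new ArrayList<>());
        }
        return result;
    }

    public static void main(String[] args) {
        List<WorldCup> wcList = Arrays.asList(
                new WorldCup(1994, Country.Brazil),
                new WorldCup(1998, Country.France));
        // prints {Brazil=[1994], France=[1998], USA=[]}
        System.out.println(yearsByCountry(wcList));
    }
}
```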

Convert Map<Integer, List<String>> to Map<String, List<Integer>>

I'm having a hard time converting a Map containing some integers as keys and a list of random strings as values.
E.g. :
1 = ["a", "b", "c"]
2 = ["a", "b", "z"]
3 = ["z"]
I want to transform it into a Map with the distinct strings as keys and lists of the integers as values.
E.g. :
a = [1, 2]
b = [1, 2]
c = [1]
z = [2,3]
Here's what I have so far:
Map<Integer, List<String>> integerListMap; // initial list is already populated
List<String> distinctStrings = new ArrayList<>();
SortedMap<String, List<Integer>> stringListSortedMap = new TreeMap<>();
for(Integer i: integers) {
integerListMap.put(i, strings);
distinctStrings.addAll(strings);
}
distinctStrings = distinctStrings.stream().distinct().collect(Collectors.toList());
for(String s : distinctStrings) {
distinctStrings.put(s, ???);
}
Iterate over your source map's value and put each value into the target map.
final Map<String, List<Integer>> target = new HashMap<>();
for (final Map.Entry<Integer, List<String>> entry : source.entrySet()) {
for (final String s : entry.getValue()) {
target.computeIfAbsent(s, k -> new ArrayList<>()).add(entry.getKey());
}
}
Or with the Stream API by abusing Map.Entry to build the inverse:
final Map<String, List<Integer>> target = source.entrySet()
.stream()
.flatMap(e -> e.getValue().stream().map(s -> Map.entry(s, e.getKey())))
.collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
Although this might not be as clear as introducing a new custom type to hold the inverted mapping.
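Note that Map.entry() is only available from Java 9 onwards. On Java 8 the same idea can be sketched with AbstractMap.SimpleImmutableEntry, as below. The names InvertMapDemo and invert are made up for illustration, and the TreeMap result is my choice to get sorted keys, not something the approach requires.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class InvertMapDemo {
    // Turns number -> strings into string -> numbers.
    static Map<String, List<Integer>> invert(Map<Integer, List<String>> source) {
        return source.entrySet().stream()
                // One (string, number) pair per string in each value list.
                .flatMap(e -> e.getValue().stream()
                        .map(s -> new SimpleImmutableEntry<>(s, e.getKey())))
                // Group the pairs by string and collect the numbers.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // A TreeMap source gives a predictable iteration order for the demo.
        Map<Integer, List<String>> source = new TreeMap<>();
        source.put(1, Arrays.asList("a", "b", "c"));
        source.put(2, Arrays.asList("a", "b", "z"));
        source.put(3, Arrays.asList("z"));
        // prints {a=[1, 2], b=[1, 2], c=[1], z=[2, 3]}
        System.out.println(invert(source));
    }
}
```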
Another alternative would be using a bidirectional map. Many libraries come with implementations of these, such as Guava.
There's no need to apply distinct() since you're storing the data into the Map and keys are guaranteed to be unique.
You can flatten the entries of the source map, so that only one string (let's call it name) and a single integer (let's call it number) would correspond to a stream element, and then group the data by string.
To solve this problem using streams, we can utilize the flatMap() operation to perform a one-to-many transformation. It's good practice to define a custom type for that purpose, as a Java 16 record or a class (you can also use a Map.Entry, but a custom type is more advantageous because it allows writing self-documenting code).
In order to collect the data into a TreeMap, you can use the three-args version of groupingBy(), which lets you specify a mapFactory.
record NameNumber(String name, Integer number) {}
Map<Integer, List<String>> dataByProvider = Map.of(
1, List.of("a", "b", "c"),
2, List.of("a", "b", "z"),
3, List.of("z")
);
NavigableMap<String, List<Integer>> numbersByName = dataByProvider.entrySet().stream()
.flatMap(entry -> entry.getValue().stream()
.map(name -> new NameNumber(name, entry.getKey()))
)
.collect(Collectors.groupingBy(
NameNumber::name,
TreeMap::new,
Collectors.mapping(NameNumber::number, Collectors.toList())
));
numbersByName.forEach((name, numbers) -> System.out.println(name + " -> " + numbers));
Output:
a -> [2, 1]
b -> [2, 1]
c -> [1]
z -> [3, 2]
Sidenote: while using TreeMap it's more beneficial to use NavigableMap as an abstract type because it gives access to methods like higherKey(), lowerKey(), firstEntry(), lastEntry(), etc., which are declared in NavigableMap rather than in the SortedMap interface.
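A short sketch of what the NavigableMap type offers, using the keys from the example above. The class name NavigableMapDemo and the helper sample() are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class NavigableMapDemo {
    // Builds the inverted map from the example by hand.
    static NavigableMap<String, List<Integer>> sample() {
        NavigableMap<String, List<Integer>> map = new TreeMap<>();
        map.put("a", Arrays.asList(1, 2));
        map.put("b", Arrays.asList(1, 2));
        map.put("c", Arrays.asList(1));
        map.put("z", Arrays.asList(2, 3));
        return map;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<Integer>> map = sample();
        System.out.println(map.higherKey("b"));         // c (least key strictly greater than b)
        System.out.println(map.lowerKey("b"));          // a (greatest key strictly less than b)
        System.out.println(map.firstEntry().getKey());  // a
        System.out.println(map.lastEntry().getKey());   // z
    }
}
```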

Group and Sort nested Objects using Java Streams

I have a list of DTO objects with the nested list field.
The aim is to group them by id field and merge, and then sort the list using Streams API.
class DTO {
private Long id;
private List<ItemDTO> items;
}
class ItemDTO {
private Long priority;
private Long value;
}
// input
List<DTO> dtoList = List.of(
DTO(1, List.of(ItemDTO(1, 1), ItemDTO(7, 2))),
DTO(2, List.of(ItemDTO(1, 1), ItemDTO(2, 2))),
DTO(1, List.of(ItemDTO(10, 3), ItemDTO(1, 4)))
);
I need to group these nested objects with the same id field and merge all items in descending order by field priority.
The final result for this dtoList will be something like this:
// output
List<DTO> resultList = [
DTO(1, List.of(ItemDTO(10,3), ItemDTO(7,2), ItemDTO(1,1), ItemDTO(1,4))),
DTO(2, List.of(ItemDTO(2,2), ItemDTO(1,1)))
];
Can we achieve this with Streams API?
I would start by a simple grouping by to get a map Map<Long,List<DTO>> and stream over the entries of that map and map each to a new DTO. You can extract a method / function to get the ItemDTOs sorted:
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
....
Function<List<DTO>, List<ItemDTO>> func =
list -> list.stream()
.map(DTO::getItems)
.flatMap(List::stream)
.sorted(Comparator.comparing(ItemDTO::getPriority,Comparator.reverseOrder()))
.collect(Collectors.toList());
List<DTO> result =
dtoList.stream()
.collect(Collectors.groupingBy(DTO::getId))
.entrySet().stream()
.map(entry -> new DTO(entry.getKey(), func.apply(entry.getValue())))
//.sorted(Comparator.comparingLong(DTO::getId)) if the resulting list need to be sorted by id
.collect(Collectors.toList());
You can create an intermediate map by grouping the data by id and then transform each entry into a new DTO object.
For that, you can use a combination of built-in collectors groupingBy() and flatMapping() to create an intermediate map.
In order to sort the items mapped to each id, flatMapping() is used in conjunction with collectingAndThen().
public static void main(String[] args) {
// input
List<DTO> dtoList = List.of(
new DTO(1L, List.of(new ItemDTO(1L, 1L), new ItemDTO(7L, 2L))),
new DTO(2L, List.of(new ItemDTO(1L, 1L), new ItemDTO(2L, 2L))),
new DTO(1L, List.of(new ItemDTO(10L, 3L), new ItemDTO(1L, 4L)))
);
List<DTO> result = dtoList.stream()
.collect(Collectors.groupingBy(DTO::getId,
Collectors.collectingAndThen(
Collectors.flatMapping(dto -> dto.getItems().stream(), Collectors.toList()),
(List<ItemDTO> items) -> {
items.sort(Comparator.comparing(ItemDTO::getPriority).reversed());
return items;
})))
.entrySet().stream()
.map(entry -> new DTO(entry.getKey(), entry.getValue()))
.collect(Collectors.toList());
result.forEach(System.out::println);
}
Output
DTO{id = 1, items = [ItemDTO{10, 3}, ItemDTO{7, 2}, ItemDTO{1, 1}, ItemDTO{1, 4}]}
DTO{id = 2, items = [ItemDTO{2, 2}, ItemDTO{1, 1}]}
As @shmosel has pointed out, flatMapping() is one of the boons of Java 9. You may also take it as a reminder that it may be time to move to Java 9 or later, with its module system and other useful features.
The version that is fully compliant with Java 8 will look like this:
List<DTO> result = dtoList.stream()
.collect(Collectors.groupingBy(DTO::getId,
Collectors.collectingAndThen(
Collectors.mapping(DTO::getItems, Collectors.toList()),
(List<List<ItemDTO>> items) ->
items.stream().flatMap(List::stream)
.sorted(Comparator.comparing(ItemDTO::getPriority).reversed())
.collect(Collectors.toList())
)))
.entrySet().stream()
.map(entry -> new DTO(entry.getKey(), entry.getValue()))
.collect(Collectors.toList());
IMO, this is the easiest way. I am presuming you have the appropriate getters defined for your classes.
Simply convert to a map keyed on the id,
merge the appropriate lists,
and return the values, converted to an ArrayList.
List<DTO> results = new ArrayList<>(dtoList.stream().collect(
Collectors.toMap(DTO::getId, dto -> dto, (a, b) -> {
a.getItems().addAll(b.getItems());
return a;
})).values());
Then simply sort them in place based on your requirements. This takes no more time than doing it in the stream construct, but in my opinion it is less cluttered.
for (DTO d: results) {
d.getItems().sort(Comparator.comparing(ItemDTO::getPriority)
.reversed());
}
results.forEach(System.out::println);
prints (using a simple toString for the two classes)
DTO[1, [ItemDTO[10, 3], ItemDTO[7, 2], ItemDTO[1, 1], ItemDTO[1, 4]]]
DTO[2, [ItemDTO[2, 2], ItemDTO[1, 1]]]
Note: List.of returns immutable lists, so you can't modify them. I would use new ArrayList<>(List.of(...)) in your list construct.
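If changing how the input is built is not an option, one possible workaround is to copy each item list inside the value mapper, so that the merge function only ever mutates fresh ArrayList instances and the input DTOs stay untouched. This is a sketch under that assumption; the DTO and ItemDTO classes below are minimal hypothetical versions, and the LinkedHashMap is my addition to keep ids in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical minimal versions of the question's classes.
class ItemDTO {
    private final Long priority;
    private final Long value;
    ItemDTO(Long priority, Long value) { this.priority = priority; this.value = value; }
    Long getPriority() { return priority; }
    Long getValue() { return value; }
}

class DTO {
    private final Long id;
    private final List<ItemDTO> items;
    DTO(Long id, List<ItemDTO> items) { this.id = id; this.items = items; }
    Long getId() { return id; }
    List<ItemDTO> getItems() { return items; }
}

class MergeDemo {
    static List<DTO> mergeById(List<DTO> dtoList) {
        Map<Long, DTO> merged = dtoList.stream()
                .collect(Collectors.toMap(
                        DTO::getId,
                        // Copy into a fresh, mutable list so the input is never modified.
                        dto -> new DTO(dto.getId(), new ArrayList<>(dto.getItems())),
                        (a, b) -> {
                            a.getItems().addAll(b.getItems());
                            return a;
                        },
                        LinkedHashMap::new)); // keeps ids in first-seen order
        List<DTO> results = new ArrayList<>(merged.values());
        // Sort each merged list by priority, highest first (the sort is stable).
        for (DTO d : results) {
            d.getItems().sort(Comparator.comparing(ItemDTO::getPriority).reversed());
        }
        return results;
    }

    public static void main(String[] args) {
        // Arrays.asList lists are fixed-size, so addAll on them would fail too.
        List<DTO> dtoList = Arrays.asList(
                new DTO(1L, Arrays.asList(new ItemDTO(1L, 1L), new ItemDTO(7L, 2L))),
                new DTO(2L, Arrays.asList(new ItemDTO(1L, 1L), new ItemDTO(2L, 2L))),
                new DTO(1L, Arrays.asList(new ItemDTO(10L, 3L), new ItemDTO(1L, 4L))));
        for (DTO d : mergeById(dtoList)) {
            System.out.println(d.getId() + " -> " + d.getItems().size() + " items");
        }
    }
}
```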

obtaining unique number from a list of duplicate integers using java 8 streams

I’m trying to obtain a list of only the duplicated numbers from a list of integers:
final Set<Integer> setOfNmums = new HashSet<>();
Arrays.asList(5,6,7,7,7,6,2,4,2,4).stream()
.peek(integer -> System.out.println("XX -> " + integer))
.filter(n -> !setOfNmums.add(n))
.peek(System.out::println)
.map(String::valueOf)
.sorted()
.collect(Collectors.toList());
The output is 2,4,6,7,7
Expected : 2,4,6,7
I don’t understand how that’s happening.. is this running in parallel? how am I getting two '7'?
The hashset should return false if it exists and that used by the filter?
Yes I can use distinct, but I’m curious to know why would the filter fail.. is it being done in parallel?
Your filter rejects the first occurrence of each element and accepts all subsequent occurrences. Therefore, when an element occurs n times, you’ll add it n-1 times.
Since you want to accept all elements which occur more than once, but only accept them a single time, you could use .filter(n -> !setOfNmums.add(n)) .distinct() or you enhance the set to a map, to be able to accept an element only on its second occurrence.
Map<Integer, Integer> occurrences = new HashMap<>();
List<String> result = Stream.of(5,6,7,7,7,6,2,4,2,4)
.filter(n -> occurrences.merge(n, 1, Integer::sum) == 2)
.map(String::valueOf)
.sorted()
.collect(Collectors.toList());
But generally, using stateful filters with streams is discouraged.
A cleaner solution would be
List<String> result = Stream.of(5,6,7,7,7,6,2,4,2,4)
.collect(Collectors.collectingAndThen(
Collectors.toMap(String::valueOf, x -> true, (a,b) -> false, TreeMap::new),
map -> { map.values().removeIf(b -> b); return new ArrayList<>(map.keySet()); }));
Note that this approach doesn’t count the occurrences but only remembers whether an element is unique or has been seen at least a second time. This works by mapping each element to true with the second argument to the toMap collector, x -> true, and resolving multiple occurrences with a merge function of (a,b) -> false. The subsequent map.values().removeIf(b -> b) will remove all unique elements, i.e. those mapped to true.
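If the actual counts are of interest, a different technique is to group the elements and count them, then keep the keys whose count exceeds one. The sketch below does that; it is an alternative to the toMap approach, not a restatement of it. The names DuplicatesDemo and duplicates are hypothetical, and it returns Integers in numeric order rather than Strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicatesDemo {
    // Returns each value that occurs more than once, exactly once, in ascending order.
    static List<Integer> duplicates(List<Integer> numbers) {
        return numbers.stream()
                // value -> number of occurrences, with keys sorted by the TreeMap
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [2, 4, 6, 7]
        System.out.println(duplicates(Arrays.asList(5, 6, 7, 7, 7, 6, 2, 4, 2, 4)));
    }
}
```

No state lives outside the stream here, so the stateful-filter concern does not apply.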
You can use the .distinct() function in your stream.
Since Holger already explained why your solution didn't work, I'll just provide an alternative.
Why not use Collections.frequency(collection, element) together with distinct()?
The solution would be quite simple (I apologize for the formatting; I just copied it from my IDE and there doesn't seem to be an autoformat feature on SO):
List<Integer> numbers = List.of(5, 6, 7, 7, 7, 6, 2, 4, 2, 4);
List<String> onlyDuplicates = numbers.stream()
.filter(n -> Collections.frequency(numbers, n) > 1)
.distinct()
.sorted()
.map(String::valueOf)
.toList();
This simply keeps all elements that occur more than once and then filters out the duplicates before sorting, converting each element to a string and collecting to a list since that seems to be what you want.
If you need a mutable list, you can use collect(toCollection(ArrayList::new)) instead of toList().

Java 8 stream.collect( ... groupingBy ( ... mapping( ... reducing ))) reducing BinaryOperator-usage

I played around with a solution using groupingBy, mapping and reducing
to the following question: Elegantly create map with object fields as key/value from object stream in Java 8. Summarized the goal was to get a map with age as key and the hobbies of a person as a Set.
One of the solutions I came up with (not nice, but that's not the point) had a strange behaviour.
With the following list as input:
List<Person> personList = Arrays.asList(
new Person(/* name */ "A", /* age */ 23, /* hobbies */ asList("a")),
new Person("BC", 24, asList("b", "c")),
new Person("D", 23, asList("d")),
new Person("E", 23, asList("e"))
);
and the following solution:
Collector<List<String>, ?, Set<String>> listToSetReducer = Collectors.reducing(new HashSet<>(), HashSet::new, (strings, strings2) -> {
strings.addAll(strings2);
return strings;
});
Map<Integer, Set<String>> map = personList.stream()
.collect(Collectors.groupingBy(o -> o.age,
Collectors.mapping(o -> o.hobbies, listToSetReducer)));
System.out.println("map = " + map);
I got:
map = {23=[a, b, c, d, e], 24=[a, b, c, d, e]}
clearly not what I was expecting. I rather expected this:
map = {23=[a, d, e], 24=[b, c]}
Now if I just replace the order of (strings, strings2) of the binary operator (of the reducing collector) to (strings2, strings) I get the expected result. So what did I miss here?
Did I misinterpret the reducing-collector? Or which documentation piece did I miss that makes it obvious that my usage was not working as expected?
Java version is 1.8.0_121 if that matters.
Reduction should never modify the incoming objects. In your case, you are modifying the incoming HashSet that is supposed to be the identity value and returning it, so all groups will have the same HashSet instance as result, containing all values.
What you need is a Mutable Reduction, which can be implemented via Collector.of(…) like it has been already implemented with the prebuilt collectors Collectors.toList(), Collectors.toSet(), etc.
Map<Integer, Set<String>> map = personList.stream()
.collect(Collectors.groupingBy(o -> o.age,
Collector.of(HashSet::new, (s,p) -> s.addAll(p.hobbies), (s1,s2) -> {
s1.addAll(s2);
return s1;
})));
The reason we need a custom collector at all is that Java 8 doesn’t have the flatMapping collector, which Java 9 introduced. With that, the solution looks like:
Map<Integer, Set<String>> map = personList.stream()
.collect(Collectors.groupingBy(o -> o.age,
Collectors.flatMapping(p -> p.hobbies.stream(), Collectors.toSet())));
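For comparison, the original reducing collector can also be made to work on Java 8 by never mutating its arguments: the merge function builds a new set on every call instead of adding to the left operand. The following is a sketch under that assumption. The Person class with getters and the names ReducingDemo and hobbiesByAge are hypothetical, and the immutable Collections.emptySet() identity is my choice so that an accidental mutation fails fast.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical minimal version of the question's class.
class Person {
    private final String name;
    private final int age;
    private final List<String> hobbies;

    Person(String name, int age, List<String> hobbies) {
        this.name = name;
        this.age = age;
        this.hobbies = hobbies;
    }

    int getAge() { return age; }

    List<String> getHobbies() { return hobbies; }
}

class ReducingDemo {
    static Map<Integer, Set<String>> hobbiesByAge(List<Person> people) {
        // The identity is shared by all groups, so it must never be modified.
        Collector<List<String>, ?, Set<String>> union = Collectors.reducing(
                Collections.<String>emptySet(),
                HashSet::new,
                (left, right) -> {
                    Set<String> merged = new HashSet<>(left); // copy, don't mutate
                    merged.addAll(right);
                    return merged;
                });
        return people.stream()
                .collect(Collectors.groupingBy(Person::getAge,
                        Collectors.mapping(Person::getHobbies, union)));
    }

    public static void main(String[] args) {
        List<Person> personList = Arrays.asList(
                new Person("A", 23, Arrays.asList("a")),
                new Person("BC", 24, Arrays.asList("b", "c")),
                new Person("D", 23, Arrays.asList("d")),
                new Person("E", 23, Arrays.asList("e")));
        System.out.println("map = " + hobbiesByAge(personList));
    }
}
```

This copies a set on every merge, so the mutable Collector.of(…) version above is likely the better choice when groups are large.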
