Java 8 Streams: Collapse/abstract streams parts - java

Say I have this Stream:
list.stream()
.map(fn1) // part1
.map(fn2) //
.filter(fn3) //
.flatMap(fn4) // part 2
.map(fn5) //
.filter(fn6) //
.map(fn7) //
.collect(Collectors.toList())
How can I make it look like:
list.stream()
.map(fnPart1)
.map(fnPart2)
.collect(Collectors.toList())
Without manually unwinding the fnX parts and putting them together (for maintenance reasons, I want to keep them untouched, and express the fnPartX with them).

You could express and compose it with functions:
Function<Stream<T1>, Stream<T2>> fnPart1 =
s -> s.map(fn1)
.map(fn2)
.filter(fn3);
Function<Stream<T2>, Stream<T3>> fnPart2 =
s -> s.flatMap(fn4)
.map(fn5)
.filter(fn6)
.map(fn7);
fnPart1.andThen(fnPart2).apply(list.stream()).collect(Collectors.toList());
The input and output types of the functions have to match accordingly.
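As a concrete illustration of the composition above, here is a minimal runnable sketch. The class name `StagePipeline`, the two stage constants and the string operations are hypothetical stand-ins for fn1 to fn7, chosen only so the types line up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StagePipeline {
    // Part 1: trim, upper-case, drop empty strings (stand-ins for fn1, fn2, fn3).
    static final Function<Stream<String>, Stream<String>> PART_1 =
            s -> s.map(String::trim)
                  .map(String::toUpperCase)
                  .filter(x -> !x.isEmpty());

    // Part 2: split into words, then take each word's length (stand-ins for fn4, fn5).
    static final Function<Stream<String>, Stream<Integer>> PART_2 =
            s -> s.flatMap(x -> Arrays.stream(x.split(" ")))
                  .map(String::length);

    static List<Integer> run(List<String> input) {
        // The output type of PART_1 must match the input type of PART_2.
        return PART_1.andThen(PART_2)
                     .apply(input.stream())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" ab ", "  ", "c def"))); // [2, 1, 3]
    }
}
```

Each stage stays a separate, individually testable value, and the original fnX functions are not touched.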
This can be the basis for a more complex composition construct such as:
public class Composer<T>{
private final T element;
private Composer(T element){
this.element = element;
}
public <T2> Composer<T2> andThen(Function<? super T, ? extends T2> f){
return new Composer<>(f.apply(element));
}
public T get(){
return element;
}
public static <T> Composer<T> of(T element){
return new Composer<T>(element);
}
}
This can be used like this:
Composer.of(list.stream())
.andThen(fnPart1)
.andThen(fnPart2)
.get()
.collect(Collectors.toList());

You have to use flatMap not map. I don't know what your types are so I've called them T1, T2, etc.
list.stream()
.flatMap(fnPart1)
.flatMap(fnPart2)
.collect(Collectors.toList())
Stream<T2> fnPart1(T1 t1) {
return Stream.of(t1).map(fn1).map(fn2).filter(fn3);
}
Stream<T3> fnPart2(T2 t2) {
return Stream.of(t2).flatMap(fn4).map(fn5).filter(fn6).map(fn7);
}
Of course you could remove some of the stream operations:
Stream<T2> fnPart1(T1 t1) {
return Stream.of(fn2(fn1(t1))).filter(fn3);
}
Stream<T3> fnPart2(T2 t2) {
return fn4(t2).map(fn5).filter(fn6).map(fn7);
}
Further simplification is possible since fnPart1 and fnPart2 are just dealing with single elements.

Related

How to avoid multiple Streams with Java 8

I am having the below code
trainResponse.getIds().stream()
.filter(id -> id.getType().equalsIgnoreCase("Company"))
.findFirst()
.ifPresent(id -> {
domainResp.setId(id.getId());
});
trainResponse.getIds().stream()
.filter(id -> id.getType().equalsIgnoreCase("Private"))
.findFirst()
.ifPresent(id ->
domainResp.setPrivateId(id.getId())
);
Here I'm iterating/streaming the list of Id objects 2 times.
The only difference between the two streams is in the filter() operation.
How to achieve it in single iteration, and what is the best approach (in terms of time and space complexity) to do this?
You can achieve that with the Stream API in one pass through the given data and without increasing memory consumption (i.e. the result will contain only the ids that have the required attributes).
For that, you can create a custom Collector that takes as its parameters a Collection of attributes to look for and a Function responsible for extracting the attribute from the stream element.
That's how this generic collector could be implemented.
/**
* @param <T> - the type of stream elements
* @param <F> - the type of the key (a field of the stream element)
*/
class CollectByKey<T, F> implements Collector<T, Map<F, T>, Map<F, T>> {
private final Set<F> keys;
private final Function<T, F> keyExtractor;
public CollectByKey(Collection<F> keys, Function<T, F> keyExtractor) {
this.keys = new HashSet<>(keys);
this.keyExtractor = keyExtractor;
}
@Override
public Supplier<Map<F, T>> supplier() {
return HashMap::new;
}
@Override
public BiConsumer<Map<F, T>, T> accumulator() {
return this::tryAdd;
}
private void tryAdd(Map<F, T> map, T item) {
F key = keyExtractor.apply(item);
if (keys.remove(key)) {
map.put(key, item);
}
}
@Override
public BinaryOperator<Map<F, T>> combiner() {
return this::tryCombine;
}
private Map<F, T> tryCombine(Map<F, T> left, Map<F, T> right) {
right.forEach(left::putIfAbsent);
return left;
}
@Override
public Function<Map<F, T>, Map<F, T>> finisher() {
return Function.identity();
}
@Override
public Set<Characteristics> characteristics() {
return Collections.emptySet();
}
}
main() - demo (dummy Id class is not shown)
public class CustomCollectorByGivenAttributes {
public static void main(String[] args) {
List<Id> ids = List.of(new Id(1, "Company"), new Id(2, "Fizz"),
new Id(3, "Private"), new Id(4, "Buzz"));
Map<String, Id> idByType = ids.stream()
.collect(new CollectByKey<>(List.of("Company", "Private"), Id::getType));
idByType.forEach((k, v) -> {
if (k.equalsIgnoreCase("Company")) domainResp.setId(v.getId());
if (k.equalsIgnoreCase("Private")) domainResp.setPrivateId(v.getId());
});
System.out.println(idByType.keySet()); // printing keys - added for demo purposes
}
}
Output
[Company, Private]
Note that once the set of keys becomes empty (i.e. all the required data has been fetched), further stream elements are ignored, but the remaining elements still have to be processed, because collect() does not short-circuit.
IMO, the two streams solution is the most readable. And it may even be the most efficient solution using streams.
IMO, the best way to avoid multiple streams is to use a classical loop. For example:
// There may be bugs ...
boolean seenCompany = false;
boolean seenPrivate = false;
for (Id id: getIds()) {
if (!seenCompany && id.getType().equalsIgnoreCase("Company")) {
domainResp.setId(id.getId());
seenCompany = true;
} else if (!seenPrivate && id.getType().equalsIgnoreCase("Private")) {
domainResp.setPrivateId(id.getId());
seenPrivate = true;
}
if (seenCompany && seenPrivate) {
break;
}
}
It is unclear whether performing one iteration is more efficient than performing two. It will depend on the class returned by getIds() and on the iteration code.
The complicated stuff with two flags is how you replicate the short-circuiting behavior of findFirst() in your two-stream solution. I don't know if it is possible to do that at all using one stream. If it is, it will involve some pretty cunning code.
But as you can see, your original solution with two streams is clearly easier to understand than the above.
The main point of using streams is to make your code simpler. It is not about efficiency. When you try to do complicated things to make the streams more efficient, you are probably defeating the (true) purpose of using streams in the first place.
For your list of ids, you could just use a map, then assign them after retrieving, if present.
Map<String, Integer> seen = new HashMap<>();
for (Id id : ids) {
if (seen.size() == 2) {
break;
}
seen.computeIfAbsent(id.getType().toLowerCase(), v->id.getId());
}
If you want to test it, you can use the following:
record Id(String getType, int getId) {
@Override
public String toString() {
return String.format("[%s,%s]", getType, getId);
}
}
Random r = new Random();
List<Id> ids = r.ints(20, 1, 100)
.mapToObj(id -> new Id(
r.nextBoolean() ? "Company" : "Private", id))
.toList();
Edited to allow only certain types to be checked
If you have more than two types but only want to check on certain ones, you can do it as follows.
The process is the same, except that you have a Set of allowed types.
You simply check that you are processing one of those types by using contains.
Map<String, Integer> seen = new HashMap<>();
Set<String> allowedTypes = Set.of("company", "private");
for (Id id : ids) {
String type = id.getType();
if (allowedTypes.contains(type.toLowerCase())) {
if (seen.size() == allowedTypes.size()) {
break;
}
seen.computeIfAbsent(type,
v -> id.getId());
}
}
Testing is similar, except that additional types need to be included.
Create a list of some types that could be present,
and build a list of ids from them as before.
Notice that the size of the allowed types replaces the value 2, so that more than two types can be checked before exiting the loop.
List<String> possibleTypes =
List.of("Company", "Type1", "Private", "Type2");
Random r = new Random();
List<Id> ids =
r.ints(30, 1, 100)
.mapToObj(id -> new Id(possibleTypes.get(
r.nextInt((possibleTypes.size()))),
id))
.toList();
You can group by type and check the resulting map.
I suppose the type of ids is IdType.
Map<String, List<IdType>> map = trainResponse.getIds()
.stream()
.collect(Collectors.groupingBy(
id -> id.getType().toLowerCase()));
Optional.ofNullable(map.get("company")).ifPresent(ids -> domainResp.setId(ids.get(0).getId()));
Optional.ofNullable(map.get("private")).ifPresent(ids -> domainResp.setPrivateId(ids.get(0).getId()));
I'd recommend a traditional for loop. In addition to being easily scalable, it prevents you from traversing the collection multiple times.
Your code looks like something that will be generalised in the future, hence my generic approach.
Here's some pseudo code (with errors, just for the sake of illustration)
Set<String> matches = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
for (Id id : trainResponse.getIds()) {
if (! matches.add(id.getType())) {
continue;
}
switch (id.getType().toLowerCase()) {
case "company":
domainResp.setId(id.getId());
break;
case "private":
...
}
}
Something along these lines might work. It would go through the whole stream, though, and won't stop at the first occurrence.
But assuming a small stream and only one Id for each type, why not?
Map<String, Consumer<String>> setters = new HashMap<>();
setters.put("Company", domainResp::setId);
setters.put("Private", domainResp::setPrivateId);
trainResponse.getIds().forEach(id -> {
if (setters.containsKey(id.getType())) {
setters.get(id.getType()).accept(id.getId());
}
});
We can use the Collectors.filtering from Java 9 onwards to collect the values based on condition.
For this scenario, I have changed code like below
final Map<String, String> results = trainResponse.getIds()
.stream()
.collect(Collectors.filtering(
id -> id.getType().equals("Company") || id.getType().equals("Private"),
Collectors.toMap(Id::getType, Id::getId, (first, second) -> first)));
The ids are then read from the results map.
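For reference, a self-contained sketch of the Collectors.filtering approach (Java 9 or later). The nested `Id` class, the `firstIdPerType` helper and the sample values are assumptions made up for illustration; the question's real `Id` type is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class FilteringDemo {
    // Hypothetical stand-in for the question's Id class.
    static final class Id {
        private final String type;
        private final String id;
        Id(String type, String id) { this.type = type; this.id = id; }
        String getType() { return type; }
        String getId() { return id; }
    }

    // Keeps the first id seen for each wanted type, in a single pass.
    static Map<String, String> firstIdPerType(List<Id> ids, Set<String> wanted) {
        return ids.stream()
                .collect(Collectors.filtering(
                        id -> wanted.contains(id.getType()),
                        Collectors.toMap(Id::getType, Id::getId, (first, second) -> first)));
    }

    public static void main(String[] args) {
        List<Id> ids = List.of(new Id("Company", "c1"), new Id("Other", "x"),
                new Id("Private", "p1"), new Id("Company", "c2"));
        Map<String, String> results = firstIdPerType(ids, Set.of("Company", "Private"));
        System.out.println(results.get("Company") + " " + results.get("Private")); // c1 p1
    }
}
```

Like the other collector-based answers, this visits every element; it does not stop once both types have been found.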

Sort list by multiple fields(not then compare) in java

Now I have an object:
public class Room{
private long roomId;
private long roomGroupId;
private String roomName;
... getter
... setter
}
I want to sort a list of rooms by 'roomId' (descending in the example below), but rooms whose 'roomGroupId' is greater than zero and equal should be kept next to each other.
Let me give you some example:
input:
[{"roomId":3,"roomGroupId":0},
{"roomId":6,"roomGroupId":0},
{"roomId":1,"roomGroupId":1},
{"roomId":2,"roomGroupId":0},
{"roomId":4,"roomGroupId":1}]
output:
[{"roomId":6,"roomGroupId":0},
{"roomId":4,"roomGroupId":1},
{"roomId":1,"roomGroupId":1},
{"roomId":3,"roomGroupId":0},
{"roomId":2,"roomGroupId":0}]
As shown above, the list is sorted by 'roomId', but 'roomId 4' and 'roomId 1' are next to each other because they have the same roomGroupId.
There is no easy, elegant solution to this (though I may be wrong).
You can do it like this:
TreeMap<Long, List<Room>> roomMap = new TreeMap<>();
rooms.stream()
.collect(Collectors.groupingBy(Room::getRoomGroupId))
.forEach((key, value) -> {
if (key.equals(0L)) {
value.forEach(room -> roomMap.put(room.getRoomId(), Arrays.asList(room)));
} else {
roomMap.put(
Collections.max(value, Comparator.comparing(Room::getRoomId))
.getRoomId(),
value
.stream()
.sorted(Comparator.comparing(Room::getRoomId)
.reversed())
.collect(Collectors.toList())
);
}
});
List<Room> result = roomMap.descendingMap()
.entrySet()
.stream()
.flatMap(entry -> entry.getValue()
.stream())
.collect(Collectors.toList());
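The same idea can also be sketched with a single comparator: every room gets an "anchor" value, which is the highest roomId of its group when roomGroupId is greater than zero and its own roomId otherwise. The class name, the field-only `Room` stand-in and the helper names are hypothetical, and the sketch assumes roomIds are unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RoomSortSketch {
    // Minimal stand-in for the question's Room class (fields only).
    static final class Room {
        final long roomId;
        final long roomGroupId;
        Room(long roomId, long roomGroupId) {
            this.roomId = roomId;
            this.roomGroupId = roomGroupId;
        }
    }

    static List<Room> sortKeepingGroupsTogether(List<Room> rooms) {
        // Highest roomId in each group with roomGroupId > 0.
        Map<Long, Long> maxIdByGroup = new HashMap<>();
        for (Room r : rooms) {
            if (r.roomGroupId > 0) {
                maxIdByGroup.merge(r.roomGroupId, r.roomId, Long::max);
            }
        }
        // Grouped rooms sort by their group's highest roomId, ungrouped rooms by their own.
        Comparator<Room> byAnchor = Comparator.comparingLong(
                (Room r) -> r.roomGroupId > 0 ? maxIdByGroup.get(r.roomGroupId) : r.roomId);
        List<Room> sorted = new ArrayList<>(rooms);
        // Descending by anchor, then descending by roomId inside a group.
        sorted.sort(byAnchor.thenComparingLong(r -> r.roomId).reversed());
        return sorted;
    }

    static List<Long> roomIds(List<Room> rooms) {
        return rooms.stream().map(r -> r.roomId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Room> rooms = Arrays.asList(new Room(3, 0), new Room(6, 0),
                new Room(1, 1), new Room(2, 0), new Room(4, 1));
        System.out.println(roomIds(sortKeepingGroupsTogether(rooms))); // [6, 4, 1, 3, 2]
    }
}
```

With the question's sample input this yields the requested order 6, 4, 1, 3, 2.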
If you're in Java 8, you can use code like this
Collections.sort(roomList, Comparator.comparing(Room::getRoomGroupId)
.thenComparing(Room::getRoomId));
If not, you should use a comparator
class SortRoom implements Comparator<Room>
{
@Override
public int compare(Room a, Room b)
{
// roomGroupId and roomId are primitive longs, so compare them with Long.compare
int byGroup = Long.compare(a.getRoomGroupId(), b.getRoomGroupId());
if (byGroup == 0) {
return Long.compare(a.getRoomId(), b.getRoomId());
}
return byGroup;
}
}
and then use it like this
Collections.sort(roomList, new SortRoom());

Filter values from a list based on priority

I have a list of valid values for a type:
Set<String> validTypes = ImmutableSet.of("TypeA", "TypeB", "TypeC");
From a given list I want to extract the first value which has a valid type. In this scenario I would write something of this sort:
public class A{
private String type;
private String member;
}
List<A> classAList;
classAList.stream()
.filter(a -> validTypes.contains(a.getType()))
.findFirst();
However, I would like to give preference to TypeA, i.e. if classAList has both TypeA and TypeB, I want the object which has TypeA. One approach I have is:
Set<String> preferredValidTypes = ImmutableSet.of("TypeA");
classAList.stream()
.filter(a -> preferredValidTypes.contains(a.getType()))
.findFirst()
.orElseGet(() -> classAList.stream()
.filter(a -> validTypes.contains(a.getType()))
.findFirst()
.orElse(null));
Is there a better approach?
Filter the list by type, order by type, collect to a list, then just get the first element:
List<A> collect = classAList.stream()
.filter(a -> validTypes.contains(a.getType()))
.sorted(Comparator.comparing(A::getType))
.collect(Collectors.toList());
System.out.println(collect.get(0));
You can use a custom comparator like:
Comparator<A> comparator = (o1, o2) -> {
if (preferredValidTypes.contains(o1.getType()) && !preferredValidTypes.contains(o2.getType())) {
return -1; // o1 is preferred, so it sorts first
} else if (!preferredValidTypes.contains(o1.getType()) && preferredValidTypes.contains(o2.getType())) {
return 1; // o2 is preferred, so it sorts first
} else {
return 0;
}
};
to sort the list and then use findFirst on that list with your condition.
I don't like the answers already given which use a Comparator. Sorting is an expensive operation. You can do it with one walk through the list. Once you find a preferred value, you can break out; otherwise you continue to the end to find a valid one.
In this case anyMatch can provide the possibility to break out from the stream processing:
MyVerifier verifier=new MyVerifier(validTypes,preferredValidTypes);
classAList.stream()
.anyMatch(verifier);
System.out.println("Preferred found:"+verifier.preferred);
System.out.println("Valid found:"+verifier.valid);
public static class MyVerifier implements Predicate<A> {
private Set<String> validTypes;
private Set<String> preferredValidTypes;
A preferred=null;
A valid=null;
public MyVerifier(Set<String> validTypes, Set<String> preferredValidTypes) {
super();
this.validTypes = validTypes;
this.preferredValidTypes = preferredValidTypes;
}
@Override
public boolean test(A a) {
if(preferred==null && preferredValidTypes.contains(a.getType())) {
preferred=a;
// we can stop because we found the first preferred
return true;
} else if(valid==null && validTypes.contains(a.getType())) {
valid=a;
}
return false;
}
}
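The single-pass idea described above can also be written as a plain loop, without a stateful predicate. This is only a sketch: the class and method names are hypothetical, and it assumes the list contains no null elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class OnePassPick {
    // Returns the first preferred element, or else the first valid one, in one walk.
    static <T> Optional<T> firstPreferredOrValid(List<T> items,
                                                 Predicate<? super T> preferred,
                                                 Predicate<? super T> valid) {
        T fallback = null;
        for (T item : items) {
            if (preferred.test(item)) {
                return Optional.of(item); // stop at the first preferred element
            }
            if (fallback == null && valid.test(item)) {
                fallback = item; // remember the first valid element, keep looking
            }
        }
        return Optional.ofNullable(fallback);
    }

    public static void main(String[] args) {
        List<String> validTypes = Arrays.asList("TypeA", "TypeB", "TypeC");
        List<String> types = Arrays.asList("TypeB", "TypeZ", "TypeA", "TypeC");
        System.out.println(firstPreferredOrValid(types, "TypeA"::equals, validTypes::contains));
    }
}
```

In the question's setting the predicates would be `a -> preferredValidTypes.contains(a.getType())` and `a -> validTypes.contains(a.getType())`.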
One can, of course, define two lists, one with all valid types, and one with the preferred types.
However, here is another approach. Define one list, or actually, a Map, with the keys being the valid types, and the boolean values being whether the type is preferred.
Map<String, Boolean> validTypes = ImmutableMap.of(
"TypeA", false,
"TypeB", false,
"TypeC", true
);
Using AtomicReference
One option is the following:
AtomicReference<A> ref = new AtomicReference<>();
listOfAs.stream()
.filter(t -> validTypes.containsKey(t.getType()))
.anyMatch(t -> validTypes.get(ref.updateAndGet(u -> t).getType()));
AtomicReference now contains a preferred A if available, or another valid A, or if the stream is empty, then it contains null. This stream operation short-circuits if an A with a preferred type is found.
The drawback of this option is that it creates side-effects, which is discouraged.
Using distinct()
Another suggestion would be the following. It uses the same map structure, using a boolean to indicate which values are preferred. However, it does not create side effects.
Map<Boolean, A> map = listOfAs.stream()
.filter(t -> validTypes.containsKey(t.getType()))
.map(t -> new Carrier<>(validTypes.get(t.getType()), t))
.distinct()
.limit(2)
.collect(Collectors.toMap(Carrier::getKey, Carrier::getValue));
It works as follows.
filter discards any element that is not a valid type.
Then, each element is mapped to a Carrier<Boolean, A> instance. A Carrier is a Map.Entry<K, V> which implements its equals and hashCode methods regarding only the key; the value does not matter. This is necessary for the following step,
distinct(), which discards any duplicate element. This way, only one preferred type and only one valid type is found.
We limit the stream to have 2 elements, one for each boolean. This is because the stream, which is lazy, stops evaluating after both booleans are found.
At last, we collect the Carrier elements into a Map.
The map contains now the following elements:
Boolean.TRUE => A with a preferred type
Boolean.FALSE => A with a valid type
Retrieve the appropriate element using
A a = map.getOrDefault(true, map.get(false)); // null if not found
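The Carrier class itself is not shown above. One possible shape, given here only as a sketch under that description, extends AbstractMap.SimpleImmutableEntry and compares by key alone. The nested `A` class, the `pick` helper and the sample data are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class CarrierSketch {
    // A Map.Entry whose equality depends on the key only, so distinct() keeps one entry per key.
    static final class Carrier<K, V> extends AbstractMap.SimpleImmutableEntry<K, V> {
        Carrier(K key, V value) { super(key, value); }
        @Override public boolean equals(Object o) {
            return o instanceof Carrier && Objects.equals(getKey(), ((Carrier<?, ?>) o).getKey());
        }
        @Override public int hashCode() { return Objects.hashCode(getKey()); }
    }

    // Hypothetical stand-in for the question's class A.
    static final class A {
        private final String type;
        private final String member;
        A(String type, String member) { this.type = type; this.member = member; }
        String getType() { return type; }
        String getMember() { return member; }
    }

    // validTypes maps each valid type to "is it preferred?".
    static A pick(List<A> items, Map<String, Boolean> validTypes) {
        Map<Boolean, A> found = items.stream()
                .filter(a -> validTypes.containsKey(a.getType()))
                .map(a -> new Carrier<>(validTypes.get(a.getType()), a))
                .distinct()   // at most one preferred and one non-preferred element
                .limit(2)     // stop once both kinds have been seen
                .collect(Collectors.toMap(Carrier::getKey, Carrier::getValue));
        return found.getOrDefault(true, found.get(false)); // null if nothing valid
    }

    public static void main(String[] args) {
        Map<String, Boolean> validTypes = new HashMap<>();
        validTypes.put("TypeA", true);
        validTypes.put("TypeB", false);
        List<A> items = Arrays.asList(new A("TypeB", "b1"), new A("TypeX", "x"),
                new A("TypeA", "a1"), new A("TypeA", "a2"));
        System.out.println(pick(items, validTypes).getMember()); // a1
    }
}
```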
Well, you have to take into account that sorting is stable, that is, equal elements will appear in the same order as in the initial source. You need that to correctly get the first element from that List<A> that satisfies your requirement, thus:
String priorityType = "TypeB";
Stream.of(new A("TypeA", "A"),
new A("TypeB", "B"),
new A("TypeC", "C"))
.sorted(Comparator.comparing(A::getType, Comparator.comparing(priorityType::equals)).reversed())
.filter(x -> validTypes.contains(x.getType()))
.findFirst()
.orElseThrow(RuntimeException::new);
In Java 8 you can use streams:
public static Carnet findByCodeIsIn(Collection<Carnet> listCarnet, String codeIsIn) {
return listCarnet.stream().filter(carnet -> codeIsIn.equals(carnet.getCodeIsin())).findFirst().orElse(null);
}
Additionally, in case you have many different objects (not only Carnet) or you want to search by different properties (not only by codeIsin), you could build a utility class to encapsulate this logic:
public final class FindUtils {
public static <T> T findByProperty(Collection<T> col, Predicate<T> filter) {
return col.stream().filter(filter).findFirst().orElse(null);
}
}
public final class CarnetUtils {
public static Carnet findByCodeTitre(Collection<Carnet> listCarnet, String codeTitre) {
return FindUtils.findByProperty(listCarnet, carnet -> codeTitre.equals(carnet.getCodeTitre()));
}
public static Carnet findByNomTitre(Collection<Carnet> listCarnet, String nomTitre) {
return FindUtils.findByProperty(listCarnet, carnet -> nomTitre.equals(carnet.getNomTitre()));
}
public static Carnet findByCodeIsIn(Collection<Carnet> listCarnet, String codeIsin) {
return FindUtils.findByProperty(listCarnet, carnet -> codeIsin.equals(carnet.getCodeIsin()));
}
}
If you have the preferred valid types in another collection, you can use the following code.
Map<String,A> groupByType = classAList
.stream()
/* additional filter to grouping by valid types.*/
//.filter(a->validTypes.contains(a.getType()))
.collect(Collectors.toMap(A::getType, Function.identity(),(v1, v2)->v1));
then use:
A result = preferredValidTypes
.stream()
.map(groupByType::get)
.filter(Objects::nonNull) // skip preferred types that are absent from the list
.findFirst()
.orElseThrow(RuntimeException::new);
or just group by preferred valid types
A result2 = classAList
.stream()
.filter(a -> preferredValidTypes.contains(a.getType()))
.collect(Collectors.toMap(A::getType, Function.identity(), (v1, v2) -> v1))
.entrySet()
.stream()
.findFirst()
.map(Map.Entry::getValue)
.orElseThrow(RuntimeException::new);

Java - How to filter items from a list so that only one per element attribute is present? [duplicate]

In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?
For example I have a list of Person object and I want to remove people with the same name,
persons.stream().distinct();
Will use the default equality check for a Person object, so I need something like,
persons.stream().distinct(p -> p.getName());
Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class is it possible to do this succinctly?
Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> seen.add(keyExtractor.apply(t));
}
Then you can write:
persons.stream().filter(distinctByKey(Person::getName))
Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)
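A short, self-contained usage sketch of distinctByKey follows. The nested `Person` class and the `distinctByName` helper are hypothetical. Note that the predicate carries state, so a fresh one should be created for every stream rather than stored and reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {
    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        // add() returns true only the first time a key is seen.
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Person> distinctByName(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(Person::getName)) // new predicate per call
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", 30), new Person("Bob", 25), new Person("Ann", 41));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName() + " " + p.getAge()); // Ann 30, then Bob 25
        }
    }
}
```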
An alternative would be to place the persons in a map using the name as a key:
persons.stream().collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
Note that the Person that is kept, in case of a duplicate name, will be the first one encountered.
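The three-argument toMap collects into a map whose iteration order is unspecified, so the order of values() may differ from the order of the input. If the encounter order matters, a variant with the four-argument toMap and a LinkedHashMap is one option. This is a sketch with a hypothetical `Person` class and helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FirstPerNameSketch {
    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String name;
        private final int id;
        Person(String name, int id) { this.name = name; this.id = id; }
        String getName() { return name; }
        int getId() { return id; }
    }

    static List<Person> firstPerName(List<Person> persons) {
        Map<String, Person> byName = persons.stream()
                .collect(Collectors.toMap(
                        Person::getName,
                        p -> p,
                        (first, second) -> first, // keep the first person per name
                        LinkedHashMap::new));     // remember insertion order
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Carol", 1), new Person("Ann", 2),
                new Person("Carol", 3), new Person("Bob", 4));
        for (Person p : firstPerName(persons)) {
            System.out.println(p.getName() + " " + p.getId()); // Carol 1, Ann 2, Bob 4
        }
    }
}
```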
You can wrap the person objects into another class, that only compares the names of the persons. Afterward, you unwrap the wrapped objects to get a person stream again. The stream operations might look as follows:
persons.stream()
.map(Wrapper::new)
.distinct()
.map(Wrapper::unwrap)
...;
The class Wrapper might look as follows:
class Wrapper {
private final Person person;
public Wrapper(Person person) {
this.person = person;
}
public Person unwrap() {
return person;
}
public boolean equals(Object other) {
if (other instanceof Wrapper) {
return ((Wrapper) other).person.getName().equals(person.getName());
} else {
return false;
}
}
public int hashCode() {
return person.getName().hashCode();
}
}
Another solution, using Set. May not be the ideal solution, but it works
Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());
Or if you can modify the original list, you can use removeIf method
persons.removeIf(p -> !set.add(p.getName()));
There's a simpler approach using a TreeSet with a custom comparator.
persons.stream()
.collect(Collectors.toCollection(
() -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName()))
));
We can also use RxJava (very powerful reactive extension library)
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
You can use groupingBy collector:
persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));
If you want to have another stream, you can use this:
persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> l.get(0));
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<Person> persons = ...;
MutableList<Person> distinct =
ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));
If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<Person> persons = ...;
MutableList<Person> distinct =
persons.distinct(HashingStrategies.fromFunction(Person::getName));
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
A similar approach to the one Saeed Zarinfam used, but more in Java 8 style:
persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream()
.map(plans -> plans.stream().findFirst().get())
.collect(toList());
You can use StreamEx library:
StreamEx.of(persons)
.distinct(Person::getName)
.toList()
I recommend using Vavr, if you can. With this library you can do the following:
io.vavr.collection.List.ofAll(persons)
.distinctBy(Person::getName)
.toJavaSet() // or any other Java 8 Collection
Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
final Set<Object> seen = new HashSet<>();
return t -> seen.add(keyExtractor.apply(t));
}
Then call:
persons.stream().filter(distinctByKey(p -> p.getName()));
My approach to this is to group all the objects with same property together, then cut short the groups to size of 1 and then finally collect them as a List.
List<YourPersonClass> listWithDistinctPersons = persons.stream()
//operators to remove duplicates based on person name
.collect(Collectors.groupingBy(p -> p.getName()))
.values()
.stream()
//cut short the groups to size of 1
.flatMap(group -> group.stream().limit(1))
//collect distinct users as list
.collect(Collectors.toList());
Distinct objects list can be found using:
List<Person> distinctPersons = persons.stream()
.collect(Collectors.collectingAndThen(
Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
ArrayList::new));
I made a generic version:
private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
return Collectors.collectingAndThen(
toMap(
keyExtractor,
t -> t,
(t1, t2) -> t1
),
(Map<R, T> map) -> map.values().stream()
);
}
An example:
Stream.of(new Person("Jean"),
new Person("Jean"),
new Person("Paul")
)
.filter(...)
.collect(distinctByKey(Person::getName)) // return a stream of Person with 2 elements, jean and Paul
.map(...)
.collect(toList())
Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:
Seq.seq(persons).distinct(Person::getName).toList();
Under the hood, it does practically the same thing as the accepted answer, though.
Set<YourPropertyType> set = new HashSet<>();
list
.stream()
.filter(it -> set.add(it.getYourProperty()))
.forEach(it -> ...);
While the highest-voted answer is the best answer with respect to Java 8 style, it is also the worst in terms of performance. If you really want a poorly performing application, go ahead and use it. The simple requirement of extracting a unique set of person names can be met with a plain for-each loop and a Set.
Things get even worse if the list has more than 10 elements.
Consider you have a collection of 20 Objects, like this:
public static final List<SimpleEvent> testList = Arrays.asList(
new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));
where your SimpleEvent object looks like this:
public class SimpleEvent {
private String name;
private String type;
public SimpleEvent(String name) {
this.name = name;
this.type = "type_"+name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
}
And to test, you have JMH code like this (please note, I'm using the same distinctByKey Predicate mentioned in the accepted answer):
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = testList
.stream()
.filter(distinctByKey(SimpleEvent::getName))
.map(SimpleEvent::getName)
.collect(Collectors.toSet());
blackhole.consume(uniqueNames);
}
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = new HashSet<>();
for (SimpleEvent event : testList) {
uniqueNames.add(event.getName());
}
blackhole.consume(uniqueNames);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.forks(1)
.mode(Mode.Throughput)
.warmupBatchSize(3)
.warmupIterations(3)
.measurementIterations(3)
.build();
new Runner(opt).run();
}
Then you'll have Benchmark results like this:
Benchmark                                Mode   Samples        Score   Score error  Units
c.s.MyBenchmark.aForEachBasedUniqueSet   thrpt        3  2635199.952   1663320.718  ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet    thrpt        3   729134.695    895825.697  ops/s
As you can see, in this run the simple for-each loop has about 3.6 times the throughput of the Java 8 stream version (2635199.952 vs. 729134.695 ops/s), and its error is smaller relative to its score.
The higher the throughput, the better the performance.
I would like to improve Stuart Marks's answer. If the key is null, it will throw a NullPointerException. Here I ignore null keys by adding one more check, keyExtractor.apply(t)!=null.
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> keyExtractor.apply(t)!=null && seen.add(keyExtractor.apply(t));
}
This works like a charm:
Grouping the data by unique key to form a map.
Returning the first object from every value of the map (There could be multiple person having same name).
persons.stream()
.collect(groupingBy(Person::getName))
.values()
.stream()
.flatMap(values -> values.stream().limit(1))
.collect(toList());
The easiest way to implement this is to build on the sort feature, as it already accepts an optional Comparator which can be created from an element's property. Then you have to filter out duplicates, which can be done with a stateful Predicate that relies on the fact that in a sorted stream all equal elements are adjacent:
Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
Person previous;
public boolean test(Person p) {
if(previous!=null && c.compare(previous, p)==0)
return false;
previous=p;
return true;
}
})./* more stream operations here */;
Of course, a stateful Predicate is not thread-safe. If you need thread safety, you can move this logic into a Collector and let the stream take care of it when using your Collector. This depends on what you want to do with the stream of distinct elements, which you didn't tell us in your question.
There are a lot of approaches; this one is simple, clean and clear:
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(11, "Ravi"));
employees.add(new Employee(12, "Stalin"));
employees.add(new Employee(23, "Anbu"));
employees.add(new Employee(24, "Yuvaraj"));
employees.add(new Employee(35, "Sena"));
employees.add(new Employee(36, "Antony"));
employees.add(new Employee(47, "Sena"));
employees.add(new Employee(48, "Ravi"));
List<Employee> empList = new ArrayList<>(employees.stream().collect(
Collectors.toMap(Employee::getName, obj -> obj,
(existingValue, newValue) -> existingValue))
.values());
empList.forEach(System.out::println);
// Collectors.toMap(
// Employee::getName, - key (the value by which you want to eliminate duplicate)
// obj -> obj, - value (entire employee object)
// (existingValue, newValue) -> existingValue) - to avoid illegalstateexception: duplicate key
Output (with toString() overridden):
Employee{id=35, name='Sena'}
Employee{id=12, name='Stalin'}
Employee{id=11, name='Ravi'}
Employee{id=24, name='Yuvaraj'}
Employee{id=36, name='Antony'}
Employee{id=23, name='Anbu'}
Here is the example
public class PayRoll {
private int payRollId;
private int id;
private String name;
private String dept;
private int salary;
public PayRoll(int payRollId, int id, String name, String dept, int salary) {
super();
this.payRollId = payRollId;
this.id = id;
this.name = name;
this.dept = dept;
this.salary = salary;
}
}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;
public class Prac {
public static void main(String[] args) {
int salary=70000;
PayRoll payRoll=new PayRoll(1311, 1, "A", "HR", salary);
PayRoll payRoll2=new PayRoll(1411, 2 , "B", "Technical", salary);
PayRoll payRoll3=new PayRoll(1511, 1, "C", "HR", salary);
PayRoll payRoll4=new PayRoll(1611, 1, "D", "Technical", salary);
PayRoll payRoll5=new PayRoll(711, 3,"E", "Technical", salary);
PayRoll payRoll6=new PayRoll(1811, 3, "F", "Technical", salary);
List<PayRoll>list=new ArrayList<PayRoll>();
list.add(payRoll);
list.add(payRoll2);
list.add(payRoll3);
list.add(payRoll4);
list.add(payRoll5);
list.add(payRoll6);
Map<Object, Optional<PayRoll>> k = list.stream().collect(Collectors.groupingBy(p->p.getId()+"|"+p.getDept(),Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));
k.entrySet().forEach(p->
{
if(p.getValue().isPresent())
{
System.out.println(p.getValue().get());
}
});
}
}
Output:
PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]
Late to the party but I sometimes use this one-liner as an equivalent:
((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
The expression is a Predicate<Value>; because the set is created inline, it can be passed straight to filter. This is of course less readable, but sometimes it can be helpful to avoid writing a separate helper method.
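To make the one-liner concrete, here is a sketch that uses String for the value type and the string length as the key. The class and method names are illustrative only, and the element type of the HashSet is spelled out to keep type inference simple.

```java
import java.util.HashSet;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class InlineDistinct {
    // Keeps the first string of each length.
    static List<String> firstOfEachLength(Stream<String> words) {
        // The HashSet is created once, when the method reference is evaluated,
        // so every call to the predicate shares the same set.
        Predicate<String> unseenLength =
                ((Function<String, Integer>) String::length)
                        .andThen(new HashSet<Integer>()::add)::apply;
        return words.filter(unseenLength).collect(Collectors.toList());
    }
}
```

Like any stateful predicate backed by a plain HashSet, this is only suitable for sequential streams.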
Building on @josketres's answer, I created a generic utility method:
You could make this more Java 8-friendly by creating a Collector.
public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
return input.stream()
.collect(toCollection(() -> new TreeSet<>(comparer)));
}
@Test
public void removeDuplicatesWithDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(7), new C(42), new C(42));
Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
assertEquals(2, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 7));
assertTrue(result.stream().anyMatch(c -> c.value == 42));
}
@Test
public void removeDuplicatesWithoutDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(1), new C(2), new C(3));
Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
assertEquals(3, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 1));
assertTrue(result.stream().anyMatch(c -> c.value == 2));
assertTrue(result.stream().anyMatch(c -> c.value == 3));
}
private class C {
public final int value;
private C(int value) {
this.value = value;
}
}
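The Collector variant mentioned above could look roughly like this. It is only a sketch: toDistinctSet is a made-up name, and it relies on the same TreeSet behaviour, where elements that compare as equal count as duplicates and the first one is kept.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class TreeSetCollectors {
    private TreeSetCollectors() {}

    // Collects into a TreeSet ordered (and de-duplicated) by the given comparator.
    static <T> Collector<T, ?, Set<T>> toDistinctSet(Comparator<? super T> comparator) {
        return Collectors.toCollection(() -> new TreeSet<>(comparator));
    }
}
```

With this in place the caller stays inside the stream pipeline, for example input.stream().collect(toDistinctSet(comparer)).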
Maybe this will be useful for somebody. I had a slightly different requirement: given a list of objects A from a third party, remove all that have the same A.b field for the same A.id (the list contains multiple A objects with the same A.id). The stream partition answer by Tagir Valeev inspired me to use a custom Collector which returns Map<A.id, List<A>>. A simple flatMap will do the rest.
public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
return groupingBy(keyFunction, Collector.of((Supplier<Map<K2, T>>) HashMap::new,
(map, error) -> map.putIfAbsent(distinctFunction.apply(error), error),
(left, right) -> {
left.putAll(right);
return left;
}, map -> new ArrayList<>(map.values()),
Collector.Characteristics.UNORDERED)); }
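The "flatMap will do the rest" step is not shown, so here is a sketch of how the pieces might fit together. The class A below is a stand-in with assumed getters, and the collector is re-implemented in a slightly simplified form so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class GroupingDistinctDemo {
    // Stand-in for the third-party type A (assumed shape).
    static final class A {
        final int id;
        final String b;
        A(int id, String b) { this.id = id; this.b = b; }
        int getId() { return id; }
        String getB() { return b; }
    }

    // Groups by one key and keeps one element per second key inside each group.
    static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(
            Function<? super T, ? extends K> keyFunction,
            Function<? super T, ? extends K2> distinctFunction) {
        Collector<T, Map<K2, T>, List<T>> distinct = Collector.of(
                LinkedHashMap::new,
                (map, t) -> map.putIfAbsent(distinctFunction.apply(t), t),
                (left, right) -> { right.forEach(left::putIfAbsent); return left; },
                map -> new ArrayList<>(map.values()));
        return Collectors.groupingBy(keyFunction, distinct);
    }

    // Flattens the grouped map back into a single list.
    static List<A> distinctPerId(List<A> input) {
        return input.stream()
                .collect(groupingDistinctBy(A::getId, A::getB))
                .values().stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

The order of the groups in the result follows the HashMap created by groupingBy, so it should not be relied on.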
I had a situation where I was supposed to get distinct elements from a list based on two keys.
If you want distinct elements based on two keys, or a composite key, try this:
class Person{
int rollno;
String name;
}
List<Person> personList;
// Assumes Person exposes getName() and getRollno().
Function<Person, List<Object>> compositeKey = person ->
Arrays.<Object>asList(person.getName(), person.getRollno());
Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));
List<Object> duplicateEntrys = map.entrySet().stream()
.filter(settingMap ->
settingMap.getValue().size() > 1)
.collect(Collectors.toList());
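The snippet above collects the groups that contain duplicates. If the goal is the list of distinct persons itself, the same composite key can be fed to Collectors.toMap. A sketch, assuming Person has the getters shown (the class below is a stand-in and the method name is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CompositeKeyDistinct {
    // Stand-in for Person with the getters the composite key needs.
    static final class Person {
        final int rollno;
        final String name;
        Person(int rollno, String name) { this.rollno = rollno; this.name = name; }
        int getRollno() { return rollno; }
        String getName() { return name; }
    }

    // Keeps the first person for each (name, rollno) pair, in encounter order.
    static List<Person> distinctByNameAndRollno(List<Person> people) {
        Function<Person, List<Object>> compositeKey =
                p -> Arrays.<Object>asList(p.getName(), p.getRollno());
        return new ArrayList<>(people.stream()
                .collect(Collectors.toMap(
                        compositeKey,
                        Function.identity(),
                        (first, second) -> first,
                        LinkedHashMap::new))
                .values());
    }
}
```

A List works as a composite key because List.equals and List.hashCode compare element by element.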
A variation of the top answer that handles null:
public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
}
In my tests:
assertEquals(
asList("a", "bb"),
Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));
assertEquals(
asList(5, null, 2, 3),
Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));
val maps = asList(
hashMapWith(0, 2),
hashMapWith(1, 2),
hashMapWith(2, null),
hashMapWith(3, 1),
hashMapWith(4, null),
hashMapWith(5, 2));
assertEquals(
asList(0, 2, 3),
maps.stream()
.filter(distinctBy(m -> m.get("val")))
.map(m -> m.get("i"))
.collect(toList()));
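The version above uses Lombok's val. For projects without Lombok, the same predicate can be written in plain Java; the following is a sketch with an illustrative class name.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

final class NullSafeDistinct {
    private NullSafeDistinct() {}

    static <T, K> Predicate<T> distinctBy(Function<? super T, K> getKey) {
        // ConcurrentHashMap rejects null keys, so each key is wrapped in an Optional;
        // a null key becomes Optional.empty() and is treated like any other key.
        Set<Optional<K>> seen = ConcurrentHashMap.newKeySet();
        return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
    }
}
```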
In my case I needed to know what the previous element was. I created a stateful Predicate that checks whether the current element differs from the previous one and keeps it only in that case.
public List<Log> fetchLogById(Long id) {
return this.findLogById(id).stream()
.filter(new LogPredicate())
.collect(Collectors.toList());
}
public class LogPredicate implements Predicate<Log> {
private Log previous;
public boolean test(Log current) {
boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);
if (isDifferent) {
previous = current;
}
return isDifferent;
}
private boolean verifyIfDifferentLog(Log current, Log previous) {
return !current.getId().equals(previous.getId());
}
}
My solution in this listing:
List<HolderEntry> result ....
List<HolderEntry> dto3s = new ArrayList<>(result.stream().collect(toMap(
HolderEntry::getId,
holder -> holder, //or Function.identity() if you want
(holder1, holder2) -> holder1
)).values());
In my situation I wanted to find the distinct values and put them in a List.

How to efficiently compute the maximum value of a collection after applying some function

Suppose you have a method like this that computes the maximum of a Collection for some ToIntFunction:
static <T> void foo1(Collection<? extends T> collection, ToIntFunction<? super T> function) {
if (collection.isEmpty())
throw new NoSuchElementException();
int max = Integer.MIN_VALUE;
T maxT = null;
for (T t : collection) {
int result = function.applyAsInt(t);
if (result >= max) {
max = result;
maxT = t;
}
}
// do something with maxT
}
With Java 8, this could be translated into
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
T maxT = collection.stream()
.max(Comparator.comparingInt(function))
.get();
// do something with maxT
}
A disadvantage with the new version is that function.applyAsInt is invoked repeatedly for the same value of T. (Specifically if the collection has size n, foo1 invokes applyAsInt n times whereas foo2 invokes it 2n - 2 times).
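The call count can be checked with a small counting wrapper. This is a sketch with illustrative names; the 2n - 2 figure applies to a sequential stream, where max performs n - 1 comparisons and each comparison evaluates the key function twice.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

final class InvocationCount {
    private InvocationCount() {}

    // Returns how often the key function runs while a sequential max() scans the collection.
    static <T> int countCalls(Collection<T> collection, ToIntFunction<? super T> function) {
        AtomicInteger calls = new AtomicInteger();
        ToIntFunction<T> counting = t -> {
            calls.incrementAndGet();
            return function.applyAsInt(t);
        };
        collection.stream().max(Comparator.comparingInt(counting));
        return calls.get();
    }
}
```

For a five-element list this reports eight calls, compared with five for the loop in foo1.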
Disadvantages of the first approach are that the code is less clear and you can't modify it to use parallelism.
Suppose you wanted to do this using parallel streams and only invoke applyAsInt once per element. Can this be written in a simple way?
You can use a custom collector that keeps a running pair of the maximum value and the maximum element:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
class Pair {
int max = Integer.MIN_VALUE;
T maxT = null;
}
T maxT = collection.stream().collect(Collector.of(
Pair::new,
(p, t) -> {
int result = function.applyAsInt(t);
if (result >= p.max) {
p.max = result;
p.maxT = t;
}
},
(p1, p2) -> p2.max > p1.max ? p2 : p1,
p -> p.maxT
));
// do something with maxT
}
One advantage is that, for a sequential stream, this creates a single intermediate Pair object that is used throughout the collecting process. Each time an element is accepted, this holder is updated with the new maximum. The finisher operation just returns the maximum element and discards the maximum value.
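For parallel streams the combiner also has to cope with holders that never received an element. The following sketch is a variation on the collector above, not the answer's exact code: the Holder class, its empty flag and the name maxByInt are assumptions made for the example.

```java
import java.util.NoSuchElementException;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;

final class MaxByIntCollector {
    private MaxByIntCollector() {}

    // Mutable holder for the best element so far; 'empty' marks a holder that saw nothing.
    private static final class Holder<T> {
        boolean empty = true;
        int max;
        T maxT;
    }

    static <T> Collector<T, ?, T> maxByInt(ToIntFunction<? super T> function) {
        return Collector.<T, Holder<T>, T>of(
                Holder::new,
                (h, t) -> {
                    int result = function.applyAsInt(t); // one call per element
                    if (h.empty || result >= h.max) {
                        h.empty = false;
                        h.max = result;
                        h.maxT = t;
                    }
                },
                (h1, h2) -> {
                    if (h2.empty) return h1;
                    if (h1.empty) return h2;
                    return h2.max >= h1.max ? h2 : h1; // ties go to the later element, like the loop
                },
                h -> {
                    if (h.empty) throw new NoSuchElementException();
                    return h.maxT;
                });
    }
}
```

It can be used as collection.parallelStream().collect(maxByInt(function)); an empty stream ends in a NoSuchElementException, mirroring foo1.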
As I stated in the comments, I would suggest introducing an intermediate data structure like this:
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
if (collection.isEmpty()) {
throw new IllegalArgumentException();
}
class Pair {
final T value;
final int result;
public Pair(T value, int result) {
this.value = value;
this.result = result;
}
public T getValue() {
return value;
}
public int getResult() {
return result;
}
}
T maxT = collection.stream().map(t -> new Pair(t, function.applyAsInt(t)))
.max(Comparator.comparingInt(Pair::getResult)).get().getValue();
// do something with maxT
}
Another way would be to use a memoized version of function:
static <T> void foo2(Collection<? extends T> collection,
ToIntFunction<? super T> function, T defaultValue) {
T maxT = collection.parallelStream()
.max(Comparator.comparingInt(ToIntMemoizer.memoize(function)))
.orElse(defaultValue);
// do something with maxT
}
Where ToIntMemoizer.memoize(function) code would be as follows:
public class ToIntMemoizer<T> {
private final Map<T, Integer> cache = new ConcurrentHashMap<>();
private ToIntMemoizer() {
}
private ToIntFunction<T> doMemoize(ToIntFunction<T> function) {
return input -> cache.computeIfAbsent(input, function::applyAsInt);
}
public static <T> ToIntFunction<T> memoize(ToIntFunction<T> function) {
return new ToIntMemoizer<T>().doMemoize(function);
}
}
This uses a ConcurrentHashMap to cache already computed results. If you don't need to support parallelism, a plain HashMap works just as well.
One disadvantage is that the result of the function needs to be boxed/unboxed. On the other hand, as the function is memoized, a result will be computed only once for each repeated element of the collection. Then, if the function is invoked with a repeated input value, the result will be returned from the cache.
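The caching behaviour can be illustrated with a compact variant of the memoizer. This is a sketch rather than the class above: MemoSketch is a made-up name and the cache is captured by the lambda instead of being a field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

final class MemoSketch {
    private MemoSketch() {}

    static <T> ToIntFunction<T> memoize(ToIntFunction<? super T> function) {
        Map<T, Integer> cache = new ConcurrentHashMap<>();
        // computeIfAbsent boxes the int result; the Integer is unboxed again on return.
        // Note that ConcurrentHashMap does not accept null keys.
        return input -> cache.computeIfAbsent(input, function::applyAsInt);
    }
}
```

With the elements "a", "bb", "a", "bb", "ccc" the wrapped function runs three times, once per distinct element, however often the comparator asks for a key.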
If you don't mind using a third-party library, my StreamEx optimizes all these cases in special methods like maxByInt and so on. So you can simply use:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
T maxT = StreamEx.of(collection).parallel()
.maxByInt(function)
.get();
// do something with maxT
}
The implementation uses reduce with a mutable container. This probably abuses the API a little, but it works fine for sequential and parallel streams and, unlike the collect solution, defers the container allocation to the first accumulated element (so no container is allocated if a parallel subtask covers no elements, which occurs quite often when there is a filtering operation upstream).
