What's the purpose of partitioningBy - java

For example, if I intend to partition some elements, I could do something like:
Stream.of("I", "Love", "Stack Overflow")
.collect(Collectors.partitioningBy(s -> s.length() > 3))
.forEach((k, v) -> System.out.println(k + " => " + v));
which outputs:
false => [I]
true => [Love, Stack Overflow]
But to me, partitioningBy is only a special case of groupingBy. Although the former accepts a Predicate as its parameter and the latter a Function, I see a partition as just a normal grouping function.
So the same code does exactly the same thing:
Stream.of("I", "Love", "Stack Overflow")
.collect(Collectors.groupingBy(s -> s.length() > 3))
.forEach((k, v) -> System.out.println(k + " => " + v));
which also results in a Map<Boolean, List<String>>.
So is there any reason I should use partitioningBy instead of groupingBy? Thanks

partitioningBy will always return a map with two entries, one for where the predicate is true and one for where it is false.
It is possible that both entries will have empty lists, but they will exist.
That's something that groupingBy will not do, since it only creates entries when they are needed.
In the extreme case, if you send an empty stream to partitioningBy you will still get two entries in the map, whereas groupingBy will return an empty map.
EDIT: As mentioned below this behavior is not mentioned in the Java docs, however changing it would take away the added value partitioningBy is currently providing. For Java 9 this is already in the specs.
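The empty-stream case can be checked with a small sketch like the one below. The class and method names (PartitionVsGroupEmpty, partitionEmpty, groupEmpty) are made up for illustration and are not from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitionVsGroupEmpty {
    // Partition an empty stream: both keys are still present, each with an empty list.
    static Map<Boolean, List<String>> partitionEmpty() {
        return Stream.<String>empty()
                .collect(Collectors.partitioningBy(s -> s.length() > 3));
    }

    // Group an empty stream with the same condition: no entries are created.
    static Map<Boolean, List<String>> groupEmpty() {
        return Stream.<String>empty()
                .collect(Collectors.groupingBy(s -> s.length() > 3));
    }

    public static void main(String[] args) {
        System.out.println(partitionEmpty()); // {false=[], true=[]}
        System.out.println(groupEmpty());     // {}
    }
}
```

So map.get(true) and map.get(false) are safe to use without a null check after partitioningBy, while after groupingBy either lookup may return null.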

partitioningBy is slightly more efficient, using a special Map implementation optimized for when the key is just a boolean.
(It might also help to clarify what you mean; partitioningBy helps to effectively get across that there's a boolean condition being used to partition the data.)

partitioningBy method will return a map whose key is always a Boolean value, but in case of groupingBy method, the key can be of any Object type
//groupingBy
Map<Object, List<Person>> list2 = new HashMap<Object, List<Person>>();
list2 = list.stream().collect(Collectors.groupingBy(p->p.getAge()==22));
System.out.println("grouping by age -> " + list2);
//partitioningBy
Map<Boolean, List<Person>> list3 = new HashMap<Boolean, List<Person>>();
list3 = list.stream().collect(Collectors.partitioningBy(p->p.getAge()==22));
System.out.println("partitioning by age -> " + list3);
As you can see, the key for map in case of partitioningBy method is always a Boolean value, but in case of groupingBy method, the key is Object type
Detailed code is as follows:
class Person {
String name;
int age;
Person(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public int getAge() {
return age;
}
public String toString() {
return this.name;
}
}
public class CollectorAndCollectPrac {
public static void main(String[] args) {
Person p1 = new Person("Kosa", 21);
Person p2 = new Person("Saosa", 21);
Person p3 = new Person("Tiuosa", 22);
Person p4 = new Person("Komani", 22);
Person p5 = new Person("Kannin", 25);
Person p6 = new Person("Kannin", 25);
Person p7 = new Person("Tiuosa", 22);
ArrayList<Person> list = new ArrayList<>();
list.add(p1);
list.add(p2);
list.add(p3);
list.add(p4);
list.add(p5);
list.add(p6);
list.add(p7);
// groupingBy
Map<Object, List<Person>> list2 = new HashMap<Object, List<Person>>();
list2 = list.stream().collect(Collectors.groupingBy(p -> p.getAge() == 22));
System.out.println("grouping by age -> " + list2);
// partitioningBy
Map<Boolean, List<Person>> list3 = new HashMap<Boolean, List<Person>>();
list3 = list.stream().collect(Collectors.partitioningBy(p -> p.getAge() == 22));
System.out.println("partitioning by age -> " + list3);
}
}

Another difference between groupingBy and partitioningBy is that the former takes a Function<? super T, ? extends K> and the latter a Predicate<? super T>.
When you pass a method reference or a lambda expression, such as s -> s.length() > 3, they can be used by either of these two methods (the compiler will infer the functional interface type based on the type required by the method you choose).
However, if you have a Predicate<T> instance, you can only pass it to Collectors.partitioningBy(). It won't be accepted by Collectors.groupingBy().
And similarly, if you have a Function<T,Boolean> instance, you can only pass it to Collectors.groupingBy(). It won't be accepted by Collectors.partitioningBy().
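A rough sketch of the point above, including one way to adapt an existing instance with a method reference (predicate::test or function::apply) when you need to pass it to the other collector. The class name and constants are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PredicateVsFunction {
    // The same condition, held as two different functional interface instances.
    static final Predicate<String> IS_LONG = s -> s.length() > 3;
    static final Function<String, Boolean> IS_LONG_FN = s -> s.length() > 3;

    // A Predicate instance is accepted directly by partitioningBy.
    static Map<Boolean, List<String>> partition(Stream<String> words) {
        return words.collect(Collectors.partitioningBy(IS_LONG));
    }

    // A Function instance is accepted directly by groupingBy.
    static Map<Boolean, List<String>> group(Stream<String> words) {
        return words.collect(Collectors.groupingBy(IS_LONG_FN));
    }

    // groupingBy(IS_LONG) would not compile; a method reference adapts the Predicate.
    static Map<Boolean, List<String>> groupWithPredicate(Stream<String> words) {
        return words.collect(Collectors.groupingBy(IS_LONG::test));
    }

    // partitioningBy(IS_LONG_FN) would not compile; a method reference adapts the Function.
    static Map<Boolean, List<String>> partitionWithFunction(Stream<String> words) {
        return words.collect(Collectors.partitioningBy(IS_LONG_FN::apply));
    }

    public static void main(String[] args) {
        System.out.println(partition(Stream.of("I", "Love", "Stack Overflow")));
        System.out.println(groupWithPredicate(Stream.of("I", "Love", "Stack Overflow")));
    }
}
```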

As denoted by the other answers, segregating a collection into two groups is useful in some scenarios. As these two partitions would always exist, it would be easier to utilize it further. In JDK, to segregate all the class files and config files, partitioningBy is used.
private static final String SERVICES_PREFIX = "META-INF/services/";
// scan the names of the entries in the JAR file
Map<Boolean, Set<String>> map = jf.versionedStream()
.filter(e -> !e.isDirectory())
.map(JarEntry::getName)
.filter(e -> (e.endsWith(".class") ^ e.startsWith(SERVICES_PREFIX)))
.collect(Collectors.partitioningBy(e -> e.startsWith(SERVICES_PREFIX),
Collectors.toSet()));
Set<String> classFiles = map.get(Boolean.FALSE);
Set<String> configFiles = map.get(Boolean.TRUE);
Code snippet is from jdk.internal.module.ModulePath#deriveModuleDescriptor

Related

Finding duplicated objects by two properties

Considering that I have a list of Person objects like this :
class Person {
String fullName;
String occupation;
String hobby;
int salary;
}
Using java8 streams, how can I get list of duplicated objects only by fullName and occupation property?
By using Java 8 stream() and Collectors.groupingBy() on fullName and occupation:
List<Person> duplicates = list.stream()
.collect(Collectors.groupingBy(p -> p.getFullName() + "-" + p.getOccupation(), Collectors.toList()))
.values()
.stream()
.filter(i -> i.size() > 1)
.flatMap(j -> j.stream())
.collect(Collectors.toList());
I need to find if they were any duplicates in fullName - occupation pair, which has to be unique
Based on this comment it seems that you don't really care about which Person objects were duplicated, just that there were any.
In that case you can use a stateful anyMatch:
Collection<Person> input = new ArrayList<>();
Set<List<String>> seen = new HashSet<>();
boolean hasDupes = input.stream()
.anyMatch(p -> !seen.add(List.of(p.fullName, p.occupation)));
You can use a List as a 'key' for a set which contains the fullName + occupation combinations that you've already seen. If this combination is seen again you immediately return true, otherwise you finish iterating the elements and return false.
Here is a solution with O(n) complexity: use a Map to group the given list by key (fullName + occupation), then retrieve the groups that contain duplicates.
public static List<Person> getDuplicates(List<Person> persons, Function<Person, String> classifier) {
Map<String, List<Person>> map = persons.stream()
.collect(Collectors.groupingBy(classifier, Collectors.mapping(Function.identity(), Collectors.toList())));
return map.values().stream()
.filter(personList -> personList.size() > 1)
.flatMap(List::stream)
.collect(Collectors.toList());
}
Client code:
List<Person> persons = Collections.emptyList();
List<Person> duplicates = getDuplicates(persons, person -> person.fullName + ':' + person.occupation);
First implement equals and hashCode in your person class and then use.
List<Person> personList = new ArrayList<>();
Set<Person> duplicates=personList.stream().filter(p -> Collections.frequency(personList, p) ==2)
.collect(Collectors.toSet());
If objects are more than 2 then you use Collections.frequency(personList, p) >1 in filter predicate.
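For this to work, equals and hashCode have to compare only the two properties in question. A minimal sketch, assuming a Person with the fields from the question (the constructor and the class name FrequencyDuplicates are added here for illustration):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class FrequencyDuplicates {
    static class Person {
        final String fullName;
        final String occupation;
        final String hobby;

        Person(String fullName, String occupation, String hobby) {
            this.fullName = fullName;
            this.occupation = occupation;
            this.hobby = hobby;
        }

        // Equality deliberately looks at fullName and occupation only.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return Objects.equals(fullName, other.fullName)
                    && Objects.equals(occupation, other.occupation);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fullName, occupation);
        }
    }

    // Keeps every person whose (fullName, occupation) pair occurs more than once.
    // Collections.frequency scans the whole list per element, so this is O(n^2).
    static Set<Person> duplicates(List<Person> people) {
        return people.stream()
                .filter(p -> Collections.frequency(people, p) > 1)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ann", "dev", "chess"),
                new Person("Ann", "dev", "golf"),
                new Person("Bob", "dev", "chess"));
        System.out.println(duplicates(people).size());
    }
}
```

Because the result is a Set and the two Ann entries are equal under this definition, only one of them ends up in the result.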

Converting List<MyObject> to Map<String, List<String>> in Java 8 when we have duplicate elements and custom filter criteria

I have an instances of Student class.
class Student {
String name;
String addr;
String type;
public Student(String name, String addr, String type) {
super();
this.name = name;
this.addr = addr;
this.type = type;
}
@Override
public String toString() {
return "Student [name=" + name + ", addr=" + addr + "]";
}
public String getName() {
return name;
}
public String getAddr() {
return addr;
}
}
And I have code that creates a map, storing the student name as the key and some processed addr values (a List, since there are multiple addr values for the same student) as the value.
public class FilterId {
public static String getNum(String s) {
// should do some complex stuff, just for testing
return s.split(" ")[1];
}
public static void main(String[] args) {
List<Student> list = new ArrayList<Student>();
list.add(new Student("a", "test 1", "type 1"));
list.add(new Student("a", "test 1", "type 2"));
list.add(new Student("b", "test 1", "type 1"));
list.add(new Student("c", "test 1", "type 1"));
list.add(new Student("b", "test 1", "type 1"));
list.add(new Student("a", "test 1", "type 1"));
list.add(new Student("c", "test 3", "type 2"));
list.add(new Student("a", "test 2", "type 1"));
list.add(new Student("b", "test 2", "type 1"));
list.add(new Student("a", "test 3", "type 1"));
Map<String, List<String>> map = new HashMap<>();
// This will create a Map with Student names (distinct) and the test numbers (distinct List of tests numbers) associated with them.
for (Student student : list) {
if (map.containsKey(student.getName())) {
List<String> numList = map.get(student.getName());
String value = getNum(student.getAddr());
if (!numList.contains(value)) {
numList.add(value);
map.put(student.getName(), numList);
}
} else {
map.put(student.getName(), new ArrayList<>(Arrays.asList(getNum(student.getAddr()))));
}
}
System.out.println(map.toString());
}
}
Output would be :
{a=[1, 2, 3], b=[1, 2], c=[1, 3]}
How can I do the same in Java 8 in a more elegant way, maybe using streams?
I found Collectors.toMap in Java 8 but couldn't find a way to do the same with it.
I tried mapping the elements to CSV strings, but that didn't work: I couldn't figure out an easy way to remove the duplicates, and the output is not what I need.
Map<String, String> map2 = new HashMap<>();
map2 = list.stream().collect(Collectors.toMap(Student::getName, Student::getAddr, (a, b) -> a + " , " + b));
System.out.println(map2.toString());
// {a=test 1 , test 1 , test 1 , test 2 , test 3, b=test 1 , test 1 , test 2, c=test 1 , test 3}
With streams, you could use Collectors.groupingBy along with Collectors.mapping:
Map<String, Set<String>> map = list.stream()
.collect(Collectors.groupingBy(
Student::getName,
Collectors.mapping(student -> getNum(student.getAddr()),
Collectors.toSet())));
I've chosen to create a map of sets instead of a map of lists, as it seems that you don't want duplicates in the lists.
If you do need lists instead of sets, it's more efficient to first collect to sets and then convert the sets to lists:
Map<String, List<String>> map = list.stream()
.collect(Collectors.groupingBy(
Student::getName,
Collectors.mapping(s -> getNum(s.getAddr()),
Collectors.collectingAndThen(Collectors.toSet(), ArrayList::new))));
This uses Collectors.collectingAndThen, which first collects and then transforms the result.
Another more compact way, without streams:
Map<String, Set<String>> map = new HashMap<>(); // or LinkedHashMap
list.forEach(s ->
map.computeIfAbsent(s.getName(), k -> new HashSet<>()) // or LinkedHashSet
.add(getNum(s.getAddr())));
This variant uses Iterable.forEach to iterate the list and Map.computeIfAbsent to group transformed addresses by student name.
First of all, the current solution is not really elegant, regardless of any streaming solution.
The pattern of
if (map.containsKey(k)) {
Value value = map.get(k);
...
} else {
map.put(k, new Value());
}
can often be simplified with Map#computeIfAbsent. In your example, this would be
// This will create a Map with Student names (distinct) and the test
// numbers (distinct List of tests numbers) associated with them.
for (Student student : list)
{
List<String> numList = map.computeIfAbsent(
student.getName(), s -> new ArrayList<String>());
String value = getNum(student.getAddr());
if (!numList.contains(value))
{
numList.add(value);
}
}
(This is a Java 8 function, but it is still unrelated to streams).
Next, the data structure that you want to build there does not seem to be the most appropriate one. In general, the pattern of
if (!list.contains(someValue)) {
list.add(someValue);
}
is a strong sign that you should not use a List, but a Set. The set will contain each element only once, and you will avoid the contains calls on the list, which are O(n) and thus may be expensive for larger lists.
Even if you really need a List in the end, it is often more elegant and efficient to first collect the elements in a Set, and afterwards convert this Set into a List in one dedicated step.
So the first part could be solved like this:
// This will create a Map with Student names (distinct) and the test
// numbers (distinct List of tests numbers) associated with them.
Map<String, Collection<String>> map = new HashMap<>();
for (Student student : list)
{
String value = getNum(student.getAddr());
map.computeIfAbsent(student.getName(), s -> new LinkedHashSet<String>())
.add(value);
}
It will create a Map<String, Collection<String>>. This can then be converted into a Map<String, List<String>> :
// Convert the 'Collection' values of the map into 'List' values
Map<String, List<String>> result =
map.entrySet().stream().collect(Collectors.toMap(
Entry::getKey, e -> new ArrayList<String>(e.getValue())));
Or, more generically, using a utility method for this:
private static <K, V> Map<K, List<V>> convertValuesToLists(
Map<K, ? extends Collection<? extends V>> map)
{
return map.entrySet().stream().collect(Collectors.toMap(
Entry::getKey, e -> new ArrayList<V>(e.getValue())));
}
I do not recommend this, but you also could convert the for loop into a stream operation:
Map<String, Set<String>> map =
list.stream().collect(Collectors.groupingBy(
Student::getName, LinkedHashMap::new,
Collectors.mapping(
s -> getNum(s.getAddr()), Collectors.toSet())));
Alternatively, you could do the "grouping by" and the conversion from Set to List in one step:
Map<String, List<String>> result =
list.stream().collect(Collectors.groupingBy(
Student::getName, LinkedHashMap::new,
Collectors.mapping(
s -> getNum(s.getAddr()),
Collectors.collectingAndThen(
Collectors.toSet(), ArrayList<String>::new))));
Or you could introduce an own collector, that does the List#contains call, but all this tends to be far less readable than the other solutions...
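For completeness, such a collector could look roughly like the sketch below. toDistinctList is a hypothetical helper name, and the String[] rows (name, test number) stand in for the Student objects of the question. It keeps the O(n) contains calls, so it is shown for illustration rather than as a recommendation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class DistinctListCollector {
    // Hypothetical helper: collects into a List, skipping values that are already present.
    static <T> Collector<T, ?, List<T>> toDistinctList() {
        return Collector.of(
                ArrayList::new,
                (acc, value) -> {
                    if (!acc.contains(value)) {
                        acc.add(value);
                    }
                },
                // Combiner for parallel streams: merge the right list into the left one.
                (left, right) -> {
                    for (T value : right) {
                        if (!left.contains(value)) {
                            left.add(value);
                        }
                    }
                    return left;
                });
    }

    // Each row is {name, testNumber}; groups distinct test numbers by name.
    static Map<String, List<String>> testsByName(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row[0],
                Collectors.mapping(row -> row[1], toDistinctList())));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"a", "1"},
                new String[] {"a", "2"},
                new String[] {"b", "1"});
        System.out.println(testsByName(rows));
    }
}
```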
I think you are looking for something like below
Map<String,Set<String>> map = list.stream().
collect(Collectors.groupingBy(
Student::getName,
Collectors.mapping(e->getNum(e.getAddr()), Collectors.toSet())
));
System.out.println("Map : "+map);
Here is a version that collects everything in sets, and converts the final result to array lists:
/*
import java.util.*;
import java.util.stream.*;
import static java.util.stream.Collectors.*;
import java.util.function.*;
*/
Map<String, List<String>> map2 = list.stream().collect(groupingBy(
Student::getName, // we will group the students by name
Collector.of(
HashSet::new, // for each student name, we will collect result in a hash set
(arr, student) -> arr.add(getNum(student.getAddr())), // which we fill with processed addresses
(left, right) -> { left.addAll(right); return left; }, // we merge subresults like this
(Function<HashSet<String>, List<String>>) ArrayList::new // finish by converting to List
)
));
System.out.println(map2);
// Output:
// {a=[1, 2, 3], b=[1, 2], c=[1, 3]}
EDIT: made the finisher shorter using Marco13's hint.

Java - How to filter items from a list so that only one per element attribute is present? [duplicate]

In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?
For example I have a list of Person object and I want to remove people with the same name,
persons.stream().distinct();
Will use the default equality check for a Person object, so I need something like,
persons.stream().distinct(p -> p.getName());
Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class is it possible to do this succinctly?
Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> seen.add(keyExtractor.apply(t));
}
Then you can write:
persons.stream().filter(distinctByKey(Person::getName))
Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)
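One thing worth keeping in mind: the returned predicate carries its seen set with it, so a new predicate should be created for each stream. A small sketch (the class name and the firstPerLength helper are illustrative) that uses string length as the key:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {
    // Same helper as above: true only the first time a key is seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Creates a fresh predicate per call, so repeated calls give the same result.
    static List<String> firstPerLength(List<String> words) {
        return words.stream()
                .filter(distinctByKey(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "c", "dd", "eee");
        System.out.println(firstPerLength(words)); // [a, bb, eee]

        // Reusing one predicate instance: the second pass sees every key as a duplicate.
        Predicate<String> shared = distinctByKey(String::length);
        System.out.println(words.stream().filter(shared).count()); // 3
        System.out.println(words.stream().filter(shared).count()); // 0
    }
}
```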
An alternative would be to place the persons in a map using the name as a key:
persons.stream().collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
Note that the Person that is kept, in case of a duplicate name, will be the first encountered.
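If the order of the surviving elements also matters, one option is the four-argument toMap overload with a LinkedHashMap supplier, since the default map makes no ordering promise for values(). A sketch, with a stand-in Person class (name plus an id added here only to tell duplicates apart):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DistinctByNameToMap {
    // Stand-in for the Person class of the question; the id field is an assumption.
    static class Person {
        final String name;
        final int id;

        Person(String name, int id) {
            this.name = name;
            this.id = id;
        }

        String getName() {
            return name;
        }
    }

    // Keeps the first Person per name, in the order the names first appear.
    static List<Person> distinctByName(List<Person> persons) {
        Map<String, Person> byName = persons.stream()
                .collect(Collectors.toMap(
                        Person::getName,
                        p -> p,
                        (first, second) -> first, // on a duplicate name keep the earlier one
                        LinkedHashMap::new));     // insertion-ordered map
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", 1), new Person("Bob", 2), new Person("Ann", 3));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName() + " " + p.id);
        }
    }
}
```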
You can wrap the person objects into another class, that only compares the names of the persons. Afterward, you unwrap the wrapped objects to get a person stream again. The stream operations might look as follows:
persons.stream()
.map(Wrapper::new)
.distinct()
.map(Wrapper::unwrap)
...;
The class Wrapper might look as follows:
class Wrapper {
private final Person person;
public Wrapper(Person person) {
this.person = person;
}
public Person unwrap() {
return person;
}
public boolean equals(Object other) {
if (other instanceof Wrapper) {
return ((Wrapper) other).person.getName().equals(person.getName());
} else {
return false;
}
}
public int hashCode() {
return person.getName().hashCode();
}
}
Another solution, using Set. May not be the ideal solution, but it works
Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());
Or if you can modify the original list, you can use removeIf method
persons.removeIf(p -> !set.add(p.getName()));
There's a simpler approach using a TreeSet with a custom comparator.
persons.stream()
.collect(Collectors.toCollection(
() -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName()))
));
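Be aware that the result of this approach is a TreeSet, so the elements come back sorted by the comparator key rather than in their original order. A generic sketch of the same idea (distinctSortedBy is a made-up helper name), demonstrated with string length as the key:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctViaTreeSet {
    // Collects into a TreeSet ordered by the key; elements with an equal key are dropped.
    static <T, K extends Comparable<K>> List<T> distinctSortedBy(
            Collection<T> items, Function<T, K> key) {
        TreeSet<T> unique = items.stream()
                .collect(Collectors.toCollection(
                        () -> new TreeSet<>(Comparator.comparing(key))));
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "plum", "kiwi", "apple");
        // One word per length, sorted by length: [fig, pear, apple]
        System.out.println(distinctSortedBy(words, String::length));
    }
}
```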
We can also use RxJava (very powerful reactive extension library)
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
You can use groupingBy collector:
persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));
If you want to have another stream you can use this:
persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<Person> persons = ...;
MutableList<Person> distinct =
ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));
If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<Person> persons = ...;
MutableList<Person> distinct =
persons.distinct(HashingStrategies.fromFunction(Person::getName));
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
Similar approach which Saeed Zarinfam used but more Java 8 style:)
persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream()
.map(plans -> plans.stream().findFirst().get())
.collect(toList());
You can use StreamEx library:
StreamEx.of(persons)
.distinct(Person::getName)
.toList()
I recommend using Vavr, if you can. With this library you can do the following:
io.vavr.collection.List.ofAll(persons)
.distinctBy(Person::getName)
.toJavaSet() // or any another Java 8 Collection
Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
final Set<Object> seen = new HashSet<>();
return t -> seen.add(keyExtractor.apply(t));
}
Then call:
persons.stream().filter(distinctByKey(p -> p.getName()));
My approach to this is to group all the objects with same property together, then cut short the groups to size of 1 and then finally collect them as a List.
List<YourPersonClass> listWithDistinctPersons = persons.stream()
//operators to remove duplicates based on person name
.collect(Collectors.groupingBy(p -> p.getName()))
.values()
.stream()
//cut short the groups to size of 1
.flatMap(group -> group.stream().limit(1))
//collect distinct users as list
.collect(Collectors.toList());
Distinct objects list can be found using:
List<Person> distinctPersons = persons.stream()
.collect(Collectors.collectingAndThen(
Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
ArrayList::new));
I made a generic version:
private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
return Collectors.collectingAndThen(
toMap(
keyExtractor,
t -> t,
(t1, t2) -> t1
),
(Map<R, T> map) -> map.values().stream()
);
}
An example:
Stream.of(new Person("Jean"),
new Person("Jean"),
new Person("Paul")
)
.filter(...)
.collect(distinctByKey(Person::getName)) // return a stream of Person with 2 elements, jean and Paul
.map(...)
.collect(toList())
Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:
Seq.seq(persons).distinct(Person::getName).toList();
Under the hood, it does practically the same thing as the accepted answer, though.
Set<YourPropertyType> set = new HashSet<>();
list
.stream()
.filter(it -> set.add(it.getYourProperty()))
.forEach(it -> ...);
While the highest upvoted answer is absolutely best answer wrt Java 8, it is at the same time absolutely worst in terms of performance. If you really want a bad low performant application, then go ahead and use it. Simple requirement of extracting a unique set of Person Names shall be achieved by mere "For-Each" and a "Set".
Things get even worse if list is above size of 10.
Consider you have a collection of 20 Objects, like this:
public static final List<SimpleEvent> testList = Arrays.asList(
new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));
Where your SimpleEvent object looks like this:
public class SimpleEvent {
private String name;
private String type;
public SimpleEvent(String name) {
this.name = name;
this.type = "type_"+name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
}
And to test, you have JMH code like this (please note, I'm using the same distinctByKey Predicate mentioned in the accepted answer):
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = testList
.stream()
.filter(distinctByKey(SimpleEvent::getName))
.map(SimpleEvent::getName)
.collect(Collectors.toSet());
blackhole.consume(uniqueNames);
}
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = new HashSet<>();
for (SimpleEvent event : testList) {
uniqueNames.add(event.getName());
}
blackhole.consume(uniqueNames);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.forks(1)
.mode(Mode.Throughput)
.warmupBatchSize(3)
.warmupIterations(3)
.measurementIterations(3)
.build();
new Runner(opt).run();
}
Then you'll have Benchmark results like this:
Benchmark Mode Samples Score Score error Units
c.s.MyBenchmark.aForEachBasedUniqueSet thrpt 3 2635199.952 1663320.718 ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet thrpt 3 729134.695 895825.697 ops/s
And as you can see, in this run the simple for-each loop has roughly 3 times the throughput of the Java 8 Stream version, relative to a smaller error score.
The higher the throughput, the better the performance.
I would like to improve Stuart Marks's answer. If the key is null, it will throw a NullPointerException. Here I ignore null keys by adding one more check, keyExtractor.apply(t)!=null.
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> keyExtractor.apply(t)!=null && seen.add(keyExtractor.apply(t));
}
This works like a charm:
Grouping the data by unique key to form a map.
Returning the first object from every value of the map (There could be multiple person having same name).
persons.stream()
.collect(groupingBy(Person::getName))
.values()
.stream()
.flatMap(values -> values.stream().limit(1))
.collect(toList());
The easiest way to implement this is to build on the sort feature, as it already accepts an optional Comparator which can be created from an element’s property. Then you have to filter out duplicates, which can be done using a stateful Predicate that relies on the fact that in a sorted stream all equal elements are adjacent:
Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
Person previous;
public boolean test(Person p) {
if(previous!=null && c.compare(previous, p)==0)
return false;
previous=p;
return true;
}
})./* more stream operations here */;
Of course, a stateful Predicate is not thread-safe; if that’s what you need, you can move this logic into a Collector and let the stream take care of thread safety when using your Collector. This depends on what you want to do with the stream of distinct elements, which you didn’t tell us in your question.
There are lot of approaches, this one will also help - Simple, Clean and Clear
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(11, "Ravi"));
employees.add(new Employee(12, "Stalin"));
employees.add(new Employee(23, "Anbu"));
employees.add(new Employee(24, "Yuvaraj"));
employees.add(new Employee(35, "Sena"));
employees.add(new Employee(36, "Antony"));
employees.add(new Employee(47, "Sena"));
employees.add(new Employee(48, "Ravi"));
List<Employee> empList = new ArrayList<>(employees.stream().collect(
Collectors.toMap(Employee::getName, obj -> obj,
(existingValue, newValue) -> existingValue))
.values());
empList.forEach(System.out::println);
// Collectors.toMap(
// Employee::getName, - key (the value by which you want to eliminate duplicate)
// obj -> obj, - value (entire employee object)
// (existingValue, newValue) -> existingValue) - to avoid illegalstateexception: duplicate key
Output (with toString() overridden):
Employee{id=35, name='Sena'}
Employee{id=12, name='Stalin'}
Employee{id=11, name='Ravi'}
Employee{id=24, name='Yuvaraj'}
Employee{id=36, name='Antony'}
Employee{id=23, name='Anbu'}
Here is the example
public class PayRoll {
private int payRollId;
private int id;
private String name;
private String dept;
private int salary;
public PayRoll(int payRollId, int id, String name, String dept, int salary) {
super();
this.payRollId = payRollId;
this.id = id;
this.name = name;
this.dept = dept;
this.salary = salary;
}
}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;
public class Prac {
public static void main(String[] args) {
int salary=70000;
PayRoll payRoll=new PayRoll(1311, 1, "A", "HR", salary);
PayRoll payRoll2=new PayRoll(1411, 2 , "B", "Technical", salary);
PayRoll payRoll3=new PayRoll(1511, 1, "C", "HR", salary);
PayRoll payRoll4=new PayRoll(1611, 1, "D", "Technical", salary);
PayRoll payRoll5=new PayRoll(711, 3,"E", "Technical", salary);
PayRoll payRoll6=new PayRoll(1811, 3, "F", "Technical", salary);
List<PayRoll>list=new ArrayList<PayRoll>();
list.add(payRoll);
list.add(payRoll2);
list.add(payRoll3);
list.add(payRoll4);
list.add(payRoll5);
list.add(payRoll6);
Map<Object, Optional<PayRoll>> k = list.stream().collect(Collectors.groupingBy(p->p.getId()+"|"+p.getDept(),Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));
k.entrySet().forEach(p->
{
if(p.getValue().isPresent())
{
System.out.println(p.getValue().get());
}
});
}
}
Output:
PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]
Late to the party but I sometimes use this one-liner as an equivalent:
((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
The expression is a Predicate<Value> but since the map is inline, it works as a filter. This is of course less readable but sometimes it can be helpful to avoid the method.
Building on @josketres's answer, I created a generic utility method:
You could make this more Java 8-friendly by creating a Collector.
public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
return input.stream()
.collect(toCollection(() -> new TreeSet<>(comparer)));
}
@Test
public void removeDuplicatesWithDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(7), new C(42), new C(42));
Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
assertEquals(2, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 7));
assertTrue(result.stream().anyMatch(c -> c.value == 42));
}
@Test
public void removeDuplicatesWithoutDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(1), new C(2), new C(3));
Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
assertEquals(3, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 1));
assertTrue(result.stream().anyMatch(c -> c.value == 2));
assertTrue(result.stream().anyMatch(c -> c.value == 3));
}
private class C {
public final int value;
private C(int value) {
this.value = value;
}
}
Maybe will be useful for somebody. I had a little bit another requirement. Having list of objects A from 3rd party remove all which have same A.b field for same A.id (multiple A object with same A.id in list). Stream partition answer by Tagir Valeev inspired me to use custom Collector which returns Map<A.id, List<A>>. Simple flatMap will do the rest.
public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
return groupingBy(keyFunction, Collector.of((Supplier<Map<K2, T>>) HashMap::new,
(map, error) -> map.putIfAbsent(distinctFunction.apply(error), error),
(left, right) -> {
left.putAll(right);
return left;
}, map -> new ArrayList<>(map.values()),
Collector.Characteristics.UNORDERED)); }
I had a situation where I was supposed to get distinct elements from a list based on 2 keys.
If you want distinct elements based on two keys, or a composite key, try this:
class Person{
int rollno;
String name;
}
List<Person> personList;
Function<Person, List<Object>> compositeKey = person ->
Arrays.<Object>asList(person.getName(), person.getRollno());
Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));
List<Object> duplicateEntrys = map.entrySet().stream()`enter code here`
.filter(settingMap ->
settingMap.getValue().size() > 1)
.collect(Collectors.toList());
A variation of the top answer that handles null:
public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
}
In my tests:
assertEquals(
asList("a", "bb"),
Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));
assertEquals(
asList(5, null, 2, 3),
Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));
val maps = asList(
hashMapWith(0, 2),
hashMapWith(1, 2),
hashMapWith(2, null),
hashMapWith(3, 1),
hashMapWith(4, null),
hashMapWith(5, 2));
assertEquals(
asList(0, 2, 3),
maps.stream()
.filter(distinctBy(m -> m.get("val")))
.map(m -> m.get("i"))
.collect(toList()));
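The Optional wrapper matters because the key set returned by ConcurrentHashMap.newKeySet() does not accept null elements. Below is a sketch of the same predicate without Lombok's val; the class name NullSafeDistinctBy and the demo method are illustrative only.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullSafeDistinctBy {
    // Wrapping each key in Optional turns a null key into Optional.empty(),
    // which the concurrent key set can store.
    static <T, K> Predicate<T> distinctBy(Function<? super T, K> getKey) {
        Set<Optional<K>> seen = ConcurrentHashMap.newKeySet();
        return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
    }

    // Keeps the first occurrence of each value, including the first null.
    static List<Integer> demo() {
        return Stream.of(5, null, 2, null, 3, 3, 2)
                .filter(distinctBy(x -> x))
                .collect(Collectors.toList());
    }
}
```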
In my case I needed to compare each element against the previous one. I created a stateful Predicate that checks whether the current element differs from the previous element and keeps it only in that case.
public List<Log> fetchLogById(Long id) {
return this.findLogById(id).stream()
.filter(new LogPredicate())
.collect(Collectors.toList());
}
public class LogPredicate implements Predicate<Log> {
private Log previous;
public boolean test(Log current) {
boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);
if (isDifferent) {
previous = current;
}
return isDifferent;
}
private boolean verifyIfDifferentLog(Log current, Log previous) {
return !current.getId().equals(previous.getId());
}
}
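A self-contained sketch of the same idea follows, assuming a sequential, ordered stream; a stateful predicate like this gives unpredictable results on a parallel stream. The generic helper changedFromPrevious and the sample data are illustrative and not part of the answer above.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConsecutiveDistinct {
    // Keeps an element only when its key differs from the key of the element seen just before it.
    static <T, K> Predicate<T> changedFromPrevious(Function<? super T, K> keyOf) {
        // One-element array as a mutable holder; the sentinel equals no real key.
        Object[] previousKey = { new Object() };
        return element -> {
            K key = keyOf.apply(element);
            boolean changed = !Objects.equals(previousKey[0], key);
            previousKey[0] = key;
            return changed;
        };
    }

    // Collapses runs of equal neighbours: 1,1,2,2,1,3,3 becomes 1,2,1,3.
    static List<Integer> demo() {
        return Stream.of(1, 1, 2, 2, 1, 3, 3)
                .filter(changedFromPrevious(x -> x))
                .collect(Collectors.toList());
    }
}
```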
My solution in this listing:
List<HolderEntry> result ....
List<HolderEntry> dto3s = new ArrayList<>(result.stream().collect(toMap(
HolderEntry::getId,
holder -> holder, //or Function.identity() if you want
(holder1, holder2) -> holder1
)).values());
In my situation I wanted to find the distinct values and put them in a List.
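One detail worth noting: the three-argument toMap makes no promise about the map type it returns (in practice a HashMap), so the order of values() need not match the input order. If order matters, the four-argument overload with a LinkedHashMap can be used, as in this sketch. The fields of HolderEntry here are assumptions made only so the example compiles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctByIdDemo {
    // Hypothetical shape of HolderEntry; only getId() is taken from the answer.
    static final class HolderEntry {
        final int id;
        final String label;
        HolderEntry(int id, String label) { this.id = id; this.label = label; }
        Integer getId() { return id; }
        String getLabel() { return label; }
    }

    // Keeps the first entry per id and preserves encounter order.
    static List<HolderEntry> distinctById(List<HolderEntry> entries) {
        Map<Integer, HolderEntry> firstPerId = entries.stream().collect(Collectors.toMap(
                HolderEntry::getId,
                Function.identity(),
                (first, second) -> first,   // on a duplicate id, keep the earlier entry
                LinkedHashMap::new));
        return new ArrayList<>(firstPerId.values());
    }
}
```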

Combine two sets conditionally

I have a Person object which has a name attribute and some other attributes. I have two HashSets of Person objects. Note that name is not a unique attribute: two Persons with the same name can have different heights, so using a HashSet does not guarantee that two Persons with the same name are not in the same set.
I need to add one set to the other so that the result contains no two Persons with the same name. Something like this:
public void combine(HashSet<Person> set1, HashSet<Person> set2){
for (Person item2 : set2) {
boolean exists = false;
for (Person item1 : set1) {
if(item2.name.equals(item1.name)){
exists = true;
}
}
if(!exists){
set1.add(item2);
}
}
}
Is there a cleaner way of doing this in Java 8?
set1.addAll(set2.stream().filter(e -> set1.stream()
.noneMatch(p -> p.getName().equals(e.getName())))
.collect(Collectors.toSet()));
If it makes sense for you to override equals and hashCode you can use something like this:
Set<Person> result = Stream.concat(set1.stream(), set2.stream())
.collect(Collectors.toSet());
Without the Java 8 streams you can easily just do this:
Set<Person> result = new HashSet<>();
result.addAll(set1);
result.addAll(set2);
But remember this solution is only feasible when it makes sense to have equals and hashCode overridden.
You can use a HashMap with the name as the key, which avoids the O(n²) runtime complexity of your method. If you need a HashSet, there is no faster way, even with Java 8 Streams; they just add more overhead.
public Map<String, Person> combine(Set<Person> set1, Set<Person> set2) {
Map<String, Person> persons = new HashMap<>();
set1.forEach(pers -> persons.computeIfAbsent(pers.getName(), key -> pers));
set2.forEach(pers -> persons.computeIfAbsent(pers.getName(), key -> pers));
return persons;
}
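Since the question asks for a set, a small variation on the method above wraps the map values in a HashSet at the end. This is only a sketch: the Person fields are assumed from the question's description, and putIfAbsent is used in place of computeIfAbsent with the same "first one wins" effect.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CombineByName {
    // Hypothetical Person with the attributes mentioned in the question.
    static final class Person {
        final String name;
        final int height;
        Person(String name, int height) { this.name = name; this.height = height; }
        String getName() { return name; }
        int getHeight() { return height; }
    }

    // One pass over each set; persons from set1 win over same-named persons from set2.
    static Set<Person> combine(Set<Person> set1, Set<Person> set2) {
        Map<String, Person> byName = new HashMap<>();
        set1.forEach(p -> byName.putIfAbsent(p.getName(), p));
        set2.forEach(p -> byName.putIfAbsent(p.getName(), p));
        return new HashSet<>(byName.values());
    }
}
```

Note that this also collapses same-named persons that were already together in set1.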
Alternatively, you could create your own collector. Assuming that you're certain that two persons with the same name are in fact the same person:
First you can define a collector:
static Collector<Person, ?, Map<String, Person>> groupByName() {
return Collector.of(
HashMap::new,
(a,b) -> a.putIfAbsent(b.name, b),
(a,b) -> { a.putAll(b); return a;}
);
}
Then you can use it to group persons by name:
Stream.concat(s1.stream(), s2.stream())
.collect(groupByName());
However, this would give you a Map<String, Person> and you just want the whole set of Persons found, right?
So, you could just do:
Set<Person> p = Stream.concat(s1.stream(), s2.stream())
.collect(collectingAndThen(groupByName(), map -> new HashSet<>(map.values())));

Java 8 Stream API - Selecting only values after Collectors.groupingBy(..)

Say I have the following collection of Student objects which consist of Name(String), Age(int) and City(String).
I am trying to use Java's Stream API to achieve the following sql-like behavior:
SELECT MAX(age)
FROM Students
GROUP BY city
Now, I found two different ways to do so:
final List<Integer> variation1 =
students.stream()
.collect(Collectors.groupingBy(Student::getCity, Collectors.maxBy((s1, s2) -> s1.getAge() - s2.getAge())))
.values()
.stream()
.filter(Optional::isPresent)
.map(Optional::get)
.map(Student::getAge)
.collect(Collectors.toList());
And the other one:
final Collection<Integer> variation2 =
students.stream()
.collect(Collectors.groupingBy(Student::getCity,
Collectors.collectingAndThen(Collectors.maxBy((s1, s2) -> s1.getAge() - s2.getAge()),
optional -> optional.get().getAge())))
.values();
In both variations, one has to call .values() ... and filter out the empty groups returned from the collector.
Is there any other way to achieve this required behavior?
These methods remind me of SQL's OVER (PARTITION BY ...) statements.
Thanks
Edit: All the answers below were really interesting, but unfortunately this is not what I was looking for, since what I try to get is just the values. I don't need the keys, just the values.
Do not always stick with groupingBy. Sometimes toMap is the thing you need:
Collection<Integer> result = students.stream()
.collect(Collectors.toMap(Student::getCity, Student::getAge, Integer::max))
.values();
Here you just create a Map whose keys are cities and whose values are ages. When several students share the same city, the merge function is used, which here simply selects the maximal age. It's faster and cleaner.
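For a runnable illustration of the merge function at work, here is a small sketch. The Student class and the sample data are assumptions based on the question's description (name, age, city).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MaxAgePerCity {
    // Hypothetical Student matching the question: name, age, city.
    static final class Student {
        final String name;
        final int age;
        final String city;
        Student(String name, int age, String city) {
            this.name = name; this.age = age; this.city = city;
        }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    // City -> maximal age; Integer::max resolves collisions on the same city.
    static Map<String, Integer> maxAgeByCity(List<Student> students) {
        return students.stream()
                .collect(Collectors.toMap(Student::getCity, Student::getAge, Integer::max));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("Ann", 20, "Paris"),
                new Student("Bob", 25, "Paris"),
                new Student("Cid", 30, "Rome"));
        // Only the values are needed for the SQL-like result.
        System.out.println(maxAgeByCity(students).values());
    }
}
```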
In addition to Tagir’s great answer using toMap instead of groupingBy, here is the short solution if you want to stick to groupingBy:
Collection<Integer> result = students.stream()
.collect(Collectors.groupingBy(Student::getCity,
Collectors.reducing(-1, Student::getAge, Integer::max)))
.values();
Note that this three-arg reducing collector already performs a mapping operation, so we don’t need to nest it with a mapping collector. Furthermore, providing an identity value avoids dealing with Optional. Since ages are always positive, providing -1 is sufficient, and since a group will always have at least one element, the identity value will never show up as a result.
Still, I think Tagir’s toMap based solution is preferable in this scenario.
The groupingBy based solution becomes more interesting when you want to get the actual students having the maximum age, e.g
Collection<Student> result = students.stream().collect(
Collectors.groupingBy(Student::getCity, Collectors.reducing(null, BinaryOperator.maxBy(
Comparator.nullsFirst(Comparator.comparingInt(Student::getAge)))))
).values();
Well, actually, even this can be expressed using the toMap collector:
Collection<Student> result = students.stream().collect(
Collectors.toMap(Student::getCity, Function.identity(),
BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge)))
).values();
You can express almost everything with both collectors, but groupingBy has the advantage on its side when you want to perform a mutable reduction on the values.
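As an illustration of that last point, collecting all student names per city is a mutable reduction that groupingBy expresses directly through a downstream collector. The Student class below is a hypothetical stand-in with the attributes described in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NamesPerCity {
    // Hypothetical Student matching the question: name, age, city.
    static final class Student {
        final String name;
        final int age;
        final String city;
        Student(String name, int age, String city) {
            this.name = name; this.age = age; this.city = city;
        }
        String getName() { return name; }
        String getCity() { return city; }
    }

    // City -> names of all students in that city, accumulated into one list per group.
    static Map<String, List<String>> namesByCity(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::getCity,
                        Collectors.mapping(Student::getName, Collectors.toList())));
    }
}
```

With toMap, the same result would need a merge function that copies or concatenates lists for every collision, whereas the downstream collector appends to a single list per group.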
The second approach calls get() on an Optional; this is usually a bad idea, as you don't know whether the Optional will be empty or not (use the orElse(), orElseGet(), or orElseThrow() methods instead). While you might argue that in this case there will always be a value, since you generate the values from the student list itself, this is something to keep in mind.
Based on that, you might turn the variation 2 into:
final Collection<Integer> variation2 =
students.stream()
.collect(collectingAndThen(groupingBy(Student::getCity,
collectingAndThen(
mapping(Student::getAge, maxBy(naturalOrder())),
Optional::get)),
Map::values));
That really starts to be difficult to read, though, so I would probably use variation 1:
final List<Integer> variation1 =
students.stream()
.collect(groupingBy(Student::getCity,
mapping(Student::getAge, maxBy(naturalOrder()))))
.values()
.stream()
.map(Optional::get)
.collect(toList());
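If the bare Optional::get is a concern, as discussed at the start of this answer, one option is to unwrap with orElseThrow and an explicit message inside the downstream collector. This is a sketch under assumptions: the Student class is a hypothetical stand-in, and the method returns the map so the result is easy to inspect; calling values() on it gives the ages.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxAgeWithoutGet {
    // Hypothetical Student matching the question: name, age, city.
    static final class Student {
        final String name;
        final int age;
        final String city;
        Student(String name, int age, String city) {
            this.name = name; this.age = age; this.city = city;
        }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    // City -> maximal age; an empty group would fail loudly instead of via a bare get().
    static Map<String, Integer> maxAgeByCity(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                Student::getCity,
                Collectors.collectingAndThen(
                        Collectors.mapping(Student::getAge,
                                Collectors.maxBy(Comparator.<Integer>naturalOrder())),
                        (Optional<Integer> max) -> max.orElseThrow(
                                () -> new IllegalStateException("group without students")))));
    }
}
```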
Here is my implementation
public class MaxByTest {
static class Student {
private int age;
private int city;
public Student(int age, int city) {
this.age = age;
this.city = city;
}
public int getCity() {
return city;
}
public int getAge() {
return age;
}
@Override
public String toString() {
return " City : " + city + " Age : " + age;
}
}
static List<Student> students = Arrays.asList(new Student[]{
new Student(10, 1),
new Student(9, 2),
new Student(8, 1),
new Student(6, 1),
new Student(4, 1),
new Student(8, 2),
new Student(9, 2),
new Student(7, 2),
});
public static void main(String[] args) {
final Comparator<Student> comparator = (p1, p2) -> Integer.compare( p1.getAge(), p2.getAge());
final List<Student> oldestPerCity =
students.stream()
.collect(Collectors.groupingBy(Student::getCity,
Collectors.maxBy(comparator))).values().stream().map(Optional::get).collect(Collectors.toList());
System.out.println(oldestPerCity);
}
}
List<BeanClass> list1 = new ArrayList<BeanClass>();
DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
list1.add(new BeanClass(123,abc,99.0,formatter.parse("2018-02-01")));
list1.add(new BeanClass(456,xyz,99.0,formatter.parse("2014-01-01")));
list1.add(new BeanClass(789,pqr,95.0,formatter.parse("2014-01-01")));
list1.add(new BeanClass(1011,def,99.0,formatter.parse("2014-01-01")));
Map<Object, Optional<Double>> byDate = list1.stream()
.collect(Collectors.groupingBy(p -> formatter.format(p.getCurrentDate()),
Collectors.mapping(BeanClass::getAge, Collectors.maxBy(Double::compare))));
