Shuffle a list of integers with Java 8 Streams API

I tried to translate the following line of Scala to Java 8 using the Streams API:
// Scala
util.Random.shuffle((1 to 24).toList)
To write the equivalent in Java I created a range of integers:
IntStream.range(1, 25)
I expected to find a toList method in the Streams API, but IntStream only offers this strange method:
collect(Supplier<R> supplier, ObjIntConsumer<R> accumulator, BiConsumer<R, R> combiner)
How can I shuffle a list with Java 8 Streams API?

Here you go:
List<Integer> integers =
    IntStream.range(1, 10)                  // <-- creates a stream of ints
             .boxed()                       // <-- converts them to Integers
             .collect(Collectors.toList()); // <-- collects the values to a list
Collections.shuffle(integers);
System.out.println(integers);
Prints:
[8, 1, 5, 3, 4, 2, 6, 9, 7]

You may find the following toShuffledList() method useful.
private static final Collector<?, ?, ?> SHUFFLER = Collectors.collectingAndThen(
        Collectors.toCollection(ArrayList::new),
        list -> {
            Collections.shuffle(list);
            return list;
        }
);

@SuppressWarnings("unchecked")
public static <T> Collector<T, ?, List<T>> toShuffledList() {
    return (Collector<T, ?, List<T>>) SHUFFLER;
}
This enables the following kind of one-liner:
IntStream.rangeClosed('A', 'Z')
.mapToObj(a -> (char) a)
.collect(toShuffledList())
.forEach(System.out::print);
Example output:
AVBFYXIMUDENOTHCRJKWGQZSPL

You can use a custom comparator that "sorts" the values by a random value:
public final class RandomComparator<T> implements Comparator<T> {

    private final Map<T, Integer> map = new IdentityHashMap<>();
    private final Random random;

    public RandomComparator() {
        this(new Random());
    }

    public RandomComparator(Random random) {
        this.random = random;
    }

    @Override
    public int compare(T t1, T t2) {
        return Integer.compare(valueFor(t1), valueFor(t2));
    }

    private int valueFor(T t) {
        synchronized (map) {
            return map.computeIfAbsent(t, ignore -> random.nextInt());
        }
    }
}
Each object in the stream is (lazily) associated with a random integer value, on which we sort. The synchronization on the map is there to deal with parallel streams.
You can then use it like this:
IntStream.rangeClosed(0, 24).boxed()
.sorted(new RandomComparator<>())
.collect(Collectors.toList());
The advantage of this solution is that it integrates within the stream pipeline.
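For instance (a usage sketch of the above, not from the original answer), further stream operations such as limit can be chained directly after the randomized sort:
List<Integer> sample = IntStream.rangeClosed(0, 24).boxed()
        .sorted(new RandomComparator<>())
        .limit(5)                        // keep only 5 of the randomly ordered values
        .collect(Collectors.toList());
Note that sorted is still a full-barrier operation, so all elements are buffered before limit applies.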

If you want to process the whole Stream without too much hassle, you can simply create your own Collector using Collectors.collectingAndThen():
public static <T> Collector<T, ?, Stream<T>> toEagerShuffledStream() {
    return Collectors.collectingAndThen(
            toList(),
            list -> {
                Collections.shuffle(list);
                return list.stream();
            });
}
But this won't perform well if you want to limit() the resulting Stream. In order to overcome this, one could create a custom Spliterator:
package com.pivovarit.stream;
import java.util.List;
import java.util.Objects;
import java.util.Random;
import java.util.RandomAccess;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
class ImprovedRandomSpliterator<T, LIST extends RandomAccess & List<T>> implements Spliterator<T> {

    private final Random random;
    private final List<T> source;
    private int size;

    ImprovedRandomSpliterator(LIST source, Supplier<? extends Random> random) {
        Objects.requireNonNull(source, "source can't be null");
        Objects.requireNonNull(random, "random can't be null");
        this.source = source;
        this.random = random.get();
        this.size = this.source.size();
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (size > 0) {
            int nextIdx = random.nextInt(size);
            int lastIdx = --size;
            T last = source.get(lastIdx);
            T elem = source.set(nextIdx, last);
            action.accept(elem);
            return true;
        } else {
            return false;
        }
    }

    @Override
    public Spliterator<T> trySplit() {
        return null;
    }

    @Override
    public long estimateSize() {
        return source.size();
    }

    @Override
    public int characteristics() {
        return SIZED;
    }
}
and then:
public final class RandomCollectors {

    private RandomCollectors() {
    }

    public static <T> Collector<T, ?, Stream<T>> toImprovedLazyShuffledStream() {
        return Collectors.collectingAndThen(
                toCollection(ArrayList::new),
                list -> !list.isEmpty()
                        ? StreamSupport.stream(new ImprovedRandomSpliterator<>(list, Random::new), false)
                        : Stream.empty());
    }

    public static <T> Collector<T, ?, Stream<T>> toEagerShuffledStream() {
        return Collectors.collectingAndThen(
                toCollection(ArrayList::new),
                list -> {
                    Collections.shuffle(list);
                    return list.stream();
                });
    }
}
I explained the performance considerations here: https://4comprehension.com/implementing-a-randomized-stream-spliterator-in-java/

To perform a shuffle efficiently you need all the values in advance. You can use Collections.shuffle() after you have converted the stream to a list, as you do in Scala.
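A minimal sketch of that approach (essentially the same as the accepted answer above):
List<Integer> values = IntStream.rangeClosed(1, 24)
        .boxed()
        .collect(Collectors.toList()); // gather all the values first
Collections.shuffle(values);           // then shuffle the list in place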

If you're looking for a "streaming only" solution, and a deterministic, merely "haphazard" ordering (as opposed to a properly random one) is good enough, you can always sort your ints by a hash value:
List<Integer> xs = IntStream.range(0, 10)
        .boxed()
        .sorted((a, b) -> a.hashCode() - b.hashCode())
        .collect(Collectors.toList());
If you'd rather have an int[] than a List<Integer>, you can just unbox them afterwards. Unfortunately, you have to go through the boxing step to apply a custom Comparator, so there's no eliminating that part of the process.
int[] ys = IntStream.range(0, 10)
        .boxed()
        .sorted((a, b) -> a.hashCode() - b.hashCode())
        .mapToInt(Integer::intValue)
        .toArray();

List<Integer> randomShuffledRange(int startInclusive, int endExclusive) {
    return new Random().ints(startInclusive, endExclusive)
            .distinct()
            .limit(endExclusive - startInclusive)
            .boxed()
            .collect(Collectors.toList());
}
var shuffled = randomShuffledRange(1, 10);
System.out.println(shuffled);
Example output:
[4, 6, 8, 9, 1, 7, 3, 5, 2]

This is my one-line solution; I am picking one random colour:
colourRepository.findAll().stream().sorted((o1,o2)-> RandomUtils.nextInt(-1,1)).findFirst().get()

Related

Java - How to filter items from a list so that only one per element attribute is present? [duplicate]

In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?
For example, I have a list of Person objects and I want to remove people with the same name:
persons.stream().distinct();
This will use the default equality check for a Person object, so I need something like:
persons.stream().distinct(p -> p.getName());
Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class, is it possible to do this succinctly?
Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}
Then you can write:
persons.stream().filter(distinctByKey(Person::getName))
Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)
An alternative would be to place the persons in a map using the name as a key:
persons.collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
Note that, in case of a duplicate name, the Person that is kept will be the first one encountered.
You can wrap the person objects into another class, that only compares the names of the persons. Afterward, you unwrap the wrapped objects to get a person stream again. The stream operations might look as follows:
persons.stream()
.map(Wrapper::new)
.distinct()
.map(Wrapper::unwrap)
...;
The class Wrapper might look as follows:
class Wrapper {

    private final Person person;

    public Wrapper(Person person) {
        this.person = person;
    }

    public Person unwrap() {
        return person;
    }

    @Override
    public boolean equals(Object other) {
        if (other instanceof Wrapper) {
            return ((Wrapper) other).person.getName().equals(person.getName());
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return person.getName().hashCode();
    }
}
Another solution, using a Set. It may not be the ideal solution, but it works:
Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());
Or if you can modify the original list, you can use removeIf method
persons.removeIf(p -> !set.add(p.getName()));
There's a simpler approach using a TreeSet with a custom comparator.
persons.stream()
.collect(Collectors.toCollection(
() -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName()))
));
We can also use RxJava (a very powerful reactive extensions library):
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
You can use groupingBy collector:
persons.collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));
If you want to have another stream you can use this:
persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<Person> persons = ...;
MutableList<Person> distinct =
ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));
If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<Person> persons = ...;
MutableList<Person> distinct =
persons.distinct(HashingStrategies.fromFunction(Person::getName));
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashCode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
A similar approach to the one Saeed Zarinfam used, but more Java 8 style :)
persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream()
.map(plans -> plans.stream().findFirst().get())
.collect(toList());
You can use StreamEx library:
StreamEx.of(persons)
.distinct(Person::getName)
.toList()
I recommend using Vavr, if you can. With this library you can do the following:
io.vavr.collection.List.ofAll(persons)
.distinctBy(Person::getName)
.toJavaSet() // or any another Java 8 Collection
Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
final Set<Object> seen = new HashSet<>();
return t -> seen.add(keyExtractor.apply(t));
}
Then call:
persons.stream().filter(distinctByKey(p -> p.getName()));
My approach is to group all the objects with the same property together, then cut the groups short to a size of 1, and finally collect them as a List.
List<YourPersonClass> listWithDistinctPersons = persons.stream()
//operators to remove duplicates based on person name
.collect(Collectors.groupingBy(p -> p.getName()))
.values()
.stream()
//cut short the groups to size of 1
.flatMap(group -> group.stream().limit(1))
//collect distinct users as list
.collect(Collectors.toList());
A distinct list of objects can be found using:
List<Person> distinctPersons = persons.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
                ArrayList::new));
I made a generic version:
private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
    return Collectors.collectingAndThen(
            toMap(
                    keyExtractor,
                    t -> t,
                    (t1, t2) -> t1
            ),
            (Map<R, T> map) -> map.values().stream()
    );
}
An example:
Stream.of(new Person("Jean"),
          new Person("Jean"),
          new Person("Paul"))
      .filter(...)
      .collect(distinctByKey(Person::getName)) // returns a stream of Person with 2 elements, Jean and Paul
      .map(...)
      .collect(toList())
Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:
Seq.seq(persons).distinct(Person::getName).toList();
Under the hood, it does practically the same thing as the accepted answer, though.
Set<YourPropertyType> set = new HashSet<>();
list
.stream()
.filter(it -> set.add(it.getYourProperty()))
.forEach(it -> ...);
While the highest-upvoted answer is absolutely the best answer for Java 8, it is at the same time absolutely the worst in terms of performance. If you really want a badly performing application, go ahead and use it. The simple requirement of extracting a unique set of person names can be achieved by a mere for-each loop and a Set.
Things get even worse if the list is above a size of 10.
Consider you have a collection of 20 Objects, like this:
public static final List<SimpleEvent> testList = Arrays.asList(
new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));
Where your SimpleEvent object looks like this:
public class SimpleEvent {

    private String name;
    private String type;

    public SimpleEvent(String name) {
        this.name = name;
        this.type = "type_" + name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }
}
And to test, you have JMH code like this (note that I'm using the same distinctByKey predicate mentioned in the accepted answer):
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception {
    Set<String> uniqueNames = testList
            .stream()
            .filter(distinctByKey(SimpleEvent::getName))
            .map(SimpleEvent::getName)
            .collect(Collectors.toSet());
    blackhole.consume(uniqueNames);
}

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception {
    Set<String> uniqueNames = new HashSet<>();
    for (SimpleEvent event : testList) {
        uniqueNames.add(event.getName());
    }
    blackhole.consume(uniqueNames);
}

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(MyBenchmark.class.getSimpleName())
            .forks(1)
            .mode(Mode.Throughput)
            .warmupBatchSize(3)
            .warmupIterations(3)
            .measurementIterations(3)
            .build();
    new Runner(opt).run();
}
Then you'll have Benchmark results like this:
Benchmark                                Mode  Samples        Score  Score error  Units
c.s.MyBenchmark.aForEachBasedUniqueSet  thrpt        3  2635199.952  1663320.718  ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet   thrpt        3   729134.695   895825.697  ops/s
And as you can see, the simple for-each has three times the throughput of the Java 8 stream version, with a lower error score as well. The higher the throughput, the better the performance.
I would like to improve Stuart Marks's answer. What if the key is null? It will throw a NullPointerException. Here I ignore null keys by adding one more check, keyExtractor.apply(t) != null:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> keyExtractor.apply(t) != null && seen.add(keyExtractor.apply(t));
}
This works like a charm: group the data by the unique key to form a map, then return the first object from every value of the map (there could be multiple people having the same name).
persons.stream()
       .collect(groupingBy(Person::getName))
       .values()
       .stream()
       .flatMap(values -> values.stream().limit(1))
       .collect(toList());
The easiest way to implement this is to jump on the sort feature, as it already provides an optional Comparator which can be created using an element's property. Then you have to filter out duplicates, which can be done using a stateful Predicate that uses the fact that, for a sorted stream, all equal elements are adjacent:
Comparator<Person> c = Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
    Person previous;
    public boolean test(Person p) {
        if (previous != null && c.compare(previous, p) == 0)
            return false;
        previous = p;
        return true;
    }
})./* more stream operations here */;
Of course, a stateful Predicate is not thread-safe; however, if that's what you need, you can move this logic into a Collector and let the stream take care of thread-safety when using your Collector. This depends on what you want to do with the stream of distinct elements, which you didn't tell us in your question.
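As a rough illustration of that idea (my own sketch, not the answer's code), the sort-then-deduplicate logic can be packaged into a Collector, letting the stream framework worry about thread-safety:
static <T> Collector<T, ?, List<T>> distinctSorted(Comparator<? super T> c) {
    return Collectors.collectingAndThen(Collectors.toCollection(ArrayList::new), list -> {
        list.sort(c);                       // equal elements become adjacent
        List<T> result = new ArrayList<>();
        for (T t : list) {
            // keep only the first element of each run of equal elements
            if (result.isEmpty() || c.compare(result.get(result.size() - 1), t) != 0) {
                result.add(t);
            }
        }
        return result;
    });
}
Usage would then be persons.stream().collect(distinctSorted(Comparator.comparing(Person::getName))).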
There are lots of approaches; this one will also help. Simple, clean, and clear:
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(11, "Ravi"));
employees.add(new Employee(12, "Stalin"));
employees.add(new Employee(23, "Anbu"));
employees.add(new Employee(24, "Yuvaraj"));
employees.add(new Employee(35, "Sena"));
employees.add(new Employee(36, "Antony"));
employees.add(new Employee(47, "Sena"));
employees.add(new Employee(48, "Ravi"));
List<Employee> empList = new ArrayList<>(employees.stream().collect(
Collectors.toMap(Employee::getName, obj -> obj,
(existingValue, newValue) -> existingValue))
.values());
empList.forEach(System.out::println);
// Collectors.toMap(
//     Employee::getName,  - key (the value by which you want to eliminate duplicates)
//     obj -> obj,         - value (the entire Employee object)
//     (existingValue, newValue) -> existingValue) - to avoid IllegalStateException: duplicate key
Output (toString() overridden):
Employee{id=35, name='Sena'}
Employee{id=12, name='Stalin'}
Employee{id=11, name='Ravi'}
Employee{id=24, name='Yuvaraj'}
Employee{id=36, name='Antony'}
Employee{id=23, name='Anbu'}
Here is an example:
public class PayRoll {

    private int payRollId;
    private int id;
    private String name;
    private String dept;
    private int salary;

    public PayRoll(int payRollId, int id, String name, String dept, int salary) {
        super();
        this.payRollId = payRollId;
        this.id = id;
        this.name = name;
        this.dept = dept;
        this.salary = salary;
    }
}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;
public class Prac {

    public static void main(String[] args) {
        int salary = 70000;
        PayRoll payRoll  = new PayRoll(1311, 1, "A", "HR", salary);
        PayRoll payRoll2 = new PayRoll(1411, 2, "B", "Technical", salary);
        PayRoll payRoll3 = new PayRoll(1511, 1, "C", "HR", salary);
        PayRoll payRoll4 = new PayRoll(1611, 1, "D", "Technical", salary);
        PayRoll payRoll5 = new PayRoll(711,  3, "E", "Technical", salary);
        PayRoll payRoll6 = new PayRoll(1811, 3, "F", "Technical", salary);

        List<PayRoll> list = new ArrayList<PayRoll>();
        list.add(payRoll);
        list.add(payRoll2);
        list.add(payRoll3);
        list.add(payRoll4);
        list.add(payRoll5);
        list.add(payRoll6);

        Map<Object, Optional<PayRoll>> k = list.stream()
                .collect(Collectors.groupingBy(p -> p.getId() + "|" + p.getDept(),
                        Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));

        k.entrySet().forEach(p -> {
            if (p.getValue().isPresent()) {
                System.out.println(p.getValue().get());
            }
        });
    }
}
Output:
PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]
Late to the party but I sometimes use this one-liner as an equivalent:
((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
The expression is a Predicate<Value>, but since the map is inlined it works as a filter. This is of course less readable, but sometimes it can be helpful to avoid a separate method.
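A concrete usage sketch (with String keyed by its length standing in for Value and Key):
List<String> firstOfEachLength = Stream.of("a", "bb", "cc", "d")
        .filter(((Function<String, Integer>) String::length)
                .andThen(new HashSet<>()::add)::apply)
        .collect(Collectors.toList()); // [a, bb]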
Building on @josketres's answer, I created a generic utility method. You could make this more Java 8-friendly by creating a Collector.
public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
    return input.stream()
            .collect(toCollection(() -> new TreeSet<>(comparer)));
}

@Test
public void removeDuplicatesWithDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(7), new C(42), new C(42));
    Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
    assertEquals(2, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 7));
    assertTrue(result.stream().anyMatch(c -> c.value == 42));
}

@Test
public void removeDuplicatesWithoutDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(1), new C(2), new C(3));
    Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
    assertEquals(3, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 1));
    assertTrue(result.stream().anyMatch(c -> c.value == 2));
    assertTrue(result.stream().anyMatch(c -> c.value == 3));
}

private class C {
    public final int value;

    private C(int value) {
        this.value = value;
    }
}
Maybe this will be useful for somebody. I had a slightly different requirement: given a list of objects A from a third party, remove all objects that have the same A.b field for the same A.id (multiple A objects with the same A.id in the list). The stream partition answer by Tagir Valeev inspired me to use a custom Collector that returns Map<A.id, List<A>>. A simple flatMap will do the rest.
public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(Function<T, K> keyFunction,
                                                                             Function<T, K2> distinctFunction) {
    return groupingBy(keyFunction, Collector.of((Supplier<Map<K2, T>>) HashMap::new,
            (map, item) -> map.putIfAbsent(distinctFunction.apply(item), item),
            (left, right) -> {
                left.putAll(right);
                return left;
            },
            map -> new ArrayList<>(map.values()),
            Collector.Characteristics.UNORDERED));
}
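A hypothetical usage, with A, getId, and getB standing in for the third-party class described above:
Map<Long, List<A>> byId = list.stream()
        .collect(groupingDistinctBy(A::getId, A::getB)); // one A per distinct b, grouped by id
List<A> deduplicated = byId.values().stream()
        .flatMap(List::stream)
        .collect(Collectors.toList());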
I had a situation where I was supposed to get distinct elements from a list based on 2 keys.
If you want distinct based on two keys, or a composite key, try this:
class Person {
    int rollno;
    String name;
}

List<Person> personList;

Function<Person, List<Object>> compositeKey = person ->
        Arrays.<Object>asList(person.getName(), person.getRollno());

Map<Object, List<Person>> map = personList.stream()
        .collect(Collectors.groupingBy(compositeKey, Collectors.toList()));

List<Object> duplicateEntries = map.entrySet().stream()
        .filter(settingMap -> settingMap.getValue().size() > 1)
        .collect(Collectors.toList());
A variation of the top answer that handles null:
public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
}
In my tests:
assertEquals(
asList("a", "bb"),
Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));
assertEquals(
asList(5, null, 2, 3),
Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));
val maps = asList(
hashMapWith(0, 2),
hashMapWith(1, 2),
hashMapWith(2, null),
hashMapWith(3, 1),
hashMapWith(4, null),
hashMapWith(5, 2));
assertEquals(
asList(0, 2, 3),
maps.stream()
.filter(distinctBy(m -> m.get("val")))
.map(m -> m.get("i"))
.collect(toList()));
In my case I needed to control what the previous element was. I then created a stateful Predicate where I checked whether the previous element was different from the current element; if so, I kept it.
public List<Log> fetchLogById(Long id) {
    return this.findLogById(id).stream()
            .filter(new LogPredicate())
            .collect(Collectors.toList());
}

public class LogPredicate implements Predicate<Log> {

    private Log previous;

    public boolean test(Log current) {
        boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);
        if (isDifferent) {
            previous = current;
        }
        return isDifferent;
    }

    private boolean verifyIfDifferentLog(Log current, Log previous) {
        return !current.getId().equals(previous.getId());
    }
}
My solution is in this listing:
List<HolderEntry> result ....
List<HolderEntry> dto3s = new ArrayList<>(result.stream().collect(toMap(
        HolderEntry::getId,
        holder -> holder,  // or Function.identity() if you prefer
        (holder1, holder2) -> holder1
)).values());
In my situation I want to find the distinct values and put them in a List.

Java-Streams: is there a way to produce a stream of List<Item> out of a Stream of Items? [duplicate]

How to implement a "partition" operation on a Java 8 Stream? By partition I mean dividing a stream into sub-streams of a given size. It would be somewhat identical to the Guava Iterators.partition() method, except that it's desirable that the partitions are lazily-evaluated Streams rather than Lists.
It's impossible to partition an arbitrary source stream into fixed-size batches, because this would break parallel processing. When processing in parallel, you may not know how many elements are in the first sub-task after a split, so you cannot create the partitions for the next sub-task until the first is fully processed.
However, it is possible to create a stream of partitions from a random access List. Such a feature is available, for example, in my StreamEx library:
List<Type> input = Arrays.asList(...);
Stream<List<Type>> stream = StreamEx.ofSubLists(input, partitionSize);
Or if you really want the stream of streams:
Stream<Stream<Type>> stream = StreamEx.ofSubLists(input, partitionSize).map(List::stream);
If you don't want to depend on third-party libraries, you can implement such an ofSubLists method manually:
public static <T> Stream<List<T>> ofSubLists(List<T> source, int length) {
    if (length <= 0)
        throw new IllegalArgumentException("length = " + length);
    int size = source.size();
    if (size <= 0)
        return Stream.empty();
    int fullChunks = (size - 1) / length;
    return IntStream.range(0, fullChunks + 1).mapToObj(
            n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
}
This implementation looks a little bit long, but it takes into account some corner cases like a close-to-MAX_VALUE list size.
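A usage sketch for the manual ofSubLists above, chunking 10 elements into sublists of 4:
List<Integer> src = IntStream.range(0, 10).boxed().collect(Collectors.toList());
ofSubLists(src, 4).forEach(System.out::println);
// [0, 1, 2, 3]
// [4, 5, 6, 7]
// [8, 9]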
If you want a parallel-friendly solution for an unordered stream (so you don't care which stream elements are combined in a single batch), you may use a collector like this (thanks to @sibnick for the inspiration):
public static <T, A, R> Collector<T, ?, R> unorderedBatches(int batchSize,
                                                            Collector<List<T>, A, R> downstream) {
    class Acc {
        List<T> cur = new ArrayList<>();
        A acc = downstream.supplier().get();
    }
    BiConsumer<Acc, T> accumulator = (acc, t) -> {
        acc.cur.add(t);
        if (acc.cur.size() == batchSize) {
            downstream.accumulator().accept(acc.acc, acc.cur);
            acc.cur = new ArrayList<>();
        }
    };
    return Collector.of(Acc::new, accumulator,
            (acc1, acc2) -> {
                acc1.acc = downstream.combiner().apply(acc1.acc, acc2.acc);
                for (T t : acc2.cur) accumulator.accept(acc1, t);
                return acc1;
            }, acc -> {
                if (!acc.cur.isEmpty())
                    downstream.accumulator().accept(acc.acc, acc.cur);
                return downstream.finisher().apply(acc.acc);
            }, Collector.Characteristics.UNORDERED);
}
Usage example:
List<List<Integer>> list = IntStream.range(0,20)
.boxed().parallel()
.collect(unorderedBatches(3, Collectors.toList()));
Result:
[[2, 3, 4], [7, 8, 9], [0, 1, 5], [12, 13, 14], [17, 18, 19], [10, 11, 15], [6, 16]]
Such a collector is perfectly thread-safe and produces ordered batches for a sequential stream.
If you want to apply an intermediate transformation for every batch, you may use the following version:
public static <T, AA, A, B, R> Collector<T, ?, R> unorderedBatches(int batchSize,
                                                                   Collector<T, AA, B> batchCollector,
                                                                   Collector<B, A, R> downstream) {
    return unorderedBatches(batchSize,
            Collectors.mapping(list -> list.stream().collect(batchCollector), downstream));
}
For example, this way you can sum the numbers in every batch on the fly:
List<Integer> list = IntStream.range(0,20)
.boxed().parallel()
.collect(unorderedBatches(3, Collectors.summingInt(Integer::intValue),
Collectors.toList()));
I found an elegant solution using Guava: Iterable<List<T>> parts = Iterables.partition(stream::iterator, size)
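Expanded into a runnable sketch (this relies on Guava's Iterables.partition and on the fact that stream::iterator is a valid Iterable):
Stream<Integer> stream = IntStream.range(0, 10).boxed();
Iterable<List<Integer>> parts = Iterables.partition(stream::iterator, 3);
parts.forEach(System.out::println); // [0, 1, 2] [3, 4, 5] [6, 7, 8] [9]
Keep in mind the resulting Iterable can only be traversed once, since the underlying stream is consumed.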
Provided you want to use the Stream sequentially, it is possible to partition a Stream (as well as perform related functions such as windowing, which I think is what you really want in this case).
Two libraries that support partitioning for standard Streams are cyclops-react (I am the author) and jOOλ, which cyclops-react extends (to add functionality such as windowing).
cyclops-streams has a collection of static functions, StreamUtils, for operating on Java Streams, including a series of functions such as splitAt, headAndTail, splitBy, and partition for partitioning.
To window a Stream into a Stream of nested Streams of size 30 you can use the window method.
To the OP's point, in streaming terms, splitting a Stream into multiple Streams of a given size is a windowing operation (rather than a partitioning operation).
Stream<Streamable<Integer>> streamOfStreams = StreamUtils.window(stream, 30);
There is a Stream extension class called ReactiveSeq that extends jool.Seq and adds windowing functionality, which may make the code a little cleaner.
ReactiveSeq<Integer> seq;
ReactiveSeq<ListX<Integer>> streamOfLists = seq.grouped(30);
As Tagir points out above though, this isn't suitable for parallel Streams. If you want to window or batch a Stream that you wish to execute in a multithreaded fashion, LazyFutureStream in cyclops-react might be useful (windowing is on the to-do list, but plain old batching is available now).
In this case, data will be passed from the multiple threads executing the Stream to a Multi-Producer/Single-Consumer wait-free queue, and the sequential data from that queue can be windowed before being distributed to threads again.
Stream<List<Data>> batched = new LazyReact().range(0,1000)
.grouped(30)
.map(this::process);
It seems like, as Jon Skeet has shown in his comment, it's not possible to make partitions lazy. For non-lazy partitions, I already have this code:
public static <T> Stream<Stream<T>> partition(Stream<T> source, int size) {
    final Iterator<T> it = source.iterator();
    final Iterator<Stream<T>> partIt = Iterators.transform(Iterators.partition(it, size), List::stream);
    final Iterable<Stream<T>> iterable = () -> partIt;
    return StreamSupport.stream(iterable.spliterator(), false);
}
This is a pure Java solution that's evaluated lazily, instead of using a List.
public static <T> Stream<List<T>> partition(Stream<T> stream, int batchSize) {
    List<List<T>> currentBatch = new ArrayList<List<T>>(); // just to make it mutable
    currentBatch.add(new ArrayList<T>(batchSize));
    return Stream.concat(stream
            .sequential()
            .map(new Function<T, List<T>>() {
                public List<T> apply(T t) {
                    currentBatch.get(0).add(t);
                    return currentBatch.get(0).size() == batchSize ? currentBatch.set(0, new ArrayList<>(batchSize)) : null;
                }
            }), Stream.generate(() -> currentBatch.get(0).isEmpty() ? null : currentBatch.get(0))
            .limit(1)
    ).filter(Objects::nonNull);
}
The method returns Stream<List<T>> for flexibility. You can convert it to Stream<Stream<T>> easily by partition(something, 10).map(List::stream).
The most elegant pure Java 8 solution for this problem that I found:
public static <T> List<List<T>> partition(final List<T> list, int batchSize) {
    return IntStream.range(0, getNumberOfPartitions(list, batchSize))
            .mapToObj(i -> list.subList(i * batchSize, Math.min((i + 1) * batchSize, list.size())))
            .collect(toList());
}

// https://stackoverflow.com/questions/23246983/get-the-next-higher-integer-value-in-java
private static <T> int getNumberOfPartitions(List<T> list, int batchSize) {
    return (list.size() + batchSize - 1) / batchSize;
}
I think it is possible with some sort of hack inside. Create a utility class for batching:
public static class ConcurrentBatch {

    private AtomicLong id = new AtomicLong();
    private int batchSize;

    public ConcurrentBatch(int batchSize) {
        this.batchSize = batchSize;
    }

    public long next() {
        return (id.getAndIncrement()) / batchSize;
    }

    public int getBatchSize() {
        return batchSize;
    }
}
and a method:
public static <T> void applyConcurrentBatchToStream(Consumer<List<T>> batchFunc, Stream<T> stream, int batchSize) {
    ConcurrentBatch batch = new ConcurrentBatch(batchSize);
    // hack the Java map: extend it and override computeIfAbsent
    Supplier<ConcurrentMap<Long, List<T>>> mapFactory = () -> new ConcurrentHashMap<Long, List<T>>() {
        @Override
        public List<T> computeIfAbsent(Long key, Function<? super Long, ? extends List<T>> mappingFunction) {
            List<T> rs = super.computeIfAbsent(key, mappingFunction);
            // apply batchFunc to old lists when a new batch list is created
            if (rs.isEmpty()) {
                for (Entry<Long, List<T>> e : entrySet()) {
                    List<T> batchList = e.getValue();
                    // todo: needs improvement
                    synchronized (batchList) {
                        if (batchList.size() == batch.getBatchSize()) {
                            batchFunc.accept(batchList);
                            remove(e.getKey());
                            batchList.clear();
                        }
                    }
                }
            }
            return rs;
        }
    };
    stream.map(s -> new AbstractMap.SimpleEntry<>(batch.next(), s))
            .collect(groupingByConcurrent(AbstractMap.SimpleEntry::getKey, mapFactory, mapping(AbstractMap.SimpleEntry::getValue, toList())))
            .entrySet()
            .stream()
            // the map contains only the unprocessed lists (size < batchSize)
            .forEach(e -> batchFunc.accept(e.getValue()));
}
This is a performant way:
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
public final class Partition<T> extends AbstractList<List<T>> {

    private final List<T> list;
    private final int chunkSize;

    public Partition(List<T> list, int chunkSize) {
        this.list = new ArrayList<>(list);
        this.chunkSize = chunkSize;
    }

    public static <T> Partition<T> ofSize(List<T> list, int chunkSize) {
        return new Partition<>(list, chunkSize);
    }

    @Override
    public List<T> get(int index) {
        int start = index * chunkSize;
        int end = Math.min(start + chunkSize, list.size());
        if (start > end) {
            throw new IndexOutOfBoundsException("Index " + index + " is out of the list range <0," + (size() - 1) + ">");
        }
        return new ArrayList<>(list.subList(start, end));
    }

    @Override
    public int size() {
        return (int) Math.ceil((double) list.size() / (double) chunkSize);
    }
}
Usage
Partition<String> partition = Partition.ofSize(paCustomerCodes, chunkSize);
for (List<String> strings : partition) {
}
Here is a pure Java 8 solution - both sequential and parallel:
public <T> Collection<List<T>> chunk(Collection<T> collection, int chunkSize) {
    final AtomicInteger index = new AtomicInteger();
    return collection.stream()
            .map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
            // LinkedHashMap is used here just to preserve order
            .collect(groupingBy(Entry::getKey, LinkedHashMap::new, mapping(Entry::getValue, toList())))
            .values();
}

public <T> Collection<List<T>> chunkParallel(Collection<T> collection, int chunkSize) {
    final AtomicInteger index = new AtomicInteger();
    return collection.parallelStream()
            .map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
            // Since this is parallel processing, ordering cannot be preserved,
            // but we have to make it thread-safe - using e.g. ConcurrentHashMap
            .collect(groupingBy(Entry::getKey, ConcurrentHashMap::new, mapping(Entry::getValue, toList())))
            .values();
}
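A usage sketch for chunk above (assuming the same static imports as the answer, e.g. Collectors.toList):
Collection<List<Integer>> chunks =
        chunk(IntStream.range(0, 7).boxed().collect(toList()), 3);
System.out.println(chunks); // [[0, 1, 2], [3, 4, 5], [6]]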
Here is a quick solution using abacus-common:
IntStream.range(0, Integer.MAX_VALUE).split(size).forEach(s -> N.println(s.toArray()));
Disclaimer: I'm the developer of abacus-common.


Java 8 stream reverse order

General question: What's the proper way to reverse a stream? Assuming that we don't know what type of elements that stream consists of, what's the generic way to reverse any stream?
Specific question:
IntStream provides a range method to generate integers in a specific range, IntStream.range(-range, 0). Now that I want to reverse it, switching the range from 0 to negative won't work; I also can't use Integer::compare:
List<Integer> list = Arrays.asList(1,2,3,4);
list.stream().sorted(Integer::compare).forEach(System.out::println);
With IntStream I'll get this compiler error:
Error:(191, 0) ajc: The method sorted() in the type IntStream is not applicable for the arguments (Integer::compare)
what am I missing here?
For the specific question of generating a reverse IntStream, try something like this:
static IntStream revRange(int from, int to) {
    return IntStream.range(from, to)
                    .map(i -> to - i + from - 1);
}
This avoids boxing and sorting.
For the general question of how to reverse a stream of any type, I don't know if there's a "proper" way. There are a couple of ways I can think of; both end up storing the stream elements. I don't know of a way to reverse a stream without storing the elements.
This first way stores the elements into an array and reads them out to a stream in reverse order. Note that since we don't know the runtime type of the stream elements, we can't type the array properly, requiring an unchecked cast.
@SuppressWarnings("unchecked")
static <T> Stream<T> reverse(Stream<T> input) {
    Object[] temp = input.toArray();
    return (Stream<T>) IntStream.range(0, temp.length)
            .mapToObj(i -> temp[temp.length - i - 1]);
}
Another technique uses collectors to accumulate the items into a reversed list. This does lots of insertions at the front of ArrayList objects, so there's lots of copying going on.
Stream<T> input = ... ;
List<T> output =
input.collect(ArrayList::new,
(list, e) -> list.add(0, e),
(list1, list2) -> list1.addAll(0, list2));
It's probably possible to write a much more efficient reversing collector using some kind of customized data structure.
UPDATE 2016-01-29
Since this question has gotten a bit of attention recently, I figure I should update my answer to solve the problem with inserting at the front of ArrayList. This will be horribly inefficient with a large number of elements, requiring O(N^2) copying.
It's preferable to use an ArrayDeque instead, which efficiently supports insertion at the front. A small wrinkle is that we can't use the three-arg form of Stream.collect(); it requires the contents of the second arg be merged into the first arg, and there's no "add-all-at-front" bulk operation on Deque. Instead, we use addAll() to append the contents of the first arg to the end of the second, and then we return the second. This requires using the Collector.of() factory method.
The complete code is this:
Deque<String> output =
input.collect(Collector.of(
ArrayDeque::new,
(deq, t) -> deq.addFirst(t),
(d1, d2) -> { d2.addAll(d1); return d2; }));
The result is a Deque instead of a List, but that shouldn't be much of an issue, as it can easily be iterated or streamed in the now-reversed order.
Elegant solution
List<Integer> list = Arrays.asList(1,2,3,4);
list.stream()
.sorted(Collections.reverseOrder()) // Method on Stream<Integer>
.forEach(System.out::println);
General question:
A Stream does not store any elements, so iterating over the elements in reverse order is not possible without storing them in some intermediate collection.
Stream.of("1", "2", "20", "3")
.collect(Collectors.toCollection(ArrayDeque::new)) // or LinkedList
.descendingIterator()
.forEachRemaining(System.out::println);
Update: changed LinkedList to ArrayDeque (better); see here for details.
Prints:
3
20
2
1
By the way, using the sorted method is not correct, as it sorts, NOT reverses (assuming the stream may have unordered elements).
Specific Question:
I found this simple, easier, and more intuitive (copied from @Holger's comment):
IntStream.iterate(to - 1, i -> i - 1).limit(to - from)
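For example, with from = 0 and to = 5:
int from = 0, to = 5;
IntStream.iterate(to - 1, i -> i - 1)
        .limit(to - from)
        .forEach(System.out::println); // prints 4 3 2 1 0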
Many of the solutions here sort or reverse the IntStream, but that unnecessarily requires intermediate storage. Stuart Marks's solution is the way to go:
static IntStream revRange(int from, int to) {
    return IntStream.range(from, to).map(i -> to - i + from - 1);
}
It correctly handles overflow as well, passing this test:
@Test
public void testRevRange() {
    assertArrayEquals(revRange(0, 5).toArray(), new int[]{4, 3, 2, 1, 0});
    assertArrayEquals(revRange(-5, 0).toArray(), new int[]{-1, -2, -3, -4, -5});
    assertArrayEquals(revRange(1, 4).toArray(), new int[]{3, 2, 1});
    assertArrayEquals(revRange(0, 0).toArray(), new int[0]);
    assertArrayEquals(revRange(0, -1).toArray(), new int[0]);
    assertArrayEquals(revRange(MIN_VALUE, MIN_VALUE).toArray(), new int[0]);
    assertArrayEquals(revRange(MAX_VALUE, MAX_VALUE).toArray(), new int[0]);
    assertArrayEquals(revRange(MIN_VALUE, MIN_VALUE + 1).toArray(), new int[]{MIN_VALUE});
    assertArrayEquals(revRange(MAX_VALUE - 1, MAX_VALUE).toArray(), new int[]{MAX_VALUE - 1});
}
How NOT to do it:
Don't use .sorted(Comparator.reverseOrder()) or .sorted(Collections.reverseOrder()), because it will just sort elements in the descending order.
Using it for given Integer input:
[1, 4, 2, 5, 3]
the output would be as follows:
[5, 4, 3, 2, 1]
For String input:
["A", "D", "B", "E", "C"]
the output would be as follows:
[E, D, C, B, A]
Don't use .sorted((a, b) -> -1) (explanation at the end)
The easiest way to do it properly:
List<Integer> list = Arrays.asList(1, 4, 2, 5, 3);
Collections.reverse(list);
System.out.println(list);
Output:
[3, 5, 2, 4, 1]
The same for String:
List<String> stringList = Arrays.asList("A", "D", "B", "E", "C");
Collections.reverse(stringList);
System.out.println(stringList);
Output:
[C, E, B, D, A]
Don't use .sorted((a, b) -> -1)!
It breaks the Comparator contract and might work only in some cases, i.e. only on a single thread, but not in parallel.
yankee's explanation:
(a, b) -> -1 breaks the contract for Comparator. Whether this works depends on the implementation of the sort algorithm. The next release of the JVM might break this. Actually, I can already break this reproducibly on my machine using IntStream.range(0, 10000).parallel().boxed().sorted((a, b) -> -1).forEachOrdered(System.out::println);
//Don't use this!!!
List<Integer> list = Arrays.asList(1, 4, 2, 5, 3);
List<Integer> reversedList = list.stream()
.sorted((a, b) -> -1)
.collect(Collectors.toList());
System.out.println(reversedList);
Output in positive case:
[3, 5, 2, 4, 1]
Possible output in parallel stream or with other JVM implementation:
[4, 1, 2, 3, 5]
The same for String:
//Don't use this!!!
List<String> stringList = Arrays.asList("A", "D", "B", "E", "C");
List<String> reversedStringList = stringList.stream()
.sorted((a, b) -> -1)
.collect(Collectors.toList());
System.out.println(reversedStringList);
Output in positive case:
[C, E, B, D, A]
Possible output in parallel stream or with other JVM implementation:
[A, E, B, D, C]
without external lib...
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class MyCollectors {

    public static <T> Collector<T, ?, List<T>> toListReversed() {
        return Collectors.collectingAndThen(Collectors.toList(), l -> {
            Collections.reverse(l);
            return l;
        });
    }
}
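Usage sketch:
List<Integer> reversed = Stream.of(1, 2, 3, 4)
        .collect(MyCollectors.toListReversed());
System.out.println(reversed); // [4, 3, 2, 1]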
If the elements implement Comparable<T> (e.g. Integer, String, Date), you can do it using Comparator.reverseOrder():
List<Integer> list = Arrays.asList(1, 2, 3, 4);
list.stream()
.sorted(Comparator.reverseOrder())
.forEach(System.out::println);
You could define your own collector that collects the elements in reverse order:
public static <T> Collector<T, List<T>, List<T>> inReverse() {
    return Collector.of(
            ArrayList::new,
            (l, t) -> l.add(t),
            (l, r) -> { l.addAll(r); return l; },
            Lists::<T>reverse);
}
And use it like:
stream.collect(inReverse()).forEach(t -> ...)
I use an ArrayList in forward order to efficiently collect the items (at the end of the list), and Guava's Lists.reverse to efficiently give a reversed view of the list without making another copy of it.
Here are some test cases for the custom collector:
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.*;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import org.hamcrest.Matchers;
import org.junit.Test;
import com.google.common.collect.Lists;
public class TestReverseCollector {

    private final Object t1 = new Object();
    private final Object t2 = new Object();
    private final Object t3 = new Object();
    private final Object t4 = new Object();

    private final Collector<Object, List<Object>, List<Object>> inReverse = inReverse();
    private final Supplier<List<Object>> supplier = inReverse.supplier();
    private final BiConsumer<List<Object>, Object> accumulator = inReverse.accumulator();
    private final Function<List<Object>, List<Object>> finisher = inReverse.finisher();
    private final BinaryOperator<List<Object>> combiner = inReverse.combiner();

    @Test public void associative() {
        final List<Object> a1 = supplier.get();
        accumulator.accept(a1, t1);
        accumulator.accept(a1, t2);
        final List<Object> r1 = finisher.apply(a1);

        final List<Object> a2 = supplier.get();
        accumulator.accept(a2, t1);
        final List<Object> a3 = supplier.get();
        accumulator.accept(a3, t2);
        final List<Object> r2 = finisher.apply(combiner.apply(a2, a3));

        assertThat(r1, Matchers.equalTo(r2));
    }

    @Test public void identity() {
        final List<Object> a1 = supplier.get();
        accumulator.accept(a1, t1);
        accumulator.accept(a1, t2);
        final List<Object> r1 = finisher.apply(a1);

        final List<Object> a2 = supplier.get();
        accumulator.accept(a2, t1);
        accumulator.accept(a2, t2);
        final List<Object> r2 = finisher.apply(combiner.apply(a2, supplier.get()));

        assertThat(r1, equalTo(r2));
    }

    @Test public void reversing() throws Exception {
        final List<Object> a2 = supplier.get();
        accumulator.accept(a2, t1);
        accumulator.accept(a2, t2);

        final List<Object> a3 = supplier.get();
        accumulator.accept(a3, t3);
        accumulator.accept(a3, t4);

        final List<Object> r2 = finisher.apply(combiner.apply(a2, a3));

        assertThat(r2, contains(t4, t3, t2, t1));
    }

    public static <T> Collector<T, List<T>, List<T>> inReverse() {
        return Collector.of(
                ArrayList::new,
                (l, t) -> l.add(t),
                (l, r) -> { l.addAll(r); return l; },
                Lists::<T>reverse);
    }
}
cyclops-react StreamUtils has a reverse Stream method (javadoc).
StreamUtils.reverse(Stream.of("1", "2", "20", "3"))
.forEach(System.out::println);
It works by collecting into an ArrayList and then making use of the ListIterator class, which can iterate in either direction, to iterate backwards over the list.
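A minimal sketch of that mechanism (my own illustration, not the library's actual code; intended for sequential use only):
static <T> Stream<T> reverse(Stream<T> stream) {
    List<T> list = stream.collect(Collectors.toList());
    ListIterator<T> it = list.listIterator(list.size()); // positioned past the last element
    return Stream.generate(it::previous).limit(list.size());
}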
If you already have a List, it will be more efficient
StreamUtils.reversedStream(Arrays.asList("1", "2", "20", "3"))
.forEach(System.out::println);
Here's the solution I've come up with:
private static final Comparator<Integer> BY_ASCENDING_ORDER = Integer::compare;
private static final Comparator<Integer> BY_DESCENDING_ORDER = BY_ASCENDING_ORDER.reversed();
then using those comparators:
IntStream.range(-range, 0).boxed().sorted(BY_DESCENDING_ORDER).forEach(// etc...
I would suggest using jOOλ; it's a great library that adds lots of useful functionality to Java 8 streams and lambdas.
You can then do the following:
List<Integer> list = Arrays.asList(1,2,3,4);
Seq.seq(list).reverse().forEach(System.out::println)
Simple as that. It's a pretty lightweight library, and well worth adding to any Java 8 project.
How about this utility method?
public static <T> Stream<T> getReverseStream(List<T> list) {
final ListIterator<T> listIt = list.listIterator(list.size());
final Iterator<T> reverseIterator = new Iterator<T>() {
@Override
public boolean hasNext() {
return listIt.hasPrevious();
}
@Override
public T next() {
return listIt.previous();
}
};
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
reverseIterator,
Spliterator.ORDERED | Spliterator.IMMUTABLE), false);
}
Seems to work for all cases, without duplicating the list.
With regard to the specific question of generating a reverse IntStream:
starting from Java 9 you can use the three-argument version of the IntStream.iterate(...):
IntStream.iterate(10, x -> x >= 0, x -> x - 1).forEach(System.out::println);
// Out: 10 9 8 7 6 5 4 3 2 1 0
where:
IntStream.iterate(int seed, IntPredicate hasNext, IntUnaryOperator next);
seed - the initial element;
hasNext - a predicate to apply to elements to determine when the stream must terminate;
next - a function to be applied to the previous element to produce a new element.
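The same three-argument iterate can also drive a reverse stream over a list by index. A small sketch (Java 9+ assumed; the letters list is just an illustration):
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;
List<String> letters = Arrays.asList("a", "b", "c", "d");
IntStream.iterate(letters.size() - 1, i -> i >= 0, i -> i - 1) // indices from last down to 0
        .mapToObj(letters::get)
        .forEach(System.out::println); // prints d, c, b, a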
Simplest way (simple collect - supports parallel streams):
public static <T> Stream<T> reverse(Stream<T> stream) {
return stream
.collect(Collector.of(
() -> new ArrayDeque<T>(),
ArrayDeque::addFirst,
(q1, q2) -> { q2.addAll(q1); return q2; })
)
.stream();
}
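A quick usage sketch of the simple version (the sample values are just an illustration):
reverse(Stream.of(1, 2, 3)).forEach(System.out::println); // prints 3, 2, 1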
Advanced way (properly supports parallel streams):
public static <T> Stream<T> reverse(Stream<T> stream) {
Objects.requireNonNull(stream, "stream");
class ReverseSpliterator implements Spliterator<T> {
private Spliterator<T> spliterator;
private final Deque<T> deque = new ArrayDeque<>();
private ReverseSpliterator(Spliterator<T> spliterator) {
this.spliterator = spliterator;
}
@Override
@SuppressWarnings({"StatementWithEmptyBody"})
public boolean tryAdvance(Consumer<? super T> action) {
while(spliterator.tryAdvance(deque::addFirst));
if(!deque.isEmpty()) {
action.accept(deque.remove());
return true;
}
return false;
}
@Override
public Spliterator<T> trySplit() {
// Once traversal has started, the source spliterator no longer contains elements!
Spliterator<T> prev = spliterator.trySplit();
if(prev == null) {
return null;
}
Spliterator<T> me = spliterator;
spliterator = prev;
return new ReverseSpliterator(me);
}
@Override
public long estimateSize() {
return spliterator.estimateSize();
}
@Override
public int characteristics() {
return spliterator.characteristics();
}
@Override
public Comparator<? super T> getComparator() {
Comparator<? super T> comparator = spliterator.getComparator();
return (comparator != null) ? comparator.reversed() : null;
}
@Override
public void forEachRemaining(Consumer<? super T> action) {
// Ensure that tryAdvance is called at least once
if(!deque.isEmpty() || tryAdvance(action)) {
deque.forEach(action);
}
}
}
return StreamSupport.stream(new ReverseSpliterator(stream.spliterator()), stream.isParallel());
}
Note that you can easily extend this to other types of streams (IntStream, ...).
Testing:
// Use parallel() only if you wish
reverse(Stream.of("One", "Two", "Three", "Four", "Five", "Six").parallel())
.forEachOrdered(System.out::println);
Results:
Six
Five
Four
Three
Two
One
Additional notes: the simple way isn't as useful when combined with other stream operations (the collect-then-stream step breaks the parallelism). The advanced way doesn't have that issue, and it also keeps the initial characteristics of the stream, for example SORTED, so it's the way to go when other stream operations follow the reverse.
An ArrayDeque is faster as a stack than a Stack or a LinkedList; push() inserts elements at the front of the Deque:
protected <T> Stream<T> reverse(Stream<T> stream) {
ArrayDeque<T> stack = new ArrayDeque<>();
stream.forEach(stack::push);
return stack.stream();
}
List<Integer> reversed = list.stream().sorted(Collections.reverseOrder()).collect(Collectors.toList());
reversed.forEach(System.out::println);
One could write a collector that collects elements in reversed order:
public static <T> Collector<T, ?, Stream<T>> reversed() {
return Collectors.collectingAndThen(Collectors.toList(), list -> {
Collections.reverse(list);
return list.stream();
});
}
And use it like this:
Stream.of(1, 2, 3, 4, 5).collect(reversed()).forEach(System.out::println);
Original answer (contains a bug: it does not work correctly for parallel streams, since forEach makes no ordering guarantees and the LinkedList is mutated without synchronization):
A general purpose stream reverse method could look like:
public static <T> Stream<T> reverse(Stream<T> stream) {
LinkedList<T> stack = new LinkedList<>();
stream.forEach(stack::push);
return stack.stream();
}
For reference, I was looking at the same problem; I wanted to join the string value of the stream elements in reverse order.
itemList = { last, middle, first } => first,middle,last
I started with an intermediate collection, using collectingAndThen from comonad or the ArrayDeque collector of Stuart Marks, but I wasn't happy with collecting to an intermediate collection and streaming again:
itemList.stream()
.map(TheObject::toString)
.collect(Collectors.collectingAndThen(Collectors.toList(),
strings -> {
Collections.reverse(strings);
return strings;
}))
.stream()
.collect(Collectors.joining());
So I iterated on Stuart Marks's answer, using the Collector.of factory, which has the interesting finisher lambda.
itemList.stream()
.collect(Collector.of(StringBuilder::new,
(sb, o) -> sb.insert(0, o),
(r1, r2) -> { r1.insert(0, r2); return r1; },
StringBuilder::toString));
Since in this case the stream is not parallel, the combiner is not very relevant; I use insert anyway for the sake of code consistency, but it does not matter much, as it would depend on which StringBuilder is built first.
I looked at the StringJoiner, however it does not have an insert method.
Not purely Java 8, but if you use Guava's Lists.reverse() method in conjunction, you can easily achieve this:
List<Integer> list = Arrays.asList(1,2,3,4);
Lists.reverse(list).stream().forEach(System.out::println);
Reversing a string or any array:
Stream.of("abcdefghijklm 1234567".split(""))
    .collect(Collectors.collectingAndThen(Collectors.toList(), list -> { Collections.reverse(list); return list; }))
    .stream()
    .forEach(System.out::println);
The split delimiter can be changed to match whatever separator the input uses.
How about reversing the Collection backing the stream first?
import java.util.Collections;
import java.util.List;
public void reverseTest(List<Integer> sampleCollection) {
Collections.reverse(sampleCollection); // note: this reverses the list in place, so clone it first if the original input must remain untouched
sampleCollection.stream().forEach(item -> {
// you op here
});
}
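If the original input must stay untouched, a copy-first variant of the same idea (a minimal sketch):
import java.util.ArrayList;
List<Integer> copy = new ArrayList<>(sampleCollection); // work on a copy, leaving the input as-is
Collections.reverse(copy);
copy.stream().forEach(System.out::println);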
Answering the specific question of reversing an IntStream, the following worked for me (note that it reverses the numeric order, and Math::abs assumes the original values are non-negative):
IntStream.range(0, 10)
.map(x -> x * -1)
.sorted()
.map(Math::abs)
.forEach(System.out::println);
In all this I don't see the answer I would go to first.
This isn't exactly a direct answer to the question, but it's a potential solution to the problem.
Just build the list backwards in the first place. If you can, use a LinkedList instead of an ArrayList, and when you add items use push() instead of add(). The list will be built in reverse order and will then stream correctly without any manipulation (see the sketch below).
This won't fit cases where you are dealing with primitive arrays or lists that are already used in various ways but does work well in a surprising number of cases.
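A minimal sketch of that approach (the items list and the 1..5 values are just an illustration):
import java.util.LinkedList;
LinkedList<Integer> items = new LinkedList<>();
for (int i = 1; i <= 5; i++) {
    items.push(i); // push() inserts at the head, so the list is built in reverse order
}
items.stream().forEach(System.out::println); // prints 5, 4, 3, 2, 1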
The simplest solution is using List::listIterator and Stream::generate:
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
ListIterator<Integer> listIterator = list.listIterator(list.size());
Stream.generate(listIterator::previous)
.limit(list.size())
.forEach(System.out::println);
This method works with any Stream and is Java 8 compliant:
Stream<Integer> myStream = Stream.of(1, 2, 3, 4, 5);
myStream.reduce(Stream.empty(),
(Stream<Integer> a, Integer b) -> Stream.concat(Stream.of(b), a),
(a, b) -> Stream.concat(b, a))
.forEach(System.out::println);
This is how I do it.
I don't like the idea of creating a new collection and reverse iterating it.
The IntStream#map idea is pretty neat, but I prefer the IntStream#iterate method, for I think the idea of a countdown to zero is better expressed with the iterate method, and it is easier to understand in terms of walking the array from back to front.
import static java.lang.Math.max;
private static final double EXACT_MATCH = 0d;
public static IntStream reverseStream(final int[] array) {
return countdownFrom(array.length - 1).map(index -> array[index]);
}
public static DoubleStream reverseStream(final double[] array) {
return countdownFrom(array.length - 1).mapToDouble(index -> array[index]);
}
public static <T> Stream<T> reverseStream(final T[] array) {
return countdownFrom(array.length - 1).mapToObj(index -> array[index]);
}
public static IntStream countdownFrom(final int top) {
return IntStream.iterate(top, t -> t - 1).limit(max(0, (long) top + 1));
}
Here are some tests to prove it works:
import static java.lang.Integer.MAX_VALUE;
import static org.junit.Assert.*;
@Test
public void testReverseStream_emptyArrayCreatesEmptyStream() {
Assert.assertEquals(0, reverseStream(new double[0]).count());
}
@Test
public void testReverseStream_singleElementCreatesSingleElementStream() {
Assert.assertEquals(1, reverseStream(new double[1]).count());
final double[] singleElementArray = new double[] { 123.4 };
assertArrayEquals(singleElementArray, reverseStream(singleElementArray).toArray(), EXACT_MATCH);
}
@Test
public void testReverseStream_multipleElementsAreStreamedInReversedOrder() {
final double[] arr = new double[] { 1d, 2d, 3d };
final double[] revArr = new double[] { 3d, 2d, 1d };
Assert.assertEquals(arr.length, reverseStream(arr).count());
Assert.assertArrayEquals(revArr, reverseStream(arr).toArray(), EXACT_MATCH);
}
@Test
public void testCountdownFrom_returnsAllElementsFromTopToZeroInReverseOrder() {
assertArrayEquals(new int[] { 4, 3, 2, 1, 0 }, countdownFrom(4).toArray());
}
@Test
public void testCountdownFrom_countingDownStartingWithZeroOutputsTheNumberZero() {
assertArrayEquals(new int[] { 0 }, countdownFrom(0).toArray());
}
@Test
public void testCountdownFrom_doesNotChokeOnIntegerMaxValue() {
assertEquals(true, countdownFrom(MAX_VALUE).anyMatch(x -> x == MAX_VALUE));
}
@Test
public void testCountdownFrom_givesZeroLengthCountForNegativeValues() {
assertArrayEquals(new int[0], countdownFrom(-1).toArray());
assertArrayEquals(new int[0], countdownFrom(-4).toArray());
}
Based on Stuart Marks's answer, but without casting: a function returning a stream of the list's elements starting from the end:
public static <T> Stream<T> reversedStream(List<T> tList) {
final int size = tList.size();
return IntStream.range(0, size)
.mapToObj(i -> tList.get(size - 1 - i));
}
// usage
reversedStream(list).forEach(System.out::println);
What's the proper generic way to reverse a stream?
If the stream does not specify an encounter order, don't. You can detect that case from the spliterator's characteristics:
!s.spliterator().hasCharacteristics(java.util.Spliterator.ORDERED)
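A small sketch of that check (the two sources here are just illustrations):
import java.util.Arrays;
import java.util.HashSet;
import java.util.Spliterator;
import java.util.stream.Stream;
// Stream.of has a defined encounter order:
System.out.println(Stream.of(1, 2, 3).spliterator()
        .hasCharacteristics(Spliterator.ORDERED)); // true
// A HashSet has no encounter order, so "reversing" its stream is meaningless:
System.out.println(new HashSet<>(Arrays.asList(1, 2, 3)).stream().spliterator()
        .hasCharacteristics(Spliterator.ORDERED)); // false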
The most generic and easiest way to reverse a list is the following (though note that a constant comparator like the one below violates the Comparator contract, so this behavior is not strictly guaranteed):
public static <T> void reverseHelper(List<T> li){
li.stream()
.sorted((x,y)-> -1)
.collect(Collectors.toList())
.forEach(System.out::println);
}
Java 8 way to do this:
List<Integer> list = Arrays.asList(1,2,3,4);
Comparator<Integer> comparator = Integer::compare;
list.stream().sorted(comparator.reversed()).forEach(System.out::println);

Java 8 Distinct by property

In Java 8, how can I filter a collection using the Stream API by checking the distinctness of a property of each object?
For example, I have a list of Person objects and I want to remove people with the same name:
persons.stream().distinct();
This will use the default equality check for a Person object, so I need something like:
persons.stream().distinct(p -> p.getName());
Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class, is it possible to do this succinctly?
Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> seen.add(keyExtractor.apply(t));
}
Then you can write:
persons.stream().filter(distinctByKey(Person::getName))
Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
(This is essentially the same as my answer to this question: Java Lambda Stream Distinct() on arbitrary key?)
An alternative would be to place the persons in a map using the name as a key:
persons.collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();
Note that, in case of a duplicate name, the Person that is kept will be the first one encountered.
You can wrap the person objects into another class that only compares the names of the persons. Afterward, you unwrap the wrapped objects to get a person stream again. The stream operations might look as follows:
persons.stream()
.map(Wrapper::new)
.distinct()
.map(Wrapper::unwrap)
...;
The class Wrapper might look as follows:
class Wrapper {
private final Person person;
public Wrapper(Person person) {
this.person = person;
}
public Person unwrap() {
return person;
}
public boolean equals(Object other) {
if (other instanceof Wrapper) {
return ((Wrapper) other).person.getName().equals(person.getName());
} else {
return false;
}
}
public int hashCode() {
return person.getName().hashCode();
}
}
Another solution, using a Set. It may not be the ideal solution, but it works (note that it relies on a side effect in filter, so avoid it with parallel streams):
Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());
Or if you can modify the original list, you can use removeIf method
persons.removeIf(p -> !set.add(p.getName()));
There's a simpler approach using a TreeSet with a custom comparator.
persons.stream()
.collect(Collectors.toCollection(
() -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName()))
));
We can also use RxJava (a very powerful reactive extensions library):
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
You can use the groupingBy collector:
persons.collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));
If you want to have another stream you can use this:
persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<Person> persons = ...;
MutableList<Person> distinct =
ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));
If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<Person> persons = ...;
MutableList<Person> distinct =
persons.distinct(HashingStrategies.fromFunction(Person::getName));
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashCode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
A similar approach to the one Saeed Zarinfam used, but more Java 8 style :)
persons.collect(Collectors.groupingBy(p -> p.getName())).values().stream()
.map(plans -> plans.stream().findFirst().get())
.collect(toList());
You can use StreamEx library:
StreamEx.of(persons)
.distinct(Person::getName)
.toList()
I recommend using Vavr, if you can. With this library you can do the following:
io.vavr.collection.List.ofAll(persons)
.distinctBy(Person::getName)
.toJavaSet() // or any another Java 8 Collection
Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
final Set<Object> seen = new HashSet<>();
return t -> seen.add(keyExtractor.apply(t));
}
Then call:
persons.stream().filter(distinctByKey(p -> p.getName()));
My approach to this is to group all the objects with the same property together, then cut the groups short to a size of 1, and finally collect them as a List.
List<YourPersonClass> listWithDistinctPersons = persons.stream()
//operators to remove duplicates based on person name
.collect(Collectors.groupingBy(p -> p.getName()))
.values()
.stream()
//cut short the groups to size of 1
.flatMap(group -> group.stream().limit(1))
//collect distinct users as list
.collect(Collectors.toList());
A list of distinct objects can be found using:
List<Person> distinctPersons = persons.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
        ArrayList::new));
I made a generic version:
private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
return Collectors.collectingAndThen(
toMap(
keyExtractor,
t -> t,
(t1, t2) -> t1
),
(Map<R, T> map) -> map.values().stream()
);
}
An example:
Stream.of(new Person("Jean"),
new Person("Jean"),
new Person("Paul")
)
.filter(...)
.collect(distinctByKey(Person::getName)) // returns a stream of Person with 2 elements, Jean and Paul
.map(...)
.collect(toList())
Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:
Seq.seq(persons).distinct(Person::getName).toList();
Under the hood, it does practically the same thing as the accepted answer, though.
Set<YourPropertyType> set = new HashSet<>();
list
.stream()
.filter(it -> set.add(it.getYourProperty()))
.forEach(it -> ...);
While the highest-upvoted answer is absolutely the best answer for Java 8, it is at the same time absolutely the worst in terms of performance. If you really want a badly performing application, go ahead and use it. The simple requirement of extracting a unique set of person names can be achieved by a plain for-each and a Set.
Things get even worse if the list is above a size of 10.
Consider you have a collection of 20 Objects, like this:
public static final List<SimpleEvent> testList = Arrays.asList(
new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));
Where your SimpleEvent object looks like this:
public class SimpleEvent {
private String name;
private String type;
public SimpleEvent(String name) {
this.name = name;
this.type = "type_"+name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
}
And to test it, you have JMH code like this (please note, I'm using the same distinctByKey predicate mentioned in the accepted answer):
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = testList
.stream()
.filter(distinctByKey(SimpleEvent::getName))
.map(SimpleEvent::getName)
.collect(Collectors.toSet());
blackhole.consume(uniqueNames);
}
@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
Set<String> uniqueNames = new HashSet<>();
for (SimpleEvent event : testList) {
uniqueNames.add(event.getName());
}
blackhole.consume(uniqueNames);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(MyBenchmark.class.getSimpleName())
.forks(1)
.mode(Mode.Throughput)
.warmupBatchSize(3)
.warmupIterations(3)
.measurementIterations(3)
.build();
new Runner(opt).run();
}
Then you'll have Benchmark results like this:
Benchmark Mode Samples Score Score error Units
c.s.MyBenchmark.aForEachBasedUniqueSet thrpt 3 2635199.952 1663320.718 ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet thrpt 3 729134.695 895825.697 ops/s
And as you can see, a simple for-each is 3 times better in throughput, with a smaller error score, compared to the Java 8 stream.
The higher the throughput, the better the performance.
I would like to improve on Stuart Marks's answer. What if the key is null? It will throw a NullPointerException. Here I ignore null keys by adding one more check, keyExtractor.apply(t) != null:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> keyExtractor.apply(t)!=null && seen.add(keyExtractor.apply(t));
}
This works like a charm:
Group the data by the unique key to form a map.
Return the first object from every value of the map (there could be multiple persons having the same name).
persons.stream()
.collect(groupingBy(Person::getName))
.values()
.stream()
.flatMap(values -> values.stream().limit(1))
.collect(toList());
The easiest way to implement this is to jump on the sort feature, as it already provides an optional Comparator which can be created using an element's property. Then you have to filter duplicates out, which can be done using a stateful Predicate which uses the fact that, for a sorted stream, all equal elements are adjacent:
Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
Person previous;
public boolean test(Person p) {
if(previous!=null && c.compare(previous, p)==0)
return false;
previous=p;
return true;
}
})./* more stream operations here */;
Of course, a stateful Predicate is not thread-safe; however, if that's your need, you can move this logic into a Collector and let the stream take care of the thread-safety when using your Collector (a sketch follows below). This depends on what you want to do with the stream of distinct elements, which you didn't tell us in your question.
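A minimal sketch (not from the answer itself) of what moving that logic into a Collector could look like; the thread safety comes from each thread accumulating into its own list, with the combiner dropping a duplicate at the seam:
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collector;
public static <T> Collector<T, ?, List<T>> dropAdjacentDuplicates(Comparator<? super T> c) {
    return Collector.of(
        ArrayList::new,
        (List<T> list, T t) -> {
            // keep t only if it differs from the last element accumulated so far
            if (list.isEmpty() || c.compare(list.get(list.size() - 1), t) != 0) {
                list.add(t);
            }
        },
        (left, right) -> {
            // merge partial results, dropping a duplicate where they meet
            if (!left.isEmpty() && !right.isEmpty()
                    && c.compare(left.get(left.size() - 1), right.get(0)) == 0) {
                left.addAll(right.subList(1, right.size()));
            } else {
                left.addAll(right);
            }
            return left;
        });
}
Usage would then be stream.sorted(c).collect(dropAdjacentDuplicates(c)), which keeps one Person per name.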
There are a lot of approaches; this one will also help. Simple, clean and clear:
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(11, "Ravi"));
employees.add(new Employee(12, "Stalin"));
employees.add(new Employee(23, "Anbu"));
employees.add(new Employee(24, "Yuvaraj"));
employees.add(new Employee(35, "Sena"));
employees.add(new Employee(36, "Antony"));
employees.add(new Employee(47, "Sena"));
employees.add(new Employee(48, "Ravi"));
List<Employee> empList = new ArrayList<>(employees.stream().collect(
Collectors.toMap(Employee::getName, obj -> obj,
(existingValue, newValue) -> existingValue))
.values());
empList.forEach(System.out::println);
// Collectors.toMap(
// Employee::getName, - key (the value by which you want to eliminate duplicate)
// obj -> obj, - value (entire employee object)
// (existingValue, newValue) -> existingValue) - to avoid illegalstateexception: duplicate key
Output (with toString() overridden):
Employee{id=35, name='Sena'}
Employee{id=12, name='Stalin'}
Employee{id=11, name='Ravi'}
Employee{id=24, name='Yuvaraj'}
Employee{id=36, name='Antony'}
Employee{id=23, name='Anbu'}
Here is an example:
public class PayRoll {
private int payRollId;
private int id;
private String name;
private String dept;
private int salary;
public PayRoll(int payRollId, int id, String name, String dept, int salary) {
super();
this.payRollId = payRollId;
this.id = id;
this.name = name;
this.dept = dept;
this.salary = salary;
}
// getters (and the toString() used in the output below) omitted for brevity
}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;
public class Prac {
public static void main(String[] args) {
int salary=70000;
PayRoll payRoll=new PayRoll(1311, 1, "A", "HR", salary);
PayRoll payRoll2=new PayRoll(1411, 2 , "B", "Technical", salary);
PayRoll payRoll3=new PayRoll(1511, 1, "C", "HR", salary);
PayRoll payRoll4=new PayRoll(1611, 1, "D", "Technical", salary);
PayRoll payRoll5=new PayRoll(711, 3,"E", "Technical", salary);
PayRoll payRoll6=new PayRoll(1811, 3, "F", "Technical", salary);
List<PayRoll>list=new ArrayList<PayRoll>();
list.add(payRoll);
list.add(payRoll2);
list.add(payRoll3);
list.add(payRoll4);
list.add(payRoll5);
list.add(payRoll6);
Map<Object, Optional<PayRoll>> k = list.stream().collect(Collectors.groupingBy(p->p.getId()+"|"+p.getDept(),Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));
k.entrySet().forEach(p->
{
if(p.getValue().isPresent())
{
System.out.println(p.getValue().get());
}
});
}
}
Output:
PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]
Late to the party but I sometimes use this one-liner as an equivalent:
((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply
The expression is a Predicate<Value>, but since the map is inline, it works as a filter. This is of course less readable, but sometimes it can be helpful to avoid a separate method.
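A hypothetical usage, with String standing in for Value and its length as the key:
import java.util.HashSet;
import java.util.function.Function;
import java.util.stream.Stream;
Stream.of("a", "bb", "cc", "d")
    .filter(((Function<String, Integer>) String::length).andThen(new HashSet<>()::add)::apply)
    .forEach(System.out::println); // prints a and bb; cc and d have already-seen lengths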
Building on josketres's answer, I created a generic utility method:
You could make this more Java 8-friendly by creating a Collector.
public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
return input.stream()
.collect(toCollection(() -> new TreeSet<>(comparer)));
}
@Test
public void removeDuplicatesWithDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(7), new C(42), new C(42));
Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
assertEquals(2, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 7));
assertTrue(result.stream().anyMatch(c -> c.value == 42));
}
@Test
public void removeDuplicatesWithoutDuplicates() {
ArrayList<C> input = new ArrayList<>();
Collections.addAll(input, new C(1), new C(2), new C(3));
Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
assertEquals(3, result.size());
assertTrue(result.stream().anyMatch(c -> c.value == 1));
assertTrue(result.stream().anyMatch(c -> c.value == 2));
assertTrue(result.stream().anyMatch(c -> c.value == 3));
}
private class C {
public final int value;
private C(int value) {
this.value = value;
}
}
Maybe this will be useful for somebody. I had a slightly different requirement: given a list of objects A from a third party, remove all that have the same A.b field for the same A.id (multiple A objects with the same A.id in the list). The stream partition answer by Tagir Valeev inspired me to use a custom Collector that returns Map<A.id, List<A>>. A simple flatMap will do the rest.
public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
    return groupingBy(keyFunction, Collector.of(
            (Supplier<Map<K2, T>>) HashMap::new,
            (map, item) -> map.putIfAbsent(distinctFunction.apply(item), item),
            (left, right) -> {
                left.putAll(right);
                return left;
            },
            map -> new ArrayList<>(map.values()),
            Collector.Characteristics.UNORDERED));
}
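A hypothetical usage, assuming a class A with getId() and getB() accessors (names taken from the description above):
Map<Integer, List<A>> byId = list.stream()
        .collect(groupingDistinctBy(A::getId, A::getB)); // one A per distinct b within each id
List<A> deduplicated = byId.values().stream()
        .flatMap(List::stream) // the simple flatMap that does the rest
        .collect(Collectors.toList());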
I had a situation where I was supposed to get distinct elements from a list based on 2 keys.
If you want distinct based on two keys or a composite key, try this:
class Person{
int rollno;
String name;
}
List<Person> personList;
Function<Person, List<Object>> compositeKey = person ->
Arrays.<Object>asList(person.getName(), person.getRollno());
Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));
List<Map.Entry<Object, List<Person>>> duplicateEntries = map.entrySet().stream()
.filter(settingMap ->
settingMap.getValue().size() > 1)
.collect(Collectors.toList());
A variation of the top answer that handles null:
public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
}
In my tests:
assertEquals(
asList("a", "bb"),
Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));
assertEquals(
asList(5, null, 2, 3),
Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));
val maps = asList(
hashMapWith(0, 2),
hashMapWith(1, 2),
hashMapWith(2, null),
hashMapWith(3, 1),
hashMapWith(4, null),
hashMapWith(5, 2));
assertEquals(
asList(0, 2, 3),
maps.stream()
.filter(distinctBy(m -> m.get("val")))
.map(m -> m.get("i"))
.collect(toList()));
In my case, I needed to keep track of the previous element. I created a stateful Predicate where I checked whether the previous element was different from the current one, and only kept it in that case:
public List<Log> fetchLogById(Long id) {
return this.findLogById(id).stream()
.filter(new LogPredicate())
.collect(Collectors.toList());
}
public class LogPredicate implements Predicate<Log> {
private Log previous;
public boolean test(Log current) {
boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);
if (isDifferent) {
previous = current;
}
return isDifferent;
}
private boolean verifyIfDifferentLog(Log current, Log previous) {
return !current.getId().equals(previous.getId());
}
}
My solution is in this listing:
List<HolderEntry> result ....
List<HolderEntry> dto3s = new ArrayList<>(result.stream().collect(toMap(
HolderEntry::getId,
holder -> holder, //or Function.identity() if you want
(holder1, holder2) -> holder1
)).values());
In my situation, I wanted to find the distinct values and put them in a List.
