How can I implement a "partition" operation on a Java 8 Stream? By partition I mean dividing a stream into sub-streams of a given size. It would be similar to Guava's Iterators.partition() method, except that the partitions should preferably be lazily evaluated Streams rather than Lists.
It's impossible to partition an arbitrary source stream into fixed-size batches, because that would break parallel processing. When processing in parallel, you may not know how many elements end up in the first sub-task after the split, so you cannot create the partitions for the next sub-task until the first one is fully processed.
However, it is possible to create a stream of partitions from a random-access List. Such a feature is available, for example, in my StreamEx library:
List<Type> input = Arrays.asList(...);
Stream<List<Type>> stream = StreamEx.ofSubLists(input, partitionSize);
Or if you really want the stream of streams:
Stream<Stream<Type>> stream = StreamEx.ofSubLists(input, partitionSize).map(List::stream);
If you don't want to depend on third-party libraries, you can implement such an ofSubLists method manually:
public static <T> Stream<List<T>> ofSubLists(List<T> source, int length) {
    if (length <= 0)
        throw new IllegalArgumentException("length = " + length);
    int size = source.size();
    if (size <= 0)
        return Stream.empty();
    int fullChunks = (size - 1) / length;
    return IntStream.range(0, fullChunks + 1).mapToObj(
            n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
}
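To see the chunk boundaries this method produces, here is a small driver around it. It is a sketch: the class name `OfSubListsDemo` is made up for illustration, and the method body is the one above, unchanged apart from formatting.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class OfSubListsDemo {
    // Same logic as ofSubLists above: every chunk is a subList view of the source.
    static <T> Stream<List<T>> ofSubLists(List<T> source, int length) {
        if (length <= 0)
            throw new IllegalArgumentException("length = " + length);
        int size = source.size();
        if (size <= 0)
            return Stream.empty();
        // Index of the last chunk; computed without adding to size, so it cannot overflow.
        int fullChunks = (size - 1) / length;
        return IntStream.range(0, fullChunks + 1).mapToObj(
                n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        List<List<Integer>> chunks = ofSubLists(input, 4).collect(Collectors.toList());
        System.out.println(chunks); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

        // Stream of streams, as in the second StreamEx example
        long streams = ofSubLists(input, 4).map(List::stream).count();
        System.out.println(streams); // 3
    }
}
```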
This implementation looks a little long, but it takes corner cases into account, such as a list size close to Integer.MAX_VALUE.
If you want a parallel-friendly solution for an unordered stream (so you don't care which stream elements are combined into a single batch), you may use a collector like this (thanks to @sibnick for the inspiration):
public static <T, A, R> Collector<T, ?, R> unorderedBatches(int batchSize,
        Collector<List<T>, A, R> downstream) {
    class Acc {
        List<T> cur = new ArrayList<>();
        A acc = downstream.supplier().get();
    }
    BiConsumer<Acc, T> accumulator = (acc, t) -> {
        acc.cur.add(t);
        if(acc.cur.size() == batchSize) {
            downstream.accumulator().accept(acc.acc, acc.cur);
            acc.cur = new ArrayList<>();
        }
    };
    return Collector.of(Acc::new, accumulator,
            (acc1, acc2) -> {
                acc1.acc = downstream.combiner().apply(acc1.acc, acc2.acc);
                for(T t : acc2.cur) accumulator.accept(acc1, t);
                return acc1;
            }, acc -> {
                if(!acc.cur.isEmpty())
                    downstream.accumulator().accept(acc.acc, acc.cur);
                return downstream.finisher().apply(acc.acc);
            }, Collector.Characteristics.UNORDERED);
}
Usage example:
List<List<Integer>> list = IntStream.range(0,20)
.boxed().parallel()
.collect(unorderedBatches(3, Collectors.toList()));
Result:
[[2, 3, 4], [7, 8, 9], [0, 1, 5], [12, 13, 14], [17, 18, 19], [10, 11, 15], [6, 16]]
Such a collector is perfectly thread-safe and produces ordered batches for a sequential stream.
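The behaviour on a sequential stream can be checked with a small driver. This sketch copies the collector into a hypothetical `UnorderedBatchesDemo` class; for the parallel run it only checks invariants (no element is lost, and every batch except at most one is full), since which elements end up together depends on how the stream is split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class UnorderedBatchesDemo {
    // Copy of the unorderedBatches collector from the answer above.
    static <T, A, R> Collector<T, ?, R> unorderedBatches(int batchSize,
                                                         Collector<List<T>, A, R> downstream) {
        class Acc {
            List<T> cur = new ArrayList<>();
            A acc = downstream.supplier().get();
        }
        BiConsumer<Acc, T> accumulator = (acc, t) -> {
            acc.cur.add(t);
            if (acc.cur.size() == batchSize) {
                downstream.accumulator().accept(acc.acc, acc.cur);
                acc.cur = new ArrayList<>();
            }
        };
        return Collector.of(Acc::new, accumulator,
                (acc1, acc2) -> {
                    acc1.acc = downstream.combiner().apply(acc1.acc, acc2.acc);
                    // leftovers of the right part are re-added, so only acc1.cur can stay partial
                    for (T t : acc2.cur) accumulator.accept(acc1, t);
                    return acc1;
                }, acc -> {
                    if (!acc.cur.isEmpty())
                        downstream.accumulator().accept(acc.acc, acc.cur);
                    return downstream.finisher().apply(acc.acc);
                }, Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        // Sequential: batches follow encounter order, and only the last one may be short.
        List<List<Integer>> seq = IntStream.range(0, 8).boxed()
                .collect(unorderedBatches(3, Collectors.toList()));
        System.out.println(seq); // [[0, 1, 2], [3, 4, 5], [6, 7]]

        // Parallel: batch contents vary, but no element is lost or duplicated.
        List<List<Integer>> par = IntStream.range(0, 20).boxed().parallel()
                .collect(unorderedBatches(3, Collectors.toList()));
        System.out.println(par.stream().mapToInt(List::size).sum()); // 20
    }
}
```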
If you want to apply an intermediate transformation for every batch, you may use the following version:
public static <T, AA, A, B, R> Collector<T, ?, R> unorderedBatches(int batchSize,
        Collector<T, AA, B> batchCollector,
        Collector<B, A, R> downstream) {
    return unorderedBatches(batchSize,
            Collectors.mapping(list -> list.stream().collect(batchCollector), downstream));
}
For example, this way you can sum the numbers in every batch on the fly:
List<Integer> list = IntStream.range(0,20)
.boxed().parallel()
.collect(unorderedBatches(3, Collectors.summingInt(Integer::intValue),
Collectors.toList()));
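I found an elegant solution using Guava: Iterable<List<T>> parts = Iterables.partition(stream::iterator, size); Keep in mind that the resulting Iterable can be traversed only once, because iterator() can be called only once on a stream.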
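Without Guava, roughly the same lazy, sequential behaviour can be sketched with the JDK alone by wrapping the stream's iterator in another iterator that pulls at most `size` elements per call. `LazyPartition.partition` is a hypothetical helper, not an existing API; like the Guava version it reads the source only as the resulting stream is traversed, so it also works on an infinite stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazyPartition {
    // Hypothetical helper: groups consecutive elements into lists of at most `size`.
    static <T> Stream<List<T>> partition(Stream<T> source, int size) {
        if (size <= 0)
            throw new IllegalArgumentException("size = " + size);
        Iterator<T> it = source.iterator();
        Iterator<List<T>> batches = new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!it.hasNext())
                    throw new NoSuchElementException();
                List<T> batch = new ArrayList<>(size);
                // Pull at most `size` elements, only when this batch is requested.
                while (batch.size() < size && it.hasNext())
                    batch.add(it.next());
                return batch;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED), false)
                .onClose(source::close);
    }

    public static void main(String[] args) {
        // Works on an infinite source, because batches are built lazily.
        List<List<Integer>> firstTwo = partition(Stream.iterate(0, i -> i + 1), 3)
                .limit(2)
                .collect(Collectors.toList());
        System.out.println(firstTwo); // [[0, 1, 2], [3, 4, 5]]
    }
}
```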
Provided you use the Stream sequentially, it is possible to partition a Stream (as well as perform related operations such as windowing, which I think is what you really want in this case).
Two libraries that support partitioning for standard Streams are cyclops-react (I am the author) and jOOλ, which cyclops-react extends (to add functionality such as windowing).
cyclops-streams has a collection of static functions, StreamUtils, for operating on Java Streams, including a series of functions such as splitAt, headAndTail, splitBy and partition for partitioning.
To window a Stream into a Stream of nested Streams of size 30, you can use the window method.
To the OP's point, in streaming terms, splitting a Stream into multiple Streams of a given size is a windowing operation (rather than a partitioning operation).
Stream<Streamable<Integer>> streamOfStreams = StreamUtils.window(stream,30);
There is a Stream extension class called ReactiveSeq that extends jool.Seq and adds windowing functionality, which may make the code a little cleaner.
ReactiveSeq<Integer> seq;
ReactiveSeq<ListX<Integer>> streamOfLists = seq.grouped(30);
As Tagir points out above, though, this isn't suitable for parallel Streams. If you want to window or batch a Stream that you wish to execute in a multithreaded fashion, LazyFutureStream in cyclops-react might be useful (windowing is on the to-do list, but plain old batching is available now).
In this case data will be passed from the multiple threads executing the Stream to a Multi-Producer/Single-Consumer wait-free Queue and the sequential data from that queue can be windowed before being distributed to threads again.
Stream<List<Data>> batched = new LazyReact().range(0,1000)
.grouped(30)
.map(this::process);
It seems that, as Jon Skeet has shown in his comment, it's not possible to make the partitions lazy. For non-lazy partitions, I already have this code:
public static <T> Stream<Stream<T>> partition(Stream<T> source, int size) {
    final Iterator<T> it = source.iterator();
    // Guava: group the iterator into Lists, then turn each List into a Stream
    final Iterator<Stream<T>> partIt = Iterators.transform(Iterators.partition(it, size), List::stream);
    final Iterable<Stream<T>> iterable = () -> partIt;
    return StreamSupport.stream(iterable.spliterator(), false);
}
This is a pure Java solution that is evaluated lazily and works on a Stream instead of requiring a List.
public static <T> Stream<List<T>> partition(Stream<T> stream, int batchSize){
    List<List<T>> currentBatch = new ArrayList<List<T>>(); // single-element holder, so the current batch can be replaced
    currentBatch.add(new ArrayList<T>(batchSize));
    return Stream.concat(stream
            .sequential()
            .map(new Function<T, List<T>>(){
                public List<T> apply(T t){
                    currentBatch.get(0).add(t);
                    // emit the full batch (and start a new one), or null while it is still filling up
                    return currentBatch.get(0).size() == batchSize ? currentBatch.set(0, new ArrayList<>(batchSize)) : null;
                }
            }), Stream.generate(() -> currentBatch.get(0).isEmpty() ? null : currentBatch.get(0))
            .limit(1) // the trailing partial batch, if any
    ).filter(Objects::nonNull);
}
The method returns Stream<List<T>> for flexibility. You can convert it to Stream<Stream<T>> easily by partition(something, 10).map(List::stream).
The most elegant pure Java 8 solution to this problem that I found:
public static <T> List<List<T>> partition(final List<T> list, int batchSize) {
    return IntStream.range(0, getNumberOfPartitions(list, batchSize))
            .mapToObj(i -> list.subList(i * batchSize, Math.min((i + 1) * batchSize, list.size())))
            .collect(toList());
}
//https://stackoverflow.com/questions/23246983/get-the-next-higher-integer-value-in-java
private static <T> int getNumberOfPartitions(List<T> list, int batchSize) {
    return (list.size() + batchSize - 1) / batchSize;
}
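`(size + batchSize - 1) / batchSize` is integer ceiling division. It can overflow for list sizes close to `Integer.MAX_VALUE` (the corner case mentioned for `ofSubLists` earlier), whereas `(size - 1) / batchSize + 1` cannot. A small comparison, with made-up class and method names:

```java
class CeilDivDemo {
    // Ceiling division as in getNumberOfPartitions: size + batchSize - 1 may overflow.
    static int naive(int size, int batchSize) {
        return (size + batchSize - 1) / batchSize;
    }

    // Overflow-free variant; size == 0 is handled separately.
    static int safe(int size, int batchSize) {
        return size == 0 ? 0 : (size - 1) / batchSize + 1;
    }

    public static void main(String[] args) {
        System.out.println(naive(10, 4) + " " + safe(10, 4)); // 3 3
        System.out.println(naive(Integer.MAX_VALUE, 2));      // -1073741824 (overflow)
        System.out.println(safe(Integer.MAX_VALUE, 2));       // 1073741824
    }
}
```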
I think it is possible with a bit of a hack.
First, create a utility class for batches:
public static class ConcurrentBatch {
    private AtomicLong id = new AtomicLong();
    private int batchSize;
    public ConcurrentBatch(int batchSize) {
        this.batchSize = batchSize;
    }
    public long next() {
        return (id.getAndIncrement()) / batchSize;
    }
    public int getBatchSize() {
        return batchSize;
    }
}
Then the method:
// Assumes: import java.util.Map.Entry; and static imports of
// Collectors.groupingByConcurrent, Collectors.mapping and Collectors.toList
public static <T> void applyConcurrentBatchToStream(Consumer<List<T>> batchFunc, Stream<T> stream, int batchSize){
    ConcurrentBatch batch = new ConcurrentBatch(batchSize);
    // hack: extend the Java map and override computeIfAbsent, which groupingByConcurrent
    // calls internally for every element (an implementation detail, not a documented contract)
    Supplier<ConcurrentMap<Long, List<T>>> mapFactory = () -> new ConcurrentHashMap<Long, List<T>>() {
        @Override
        public List<T> computeIfAbsent(Long key, Function<? super Long, ? extends List<T>> mappingFunction) {
            List<T> rs = super.computeIfAbsent(key, mappingFunction);
            // apply batchFunc to old lists when a new batch list is created
            if(rs.isEmpty()){
                for(Entry<Long, List<T>> e : entrySet()) {
                    List<T> batchList = e.getValue();
                    // todo: need to improve
                    synchronized (batchList) {
                        if (batchList.size() == batch.getBatchSize()){
                            batchFunc.accept(batchList);
                            remove(e.getKey());
                            batchList.clear();
                        }
                    }
                }
            }
            return rs;
        }
    };
    stream.map(s -> new AbstractMap.SimpleEntry<>(batch.next(), s))
            .collect(groupingByConcurrent(AbstractMap.SimpleEntry::getKey, mapFactory, mapping(AbstractMap.SimpleEntry::getValue, toList())))
            .entrySet()
            .stream()
            // map contains only unprocessed lists (size < batchSize)
            .forEach(e -> batchFunc.accept(e.getValue()));
}
This is a performant approach:
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
public final class Partition<T> extends AbstractList<List<T>> {
    private final List<T> list;
    private final int chunkSize;
    public Partition(List<T> list, int chunkSize) {
        this.list = new ArrayList<>(list);
        this.chunkSize = chunkSize;
    }
    public static <T> Partition<T> ofSize(List<T> list, int chunkSize) {
        return new Partition<>(list, chunkSize);
    }
    @Override
    public List<T> get(int index) {
        // Check against size(): "start > end" alone would let index == size() through
        // (returning an empty list) when list.size() is a multiple of chunkSize
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("Index " + index + " is out of the list range <0," + (size() - 1) + ">");
        }
        int start = index * chunkSize;
        int end = Math.min(start + chunkSize, list.size());
        return new ArrayList<>(list.subList(start, end));
    }
    @Override
    public int size() {
        return (int) Math.ceil((double) list.size() / (double) chunkSize);
    }
}
Usage
Partition<String> partition = Partition.ofSize(paCustomerCodes, chunkSize);
for (List<String> strings : partition) {
    // process each chunk of at most chunkSize elements
}
Here is a pure Java 8 solution - both sequential and parallel:
public <T> Collection<List<T>> chunk(Collection<T> collection, int chunkSize) {
    final AtomicInteger index = new AtomicInteger();
    return collection.stream()
            .map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
            // LinkedHashMap is used here just to preserve order
            .collect(groupingBy(Entry::getKey, LinkedHashMap::new, mapping(Entry::getValue, toList())))
            .values();
}
public <T> Collection<List<T>> chunkParallel(Collection<T> collection, int chunkSize) {
    final AtomicInteger index = new AtomicInteger();
    return collection.parallelStream()
            .map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
            // In parallel, indices are handed out in whatever order the threads reach the counter,
            // so encounter order is not preserved. groupingBy already merges per-thread maps safely,
            // so ConcurrentHashMap is not strictly required here.
            .collect(groupingBy(Entry::getKey, ConcurrentHashMap::new, mapping(Entry::getValue, toList())))
            .values();
}
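A quick way to see what the parallel variant does and does not guarantee: every element still receives a unique index from the `AtomicInteger`, so the chunk sizes come out the same as in the sequential version, but which elements share a chunk depends on thread timing. This sketch copies both methods as static methods into a hypothetical `ChunkDemo` class.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

class ChunkDemo {
    static <T> Collection<List<T>> chunk(Collection<T> collection, int chunkSize) {
        final AtomicInteger index = new AtomicInteger();
        return collection.stream()
                .map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
                .collect(groupingBy(Entry::getKey, LinkedHashMap::new, mapping(Entry::getValue, toList())))
                .values();
    }

    static <T> Collection<List<T>> chunkParallel(Collection<T> collection, int chunkSize) {
        final AtomicInteger index = new AtomicInteger();
        return collection.parallelStream()
                .map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
                .collect(groupingBy(Entry::getKey, ConcurrentHashMap::new, mapping(Entry::getValue, toList())))
                .values();
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 10).boxed().collect(Collectors.toList());
        System.out.println(chunk(data, 4)); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

        // Parallel: sizes are still 4, 4 and 2, but the members of each chunk may vary per run.
        List<Integer> sizes = new ArrayList<>();
        for (List<Integer> c : chunkParallel(data, 4)) sizes.add(c.size());
        sizes.sort(null);
        System.out.println(sizes); // [2, 4, 4]
    }
}
```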
Here is a quick solution using abacus-common:
IntStream.range(0, Integer.MAX_VALUE).split(size).forEach(s -> N.println(s.toArray()));
Disclaimer: I'm the developer of abacus-common.
Related
Simplified Example
I have the following code, which generates the partial sums of a series, i.e. 1, 1+2, 1+2+3, 1+2+3+4:
public static void main(String[] args) {
    Stream<Integer> inputStream = Stream.of(1,2,3,4);
    Iterator<Integer> iterator = inputStream.iterator();
    Stream<Integer> outputStream = Stream.iterate(
            iterator.next(),
            i -> iterator.hasNext(),
            next -> {
                return iterator.next() + next;
            }
    );
    List<Integer> outputList = outputStream.collect(Collectors.toList());
    System.out.println(outputList);
}
But this prints: [1, 3, 6], missing the last element.
Working example but needs Atomic Variable
Note that this seems to perform the check I want, but is there a better solution? It looks awful:
public static void main(String[] args) {
    Stream<Integer> inputStream = Stream.of(1,2,3,4);
    Iterator<Integer> iterator = inputStream.iterator();
    AtomicBoolean check = new AtomicBoolean(true);
    Stream<Integer> outputStream = Stream.iterate(
            iterator.next(),
            i -> check.get(),
            next -> {
                check.set(iterator.hasNext());
                return iterator.hasNext() ? iterator.next() + next : next;
            }
    );
    List<Integer> outputList = outputStream.collect(Collectors.toList());
    System.out.println(outputList);
}
Generic Problem description
Here's generic code illustrating the problem:
public static <O, I> Stream<O> iterate(O seed, Stream<I> stream, BiFunction<I,O,O> function) {
    return iterate(seed, stream.iterator(), function);
}
public static <O, I> Stream<O> iterate(O seed, Iterator<I> iterator, BiFunction<I,O,O> function) {
    AtomicBoolean hasNext = new AtomicBoolean(true);
    return Stream.iterate(
            seed,
            i -> hasNext.get(),
            next -> {
                hasNext.set(iterator.hasNext());
                return iterator.hasNext() ? function.apply(iterator.next(), next) : next;
            }
    );
}
public static void main(String[] args) {
    Stream<Integer> inputStream = Stream.of(2,3,4);
    BiFunction<Integer, Integer, Integer> f = Integer::sum;
    Stream<Integer> outputStream = iterate(1, inputStream, f);
    List<Integer> outputList = outputStream.collect(Collectors.toList());
    System.out.println(outputList);
}
Problem Context
Basically, I want to do this because I am creating a function that produces a forecast of the balance of an interest-bearing account.
I want to be able to take a stream of dates and then produce a stream of balances. That way you don't need to know how many elements there will be, or even the distribution of dates, which makes it a more flexible approach.
Also note that each element of the Stream depends on the previous one. This is why I have a seed, which represents the first value (it has no previous value); this would be the opening balance.
... but is there a better solution?
Yes, an elegant solution is to use Arrays#parallelPrefix.
public class Main {
    public static void main(String args[]) {
        int[] arr = { 1, 2, 3, 4 };
        Arrays.parallelPrefix(arr, Integer::sum);
        System.out.println(Arrays.toString(arr));
    }
}
Output:
[1, 3, 6, 10]
You can always convert back and forth between Stream<Integer> and int[] as per your requirement.
public class Main {
    public static void main(String args[]) {
        int[] arr = Stream.of(1, 2, 3, 4).mapToInt(Integer::intValue).toArray();
        Arrays.parallelPrefix(arr, Integer::sum);
        System.out.println(Arrays.toString(arr));
        // In case you need a Stream<Integer> again
        Stream<Integer> resultStream = Arrays.stream(arr).boxed();
        // Or want the result as a List<Integer> (Stream.toList() requires Java 16+)
        List<Integer> resultList = resultStream.toList();
        System.out.println(resultList);
    }
}
The problem is that the three arguments to Stream.iterate(…,…,…) correspond to the three arguments to a for(…;…;…) … statement, where the loop’s body corresponds to the chained stream operations. But this doesn’t match an iterator loop, which looks like
for(Iterator<I> iterator = …; iterator.hasNext(); ) {
I elements = iterator.next();
…
}
In other words, the right place of the next() call would be after the hasNext() check, before the remaining loop body, whereas the function passed as third argument to Stream.iterate(…,…,…) is evaluated after the loop body equivalent, before the hasNext() check.
A simple solution would be
public static <O, I> Stream<O> iterate(
        O seed, Iterator<I> iterator, BiFunction<I,O,O> function) {
    return Stream.iterate(seed, Objects::nonNull,
            prev -> iterator.hasNext()? function.apply(iterator.next(), prev): null);
}
which effectively moves the hasNext() check before the next() call. But it requires that null will never occur as a normal result of the function evaluation, which should be the case for all arithmetic operations.
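A short usage example of this `Objects::nonNull` variant, reproducing the partial sums from the question with 1 as the seed (the "opening balance"). The class name `PrefixIterateDemo` is made up, and the code needs Java 9 or later for the three-argument `Stream.iterate`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PrefixIterateDemo {
    // The Objects::nonNull variant: null from the function marks the end of the input.
    static <O, I> Stream<O> iterate(O seed, Iterator<I> iterator, BiFunction<I, O, O> function) {
        return Stream.iterate(seed, Objects::nonNull,
                prev -> iterator.hasNext() ? function.apply(iterator.next(), prev) : null);
    }

    public static void main(String[] args) {
        // Seed 1, then add 2, 3 and 4 in turn.
        List<Integer> sums = iterate(1, List.of(2, 3, 4).iterator(), Integer::sum)
                .collect(Collectors.toList());
        System.out.println(sums); // [1, 3, 6, 10]
    }
}
```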
Otherwise, you would have to go one level deeper to implement a general-purpose stream operation:
public static <O, I> Stream<O> iterate(
        O seed, Stream<I> stream, BiFunction<I,O,O> function) {
    boolean parallel = stream.isParallel();
    Spliterator<I> sp = stream.spliterator();
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<O>(
            sp.estimateSize() == Long.MAX_VALUE? Long.MAX_VALUE: sp.estimateSize() + 1,
            sp.characteristics() & Spliterator.SIZED | Spliterator.ORDERED) {
        O value = seed;
        final Consumer<I> c = i -> value = function.apply(i, value);
        boolean end;
        @Override
        public boolean tryAdvance(Consumer<? super O> action) {
            if(end) return false;
            O current = value;
            end = !sp.tryAdvance(c);
            action.accept(current);
            return true;
        }
    }, parallel).onClose(stream::close);
}
This is more complicated, but it places no constraints on the values produced by the function and is more efficient than going through an Iterator. Note that this solution also takes care to retain the current parallel setting and ensures that closing the stream is delegated to the original stream, which might be important when it is backed by a resource, like Files.lines, etc.
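The close delegation can be observed by registering a close handler on the source stream. A sketch with the Spliterator-based method copied into a made-up `SpliteratorIterateDemo` class:

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class SpliteratorIterateDemo {
    static <O, I> Stream<O> iterate(O seed, Stream<I> stream, BiFunction<I, O, O> function) {
        boolean parallel = stream.isParallel();
        Spliterator<I> sp = stream.spliterator();
        return StreamSupport.stream(new Spliterators.AbstractSpliterator<O>(
                sp.estimateSize() == Long.MAX_VALUE ? Long.MAX_VALUE : sp.estimateSize() + 1,
                sp.characteristics() & Spliterator.SIZED | Spliterator.ORDERED) {
            O value = seed;
            final Consumer<I> c = i -> value = function.apply(i, value);
            boolean end;

            @Override
            public boolean tryAdvance(Consumer<? super O> action) {
                if (end) return false;
                O current = value;          // emit the value computed so far
                end = !sp.tryAdvance(c);    // then fold in the next input, if any
                action.accept(current);
                return true;
            }
        }, parallel).onClose(stream::close);
    }

    public static void main(String[] args) {
        AtomicBoolean sourceClosed = new AtomicBoolean();
        Stream<Integer> source = Stream.of(2, 3, 4).onClose(() -> sourceClosed.set(true));
        // try-with-resources closes the derived stream, which delegates to the source
        try (Stream<Integer> sums = iterate(1, source, Integer::sum)) {
            System.out.println(sums.collect(Collectors.toList())); // [1, 3, 6, 10]
        }
        System.out.println(sourceClosed.get()); // true
    }
}
```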
Try this:
AtomicInteger sum = new AtomicInteger(0);
List<Integer> result = Stream.of(1, 2, 3, 4).map(it -> sum.addAndGet(it)).toList();
result.forEach(System.out::print);
Here's a generic code illustrating the problem.
public static <O, I> Stream<O> iterate(O seed, Stream<I> stream,
BiFunction<I,O,O> function) {
return iterate(seed, stream.iterator(), function);
}
I want to be able to take a stream of dates and then produce a stream of balances. That way you don't need to know how many elements there will be, or even the distribution of dates, which makes it a more flexible approach.
I've written the following solution based on the assumptions that it's possible to merge the balances produced in different threads, that the function BiFunction<I,O,O> is associative, and that a merging function BinaryOperator<O> is available to combine the resulting values, i.e. balances.
Also, the value of seed should meet the same requirements that are imposed on the identity of the Stream.reduce() operation (otherwise a parallel stream would yield an incorrect result), i.e.
merger.apply(r, mapper.apply(t, seed)) == mapper.apply(t, r)
This would not hold if seed were 1 and both mapper and merger were Integer::sum, but with multiplication ((t, r) -> t * r) as both mapper and merger and a seed of 1, the condition would be met.
Note that if at least one of these requirements is not possible to fulfil, streams are not the right tool for this problem.
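The identity condition can be spot-checked numerically. The sketch below (hypothetical class name) compares `merger.apply(r, mapper.apply(t, seed))` with `mapper.apply(t, r)` on a small grid of values, passing the stream element as the mapper's first argument to match `BiFunction<T, R, R>`. Passing a sample like this does not prove the property, but a single failure is enough to rule a seed out.

```java
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

class IdentityCheckDemo {
    // Checks merger(r, mapper(t, seed)) == mapper(t, r) for r, t in [-3, 3].
    static boolean holds(int seed, BiFunction<Integer, Integer, Integer> mapper,
                         BinaryOperator<Integer> merger) {
        for (int r = -3; r <= 3; r++)
            for (int t = -3; t <= 3; t++)
                if (!merger.apply(r, mapper.apply(t, seed)).equals(mapper.apply(t, r)))
                    return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(0, Integer::sum, Integer::sum));       // true
        System.out.println(holds(1, Integer::sum, Integer::sum));       // false
        System.out.println(holds(1, (t, r) -> t * r, (a, b) -> a * b)); // true
    }
}
```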
Here's how it might be implemented using a custom Collector. To define a Collector we can use static method Collector.of().
public static <R, T> Stream<R> iterate(R seed, Stream<T> stream,
                                       BiFunction<T, R, R> mapper,
                                       BinaryOperator<R> merger) {
    return stream
            .collect(sequenceCollector(seed, mapper, merger));
}
public static <R, T> Collector<T, ?, Stream<R>> sequenceCollector(R seed,
                                                                  BiFunction<T, R, R> mapper,
                                                                  BinaryOperator<R> merger) {
    return Collector.of(
            ArrayDeque::new,
            (Deque<R> deque, T next) ->
                    deque.add(
                            mapper.apply(next, Objects.requireNonNullElse(deque.peekLast(), seed))
                    ),
            (left, right) -> {
                // an empty left part (e.g. after a filter) has no last value to carry over
                if (left.isEmpty()) return right;
                R last = left.getLast();
                right.forEach(next -> left.add(merger.apply(next, last)));
                return left;
            },
            Collection::stream
    );
}
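An empty left-hand container is easy to hit in parallel, for example when a filter removes every element of some sub-tasks; without an `isEmpty()` check, `left.getLast()` would throw `NoSuchElementException` there. The sketch below exercises that case with such a check in place. `SequenceCollectorDemo` is a made-up name, and `Objects.requireNonNullElse` needs Java 9+.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Objects;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SequenceCollectorDemo {
    static <R, T> Collector<T, ?, Stream<R>> sequenceCollector(R seed,
                                                               BiFunction<T, R, R> mapper,
                                                               BinaryOperator<R> merger) {
        return Collector.of(
                ArrayDeque::new,
                (Deque<R> deque, T next) ->
                        deque.add(mapper.apply(next, Objects.requireNonNullElse(deque.peekLast(), seed))),
                (left, right) -> {
                    if (left.isEmpty()) return right; // nothing to carry over
                    R last = left.getLast();
                    right.forEach(next -> left.add(merger.apply(next, last)));
                    return left;
                },
                Collection::stream);
    }

    public static void main(String[] args) {
        List<Integer> sequential = Stream.of(1, 2, 3, 4)
                .collect(SequenceCollectorDemo.<Integer, Integer>sequenceCollector(0, Integer::sum, Integer::sum))
                .collect(Collectors.toList());
        System.out.println(sequential); // [1, 3, 6, 10]

        // Most parallel sub-tasks are empty after this filter.
        List<Integer> parallel = IntStream.range(0, 100).boxed().parallel()
                .filter(i -> i >= 95)
                .collect(SequenceCollectorDemo.<Integer, Integer>sequenceCollector(0, Integer::sum, Integer::sum))
                .collect(Collectors.toList());
        System.out.println(parallel); // [95, 191, 288, 386, 485]
    }
}
```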
main()
public static void main(String[] args) {
    Stream<Integer> input1 = Stream.of(1, 2, 3, 4);
    Stream<Integer> output1 = iterate(0, input1, Integer::sum, Integer::sum);
    List<Integer> outputList1 = output1.toList();
    System.out.println("Sequential: " + outputList1);
    Stream<Integer> input2 = Stream.of(1, 2, 3, 4).parallel();
    Stream<Integer> output2 = iterate(0, input2, Integer::sum, Integer::sum);
    List<Integer> outputList2 = output2.toList();
    System.out.println("Parallel: " + outputList2);
}
Output:
Sequential: [1, 3, 6, 10]
Parallel: [1, 3, 6, 10]
I'm trying to split a list into a list of lists where each list has a maximum size of 4.
I would like to know how this can be done using lambdas.
Currently, the way I'm doing it is as follows:
List<List<Object>> listOfList = new ArrayList<>();
final int MAX_ROW_LENGTH = 4;
int startIndex = 0;
// '<' rather than '<=': otherwise an empty trailing list is added when the size is a multiple of MAX_ROW_LENGTH
while (startIndex < listToSplit.size())
{
    int endIndex = ((startIndex + MAX_ROW_LENGTH) < listToSplit.size()) ? startIndex + MAX_ROW_LENGTH : listToSplit.size();
    listOfList.add(new ArrayList<>(listToSplit.subList(startIndex, endIndex)));
    startIndex = startIndex + MAX_ROW_LENGTH;
}
UPDATE
It seems that there isn't a simple way to use lambdas to split lists. While all of the answers are much appreciated, they're also a wonderful example of when lambdas do not simplify things.
Try this approach:
static <T> List<List<T>> listSplitter(List<T> incoming, int size) {
    // add validation if needed
    return incoming.stream()
            .collect(Collector.of(
                    ArrayList::new,
                    (accumulator, item) -> {
                        if(accumulator.isEmpty()) {
                            accumulator.add(new ArrayList<>(singletonList(item)));
                        } else {
                            List<T> last = accumulator.get(accumulator.size() - 1);
                            if(last.size() == size) {
                                accumulator.add(new ArrayList<>(singletonList(item)));
                            } else {
                                last.add(item);
                            }
                        }
                    },
                    // only used by parallel streams; plain concatenation can leave a short list
                    // in the middle, so this collector is intended for sequential use
                    (li1, li2) -> {
                        li1.addAll(li2);
                        return li1;
                    }
            ));
}
System.out.println(
        listSplitter(
                Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                4
        )
);
Also note that this code could be optimized. Instead of:
new ArrayList<>(Collections.singletonList(item))
use this one (inside a small helper method):
List<T> newList = new ArrayList<>(size);
newList.add(item);
return newList;
If you REALLY need a lambda it can be done like this. Otherwise the previous answers are better.
List<List<Object>> lists = new ArrayList<>();
AtomicInteger counter = new AtomicInteger();
final int MAX_ROW_LENGTH = 4;
listToSplit.forEach(pO -> {
    if(counter.getAndIncrement() % MAX_ROW_LENGTH == 0) {
        lists.add(new ArrayList<>());
    }
    lists.get(lists.size()-1).add(pO);
});
Surely the following is sufficient:
final List<List<Object>> listOfList = new ArrayList<>(
        listToSplit.stream()
                .collect(Collectors.groupingBy(el -> listToSplit.indexOf(el) / MAX_ROW_LENGTH))
                .values()
);
Stream it and collect with a grouping: this gives a Map<Integer, List<Object>>. Pull the values of the map and pass them directly into whatever constructor you need (map.values() gives a Collection, not a List).
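One caveat with the indexOf-based grouping: indexOf returns the index of the first equal element, so duplicates all land in the same group, and since indexOf is a linear scan the whole grouping is O(n²). A small check (the class name is hypothetical, and `MAX_ROW_LENGTH` is set to 2 here only to keep the output short):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class IndexOfGroupingDemo {
    static final int MAX_ROW_LENGTH = 2;

    // The indexOf-based grouping from the answer above.
    static <T> List<List<T>> split(List<T> listToSplit) {
        return new ArrayList<>(
                listToSplit.stream()
                        .collect(Collectors.groupingBy(el -> listToSplit.indexOf(el) / MAX_ROW_LENGTH))
                        .values());
    }

    public static void main(String[] args) {
        System.out.println(split(List.of("a", "b", "c", "d", "e")).size()); // 3
        // indexOf("x") is always 0, so all four elements end up in group 0.
        System.out.println(split(List.of("x", "x", "x", "x")));            // [[x, x, x, x]]
    }
}
```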
Perhaps you can use something like that
BiFunction<List,Integer,List> splitter= (list2, count)->{
    //temporary list of lists
    List<List> listOfLists=new ArrayList<>();
    //helper implicit recursive function
    BiConsumer<Integer,BiConsumer> splitterHelper = (offset, func) -> {
        if(list2.size()> offset+count){
            listOfLists.add(list2.subList(offset,offset+count));
            //implicit self call
            func.accept(offset+count,func);
        }
        else if(list2.size()>offset){
            listOfLists.add(list2.subList(offset,list2.size()));
            //implicit self call
            func.accept(offset+count,func);
        }
    };
    //pass self reference
    splitterHelper.accept(0,splitterHelper);
    return listOfLists;
};
Usage example
List<Integer> list=new ArrayList<Integer>(){{
add(1);
add(2);
add(3);
add(4);
add(5);
add(6);
add(7);
add(8);
add(8);
}};
//calling splitter function
List listOfLists = splitter.apply(list, 3 /*max sublist size*/);
System.out.println(listOfLists);
And as a result we have
[[1, 2, 3], [4, 5, 6], [7, 8, 8]]
The requirement is a bit odd, but you could do:
final int[] counter = new int[] {0};
List<List<Object>> listOfLists = in.stream()
.collect(Collectors.groupingBy( x -> counter[0]++ / MAX_ROW_LENGTH ))
.entrySet().stream()
.sorted(Map.Entry.comparingByKey())
.map(Map.Entry::getValue)
.collect(Collectors.toList());
You could probably streamline this by using the variant of groupingBy that takes a mapSupplier lambda, and supplying a SortedMap. This should return an EntrySet that iterates in order. I leave it as an exercise.
What we're doing here is:
Collecting your list items into a Map<Integer, List<Object>>, using a counter to group them. The counter is held in a single-element array because a lambda can only use local variables that are (effectively) final.
Getting the map entries as a stream, and sorting by the Integer key.
Using Stream::map() to convert the stream of Map.Entry<Integer, List<Object>> into a stream of List<Object> values.
Collecting this into a list.
This doesn't benefit from any "free" parallelisation. It has a memory overhead in the intermediate Map. It's not particularly easy to read.
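The SortedMap variant left as an exercise above could look like the following sketch: passing `TreeMap::new` as the map factory keeps the keys sorted, so the separate `entrySet().stream().sorted(...)` step disappears. The class name is made up, and, as in the original, the shared counter makes this suitable for sequential streams only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedGroupingDemo {
    static final int MAX_ROW_LENGTH = 4;

    static <T> List<List<T>> split(List<T> in) {
        final int[] counter = {0}; // sequential streams only: the counter is shared state
        TreeMap<Integer, List<T>> groups = in.stream()
                .collect(Collectors.groupingBy(x -> counter[0]++ / MAX_ROW_LENGTH,
                        TreeMap::new, Collectors.toList()));
        // TreeMap.values() iterates in ascending key order
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        System.out.println(split(List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)));
        // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    }
}
```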
However, I wouldn't do this, just for the sake of using a lambda. I would do something like:
for (int i = 0; i < in.size(); i += MAX_ROW_LENGTH) {
    listOfList.add(
            in.subList(i, Math.min(i + MAX_ROW_LENGTH, in.size())));
}
(Yours had a defensive copy new ArrayList<>(listToSplit.subList(...)). I've not duplicated it because it's not always necessary - for example if the input list is unmodifiable and the output lists aren't intended to be modifiable. But do put it back in if you decide you need it in your case.)
This will be extremely fast on any in-memory list. You're very unlikely to want to parallelise it.
Alternatively, you could write your own (unmodifiable) implementation of List that's a view over the underlying List<Object>:
public class PartitionedList<T> extends AbstractList<List<T>> {
    private final List<T> source;
    private final int sublistSize;
    public PartitionedList(List<T> source, int sublistSize) {
        this.source = source;
        this.sublistSize = sublistSize;
    }
    @Override
    public int size() {
        // round up, so that a shorter final sublist is counted too
        return (source.size() + sublistSize - 1) / sublistSize;
    }
    @Override
    public List<T> get(int index) {
        if (index < 0 || index >= size())
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size());
        int sourceIndex = index * sublistSize;
        return source.subList(sourceIndex,
                Math.min(sourceIndex + sublistSize, source.size()));
    }
}
Again, it's up to you whether you want to make defensive copies here.
This will have big-O access time equivalent to that of the underlying list.
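Whether to make defensive copies matters because subList returns a view backed by the original list: non-structural changes such as set show through the view, and structural changes (adding or removing elements) to the source make the view unusable (ArrayList's view then throws ConcurrentModificationException). A copy is an independent snapshot. A minimal illustration with a made-up class name:

```java
import java.util.ArrayList;
import java.util.List;

class ViewVsCopyDemo {
    // Returns [view, copy] of the first three elements after source.set(0, 99).
    static List<List<Integer>> viewAndCopyAfterSet() {
        List<Integer> source = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6));
        List<Integer> view = source.subList(0, 3);                  // backed by source
        List<Integer> copy = new ArrayList<>(source.subList(0, 3)); // independent snapshot
        source.set(0, 99); // non-structural change, so the view remains usable
        return List.of(view, copy);
    }

    public static void main(String[] args) {
        List<List<Integer>> result = viewAndCopyAfterSet();
        System.out.println("view: " + result.get(0)); // view: [99, 2, 3]
        System.out.println("copy: " + result.get(1)); // copy: [1, 2, 3]
    }
}
```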
You can use:
ListUtils.partition(List list, int size)
OR
List<List> partition(List list, int size)
Both return consecutive sublists of a list, each of the same size (the final list may be smaller).
How to implement "partition" operation on Java 8 Stream? By partition I mean, divide a stream into sub-streams of a given size. Somehow it will be identical to Guava Iterators.partition() method, just it's desirable that the partitions are lazily-evaluated Streams rather than List's.
It's impossible to partition the arbitrary source stream to the fixed size batches, because this will screw up the parallel processing. When processing in parallel you may not know how many elements in the first sub-task after the split, so you cannot create the partitions for the next sub-task until the first is fully processed.
However it is possible to create the stream of partitions from the random access List. Such feature is available, for example, in my StreamEx library:
List<Type> input = Arrays.asList(...);
Stream<List<Type>> stream = StreamEx.ofSubLists(input, partitionSize);
Or if you really want the stream of streams:
Stream<Stream<Type>> stream = StreamEx.ofSubLists(input, partitionSize).map(List::stream);
If you don't want to depend on third-party libraries, you can implement such ofSubLists method manually:
public static <T> Stream<List<T>> ofSubLists(List<T> source, int length) {
if (length <= 0)
throw new IllegalArgumentException("length = " + length);
int size = source.size();
if (size <= 0)
return Stream.empty();
int fullChunks = (size - 1) / length;
return IntStream.range(0, fullChunks + 1).mapToObj(
n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
}
This implementation looks a little bit long, but it takes into account some corner cases like close-to-MAX_VALUE list size.
If you want parallel-friendly solution for unordered stream (so you don't care which stream elements will be combined in single batch), you may use the collector like this (thanks to #sibnick for inspiration):
public static <T, A, R> Collector<T, ?, R> unorderedBatches(int batchSize,
Collector<List<T>, A, R> downstream) {
class Acc {
List<T> cur = new ArrayList<>();
A acc = downstream.supplier().get();
}
BiConsumer<Acc, T> accumulator = (acc, t) -> {
acc.cur.add(t);
if(acc.cur.size() == batchSize) {
downstream.accumulator().accept(acc.acc, acc.cur);
acc.cur = new ArrayList<>();
}
};
return Collector.of(Acc::new, accumulator,
(acc1, acc2) -> {
acc1.acc = downstream.combiner().apply(acc1.acc, acc2.acc);
for(T t : acc2.cur) accumulator.accept(acc1, t);
return acc1;
}, acc -> {
if(!acc.cur.isEmpty())
downstream.accumulator().accept(acc.acc, acc.cur);
return downstream.finisher().apply(acc.acc);
}, Collector.Characteristics.UNORDERED);
}
Usage example:
List<List<Integer>> list = IntStream.range(0,20)
.boxed().parallel()
.collect(unorderedBatches(3, Collectors.toList()));
Result:
[[2, 3, 4], [7, 8, 9], [0, 1, 5], [12, 13, 14], [17, 18, 19], [10, 11, 15], [6, 16]]
Such collector is perfectly thread-safe and produces ordered batches for sequential stream.
If you want to apply an intermediate transformation for every batch, you may use the following version:
public static <T, AA, A, B, R> Collector<T, ?, R> unorderedBatches(int batchSize,
Collector<T, AA, B> batchCollector,
Collector<B, A, R> downstream) {
return unorderedBatches(batchSize,
Collectors.mapping(list -> list.stream().collect(batchCollector), downstream));
}
For example, this way you can sum the numbers in every batch on the fly:
List<Integer> list = IntStream.range(0,20)
.boxed().parallel()
.collect(unorderedBatches(3, Collectors.summingInt(Integer::intValue),
Collectors.toList()));
I found an elegant solution: Iterable parts = Iterables.partition(stream::iterator, size)
Provided you want to use the Stream sequentially, it is possible to partition a Stream (as well as perform related functions such as windowing - which I think is what you really want in this case).
Two libraries that will support partitoning for standard Streams are cyclops-react (I am the author) and jOOλ which cyclops-react extends (to add functionality such as Windowing).
cyclops-streams has a collection of static functions StreamUtils for operating on Java Streams, and a series of functions such as splitAt, headAndTail, splitBy, partition for partitioning.
To window a Stream into a Stream of nested Streams of size 30 you can use the window method.
To the OPs point, in Streaming terms, splitting a Stream into multiple Streams of a given size is a Windowing operation (rather than a Partitioning operation).
Stream<Streamable<Integer>> streamOfStreams = StreamUtils.window(stream,30);
There is a Stream extension class called ReactiveSeq that extends jool.Seq and adds Windowing functionality, that may make the code a little cleaner.
ReactiveSeq<Integer> seq;
ReactiveSeq<ListX<Integer>> streamOfLists = seq.grouped(30);
As Tagir points out above though, this isn't suitable for parallel Streams. If you want to window or batch a Stream you wish to executed in a multithreaded fashion. LazyFutureStream in cyclops-reactmight be useful (Windowing is on the to-do list, but plain old batching is available now).
In this case data will be passed from the multiple threads executing the Stream to a Multi-Producer/Single-Consumer wait-free Queue and the sequential data from that queue can be windowed before being distributed to threads again.
Stream<List<Data>> batched = new LazyReact().range(0,1000)
.grouped(30)
.map(this::process);
It seem like, as Jon Skeet has shown in his comment, it's not possible to make partitions lazy. For non-lazy partitions, I already have this code:
public static <T> Stream<Stream<T>> partition(Stream<T> source, int size) {
final Iterator<T> it = source.iterator();
final Iterator<Stream<T>> partIt = Iterators.transform(Iterators.partition(it, size), List::stream);
final Iterable<Stream<T>> iterable = () -> partIt;
return StreamSupport.stream(iterable.spliterator(), false);
}
This is a pure Java solution that's evaluated lazily instead of using List.
public static <T> Stream<List<T>> partition(Stream<T> stream, int batchSize){
List<List<T>> currentBatch = new ArrayList<List<T>>(); //just to make it mutable
currentBatch.add(new ArrayList<T>(batchSize));
return Stream.concat(stream
.sequential()
.map(new Function<T, List<T>>(){
public List<T> apply(T t){
currentBatch.get(0).add(t);
return currentBatch.get(0).size() == batchSize ? currentBatch.set(0,new ArrayList<>(batchSize)): null;
}
}), Stream.generate(()->currentBatch.get(0).isEmpty()?null:currentBatch.get(0))
.limit(1)
).filter(Objects::nonNull);
}
The method returns Stream<List<T>> for flexibility. You can convert it to Stream<Stream<T>> easily by partition(something, 10).map(List::stream).
The most elegant and pure java 8 solution for this problem i found:
public static <T> List<List<T>> partition(final List<T> list, int batchSize) {
return IntStream.range(0, getNumberOfPartitions(list, batchSize))
.mapToObj(i -> list.subList(i * batchSize, Math.min((i + 1) * batchSize, list.size())))
.collect(toList());
}
//https://stackoverflow.com/questions/23246983/get-the-next-higher-integer-value-in-java
private static <T> int getNumberOfPartitions(List<T> list, int batchSize) {
return (list.size() + batchSize- 1) / batchSize;
}
I think it is possible with some sort of hack inside.
Create a utility class for the batch:
public static class ConcurrentBatch {
private AtomicLong id = new AtomicLong();
private int batchSize;
public ConcurrentBatch(int batchSize) {
this.batchSize = batchSize;
}
public long next() {
return (id.getAndIncrement()) / batchSize;
}
public int getBatchSize() {
return batchSize;
}
}
and method:
public static <T> void applyConcurrentBatchToStream(Consumer<List<T>> batchFunc, Stream<T> stream, int batchSize){
ConcurrentBatch batch = new ConcurrentBatch(batchSize);
//hack java map: extends and override computeIfAbsent
Supplier<ConcurrentMap<Long, List<T>>> mapFactory = () -> new ConcurrentHashMap<Long, List<T>>() {
@Override
public List<T> computeIfAbsent(Long key, Function<? super Long, ? extends List<T>> mappingFunction) {
List<T> rs = super.computeIfAbsent(key, mappingFunction);
//apply batchFunc to old lists, when new batch list is created
if(rs.isEmpty()){
for(Entry<Long, List<T>> e : entrySet()) {
List<T> batchList = e.getValue();
//todo: need to improve
synchronized (batchList) {
if (batchList.size() == batch.getBatchSize()){
batchFunc.accept(batchList);
remove(e.getKey());
batchList.clear();
}
}
}
}
return rs;
}
};
stream.map(s -> new AbstractMap.SimpleEntry<>(batch.next(), s))
.collect(groupingByConcurrent(AbstractMap.SimpleEntry::getKey, mapFactory, mapping(AbstractMap.SimpleEntry::getValue, toList())))
.entrySet()
.stream()
//map contains only unprocessed lists (size<batchSize)
.forEach(e -> batchFunc.accept(e.getValue()));
}
This is a performant way
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
public final class Partition<T> extends AbstractList<List<T>> {
private final List<T> list;
private final int chunkSize;
public Partition(List<T> list, int chunkSize) {
this.list = new ArrayList<>(list);
this.chunkSize = chunkSize;
}
public static <T> Partition<T> ofSize(List<T> list, int chunkSize) {
return new Partition<>(list, chunkSize);
}
@Override
public List<T> get(int index) {
int start = index * chunkSize;
int end = Math.min(start + chunkSize, list.size());
if (index < 0 || index >= size()) {
throw new IndexOutOfBoundsException("Index " + index + " is out of the list range <0," + (size() - 1) + ">");
}
return new ArrayList<>(list.subList(start, end));
}
@Override
public int size() {
return (int) Math.ceil((double) list.size() / (double) chunkSize);
}
}
Usage:
Partition<String> partition = Partition.ofSize(paCustomerCodes, chunkSize);
for (List<String> strings : partition) {
// process each chunk of at most chunkSize elements
}
Here is a pure Java 8 solution - both sequential and parallel:
public <T> Collection<List<T>> chunk(Collection<T> collection, int chunkSize) {
final AtomicInteger index = new AtomicInteger();
return collection.stream()
.map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
// LinkedHashMap is used here just to preserve order
.collect(groupingBy(Entry::getKey, LinkedHashMap::new, mapping(Entry::getValue, toList())))
.values();
}
public <T> Collection<List<T>> chunkParallel(Collection<T> collection, int chunkSize) {
final AtomicInteger index = new AtomicInteger();
return collection.parallelStream()
.map(v -> new SimpleImmutableEntry<>(index.getAndIncrement() / chunkSize, v))
// In parallel, encounter order cannot be preserved, and which elements end up in the same chunk is arbitrary.
// The AtomicInteger keeps the indices unique across threads; ConcurrentHashMap is used as the result map.
.collect(groupingBy(Entry::getKey, ConcurrentHashMap::new, mapping(Entry::getValue, toList())))
.values();
}
Here is a quick solution using abacus-common:
IntStream.range(0, Integer.MAX_VALUE).split(size).forEach(s -> N.println(s.toArray()));
Disclaimer: I'm the developer of abacus-common.
General question: What's the proper way to reverse a stream? Assuming that we don't know what type of elements that stream consists of, what's the generic way to reverse any stream?
Specific question:
IntStream provides the range method to generate integers in a specific range, IntStream.range(-range, 0). Now that I want to reverse it, switching the range from 0 to negative won't work, and I also can't use Integer::compare:
List<Integer> list = Arrays.asList(1,2,3,4);
list.stream().sorted(Integer::compare).forEach(System.out::println);
With IntStream I get this compiler error:
Error:(191, 0) ajc: The method sorted() in the type IntStream is not applicable for the arguments (Integer::compare)
what am I missing here?
For the specific question of generating a reverse IntStream, try something like this:
static IntStream revRange(int from, int to) {
return IntStream.range(from, to)
.map(i -> to - i + from - 1);
}
This avoids boxing and sorting.
For the general question of how to reverse a stream of any type, I don't know if there's a "proper" way. There are a couple of ways I can think of. Both end up storing the stream elements. I don't know of a way to reverse a stream without storing the elements.
This first way stores the elements into an array and reads them out to a stream in reverse order. Note that since we don't know the runtime type of the stream elements, we can't type the array properly, requiring an unchecked cast.
@SuppressWarnings("unchecked")
static <T> Stream<T> reverse(Stream<T> input) {
Object[] temp = input.toArray();
return (Stream<T>) IntStream.range(0, temp.length)
.mapToObj(i -> temp[temp.length - i - 1]);
}
Another technique uses collectors to accumulate the items into a reversed list. This does lots of insertions at the front of ArrayList objects, so there's lots of copying going on.
Stream<T> input = ... ;
List<T> output =
input.collect(ArrayList::new,
(list, e) -> list.add(0, e),
(list1, list2) -> list1.addAll(0, list2));
It's probably possible to write a much more efficient reversing collector using some kind of customized data structure.
UPDATE 2016-01-29
Since this question has gotten a bit of attention recently, I figure I should update my answer to solve the problem with inserting at the front of ArrayList. This will be horribly inefficient with a large number of elements, requiring O(N^2) copying.
It's preferable to use an ArrayDeque instead, which efficiently supports insertion at the front. A small wrinkle is that we can't use the three-arg form of Stream.collect(); it requires the contents of the second arg be merged into the first arg, and there's no "add-all-at-front" bulk operation on Deque. Instead, we use addAll() to append the contents of the first arg to the end of the second, and then we return the second. This requires using the Collector.of() factory method.
The complete code is this:
Deque<String> output =
input.collect(Collector.of(
ArrayDeque::new,
(deq, t) -> deq.addFirst(t),
(d1, d2) -> { d2.addAll(d1); return d2; }));
The result is a Deque instead of a List, but that shouldn't be much of an issue, as it can easily be iterated or streamed in the now-reversed order.
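Elegant solution (note that this sorts in descending order; it only matches a reversal when the input is already in ascending order):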
List<Integer> list = Arrays.asList(1,2,3,4);
list.stream()
.sorted(Collections.reverseOrder()) // Method on Stream<Integer>
.forEach(System.out::println);
General Question:
Stream does not store any elements.
So iterating elements in the reverse order is not possible without storing the elements in some intermediate collection.
Stream.of("1", "2", "20", "3")
.collect(Collectors.toCollection(ArrayDeque::new)) // or LinkedList
.descendingIterator()
.forEachRemaining(System.out::println);
Update: changed LinkedList to ArrayDeque, which is the better choice here.
Prints:
3
20
2
1
By the way, using the sorted method is not correct, as it sorts rather than reverses (the stream may contain elements in any order).
Specific Question:
I found this simple and intuitive (copied from @Holger's comment):
IntStream.iterate(to - 1, i -> i - 1).limit(to - from)
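The one-liner above assumes from and to are already in scope. Here is a self-contained sketch of the same idea (RevRangeIterate is a made-up class name). It computes the length in long arithmetic, because to - from can overflow int (for example from = Integer.MIN_VALUE, to = 0), and limit() throws IllegalArgumentException for a negative argument.

```java
import java.util.stream.IntStream;

// Hypothetical helper class, for illustration only.
final class RevRangeIterate {
    private RevRangeIterate() {
    }

    // Counts down from to - 1 to from (inclusive); empty when to <= from.
    static IntStream revRange(int from, int to) {
        // Use long so that to - from cannot overflow; clamp negative lengths to 0.
        long length = Math.max(0L, (long) to - from);
        return IntStream.iterate(to - 1, i -> i - 1).limit(length);
    }
}
```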
Many of the solutions here sort or reverse the IntStream, but that unnecessarily requires intermediate storage. Stuart Marks's solution is the way to go:
static IntStream revRange(int from, int to) {
return IntStream.range(from, to).map(i -> to - i + from - 1);
}
It correctly handles overflow as well, passing this test:
import static java.lang.Integer.MAX_VALUE;
import static java.lang.Integer.MIN_VALUE;
import static org.junit.Assert.assertArrayEquals;
import org.junit.Test;
@Test
public void testRevRange() {
assertArrayEquals(new int[]{4, 3, 2, 1, 0}, revRange(0, 5).toArray());
assertArrayEquals(new int[]{-1, -2, -3, -4, -5}, revRange(-5, 0).toArray());
assertArrayEquals(new int[]{3, 2, 1}, revRange(1, 4).toArray());
assertArrayEquals(new int[0], revRange(0, 0).toArray());
assertArrayEquals(new int[0], revRange(0, -1).toArray());
assertArrayEquals(new int[0], revRange(MIN_VALUE, MIN_VALUE).toArray());
assertArrayEquals(new int[0], revRange(MAX_VALUE, MAX_VALUE).toArray());
assertArrayEquals(new int[]{MIN_VALUE}, revRange(MIN_VALUE, MIN_VALUE + 1).toArray());
assertArrayEquals(new int[]{MAX_VALUE - 1}, revRange(MAX_VALUE - 1, MAX_VALUE).toArray());
}
How NOT to do it:
Don't use .sorted(Comparator.reverseOrder()) or .sorted(Collections.reverseOrder()), because it will just sort elements in the descending order.
Using it for given Integer input:
[1, 4, 2, 5, 3]
the output would be as follows:
[5, 4, 3, 2, 1]
For String input:
["A", "D", "B", "E", "C"]
the output would be as follows:
[E, D, C, B, A]
Don't use .sorted((a, b) -> -1) (explanation at the end)
The easiest way to do it properly:
List<Integer> list = Arrays.asList(1, 4, 2, 5, 3);
Collections.reverse(list);
System.out.println(list);
Output:
[3, 5, 2, 4, 1]
The same for String:
List<String> stringList = Arrays.asList("A", "D", "B", "E", "C");
Collections.reverse(stringList);
System.out.println(stringList);
Output:
[C, E, B, D, A]
Don't use .sorted((a, b) -> -1)!
It breaks the Comparator contract and may appear to work only in some cases, e.g. on a single thread but not in parallel.
@yankee's explanation:
(a, b) -> -1 breaks the contract for Comparator. Whether this works depends on the implementation of the sort algorithm. The next release of the JVM might break this. Actually I can already break this reproducibly on my machine using IntStream.range(0, 10000).parallel().boxed().sorted((a, b) -> -1).forEachOrdered(System.out::println);
//Don't use this!!!
List<Integer> list = Arrays.asList(1, 4, 2, 5, 3);
List<Integer> reversedList = list.stream()
.sorted((a, b) -> -1)
.collect(Collectors.toList());
System.out.println(reversedList);
Output when it happens to work:
[3, 5, 2, 4, 1]
Possible output in parallel stream or with other JVM implementation:
[4, 1, 2, 3, 5]
The same for String:
//Don't use this!!!
List<String> stringList = Arrays.asList("A", "D", "B", "E", "C");
List<String> reversedStringList = stringList.stream()
.sorted((a, b) -> -1)
.collect(Collectors.toList());
System.out.println(reversedStringList);
Output when it happens to work:
[C, E, B, D, A]
Possible output in parallel stream or with other JVM implementation:
[A, E, B, D, C]
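Without an external library:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
public class MyCollectors {
public static <T> Collector<T, ?, List<T>> toListReversed() {
// toCollection(ArrayList::new) guarantees a mutable list; Collectors.toList() makes no such promise
return Collectors.collectingAndThen(Collectors.toCollection(ArrayList::new), l -> {
Collections.reverse(l);
return l;
});
}
}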
If the element type implements Comparable<T> (e.g. Integer, String, Date), you can sort in descending order using Comparator.reverseOrder():
List<Integer> list = Arrays.asList(1, 2, 3, 4);
list.stream()
.sorted(Comparator.reverseOrder())
.forEach(System.out::println);
You could define your own collector that collects the elements in reverse order:
public static <T> Collector<T, List<T>, List<T>> inReverse() {
return Collector.of(
ArrayList::new,
(l, t) -> l.add(t),
(l, r) -> {l.addAll(r); return l;},
Lists::<T>reverse);
}
And use it like:
stream.collect(inReverse()).forEach(t -> ...)
I use an ArrayList in forward order to efficiently collect the items (appending at the end of the list), and Guava's Lists.reverse to efficiently give a reversed view of the list without making another copy of it.
Here are some test cases for the custom collector:
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.*;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import org.hamcrest.Matchers;
import org.junit.Test;
import com.google.common.collect.Lists;
public class TestReverseCollector {
private final Object t1 = new Object();
private final Object t2 = new Object();
private final Object t3 = new Object();
private final Object t4 = new Object();
private final Collector<Object, List<Object>, List<Object>> inReverse = inReverse();
private final Supplier<List<Object>> supplier = inReverse.supplier();
private final BiConsumer<List<Object>, Object> accumulator = inReverse.accumulator();
private final Function<List<Object>, List<Object>> finisher = inReverse.finisher();
private final BinaryOperator<List<Object>> combiner = inReverse.combiner();
@Test public void associative() {
final List<Object> a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
final List<Object> r1 = finisher.apply(a1);
final List<Object> a2 = supplier.get();
accumulator.accept(a2, t1);
final List<Object> a3 = supplier.get();
accumulator.accept(a3, t2);
final List<Object> r2 = finisher.apply(combiner.apply(a2, a3));
assertThat(r1, Matchers.equalTo(r2));
}
@Test public void identity() {
final List<Object> a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
final List<Object> r1 = finisher.apply(a1);
final List<Object> a2 = supplier.get();
accumulator.accept(a2, t1);
accumulator.accept(a2, t2);
final List<Object> r2 = finisher.apply(combiner.apply(a2, supplier.get()));
assertThat(r1, equalTo(r2));
}
@Test public void reversing() throws Exception {
final List<Object> a2 = supplier.get();
accumulator.accept(a2, t1);
accumulator.accept(a2, t2);
final List<Object> a3 = supplier.get();
accumulator.accept(a3, t3);
accumulator.accept(a3, t4);
final List<Object> r2 = finisher.apply(combiner.apply(a2, a3));
assertThat(r2, contains(t4, t3, t2, t1));
}
public static <T> Collector<T, List<T>, List<T>> inReverse() {
return Collector.of(
ArrayList::new,
(l, t) -> l.add(t),
(l, r) -> {l.addAll(r); return l;},
Lists::<T>reverse);
}
}
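If you'd rather not depend on Guava just for Lists.reverse, the finisher can return a small read-only reversed view instead. This is a sketch with a hand-written view class (ReverseCollectors and ReversedView are illustrative names, not JDK types); like Guava's view, it does not copy the collected list.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;
import java.util.stream.Collector;

// Hypothetical helper class, for illustration only.
final class ReverseCollectors {
    private ReverseCollectors() {
    }

    // Read-only view presenting the backing list in reverse order, without copying it.
    private static final class ReversedView<T> extends AbstractList<T> implements RandomAccess {
        private final List<T> backing;

        ReversedView(List<T> backing) {
            this.backing = backing;
        }

        @Override
        public T get(int index) {
            // Reject out-of-range indices explicitly so the message refers to the view's index.
            if (index < 0 || index >= backing.size()) {
                throw new IndexOutOfBoundsException("Index: " + index);
            }
            return backing.get(backing.size() - 1 - index);
        }

        @Override
        public int size() {
            return backing.size();
        }
    }

    // Appends in encounter order (the combiner keeps that order), then exposes a reversed view.
    static <T> Collector<T, List<T>, List<T>> inReverse() {
        return Collector.of(
            ArrayList::new,
            List::add,
            (l, r) -> { l.addAll(r); return l; },
            ReversedView::new);
    }
}
```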
cyclops-react StreamUtils has a reverse Stream method (javadoc).
StreamUtils.reverse(Stream.of("1", "2", "20", "3"))
.forEach(System.out::println);
It works by collecting to an ArrayList and then making use of the ListIterator class which can iterate in either direction, to iterate backwards over the list.
If you already have a List, reversedStream will be more efficient:
StreamUtils.reversedStream(Arrays.asList("1", "2", "20", "3"))
.forEach(System.out::println);
Here's the solution I've come up with:
private static final Comparator<Integer> BY_ASCENDING_ORDER = Integer::compare;
private static final Comparator<Integer> BY_DESCENDING_ORDER = BY_ASCENDING_ORDER.reversed();
then using those comparators:
IntStream.range(-range, 0).boxed().sorted(BY_DESCENDING_ORDER).forEach(System.out::println);
I would suggest using jOOλ, it's a great library that adds lots of useful functionality to Java 8 streams and lambdas.
You can then do the following:
List<Integer> list = Arrays.asList(1,2,3,4);
Seq.seq(list).reverse().forEach(System.out::println);
Simple as that. It's a pretty lightweight library, and well worth adding to any Java 8 project.
How about this utility method?
public static <T> Stream<T> getReverseStream(List<T> list) {
final ListIterator<T> listIt = list.listIterator(list.size());
final Iterator<T> reverseIterator = new Iterator<T>() {
@Override
public boolean hasNext() {
return listIt.hasPrevious();
}
@Override
public T next() {
return listIt.previous();
}
};
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
reverseIterator,
Spliterator.ORDERED | Spliterator.IMMUTABLE), false);
}
It seems to work in all cases without copying the list.
With regard to the specific question of generating a reverse IntStream:
starting from Java 9 you can use the three-argument version of the IntStream.iterate(...):
IntStream.iterate(10, x -> x >= 0, x -> x - 1).forEach(System.out::println);
// Out: 10 9 8 7 6 5 4 3 2 1 0
where:
IntStream.iterate(int seed, IntPredicate hasNext, IntUnaryOperator next);
seed - the initial element;
hasNext - a predicate to apply to elements to determine when the stream must terminate;
next - a function to be applied to the previous element to produce a new element.
Simplest way (simple collect - supports parallel streams):
public static <T> Stream<T> reverse(Stream<T> stream) {
return stream
.collect(Collector.of(
() -> new ArrayDeque<T>(),
ArrayDeque::addFirst,
(q1, q2) -> { q2.addAll(q1); return q2; })
)
.stream();
}
Advanced way (supports parallel streams in an ongoing way):
public static <T> Stream<T> reverse(Stream<T> stream) {
Objects.requireNonNull(stream, "stream");
class ReverseSpliterator implements Spliterator<T> {
private Spliterator<T> spliterator;
private final Deque<T> deque = new ArrayDeque<>();
private ReverseSpliterator(Spliterator<T> spliterator) {
this.spliterator = spliterator;
}
@Override
@SuppressWarnings({"StatementWithEmptyBody"})
public boolean tryAdvance(Consumer<? super T> action) {
while(spliterator.tryAdvance(deque::addFirst));
if(!deque.isEmpty()) {
action.accept(deque.remove());
return true;
}
return false;
}
@Override
public Spliterator<T> trySplit() {
// Once traversal has started, the wrapped spliterator has already been drained into the deque, so there is nothing left to split.
Spliterator<T> prev = spliterator.trySplit();
if(prev == null) {
return null;
}
Spliterator<T> me = spliterator;
spliterator = prev;
return new ReverseSpliterator(me);
}
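@Override
public long estimateSize() {
// Include elements already buffered in the deque, avoiding overflow for unknown (Long.MAX_VALUE) estimates.
long remaining = spliterator.estimateSize();
return remaining > Long.MAX_VALUE - deque.size() ? Long.MAX_VALUE : remaining + deque.size();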
}
@Override
public int characteristics() {
return spliterator.characteristics();
}
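@Override
public Comparator<? super T> getComparator() {
Comparator<? super T> comparator = spliterator.getComparator();
// null means natural order; its reverse must be reported explicitly (requires java.util.Collections).
return (comparator != null) ? comparator.reversed() : Collections.reverseOrder();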
}
@Override
public void forEachRemaining(Consumer<? super T> action) {
// tryAdvance drains the source into the deque on its first call, then emits and removes one buffered element per call.
while (tryAdvance(action)) {
}
}
}
return StreamSupport.stream(new ReverseSpliterator(stream.spliterator()), stream.isParallel());
}
Note that you can quickly extend this to other types of streams (IntStream, ...).
Testing:
// .parallel() is optional
reverse(Stream.of("One", "Two", "Three", "Four", "Five", "Six").parallel())
.forEachOrdered(System.out::println);
Results:
Six
Five
Four
Three
Two
One
Additional notes: the simple way isn't as useful when combined with other stream operations, because the collect step breaks the parallelism. The advanced way doesn't have that issue, and it also keeps the initial characteristics of the stream (for example SORTED, with the comparator reversed), so it's the way to go if you apply other stream operations after the reverse.
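ArrayDeque is typically faster than Stack or LinkedList when used as a stack. push() inserts elements at the front of the Deque. Using forEachOrdered keeps this correct for parallel streams too, since ArrayDeque is not thread-safe:
protected <T> Stream<T> reverse(Stream<T> stream) {
ArrayDeque<T> stack = new ArrayDeque<>();
stream.forEachOrdered(stack::push);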
return stack.stream();
}
List<Integer> newList = list.stream().sorted(Collections.reverseOrder()).collect(Collectors.toList());
newList.forEach(System.out::println);
One could write a collector that collects elements in reversed order:
public static <T> Collector<T, ?, Stream<T>> reversed() {
return Collectors.collectingAndThen(Collectors.toList(), list -> {
Collections.reverse(list);
return list.stream();
});
}
And use it like this:
Stream.of(1, 2, 3, 4, 5).collect(reversed()).forEach(System.out::println);
Original answer (contains a bug - it does not work correctly for parallel streams):
A general purpose stream reverse method could look like:
public static <T> Stream<T> reverse(Stream<T> stream) {
LinkedList<T> stack = new LinkedList<>();
stream.forEach(stack::push);
return stack.stream();
}
For reference, I was looking at the same problem: I wanted to join the string values of stream elements in reverse order.
itemList = { last, middle, first } => first,middle,last
I started by using an intermediate collection, with collectingAndThen from @comonad's answer or Stuart Marks's ArrayDeque collector, although I wasn't happy with creating an intermediate collection and streaming it again:
itemList.stream()
.map(TheObject::toString)
.collect(Collectors.collectingAndThen(Collectors.toList(),
strings -> {
Collections.reverse(strings);
return strings;
}))
.stream()
.collect(Collectors.joining());
So I iterated on Stuart Marks's answer, which uses the Collector.of factory with its useful finisher lambda:
itemList.stream()
.collect(Collector.of(StringBuilder::new,
(sb, o) -> sb.insert(0, o),
(r1, r2) -> { r1.insert(0, r2); return r1; },
StringBuilder::toString));
Since the stream is not parallel in this case, the combiner doesn't matter much. I use insert there anyway for consistency, and it is also correct for a parallel stream: r2 holds the (already reversed) later elements, so r1.insert(0, r2) puts them first.
I looked at the StringJoiner, however it does not have an insert method.
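One way around StringJoiner's missing insert is to collect into an ArrayDeque and hand String.join a descending view of it, which avoids the repeated insert-at-front copying. A sketch only: ReverseJoin.joinReversed is a made-up name, and items are converted with Objects.toString (so a null becomes "null").

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class ReverseJoin {
    private ReverseJoin() {
    }

    static String joinReversed(List<?> items, CharSequence delimiter) {
        // Appending at the tail keeps encounter order in the deque...
        Deque<String> deque = items.stream()
            .map(Objects::toString)
            .collect(Collectors.toCollection(ArrayDeque::new));
        // ...and String.join then walks it from tail to head.
        Iterable<String> reversed = deque::descendingIterator;
        return String.join(delimiter, reversed);
    }
}
```

For the example above, joinReversed(itemList, ",") turns { last, middle, first } into first,middle,last.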
Not purely Java8 but if you use guava's Lists.reverse() method in conjunction, you can easily achieve this:
List<Integer> list = Arrays.asList(1,2,3,4);
Lists.reverse(list).stream().forEach(System.out::println);
Reversing a string or any array:
(Stream.of("abcdefghijklm 1234567".split("")).collect(Collectors.collectingAndThen(Collectors.toList(),list -> {Collections.reverse(list);return list;}))).stream().forEach(System.out::println);
The argument to split can be changed to split on a delimiter or whitespace instead.
How about reversing the Collection backing the stream prior?
import java.util.Collections;
import java.util.List;
public void reverseTest(List<Integer> sampleCollection) {
Collections.reverse(sampleCollection); // remember this reverses the elements in the list, so if you want the original input collection to remain untouched clone it first.
sampleCollection.stream().forEach(item -> {
// your operation here
});
}
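Answering the specific question of reversing an IntStream, the following worked for me. It relies on negating, sorting and taking the absolute value, so it only works for non-negative values: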
IntStream.range(0, 10)
.map(x -> x * -1)
.sorted()
.map(Math::abs)
.forEach(System.out::println);
In all this I don't see the answer I would go to first.
This isn't exactly a direct answer to the question, but it's a potential solution to the problem.
Just build the list backwards in the first place. If you can, use a LinkedList instead of an ArrayList, and when you add items, use push instead of add. The list will be built in reverse order and will then stream correctly without any manipulation.
This won't fit cases where you are dealing with primitive arrays or lists that are already used in various ways but does work well in a surprising number of cases.
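A minimal illustration of building the collection backwards. It uses ArrayDeque, which, like LinkedList, implements Deque.push; swap in LinkedList if you need a List. The class and method names are made up for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper class, for illustration only.
final class BuildBackwards {
    private BuildBackwards() {
    }

    // Each arriving item is pushed onto the head, so streaming the deque yields newest first.
    static <T> Deque<T> collectBackwards(Iterable<T> arrivals) {
        Deque<T> deque = new ArrayDeque<>();
        for (T item : arrivals) {
            deque.push(item);
        }
        return deque;
    }
}
```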
The simplest solution is to use List::listIterator and Stream::generate:
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
ListIterator<Integer> listIterator = list.listIterator(list.size());
Stream.generate(listIterator::previous)
.limit(list.size())
.forEach(System.out::println);
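This method works with any Stream and is Java 8 compliant. Be aware that it nests one Stream.concat per element, and the Stream.concat documentation warns that deeply concatenated streams can lead to deep call chains or even a StackOverflowError, so it is only suitable for small streams: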
Stream<Integer> myStream = Stream.of(1, 2, 3, 4, 5);
myStream.reduce(Stream.empty(),
(Stream<Integer> a, Integer b) -> Stream.concat(Stream.of(b), a),
(a, b) -> Stream.concat(b, a))
.forEach(System.out::println);
This is how I do it.
I don't like the idea of creating a new collection and reverse iterating it.
The IntStream#map idea is pretty neat, but I prefer the IntStream#iterate method, because I think a countdown to zero is better expressed with iterate and is easier to understand as walking the array from back to front.
import static java.lang.Math.max;
private static final double EXACT_MATCH = 0d;
public static IntStream reverseStream(final int[] array) {
return countdownFrom(array.length - 1).map(index -> array[index]);
}
public static DoubleStream reverseStream(final double[] array) {
return countdownFrom(array.length - 1).mapToDouble(index -> array[index]);
}
public static <T> Stream<T> reverseStream(final T[] array) {
return countdownFrom(array.length - 1).mapToObj(index -> array[index]);
}
public static IntStream countdownFrom(final int top) {
return IntStream.iterate(top, t -> t - 1).limit(max(0, (long) top + 1));
}
Here are some tests to prove it works:
import static java.lang.Integer.MAX_VALUE;
import static org.junit.Assert.*;
import org.junit.Assert;
import org.junit.Test;
@Test
public void testReverseStream_emptyArrayCreatesEmptyStream() {
Assert.assertEquals(0, reverseStream(new double[0]).count());
}
@Test
public void testReverseStream_singleElementCreatesSingleElementStream() {
Assert.assertEquals(1, reverseStream(new double[1]).count());
final double[] singleElementArray = new double[] { 123.4 };
assertArrayEquals(singleElementArray, reverseStream(singleElementArray).toArray(), EXACT_MATCH);
}
@Test
public void testReverseStream_multipleElementsAreStreamedInReversedOrder() {
final double[] arr = new double[] { 1d, 2d, 3d };
final double[] revArr = new double[] { 3d, 2d, 1d };
Assert.assertEquals(arr.length, reverseStream(arr).count());
Assert.assertArrayEquals(revArr, reverseStream(arr).toArray(), EXACT_MATCH);
}
@Test
public void testCountdownFrom_returnsAllElementsFromTopToZeroInReverseOrder() {
assertArrayEquals(new int[] { 4, 3, 2, 1, 0 }, countdownFrom(4).toArray());
}
@Test
public void testCountdownFrom_countingDownStartingWithZeroOutputsTheNumberZero() {
assertArrayEquals(new int[] { 0 }, countdownFrom(0).toArray());
}
@Test
public void testCountdownFrom_doesNotChokeOnIntegerMaxValue() {
assertEquals(true, countdownFrom(MAX_VALUE).anyMatch(x -> x == MAX_VALUE));
}
@Test
public void testCountdownFrom_givesZeroLengthCountForNegativeValues() {
assertArrayEquals(new int[0], countdownFrom(-1).toArray());
assertArrayEquals(new int[0], countdownFrom(-4).toArray());
}
Based on @stuart-marks's answer, but without casting: a function returning a stream of the list's elements, starting from the end:
public static <T> Stream<T> reversedStream(List<T> tList) {
final int size = tList.size();
return IntStream.range(0, size)
.mapToObj(i -> tList.get(size - 1 - i));
}
// usage
reversedStream(list).forEach(System.out::println);
What's the proper generic way to reverse a stream?
If the stream does not specify an encounter order, don't.
(!s.spliterator().hasCharacteristics(java.util.Spliterator.ORDERED))
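Applying that check in practice means taking the stream's spliterator, which is a terminal operation, so the stream has to be rebuilt from the spliterator afterwards. A sketch (reverseIfOrdered is a hypothetical helper): it buffers every element of an ordered stream, and because it uses ArrayDeque it rejects null elements.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, for illustration only.
final class MaybeReverse {
    private MaybeReverse() {
    }

    // Reverses s if it has an encounter order; otherwise returns its elements unchanged.
    static <T> Stream<T> reverseIfOrdered(Stream<T> s) {
        boolean parallel = s.isParallel();
        // spliterator() is a terminal operation: s must not be used after this call.
        Spliterator<T> sp = s.spliterator();
        if (!sp.hasCharacteristics(Spliterator.ORDERED)) {
            // No encounter order to reverse, so rebuild a stream over the same elements.
            return StreamSupport.stream(sp, parallel);
        }
        // push() inserts at the head, so the deque ends up in reverse encounter order.
        Deque<T> deque = new ArrayDeque<>();
        sp.forEachRemaining(deque::push);
        return deque.stream();
    }
}
```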
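The most generic and easiest way to reverse a list would be the following (caution: as explained in an earlier answer, (x, y) -> -1 violates the Comparator contract, so the resulting order is not guaranteed):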
public static <T> void reverseHelper(List<T> li){
li.stream()
.sorted((x,y)-> -1)
.collect(Collectors.toList())
.forEach(System.out::println);
}
Java 8 way to do this:
List<Integer> list = Arrays.asList(1,2,3,4);
Comparator<Integer> comparator = Integer::compare;
list.stream().sorted(comparator.reversed()).forEach(System.out::println);
I tried to translate the following line of Scala to Java 8 using the Streams API:
// Scala
util.Random.shuffle((1 to 24).toList)
To write the equivalent in Java I created a range of integers:
IntStream.range(1, 25)
I expected to find a toList method in the Stream API, but IntStream only has this strange method:
collect(
Supplier<R> supplier, ObjIntConsumer<R> accumulator, BiConsumer<R,R> combiner)
How can I shuffle a list with Java 8 Streams API?
Here you go:
List<Integer> integers =
IntStream.range(1, 10) // <-- creates a stream of ints
.boxed() // <-- converts them to Integers
.collect(Collectors.toList()); // <-- collects the values to a list
Collections.shuffle(integers);
System.out.println(integers);
Prints:
[8, 1, 5, 3, 4, 2, 6, 9, 7]
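A closer equivalent of the Scala line (1 to 24 is inclusive, hence rangeClosed) with an explicit Random, which makes the shuffle reproducible when you pass a seeded one. A sketch: ShuffledRange is a made-up name; Collections.shuffle(List, Random) is the standard JDK overload, and toCollection(ArrayList::new) is used because Collectors.toList() does not promise a mutable list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, for illustration only.
final class ShuffledRange {
    private ShuffledRange() {
    }

    static List<Integer> shuffledRange(int fromInclusive, int toInclusive, Random random) {
        List<Integer> list = IntStream.rangeClosed(fromInclusive, toInclusive)
            .boxed()
            .collect(Collectors.toCollection(ArrayList::new)); // guaranteed mutable
        Collections.shuffle(list, random); // same seed, same permutation
        return list;
    }
}
```

Usage: ShuffledRange.shuffledRange(1, 24, new Random()).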
You may find the following toShuffledList() method useful.
private static final Collector<?, ?, ?> SHUFFLER = Collectors.collectingAndThen(
Collectors.toCollection(ArrayList::new),
list -> {
Collections.shuffle(list);
return list;
}
);
@SuppressWarnings("unchecked")
public static <T> Collector<T, ?, List<T>> toShuffledList() {
return (Collector<T, ?, List<T>>) SHUFFLER;
}
This enables the following kind of one-liner:
IntStream.rangeClosed('A', 'Z')
.mapToObj(a -> (char) a)
.collect(toShuffledList())
.forEach(System.out::print);
Example output:
AVBFYXIMUDENOTHCRJKWGQZSPL
You can use a custom comparator that "sorts" the values by a random value:
public final class RandomComparator<T> implements Comparator<T> {
private final Map<T, Integer> map = new IdentityHashMap<>();
private final Random random;
public RandomComparator() {
this(new Random());
}
public RandomComparator(Random random) {
this.random = random;
}
@Override
public int compare(T t1, T t2) {
return Integer.compare(valueFor(t1), valueFor(t2));
}
private int valueFor(T t) {
synchronized (map) {
return map.computeIfAbsent(t, ignore -> random.nextInt());
}
}
}
Each object in the stream is (lazily) associated with a random integer value, on which we sort. The synchronization on the map is there to handle parallel streams.
You can then use it like that:
IntStream.rangeClosed(0, 24).boxed()
.sorted(new RandomComparator<>())
.collect(Collectors.toList());
The advantage of this solution is that it integrates within the stream pipeline.
If you want to process the whole Stream without too much hassle, you can simply create your own Collector using Collectors.collectingAndThen():
public static <T> Collector<T, ?, Stream<T>> toEagerShuffledStream() {
return Collectors.collectingAndThen(
toList(),
list -> {
Collections.shuffle(list);
return list.stream();
});
}
But this won't perform well if you want to limit() the resulting Stream. In order to overcome this, one could create a custom Spliterator:
package com.pivovarit.stream;
import java.util.List;
import java.util.Objects;
import java.util.Random;
import java.util.RandomAccess;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
class ImprovedRandomSpliterator<T, LIST extends RandomAccess & List<T>> implements Spliterator<T> {
private final Random random;
private final List<T> source;
private int size;
ImprovedRandomSpliterator(LIST source, Supplier<? extends Random> random) {
Objects.requireNonNull(source, "source can't be null");
Objects.requireNonNull(random, "random can't be null");
this.source = source;
this.random = random.get();
this.size = this.source.size();
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
if (size > 0) {
int nextIdx = random.nextInt(size);
int lastIdx = --size;
T last = source.get(lastIdx);
T elem = source.set(nextIdx, last);
action.accept(elem);
return true;
} else {
return false;
}
}
@Override
public Spliterator<T> trySplit() {
return null;
}
@Override
public long estimateSize() {
return size; // number of elements not yet emitted
}
@Override
public int characteristics() {
return SIZED;
}
}
and then:
public final class RandomCollectors {
private RandomCollectors() {
}
public static <T> Collector<T, ?, Stream<T>> toImprovedLazyShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> !list.isEmpty()
? StreamSupport.stream(new ImprovedRandomSpliterator<>(list, Random::new), false)
: Stream.empty());
}
public static <T> Collector<T, ?, Stream<T>> toEagerShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> {
Collections.shuffle(list);
return list.stream();
});
}
}
I explained the performance considerations here: https://4comprehension.com/implementing-a-randomized-stream-spliterator-in-java/
To perform a shuffle efficiently you need all the values in advance. You can use Collections.shuffle() after you have converted the stream to a list like you do in Scala.
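If you're looking for a "streaming only" solution and a deterministic, merely "haphazard" ordering versus a "random" ordering is good enough, you can always sort your elements by a hash value. Note that Integer.hashCode() returns the value itself, so for Integer elements, as in this example, the result is simply ascending order; the trick only looks haphazard for types whose hash codes don't follow their natural order: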
List<Integer> xs=IntStream.range(0, 10)
.boxed()
.sorted( (a, b) -> Integer.compare(a.hashCode(), b.hashCode()) )
.collect(Collectors.toList());
If you'd rather have an int[] than a List<Integer>, you can just unbox them afterwards. Unfortunately, you have to go through the boxing step to apply a custom Comparator, so there's no eliminating that part of the process.
int[] ys=IntStream.range(0, 10)
.boxed()
.sorted( (a, b) -> Integer.compare(a.hashCode(), b.hashCode()) )
.mapToInt( a -> a.intValue())
.toArray();
List<Integer> randomShuffledRange(int startInclusive, int endExclusive) {
return new Random().ints(startInclusive, endExclusive)
.distinct()
.limit(endExclusive-startInclusive)
.boxed()
.collect(Collectors.toList());
}
var shuffled = randomShuffledRange(1, 10);
System.out.println(shuffled);
Example output:
[4, 6, 8, 9, 1, 7, 3, 5, 2]
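This is my one-line solution for picking one random color. Caution: a comparator that returns random results violates the Comparator contract, and if RandomUtils here is Apache Commons Lang 3's, nextInt(-1, 1) throws an IllegalArgumentException because both bounds must be non-negative: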
colourRepository.findAll().stream().sorted((o1,o2)-> RandomUtils.nextInt(-1,1)).findFirst().get()
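Picking one random element doesn't need a sort at all: choose a random index instead. A sketch, assuming findAll() returns a List (RandomPick and pickRandom are hypothetical names):

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class, for illustration only.
final class RandomPick {
    private RandomPick() {
    }

    // Picks one element uniformly at random without sorting or copying the list.
    static <T> T pickRandom(List<T> items) {
        if (items.isEmpty()) {
            throw new NoSuchElementException("empty list");
        }
        return items.get(ThreadLocalRandom.current().nextInt(items.size()));
    }
}
```

Usage would then be RandomPick.pickRandom(colourRepository.findAll()).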