Java 8 Stream API - Select the lowest key after group by - java

I have a stream of Foo objects.
class Foo {
private int variableCount;
public Foo(int vars) {
this.variableCount = vars;
}
public Integer getVariableCount() {
return variableCount;
}
}
I want a list of Foo's that all have the lowest variableCount.
For example
new Foo(3), new Foo(3), new Foo(2), new Foo(1), new Foo(1)
I only want the stream to return the last 2 Foos, since they have the lowest value.
I've tried doing a collect with grouping by
.collect(Collectors.groupingBy((Foo foo) -> {
return foo.getVariableCount();
}))
And that returns a Map<Integer, List<Foo>> and I'm not sure how to transform that into what I want.
Thanks in advance

You can use a sorted map for grouping and then just get the first entry.
Something along these lines:
.collect(Collectors.groupingBy(
Foo::getVariableCount,
TreeMap::new,
Collectors.toList()))
.firstEntry()
.getValue()
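As a self-contained illustration of the sorted-map idea, here is a minimal sketch. The class name `LowestGroupSketch` and the helper `lowestGroup` are made up for this example, and it adds a null check because `firstEntry()` returns null for an empty input.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
class LowestGroupSketch {

    // Groups the items by key in a TreeMap and returns the group with the smallest key.
    static <T, K extends Comparable<? super K>> List<T> lowestGroup(
            Collection<T> items, Function<? super T, ? extends K> keyFn) {
        TreeMap<K, List<T>> byKey = items.stream()
                .collect(Collectors.groupingBy(keyFn, TreeMap::new, Collectors.toList()));
        // firstEntry() is null when nothing was collected.
        Map.Entry<K, List<T>> first = byKey.firstEntry();
        return first == null ? Collections.emptyList() : first.getValue();
    }

    public static void main(String[] args) {
        // Plain integers stand in for Foo objects; the key is the value itself.
        System.out.println(lowestGroup(Arrays.asList(3, 3, 2, 1, 1), n -> n)); // prints [1, 1]
    }
}
```

Note that this still builds the whole map, so it trades memory for brevity.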

Here is a solution that:
Only streams the list once.
Doesn't build a map or other structure that contains all of the input items (unless the variable counts are all the same), only keeping those that are currently the minimum.
Is O(n) time, O(n) space. It's entirely possible that all Foos have the same variable count, in which case this solution would store all items like other solutions. But in practice, with different, varied values and higher cardinality, the number of items in the list is likely to be much lower.
Edited
I've improved my solution according to the suggestions in the comments.
I implemented an accumulator object, which supplies functions to the Collector for this.
/**
* Accumulator object to hold the current min
* and the list of Foos that are the min.
*/
class Accumulator {
Integer min;
List<Foo> foos;
Accumulator() {
min = Integer.MAX_VALUE;
foos = new ArrayList<>();
}
void accumulate(Foo f) {
if (f.getVariableCount() != null) {
if (f.getVariableCount() < min) {
min = f.getVariableCount();
foos.clear();
foos.add(f);
} else if (f.getVariableCount().equals(min)) {
foos.add(f);
}
}
}
Accumulator combine(Accumulator other) {
if (min < other.min) {
return this;
}
else if (min > other.min) {
return other;
}
else {
foos.addAll(other.foos);
return this;
}
}
List<Foo> getFoos() { return foos; }
}
Then all we have to do is collect, referencing the accumulator's methods for its functions.
List<Foo> mins = foos.stream().collect(Collector.of(
Accumulator::new,
Accumulator::accumulate,
Accumulator::combine,
Accumulator::getFoos
)
);
Testing with
List<Foo> foos = Arrays.asList(new Foo(3), new Foo(3), new Foo(2), new Foo(1), new Foo(1), new Foo(4));
The output is (with a suitable toString defined on Foo):
[Foo{1}, Foo{1}]

If you are OK streaming (iterating) twice:
private static List<Foo> mins(List<Foo> foos) {
return foos.stream()
.map(Foo::getVariableCount)
.min(Comparator.naturalOrder())
.map(x -> foos.stream()
.filter(y -> y.getVariableCount().equals(x))
.collect(Collectors.toList()))
.orElse(Collections.emptyList());
}
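One detail worth keeping in mind with this kind of filter: getVariableCount() returns a boxed Integer, and `==` on two Integer objects compares references, not values. A small sketch of the difference (the class and method names are hypothetical):

```java
import java.util.Objects;

// Hypothetical demo class, not part of the original answer.
class BoxedCompareSketch {

    // Value comparison that is also safe when either argument is null.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer a = 1000;
        Integer b = 1000;
        // Reference comparison: the result depends on the Integer cache,
        // which is only guaranteed to cover -128..127.
        System.out.println(a == b);
        // Value comparison.
        System.out.println(sameValue(a, b)); // prints true
    }
}
```

Comparing against a primitive int (for example the result of mapToInt(...).min()) avoids the problem, because the Integer is unboxed first.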

To avoid creating the entire map and also avoid streaming twice, I copied a custom collector from https://stackoverflow.com/a/30497254/1264846 and modified it to work with min instead of max. I didn't even know custom collectors were possible, so I thank @lexicore for pointing me in that direction.
This is the resulting function minAll
public static <T, A, D> Collector<T, ?, D> minAll(Comparator<? super T> comparator,
Collector<? super T, A, D> downstream) {
Supplier<A> downstreamSupplier = downstream.supplier();
BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
BinaryOperator<A> downstreamCombiner = downstream.combiner();
class Container {
A acc;
T obj;
boolean hasAny;
Container(A acc) {
this.acc = acc;
}
}
Supplier<Container> supplier = () -> new Container(downstreamSupplier.get());
BiConsumer<Container, T> accumulator = (acc, t) -> {
if(!acc.hasAny) {
downstreamAccumulator.accept(acc.acc, t);
acc.obj = t;
acc.hasAny = true;
} else {
int cmp = comparator.compare(t, acc.obj);
if (cmp < 0) {
acc.acc = downstreamSupplier.get();
acc.obj = t;
}
if (cmp <= 0)
downstreamAccumulator.accept(acc.acc, t);
}
};
BinaryOperator<Container> combiner = (acc1, acc2) -> {
if (!acc2.hasAny) {
return acc1;
}
if (!acc1.hasAny) {
return acc2;
}
int cmp = comparator.compare(acc1.obj, acc2.obj);
if (cmp < 0) {
return acc1;
}
if (cmp > 0) {
return acc2;
}
acc1.acc = downstreamCombiner.apply(acc1.acc, acc2.acc);
return acc1;
};
Function<Container, D> finisher = acc -> downstream.finisher().apply(acc.acc);
return Collector.of(supplier, accumulator, combiner, finisher);
}

You could sort the stream and then use collect with an accumulator that adds an element only if the list is still empty or if the element has the same variable count as the first element of the list.
A complete working example:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
class Foo {
private int variableCount;
public Foo(int vars) {
this.variableCount = vars;
}
public Integer getVariableCount() {
return variableCount;
}
public static void main(String[] args) {
List<Foo> list = Arrays.asList(
new Foo(2),
new Foo(2),
new Foo(3),
new Foo(3),
new Foo(1),
new Foo(1)
);
System.out.println(list.stream()
.sorted(Comparator.comparing(Foo::getVariableCount))
.collect(() -> new ArrayList<Foo>(),
(ArrayList<Foo> arrayList, Foo e) -> {
if (arrayList.isEmpty()
|| arrayList.get(0).getVariableCount().equals(e.getVariableCount())) {
arrayList.add(e);
}
},
(ArrayList<Foo> foos, ArrayList<Foo> foo) -> foos.addAll(foo)
)
);
}
@Override
public String toString() {
return "Foo{" +
"variableCount=" + variableCount +
'}';
}
}
Alternatively, you could first find the minimum variableCount (here by sorting the list) and then use it in the filter of a stream.
list.sort(Comparator.comparing(Foo::getVariableCount));
int min = list.get(0).getVariableCount();
list.stream().filter(foo -> foo.getVariableCount() == min)
.collect(Collectors.toList());
I think that in any case you need either a sort or some other way to find the minimum, which can then be used inside the predicate. That holds even if you group the values in a map.
Cheers!

Here is an alternative with one stream and a custom reducer. The idea is to sort first and then collect only the elements that share the first (minimum) value:
List<Foo> newlist = list.stream()
.sorted( Comparator.comparing(Foo::getVariableCount) )
.reduce( new ArrayList<>(),
(l, f) -> {
if ( l.isEmpty() || l.get(0).getVariableCount().equals(f.getVariableCount()) ) l.add(f);
return l;
},
(l1, l2) -> {
l1.addAll(l2);
return l1;
}
);
Or using collect is even more compact:
List<Foo> newlist = list.stream()
.sorted( Comparator.comparing(Foo::getVariableCount) )
.collect( ArrayList::new,
(l, f) -> { if ( l.isEmpty() || l.get(0).getVariableCount().equals(f.getVariableCount()) ) l.add(f); },
List::addAll
);

To avoid creating the map you could use two streams:
the first finds the minimum value.
the second filters elements with this value.
It could look like this:
List<Foo> foos = ...;
int min = foos.stream()
.mapToInt(Foo::getVariableCount)
.min()
.orElseThrow(RuntimeException::new); // technical error
List<Foo> minFoos = foos.stream()
.filter(f -> f.getVariableCount() == min)
.collect(Collectors.toList());

Related

How to get a random element using Java Stream API? [duplicate]

What is the most effective way to get a random element from a list with Java8 stream api?
Arrays.asList(new Obj1(), new Obj2(), new Obj3());
Thanks.
Why with streams? You just have to get a random number from 0 to the size of the list and then call get on this index:
Random r = new Random();
ElementType e = list.get(r.nextInt(list.size()));
Stream will give you nothing interesting here, but you can try with:
Random r = new Random();
ElementType e = list.stream().skip(r.nextInt(list.size())).findFirst().get();
Idea is to skip an arbitrary number of elements (but not the last one!), then get the first element if it exists. As a result you will have an Optional<ElementType> which will be non empty and then extract its value with get. You have a lot of options here after having skip.
Using streams here is highly inefficient...
Note that none of these solutions takes empty lists into account, but the problem is defined on non-empty lists.
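If empty lists can occur, the index-based approach can be wrapped so that it returns an Optional instead of throwing. This is only a sketch: the class name `RandomElementSketch` and the method `randomElement` are invented for illustration, and it assumes the list holds no null elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical helper class, not part of the original answer.
class RandomElementSketch {

    // Returns a uniformly chosen element, or an empty Optional for an empty list.
    // Assumes the list contains no null elements (Optional.of rejects null).
    static <T> Optional<T> randomElement(List<T> list, Random random) {
        if (list.isEmpty()) {
            return Optional.empty(); // nextInt(0) would throw IllegalArgumentException
        }
        return Optional.of(list.get(random.nextInt(list.size())));
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        System.out.println(randomElement(letters, new Random()).orElse("none"));
    }
}
```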
There are much more efficient ways to do it, but if this has to be Stream the easiest way is to create your own Comparator, which returns random result (-1, 0, 1) and sort your stream:
List<String> strings = Arrays.asList("a", "b", "c", "d", "e", "f");
String randomString = strings
.stream()
.sorted((o1, o2) -> ThreadLocalRandom.current().nextInt(-1, 2))
.findAny()
.get();
ThreadLocalRandom has ready "out of the box" method to get random number in your required range for comparator.
While all the given answers work, there is a simple one-liner that does the trick without having to check if the list is empty first:
List<String> list = List.of("a", "b", "c");
list.stream().skip((int) (list.size() * Math.random())).findAny();
For an empty list this will return an Optional.empty.
The last time I needed something like this, I did the following:
List<String> list = Arrays.asList("a", "b", "c");
Collections.shuffle(list);
String letter = list.stream().findAny().orElse(null);
System.out.println(letter);
If you HAVE to use streams, I wrote an elegant, albeit very inefficient collector that does the job:
/**
* Returns a random item from the stream (or null in case of an empty stream).
* This operation can't be lazy and is inefficient, and therefore shouldn't
* be used on streams with a large number of items or in performance critical sections.
* @return a random item from the stream or null if the stream is empty.
*/
public static <T> Collector<T, List<T>, T> randomItem() {
final Random RANDOM = new Random();
return Collector.of(() -> (List<T>) new ArrayList<T>(),
(acc, elem) -> acc.add(elem),
(list1, list2) -> ListUtils.union(list1, list2), // Using a 3rd party for list union, could be done "purely"
list -> list.isEmpty() ? null : list.get(RANDOM.nextInt(list.size())));
}
Usage:
@Test
public void standardRandomTest() {
assertThat(Stream.of(1, 2, 3, 4).collect(randomItem())).isBetween(1, 4);
}
Another idea would be to implement your own Spliterator and then use it as a source for Stream:
import java.util.List;
import java.util.Random;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
public class ImprovedRandomSpliterator<T> implements Spliterator<T> {
private final Random random;
private final T[] source;
private int size;
ImprovedRandomSpliterator(List<T> source, Supplier<? extends Random> random) {
if (source.isEmpty()) {
throw new IllegalArgumentException("RandomSpliterator can't be initialized with an empty collection");
}
this.source = (T[]) source.toArray();
this.random = random.get();
this.size = this.source.length;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
if (size > 0) {
int nextIdx = random.nextInt(size);
int lastIdx = size - 1;
action.accept(source[nextIdx]);
source[nextIdx] = source[lastIdx];
source[lastIdx] = null; // let object be GCed
size--;
return true;
} else {
return false;
}
}
@Override
public Spliterator<T> trySplit() {
return null;
}
@Override
public long estimateSize() {
return source.length;
}
@Override
public int characteristics() {
return SIZED;
}
}
public static <T> Collector<T, ?, Stream<T>> toShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> !list.isEmpty()
? StreamSupport.stream(new ImprovedRandomSpliterator<>(list, Random::new), false)
: Stream.empty());
}
and then simply:
list.stream()
.collect(toShuffledStream())
.findAny();
Details can be found here.
...but it's definitely overkill, so if you're looking for a pragmatic approach, go for Jean's solution.
If you don't know the size of your list in advance, you could do something like this:
yourStream.collect(new RandomListCollector<>(randomSetSize));
I guess that you will have to write your own Collector implementation, like this one, to get a uniform randomization:
public class RandomListCollector<T> implements Collector<T, RandomListCollector.ListAccumulator<T>, List<T>> {
private final Random rand;
private final int size;
public RandomListCollector(Random random , int size) {
super();
this.rand = random;
this.size = size;
}
public RandomListCollector(int size) {
this(new Random(System.nanoTime()), size);
}
@Override
public Supplier<ListAccumulator<T>> supplier() {
return () -> new ListAccumulator<T>();
}
@Override
public BiConsumer<ListAccumulator<T>, T> accumulator() {
return (l, t) -> {
if (l.size() < size) {
l.add(t);
} else if (rand.nextDouble() <= ((double) size) / (l.gSize() + 1)) {
l.add(t);
l.remove(rand.nextInt(size));
} else {
// in any case gSize needs to be incremented
l.gSizeInc();
}
};
}
@Override
public BinaryOperator<ListAccumulator<T>> combiner() {
return (l1, l2) -> {
int lgSize = l1.gSize() + l2.gSize();
ListAccumulator<T> l = new ListAccumulator<>();
if (l1.size() + l2.size()<size) {
l.addAll(l1);
l.addAll(l2);
} else {
while (l.size() < size) {
if (l1.size()==0 || l2.size()>0 && rand.nextDouble() < (double) l2.gSize() / (l1.gSize() + l2.gSize())) {
l.add(l2.remove(rand.nextInt(l2.size()), true));
} else {
l.add(l1.remove(rand.nextInt(l1.size()), true));
}
}
}
// set the gSize of l :
l.gSize(lgSize);
return l;
};
}
@Override
public Function<ListAccumulator<T>, List<T>> finisher() {
return (la) -> la.list;
}
@Override
public Set<Characteristics> characteristics() {
return Collections.singleton(Characteristics.CONCURRENT);
}
static class ListAccumulator<T> implements Iterable<T> {
List<T> list;
volatile int gSize;
public ListAccumulator() {
list = new ArrayList<>();
gSize = 0;
}
public void addAll(ListAccumulator<T> l) {
list.addAll(l.list);
gSize += l.gSize;
}
public T remove(int index) {
return remove(index, false);
}
public T remove(int index, boolean global) {
T t = list.remove(index);
if (t != null && global)
gSize--;
return t;
}
public void add(T t) {
list.add(t);
gSize++;
}
public int gSize() {
return gSize;
}
public void gSize(int gSize) {
this.gSize = gSize;
}
public void gSizeInc() {
gSize++;
}
public int size() {
return list.size();
}
@Override
public Iterator<T> iterator() {
return list.iterator();
}
}
}
If you want something easier and still don't want to load all your list in memory:
public <T> Stream<T> getRandomStreamSubset(Stream<T> stream, int subsetSize) {
int cnt = 0;
Random r = new Random(System.nanoTime());
Object[] tArr = new Object[subsetSize];
Iterator<T> iter = stream.iterator();
while (iter.hasNext() && cnt <subsetSize) {
tArr[cnt++] = iter.next();
}
while (iter.hasNext()) {
cnt++;
T t = iter.next();
if (r.nextDouble() <= (double) subsetSize / cnt) {
tArr[r.nextInt(subsetSize)] = t;
}
}
return Arrays.stream(tArr).map(o -> (T)o );
}
But then you have left the Stream API behind and could do the same with a basic iterator.
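For the original question (a single random element rather than a subset), the same reservoir idea shrinks to a few lines. The sketch below is an assumption-laden illustration: `ReservoirOneSketch` and `pickOne` are hypothetical names, elements are assumed to be non-null, and the int counter limits it to fewer than 2^31 elements.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.Random;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class ReservoirOneSketch {

    // Reservoir sampling with a reservoir of size one: the n-th element seen
    // replaces the current choice with probability 1/n, so every element ends
    // up being chosen with equal probability without knowing the size upfront.
    static <T> Optional<T> pickOne(Iterator<T> it, Random random) {
        T chosen = null;
        int seen = 0;
        while (it.hasNext()) {
            T candidate = it.next();
            seen++;
            if (random.nextInt(seen) == 0) { // always true for the first element
                chosen = candidate;
            }
        }
        return seen == 0 ? Optional.empty() : Optional.of(chosen);
    }

    public static void main(String[] args) {
        Optional<String> pick = pickOne(Stream.of("a", "b", "c").iterator(), new Random());
        System.out.println(pick.orElse("none"));
    }
}
```

Like the subset version, this consumes the stream through its iterator, so it is sequential by construction.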
The selected answer has errors in its stream solution...
You cannot call Random#nextInt with a non-positive bound, which is 0 in this case.
The stream solution will also never choose the last element in the list.
Example:
List<Integer> intList = Arrays.asList(0, 1, 2, 3, 4);
// #nextInt is exclusive, so here it means a returned value of 0-3
// if you have a list of size = 1, #nextInt will throw an IllegalArgumentException (bound must be positive)
int skipIndex = new Random().nextInt(intList.size()-1);
// randomInt will only ever be 0, 1, 2, or 3. Never 4
int randomInt = intList.stream()
.skip(skipIndex) // max skip of list#size - 2
.findFirst()
.get();
My recommendation would be to go with the non-stream approach that Jean-Baptiste Yunès put forth, but if you must do a stream approach, you could do something like this (but it's a little ugly):
list.stream()
.skip(list.isEmpty() ? 0 : new Random().nextInt(list.size()))
.findFirst();
Sometimes you may want to get a random item somewhere in the stream. If you want to get random items even after filtering your list, this code snippet will work for you:
List<String> items = Arrays.asList("A", "B", "C", "D", "E");
List<String> shuffledAndFilteredItems = items.stream()
.filter(value -> value.equals("A") || value.equals("B"))
//filter, map...
.collect(Collectors.collectingAndThen(
Collectors.toCollection(ArrayList::new),
list -> {
Collections.shuffle(list);
return list;
}));
String randomItem = shuffledAndFilteredItems
.stream()
.findFirst()
.orElse(null);
Of course there may be faster / optimized ways, but it allows you to do it all at once.

How to interleave (merge) two Java 8 Streams?

Stream<String> a = Stream.of("one", "three", "five");
Stream<String> b = Stream.of("two", "four", "six");
What do I need to do for the output to be the below?
// one
// two
// three
// four
// five
// six
I looked into concat but as the javadoc explains, it just appends one after the other, it does not interleave / intersperse.
Stream<String> out = Stream.concat(a, b);
out.forEach(System.out::println);
Creates a lazily concatenated stream whose elements are all the
elements of the first stream followed by all the elements of the
second stream.
Wrongly gives
// one
// three
// five
// two
// four
// six
Could do it if I collected them and iterated, but was hoping for something more Java8-y, Streamy :-)
Note
I don't want to zip the streams
“zip” operation will take an element from each collection and combine them.
the result of a zip operation would be something like this: (unwanted)
// onetwo
// threefour
// fivesix
I’d use something like this:
public static <T> Stream<T> interleave(Stream<? extends T> a, Stream<? extends T> b) {
Spliterator<? extends T> spA = a.spliterator(), spB = b.spliterator();
long s = spA.estimateSize() + spB.estimateSize();
if(s < 0) s = Long.MAX_VALUE;
int ch = spA.characteristics() & spB.characteristics()
& (Spliterator.NONNULL|Spliterator.SIZED);
ch |= Spliterator.ORDERED;
return StreamSupport.stream(new Spliterators.AbstractSpliterator<T>(s, ch) {
Spliterator<? extends T> sp1 = spA, sp2 = spB;
@Override
public boolean tryAdvance(Consumer<? super T> action) {
Spliterator<? extends T> sp = sp1;
if(sp.tryAdvance(action)) {
sp1 = sp2;
sp2 = sp;
return true;
}
return sp2.tryAdvance(action);
}
}, false);
}
It retains the characteristics of the input streams as far as possible, which allows certain optimizations (e.g. for count() and toArray()). Further, it adds the ORDERED characteristic even when the input streams might be unordered, to reflect the interleaving.
When one stream has more elements than the other, the remaining elements will appear at the end.
A much dumber solution than Holger's, but maybe it would fit your requirements:
private static <T> Stream<T> interleave(Stream<T> left, Stream<T> right) {
Spliterator<T> splLeft = left.spliterator();
Spliterator<T> splRight = right.spliterator();
T[] single = (T[]) new Object[1];
Stream.Builder<T> builder = Stream.builder();
while (splRight.tryAdvance(x -> single[0] = x) && splLeft.tryAdvance(builder)) {
builder.add(single[0]);
}
return builder.build();
}
As you can see from the question comments, I gave this a go using zip:
Stream<String> a = Stream.of("one", "three", "five");
Stream<String> b = Stream.of("two", "four", "six");
Stream<String> out = interleave(a, b);
public static <T> Stream<T> interleave(Stream<T> streamA, Stream<T> streamB) {
return zip(streamA, streamB, (o1, o2) -> Stream.of(o1, o2)).flatMap(s -> s);
}
/**
* https://stackoverflow.com/questions/17640754/zipping-streams-using-jdk8-with-lambda-java-util-stream-streams-zip
**/
private static <A, B, C> Stream<C> zip(Stream<A> streamA, Stream<B> streamB, BiFunction<A, B, C> zipper) {
final Iterator<A> iteratorA = streamA.iterator();
final Iterator<B> iteratorB = streamB.iterator();
final Iterator<C> iteratorC = new Iterator<C>() {
@Override
public boolean hasNext() {
return iteratorA.hasNext() && iteratorB.hasNext();
}
@Override
public C next() {
return zipper.apply(iteratorA.next(), iteratorB.next());
}
};
final boolean parallel = streamA.isParallel() || streamB.isParallel();
return iteratorToFiniteStream(iteratorC, parallel);
}
private static <T> Stream<T> iteratorToFiniteStream(Iterator<T> iterator, boolean parallel) {
final Iterable<T> iterable = () -> iterator;
return StreamSupport.stream(iterable.spliterator(), parallel);
}
This may not be a good answer because
(1) it collects to map, which you don't want to do I guess and
(2) it is not completely stateless as it uses AtomicIntegers.
Still adding it because
(1) it is readable and
(2) community can get an idea from this and try to improve it.
Stream<String> a = Stream.of("one", "three", "five");
Stream<String> b = Stream.of("two", "four", "six");
AtomicInteger i = new AtomicInteger(0);
AtomicInteger j = new AtomicInteger(1);
Stream.of(a.collect(Collectors.toMap(o -> i.addAndGet(2), Function.identity())),
b.collect(Collectors.toMap(o -> j.addAndGet(2), Function.identity())))
.flatMap(m -> m.entrySet().stream())
.sorted(Comparator.comparing(Map.Entry::getKey))
.forEach(e -> System.out.println(e.getValue())); // or collect
Output
one
two
three
four
five
six
@Holger's edit
Stream.concat(a.map(o -> new AbstractMap.SimpleEntry<>(i.addAndGet(2), o)),
b.map(o -> new AbstractMap.SimpleEntry<>(j.addAndGet(2), o)))
.sorted(Map.Entry.comparingByKey())
.forEach(e -> System.out.println(e.getValue())); // or collect
Without any external library (using JDK 11):
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
public class MergeUtil {
private static <T> Stream<T> zipped(List<T> lista, List<T> listb) {
int maxSize = Math.max(lista.size(), listb.size());
final var listStream = IntStream
.range(0, maxSize)
.mapToObj(i -> {
List<T> result = new ArrayList<>(2);
if (i < lista.size()) result.add(lista.get(i));
if (i < listb.size()) result.add(listb.get(i));
return result;
});
return listStream.flatMap(List::stream);
}
public static void main(String[] args) {
var l1 = List.of(1, 2, 3);
var l2 = List.of(4, 5, 6, 7, 8, 9);
final var zip = zipped(l1, l2);
System.out.println(zip.collect(Collectors.toList()));
}
}
listStream is a Stream<List<T>> that is flattened in the return statement.
The result is:
[1, 4, 2, 5, 3, 6, 7, 8, 9]
One solution with Iterator
final Iterator<String> iterA = a.iterator();
final Iterator<String> iterB = b.iterator();
final Iterator<String> iter = new Iterator<String>() {
private final AtomicInteger idx = new AtomicInteger();
@Override
public boolean hasNext() {
return iterA.hasNext() || iterB.hasNext();
}
@Override
public String next() {
return (idx.getAndIncrement() % 2 == 0 ? iterA.hasNext() : !iterB.hasNext()) ? iterA.next() : iterB.next();
}
};
// Create target Stream with StreamEx from: https://github.com/amaembo/streamex
StreamEx.of(iter).forEach(System.out::println);
// Or Streams from Google Guava
Streams.stream(iter).forEach(System.out::println);
Or simply by the solution in abacus-common provided by me:
AtomicInteger idx = new AtomicInteger();
StreamEx.merge(a, b, (s1, s2) -> idx.getAndIncrement() % 2 == 0 ? Nth.FIRST : Nth.SECOND).forEach(Fn.println());
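If neither StreamEx nor Guava is available, the iterator approach can be turned back into a Stream with JDK classes only, via Spliterators.spliteratorUnknownSize. The following is a minimal sketch under that assumption; the class name `InterleaveSketch` is made up, and leftover elements of the longer stream are appended at the end.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of the original answers.
class InterleaveSketch {

    static <T> Stream<T> interleave(Stream<? extends T> a, Stream<? extends T> b) {
        Iterator<? extends T> itA = a.iterator();
        Iterator<? extends T> itB = b.iterator();
        Iterator<T> merged = new Iterator<T>() {
            private boolean takeFromA = true;

            @Override
            public boolean hasNext() {
                return itA.hasNext() || itB.hasNext();
            }

            @Override
            public T next() {
                // Alternate between the sources, falling back to the other
                // one as soon as the preferred source is exhausted.
                boolean useA = takeFromA ? itA.hasNext() : !itB.hasNext();
                takeFromA = !takeFromA;
                if (useA) {
                    return itA.next();
                }
                return itB.next();
            }
        };
        // Sequential, ordered stream over the merged iterator; still lazy.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(merged, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        Stream<String> a = Stream.of("one", "three", "five");
        Stream<String> b = Stream.of("two", "four", "six");
        // prints [one, two, three, four, five, six]
        System.out.println(interleave(a, b).collect(Collectors.toList()));
    }
}
```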
Using Guava's Streams.zip and Stream.flatMap:
Stream<String> interleaved = Streams
.zip(a, b, (x, y) -> Stream.of(x, y))
.flatMap(Function.identity());
interleaved.forEach(System.out::println);
Prints:
one
two
three
four
five
six

How can I collect Iterator values into a list of 50 elements

I want to be able to collect a list of elements of fixed size of 50 elements. Here is how I am currently doing it. I would like to use lambdas if possible.
List<Contact> contactList=getContacts();
Iterator<Contact> it=contactList.iterator();
List<Contact> batch=new ArrayList<>();
while(it.hasNext()) {
if(batch.size()<50) {
batch.add(it.next());
} else {
processBatch(batch);
}
//When iterator has less than 50 elements
if (!it.hasNext() && batch.size()<50) {
processBatch(batch);
}
}
You can do it this way:
Iterable<Contact> iterable = () -> it;
batch.addAll(StreamSupport.stream(iterable.spliterator(), false)
.limit(50)
.collect(Collectors.toList()));
Approach-1
public static void main(String[] args) {
List<Integer> list = IntStream.range(0, 280).boxed().collect(toList());
AtomicInteger count = new AtomicInteger();
StreamSupport.stream(list.spliterator(), false)
.collect(groupingBy(e -> count.getAndIncrement() / 50,
collectingAndThen(toList(), l -> {
processBatch(l);
return null;
})));
}
public static <T extends Object> void processBatch(List<T> list) {
System.out.println(list.size());
}
I used an AtomicInteger as a mutable counter object. If you are using the Apache Commons Lang API, you can replace AtomicInteger with a MutableInt object.
Approach-2
If you can use the list object directly rather than an iterator, you can write the code as below. No external counter object is required here.
IntStream.range(0, list.size()).mapToObj(i -> new Object[] { i, list.get(i) }).collect(groupingBy(
arr -> (int) arr[0] / 50, Collectors.mapping(arr -> arr[1], collectingAndThen(toList(), l -> {
processBatch(l);
return null;
}))));
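When the list itself is available, another option is to slice it by index with subList, which avoids both the counter and the intermediate map. This is a sketch rather than a drop-in replacement: `BatchSketch` and `partition` are hypothetical names, and the batches are views backed by the original list, so it should not be modified while they are in use.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the original answer.
class BatchSketch {

    // Splits the list into consecutive batches of at most 'size' elements.
    static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int batches = (list.size() + size - 1) / size; // ceiling division
        return IntStream.range(0, batches)
                .mapToObj(i -> list.subList(i * size, Math.min(list.size(), (i + 1) * size)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 280).boxed().collect(Collectors.toList());
        // Prints 50 five times, then 30.
        partition(list, 50).forEach(batch -> System.out.println(batch.size()));
    }
}
```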

Perform multiple unrelated operations on elements of a single stream in Java

How can I perform multiple unrelated operations on elements of a single stream?
Say I have a List<String> composed from a text. Each string in the list may or may not contain a certain word, which represents an action to perform. Let's say that:
if the string contains 'of', all the words in that string must be counted
if the string contains 'for', the portion after the first occurrence of 'for' must be returned, yielding a List<String> with all substrings
Of course, I could do something like this:
List<String> strs = ...;
List<Integer> wordsInStr = strs.stream()
.filter(t -> t.contains("of"))
.map(t -> t.split(" ").length)
.collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream()
.filter(t -> t.contains("for"))
.map(t -> t.substring(t.indexOf("for")))
.collect(Collectors.toList());
but then the list would be traversed twice, which could result in a performance penalty if strs contained lots of elements.
Is it possible to somehow execute those two operations without traversing twice over the list?
If you want a single pass Stream then you have to use a custom Collector (parallelization possible).
class Splitter {
public List<String> words = new ArrayList<>();
public List<Integer> counts = new ArrayList<>();
public void accept(String s) {
if(s.contains("of")) {
counts.add(s.split(" ").length);
}
if(s.contains("for")) {
words.add(s.substring(s.indexOf("for")));
}
}
public Splitter merge(Splitter other) {
words.addAll(other.words);
counts.addAll(other.counts);
return this;
}
}
Splitter collect = strs.stream().collect(
Collector.of(Splitter::new, Splitter::accept, Splitter::merge)
);
System.out.println(collect.counts);
System.out.println(collect.words);
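On Java 12 or newer, a similar single pass can be expressed without a hand-written accumulator by combining Collectors.teeing with Collectors.filtering (Java 9+). The sketch below assumes such a JDK; the class `TeeingSketch` and its `Result` holder are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer. Requires Java 12+.
class TeeingSketch {

    // Simple holder for the two independent results.
    static final class Result {
        final List<Integer> counts;
        final List<String> portions;

        Result(List<Integer> counts, List<String> portions) {
            this.counts = counts;
            this.portions = portions;
        }
    }

    static Result analyze(List<String> strs) {
        // Word counts of every string containing "of".
        Collector<String, ?, List<Integer>> wordCounts = Collectors.filtering(
                (String s) -> s.contains("of"),
                Collectors.mapping((String s) -> s.split(" ").length, Collectors.toList()));
        // Substring starting at the first "for" of every string containing it.
        Collector<String, ?, List<String>> portions = Collectors.filtering(
                (String s) -> s.contains("for"),
                Collectors.mapping((String s) -> s.substring(s.indexOf("for")), Collectors.toList()));
        // teeing feeds each element to both collectors in one traversal.
        return strs.stream().collect(Collectors.teeing(wordCounts, portions, Result::new));
    }

    public static void main(String[] args) {
        Result r = analyze(Arrays.asList("t of r m", "nice for nice nice again", "none"));
        System.out.println(r.counts);   // prints [4]
        System.out.println(r.portions); // prints [for nice nice again]
    }
}
```

Each element is offered to both downstream collectors, so a string containing both words contributes to both results, as in the two-stream version from the question.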
Here is an answer that addresses the question from a different angle. First, let's look at how fast or slow it is to iterate over a list/collection. These are the results on my machine from the performance test below:
When: length of string list = 100, Thread number = 1, loops = 1000, unit = milliseconds
OP: 0.013
Accepted answer: 0.020
By the counter function: 0.010
When: length of string list = 1000_000, Thread number = 1, loops = 100, unit = milliseconds
OP: 99.387
Accepted answer: 89.848
By the counter function: 59.183
Conclusion: the performance improvement is small, and for a short list the single-pass collector is even slower. In general, it is a mistake to try to reduce the number of iterations over an in-memory list or collection by introducing a more complicated collector; you won't gain much. If there is a performance issue, look for it somewhere else.
Here is my performance test code, using the Profiler tool. (I'm not going to discuss how to do a performance test here. If you doubt the result, you can rerun it with any tool you trust.)
@Test
public void test_46539786() {
final int strsLength = 1000_000;
final int threadNum = 1;
final int loops = 100;
final int rounds = 3;
final List<String> strs = IntStream.range(0, strsLength).mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i).toList();
Profiler.run(threadNum, loops, rounds, "OP", () -> {
List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(t -> t.split(" ").length).collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
.collect(Collectors.toList());
assertTrue(wordsInStr.size() == linePortionAfterFor.size());
}).printResult();
Profiler.run(threadNum, loops, rounds, "Accepted answer", () -> {
Splitter collect = strs.stream().collect(Collector.of(Splitter::new, Splitter::accept, Splitter::merge));
assertTrue(collect.counts.size() == collect.words.size());
}).printResult();
final Function<String, Integer> counter = s -> {
int count = 0;
for (int i = 0, len = s.length(); i < len; i++) {
if (s.charAt(i) == ' ') {
count++;
}
}
return count;
};
Profiler.run(threadNum, loops, rounds, "By the counter function", () -> {
List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(counter).collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
.collect(Collectors.toList());
assertTrue(wordsInStr.size() == linePortionAfterFor.size());
}).printResult();
}
You could use a custom collector for that and iterate only once:
private static <T, R> Collector<String, ?, Pair<List<String>, List<Long>>> multiple() {
class Acc {
List<String> strings = new ArrayList<>();
List<Long> longs = new ArrayList<>();
void add(String elem) {
if (elem.contains("of")) {
long howMany = Arrays.stream(elem.split(" ")).count();
longs.add(howMany);
}
if (elem.contains("for")) {
String result = elem.substring(elem.indexOf("for"));
strings.add(result);
}
}
Acc merge(Acc right) {
longs.addAll(right.longs);
strings.addAll(right.strings);
return this;
}
public Pair<List<String>, List<Long>> finisher() {
return Pair.of(strings, longs);
}
}
return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}
Usage would be:
Pair<List<String>, List<Long>> pair = Stream.of("t of r m", "t of r m", "nice for nice nice again")
.collect(multiple());
If you want a single stream pass over the list, you need a way to manage two different states. You can do this by implementing Consumer in a new class for each state.
class WordsInStr implements Consumer<String> {
ArrayList<Integer> list = new ArrayList<>();
@Override
public void accept(String s) {
Stream.of(s).filter(t -> t.contains("of")) //probably would be faster without stream here
.map(t -> t.split(" ").length)
.forEach(list::add);
}
}
class LinePortionAfterFor implements Consumer<String> {
ArrayList<String> list = new ArrayList<>();
@Override
public void accept(String s) {
Stream.of(s) //probably would be faster without stream here
.filter(t -> t.contains("for"))
.map(t -> t.substring(t.indexOf("for")))
.forEach(list::add);
}
}
WordsInStr w = new WordsInStr();
LinePortionAfterFor l = new LinePortionAfterFor();
strs.stream()//stream not needed here
.forEach(w.andThen(l));
System.out.println(w.list);
System.out.println(l.list);

How to get a random element from a list with stream api?

What is the most effective way to get a random element from a list with Java8 stream api?
Arrays.asList(new Obj1(), new Obj2(), new Obj3());
Thanks.
Why with streams? You just have to get a random number from 0 to the size of the list and then call get on this index:
Random r = new Random();
ElementType e = list.get(r.nextInt(list.size()));
Stream will give you nothing interesting here, but you can try with:
Random r = new Random();
ElementType e = list.stream().skip(r.nextInt(list.size())).findFirst().get();
Idea is to skip an arbitrary number of elements (but not the last one!), then get the first element if it exists. As a result you will have an Optional<ElementType> which will be non empty and then extract its value with get. You have a lot of options here after having skip.
Using streams here is highly inefficient...
Note that none of these solutions takes empty lists into account, but the problem is defined on non-empty lists.
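If empty lists can occur after all, one possible way to cover them is to return an Optional from the index-based approach. This is only a sketch: the helper name randomElement and the class RandomPick are assumptions for illustration, and ThreadLocalRandom is used here instead of a new Random instance.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

final class RandomPick {
    private RandomPick() {}

    // Hypothetical helper: picks one element by random index,
    // or returns Optional.empty() when the list has no elements.
    static <T> Optional<T> randomElement(List<T> list) {
        if (list.isEmpty()) {
            return Optional.empty();
        }
        int index = ThreadLocalRandom.current().nextInt(list.size());
        // ofNullable: a null element in the list also ends up as an empty Optional
        return Optional.ofNullable(list.get(index));
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        System.out.println(randomElement(letters).orElse("<empty>"));
        System.out.println(randomElement(Collections.<String>emptyList()).isPresent()); // false
    }
}
```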
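There are much more efficient ways to do it, but if this has to be a Stream, the easiest way is to create your own Comparator that returns a random result (-1, 0, or 1) and sort the stream with it. Be aware that such a comparator violates the Comparator contract, so the pick is not guaranteed to be uniform: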
List<String> strings = Arrays.asList("a", "b", "c", "d", "e", "f");
String randomString = strings
.stream()
.sorted((o1, o2) -> ThreadLocalRandom.current().nextInt(-1, 2))
.findAny()
.get();
ThreadLocalRandom has a ready-made method for getting a random number in the range the comparator needs.
While all the given answers work, there is a simple one-liner that does the trick without having to check if the list is empty first:
List<String> list = List.of("a", "b", "c");
list.stream().skip((int) (list.size() * Math.random())).findAny();
For an empty list this will return an Optional.empty.
The last time I needed to do something like this, I did the following:
List<String> list = Arrays.asList("a", "b", "c");
Collections.shuffle(list);
String letter = list.stream().findAny().orElse(null);
System.out.println(letter);
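One thing to keep in mind with the snippet above is that Collections.shuffle reorders the list it is given. If the original order matters, a variant that shuffles a copy might look like this; the names ShuffleCopy and randomViaShuffle are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class ShuffleCopy {
    private ShuffleCopy() {}

    // Hypothetical helper: shuffles a copy so the caller's list keeps its order.
    // Assumes the list contains no null elements (findFirst rejects a null result).
    static <T> Optional<T> randomViaShuffle(List<T> source) {
        List<T> copy = new ArrayList<>(source);
        Collections.shuffle(copy);
        return copy.stream().findFirst();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(randomViaShuffle(list).orElse(null));
        System.out.println(list); // [a, b, c], the original order is untouched
    }
}
```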
If you HAVE to use streams, I wrote an elegant, albeit very inefficient collector that does the job:
/**
* Returns a random item from the stream (or null in case of an empty stream).
* This operation can't be lazy and is inefficient, and therefore shouldn't
* be used on streams with a large number of items or in performance critical sections.
* @return a random item from the stream or null if the stream is empty.
*/
public static <T> Collector<T, List<T>, T> randomItem() {
final Random RANDOM = new Random();
return Collector.of(() -> (List<T>) new ArrayList<T>(),
(acc, elem) -> acc.add(elem),
(list1, list2) -> ListUtils.union(list1, list2), // Using a 3rd party for list union, could be done "purely"
list -> list.isEmpty() ? null : list.get(RANDOM.nextInt(list.size())));
}
Usage:
@Test
public void standardRandomTest() {
assertThat(Stream.of(1, 2, 3, 4).collect(randomItem())).isBetween(1, 4);
}
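The comment in the collector above mentions that the list union could be done "purely". A rough sketch of such a JDK-only variant, under the assumption that merging the right-hand list into the left one is acceptable, might be (RandomItemCollector is a made-up class name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collector;
import java.util.stream.Stream;

final class RandomItemCollector {
    private RandomItemCollector() {}

    // Same idea as randomItem() above, but without a third-party ListUtils:
    // collect everything into a list, then pick one index at random.
    static <T> Collector<T, List<T>, T> randomItem() {
        return Collector.<T, List<T>, T>of(
                ArrayList::new,
                List::add,
                (left, right) -> {
                    left.addAll(right); // plain JDK replacement for the list union
                    return left;
                },
                list -> list.isEmpty()
                        ? null
                        : list.get(ThreadLocalRandom.current().nextInt(list.size())));
    }

    public static void main(String[] args) {
        Integer picked = Stream.of(1, 2, 3, 4).collect(randomItem());
        System.out.println(picked); // one of 1, 2, 3, 4
    }
}
```

It is still O(n) in memory, so the same caveats about large streams apply.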
Another idea would be to implement your own Spliterator and then use it as a source for Stream:
import java.util.List;
import java.util.Random;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
public class ImprovedRandomSpliterator<T> implements Spliterator<T> {
private final Random random;
private final T[] source;
private int size;
ImprovedRandomSpliterator(List<T> source, Supplier<? extends Random> random) {
if (source.isEmpty()) {
throw new IllegalArgumentException("RandomSpliterator can't be initialized with an empty collection");
}
this.source = (T[]) source.toArray();
this.random = random.get();
this.size = this.source.length;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
if (size > 0) {
int nextIdx = random.nextInt(size);
int lastIdx = size - 1;
action.accept(source[nextIdx]);
source[nextIdx] = source[lastIdx];
source[lastIdx] = null; // let object be GCed
size--;
return true;
} else {
return false;
}
}
@Override
public Spliterator<T> trySplit() {
return null;
}
@Override
public long estimateSize() {
// SIZED requires the exact number of remaining elements, not the original length
return size;
}
@Override
public int characteristics() {
return SIZED;
}
}
public static <T> Collector<T, ?, Stream<T>> toShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> !list.isEmpty()
? StreamSupport.stream(new ImprovedRandomSpliterator<>(list, Random::new), false)
: Stream.empty());
}
and then simply:
list.stream()
.collect(toShuffledStream())
.findAny();
...but it's definitely overkill, so if you're looking for a pragmatic approach, go for Jean's solution.
If you don't know the size of your list in advance, you could do something like this:
yourStream.collect(new RandomListCollector<>(randomSetSize));
I guess you will have to write your own Collector implementation, like this one, to get a uniform randomization:
public class RandomListCollector<T> implements Collector<T, RandomListCollector.ListAccumulator<T>, List<T>> {
private final Random rand;
private final int size;
public RandomListCollector(Random random , int size) {
super();
this.rand = random;
this.size = size;
}
public RandomListCollector(int size) {
this(new Random(System.nanoTime()), size);
}
@Override
public Supplier<ListAccumulator<T>> supplier() {
return () -> new ListAccumulator<T>();
}
@Override
public BiConsumer<ListAccumulator<T>, T> accumulator() {
return (l, t) -> {
if (l.size() < size) {
l.add(t);
} else if (rand.nextDouble() <= ((double) size) / (l.gSize() + 1)) {
l.add(t);
l.remove(rand.nextInt(size));
} else {
// in any case gSize needs to be incremented
l.gSizeInc();
}
};
}
@Override
public BinaryOperator<ListAccumulator<T>> combiner() {
return (l1, l2) -> {
int lgSize = l1.gSize() + l2.gSize();
ListAccumulator<T> l = new ListAccumulator<>();
if (l1.size() + l2.size()<size) {
l.addAll(l1);
l.addAll(l2);
} else {
while (l.size() < size) {
if (l1.size()==0 || l2.size()>0 && rand.nextDouble() < (double) l2.gSize() / (l1.gSize() + l2.gSize())) {
l.add(l2.remove(rand.nextInt(l2.size()), true));
} else {
l.add(l1.remove(rand.nextInt(l1.size()), true));
}
}
}
// set the gSize of l :
l.gSize(lgSize);
return l;
};
}
@Override
public Function<ListAccumulator<T>, List<T>> finisher() {
return (la) -> la.list;
}
@Override
public Set<Characteristics> characteristics() {
// Not CONCURRENT: the ArrayList-backed accumulator is not thread-safe
return Collections.emptySet();
}
static class ListAccumulator<T> implements Iterable<T> {
List<T> list;
volatile int gSize;
public ListAccumulator() {
list = new ArrayList<>();
gSize = 0;
}
public void addAll(ListAccumulator<T> l) {
list.addAll(l.list);
gSize += l.gSize;
}
public T remove(int index) {
return remove(index, false);
}
public T remove(int index, boolean global) {
T t = list.remove(index);
if (t != null && global)
gSize--;
return t;
}
public void add(T t) {
list.add(t);
gSize++;
}
public int gSize() {
return gSize;
}
public void gSize(int gSize) {
this.gSize = gSize;
}
public void gSizeInc() {
gSize++;
}
public int size() {
return list.size();
}
@Override
public Iterator<T> iterator() {
return list.iterator();
}
}
}
If you want something easier and still don't want to load all your list in memory:
public <T> Stream<T> getRandomStreamSubset(Stream<T> stream, int subsetSize) {
int cnt = 0;
Random r = new Random(System.nanoTime());
Object[] tArr = new Object[subsetSize];
Iterator<T> iter = stream.iterator();
while (iter.hasNext() && cnt <subsetSize) {
tArr[cnt++] = iter.next();
}
while (iter.hasNext()) {
cnt++;
T t = iter.next();
if (r.nextDouble() <= (double) subsetSize / cnt) {
tArr[r.nextInt(subsetSize)] = t;
}
}
return Arrays.stream(tArr).map(o -> (T)o );
}
But at that point you have left the Stream API behind and could do the same with a plain iterator.
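For the single-element case, the same reservoir idea shrinks to a few lines over an Iterator. This is a sketch under the assumption that only one random element is needed; ReservoirPick and pickOne are illustrative names, not part of the answer above:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;
import java.util.Random;

final class ReservoirPick {
    private ReservoirPick() {}

    // Reservoir sampling with a reservoir of size 1: the n-th element seen
    // replaces the current choice with probability 1/n, so no size is needed up front.
    static <T> Optional<T> pickOne(Iterator<T> items, Random random) {
        T chosen = null;
        int seen = 0;
        while (items.hasNext()) {
            T next = items.next();
            seen++;
            if (random.nextInt(seen) == 0) { // always true for the first element
                chosen = next;
            }
        }
        return seen == 0 ? Optional.empty() : Optional.ofNullable(chosen);
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("a", "b", "c").stream().iterator();
        System.out.println(pickOne(it, new Random()).orElse("<empty>"));
    }
}
```

Only the current choice and a counter are kept, so memory use stays constant regardless of how many elements the iterator yields.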
The selected answer has errors in its stream solution...
You cannot call Random#nextInt with a non-positive bound, "0" in this case.
The stream solution will also never choose the last element in the list.
Example:
List<Integer> intList = Arrays.asList(0, 1, 2, 3, 4);
// #nextInt is exclusive, so here it means a returned value of 0-3
// if you have a list of size = 1, #nextInt will throw an IllegalArgumentException (bound must be positive)
int skipIndex = new Random().nextInt(intList.size()-1);
// randomInt will only ever be 0, 1, 2, or 3. Never 4
int randomInt = intList.stream()
.skip(skipIndex) // max skip of list#size - 2
.findFirst()
.get();
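The bound problem can be checked in isolation. The following small sketch (BoundDemo and rejectsBound are names invented for this illustration) shows that a bound of list.size() - 1 fails for a single-element list, while list.size() works:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class BoundDemo {
    private BoundDemo() {}

    // Returns true when Random#nextInt rejects the given bound
    static boolean rejectsBound(int bound) {
        try {
            new Random().nextInt(bound);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> single = Arrays.asList(42);
        System.out.println(rejectsBound(single.size() - 1)); // true: a bound of 0 is rejected
        System.out.println(rejectsBound(single.size()));     // false: a bound of 1 always yields index 0
    }
}
```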
My recommendation would be to go with the non-stream approach that Jean-Baptiste Yunès put forth, but if you must do a stream approach, you could do something like this (but it's a little ugly):
list.stream()
.skip(list.isEmpty() ? 0 : new Random().nextInt(list.size()))
.findFirst();
Sometimes you may want to get a random item somewhere in the stream. If you want to get random items even after filtering your list, this code snippet will work for you:
List<String> items = Arrays.asList("A", "B", "C", "D", "E");
List<String> shuffledAndFilteredItems = items.stream()
.filter(value -> value.equals("A") || value.equals("B"))
//filter, map...
.collect(Collectors.collectingAndThen(
Collectors.toCollection(ArrayList::new),
list -> {
Collections.shuffle(list);
return list;
}));
String randomItem = shuffledAndFilteredItems
.stream()
.findFirst()
.orElse(null);
Of course there may be faster / optimized ways, but it allows you to do it all at once.
