Java stream sort map by key and value

I need to sort the words in a file by the number of occurrences of a given symbol (first by the number of occurrences, descending, then alphabetically). For example, for the symbol 'e' the result should be: avrgspeed=2; became=2; because=2 ... automated=1; autowired=1.
Is there a better way to write all of this in stream style?
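import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import static java.util.stream.Collectors.toMap;
public class Sorter {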
public Map<String, Integer> getDistinctWordsMap(String path, char symbol) throws IOException {
Pattern delimeter = Pattern.compile("([^a-zA-Z])");
List<WordParameter> parameterList = new ArrayList<>();
Files.lines(Paths.get(path))
.flatMap(delimeter::splitAsStream).map(String::toLowerCase).distinct()
.forEachOrdered(word -> parameterList.add(new WordParameter(word, symbol)));
Collections.sort(parameterList);
return parameterList.stream().filter(w->w.count>0).collect(toMap(n->n.word, n->n.count, (e1, e2) -> e1, LinkedHashMap::new));
}
class WordParameter implements Comparable<WordParameter>{
String word;
int count;
public WordParameter(String word, char symbol) {
this.word = word;
this.count = countEntrance(symbol);
}
private int countEntrance(char symbol){
int quantity = 0;
char[] charArr = word.toCharArray();
for(int i = 0; i<charArr.length; i++){
if(charArr[i]==symbol){
quantity++;
}
}
return quantity;
}
@Override
public int compareTo(WordParameter o) {
if(count<o.count)
return 1;
else if(count>o.count)
return -1;
else {
return word.compareTo(o.word);
}
}
}
}

You could definitely reduce the boilerplate and make the code more concise.
I haven't tested the code below, but something like this should suffice:
Files.lines(Paths.get(path))
.flatMap(delimeter::splitAsStream)
.map(String::toLowerCase)
.filter(s -> s.indexOf(symbol) >= 0)
.distinct()
.map(s -> new SimpleEntry<>(s, s.chars().filter(c -> c == symbol).count()))
.sorted(Map.Entry.<String,Long>comparingByValue(Comparator.reverseOrder())
.thenComparing(Map.Entry::getKey))
.collect(toMap(SimpleEntry::getKey, e -> e.getValue().intValue(), (l, r) -> l, LinkedHashMap::new));
This means you don’t need your custom class anymore as we’re using SimpleEntry in the stream pipeline.
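For reference, here is a self-contained, runnable version of the answer's pipeline. It is a sketch under two assumptions: the lines arrive as a Stream<String> (in real code that would be Files.lines(Paths.get(path)), ideally inside try-with-resources so the file is closed), and the names SymbolCountSorter and countBySymbol are hypothetical. The delimiter pattern uses `+` so that runs of separators do not produce empty tokens.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SymbolCountSorter {
    // Runs of non-letters separate words, like the question's delimiter pattern.
    private static final Pattern DELIMITER = Pattern.compile("[^a-zA-Z]+");

    // Counts how often `symbol` occurs in each distinct word and returns the words
    // sorted by that count (descending), then alphabetically.
    public static Map<String, Integer> countBySymbol(Stream<String> lines, char symbol) {
        return lines
                .flatMap(DELIMITER::splitAsStream)
                .map(String::toLowerCase)
                .filter(w -> w.indexOf(symbol) >= 0) // also drops empty tokens
                .distinct()
                .map(w -> new SimpleEntry<>(w, w.chars().filter(c -> c == symbol).count()))
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry::getKey))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getValue().intValue(),
                        (a, b) -> a,          // never used: keys are already distinct
                        LinkedHashMap::new)); // LinkedHashMap keeps the sorted order
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of("Became, because", "automated autowired; avrgspeed");
        // prints {avrgspeed=2, became=2, because=2, automated=1, autowired=1}
        System.out.println(countBySymbol(lines, 'e'));
    }
}
```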


How to get a random element using Java Stream API? [duplicate]

What is the most effective way to get a random element from a list with Java8 stream api?
Arrays.asList(new Obj1(), new Obj2(), new Obj3());
Thanks.
Why use streams? Just get a random index from 0 (inclusive) to the size of the list (exclusive) and call get with it:
Random r = new Random();
ElementType e = list.get(r.nextInt(list.size()));
Streams give you nothing useful here, but you can try:
Random r = new Random();
ElementType e = list.stream().skip(r.nextInt(list.size())).findFirst().get();
The idea is to skip a random number of elements (at most size - 1, so the last element is never skipped), then take the first remaining element. findFirst returns an Optional<ElementType>, which is non-empty here, and get extracts its value. After the skip you have many options for what to do next.
Using streams here is highly inefficient compared with list.get, which runs in constant time on random-access lists such as ArrayList.
Note that none of these solutions handle empty lists (Random#nextInt(0) throws an IllegalArgumentException), but the problem is defined on non-empty lists.
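A minimal sketch that packages both approaches and handles the empty-list case by returning an Optional instead of throwing. ThreadLocalRandom replaces new Random() here as a swapped-in choice, and the names RandomPick, randomElement and randomElementViaStream are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

public class RandomPick {
    // Index-based pick: constant time for random-access lists such as ArrayList.
    public static <T> Optional<T> randomElement(List<T> list) {
        if (list.isEmpty()) {
            return Optional.empty(); // nextInt(0) would throw IllegalArgumentException
        }
        int index = ThreadLocalRandom.current().nextInt(list.size()); // 0 .. size-1
        return Optional.ofNullable(list.get(index));
    }

    // Stream-based pick: skips 0 .. size-1 elements, so at least one element is left.
    // Note: findFirst throws NullPointerException if the chosen element is null.
    public static <T> Optional<T> randomElementViaStream(List<T> list) {
        if (list.isEmpty()) {
            return Optional.empty();
        }
        return list.stream()
                .skip(ThreadLocalRandom.current().nextInt(list.size()))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c");
        System.out.println(randomElement(letters));
        System.out.println(randomElementViaStream(letters));
        System.out.println(randomElement(List.of())); // prints Optional.empty
    }
}
```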
There are much more efficient ways to do it, but if it has to be a stream, the easiest way is to create your own Comparator that returns a random result (-1, 0 or 1) and sort the stream:
List<String> strings = Arrays.asList("a", "b", "c", "d", "e", "f");
String randomString = strings
.stream()
.sorted((o1, o2) -> ThreadLocalRandom.current().nextInt(-1, 2))
.findAny()
.get();
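ThreadLocalRandom has a ready-made method for getting a random number in the range the comparator needs. Be aware, though, that a comparator returning random results violates the Comparator contract: the sort may throw IllegalArgumentException ("Comparison method violates its general contract!") on larger inputs, and the resulting order is not uniformly distributed.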
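A comparator that returns random results is inconsistent, so a contract-safe way to shuffle inside a stream is to draw one random key per element and sort by that stored key (a swapped-in technique, often called sort-by-random-key). This is a sketch; RandomKeyShuffle and shuffled are hypothetical names, and ties between random doubles are possible in principle but extremely unlikely.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

public class RandomKeyShuffle {
    // Assigns each element one random double, then sorts by that stored key.
    // The comparator compares fixed values, so it is consistent and obeys the Comparator contract.
    public static <T> List<T> shuffled(List<T> items) {
        return items.stream()
                .<Map.Entry<T, Double>>map(x -> new SimpleEntry<>(x, ThreadLocalRandom.current().nextDouble()))
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "b", "c", "d", "e", "f");
        // Still O(n log n) just to pick one element; list.get(randomIndex) is far cheaper.
        String randomString = shuffled(strings).get(0);
        System.out.println(randomString);
    }
}
```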
While all the given answers work, there is a simple one-liner that does the trick without having to check if the list is empty first:
List<String> list = List.of("a", "b", "c");
list.stream().skip((int) (list.size() * Math.random())).findAny();
For an empty list this will return an Optional.empty.
The last time I needed to do something like this, I did the following (Collections.shuffle reorders the list in place):
List<String> list = Arrays.asList("a", "b", "c");
Collections.shuffle(list);
String letter = list.stream().findAny().orElse(null);
System.out.println(letter);
If you HAVE to use streams, I wrote an elegant, albeit very inefficient collector that does the job:
/**
* Returns a random item from the stream (or null in case of an empty stream).
* This operation can't be lazy and is inefficient, and therefore shouldn't
* be used on streams with a large number of items or in performance-critical sections.
* @return a random item from the stream or null if the stream is empty.
*/
public static <T> Collector<T, List<T>, T> randomItem() {
final Random RANDOM = new Random();
return Collector.of(() -> (List<T>) new ArrayList<T>(),
(acc, elem) -> acc.add(elem),
(list1, list2) -> { list1.addAll(list2); return list1; }, // plain list union, no third-party ListUtils needed
list -> list.isEmpty() ? null : list.get(RANDOM.nextInt(list.size())));
}
Usage:
@Test
public void standardRandomTest() {
assertThat(Stream.of(1, 2, 3, 4).collect(randomItem())).isBetween(1, 4);
}
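The collector above buffers every element before picking one. As a swapped-in alternative, reservoir sampling with a reservoir of size one (the same idea the RandomListCollector further down applies to k elements) picks a uniformly random element in a single pass with O(1) extra memory. This is a sketch: ReservoirPick, Acc and randomItem are hypothetical names, and the combiner weights each half by its element count so that parallel streams stay uniform.

```java
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class ReservoirPick {
    // Accumulator: the current pick plus how many elements it was chosen from.
    private static final class Acc<T> {
        T pick;
        long seen;

        void accept(T t) {
            seen++;
            // Keep the new element with probability 1/seen (reservoir sampling with k = 1).
            if (ThreadLocalRandom.current().nextLong(seen) == 0) {
                pick = t;
            }
        }

        Acc<T> combine(Acc<T> other) {
            long total = seen + other.seen;
            // Take the other half's pick with probability other.seen / total,
            // so every element still ends up chosen with probability 1/total.
            if (other.seen > 0 && ThreadLocalRandom.current().nextLong(total) < other.seen) {
                pick = other.pick;
            }
            seen = total;
            return this;
        }
    }

    // Uniformly random element of the stream, or Optional.empty() for an empty stream.
    public static <T> Collector<T, ?, Optional<T>> randomItem() {
        return Collector.of(
                Acc<T>::new,
                Acc::accept,
                Acc::combine,
                acc -> acc.seen == 0 ? Optional.<T>empty() : Optional.ofNullable(acc.pick));
    }

    public static void main(String[] args) {
        System.out.println(IntStream.rangeClosed(1, 10).boxed().collect(randomItem()));
        System.out.println(IntStream.rangeClosed(1, 1_000).boxed().parallel().collect(randomItem()));
    }
}
```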
Another idea would be to implement your own Spliterator and then use it as a source for Stream:
import java.util.List;
import java.util.Random;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Supplier;
public class ImprovedRandomSpliterator<T> implements Spliterator<T> {
private final Random random;
private final T[] source;
private int size;
ImprovedRandomSpliterator(List<T> source, Supplier<? extends Random> random) {
if (source.isEmpty()) {
throw new IllegalArgumentException("RandomSpliterator can't be initialized with an empty collection");
}
this.source = (T[]) source.toArray();
this.random = random.get();
this.size = this.source.length;
}
@Override
public boolean tryAdvance(Consumer<? super T> action) {
if (size > 0) {
int nextIdx = random.nextInt(size);
int lastIdx = size - 1;
action.accept(source[nextIdx]);
source[nextIdx] = source[lastIdx];
source[lastIdx] = null; // let object be GCed
size--;
return true;
} else {
return false;
}
}
@Override
public Spliterator<T> trySplit() {
return null;
}
@Override
public long estimateSize() {
return source.length;
}
@Override
public int characteristics() {
return SIZED;
}
}
public static <T> Collector<T, ?, Stream<T>> toShuffledStream() {
return Collectors.collectingAndThen(
toCollection(ArrayList::new),
list -> !list.isEmpty()
? StreamSupport.stream(new ImprovedRandomSpliterator<>(list, Random::new), false)
: Stream.empty());
}
and then simply:
list.stream()
.collect(toShuffledStream())
.findAny();
Details can be found here.
...but it's definitely overkill, so if you're looking for a pragmatic approach, definitely go for Jean's solution.
If you don't know the size of your list in advance, you could do something like this:
yourStream.collect(new RandomListCollector<>(randomSetSize));
I guess you will have to write your own Collector implementation, like this one, to get a homogeneous (uniform) randomization:
public class RandomListCollector<T> implements Collector<T, RandomListCollector.ListAccumulator<T>, List<T>> {
private final Random rand;
private final int size;
public RandomListCollector(Random random , int size) {
super();
this.rand = random;
this.size = size;
}
public RandomListCollector(int size) {
this(new Random(System.nanoTime()), size);
}
@Override
public Supplier<ListAccumulator<T>> supplier() {
return () -> new ListAccumulator<T>();
}
@Override
public BiConsumer<ListAccumulator<T>, T> accumulator() {
return (l, t) -> {
if (l.size() < size) {
l.add(t);
} else if (rand.nextDouble() <= ((double) size) / (l.gSize() + 1)) {
l.add(t);
l.remove(rand.nextInt(size));
} else {
// in any case gSize needs to be incremented
l.gSizeInc();
}
};
}
@Override
public BinaryOperator<ListAccumulator<T>> combiner() {
return (l1, l2) -> {
int lgSize = l1.gSize() + l2.gSize();
ListAccumulator<T> l = new ListAccumulator<>();
if (l1.size() + l2.size()<size) {
l.addAll(l1);
l.addAll(l2);
} else {
while (l.size() < size) {
if (l1.size()==0 || l2.size()>0 && rand.nextDouble() < (double) l2.gSize() / (l1.gSize() + l2.gSize())) {
l.add(l2.remove(rand.nextInt(l2.size()), true));
} else {
l.add(l1.remove(rand.nextInt(l1.size()), true));
}
}
}
// set the gSize of l :
l.gSize(lgSize);
return l;
};
}
@Override
public Function<ListAccumulator<T>, List<T>> finisher() {
return (la) -> la.list;
}
@Override
public Set<Characteristics> characteristics() {
// no CONCURRENT: with a parallel, unordered stream it would make threads share one ListAccumulator, which is not thread-safe
return Collections.emptySet();
}
static class ListAccumulator<T> implements Iterable<T> {
List<T> list;
volatile int gSize;
public ListAccumulator() {
list = new ArrayList<>();
gSize = 0;
}
public void addAll(ListAccumulator<T> l) {
list.addAll(l.list);
gSize += l.gSize;
}
public T remove(int index) {
return remove(index, false);
}
public T remove(int index, boolean global) {
T t = list.remove(index);
if (t != null && global)
gSize--;
return t;
}
public void add(T t) {
list.add(t);
gSize++;
}
public int gSize() {
return gSize;
}
public void gSize(int gSize) {
this.gSize = gSize;
}
public void gSizeInc() {
gSize++;
}
public int size() {
return list.size();
}
@Override
public Iterator<T> iterator() {
return list.iterator();
}
}
}
If you want something simpler and still don't want to load the whole list into memory:
public <T> Stream<T> getRandomStreamSubset(Stream<T> stream, int subsetSize) {
int cnt = 0;
Random r = new Random(System.nanoTime());
Object[] tArr = new Object[subsetSize];
Iterator<T> iter = stream.iterator();
while (iter.hasNext() && cnt <subsetSize) {
tArr[cnt++] = iter.next();
}
while (iter.hasNext()) {
cnt++;
T t = iter.next();
if (r.nextDouble() <= (double) subsetSize / cnt) {
tArr[r.nextInt(subsetSize)] = t;
}
}
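return Arrays.stream(tArr, 0, Math.min(cnt, subsetSize)).map(o -> (T) o); // no trailing nulls if the stream had fewer than subsetSize elements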
}
...but then you have moved away from the Stream API and could do the same with a basic iterator.
The selected answer has errors in its stream solution...
You cannot call Random#nextInt with a non-positive bound (0 in this case).
The stream solution will also never choose the last element in the list.
Example:
List<Integer> intList = Arrays.asList(0, 1, 2, 3, 4);
// #nextInt is exclusive, so here it means a returned value of 0-3
// if you have a list of size = 1, #nextInt will throw an IllegalArgumentException (bound must be positive)
int skipIndex = new Random().nextInt(intList.size()-1);
// randomInt will only ever be 0, 1, 2, or 3. Never 4
int randomInt = intList.stream()
.skip(skipIndex) // max skip of list#size - 2
.findFirst()
.get();
My recommendation would be to go with the non-stream approach that Jean-Baptiste Yunès put forth, but if you must do a stream approach, you could do something like this (but it's a little ugly):
list.stream()
.skip(list.isEmpty() ? 0 : new Random().nextInt(list.size()))
.findFirst();
Sometimes you may want to get a random item somewhere in the stream. If you want to get random items even after filtering your list, this code snippet will work for you:
List<String> items = Arrays.asList("A", "B", "C", "D", "E");
List<String> shuffledAndFilteredItems = items.stream()
.filter(value -> value.equals("A") || value.equals("B"))
//filter, map...
.collect(Collectors.collectingAndThen(
Collectors.toCollection(ArrayList::new),
list -> {
Collections.shuffle(list);
return list;
}));
String randomItem = shuffledAndFilteredItems
.stream()
.findFirst()
.orElse(null);
Of course there may be faster / optimized ways, but it allows you to do it all at once.

Join element into a list and map the list to hashmap

I have made the following code:
Stream
.concat(
_question.getIncorrectAnswers().stream(),
Stream.of(_question.getCorrectAnswer())
)
.collect(Collectors.collectingAndThen(
Collectors.toList(),
collected -> {
Collections.shuffle(collected);
return collected.stream();
}
))
.collect(Collectors.toMap(index++, Object::toString));
What I am trying to achieve is to join _question.getCorrectAnswer() which is a String object into _question.getIncorrectAnswers() which is a List of Strings.
I then want to shuffle the list I made and then map the list into this:
private final Map<Integer, String> _options = new HashMap<>();
which is a map that contains a counter (starting from 1) and the String the list contains.
I want to do this with Java streams in one line (mostly for educational purposes).
I know how to do it in 3-4 lines, but I am looking for a more advanced way so I can understand and learn new methods.
Any help or guidance is appreciated.
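index++ causes a compilation error because toMap expects a key-mapping function, not a value. Fix it with a mutable counter object such as AtomicInteger (incrementAndGet starts the keys at 1):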
AtomicInteger index = new AtomicInteger();
Map<Integer, String> options = Stream.concat(_question.getIncorrectAnswers().stream(), Stream.of(_question.getCorrectAnswer())).collect(Collectors.collectingAndThen(Collectors.toList(), collected ->
{
Collections.shuffle(collected);
return collected.stream();
})).collect(Collectors.toMap(i -> index.incrementAndGet(), Object::toString));
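A runnable sketch of the same one-expression idea, assuming the correct and incorrect answers are passed in directly instead of through _question, and that a Random is passed in so the shuffle can be seeded in tests (QuizOptions and buildOptions are hypothetical names). Instead of an AtomicInteger updated from inside a lambda, it derives the keys 1..n from positions after the shuffle, and it collects into an ArrayList because Collectors.toList() does not guarantee a mutable list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class QuizOptions {
    // Joins the correct answer to the incorrect ones, shuffles them and numbers them from 1.
    public static Map<Integer, String> buildOptions(String correct, List<String> incorrect, Random rnd) {
        return Stream.concat(incorrect.stream(), Stream.of(correct))
                .collect(Collectors.collectingAndThen(
                        Collectors.toCollection(ArrayList::new), // guaranteed mutable
                        list -> {
                            Collections.shuffle(list, rnd);
                            // Keys come from positions, so no shared counter is needed.
                            return IntStream.rangeClosed(1, list.size())
                                    .boxed()
                                    .collect(Collectors.toMap(Function.identity(), i -> list.get(i - 1)));
                        }));
    }

    public static void main(String[] args) {
        System.out.println(buildOptions("foo", List.of("bar", "baz", "doo"), new Random()));
    }
}
```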
Here is one approach using java.util.Random:
Assuming you have these two methods returning a list and a string:
static List<String> getIncorrectAnswers(){
return List.of("bar", "baz", "doo");
}
static String getCorrectAnswer() {
return "foo";
}
Generate random ints from 0 to getIncorrectAnswers().size() (inclusive), keep the distinct ones, map each random int i to a string (the correct answer if i == getIncorrectAnswers().size(), otherwise the incorrect answer at index i), and finally collect to a map using
collect(Supplier<R> supplier,
BiConsumer<R, ? super T> accumulator,
BiConsumer<R, R> combiner)
Example:
public static void main(String[] args) {
Random random = new Random();
Map<Integer, String> result =
random.ints(0, getIncorrectAnswers().size() + 1)
.distinct()
.limit(getIncorrectAnswers().size() + 1)
.mapToObj(i -> i == getIncorrectAnswers().size() ? getCorrectAnswer() : getIncorrectAnswers().get(i))
.collect(HashMap::new, (m, s) -> m.put(m.size() + 1, s), (m1, m2) -> {
int offset = m1.size();
m2.forEach((i, s) -> m1.put(i + offset, s));
});
result.entrySet().forEach(System.out::println);
}
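It is possible to use Stream::sorted with a random "comparator" to randomize the order instead of Collections.shuffle, although such a comparator breaks the Comparator contract, so the result is not guaranteed to be a uniform shuffle.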
Also, when calculating the key, an AtomicInteger k or int[] k should be used to increment the key.
In the following code sample the entries of the concatenated string are randomly sorted:
public static Map<Integer, String> buildOptions(Question question) {
AtomicInteger k = new AtomicInteger(1);
return Stream.concat(Stream.of(question.getCorrectAnswer()),
question.getIncorrectAnswers().stream())
.sorted((s1, s2) -> ThreadLocalRandom.current().nextInt(7) - 3)
.collect(Collectors.toMap(x -> k.getAndIncrement(), x -> x));
}
Or a sequence of indexes may be generated and sorted randomly (boxing is needed because IntStream does not have sorted with a custom comparator):
public static Map<Integer, String> buildOptionsIndex(Question question) {
int[] k = {1};
int n = question.getIncorrectAnswers().size();
return IntStream.rangeClosed(0, n)
.boxed()
.sorted((i1, i2) -> ThreadLocalRandom.current().nextInt(2 * n + 1) - n)
.map(i -> i == n ? question.getCorrectAnswer() : question.getIncorrectAnswers().get(i))
.collect(Collectors.toMap(s -> k[0]++, Function.identity()));
}
These methods provide similar results:
for (int i = 0; i < 3; i++) {
System.out.println(buildOptions(new Question("good",
Arrays.asList("invalid", "poor", "misfit", "imprecise"))));
System.out.println(buildOptionsIndex(new Question("correct",
Arrays.asList("bad", "incorrect", "wrong", "inaccurate"))));
System.out.println("----");
}
Output
{1=invalid, 2=misfit, 3=good, 4=poor, 5=imprecise}
{1=inaccurate, 2=wrong, 3=incorrect, 4=bad, 5=correct}
----
{1=good, 2=misfit, 3=invalid, 4=poor, 5=imprecise}
{1=bad, 2=incorrect, 3=wrong, 4=correct, 5=inaccurate}
----
{1=poor, 2=misfit, 3=invalid, 4=good, 5=imprecise}
{1=bad, 2=incorrect, 3=correct, 4=wrong, 5=inaccurate}
----

Perform multiple unrelated operations on elements of a single stream in Java

How can I perform multiple unrelated operations on elements of a single stream?
Say I have a List<String> built from a text. Each string in the list may or may not contain a certain word, which represents an action to perform. Let's say that:
if the string contains 'of', all the words in that string must be counted
if the string contains 'for', the portion after the first occurrence of 'for' must be returned, yielding a List<String> with all substrings
Of course, I could do something like this:
List<String> strs = ...;
List<Integer> wordsInStr = strs.stream()
.filter(t -> t.contains("of"))
.map(t -> t.split(" ").length)
.collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream()
.filter(t -> t.contains("for"))
.map(t -> t.substring(t.indexOf("for")))
.collect(Collectors.toList());
but then the list would be traversed twice, which could result in a performance penalty if strs contained lots of elements.
Is it possible to somehow execute those two operations without traversing twice over the list?
If you want a single-pass stream, you have to use a custom Collector (which also works with parallel streams).
class Splitter {
public List<String> words = new ArrayList<>();
public List<Integer> counts = new ArrayList<>();
public void accept(String s) {
if(s.contains("of")) {
counts.add(s.split(" ").length);
}
if(s.contains("for")) { // separate check: a string containing both "of" and "for" feeds both lists, as in the two-pass version
words.add(s.substring(s.indexOf("for")));
}
}
public Splitter merge(Splitter other) {
words.addAll(other.words);
counts.addAll(other.counts);
return this;
}
}
Splitter collect = strs.stream().collect(
Collector.of(Splitter::new, Splitter::accept, Splitter::merge)
);
System.out.println(collect.counts);
System.out.println(collect.words);
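On Java 12 and later, the single pass can also be written without a custom collector class by using Collectors.teeing, which feeds every element to two downstream collectors and merges their results at the end. This is a sketch assuming Java 16+ for the record; TeeingSplit and Result are hypothetical names. The two branches are independent, so a string containing both "of" and "for" contributes to both lists, as in the two-pass version in the question.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingSplit {
    // Holds both results (records need Java 16+; Collectors.teeing needs Java 12+).
    public record Result(List<Integer> wordCounts, List<String> afterFor) {}

    public static Result split(List<String> strs) {
        return strs.stream().collect(Collectors.teeing(
                // Branch 1: word count of every string containing "of"
                Collectors.filtering((String t) -> t.contains("of"),
                        Collectors.mapping((String t) -> t.split(" ").length, Collectors.toList())),
                // Branch 2: the part starting at the first "for"
                Collectors.filtering((String t) -> t.contains("for"),
                        Collectors.mapping((String t) -> t.substring(t.indexOf("for")), Collectors.toList())),
                // Combines the two finished lists once the single pass is done
                Result::new));
    }

    public static void main(String[] args) {
        System.out.println(split(List.of("t of r m", "nice for nice", "of and for")));
    }
}
```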
Here is an answer that addresses the question from a different angle. First, let's look at how fast or slow iterating over a list/collection actually is. Here are the results on my machine from the performance test below:
When: length of string list = 100, Thread number = 1, loops = 1000, unit = milliseconds
OP: 0.013
Accepted answer: 0.020
By the counter function: 0.010
When: length of string list = 1000_000, Thread number = 1, loops = 100, unit = milliseconds
OP: 99.387
Accepted answer: 89.848
By the counter function: 59.183
Conclusion: the performance improvement is small, and the single-pass collector is even slower when the list is short. Generally, it's a mistake to try to reduce the number of iterations over an in-memory list/collection by using a more complicated collector; you won't gain much. If there is a performance issue, look elsewhere: in this test, replacing split with a simple counting function made the biggest difference.
Here is my performance test code, using the Profiler tool (I'm not going to discuss how to do performance testing here; if you doubt the results, you can rerun the test with any tool you trust):
@Test
public void test_46539786() {
final int strsLength = 1000_000;
final int threadNum = 1;
final int loops = 100;
final int rounds = 3;
final List<String> strs = IntStream.range(0, strsLength).mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i).toList();
Profiler.run(threadNum, loops, rounds, "OP", () -> {
List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(t -> t.split(" ").length).collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
.collect(Collectors.toList());
assertTrue(wordsInStr.size() == linePortionAfterFor.size());
}).printResult();
Profiler.run(threadNum, loops, rounds, "Accepted answer", () -> {
Splitter collect = strs.stream().collect(Collector.of(Splitter::new, Splitter::accept, Splitter::merge));
assertTrue(collect.counts.size() == collect.words.size());
}).printResult();
final Function<String, Integer> counter = s -> { // counts spaces, i.e. one less than split(" ").length for single-spaced text
int count = 0;
for (int i = 0, len = s.length(); i < len; i++) {
if (s.charAt(i) == ' ') {
count++;
}
}
return count;
};
Profiler.run(threadNum, loops, rounds, "By the counter function", () -> {
List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(counter).collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
.collect(Collectors.toList());
assertTrue(wordsInStr.size() == linePortionAfterFor.size());
}).printResult();
}
You could use a custom collector for that and iterate only once:
private static Collector<String, ?, Pair<List<String>, List<Long>>> multiple() {
class Acc {
List<String> strings = new ArrayList<>();
List<Long> longs = new ArrayList<>();
void add(String elem) {
if (elem.contains("of")) {
long howMany = Arrays.stream(elem.split(" ")).count();
longs.add(howMany);
}
if (elem.contains("for")) {
String result = elem.substring(elem.indexOf("for"));
strings.add(result);
}
}
Acc merge(Acc right) {
longs.addAll(right.longs);
strings.addAll(right.strings);
return this;
}
public Pair<List<String>, List<Long>> finisher() {
return Pair.of(strings, longs);
}
}
return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}
Usage would be:
Pair<List<String>, List<Long>> pair = Stream.of("t of r m", "t of r m", "nice for nice nice again")
.collect(multiple());
If you want a single stream pass over the list, you need a way to manage two different states. You can do this by implementing Consumer in a separate class for each operation:
class WordsInStr implements Consumer<String> {
ArrayList<Integer> list = new ArrayList<>();
@Override
public void accept(String s) {
Stream.of(s).filter(t -> t.contains("of")) //probably would be faster without stream here
.map(t -> t.split(" ").length)
.forEach(list::add);
}
}
class LinePortionAfterFor implements Consumer<String> {
ArrayList<String> list = new ArrayList<>();
@Override
public void accept(String s) {
Stream.of(s) //probably would be faster without stream here
.filter(t -> t.contains("for"))
.map(t -> t.substring(t.indexOf("for")))
.forEach(list::add);
}
}
WordsInStr w = new WordsInStr();
LinePortionAfterFor l = new LinePortionAfterFor();
strs.stream()//stream not needed here
.forEach(w.andThen(l));
System.out.println(w.list);
System.out.println(l.list);

JAVA8 - Grouping with lambda

I have a collection with structure like this:
@Entity
public class RRR{
private Map<XClas, YClas> xySets;
}
and XClas has a field called ZZZ.
My question is:
I would like to aggregate it with a lambda to get a Map<ZZZ, List<RRR>>.
Is it possible? Right now I'm stuck with:
Map xxx = rrrList.stream().collect(
Collectors.groupingBy(x->x.xySets().entrySet().stream().collect(
Collectors.groupingBy(y->y.getKey().getZZZ()))));
but it's Map<Map<ZZZ, List<XClas>>, List<RRR>> so it's not what I was looking for :)
Right now, just to make it work, I did the aggregation with two nested loops, but it would be great if you could help me do it with lambdas.
EDIT
Here is what I have so far, as requested.
I've already dropped the nested loops and managed to work my way up to this point:
Map<ZZZ, List<RRR>> temp;
rrrList.stream().forEach(x -> x.getxySetsAsList().stream().forEach(z -> {
if (temp.containsKey(z.getKey().getZZZ())){
List<RRR> uuu = new LinkedList<>(temp.get(z.getKey().getZZZ()));
uuu.add(x);
temp.put(z.getKey().getZZZ(), uuu);
} else {
temp.put(z.getKey().getZZZ(), Collections.singletonList(x));
}
}));
Thanks in advance
Something like this?
Map<ZZZ, List<RRR>> map = new HashMap<>();
list.stream().forEach(rrr -> {
rrr.xySets.keySet().stream().forEach(xclas -> {
if (!map.containsKey(xclas.zzz))
map.put(xclas.zzz, new ArrayList<RRR>());
map.get(xclas.zzz).add(rrr);
});
});
Another way you could do this:
Map<Z, List<R>> map = rs.stream()
.map(r -> r.xys.keySet()
.stream()
.collect(Collectors.<X, Z, R>toMap(x -> x.z, x -> r, (a, b) -> a)))
.map(Map::entrySet)
.flatMap(Collection::stream)
.collect(Collectors.groupingBy(Entry::getKey,
Collectors.mapping(Entry::getValue, Collectors.toList())));
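I have experimented a bit and found the following solution; I'm posting it here as another example (note that it collects the YClas values, so the result is a Map<ZZZ, List<YClas>> rather than a Map<ZZZ, List<RRR>>):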
rrrList.stream().map(x -> x.xySets).map(Map::entrySet).flatMap(x -> x.stream())
.collect(Collectors.groupingBy(x -> x.getKey().getZZZ(),
Collectors.mapping(Entry::getValue, Collectors.toList())));
The first line could also be written as rrrList.stream().flatMap(x -> x.xySets.entrySet().stream()) which might be found more readable.
Here is a self-contained example for anyone who wants to play around with it:
public static void main(String[] args) {
List<RRR> rrrList = Arrays.asList(new RRR(), new RRR(), new RRR());
System.out.println(rrrList);
Stream<Entry<XClas, YClas>> sf = rrrList.stream().map(x -> x.xySets).map(Map::entrySet).flatMap(x -> x.stream());
Map<ZZZ, List<YClas>> res = sf.collect(Collectors.groupingBy(x -> x.getKey().getZZZ(), Collectors.mapping(Entry::getValue, Collectors.toList())));
System.out.println(res);
}
public static class RRR {
static XClas shared = new XClas();
private Map<XClas, YClas> xySets = new HashMap<>();
RRR() { xySets.put(shared, new YClas()); xySets.put(new XClas(), new YClas()); }
static int s = 0; int n = s++;
public String toString() { return "RRR" + n + "(" + xySets + ")"; }
}
public static class XClas {
private ZZZ zzz = new ZZZ();
public ZZZ getZZZ() { return zzz; }
public String toString() { return "XClas(" + zzz + ")"; }
public boolean equals(Object o) { return (o instanceof XClas) && ((XClas)o).zzz.equals(zzz); }
public int hashCode() { return zzz.hashCode(); }
}
public static class YClas {
static int s = 0; int n = s++;
public String toString() { return "YClas" + n; }
}
public static class ZZZ {
static int s = 0; int n = s++ / 2;
public String toString() { return "ZZZ" + n; }
public boolean equals(Object o) { return (o instanceof ZZZ) && ((ZZZ)o).n == n; }
public int hashCode() { return n; }
}
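For comparison, a sketch that produces the Map<ZZZ, List<RRR>> the question asks for: each RRR is paired with every distinct ZZZ reachable through its XClas keys, and the pairs are then grouped by ZZZ. It uses records as hypothetical stand-ins for the question's classes (Java 16+); distinct() keeps an RRR from being listed twice under the same ZZZ when two of its XClas keys share a ZZZ.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByNestedKey {
    // Hypothetical stand-ins for the question's classes.
    public record Zzz(String name) {}
    public record XClas(Zzz zzz) {}
    public record YClas(int value) {}
    public record Rrr(String id, Map<XClas, YClas> xySets) {}

    public static Map<Zzz, List<Rrr>> groupByZzz(List<Rrr> rrrList) {
        return rrrList.stream()
                // One (ZZZ, RRR) pair for each distinct ZZZ among the RRR's XClas keys
                .flatMap(r -> r.xySets().keySet().stream()
                        .map(XClas::zzz)
                        .distinct()
                        .<Map.Entry<Zzz, Rrr>>map(z -> new SimpleEntry<>(z, r)))
                // Group the pairs by ZZZ and keep only the RRR part of each pair
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        Zzz a = new Zzz("a");
        Zzz b = new Zzz("b");
        Rrr r1 = new Rrr("r1", Map.of(new XClas(a), new YClas(1), new XClas(b), new YClas(2)));
        Rrr r2 = new Rrr("r2", Map.of(new XClas(a), new YClas(3)));
        System.out.println(groupByZzz(List.of(r1, r2)));
    }
}
```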
