MinMaxPriorityQueue using Java streams

I am looking for a memory-efficient way in Java to find top n elements from a huge collection. For instance, I have a word, a distance() method, and a collection of "all" words.
I have implemented a class Pair that implements compareTo() so that pairs are sorted by their values.
Using streams, my naive solution looks like this:
double distance(String word1, String word2){
...
}
Collection<String> words = ...;
String word = "...";
words.stream()
.map(w -> new Pair<String, Double>(w, distance(word, w)))
.sorted()
.limit(n);
To my understanding, this will process and intermediately store each element in words so that it can be sorted before applying limit(). However, it is more memory-efficient to have a collection that stores n elements and, whenever a new element is added, removes the greatest element (according to the comparable object's natural order) and thus never grows larger than n (or n+1).
This is exactly what the Guava MinMaxPriorityQueue does. Thus, my current best solution to the above problem is this:
Queue<Pair<String, Double>> neighbours = MinMaxPriorityQueue.maximumSize(n).create();
words.stream()
.forEach(w -> neighbours.add(new Pair<String, Double>(w, distance(word, w))));
The sorting of the top n elements remains to be done after converting the queue to a stream or list, but this is not an issue since n is relatively small.
My question is: is there a way to do the same using streams?

A heap-based structure will of course be more efficient than sorting the entire huge list. Luckily, the streams library is perfectly happy to let you use specialized collections when necessary:
MinMaxPriorityQueue<Pair<String, Double>> topN = words.stream()
.map(w -> new Pair<String, Double>(w, distance(word, w)))
.collect(toCollection(
() -> MinMaxPriorityQueue.maximumSize(n).create()
));
This is better than the .forEach solution because it's easy to parallelize and is more idiomatic Java 8.
Note that it should be possible to replace () -> MinMaxPriorityQueue.maximumSize(n).create() with MinMaxPriorityQueue.maximumSize(n)::create but, for some reason, that won't compile under some conditions.
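If Guava is not available, the same bounded-heap idea can be sketched with the JDK alone. This is only a sketch under assumptions: the class name TopN and the collector smallest(n, cmp) are made up for illustration, and distance() is a placeholder (difference in word length) standing in for the real method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collector;

class TopN {

    // Adds an item and evicts the current worst candidate once the heap exceeds n.
    private static <T> void addBounded(PriorityQueue<T> heap, T item, int n) {
        heap.add(item);
        if (heap.size() > n) {
            heap.poll();
        }
    }

    // Hypothetical collector: keeps the n smallest elements according to cmp,
    // holding at most n + 1 elements per accumulator at any time.
    static <T> Collector<T, ?, List<T>> smallest(int n, Comparator<? super T> cmp) {
        return Collector.<T, PriorityQueue<T>, List<T>>of(
            // reversed comparator: the head of the heap is the greatest kept element
            () -> new PriorityQueue<T>(cmp.reversed()),
            (heap, item) -> addBounded(heap, item, n),
            (left, right) -> {
                right.forEach(item -> addBounded(left, item, n));
                return left;
            },
            heap -> {
                List<T> sorted = new ArrayList<>(heap);
                sorted.sort(cmp); // n is small, so sorting at the end is cheap
                return sorted;
            });
    }

    // Placeholder distance (assumption): absolute difference in length.
    static int distance(String a, String b) {
        return Math.abs(a.length() - b.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("fig", "banana", "apple", "watermelon");
        String word = "grape";
        Comparator<String> byDistance = Comparator.comparingInt(w -> distance(word, w));
        System.out.println(words.stream().collect(smallest(2, byDistance))); // prints [apple, banana]
    }
}
```

The combiner merges two partial heaps, so the collector should also behave in a parallel stream, at the cost of one heap per worker.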

Related

Using Java Stream - process two halves of stream independently

I have currently this code:
AtomicInteger counter = new AtomicInteger(0);
return IntStream.range(0, costs.length)
.mapToObj(i -> new int[]{costs[i][0]-costs[i][1], i})
.sorted(Comparator.comparingInt(d -> d[0]))
.mapToInt(s ->
counter.getAndIncrement() < costs.length/2 ? costs[s[1]][0] : costs[s[1]][1]
)
.sum();
Here I compute the difference of the two elements of each array, sort by it, and at the end I need to process the two halves independently.
Is there a better way to do this than using an AtomicInteger as a counter? Is there some method like mapToIntWithIndex available in the JDK (not in external libraries)? Is there something like zip() in Python, where I could join the indices with the stream? If not, is there any plan to add this in future Java releases?
This is not a reliable way to do this. The Streams API makes it clear that functions used in maps should not be stateful.
Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful.
If you use stateful functions, it may appear to work, but because you aren't using it according to the documentation, the behaviour is technically undefined, and could break in future versions of Java.
Collect to a list, and then process the two halves of the list:
List<int[]> list = /* your stream up to and including the sort */.collect(toList());
int half = list.size() / 2;
int sum = list.subList(0, half ).stream().mapToInt(s -> costs[s[1]][0]).sum()
+ list.subList(half, list.size()).stream().mapToInt(s -> costs[s[1]][1]).sum();
Actually, I'd be tempted to write it as for loops, as I just find it easier on the eye:
int sum = 0;
for (int[] s : list.subList(0, half)) sum += costs[s[1]][0];
for (int[] s : list.subList(half, list.size())) sum += costs[s[1]][1];
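For completeness, here is a self-contained sketch that avoids the stateful counter by sorting the indices first and then deciding per rank which column to take. The names TwoHalves and minCost are made up for this example, and the sample costs in main are illustrative only.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

class TwoHalves {

    // Sorts row indices by costs[i][0] - costs[i][1]; the first half contributes
    // column 0, the second half column 1. No mutable state is shared with lambdas.
    static int minCost(int[][] costs) {
        int[] order = IntStream.range(0, costs.length)
            .boxed()
            .sorted(Comparator.comparingInt(i -> costs[i][0] - costs[i][1]))
            .mapToInt(Integer::intValue)
            .toArray();
        int half = order.length / 2;
        return IntStream.range(0, order.length)
            .map(rank -> rank < half ? costs[order[rank]][0] : costs[order[rank]][1])
            .sum();
    }

    public static void main(String[] args) {
        int[][] costs = {{10, 20}, {30, 200}, {400, 50}, {30, 20}};
        System.out.println(minCost(costs)); // prints 110
    }
}
```

Because the rank comes from IntStream.range rather than from a counter, the lambdas stay stateless, which is what the Streams documentation asks for.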

Alternative for filtering out only first element of a list that matches some element

I am trying to think of an alternative to the List method for remove(int index) and remove(T element). Where I take a list and do some filtering and return a new list without the element requested to be removed. I want to do it functionally, as I don't want to mutate the original list.
Here is my attempt.
List<Integer> integers = Arrays.asList(2, 4, 1, 2, 5, 1);
Integer elementToRemove = 1;
List<Integer> collect =
integers.stream()
.filter(elements -> !elements.equals(elementToRemove))
.collect(Collectors.toList());
This will remove all the 1's.
I want to remove just the first 1, so I will be left with a list like [2,4,2,5,1]
I know how to do it using indexOf() and sublist() and addAll(). But I feel this is not as good as using streams.
Looking for functional solutions using streams for implementing remove(int index) and remove(T element).
I want to do it functionally, as I don't want to mutate the original list.
You can still perform the removal operation and not mutate the source without going functional.
But I feel this is not as good as using streams.
Quite the opposite as this is done better without streams:
List<Integer> result = new ArrayList<>(integers); // copy the integers list
result.remove(Integer.valueOf(elementToRemove)); // remove from the new list leaving the old list unmodified
I agree with @Aominè, but this can be an alternative using the stream API:
IntStream.range(0,integers.size())
.filter(i->i != integers.indexOf(elementToRemove))
.mapToObj(i->integers.get(i))
.collect(Collectors.toList());
As @Aominè commented, this can be optimized by finding the index of elementToRemove first and then using it in the filter.
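A minimal sketch of that optimization, assuming a helper named withoutFirst in a class RemoveFirst (both names are invented here): indexOf runs once, and the source list is never modified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RemoveFirst {

    // Returns a new list without the first occurrence of element.
    static <T> List<T> withoutFirst(List<T> source, T element) {
        int index = source.indexOf(element); // -1 if absent, so no position is filtered out
        return IntStream.range(0, source.size())
            .filter(i -> i != index)
            .mapToObj(source::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> integers = Arrays.asList(2, 4, 1, 2, 5, 1);
        System.out.println(withoutFirst(integers, 1)); // prints [2, 4, 2, 5, 1]
        System.out.println(integers);                  // prints [2, 4, 1, 2, 5, 1]
    }
}
```

Note that source.get(i) is only cheap for random-access lists such as ArrayList; for a LinkedList the copy-and-remove approach is likely the better choice.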
While my other answer is definitely the approach I recommend, and @Hadi has also provided a valid "stream" alternative, I decided to play about with different ways to achieve the same result using features as of JDK 9.
JDK 9 adds the takeWhile and dropWhile methods. The former returns a stream consisting of the longest prefix of elements taken from a stream that match a given predicate.
The latter returns a stream consisting of the remaining elements of a given stream after dropping the longest prefix of elements that match a given predicate.
The idea here is to consume the elements while they are not equal to elementToRemove:
integers.stream()
.takeWhile(e -> !Objects.equals(e, elementToRemove))
and drop the elements while they are not equal to elementToRemove, then skip(1) to exclude elementToRemove itself:
integers.stream()
.dropWhile(e -> !Objects.equals(e, elementToRemove))
.skip(1)
This yields two streams: the first contains all the elements preceding elementToRemove, and the second (after skip(1)) contains all the elements following it. We then simply concatenate them and collect the result to a list implementation.
List<Integer> result = Stream.concat(integers.stream()
.takeWhile(e -> !Objects.equals(e, elementToRemove)),
integers.stream()
.dropWhile(e -> !Objects.equals(e, elementToRemove))
.skip(1))
.collect(Collectors.toList());
Assuming the element to remove does not exist in the list the takeWhile will consume all the elements and the dropWhile will drop all the elements and when we merge these two streams we get back the initial elements.
Overall this will accomplish the same result as the other answers.
However, do not use this solution in production code as it's suboptimal and not obvious to the eye what the code does. It's only here to show different ways to accomplish the said requirement.

How to sort a list/stream using an unknown number of comparators?

I have the following snippet:
List<O> os = new ArrayList<>();
os.add(new O("A", 3, "x"));
os.add(new O("A", 2, "y"));
os.add(new O("B", 1, "z"));
Comparator<O> byA = Comparator.comparing(O::getA);
Comparator<O> byB = Comparator.comparing(O::getB);
// I want to use rather this...
List<Comparator<O>> comparators = new ArrayList<>();
comparators.add(byA);
comparators.add(byB);
os.stream()
.sorted(byB.thenComparing(byA))
.forEach(o -> System.out.println(o.getC()));
As you can see, I sort explicitly using two comparators. But what if I have an unknown number of comparators in some list and I want to sort by them all? Is there any way? Or should I rather use the old-fashioned way, a comparator with multiple ifs?
If you have multiple comparators in a list or any other collection, you can replace them with a single one by performing the reduction on a Stream:
List<Comparator<String>> comparators = ...
Comparator<String> combined = comparators.stream()
.reduce(Comparator::thenComparing)
.orElse(someDefaultComparator); // empty list case
All instances will be composed together using thenComparing according to their order from the input list.
The same would be achievable using a non-stream approach by utilizing a simple for-loop:
Comparator<String> result = comparators.get(0);
for (int i = 1; i < comparators.size(); i++) {
result = result.thenComparing(comparators.get(i));
}
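To make the reduction concrete, here is a small self-contained sketch. The class CombinedComparators and its methods are invented for this example; the fallback comparator (a, b) -> 0 is one possible choice for the empty-list case, not the only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CombinedComparators {

    // Chains the comparators in list order; with an empty list every pair
    // compares as equal, so a stable sort leaves the input order untouched.
    static <T> Comparator<T> combine(List<Comparator<T>> comparators) {
        return comparators.stream()
            .reduce(Comparator::thenComparing)
            .orElse((a, b) -> 0);
    }

    // Example use: sort by length first, then alphabetically.
    static List<String> sortByLengthThenAlphabet(List<String> words) {
        List<Comparator<String>> comparators = new ArrayList<>();
        comparators.add(Comparator.comparingInt(String::length));
        comparators.add(Comparator.naturalOrder());
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(combine(comparators));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByLengthThenAlphabet(Arrays.asList("bb", "a", "ab", "c")));
        // prints [a, c, ab, bb]
    }
}
```

Swapping the two add calls would sort alphabetically first and use length only to break ties, so the order of the list determines the priority of the comparators.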
You can reduce stream of comparators to a single comparator by calling .thenComparing() on accumulated comparator and current comparator in the iteration:
Optional<Comparator<O>> comparator = Optional.ofNullable(comparators.stream()
.reduce(null, (acc, current) -> acc == null ? current : acc.thenComparing(current), (a, b) -> a));
os.stream()
.sorted(comparator.orElse((a,b) -> 0))
.forEach(o -> System.out.println(o.getC()));
In this example I use Optional<Comparator<O>> and wrap the reduction result with Optional.ofNullable() to handle the case of an empty comparators list. When you pass the result to sorted(), you can then decide what to do in that case: for instance, you can use the (a,b)->0 comparator, which does not reorder anything.
Then it doesn't matter how many comparators you want to apply. There is one caveat, though: the order of the comparators in the given collection matters. In the given example I apply the comparators in list order (from the first element of the list to the last). This affects the result of sorting heavily.
For example, in your example you call byB.thenComparing(byA). It produces a different result than byA.thenComparing(byB). I assume that in some cases you will want to control the order in which comparators are applied.
Live demo:
https://jdoodle.com/a/4Xz
If you are allowed to use Google Guava, you could simply compound your Comparators.
https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/collect/Ordering.html#compound-java.lang.Iterable-
Comparator comparator = Ordering.compound(comparators);
If no comparators are given, the original order will be kept.
However, Guava's Javadoc for this method proposes the following for Java 8:
list.stream().sorted(comparators.stream().reduce(defaultComparator, Comparator::thenComparing));
So the solution with 'compound' is recommended for Java 7 (or less).

How to get index of findFirst() in java 8?

I have the following code:
ArrayList <String> entries = new ArrayList <String>();
entries.add("0");
entries.add("1");
entries.add("2");
entries.add("3");
String firstNotHiddenItem = entries.stream()
.filter(e -> e.equals("2"))
.findFirst()
.get();
I need to know what is the index of that first returned element, since I need to edit it inside of entries ArrayList. As far as I know get() returns the value of the element, not a reference. Should I just use
int indexOf(Object o)
instead?
You can get the index of an element using an IntStream like:
int index = IntStream.range(0, entries.size())
.filter(i -> "2".equals(entries.get(i)))
.findFirst().orElse(-1);
But you should use the List::indexOf method which is the preferred way, because it's more concise, more expressive and computes the same results.
You can't in a straightforward way - streams process elements without context of where they are in the stream.
However, if you're prepared to take the gloves off...
int[] position = {-1};
String firstNotHiddenItem = entries.stream()
.peek(x -> position[0]++) // increment every element encounter
.filter("2"::equals)
.findFirst()
.get();
System.out.println(position[0]); // 2
The use of an int[], instead of a simple int, is to circumvent the "effectively final" requirement; the reference to the array is constant, only its contents change.
Note also the use of a method reference "2"::equals instead of a lambda e -> e.equals("2"), which not only avoids a possible NPE (if a stream element is null) but, more importantly, looks way cooler.
A more palatable (less hackalicious) version:
AtomicInteger position = new AtomicInteger(-1);
String firstNotHiddenItem = entries.stream()
.peek(x -> position.incrementAndGet()) // increment every element encounter
.filter("2"::equals)
.findFirst()
.get();
position.get(); // 2
This will work using Eclipse Collections with Java 8
int firstIndex = ListIterate.detectIndex(entries, "2"::equals);
If you use a MutableList, you can simplify the code as follows:
MutableList<String> entries = Lists.mutable.with("0", "1", "2", "3");
int firstIndex = entries.detectIndex("2"::equals);
There is also a method to find the last index.
int lastIndex = entries.detectLastIndex("2"::equals);
Note: I am a committer for Eclipse Collections
Yes, you should use indexOf("2") instead. As you might have noticed, any stream based solution has a higher complexity, without providing any benefit.
In this specific situation, there is no significant difference in performance, but overusing streams can cause dramatic performance degradation, e.g. when using map.entrySet().stream().filter(e -> e.getKey().equals(object)).map(e -> e.getValue()) instead of a simple map.get(object).
The collection operations may utilize their known structure, while most stream operations imply a linear search. So genuine collection operations are preferable.
Of course, if there is no collection operation, like when your predicate is not a simple equality test, the Stream API may be the right tool. As shown in “Is there a concise way to iterate over a stream with indices in Java 8?”, the solution for any task involving the indices works by using the indices as the starting point, e.g. via IntStream.range, and accessing the list via List.get(int). If the source is not an array or a random access List, there is no equally clean and efficient solution. Sometimes, a loop might turn out to be the simplest and most efficient solution.
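As an illustration of the index-first approach for a predicate that is not a plain equality test, consider the following sketch. The class FirstIndex and the method indexOfFirst are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

class FirstIndex {

    // Returns the index of the first element matching the predicate, or -1.
    // Intended for random-access lists, where list.get(i) is cheap.
    static <T> int indexOfFirst(List<T> list, Predicate<? super T> predicate) {
        return IntStream.range(0, list.size())
            .filter(i -> predicate.test(list.get(i)))
            .findFirst()
            .orElse(-1);
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("0", "1", "2", "3");
        System.out.println(indexOfFirst(entries, e -> Integer.parseInt(e) > 1)); // prints 2
    }
}
```

For a simple equality test, entries.indexOf("2") remains the shorter and clearer option.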

Get last element of Stream/List in a one-liner

How can I get the last element of a stream or list in the following code?
Where data.careas is a List<CArea>:
CArea first = data.careas.stream()
.filter(c -> c.bbox.orientationHorizontal).findFirst().get();
CArea last = data.careas.stream()
.filter(c -> c.bbox.orientationHorizontal)
.collect(Collectors.toList()).; //how to?
As you can see getting the first element, with a certain filter, is not hard.
However getting the last element in a one-liner is a real pain:
It seems I cannot obtain it directly from a Stream. (It would only make sense for finite streams)
It also seems that you cannot get things like first() and last() from the List interface, which is really a pain.
I do not see any argument for not providing a first() and last() method in the List interface, as the elements in there, are ordered, and moreover the size is known.
But as per the original answer: How to get the last element of a finite Stream?
Personally, this is the closest I could get:
int lastIndex = data.careas.stream()
.filter(c -> c.bbox.orientationHorizontal)
.mapToInt(c -> data.careas.indexOf(c)).max().getAsInt();
CArea last = data.careas.get(lastIndex);
However, it does involve calling indexOf on every element, which is most likely not what you generally want, as it can impair performance.
It is possible to get the last element with the method Stream::reduce. The following listing contains a minimal example for the general case:
Stream<T> stream = ...; // sequential or parallel stream
Optional<T> last = stream.reduce((first, second) -> second);
This implementation works for all ordered streams (including streams created from Lists). For unordered streams it is, for obvious reasons, unspecified which element will be returned.
The implementation works for both sequential and parallel streams. That might be surprising at first glance, and unfortunately the documentation doesn't state it explicitly. However, it is an important feature of streams, so I will try to clarify it:
The Javadoc for the method Stream::reduce states, that it "is not constrained to execute sequentially".
The Javadoc also requires that the "accumulator function must be an associative, non-interfering, stateless function for combining two values", which is obviously the case for the lambda expression (first, second) -> second.
The Javadoc for reduction operations states: "The streams classes have multiple forms of general reduction operations, called reduce() and collect() [..]" and "a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless."
The documentation for the closely related Collectors is even more explicit: "To ensure that sequential and parallel executions produce equivalent results, the collector functions must satisfy an identity and an associativity constraints."
Back to the original question: The following code stores a reference to the last element in the variable last and throws an exception if the stream is empty. The complexity is linear in the length of the stream.
CArea last = data.careas
.stream()
.filter(c -> c.bbox.orientationHorizontal)
.reduce((first, second) -> second).get();
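If the filtered stream may be empty, keeping the Optional avoids the exception. The following sketch shows this on a plain list; LastMatch and lastMatching are names invented for the example, and the integers stand in for the CArea objects of the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class LastMatch {

    // Returns the last element that matches the predicate, in encounter order.
    // The result is empty if nothing matches.
    static <T> Optional<T> lastMatching(List<T> list, Predicate<? super T> predicate) {
        return list.stream()
            .filter(predicate)
            .reduce((first, second) -> second);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(lastMatching(numbers, x -> x % 2 == 0)); // prints Optional[4]
        System.out.println(lastMatching(numbers, x -> x > 10));     // prints Optional.empty
    }
}
```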
If you have a Collection (or, more generally, an Iterable) you can use Google Guava's
Iterables.getLast(myIterable)
as a handy one-liner.
One liner (no need for stream;):
Object lastElement = list.isEmpty() ? null : list.get(list.size()-1);
Guava has a dedicated method for this case:
Stream<T> stream = ...;
Optional<T> lastItem = Streams.findLast(stream);
It's equivalent to stream.reduce((a, b) -> b) but creators claim it has much better performance.
From documentation:
This method's runtime will be between O(log n) and O(n), performing better on efficiently splittable streams.
It's worth mentioning that if the stream is unordered, this method behaves like findAny().
list.stream().sorted(Comparator.comparing(obj::getSequence).reversed()).findFirst().get();
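This reverses the order and gets the first element of the list. Here the object has a sequence number; Comparator provides several methods that can be combined as the logic requires.
Another way to get the last element is by using sort. Note that the comparator (a,b)->-1 below violates the Comparator contract, so the result is not guaranteed and may differ between JDK versions.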
Optional<CArea> num=data.careas.stream().sorted((a,b)->-1).findFirst();
You can also use skip() function as below...
long count = data.careas.stream().count();
CArea last = data.careas.stream().skip(count - 1).findFirst().get();
it's super simple to use.
One more approach. Pair will have first and last elements:
List<Object> pair = new ArrayList<>();
dataStream.forEach(o -> {
if (pair.size() == 0) {
pair.add(o);
pair.add(o);
}
pair.set(1, o);
});
If you need to get the last N elements, a closure can be used.
The code below maintains an external queue of fixed size until the stream reaches the end.
final Queue<Integer> queue = new LinkedList<>();
final int N=5;
list.stream().forEachOrdered(z -> { // not peek(...).count(): since Java 9, count() may skip the pipeline entirely
queue.offer(z);
if (queue.size() > N)
queue.poll();
});
Another option could be to use the reduce operation with a Queue as the identity. Note that this mutates the identity object, so it is only suitable for sequential streams.
final int lastN=3;
Queue<Integer> reduce1 = list.stream()
.reduce(
(Queue<Integer>)new LinkedList<Integer>(),
(m, n) -> {
m.offer(n);
if (m.size() > lastN)
m.poll();
return m;
}, (m, n) -> m);
System.out.println("reduce1 = " + reduce1);
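The same "last N" idea can also be expressed as a collector, which avoids both the external queue and the mutated identity. This is a sketch under assumptions: the class LastN and the method lastN are names invented here, and ArrayDeque is used as the buffer, so null elements are not supported.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class LastN {

    // Appends an item and drops the oldest one once the buffer exceeds n.
    private static <T> void addBounded(Deque<T> deque, T item, int n) {
        deque.addLast(item);
        if (deque.size() > n) {
            deque.removeFirst();
        }
    }

    // Hypothetical collector: keeps the last n elements in encounter order.
    static <T> Collector<T, ?, List<T>> lastN(int n) {
        return Collector.<T, Deque<T>, List<T>>of(
            ArrayDeque::new,
            (deque, item) -> addBounded(deque, item, n),
            (left, right) -> {
                // right holds later elements, so append them and trim from the front
                right.forEach(item -> addBounded(left, item, n));
                return left;
            },
            ArrayList::new);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4, 5).collect(lastN(3))); // prints [3, 4, 5]
    }
}
```

Each accumulator holds at most n + 1 elements, so memory use does not depend on the length of the stream.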
