Given that I have some function that takes two parameters and returns one value, is it possible to convert a Map to a List in a Stream as a non-terminal operation?
The nearest I can find is to use forEach on the map to create instances and add them to a pre-defined List, then start a new Stream from that List. Or did I just miss something?
Eg: The classic "find the 3 most frequently occurring words in some long list of words"
wordList.stream().collect(groupingBy(Function.identity(), Collectors.counting())).
(now I want to stream the entrySet of that map)
sorted((a, b) -> b.getValue().compareTo(a.getValue())).limit(3).forEach(print...
You should stream the entrySet of the map and feed each entry's key and value to your binary function:
inputMap.entrySet().stream().map(e -> myFun(e.getKey(), e.getValue()));
The result of the above is a stream of T instances.
Update
Your additional example confirms what was discussed in the comments below: group by and sort are by their nature terminal operations. They must be performed in full to be able to produce even the first element of the output, so involving them as non-terminal operations doesn't buy anything in terms of performance/memory footprint.
It happens that Java 8 defines sorted as a non-terminal operation; however, that decision can lead to deceptive code, because the operation blocks until it has received all upstream elements and must retain them all while receiving.
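Putting the pieces together for the word-frequency example, here is a minimal runnable sketch (the sample word list is invented for illustration):
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopWords {
    public static void main(String[] args) {
        List<String> wordList = Arrays.asList("a", "b", "a", "c", "b", "a");
        wordList.stream()
                // terminal: builds the complete word -> count map first
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                // then a second pipeline over the map's entries
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(3)
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}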
You can also convert the HashMap's values into an ArrayList using the following technique; note that values() returns a Collection view, so it must be copied into a new ArrayList (where Value is the map's value type):
ArrayList<Value> list = new ArrayList<>(hashMap.values());
Related
I interchanged the positions of map and filter. Does it make any difference to the number of iterations performed by the code?
List<String> Aname3 = names.stream()
        .map(name -> name.toUpperCase())
        .filter(name -> name.startsWith("A"))
        .collect(Collectors.toList());
List<String> Aname4 = names.stream()
        .filter(name -> !name.startsWith("A"))
        .map(name -> name.toLowerCase())
        .collect(Collectors.toList());
Filtering may filter out some elements, so mapping to upper case will be done on fewer elements. So yes, it is better to filter first.
First of all, these two stream pipelines have different logic that will produce different outputs. Even if both filter and map calls received the same functional interfaces as input, the outcome of the two pipelines may still be different, since map changes the elements of the Stream in a manner that may affect the outcome of filter, so the order of map and filter affects the output.
As for the number of iterations, applying the filter first means that map would only be applied to elements that pass the filter.
On the other hand, applying map first means that it will be applied to all the elements in the Stream.
Therefore, the second stream pipeline will perform fewer operations (assuming not all the elements pass the filter).
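To see the difference concretely, here is a small sketch (the names, variable names, and counter are invented for illustration) that counts how many times the map function runs when filtering first:
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class FilterFirst {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Anna", "Bob", "Alice", "Carol");
        AtomicInteger mapCalls = new AtomicInteger();
        List<String> result = names.stream()
                .filter(name -> name.startsWith("A"))  // tested on all 4 elements
                .map(name -> {
                    mapCalls.incrementAndGet();        // count map invocations
                    return name.toUpperCase();
                })
                .collect(Collectors.toList());
        System.out.println(result);          // [ANNA, ALICE]
        System.out.println(mapCalls.get());  // 2 - map never saw "Bob" or "Carol"
    }
}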
I want to process a List using Java streams, but not sure if I can guarantee the sort is processed before the map method in the following expression:
list.stream()
    .sorted((a, b) -> b.getStartTime().compareTo(a.getStartTime()))
    .mapToDouble(e -> {
        double points = (e.getDuration() / 60);
        ...
        return points * e.getType().getMultiplier();
    })
    .sum();
I ask because I need to perform some calculations that depend on that specific order.
Yes, you can guarantee that, because the operations in a stream pipeline are applied in the order in which they are declared (once the terminal operation is invoked).
From Stream docs:
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.
The key word in the above paragraph is pipeline, whose definition in Wikipedia starts as follows:
In software engineering, a pipeline consists of a chain of processing elements (processes, threads, coroutines, functions, etc.), arranged so that the output of each element is the input of the next...
Not only will sorted be applied before map, it will also traverse the whole underlying source. sorted will get all the elements, put them into an array or an ArrayList (depending on whether the size is known), sort that, and then hand one element at a time to the map operation.
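You can observe this buffering with peek; in the throwaway sketch below, all of the "consumed by sorted" lines print before anything reaches the downstream stage:
import java.util.Comparator;
import java.util.stream.Stream;

public class SortedBarrier {
    public static void main(String[] args) {
        Stream.of(3, 1, 2)
                .peek(n -> System.out.println("consumed by sorted: " + n))
                .sorted(Comparator.naturalOrder())
                .peek(n -> System.out.println("handed downstream: " + n))
                .findFirst();
        // prints all three "consumed by sorted" lines first, then
        // "handed downstream: 1" - sorted buffers the whole source
        // before emitting its first element.
    }
}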
Is there any way in Java 8 to group the elements in a java.util.stream.Stream without collecting them? I want the result to be a Stream again. Because I have to work with a lot of data or even infinite streams, I cannot collect the data first and stream the result again.
All elements that need to be grouped are consecutive in the first stream, so I would like to keep the stream evaluation lazy.
There's no way to do it using the standard Stream API. In general you cannot do it, as it's always possible that a new item will appear later that belongs to one of the already created groups, so you cannot pass a group downstream for analysis until you have processed all of the input.
However, if you know in advance that the items to be grouped are always adjacent in the input stream, you can solve your problem using third-party libraries that enhance the Stream API. One such library is StreamEx, which is free and written by me. It contains a number of "partial reduction" operators which collapse adjacent items into a single one based on some predicate. Usually you supply a BiPredicate which tests two adjacent items and returns true if they should be grouped together. Some of the partial reduction operations are listed below:
collapse(BiPredicate): replace each group with the first element of the group. For example, collapse(Objects::equals) is useful to remove adjacent duplicates from the stream.
groupRuns(BiPredicate): replace each group with the List of group elements (so StreamEx<T> is converted to StreamEx<List<T>>). For example, stringStream.groupRuns((a, b) -> a.charAt(0) == b.charAt(0)) will create a stream of Lists of strings where each list contains adjacent strings starting with the same letter.
Other partial reduction operations include intervalMap, runLengths() and so on.
All partial reduction operations are lazy, parallel-friendly and quite efficient.
Note that you can easily construct a StreamEx object from a regular Java 8 stream using StreamEx.of(stream). There are also methods to construct it from an array, Collection, Reader, etc. The StreamEx class implements the Stream interface and is 100% compatible with the standard Stream API.
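For instance, here is a minimal sketch of groupRuns (assuming the StreamEx library, package one.util.streamex, is on the classpath):
import one.util.streamex.StreamEx;
import java.util.List;

public class GroupRunsDemo {
    public static void main(String[] args) {
        // adjacent strings with the same first letter collapse into one List
        List<List<String>> runs = StreamEx.of("apple", "avocado", "banana", "blueberry", "cherry")
                .groupRuns((a, b) -> a.charAt(0) == b.charAt(0))
                .toList();
        System.out.println(runs); // [[apple, avocado], [banana, blueberry], [cherry]]
    }
}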
Java 8 streams allow us to collect elements while grouping by an arbitrary constraint. For example:
Map<Type, List<MyThing>> grouped = stream
.collect(groupingBy(myThing -> myThing.type()));
However this has the drawback that the stream must be completely read through, so there is no chance of lazy evaluation of future operations on grouped.
Is there a way to do a grouping operation to get something like Stream<Tuple<Type, Stream<MyThing>>>? Is it even conceptually possible to group lazily in any language without evaluating the whole data set?
The concept of lazy grouping doesn't really make sense. Grouping, by definition, means selecting groups in advance to avoid the overhead of searching through all the elements for each key. "Lazy grouping" would look like this:
List<MyThing> get(Type key) {
    return source.stream()
            .filter(myThing -> myThing.type().equals(key))
            .collect(toList());
}
If you prefer to defer iteration to when you know you need it, or if you want to avoid the memory overhead of caching a grouping map, this is perfectly fine. But you can't optimize the selection process without iterating ahead of time.
A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream.
Taken from the doc at:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html
So I think there is no way to split it without consuming it and creating new streams.
I do not think that this would make sense, since reading from one partition stream (Tuple<Type, Stream<MyThing>>) of a lazy stream Stream<Tuple<Type, Stream<MyThing>>> could consume an arbitrarily large amount of memory for the other partitions.
E.g. consider the lazy stream of positive integers in natural order, grouped by their smallest prime factor. Then reading from the most recently received element of the stream of partitions would buffer an ever increasing number of integers in the partitions received before it.
Is it even conceptually possible to group lazily in any language without evaluating the whole data set?
No, you cannot group an entire data set correctly without checking the entire data set or having a guarantee of an exploitable pattern in the data. For example, I can group the first 10,000 integers into even-odd lazily, but I can't lazily group even-odd for a random set of 10,000 integers.
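To make the "exploitable pattern" case concrete, here is a sketch: both groups below can be handed out as lazy streams without materializing anything, because membership follows from the pattern rather than from inspecting collected data:
import java.util.stream.IntStream;

// Each "group" is its own lazy stream; no element is examined up front,
// because the even/odd pattern is known in advance.
IntStream evens = IntStream.rangeClosed(1, 10_000).filter(i -> i % 2 == 0);
IntStream odds  = IntStream.rangeClosed(1, 10_000).filter(i -> i % 2 != 0);
// For a random set of 10,000 integers there is no such shortcut:
// deciding the groups requires looking at every element.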
As far as grouping in a non-terminal fashion... it's not something that seems like a good idea. Conceptually, a grouping function on a stream should return multiple streams, as if it were branching the different streams, and Java 8 does not support that.
If you really want to use native Stream methods to group non-terminally, you could abuse the sorted method. Give it a comparator that distinguishes the groups but treats all elements within a group as equal, and you'll end up with group1, group2, group3, etc. This won't give you lazy evaluation, but it is grouping.
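A sketch of that trick with invented sample data; the comparator looks only at the group key (the first letter), so members of a group compare as equal and end up adjacent:
import java.util.Comparator;
import java.util.stream.Stream;

public class SortAsGroup {
    public static void main(String[] args) {
        Stream.of("banana", "apple", "cherry", "avocado", "blueberry")
                // elements within a group compare as equal, so the stable
                // sort keeps each group's internal order intact
                .sorted(Comparator.comparing((String s) -> s.charAt(0)))
                .forEach(System.out::println);
        // prints: apple, avocado, banana, blueberry, cherry
    }
}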
I have the following code that does a group by on a List, and then operates on each grouped List in turn converting it to a single item:
Map<Integer, List<Record>> recordsGroupedById = myList.stream()
        .collect(Collectors.groupingBy(r -> r.get("complex_id")));
List<Complex> whatIwant = recordsGroupedById.values().stream()
        .map(this::toComplex)
        .collect(Collectors.toList());
The toComplex function looks like:
Complex toComplex(List<Record> records);
I have the feeling I can do this without creating the intermediate map, perhaps using reduce. Any ideas?
The input stream is ordered with the elements I want grouped sequentially in the stream. Within a normal loop construct I'd be able to determine when the next group starts and create a "Complex" at that time.
Create a collector that combines toList() and your post-processing function with collectingAndThen, and use it as the downstream collector of groupingBy.
Map<Integer, Complex> map = myList.stream()
        .collect(groupingBy(r -> r.get("complex_id"),
                collectingAndThen(toList(), Xxx::toComplex)));
If you just want a Collection<Complex> here, you can then ask the map for its values().
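For example (values() is a live view of the map, so no copying is needed):
Collection<Complex> complexes = map.values();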
Well, you can avoid the Map (honestly!) and do everything in a single pipeline using my StreamEx library:
List<Complex> result = StreamEx.of(myList)
.sortedBy(r -> r.get("complex_id"))
.groupRuns((r1, r2) -> r1.get("complex_id").equals(r2.get("complex_id")))
.map(this::toComplex)
.toList();
Here we first sort the input by complex_id, then use the groupRuns custom intermediate operation, which groups adjacent stream elements into a List when the given BiPredicate, applied to two adjacent elements, returns true. After that you have a stream of lists, which is mapped to a stream of Complex objects and finally collected into a list.
There are actually no intermediate maps, and groupRuns is lazy (in sequential mode it keeps no more than one intermediate List at a time); it also parallelizes well. On the other hand, my tests show that for unsorted input such a solution is slower than the groupingBy-based one, as it involves sorting the whole input. And of course sortedBy (which is just a shortcut for sorted(Comparator.comparing(...))) takes intermediate memory to store the input. If your input is already sorted (or at least partially sorted, so TimSort can perform fast), then such a solution is usually faster than groupingBy.
No, you can't. You must collect all the data to ensure the contents of all groups are known before moving forward. Obviously, though, if you can process each element as it is assigned to its group, that can be done.
Think about it this way - imagine the very first item in the list and the very last item in the list contain the same complex_id. You would then have to wait for the end of the list anyway to fully gather that group (and all the others), so you must gather all the groups before processing.
Also - you should obviously be able to do:
List<Complex> whatIwant = myList.stream()
.collect(Collectors.groupingBy(r -> r.get("complex_id")))
.values()
.stream()
.map(this::toComplex)
.collect(Collectors.toList());