I'd like to duplicate a Java 8 stream so that I can deal with it twice. I can collect it as a list and get new streams from that:
// doSomething() returns a stream
List<A> thing = doSomething().collect(toList());
thing.stream()... // do stuff
thing.stream()... // do other stuff
But I kind of think there should be a more efficient/elegant way.
Is there a way to copy the stream without turning it into a collection?
I'm actually working with a stream of Eithers, so want to process the left projection one way before moving onto the right projection and dealing with that another way. Kind of like this (which, so far, I'm forced to use the toList trick with).
List<Either<Pair<A, Throwable>, A>> results = doSomething().collect(toList());
Stream<Pair<A, Throwable>> failures = results.stream().flatMap(either -> either.left());
failures.forEach(failure -> ... );
Stream<A> successes = results.stream().flatMap(either -> either.right());
successes.forEach(success -> ... );
I think your assumption about efficiency is kind of backwards. You get this huge efficiency payback if you're only going to use the data once, because you don't have to store it, and streams give you powerful "loop fusion" optimizations that let you flow the whole data efficiently through the pipeline.
If you want to re-use the same data, then by definition you either have to generate it twice (deterministically) or store it. If it already happens to be in a collection, great; then iterating it twice is cheap.
We did experiment in the design with "forked streams". What we found was that supporting this had real costs; it burdened the common case (use once) for the benefit of the uncommon case. The big problem was dealing with "what happens when the two pipelines don't consume data at the same rate." Now you're back to buffering anyway. This was a feature that clearly didn't carry its weight.
If you want to operate on the same data repeatedly, either store it, or structure your operations as Consumers and do the following:
stream()...stuff....forEach(e -> { consumerA(e); consumerB(e); });
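As a minimal sketch of the "structure your operations as Consumers" suggestion, the following runs two consumers over each element in a single traversal. The class name, the forEachBoth helper and the demo data are hypothetical and only serve as illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class SinglePassSketch {
    // Hands each element to both consumers during one traversal of the stream.
    static <T> void forEachBoth(Stream<T> stream,
                                Consumer<? super T> first,
                                Consumer<? super T> second) {
        stream.forEach(e -> {
            first.accept(e);
            second.accept(e);
        });
    }

    // Hypothetical demo: records which consumer saw which element.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        forEachBoth(Stream.of(1, 2, 3),
                n -> log.add("A" + n),
                n -> log.add("B" + n));
        return log;
    }
}
```

Nothing is buffered here, but both consumers necessarily advance at the same rate, which is exactly the restriction that makes this cheaper than a forked stream.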
You might also look into the RxJava library, as its processing model lends itself better to this kind of "stream forking".
You can use a local variable with a Supplier to set up common parts of the stream pipeline.
From http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/:
Reusing Streams
Java 8 streams cannot be reused. As soon as you call any terminal operation the stream is closed:
Stream<String> stream = Stream.of("d2", "a2", "b1", "b3", "c")
.filter(s -> s.startsWith("a"));
stream.anyMatch(s -> true); // ok
stream.noneMatch(s -> true); // exception
Calling `noneMatch` after `anyMatch` on the same stream results in the following exception:
java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
at java.util.stream.ReferencePipeline.noneMatch(ReferencePipeline.java:459)
at com.winterbe.java8.Streams5.test7(Streams5.java:38)
at com.winterbe.java8.Streams5.main(Streams5.java:28)
To overcome this limitation we have to create a new stream chain for every terminal operation we want to execute, e.g. we could create a stream supplier to construct a new stream with all intermediate operations already set up:
Supplier<Stream<String>> streamSupplier =
() -> Stream.of("d2", "a2", "b1", "b3", "c")
.filter(s -> s.startsWith("a"));
streamSupplier.get().anyMatch(s -> true); // ok
streamSupplier.get().noneMatch(s -> true); // ok
Each call to get() constructs a new stream on which we are safe to call the desired terminal operation.
Use a Supplier to produce the stream for each terminal operation.
Supplier<Stream<Integer>> streamSupplier = () -> list.stream();
Whenever you need a stream of that collection,
use streamSupplier.get() to get a new stream.
Examples:
streamSupplier.get().anyMatch(predicate);
streamSupplier.get().allMatch(predicate2);
We've implemented a duplicate() method for streams in jOOλ, an Open Source library that we created to improve integration testing for jOOQ. Essentially, you can just write:
Tuple2<Seq<A>, Seq<A>> duplicates = Seq.seq(doSomething()).duplicate();
Internally, there is a buffer storing all values that have been consumed from one stream but not from the other. That's probably as efficient as it gets if your two streams are consumed at about the same rate, and if you can live with the lack of thread-safety.
Here's how the algorithm works:
static <T> Tuple2<Seq<T>, Seq<T>> duplicate(Stream<T> stream) {
    // Declared as LinkedList (not List) because offer() and poll() are used below
    final LinkedList<T> gap = new LinkedList<>();
    final Iterator<T> it = stream.iterator();
    @SuppressWarnings("unchecked")
    final Iterator<T>[] ahead = new Iterator[] { null };
    class Duplicate implements Iterator<T> {
        @Override
        public boolean hasNext() {
            if (ahead[0] == null || ahead[0] == this)
                return it.hasNext();
            return !gap.isEmpty();
        }
        @Override
        public T next() {
            if (ahead[0] == null)
                ahead[0] = this;
            if (ahead[0] == this) {
                T value = it.next();
                gap.offer(value);
                return value;
            }
            return gap.poll();
        }
    }
    return tuple(seq(new Duplicate()), seq(new Duplicate()));
}
More source code here
Tuple2 is probably like your Pair type, whereas Seq is Stream with some enhancements.
You could create a stream of runnables (for example):
results.stream()
.flatMap(either -> Stream.<Runnable> of(
() -> failure(either.left()),
() -> success(either.right())))
.forEach(Runnable::run);
Where failure and success are the operations to apply. This will however create quite a few temporary objects and may not be more efficient than starting from a collection and streaming/iterating it twice.
Another way to handle the elements multiple times is to use Stream.peek(Consumer):
doSomething()
    .peek(either -> handleFailure(either.left()))
    .forEach(either -> handleSuccess(either.right()));
peek(Consumer) can be chained as many times as needed.
doSomething()
    .peek(element -> handleFoo(element.foo()))
    .peek(element -> handleBar(element.bar()))
    .peek(element -> handleBaz(element.baz()))
    .forEach(element -> handleQux(element.qux()));
cyclops-react, a library I contribute to, has a static method that will allow you to duplicate a Stream (it returns a jOOλ Tuple of Streams).
Stream<Integer> stream = Stream.of(1,2,3);
Tuple2<Stream<Integer>,Stream<Integer>> streams = StreamUtils.duplicate(stream);
See the comments: there is a performance penalty incurred when using duplicate on an existing Stream. A more performant alternative would be to use Streamable:
There is also a (lazy) Streamable class that can be constructed from a Stream, Iterable or Array and replayed multiple times.
Streamable<Integer> streamable = Streamable.of(1,2,3);
streamable.stream().forEach(System.out::println);
streamable.stream().forEach(System.out::println);
AsStreamable.synchronizedFromStream(stream) can be used to create a Streamable that will lazily populate its backing collection in a way that can be shared across threads. Streamable.fromStream(stream) will not incur any synchronization overhead.
For this particular problem you can also use partitioning. Something like:
// Partition Eithers into left and right
Stream<Either<Pair<A, Throwable>, A>> results = doSomething();
Map<Boolean, List<Either<Pair<A, Throwable>, A>>> passingFailing = results.collect(Collectors.partitioningBy(either -> either.isLeft()));
passingFailing.get(true) // all failing (left) values
passingFailing.get(false) // all passing (right) values
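A self-contained sketch of the partitioning idea follows. Since the question's Either type is not shown, Outcome is a hypothetical stand-in with an isLeft() method; the demo values are made up as well.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PartitionSketch {
    // Hypothetical stand-in for Either: exactly one of the two fields is non-null.
    static final class Outcome {
        final String failure; // "left" side
        final Integer value;  // "right" side
        Outcome(String failure, Integer value) {
            this.failure = failure;
            this.value = value;
        }
        boolean isLeft() { return failure != null; }
    }

    // One pass over the stream; key true holds the lefts, key false holds the rights.
    static Map<Boolean, List<Outcome>> split(Stream<Outcome> results) {
        return results.collect(Collectors.partitioningBy(Outcome::isLeft));
    }

    static Map<Boolean, List<Outcome>> demo() {
        return split(Stream.of(
                new Outcome(null, 1),
                new Outcome("boom", null),
                new Outcome(null, 2)));
    }
}
```

Note that this still stores every element, just like the toList approach; the gain is that the stream is consumed once and the two groups come back already separated.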
We can make use of Stream Builder at the time of reading or iterating a stream.
Here's the documentation for Stream Builder:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.Builder.html
Use case
Let's say we have an employee stream and we need to use this stream to write employee data to an Excel file and then update the employee collection/table
[This is just a use case to show the use of Stream Builder]:
Stream.Builder<Employee> builder = Stream.builder();
employee.forEach( emp -> {
//store employee data to excel file
// and use the same object to build the stream.
builder.add(emp);
});
//Now this stream can be used to update the employee collection
Stream<Employee> newStream = builder.build();
I had a similar problem, and could think of three different intermediate structures from which to create a copy of the stream: a List, an array and a Stream.Builder. I wrote a little benchmark program, which suggested that from a performance point of view the List was about 30% slower than the other two which were fairly similar.
The only drawback of converting to an array is that it is tricky if your element type is a generic type (which in my case it was); therefore I prefer to use a Stream.Builder.
I ended up writing a little function that creates a Collector:
private static <T> Collector<T, Stream.Builder<T>, Stream<T>> copyCollector()
{
return Collector.of(Stream::builder, Stream.Builder::add, (b1, b2) -> {
b2.build().forEach(b1);
return b1;
}, Stream.Builder::build);
}
I can then make a copy of any stream str by doing str.collect(copyCollector()) which feels quite in keeping with the idiomatic usage of streams.
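For completeness, here is a small usage sketch of that collector. The copyCollector method is repeated from above so the example compiles on its own; the class name and demo data are hypothetical.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CopyCollectorSketch {
    // Same collector as above: buffers all elements in a Stream.Builder.
    static <T> Collector<T, Stream.Builder<T>, Stream<T>> copyCollector() {
        return Collector.of(Stream::builder, Stream.Builder::add, (b1, b2) -> {
            b2.build().forEach(b1); // Stream.Builder is itself a Consumer<T>
            return b1;
        }, Stream.Builder::build);
    }

    static List<String> demo() {
        Stream<String> original = Stream.of("a", "b", "c").map(String::toUpperCase);
        // collect() consumes the original eagerly; the result is a new, buffered stream.
        Stream<String> copy = original.collect(copyCollector());
        return copy.collect(Collectors.toList());
    }
}
```

Keep in mind that the resulting stream is again single-use, so this yields one buffered copy per collect call rather than a reusable source.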
Related
I have the following for loop:
List<Player> players = new ArrayList<>();
for (Team team : teams) {
ArrayList<TeamPlayer> teamPlayers = team.getTeamPlayers();
for (TeamPlayer player : teamPlayers) {
players.add(new Player(player.getName(), player.getPosition()));
}
}
and I'm trying to convert it to a Stream:
List<Player> players = teams.forEach(t -> t.getTeamPlayers()
.forEach(p -> players.add(new Player(p.getName(), p.getPosition())))
);
But I'm getting a compilation error:
variable 'players' might not have been initialized
Why is this happening? Maybe there's an alternative way to create the stream, I was thinking of using flatMap but not sure how to apply it.
First of all, you need to understand that Streams don't act like Loops.
Hence, don't try to mimic a loop. Examine the tools offered by the API. Operation forEach() is there for special cases when you need to perform side-effects, not in order to accumulate elements from the stream into a Collection.
Note: with teams.forEach() you're not actually using a stream, but method Iterable.forEach() which is available with every implementation of Iterable.
To perform reduction on streams, we have several specialized operations like collect, reduce, etc. (for more information refer to the API documentation - Reduction).
The collect() operation is meant to perform mutable reduction. You can use it to collect the data into a list by providing the built-in collector Collectors.toList() as an argument. Java 16 also introduced the toList() operation, which is implemented on top of toArray() and performs better than the namesake collector (so it's the preferred option if your JDK version allows it).
I was thinking of using flatMap but not sure how to apply it.
Operation flatMap() is meant to perform one-to-many transformations. It expects a Function which takes a stream element and generates a Stream of the resulting type; the elements of the generated stream become a replacement for the initial element.
Note: the general approach to writing streams is to use as few operations as possible (because one of the main advantages that functional programming brings to Java is simplicity). For that reason, when a stream element produces a Collection, applying flatMap() in a single step is idiomatic, since it's shorter than performing map().flatMap() in two steps.
Here's how the implementation might look:
List<Team> teams = List.of();
List<Player> players = teams.stream() // Stream<Team>
.flatMap(team -> team.getTeamPlayers().stream()) // Stream<TeamPlayer>
.map(player -> new Player(player.getName(), player.getPosition()))
.toList(); // for Java 16+ or collect(Collectors.toList())
This is basically the answer of Alexander Ivanchenko, but with method reference.
final var players = teams.stream()
.map(Team::getTeamPlayers)
.flatMap(Collection::stream)
.map(p -> new Player(p.getName(), p.getPosition()))
.toList();
If your Player class has a factory method like this (depending on the relation between Player and TeamPlayer):
public static Player fromTeamPlayer(final TeamPlayer teamPlayer) {
return new Player(teamPlayer.getName(), teamPlayer.getPosition());
}
You could further reduce it to:
final var players = teams.stream()
.map(Team::getTeamPlayers)
.flatMap(Collection::stream)
.map(Player::fromTeamPlayer)
.toList();
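To try the flatMap pipeline end to end, here is a self-contained sketch. Team, TeamPlayer and Player are minimal hypothetical stand-ins for the question's classes, and Collectors.toList() is used so that it also compiles on JDKs older than 16.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class TeamFlattenSketch {
    // Minimal hypothetical stand-ins for the question's classes.
    static final class TeamPlayer {
        private final String name;
        private final String position;
        TeamPlayer(String name, String position) { this.name = name; this.position = position; }
        String getName() { return name; }
        String getPosition() { return position; }
    }

    static final class Player {
        final String name;
        final String position;
        Player(String name, String position) { this.name = name; this.position = position; }
    }

    static final class Team {
        private final List<TeamPlayer> teamPlayers;
        Team(TeamPlayer... teamPlayers) { this.teamPlayers = Arrays.asList(teamPlayers); }
        List<TeamPlayer> getTeamPlayers() { return teamPlayers; }
    }

    // Each team is replaced by its players, then each TeamPlayer is mapped to a Player.
    static List<Player> toPlayers(List<Team> teams) {
        return teams.stream()
                .flatMap(team -> team.getTeamPlayers().stream())
                .map(tp -> new Player(tp.getName(), tp.getPosition()))
                .collect(Collectors.toList());
    }

    static List<String> demoNames() {
        List<Team> teams = Arrays.asList(
                new Team(new TeamPlayer("Ann", "GK"), new TeamPlayer("Bo", "DF")),
                new Team(), // a team without players contributes nothing
                new Team(new TeamPlayer("Cy", "FW")));
        return toPlayers(teams).stream().map(p -> p.name).collect(Collectors.toList());
    }
}
```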
I have two methods: funca() and funcb() which return a value of type X or a List<X> respectively like shown below:
X funca(Event e) { ... }
List<X> funcb(Event e) { ... }
I want to use them in the Stream and collect the result into a list.
These methods should be called under different conditions, as shown below in pseudocode:
List<Event> input = // initializing the input
List<X> resultList = input.values().stream()
.(event -> event.status=="active" ? funca(event) : funcb(event))
.collect(Collectors.toList());
Can someone please tell me how I can achieve this, so that the result is collected correctly whether the function returns a single value or a list of values?
Since one of your functions produces a Collection as a result, you need a stream operation that allows performing a one-to-many transformation. For now, the Stream API offers two possibilities: flatMap() and mapMulti().
To approach the problem, you need to take a closer look at them and think in terms of these operations.
flatMap()
This operation requires a function producing a Stream, and elements of this new stream become a replacement for the initial element.
Therefore, you need to wrap the result returned by funca() in a singleton stream using Stream.of(). There's no need to wrap the element in a List, as shown in another answer, because flatMap() cannot consume Collections directly.
List<X> result = input.stream()
.flatMap(event -> "active".equals(event.getStatus()) ?
Stream.of(funca(event)) : funcb(event).stream()
)
.toList(); // for Java 16+ or collect(Collectors.toList())
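A runnable sketch of this flatMap() variant is below. Event, funca() and funcb() are hypothetical placeholders for the question's types and methods (here they simply produce strings), and Collectors.toList() keeps it compatible with Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class MixedFlatMapSketch {
    // Hypothetical stand-in for the question's Event.
    static final class Event {
        private final int id;
        private final String status;
        Event(int id, String status) { this.id = id; this.status = status; }
        String getStatus() { return status; }
    }

    // Hypothetical implementations: one result vs. several results per event.
    static String funca(Event e) { return "single-" + e.id; }
    static List<String> funcb(Event e) {
        return Arrays.asList("multi-" + e.id + "a", "multi-" + e.id + "b");
    }

    static List<String> process(List<Event> input) {
        return input.stream()
                .flatMap(event -> "active".equals(event.getStatus())
                        ? Stream.of(funca(event))   // wrap the single value in a one-element stream
                        : funcb(event).stream())    // stream over the returned list
                .collect(Collectors.toList());
    }

    static List<String> demo() {
        return process(Arrays.asList(new Event(1, "active"), new Event(2, "inactive")));
    }
}
```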
mapMulti()
This operation was introduced with Java 16 and is similar to flatMap() but acts differently.
Contrary to flatMap(), it doesn't consume a new Stream. As an argument it expects a BiConsumer, which in turn takes a stream element and a Consumer of the resulting type. Every element offered to the Consumer becomes a part of the resulting stream.
mapMulti() might be handy if funcb() produces a list which is very moderate in size (refer to documentation linked above for more details), otherwise flatMap() would be the right tool.
List<X> result = input.stream()
.<X>mapMulti((event, consumer) -> {
if ("active".equals(event.getStatus())) consumer.accept(funca(event));
else funcb(event).forEach(consumer);
})
.toList(); // for Java 16+ or collect(Collectors.toList())
Sidenote: don't use == to compare reference types (like String) unless you need to make sure that both references are pointing to the same object, use equals() method instead.
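A tiny illustration of that sidenote: the two strings below have the same content but are distinct objects, so == and equals() disagree. The class and method names are made up for the example.

```java
final class StringEqualitySketch {
    // Compares references: the explicitly constructed String is a different object than the literal.
    static boolean sameReference() {
        String fromRuntime = new String("active");
        return fromRuntime == "active";
    }

    // Compares content; putting the literal first also avoids a NullPointerException for null input.
    static boolean sameContent() {
        String fromRuntime = new String("active");
        return "active".equals(fromRuntime);
    }
}
```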
Embed the result of funca() into a list and flatMap the lists' streams:
List<X> result = input.stream()
    .flatMap(e -> ("active".equals(e.status) ? List.of(funca(e)) : funcb(e)).stream())
    .collect(Collectors.toList());
I am trying to test a REST API PUT request in a Spring Boot application.
The PUT request updates an entry in an existing list of objects.
The traditional way of writing it works.
data is the in-memory data, a List<Bean>;
name (a String) is the key used to find the object in data, and objectBean is the replacement for the object found under that key.
public void update(Bean objectBean, String name) {
for(int i = 0; i < data.size() ; i++) {
Bean l = data.get(i);
if(l.getName().equals(name)) {
data.set(i, objectBean);
return;
}
}
};
But when I tried to write it using a Java 8 Stream, as in the code below:
data.stream().map(p -> p.getName().equals(name) ? objectBean : p);
it gives an empty list.
Using streams here makes code only more complicated.
If you really want to, you can use a stream to find the index i. After that you can do the replacement.
IntStream.range(0, data.size())
.filter(i -> data.get(i).getName().equals(name)).findFirst()
.ifPresent(i -> data.set(i, objectBean));
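Wrapped into a runnable form, the index-based replacement could look like the sketch below. Bean is a hypothetical minimal class (a name plus a version number to make the replacement visible).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IndexReplaceSketch {
    // Hypothetical minimal Bean.
    static final class Bean {
        private final String name;
        final int version;
        Bean(String name, int version) { this.name = name; this.version = version; }
        String getName() { return name; }
    }

    // Replaces the first bean with a matching name in place; does nothing if none matches.
    static void update(List<Bean> data, Bean replacement, String name) {
        IntStream.range(0, data.size())
                .filter(i -> data.get(i).getName().equals(name))
                .findFirst()
                .ifPresent(i -> data.set(i, replacement));
    }

    static List<Integer> demoVersions() {
        List<Bean> data = new ArrayList<>(Arrays.asList(
                new Bean("a", 1), new Bean("b", 1), new Bean("c", 1)));
        update(data, new Bean("b", 2), "b");
        update(data, new Bean("z", 9), "z"); // no bean named "z": list stays unchanged
        return data.stream().map(b -> b.version).collect(Collectors.toList());
    }
}
```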
Given that data is some List with Bean objects, you'd need to return your collected stream:
return data.stream()
.map(bean -> bean.getName().equals(name) ? objectBean : bean)
.collect(Collectors.toList());
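One likely reason the bare map(...) call in the question appears to do nothing is that intermediate operations are lazy: they only run once a terminal operation is attached. The following sketch (hypothetical class and counter, not taken from the question) makes that visible.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyMapSketch {
    // map() alone only describes the pipeline; the mapping function is never invoked.
    static int callsWithoutTerminalOp() {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pending = Stream.of("a", "b", "c")
                .map(s -> { calls.incrementAndGet(); return s.toUpperCase(); });
        return calls.get();
    }

    // Adding collect() triggers the traversal, so the function runs once per element.
    static int callsWithCollect() {
        AtomicInteger calls = new AtomicInteger();
        List<String> result = Stream.of("a", "b", "c")
                .map(s -> { calls.incrementAndGet(); return s.toUpperCase(); })
                .collect(Collectors.toList());
        return calls.get();
    }
}
```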
If data is a non-empty List, the result will be non-empty as well, since map applies a Function to each element and emits exactly one result for it. However, this is not a good use case for the Stream API:
Firstly, streams are designed for side-effect-free purposes (i.e., creating new data structures rather than updating them). The Stream API does support forEach(Consumer<? super T>), which is designed for side effects, but forEach is also available on every Iterable, whereas non-mutating operations such as map and flatMap are not.
Second, I can't see the rest of your program, but at least in this snippet you seem to be updating your data structure based on the name, and you assume the name is unique because you stopped as soon as you reached the first Bean with the name you're looking for. Consider using Map<String, Bean> as your data structure.
Lastly, streams are lazy: the chained operations are only computed when a terminal operation such as collect runs. This provides an incentive to chain a lot of computations together; chaining just a single map doesn't give you any performance advantage (though it does give you referential transparency).
return data.stream()
.filter(bean -> bean.getName().equals(name))
.findAny()
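The Map<String, Bean> suggestion above could be sketched as follows, assuming names really are unique. BeanRegistrySketch and its Bean class are hypothetical; only java.util.Map methods are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BeanRegistrySketch {
    // Hypothetical minimal Bean.
    static final class Bean {
        private final String name;
        final int version;
        Bean(String name, int version) { this.name = name; this.version = version; }
        String getName() { return name; }
    }

    // Keyed by name, so lookup and replacement need no loop at all.
    private final Map<String, Bean> byName = new LinkedHashMap<>();

    void add(Bean bean) { byName.put(bean.getName(), bean); }

    // Replaces the bean stored under name; returns false if there was no such bean.
    boolean update(String name, Bean replacement) {
        return byName.replace(name, replacement) != null;
    }

    Bean find(String name) { return byName.get(name); }

    static int demoUpdateExisting() {
        BeanRegistrySketch registry = new BeanRegistrySketch();
        registry.add(new Bean("a", 1));
        registry.update("a", new Bean("a", 2));
        return registry.find("a").version;
    }

    static boolean demoUpdateMissing() {
        BeanRegistrySketch registry = new BeanRegistrySketch();
        return registry.update("missing", new Bean("missing", 1));
    }
}
```

LinkedHashMap is chosen here only to keep insertion order, mirroring the original list; a plain HashMap would work as well if order does not matter.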
I've encountered a situation that I thought could be handled with the Stream API, but I simply cannot figure out a proper solution.
The case is the following: I have a stream of elements sorted by an identifier field. There are several elements with the same value for this identifier, and I need to deduplicate them based on conditions on other fields. Conceptually, it can be seen as a reduce operation on several chunks of the stream, yielding a stream of the same type.
For now, the only solution I have managed to come up with is to collect the stream based on the common identifier to obtain something like Map<Id, List<Elem>> and then use this map's stream to apply my deduplication rules and go on. The problem (and why I won't use this solution) is that collect is a terminal operation; re-streaming after it means that I will iterate over my elements twice.
UPDATE
Consider the following class :
public static class Item {
private final int _id;
private final double _price;
public Item(final int id, final double price) {
_id = id;
_price = price;
}
public int id() {
return _id;
}
public double price() {
return _price;
}
}
And the following stream :
final Stream<Item> items = Stream.<Item>builder()
.add(new Item(1, 4))
.add(new Item(1, 6))
.add(new Item(1, 3))
.add(new Item(2, 5))
.add(new Item(2, 1))
.add(new Item(3, 5))
.build();
After the required operation, if the rule of deduplication is "with the highest price", the stream should only contain Item(1, 6), Item(2, 5) and Item(3, 5).
If I do this imperatively, I can consume my items while they have the same id, backing them up in a temporary collection, and deduplicate this collection when encountering an item with a different id.
If I use collect to first group the items by id, I will consume all the data at once before moving to the next operation, and I need to avoid that.
For most cases of that kind, a temporary storage, like a Map, is inevitable. After all, it’s the map’s efficient lookup algorithm that allows to identify the group each element belongs to. Also, it’s possible that the first group contains the first and the very last element of the source stream and the only way to find out whether this is the case, is iterating the entire source stream. This might not be true for your special case of pre-sorted data, but the API doesn’t provide a way to exploit this for a grouping operation. And it wouldn’t play nicely with parallel Stream support, if it existed.
But consider the groupingBy collector accepting a downstream Collector which allows you to reduce the groups to their final result in-place. If it is a true reduction, you can use, e.g. reducing as downstream collector. This allows you to collect the elements into a Map<Id, Reduced> rather than Map<Id, List<Elem>>, so you don’t collect into Lists that have to be reduced afterwards.
For any similar case, if you can describe the follow-up operation as a Collector, its processing will indeed start right when encountering the first element of a group. Note that there are other combining Collectors like mapping and collectingAndThen. Java 9 will also add filtering and flatMapping, so you can express a lot of typical Stream operations in form of a downstream collector. For convenience, this collector combines a mapping step with a follow-up reduction step.
Further processing of the groups can only be done after the full completion of the grouping, by accessing Map.values(). If the final result is supposed to be a Collection, it’s not necessary to stream over it again, as the existing collection operations are sufficient, e.g. you can use new ArrayList<>(map.values()) if you need a List rather than an unspecific Collection.
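Applied to the question's Item example, the groupingBy-with-downstream-reduction idea might look like this sketch. The Item class is a trimmed copy of the question's; the class name, the LinkedHashMap choice (to preserve encounter order of ids) and the collectingAndThen unwrapping are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GroupReduceSketch {
    static final class Item {
        private final int id;
        private final double price;
        Item(int id, double price) { this.id = id; this.price = price; }
        int id() { return id; }
        double price() { return price; }
    }

    // Each group is reduced to its highest-priced item while collecting; no per-group lists are built.
    static Collection<Item> highestPricePerId(Stream<Item> items) {
        BinaryOperator<Item> pricier = BinaryOperator.maxBy(Comparator.comparingDouble(Item::price));
        Map<Integer, Item> best = items.collect(Collectors.groupingBy(
                Item::id,
                LinkedHashMap::new,
                Collectors.collectingAndThen(Collectors.reducing(pricier), Optional::get)));
        return best.values();
    }

    static List<Double> demoPrices() {
        Stream<Item> items = Stream.of(
                new Item(1, 4), new Item(1, 6), new Item(1, 3),
                new Item(2, 5), new Item(2, 1), new Item(3, 5));
        List<Double> prices = new ArrayList<>();
        for (Item item : highestPricePerId(items)) {
            prices.add(item.price());
        }
        return prices;
    }
}
```

Optional::get is safe here because a group only exists once at least one element has been put into it.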
If your concern is that the operation should not be performed until the caller commences a terminal operation on the final Stream, you can use an operation like this:
public Stream<ResultType> stream() {
return StreamSupport.stream(() -> items.stream()
.collect(Collectors.groupingBy(classificationFunc,
Collectors.reducing(id, mappingFunc, reductionFunc)))
.values().spliterator(),
Spliterator.SIZED, false);
}
I haven't tested this, but using the StreamEx library, you should be able to collapse() adjacent elements like this:
items.collapse((a, b) -> a.id() == b.id(), (a, b) -> a.price() < b.price() ? b : a)
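Without an extra library, the same "merge adjacent elements lazily" idea can be sketched on top of a plain Iterator. This is not StreamEx's implementation and I have not checked that it matches collapse() in every detail; the collapse helper, the class name and the trimmed Item class are assumptions for illustration, and the result is a sequential stream only.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class AdjacentCollapseSketch {
    static final class Item {
        private final int id;
        private final double price;
        Item(int id, double price) { this.id = id; this.price = price; }
        int id() { return id; }
        double price() { return price; }
    }

    // Lazily merges each run of adjacent elements that belong together into one element.
    static <T> Stream<T> collapse(Stream<T> source, BiPredicate<T, T> sameGroup, BinaryOperator<T> merger) {
        Iterator<T> it = source.iterator();
        Iterator<T> collapsed = new Iterator<T>() {
            private T pending;          // first element of the next run, already pulled from the source
            private boolean hasPending;

            @Override
            public boolean hasNext() { return hasPending || it.hasNext(); }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T current = hasPending ? pending : it.next();
                hasPending = false;
                pending = null;
                // Pull elements until the run ends; only one look-ahead element is buffered.
                while (it.hasNext()) {
                    T candidate = it.next();
                    if (sameGroup.test(current, candidate)) {
                        current = merger.apply(current, candidate);
                    } else {
                        pending = candidate;
                        hasPending = true;
                        break;
                    }
                }
                return current;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(collapsed, Spliterator.ORDERED), false);
    }

    static List<Double> demoPrices() {
        Stream<Item> items = Stream.of(
                new Item(1, 4), new Item(1, 6), new Item(1, 3),
                new Item(2, 5), new Item(2, 1), new Item(3, 5));
        return collapse(items, (a, b) -> a.id() == b.id(), (a, b) -> a.price() < b.price() ? b : a)
                .map(Item::price)
                .collect(Collectors.toList());
    }
}
```

This relies on the input being sorted by id, as stated in the question; on unsorted input, equal ids that are not adjacent would end up in separate runs.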
I am having problems doing the following using Couchbase Java client 2.2.2 and Rx Observables 1.0.15:
I have a list of strings which are document names
Along with each original document for a document name I would like to load another document (deduced from the original document name) so I would get a pair of documents. If any of those two documents do not exist, do not use this pair any more.
If the pair is valid (i.e. both documents exist) then use both documents to create a custom object from them
combine those transformed items into a list
What I have come up with so far looks really mean:
List<E> resultList = new ArrayList<>();
Observable
.from(originalDocumentNames)
.flatmap(key -> {
Observable firstDocument = bucket.async().get(key);
Observable secondDocument = bucket.async().get(getSecondKeyNameFrom(key));
return Observable.merge(firstDocument, secondDocument);
})
.reduce((jsonDocument1, jsonDocument2) -> {
if (jsonDocument1 == null || jsonDocument2 == null) {
return null;
}
resultList.add(createCustomObject(jsonDocument1, jsonDocument2);
return null;
})
.filter(Objects.nonNull)
.singleOrDefault(null)
.subscribe(new Subscriber<E>() {
public void onComplete() {
//use resultList in a callback function
}
});
This does not work. I do not know where, but I think I am using Observable.merge the wrong way.
Also I think I am approaching the whole problem the wrong way.
So the main questions it seems are:
how do I emit an additional item to an Observable stream?
how can I reduce two items into an item of another type? (reduce(T, T, T) does not allow that)
am I taking it on wrong?
You could use zip inside the flatMap. Zip will emit as many items as the Observable with the fewest items. So if one of the documents is missing, its sequence will be empty and zip will skip it.
Observable
.from(originalDocumentNames)
.flatMap(key -> {
//the stream of 0-1 original document
Observable<JsonDocument> firstDocument = bucket.async().get(key);
//the stream of 0-1 associated document
Observable<JsonDocument> secondDocument = bucket.async().get(getSecondKeyNameFrom(key));
//using zip and the createCustomObject method reference as a zip function to combine pairs of documents
return Observable.zip(firstDocument, secondDocument, this::createCustomObject);
})
.toList() //let RxJava aggregate into a List
.subscribe(
//the "callback" function, onNext will be called only once with toList
list -> doSomething(list),
//always try to define onError (best practice)
e -> processErrors(e)
);
There are several issues in this code:
Side effect: the reduce operation adds to a list outside the Observable chain, which is wrong. The reduce should either return the list or not exist at all, since Rx has a toList operation. Also, because the reduce operation returns null, the subsequent operations have to handle that, which is rather inelegant.
The merge operation is wrong; you should instead zip inside the flatMap and build the pair/aggregate.
Optional point: the flatMap operation does not handle the case where either get operation returns multiple items (maybe that's de facto the case with Couchbase).
Note that I don't have an IDE at hand, so no code for now. But in my opinion, replacing merge with zip and removing reduce should certainly help.