Parallel stream gives null items: how to fix this in Java 8 - java

I'm getting some weird results when trying to use a parallel stream. I know a workaround, but it doesn't seem ideal:
// Create the set "selected"
somethingDao.getSomethingList().parallelStream()
.filter(something -> !selected.contains(something.getSomethingId()))
.forEach(something ->
somethingSubGroupDTO.addFilterDTO(
new FilterDTO(something.getSomethingName(), something.getSomethingDescription(), false))
);
selected.clear();
somethingDao.getSomethingList() returns a List.
selected is a HashSet<Integer> that is not modified during this operation.
somethingSubGroupDTO.addFilterDTO is a helper method that adds to an unsynchronized List, and that is the problem: with the unsynchronized list I get fewer items than expected AND some items are null. If I turn it into a synchronized list it works. Obviously, adding lock contention to a parallel stream is not ideal.
At a high level, I know it should be possible for each part of the stream to do its own processing and aggregate the results when they join (at least I can imagine such a process without lock contention). However, since I'm new to Java 8 stream processing, I don't know how. How do I perform this same operation without contention at a single point?

Don't use forEach; collect your Stream into a List instead:
somethingDao.getSomethingList().parallelStream()
.filter(something -> !selected.contains(something.getSomethingId()))
.map(something -> new FilterDTO(something.getSomethingName(), something.getSomethingDescription(), false))
.collect(Collectors.toList());
Then, you can set the returned list directly into your somethingSubGroupDTO object, instead of adding one item at a time.
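For completeness, here is a minimal self-contained sketch of the collect-based approach; Something and FilterDTO are hypothetical stand-ins for the question's classes:

```java
import java.util.*;
import java.util.stream.*;

public class ParallelCollectDemo {
    // Hypothetical stand-ins for the question's DAO entity and DTO.
    record Something(int id, String name, String description) {}
    record FilterDTO(String name, String description, boolean checked) {}

    static List<FilterDTO> buildFilters(List<Something> source, Set<Integer> selected) {
        // collect() gives each worker thread its own accumulator and merges them
        // at the end -- no shared mutable list, no locking in user code.
        return source.parallelStream()
                .filter(s -> !selected.contains(s.id()))
                .map(s -> new FilterDTO(s.name(), s.description(), false))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Something> source = IntStream.range(0, 10_000)
                .mapToObj(i -> new Something(i, "name" + i, "desc" + i))
                .collect(Collectors.toList());
        System.out.println(buildFilters(source, Set.of(1, 2, 3)).size()); // 9997
    }
}
```

Each thread accumulates into its own intermediate list and the partial results are merged when the threads join, which is exactly the contention-free aggregation the question asks about.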

Related

How to apply flatMap() to implement the following logic with Streams

I have the following for loop:
List<Player> players = new ArrayList<>();
for (Team team : teams) {
ArrayList<TeamPlayer> teamPlayers = team.getTeamPlayers();
for (TeamPlayer player : teamPlayers) {
players.add(new Player(player.getName(), player.getPosition()));
}
}
and I'm trying to convert it to a Stream:
List<Player> players = teams.forEach(t -> t.getTeamPlayers()
.forEach(p -> players.add(new Player(p.getName(), p.getPosition())))
);
But I'm getting a compilation error:
variable 'players' might not have been initialized
Why is this happening? Maybe there's an alternative way to create the stream, I was thinking of using flatMap but not sure how to apply it.
First of all, you need to understand that Streams don't act like Loops.
Hence, don't try to mimic a loop. Examine the tools offered by the API. Operation forEach() is there for special cases when you need to perform side-effects, not in order to accumulate elements from the stream into a Collection.
Note: with teams.forEach() you're not actually using a stream, but method Iterable.forEach() which is available with every implementation of Iterable.
To perform reduction on streams, we have several specialized operations like collect, reduce, etc. (for more information refer to the API documentation - Reduction).
The collect() operation is meant to perform mutable reduction. You can use it to collect the data into a list by providing the built-in collector Collectors.toList() as an argument. Since Java 16, the operation toList() has been part of the API; it is implemented on top of toArray() and performs better than the namesake collector, so it's the preferred option if your JDK version allows it.
I was thinking of using flatMap but not sure how to apply it.
Operation flatMap() is meant to perform one-to-many transformations. It expects a Function that takes a stream element and generates a Stream of the resulting type; the elements of the generated stream replace the initial element.
Note that the general approach when writing streams is to use as few operations as possible (one of the main advantages functional programming brings to Java is simplicity). For that reason, applying flatMap() in a single step when a stream element produces a Collection is idiomatic, since it's shorter than performing map().flatMap() in two steps.
Here's how the implementation might look:
List<Team> teams = List.of();
List<Player> players = teams.stream() // Stream<Team>
.flatMap(team -> team.getTeamPlayers().stream()) // Stream<TeamPlayer>
.map(player -> new Player(player.getName(), player.getPosition()))
.toList(); // for Java 16+ or collect(Collectors.toList())
This is basically the answer of Alexander Ivanchenko, but with method references.
final var players = teams.stream()
.map(Team::getTeamPlayers)
.flatMap(Collection::stream)
.map(p -> new Player(p.getName(), p.getPosition()))
.toList();
If your Player class has a factory method like the following (depending on the relation between Player and TeamPlayer):
public static Player fromTeamPlayer(final TeamPlayer teamPlayer) {
return new Player(teamPlayer.getName(), teamPlayer.getPosition());
}
You could further reduce it to:
final var players = teams.stream()
.map(Team::getTeamPlayers)
.flatMap(Collection::stream)
.map(Player::fromTeamPlayer)
.toList();

Why removeIf() is not available on streams of a collection

I see removeIf() on ArrayList, but when I call stream() on it there is no such option. Is it because removeIf() changes the size of the collection and a stream needs a fixed size to work upon?
To remove an element from a stream you can use Stream::filter, which keeps only the elements that match the predicate. For example, to drop the element with id 4:
.filter(e -> e.getId() != 4)
Is it because removeIf() changes the size of the collection and a stream needs a fixed size to work upon?
No; in fact, a stream can even work with an infinite number of elements.
A stream doesn't change its source collection. A stream takes elements from the source (which could be a collection or an infinite generator), passes them through a chain of transforming and filtering operations (each mapping step produces new objects), and then collects whatever passes into a result (which could be a collection, a joined string, or an Integer) and returns it. This provides a declarative style and immutability, which works great in multithreaded computations without side effects.
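To make the distinction concrete, here is a small self-contained sketch contrasting filter(), which builds a new list, with removeIf(), which mutates the collection in place:

```java
import java.util.*;
import java.util.stream.*;

public class FilterVsRemoveIf {
    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(List.of(1, 2, 3, 4, 5));

        // filter() builds a NEW list; the source collection is untouched.
        List<Integer> odds = numbers.stream()
                .filter(n -> n % 2 != 0)
                .collect(Collectors.toList());
        System.out.println(numbers); // [1, 2, 3, 4, 5]
        System.out.println(odds);    // [1, 3, 5]

        // removeIf() mutates the collection in place.
        numbers.removeIf(n -> n % 2 == 0);
        System.out.println(numbers); // [1, 3, 5]
    }
}
```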

Using anonymous function with stream's map

I have a list of objects that I want to modify with one of the setters. Is it bad to call it in an anonymous function inside map, and what are the possible side effects?
.stream().map(foo -> { foo.setDate(date); return foo; })
.collect(Collectors.toList());
IntelliJ is telling me to switch it to peek:
.stream().peek(foo -> foo.setDate(date)).collect(Collectors.toList());
But I read that peek should be used for debugging only. Should I avoid both ways?
Why don't you use forEach?
.forEach(foo -> foo.setDate(date));
You don't even need to stream the collection.
You will save yourself the cost of creating a new collection as well.
The first way is more than OK in this case; what matters is that you don't change the source of the stream structurally, meaning adding or removing elements while you stream. And indeed IntelliJ is wrong about this: map is a lot better suited here than peek (which is only for debugging).
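A minimal runnable sketch of the forEach approach; Foo here is a hypothetical stand-in for the asker's class:

```java
import java.time.LocalDate;
import java.util.List;

public class SetDateDemo {
    // Hypothetical stand-in for the asker's class with a date setter.
    static class Foo {
        LocalDate date;
        void setDate(LocalDate d) { this.date = d; }
    }

    public static void main(String[] args) {
        List<Foo> foos = List.of(new Foo(), new Foo());
        LocalDate date = LocalDate.of(2020, 1, 1);

        // Iterable.forEach mutates the existing objects in place;
        // no stream and no new collection are created.
        foos.forEach(f -> f.setDate(date));

        System.out.println(foos.stream().allMatch(f -> date.equals(f.date))); // true
    }
}
```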

Java stream operation invocations

Can anyone point to official Java documentation describing how many times a Stream will invoke each "non-interfering and stateless" intermediate operation for each element?
For example:
Arrays.asList("1", "2", "3", "4").stream()
.filter(s -> check(s))
.forEach(s -> System.out.println(s));
public boolean check(Object o) {
return true;
}
The above currently invokes the check method 4 times.
Is it possible that in the current or future versions of JDKs the check method gets executed more or less times than the number of elements in the stream created from List or any other standard Java API?
This does not have to do with the source of the stream, but rather the terminal operation and optimization done in the stream implementation itself. For example:
Stream.of(1,2,3,4)
.map(x -> x + 1)
.count();
Since Java 9, map will not be executed even a single time here: count() can return the known size of the source, because map cannot change the number of elements.
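This is easy to observe by putting a counter in the mapping function; the exact count depends on the JDK version, so treat this as a sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class CountShortCircuit {
    public static void main(String[] args) {
        AtomicInteger mapCalls = new AtomicInteger();

        long n = Stream.of(1, 2, 3, 4)
                .map(x -> { mapCalls.incrementAndGet(); return x + 1; })
                .count();

        System.out.println(n); // 4
        // 0 on Java 9+ (count() uses the known source size and skips map);
        // 4 on Java 8, which traverses the whole pipeline.
        System.out.println(mapCalls.get());
    }
}
```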
Or:
someTreeSet.stream()
.sorted()
.findFirst();
sorted might not get executed at all, since the source is a TreeSet and getting the first element is trivial; whether this is actually implemented inside the stream API is a different question.
So the real answer here is: it depends, but I can't imagine an operation that would get executed more than the number of elements in the source.
From the documentation:
Laziness-seeking. Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, "find the first String with three consecutive vowels" need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
By that virtue, because filter is an intermediate operation which creates a new Stream as part of its operation, due to its laziness, it will only ever invoke the filter predicate once per element as part of its rebuilding of the stream.
The only way your method could be invoked a different number of times is if the stream were somehow mutated between states, which, given that nothing in a stream actually runs until the terminal operation, would only realistically be possible due to a bug upstream.
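A quick way to verify the once-per-element behavior for the snippet in the question is to count the invocations directly (an observation on current JDKs, not a guarantee for future ones):

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class FilterInvocationCount {
    public static void main(String[] args) {
        AtomicInteger checks = new AtomicInteger();

        // Same shape as the question's pipeline, with the predicate instrumented.
        Arrays.asList("1", "2", "3", "4").stream()
              .filter(s -> { checks.incrementAndGet(); return true; })
              .forEach(s -> {});

        System.out.println(checks.get()); // 4: exactly once per element
    }
}
```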

Stream on Map doesn't save .map changes

Can someone explain why the first code example doesn't save the changes I've made with .map on the Map, but the second code example does?
First code example:
stringIntegerMap.entrySet().stream()
.map(element -> element.setValue(100));
Second code example:
stringIntegerMap.entrySet().stream()
.map(element -> element.setValue(100))
.forEach(System.out::println);
Also, why does the second code example only print the values and not the whole element (key + value) ?
Your stream operations are lazy-evaluated.
If you do not invoke a terminal operation such as forEach (or collect, etc.), the streaming never actually occurs, hence your setValue is not executed.
Note that modifying the collection/map you are streaming is generally advised against.
Finally, if you look at the API documentation for Map.Entry#setValue, you'll notice the method returns:
old value corresponding to the entry
So, when you perform the map operation, the stream generated contains the values.
See the java.util.stream package documentation, in particular the sections on stream operations and pipelines, and on non-interference.
Streams are composed of a source, intermediate operations and terminal operations.
The terminal operations start the pipeline processing by lazily gathering elements from the source, then applying intermediate operations and finally executing the terminal operation.
Stream.map is an intermediate operation, whereas Stream.forEach is terminal. So in your first snippet the pipeline processing never starts (hence intermediate operations are never executed), because there's no terminal operation. When you use forEach in your 2nd snippet, then all the pipeline is processed.
Please take a look at the java.util.stream package docs, where there's extensive information about streams and how to use them properly (i.e. you shouldn't modify the source of the stream from within intermediate or terminal operations, as you're doing in Stream.map).
Edit:
As to your final question:
why does the second code example only print the values and not the whole element (key + value) ?
Mena's answer explains it well: Map.Entry.setValue not only sets the given value on the entry, but also returns the old value. As you're using Map.Entry.setValue in a lambda within the Stream.map intermediate operation, you're actually transforming each Map.Entry element of the stream into the value it had before the new value was set. So what arrives at Stream.forEach are the old values of the map, while the map itself holds the new values set by the side effect of Map.Entry.setValue.
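If the goal is simply to set every value in the map, a stream isn't needed at all; Map.replaceAll is the idiomatic way to perform this mutation. A small sketch (using a TreeMap only so the printed order is deterministic):

```java
import java.util.Map;
import java.util.TreeMap;

public class ReplaceAllDemo {
    public static void main(String[] args) {
        Map<String, Integer> stringIntegerMap = new TreeMap<>(Map.of("a", 1, "b", 2));

        // replaceAll mutates the values in place -- no stream,
        // no side-effecting map() needed.
        stringIntegerMap.replaceAll((key, oldValue) -> 100);

        System.out.println(stringIntegerMap); // {a=100, b=100}
    }
}
```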
