Nested parallel stream execution in Java - findAny() randomly fails - java

The following code throws the IllegalArgumentException roughly once in every 10-15 runs for the same input:
AllDirectedPaths<Vertex, Edge> allDirectedPaths = new AllDirectedPaths<>(graph);
List<GraphPath<Vertex, Edge>> paths = allDirectedPaths.getAllPaths(entry, exit, true, null);
return paths.parallelStream().map(path -> path.getEdgeList().parallelStream()
.map(edge -> {
Vertex source = edge.getSource();
Vertex target = edge.getTarget();
if (source.containsInstruction(method, instructionIndex)) {
return source;
} else if (target.containsInstruction(method, instructionIndex)) {
return target;
} else {
return null;
}
}).filter(Objects::nonNull)).findAny().flatMap(Stream::findAny)
.orElseThrow(() -> new IllegalArgumentException("Given trace refers to no vertex in graph!"));
The idea of the code is to find a vertex that wraps a certain instruction (see containsInstruction()), where the vertex lies on at least one path from the entry to the exit vertex. I'm aware that the code is not optimal in terms of performance (every intermediate vertex on a path is looked up twice), but that doesn't matter.
The input is simply a trace (String) from which the method and instructionIndex can be derived. All other variables are fixed in that sense. Moreover, the method containsInstruction() doesn't have any side effects.
Does it matter where to put the 'findAny()' stream operation? Should I place it directly following the filter operation? Or are nested parallel streams the problem?

You should use .flatMap(path -> ... ) and remove .flatMap(Stream::findAny).
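Your code doesn't work because the first findAny() returns one of the inner streams, wrapped in an Optional. That stream is never null, but it may be empty: if the chosen path contains no matching vertex, filter(Objects::nonNull) removes every element.
Then, when you apply the second findAny() by means of the Optional.flatMap(Stream::findAny) call, this last find operation returns an empty Optional for such an empty inner stream, even though another path may contain a match. With parallel streams, which inner stream gets picked can vary from run to run, hence the sporadic failures.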
This is how the code should look:
return paths.stream()
.flatMap(path -> path.getEdgeList().stream()
.map(edge ->
edge.getSource().containsInstruction(method, instructionIndex) ?
edge.getSource() :
edge.getTarget().containsInstruction(method, instructionIndex) ?
edge.getTarget() :
null)
.filter(Objects::nonNull))
.findAny()
.orElseThrow(() -> new IllegalArgumentException("whatever"));
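To make the difference concrete, here is a minimal sketch that uses plain integer lists as hypothetical stand-ins for the paths (PATHS, nested() and flattened() are names invented for this illustration). It uses sequential streams and findFirst() instead of findAny() so that the behaviour is reproducible; with parallel streams and findAny(), the chosen inner stream would be arbitrary.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class NestedFindDemo {
    // Hypothetical stand-in for the paths: each inner list plays the role of one path.
    static final List<List<Integer>> PATHS = List.of(List.of(1, 3), List.of(2, 4));

    // Mirrors the structure of the question: pick one inner stream, then search inside it.
    static Optional<Integer> nested() {
        return PATHS.stream()
                .map(path -> path.stream().filter(n -> n % 2 == 0))
                .findFirst()                  // picks the stream built from [1, 3], which is empty after filtering
                .flatMap(Stream::findFirst);  // searching that empty stream finds nothing
    }

    // Flattens all inner streams first, so every element of every "path" is considered.
    static Optional<Integer> flattened() {
        return PATHS.stream()
                .flatMap(path -> path.stream().filter(n -> n % 2 == 0))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println("nested found a match:    " + nested().isPresent());
        System.out.println("flattened found a match: " + flattened().isPresent());
    }
}
```

The nested variant misses the even numbers in the second list because it commits to one inner stream before looking at its contents, which is the same effect the question runs into.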
Side note: why parallel streams? There don't seem to be any CPU-bound tasks in your pipeline. Besides, parallel streams create a lot of overhead. They are useful in very few scenarios, e.g. tens of thousands of elements and CPU-intensive operations along the pipeline.
EDIT: As suggested in the comments, the map and filter operations of the inner stream could be safely moved to the outer stream. This way, readability is improved and there's no difference performance-wise:
return paths.stream()
.flatMap(path -> path.getEdgeList().stream())
.map(edge ->
edge.getSource().containsInstruction(method, instructionIndex) ?
edge.getSource() :
edge.getTarget().containsInstruction(method, instructionIndex) ?
edge.getTarget() :
null)
.filter(Objects::nonNull)
.findAny()
.orElseThrow(() -> new IllegalArgumentException("whatever"));
Another note: maybe refactoring the code inside map to a method of the Edge class would be better, so that the logic to return either the source, the target or null is in the class that already has all the information.
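A rough sketch of that refactoring could look as follows. The Vertex and Edge classes here are simplified, hypothetical stand-ins (the real ones in the question are not shown), and endpointContaining() is a name made up for this example. It returns an Optional instead of null, which is one possible design, and relies on Optional::stream from Java 9.

```java
import java.util.List;
import java.util.Optional;

public class EdgeLookupSketch {
    // Hypothetical vertex: it "contains" an instruction if the index falls into its range.
    public static final class Vertex {
        final String method;
        final int first;
        final int last;

        public Vertex(String method, int first, int last) {
            this.method = method;
            this.first = first;
            this.last = last;
        }

        boolean containsInstruction(String m, int index) {
            return method.equals(m) && first <= index && index <= last;
        }
    }

    // Hypothetical edge that knows how to pick the endpoint wrapping an instruction.
    public static final class Edge {
        final Vertex source;
        final Vertex target;

        public Edge(Vertex source, Vertex target) {
            this.source = source;
            this.target = target;
        }

        Optional<Vertex> endpointContaining(String method, int index) {
            if (source.containsInstruction(method, index)) {
                return Optional.of(source);
            }
            if (target.containsInstruction(method, index)) {
                return Optional.of(target);
            }
            return Optional.empty();
        }
    }

    // Each inner list stands in for path.getEdgeList() of one path.
    public static Vertex find(List<List<Edge>> paths, String method, int index) {
        return paths.stream()
                .flatMap(List::stream)
                .map(edge -> edge.endpointContaining(method, index))
                .flatMap(Optional::stream) // drops empty results, requires Java 9+
                .findAny()
                .orElseThrow(() -> new IllegalArgumentException("Given trace refers to no vertex in graph!"));
    }
}
```

With this shape the stream pipeline no longer needs the nested ternary or the null filter.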

Related

Iterate with streams (with a flag in for) [keep state within lambda]

I have two lists, allowedOU and parts. I need to know how to iterate through streams, check the condition and if it is true, include the element in a third list, and change the flag (heritable).
for (String part : parts) {
for (BeanOU it : allowedOU) {
if (part.startsWith("OU") && it.OU.equals(part.substring(3)) && heritableFlag) {
list.add(part.substring(3, part.length()));
heritableFlag = it.heritable;
break;
}
}
}
I tried something like this
parts.stream()
.filter(parte -> allowedOU.stream()
.anyMatch(allowed -> (parte.startsWith("OU"))
&& allowed.OU.equals(parte.substring(3, parte.length()))
&& finalHeritableFlag))
.forEach(here we don't have it variable...)
"Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful."
"Most stream operations accept parameters that describe user-specified behavior, which are often lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering, and in most cases must be stateless. Such parameters are always instances of a functional interface such as Function, and are often lambda expressions or method references."
I'm leaving this here in case someone needs it!
One way to stop processing once a certain condition is no longer met is takeWhile (available since Java 9):
List<String> partsList = Arrays.asList(parts);
partsList = partsList.stream().filter((a->a.startsWith("OU"))).map(s->s.substring(3)).collect(Collectors.toList());
List<String> finalPartsList1 = partsList;
listOUallowedByUser = allowedOU.stream()
.takeWhile(allowed -> allowed.heritable)
.filter(f -> finalPartsList1.contains(f.OU))
.map(a -> a.OU)
.collect(Collectors.toList());
Thanks to mark-rotteveel and alexander-ivanchenko :-)
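For readers unfamiliar with takeWhile, the following small sketch contrasts it with filter on a made-up list of numbers (the class and method names are invented for this illustration). takeWhile stops at the first element that fails the predicate, whereas filter keeps looking at the remaining elements.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TakeWhileDemo {
    // Stops at 10, the first element that fails the predicate; the trailing 4 is never emitted.
    static List<Integer> withTakeWhile() {
        return Stream.of(1, 2, 3, 10, 4)
                .takeWhile(n -> n < 5)
                .collect(Collectors.toList());
    }

    // Tests every element, so the trailing 4 is kept.
    static List<Integer> withFilter() {
        return Stream.of(1, 2, 3, 10, 4)
                .filter(n -> n < 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withTakeWhile()); // [1, 2, 3]
        System.out.println(withFilter());    // [1, 2, 3, 4]
    }
}
```

This only models the "stop once the flag turns false" part of the original loop; whether it matches the loop exactly depends on how the two lists are ordered.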

Using Streams to replace loops

I have the following code:
boolean found = false;
for (Commit commit : commits.values()) {
if (commit.getLogMessage().compareTo(commitMessageToSearch) == 0) {
System.out.println(commit.getSha());
found = true;
}
}
if (!found) {
System.out.println("aksdhlkasj");
}
Is there some way to write this succinctly using streams or anything else in Java?
You can use Stream#filter along with Stream#findFirst.
System.out.println(commits.values().stream()
.filter(commit -> commit.getLogMessage().compareTo(commitMessageToSearch) == 0)
.findFirst().map(Commit::getSha).orElse("aksdhlkasj"));
If you want to print all the occurrences and print some String only when no item was found, I am afraid there is no way other than collecting all the relevant sha values into a list and checking whether it is empty, here using Optional:
commits.values()
.stream()
.filter(commit -> commit.getLogMessage().compareTo(commitMessageToSearch) == 0)
.map(Commit::getSha)
.peek(System.out::println)
.collect(Collectors.collectingAndThen(Collectors.toList(), Optional::of))
.filter(List::isEmpty)
.ifPresent(emptyList -> System.out.println("aksdhlkasj"));
Although the intermediate Optional<List<?>> runs against common sense, it allows further processing with Optional and conveniently handles the case where the list is empty.
However, this form is in my opinion more readable:
List<String> shaList = commits.values()
.stream()
.filter(commit -> commit.getLogMessage().compareTo(commitMessageToSearch) == 0)
.map(Commit::getSha)
.peek(System.out::println)
.collect(Collectors.toList());
if (shaList.isEmpty()) {
System.out.println("aksdhlkasj");
}
The following options are available:
Use StreamEx library (maven repo) and its quasi-intermediate operation StreamEx::ifEmpty:
import one.util.streamex.StreamEx;
// ...
String msgSearch = "commit message";
StreamEx.of(commits.values())
.filter(c -> c.getLogMessage().compareTo(msgSearch) == 0)
.ifEmpty("no commit message found: " + msgSearch)
.map(Commit::getSha)
.forEach(System.out::println);
Plus: very concise and clean, single run
Minus: use of a 3rd-party lib
Check the contents of the stream using short-circuiting match without redundant collecting into a list just to check if it's empty
if (commits.values().stream()
.map(Commit::getLogMessage) // equals should be equivalent to compareTo == 0
.anyMatch(msgSearch::equals)
) {
commits.values().stream()
.filter(c -> c.getLogMessage().compareTo(msgSearch) == 0)
.map(Commit::getSha)
.forEach(System.out::println);
} else {
System.out.println("no commit message found: " + msgSearch);
}
Plus: using standard API (no 3rd-party library) without side effects
Minuses: too verbose; in the worst case (a long list with the matching element at the end) the stream is iterated twice
Use an effectively final variable and set it from the stream as a side effect.
Disclaimer: stateful streams and side effects are not recommended in general:
The best approach is to avoid stateful behavioral parameters to stream operations entirely...
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care
Thus, if AtomicBoolean is carefully selected as a thread-safe and fast container of a boolean value for found flag, which is only set to true from inside the stream (never reset), the following solution may be offered and the risks of safety and performance are mitigated:
// effectively final and thread-safe container boolean
AtomicBoolean found = new AtomicBoolean();
commits.values().stream()
.filter(c -> c.getLogMessage().compareTo(commitMessageToSearch) == 0)
.map(Commit::getSha)
.forEach(sha -> {
found.set(true);
System.out.println(sha);
});
if (!found.get()) {
System.out.println("none found");
}
Plus: using standard API (no 3rd-party library), single run, no redundant collection
Minus: uses a side effect, discouraged by purists, merely mimics a for-each loop

Compare String with each element of a String Array using Streams

Hi, I want to count how many times a String is found in an array of Strings using Streams.
What I have come up with so far is this:
Stream<String> stream=Arrays.stream(array);
int counter= (int) stream.filter(c-> c.contains("something")).count();
return counter;
The problem is that most of the time I get a NullPointerException, and I think it is caused by .count() when it doesn't get any match from filter(c-> c.contains("something")).
I came to this conclusion because if I run it without .count(), i.e. just stream.filter(c-> c.contains("something")); without returning anything, it doesn't throw an exception. I'm not sure about it, but that's what I think.
Any ideas on how I can count the number of times a String appears in an array of Strings using Streams?
null is a valid element of an array, so you have to be prepared to handle these. For example:
int counter = (int) stream.filter(c -> c != null && c.contains("something")).count();
"The problem that I get is that most of the time I get a NullPointerException, and I think it is because of .count(). And I came to this conclusion because if I run it without .count() it won't throw an exception."
The reason you cannot reproduce the NullPointerException without calling count is that streams are lazily evaluated, i.e. the pipeline is not executed until a terminal operation (an operation that triggers processing of the pipeline) is invoked.
We can conclude that Arrays.stream(array) is not the culprit for the NullPointerException: its parameter must be non-null, so a null array would have blown up regardless of whether you called a terminal operation on the stream.
Thus the elements inside the array are the culprits in the code you've shown. You should then ask yourself whether null elements are allowed in the first place. If so, filter them out before performing c.contains("something"). If not, debug at which point in your application nulls were added to the array when they should not have been: find the bug rather than suppress it.
If nulls are allowed in the first place, then the solution is simple, i.e. filter the nulls out before calling .contains:
int counter = (int)stream.filter(Objects::nonNull)
.filter(c -> c.contains("something")) // or single filter with c -> c != null && c.contains("something") as pred
.count();
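The laziness described above can be observed with a small sketch like the one below (class and method names are invented for this illustration). Building the pipeline over an array that contains null does nothing yet; the exception only surfaces once the terminal operation count() pulls elements through the filter.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Stream;

public class LazyNpeDemo {
    // Only builds the pipeline; no element is inspected, so no exception is thrown here.
    static boolean buildOnly(String[] array) {
        Stream<String> pipeline = Arrays.stream(array)
                .filter(c -> c.contains("something"));
        return pipeline != null;
    }

    // count() is a terminal operation: it runs the filter and hits the null element.
    static boolean throwsOnCount(String[] array) {
        try {
            Arrays.stream(array)
                    .filter(c -> c.contains("something"))
                    .count();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Filtering nulls out first makes the terminal operation safe.
    static long safeCount(String[] array) {
        return Arrays.stream(array)
                .filter(Objects::nonNull)
                .filter(c -> c.contains("something"))
                .count();
    }

    public static void main(String[] args) {
        String[] data = {"something", null, "x something"};
        System.out.println(buildOnly(data));
        System.out.println(throwsOnCount(data));
        System.out.println(safeCount(data));
    }
}
```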
You have to filter for null values first. Do it either the way @pafauk. answered or by filtering separately. That requires the null filter to be applied before the one you already use:
public static void main(String[] args) {
List<String> chainedChars = new ArrayList<>();
chainedChars.add("something new"); // match
chainedChars.add("something else"); // match
chainedChars.add("anything new");
chainedChars.add("anything else");
chainedChars.add("some things will never change");
chainedChars.add("sometimes");
chainedChars.add(null);
chainedChars.add("some kind of thing");
chainedChars.add("sumthin");
chainedChars.add("I have something in mind"); // match
chainedChars.add("handsome thing");
long somethings = chainedChars.stream()
.filter(java.util.Objects::nonNull)
.filter(cc -> cc.contains("something"))
.count();
System.out.printf("Found %d somethings", somethings);
}
outputs
Found 3 somethings
while switching the filter lines will result in a NullPointerException.

Java Stream .map ternary operator

I have a TXT file with lines that represent some objects:
R-Line (one)
RN-Line (1...many)
They are connected by an id, so in order to read the file I created a stream with lines():
Stream<Boolean> inLines = in.lines()
//limit lines due to memory footprint
.limit(10)
//filter each line by the given id
.filter(identN -> identN.matches(".*\\t[5]\\t.*"))
/**
* should return all lines with id 5
* if line starts with RN put it in rnArray else in rArray so the objects are connected but I need each line separate for validation purposes??
*/
.map(y -> (y.startsWith("RN") ? synonym1.add(y) : substance.add(y)));
System.out.println("syn1 = " + synonym1.toString() + "substance: = " + substance + " InLines"+ inLines);
Response is empty :
syn1 = []substance: = [] InLinesjava.util.stream.ReferencePipeline$3@3aa9e816
But it doesn't work. The return value of .map should be another stream, so how can I incorporate this logic? If I use forEach it won't work, since I also need the R-Line.
Cause
The response is empty since there is no terminal operation invoked on the Stream that you've created(inLines). Hence both synonym1 and substance remain empty while you try to access them while printing to the console.
Alternate
What you might just be looking for is to replace the final map operation with a forEach, since it would persist both synonym1 and substance types of elements found which seems to be your primary use case. This can be done as:
.forEach(y -> {
if (y.startsWith("RN")) {
synonym1.add(y);
} else {
substance.add(y);
}
});
Note
Currently, it doesn't make much sense to collect the Stream<Boolean> into a Collection, since that would include the result of .add operation on the synonym1 and substance collections for each filtered element.
Thanks @Naman, it helped me a lot, since I found a way that allows me to use the ternary operator and split the stream into 2 separate Lists by grouping the expression:
.forEach((x) -> ((x.startsWith("RN"))?synonym:substance).add(x));
So the problem was that the ternary expression was not in parentheses. I think it is because of how the objects are grouped, or is there another explanation?
Thanks
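As an alternative to adding to two lists from inside forEach, the split can also be expressed without side effects using Collectors.partitioningBy. This is a different technique from the one in the answer above, sketched here under the assumption that the lines are already available as a List<String>; the class name and the sample data in the usage are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionLinesDemo {
    // Keeps only lines with id 5, then splits them: key true holds the RN lines, key false the rest.
    static Map<Boolean, List<String>> split(List<String> lines) {
        return lines.stream()
                .filter(line -> line.matches(".*\\t5\\t.*"))
                .collect(Collectors.partitioningBy(line -> line.startsWith("RN")));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> parts =
                split(List.of("R\t5\tfoo", "RN\t5\tbar", "RN\t6\tbaz"));
        System.out.println("synonyms:   " + parts.get(true));
        System.out.println("substances: " + parts.get(false));
    }
}
```

Because collect is a terminal operation, the pipeline actually runs, and both result lists are available from the returned map.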

How to collect results after filtering and mapping a parallelStream in Java8?

I want to extract a collection of objects from another collection. The objects to be filtered must be a specific type (or subtype) and must intersect with a given Shape. I want to do it with parallelStream
I have the following code:
public class ObjectDetector {
...
public ObjectDetector(final Collection<WorldObject> objects,
final BiFunction<Shape, Shape, Boolean> isIntersecting) {
...
}
public List<ISensor> getSonarObjects(final Shape triangle) {
return selectIntersecting(triangle, ISensor.class);
}
private <T> List<T> selectIntersecting(Shape triangle, Class<T> type) {
return objects.parallelStream()
.filter(o -> type.isInstance(o) && isIntersecting.apply(o.getShape(), triangle))
.map(type::cast)
.collect(Collectors.toList());
}
}
The problematic part is in the List<T> selectIntersecting(Shape triangle, Class<T> type) method, in which objects is a Collection and isIntersecting is a BiFunction<Shape,Shape,Boolean>.
When I use stream() instead of parallelStream(), all my tests are green, so I assume that the filtering and mapping logic works fine. However, when I use parallelStream(), my tests fail unpredictably. The only pattern I was able to observe is that the size() of the returned List<T> is less than or equal to (but never greater than) the size I expect.
A failing testcase for example:
int counter = 0;
public BiFunction<Shape, Shape, Boolean> every2 = (a, b) -> {
counter++;
return counter % 2 == 0 ? true : false;
};
@Test
public void getEvery2Sonar() {
assertEquals("base list size must be 8",8,list.size());
ObjectDetector detector = new ObjectDetector(list, every2);
List<ISensor> sonarable = detector.getSonarObjects(triangle);
assertEquals("number of sonar detectables should be 3", 3, sonarable.size());
}
And the test result is:
Failed tests: getEvery2Sonar(hu.oe.nik.szfmv.environment.ObjectDetectorTest): number of sonar detectables should be 3 expected:<3> but was:<2>
In my understanding - as it is written here - it is possible to collect a parallelStream into non-concurrent Collection.
I've also tried to find some clues on the Parallelism tutorial page, but I'm still clueless.
Could someone please explain what I am doing wrong?
Your predicate function has side effects - this is going to go badly with parallelStream because the evaluation order across the input stream is non-deterministic, plus you have no locking on your mutable state.
Indeed, the documentation for filter states* that the predicate must be stateless.
I'm not sure what behaviour you're trying to achieve here, so I'm not sure what an appropriate "fix" might be.
* No pun intended.
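If the intent of the test predicate is simply "accept every second element", one possible stateless formulation is to derive the decision from the element's position instead of a shared counter. The sketch below is only an assumption about that intent, and everySecond() is a name invented here; it streams over indices so that the result is the same for sequential and parallel execution.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatelessPredicateDemo {
    // Selects the 2nd, 4th, 6th, ... element. The predicate depends only on the index,
    // not on mutable state, so parallel evaluation cannot change the outcome.
    static <T> List<T> everySecond(List<T> items) {
        return IntStream.range(0, items.size())
                .parallel()
                .filter(i -> i % 2 == 1)
                .mapToObj(items::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(everySecond(List.of("a", "b", "c", "d", "e"))); // [b, d]
    }
}
```

Collectors.toList() preserves encounter order for an ordered source such as IntStream.range, so the result is deterministic even though the work may be split across threads.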
