In our project we are migrating to Java 8 and testing its new features.
I'm currently using Guava predicates and functions to filter and transform collections with Collections2.transform and Collections2.filter.
As part of this migration I need to replace the Guava code with Java 8 equivalents, so the changes I'm making look like this:
List<Integer> naturals = Lists.newArrayList(1,2,3,4,5,6,7,8,9,10,11,12,13);

Function<Integer, Integer> duplicate = new Function<Integer, Integer>() {
    @Override
    public Integer apply(Integer n) {
        return n * 2;
    }
};

Collection<Integer> result = Collections2.transform(naturals, duplicate);
To...
List<Integer> result2 = naturals.stream()
.map(n -> n * 2)
.collect(Collectors.toList());
With Guava I was very comfortable debugging the code, since I could step through each transformation, but my concern is how to debug, for example, .map(n -> n * 2).
Using the debugger I can see some code like:
@Hidden
@DontInline
/** Interpretively invoke this form on the given arguments. */
Object interpretWithArguments(Object... argumentValues) throws Throwable {
    if (TRACE_INTERPRETER)
        return interpretWithArgumentsTracing(argumentValues);
    checkInvocationCounter();
    assert(arityCheck(argumentValues));
    Object[] values = Arrays.copyOf(argumentValues, names.length);
    for (int i = argumentValues.length; i < values.length; i++) {
        values[i] = interpretName(names[i], values);
    }
    return (result < 0) ? null : values[result];
}
But it isn't as straightforward to debug as the Guava version; in fact, I couldn't find the n * 2 transformation at all.
Is there a way to see this transformation, or an easier way to debug this code?
EDIT: I've collected the answers from the various comments and posted answers below.
Thanks to Holger's comment, which answered my question: using a lambda block allowed me to see the transformation and debug what happens inside the lambda body:
.map(
    n -> {
        Integer nr = n * 2;
        return nr;
    }
)
Thanks to Stuart Marks: using method references also allowed me to debug the transformation process:
static int timesTwo(int n) {
    Integer result = n * 2;
    return result;
}
...
List<Integer> result2 = naturals.stream()
    .map(Java8Test::timesTwo)
    .collect(Collectors.toList());
...
Thanks to Marlon Bernardes' answer I noticed that my Eclipse wasn't showing what it should, and using peek() helped display the results.
I usually have no problem debugging lambda expressions while using Eclipse or IntelliJ IDEA. Just set a breakpoint and be sure not to inspect the whole lambda expression (inspect only the lambda body).
Another approach is to use peek to inspect the elements of the stream:
List<Integer> naturals = Arrays.asList(1,2,3,4,5,6,7,8,9,10,11,12,13);
naturals.stream()
.map(n -> n * 2)
.peek(System.out::println)
.collect(Collectors.toList());
UPDATE:
I think you're getting confused because map is an intermediate operation - in other words, it is a lazy operation which is executed only once a terminal operation is invoked. So when you call stream.map(n -> n * 2), the lambda body is not executed at that moment. You need to set a breakpoint and inspect it after a terminal operation has been called (collect, in this case).
Check Stream Operations for further explanations.
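A minimal, self-contained sketch (class and variable names are mine) that makes this laziness visible:
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyMapDemo {
    public static void main(String[] args) {
        Stream<Integer> mapped = Stream.of(1, 2, 3)
                .map(n -> {
                    System.out.println("mapping " + n); // a breakpoint here is only hit later
                    return n * 2;
                });
        System.out.println("nothing has been mapped yet");
        List<Integer> result = mapped.collect(Collectors.toList()); // the lambda runs now
        System.out.println(result);
    }
}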
UPDATE 2:
Quoting Holger's comment:
What makes it tricky here is that the call to map and the lambda
expression are in one line so a line breakpoint will stop on two
completely unrelated actions.
Inserting a line break right after map(
would allow you to set a break point for the lambda expression only.
And it’s not unusual that debuggers don’t show intermediate values of
a return statement. Changing the lambda to n -> { int result=n * 2; return result; }
would allow you to inspect result. Again, insert line
breaks appropriately when stepping line by line…
IntelliJ has a nice plugin for exactly this case, the Java Stream Debugger. You should check it out: https://plugins.jetbrains.com/plugin/9696-java-stream-debugger
It extends the IDEA Debugger tool window by adding the Trace Current Stream Chain button, which becomes active when the debugger stops inside a chain of Stream API calls.
It has a nice interface for working with individual stream operations and lets you follow the values you need to debug.
You can also launch it manually from the Debug window.
Debugging lambdas also works well with NetBeans. I'm using NetBeans 8 and JDK 8u5.
If you set a breakpoint on a line where there's a lambda, you'll actually hit it once when the pipeline is set up, and then once for each stream element. Using your example, the first time you hit the breakpoint will be in the map() call that's setting up the stream pipeline.
You can see the call stack and the local variables and parameter values for main as you'd expect. If you continue stepping, the "same" breakpoint is hit again, except this time it's within the call to the lambda.
Note that this time the call stack is deep within the streams machinery, and the local variables are the locals of the lambda itself, not the enclosing main method. (I've changed the values in the naturals list to make this clear.)
As Marlon Bernardes pointed out (+1), you can use peek to inspect values as they go by in the pipeline. Be careful though if you're using this from a parallel stream. The values can be printed in an unpredictable order across different threads. If you're storing values in a debugging data structure from peek, that data structure will of course have to be thread-safe.
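A hedged sketch of that idea, using a ConcurrentLinkedQueue as the thread-safe debugging sink (the class and variable names are mine):
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

public class ParallelPeekDemo {
    public static void main(String[] args) {
        List<Integer> naturals = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        Queue<Integer> seen = new ConcurrentLinkedQueue<>(); // thread-safe sink for peek
        List<Integer> doubled = naturals.parallelStream()
                .map(n -> n * 2)
                .peek(seen::add)           // may run on several threads, in any order
                .collect(Collectors.toList());
        System.out.println("result:   " + doubled);
        System.out.println("observed: " + seen); // observation order is not guaranteed
    }
}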
Finally, if you're doing a lot of debugging of lambdas (especially multi-line statement lambdas), it might be preferable to extract the lambda into a named method and then refer to it using a method reference. For example,
static int timesTwo(int n) {
    return n * 2;
}

public static void main(String[] args) {
    List<Integer> naturals = Arrays.asList(3247,92837,123);
    List<Integer> result =
        naturals.stream()
                .map(DebugLambda::timesTwo)
                .collect(toList());
}
This might make it easier to see what's going on while you're debugging. In addition, extracting methods this way makes it easier to unit test. If your lambda is so complicated that you need to be single-stepping through it, you probably want to have a bunch of unit tests for it anyway.
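For instance, a minimal sketch of such a unit test, assuming timesTwo lives in the DebugLambda class shown above, the test is in the same package, and JUnit 5 is on the classpath:
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class DebugLambdaTest {
    @Test
    void timesTwoDoublesItsArgument() {
        assertEquals(6, DebugLambda.timesTwo(3));
        assertEquals(0, DebugLambda.timesTwo(0));
        assertEquals(-4, DebugLambda.timesTwo(-2));
    }
}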
Just to provide more updated details (Oct 2019), IntelliJ has added a pretty nice integration to debug this type of code that is extremely useful.
When we stop at a line that contains a lambda and press F7 (step into), IntelliJ highlights the snippet that will be debugged. We can switch which chunk to debug with Tab, and once we've decided, we press F7 again.
Here are some screenshots to illustrate:
1- Press the F7 (step into) key; this displays the highlights (selection mode)
2- Use Tab repeatedly to select the snippet to debug
3- Press F7 (step into) again to step into the selected snippet
IntelliJ IDEA 15 seems to make it even easier: it allows you to stop at the part of the line where the lambda is. See the first feature here: http://blog.jetbrains.com/idea/2015/06/intellij-idea-15-eap-is-open/
Debugging with IDEs is always helpful, but the ideal way to inspect each element in a stream is to use peek() before a terminal operation, since Java Streams are lazily evaluated: unless a terminal operation is invoked, the stream will not be evaluated at all.
List<Integer> numFromZeroToTen = Arrays.asList(1,2,3,4,5,6,7,8,9,10);
numFromZeroToTen.stream()
.map(n -> n * 2)
.peek(n -> System.out.println(n))
.collect(Collectors.toList());
The following program is from OCP Study Guide by Jeanne Boyarsky and Scott Selikoff:
import java.util.*;

class WhaleDataCalculator {
    public int processRecord(int input) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // Handle interrupted exception
        }
        return input + 1;
    }

    public void processAllData(List<Integer> data) {
        data.stream().map(a -> processRecord(a)).count();
    }

    public static void main(String[] args) {
        WhaleDataCalculator calculator = new WhaleDataCalculator();

        // Define the data
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 4000; i++)
            data.add(i);

        // Process the data
        long start = System.currentTimeMillis();
        calculator.processAllData(data);
        double time = (System.currentTimeMillis() - start) / 1000.0;

        // Report results
        System.out.println("\nTasks completed in: " + time + " seconds");
    }
}
The authors claim
Given that there are 4,000 records, and each record takes 10
milliseconds to process, by using a serial stream(), the results will
take approximately 40 seconds to complete this task.
However, when I run this on my system, it takes between 0.006 and 0.009 seconds on every run.
Where is the discrepancy?
That's because of the use of count, which performs a trick in later Java versions.
Since you're only interested in the number of elements, count will try to get the size directly from the source, and will skip most other operations. This is possible because you are only doing a map and not, for example, a filter, so the number of elements will not change.
If you add peek(System.out::println), you'll see no output either.
If you call forEach instead of count, running the code will probably take 40 seconds.
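For instance, a drop-in replacement for processAllData from the listing above (a sketch, not measured) that forces every element through the pipeline:
public void processAllData(List<Integer> data) {
    // forEach is a terminal operation with no size shortcut, so every element
    // really flows through map and processRecord runs for all 4,000 records
    data.stream().map(a -> processRecord(a)).forEach(r -> { });
}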
Since Java 9, count() has been optimized so that if, while the stream is being set up (when the stages of the pipeline are being chained), it turns out that no operation can change the number of elements and the source can report how many elements it contains, then count() does not trigger execution of the pipeline at all; it simply asks the source "how many elements do you have?" and returns that value immediately.
So when processAllData() runs, a Stream instance is constructed and right after that the method returns, because none of the elements are actually processed.
Here's a quote from the documentation:
API Note:
An implementation may choose to not execute the stream pipeline
(either sequentially or in parallel) if it is capable of computing the
count directly from the stream source. In such cases no source
elements will be traversed and no intermediate operations will be
evaluated. Behavioral parameters with side-effects, which are strongly
discouraged except for harmless cases such as debugging, may be
affected. For example, consider the following stream:
List<String> l = Arrays.asList("A", "B", "C", "D");
long count = l.stream().peek(System.out::println).count();
The number of elements covered by the stream source, a List, is known
and the intermediate operation, peek, does not inject into or remove
elements from the stream (as may be the case for flatMap or filter
operations). Thus the count is the size of the List and there is no
need to execute the pipeline and, as a side-effect, print out the list
elements.
And by the way, apart from the trick behind this test, this case doesn't require the Stream API at all. Since the value returned by count() is ignored and all that's needed is to fire a side effect for each element of the list, Iterable.forEach() can be used instead:
public void processAllData(List<Integer> data) {
    data.forEach(a -> processRecord(a));
}
The call to .map(a -> processRecord(a)) did not run at all; the reason is that you are running this program with a JDK version newer than 1.8.
Let's take this example to make it easy to understand:
long number = Stream.of("x", "x", "x").map(e -> {
System.out.println("Hello");
return e;
}).count();
System.out.println(number);
Try running it with JDK 1.8, then run it again with JDK 11.
In Java 8, count() acts as a plain terminal operation: all intermediate operations (the map here) are executed, so the map runs and prints the hello messages. You will get this output:
Hello
Hello
Hello
3
In Java versions newer than 1.8 (11 in this example), Java can determine the number of elements of the stream directly. If there is no intermediate operation that can change the number of elements (for example, filter()), no intermediate operation is executed; only count runs. So you will not see any hello message, but the number of elements is still calculated and you can use it. Your output will look like this:
3
If you want to see the hello messages on Java versions newer than 1.8, add an intermediate operation to your stream pipeline that can change the number of elements. Let's add filter to the pipeline and look at the output on Java 11:
long number = Stream.of("x", "x", "x").map(e -> {
System.out.println("Hello");
return e;
}).filter(element-> element.equals("x")).count();
System.out.println(number);
Here's the output:
Hello
Hello
Hello
3
I've got two statements which I expected to print the same result:
Arrays.stream("abc".split("")).forEach(System.out::println);//first
Arrays.stream("abc".split("")).peek(new Consumer<String>() {//second
#Override
public void accept(String s) {
System.out.println(s);//breakpoint
}
});
In fact, the first statement will print
a
b
c
Ok, but the second statement prints nothing. I tried to set a breakpoint on the "//breakpoint" line in IntelliJ, but it wasn't hit.
So how should I change the second statement so that "peek" works, given that it creates a new stream while processing every element with the "Consumer"?
Thanks a lot.
Stream.peek, as also stated in the API's Javadoc, is meant mainly for debugging purposes, and performing update operations on the stream elements during peek is not recommended.
For example, you can verify the intermediate stream state with the following code and what it eventually results in:
Arrays.stream("acb".split(""))
.peek(System.out::println) // print a c b
.sorted()
.forEach(System.out::println); // print a b c
In general, peek is an intermediate operation and wouldn't be executed unless a terminal operation is performed on the stream, as mentioned in the Stream operations and pipelines section of the docs, and that is exactly why your first statement prints (forEach is a terminal operation).
Note: As suggested in a few other answers, though, the action within peek is not invoked when the implementation is able to optimize the result away, e.g. for short-circuiting operations like findFirst:
In cases where the stream implementation is able to optimize away the
production of some or all the elements (such as with short-circuiting
operations like findFirst, or in the example described in count()),
the action will not be invoked for those elements.
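For example, a small sketch illustrating that behaviour (the element values and class name are mine):
import java.util.Optional;
import java.util.stream.Stream;

public class PeekShortCircuitDemo {
    public static void main(String[] args) {
        Optional<String> first = Stream.of("a", "b", "c")
                .peek(s -> System.out.println("peek: " + s)) // runs only for traversed elements
                .filter(s -> !s.isEmpty())
                .findFirst();
        // Only "peek: a" is printed: findFirst short-circuits,
        // so "b" and "c" are never produced and never peeked.
        System.out.println(first.orElse("none"));
    }
}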
peek() is not a terminal operation; you need to add a terminal operation to make peek work, e.g.
Arrays.stream("abc".split("")).peek(new Consumer<String>() { //second
#Override
public void accept(String s) {
System.out.println(s);//breakpoint
}
}).count();
peek() is not a terminal operation; it produces an intermediate stream. Your stream is executed only when it reaches a terminal operation.
For example, if you add the count() terminal operation to your second stream, you will get the expected output.
Note - You got an output for the first stream because forEach() is a terminal operation.
Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy, so a Stream only starts executing its pipeline once it reaches a terminal operation. In your first case forEach is the terminal operation, so the stream executed. But in the second case the last operation in the pipeline is peek(), which is not a terminal operation.
I have read a lot about Java 8 streams lately, and several articles about lazy loading with Java 8 streams specifically: here and over here. I can't seem to shake the feeling that lazy loading is COMPLETELY useless (or at best, a minor syntactic convenience offering zero performance value).
Let's take this code as an example:
int[] myInts = new int[]{1,2,3,5,8,13,21};
IntStream myIntStream = IntStream.of(myInts);
int[] myChangedArray = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n))
.toArray();
This will log in the console, because the terminal operation, in this case toArray(), is called, and our stream is lazy and executes only when the terminal operation is called. Of course I can also do this:
IntStream myChangedInts = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n));
And nothing will be printed, because the map isn't happening, because I don't need the data. Until I call this:
int[] myChangedArray = myChangedInts.toArray();
And voila, I get my mapped data and my console logs. Except I see zero benefit to it whatsoever. I realize I can define the filter code long before I call toArray(), and I can pass this "not-really-filtered" stream around, but so what? Is this the only benefit?
The articles seem to imply there is a performance gain associated with laziness, for example:
In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimized to make it being capable of processing the large amount of data with high performance.
and
Java 8 Streams API optimizes stream processing with the help of short circuiting operations. Short Circuit methods ends the stream processing as soon as their conditions are satisfied. In normal words short circuit operations, once the condition is satisfied just breaks all of the intermediate operations, lying before in the pipeline. Some of the intermediate as well as terminal operations have this behavior.
It sounds literally like breaking out of a loop, and not associated with laziness at all.
Finally, there is this perplexing line in the second article:
Lazy operations achieve efficiency. It is a way not to work on stale data. Lazy operations might be useful in the situations where input data is consumed gradually rather than having whole complete set of elements beforehand. For example consider the situations where an infinite stream has been created using Stream#generate(Supplier<T>) and the provided Supplier function is gradually receiving data from a remote server. In those kind of the situations server call will only be made at a terminal operation when it's needed.
Not working on stale data? What? How does lazy loading keep someone from working on stale data?
TLDR: Is there any benefit to lazy loading besides being able to run the filter/map/reduce/whatever operation at a later time (which offers zero performance benefit)?
If so, what's a real-world use case?
Your terminal operation, toArray(), perhaps supports your argument given that it requires all elements of the stream.
Some terminal operations don't. And for these, it would be a waste if streams weren't lazily executed. Two examples:
//example 1: print first element of 1000 after transformations
IntStream.range(0, 1000)
    .peek(System.out::println)
    .mapToObj(String::valueOf)
    .peek(System.out::println)
    .findFirst()
    .ifPresent(System.out::println);

//example 2: check if any value has an even key
boolean valid = records
    .map(this::heavyConversion)
    .filter(this::checkWithWebService)
    .mapToInt(Record::getKey)
    .anyMatch(i -> i % 2 == 0);
The first stream will print:
0
0
0
That is, intermediate operations are run on just one element. This is an important optimization. If it weren't lazy, all the peek() calls would have to run on all elements (absolutely unnecessary, as you're interested in just one element). Intermediate operations can be expensive (as in the second example).
Short-circuiting terminal operations (which toArray isn't) make this optimization possible.
Laziness can be very useful for the users of your API, especially when the final result of the Stream pipeline evaluation might be very large!
The simple example is the Files.lines method in the Java API itself. If you don't want to read the whole file into the memory and you only need the first N lines, then just write:
Stream<String> stream = Files.lines(path); // lazy operation
List<String> result = stream.limit(N).collect(Collectors.toList()); // read and collect
You're right that there won't be a benefit from map().reduce() or map().collect(), but there's a pretty obvious benefit with findAny(), findFirst(), anyMatch(), allMatch(), etc. Basically, any operation that can be short-circuited.
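A small sketch of that kind of short-circuiting benefit (the numbers and class name are arbitrary):
import java.util.stream.IntStream;

public class ShortCircuitBenefitDemo {
    public static void main(String[] args) {
        boolean found = IntStream.rangeClosed(1, 1_000_000)
                .peek(n -> System.out.println("evaluating " + n))
                .map(n -> n * n)                // pretend this is an expensive step
                .anyMatch(n -> n > 100);        // stops as soon as the condition holds
        // Only a handful of "evaluating ..." lines appear instead of a million.
        System.out.println(found);
    }
}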
Good question.
Assuming you write textbook-perfect code, the difference in performance between a properly optimized for loop and a stream is not noticeable (streams tend to be slightly better class-loading-wise, but the difference should not be noticeable in most cases).
Consider the following example.
// Some lengthy computation
private static int doStuff(int i) {
try { Thread.sleep(1000); } catch (InterruptedException e) { }
return i;
}
public static OptionalInt findFirstGreaterThanStream(int value) {
return IntStream
.of(MY_INTS)
.map(Main::doStuff)
.filter(x -> x > value)
.findFirst();
}
public static OptionalInt findFirstGreaterThanFor(int value) {
    for (int i = 0; i < MY_INTS.length; i++) {
        int mapped = Main.doStuff(MY_INTS[i]);
        if (mapped > value) {
            return OptionalInt.of(mapped);
        }
    }
    return OptionalInt.empty();
}
Given the above methods, the next test should show they execute in about the same time.
public static void main(String[] args) {
long begin;
long end;
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanStream(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanFor(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
}
OptionalInt[8]
5119
OptionalInt[8]
5001
Anyway, we spend most of the time in the doStuff method. Let's say we want to add more threads to the mix.
Adjusting the stream method is trivial (assuming your operations meet the preconditions of parallel streams).
public static OptionalInt findFirstGreaterThanParallelStream(int value) {
return IntStream
.of(MY_INTS)
.parallel()
.map(Main::doStuff)
.filter(x -> x > value)
.findFirst();
}
Achieving the same behavior without streams can be tricky.
public static OptionalInt findFirstGreaterThanParallelFor(int value, Executor executor) {
    AtomicInteger counter = new AtomicInteger(0);
    CompletableFuture<OptionalInt> cf = CompletableFuture.supplyAsync(() -> {
        while (counter.get() != MY_INTS.length-1);
        return OptionalInt.empty();
    });
    for (int i = 0; i < MY_INTS.length; i++) {
        final int current = MY_INTS[i];
        executor.execute(() -> {
            int mapped = Main.doStuff(current);
            if (mapped > value) {
                cf.complete(OptionalInt.of(mapped));
            } else {
                counter.incrementAndGet();
            }
        });
    }
    try {
        return cf.get();
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
        return OptionalInt.empty();
    }
}
The tests execute in about the same time again.
public static void main(String[] args) throws InterruptedException {
long begin;
long end;
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanParallelStream(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
ExecutorService executor = Executors.newFixedThreadPool(10);
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanParallelFor(5678, executor));
end = System.currentTimeMillis();
System.out.println(end-begin);
executor.shutdown();
executor.awaitTermination(10, TimeUnit.SECONDS);
executor.shutdownNow();
}
OptionalInt[8]
1004
OptionalInt[8]
1004
In conclusion, although we don't squeeze a big performance benefit out of streams (considering you write excellent multi-threaded code in your for alternative), the code itself tends to be more maintainable.
A (slightly off-topic) final note:
As with programming languages, higher-level abstractions (streams relative to for loops) make things easier to develop at the cost of performance. We did not move from assembly to procedural languages to object-oriented languages because the latter offered greater performance. We moved because it made us more productive (develop the same thing at a lower cost). If you are able to get the same performance out of a stream as you would with a for loop and properly written multi-threaded code, I would say it's already a win.
I have a real example from our code base. I'm going to simplify it, so I'm not entirely sure you'll like it or fully grasp it...
We have a service that needs a List<CustomService>, and I'm supposed to call it. To call it, I go to a database (much simpler than in reality) and obtain a List<DBObject>; to get a List<CustomService> from that, some heavy transformations need to be done.
And here are my choices: transform in place and pass the list. Simple, yet probably not that optimal. Second option: refactor the service to accept a List<DBObject> and a Function<DBObject, CustomService>. This sounds trivial, but it enables laziness (among other things). That service might sometimes need only a few elements from that list, or sometimes a max by some property, etc., so there's no need for me to do the heavy transformation for all elements; this is where the Stream API's pull-based laziness is a winner.
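A hedged sketch of that second option; the type names, the SomeService class, and the consume method are invented for illustration:
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical shapes of the types mentioned above
class DBObject { /* raw database record */ }
class CustomService { /* expensive-to-build domain object */ }

class SomeService {
    // The service receives the raw rows plus the (expensive) mapping and
    // decides for itself how many elements it actually needs.
    void consume(List<DBObject> rows, Function<DBObject, CustomService> mapper) {
        Stream<CustomService> lazy = rows.stream().map(mapper); // nothing converted yet
        lazy.limit(3).forEach(cs -> { });                       // only three conversions ever happen
    }
}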
Before Streams existed, we used Guava. It had Lists.transform(list, function), which was lazy too.
Laziness is not a fundamental feature of streams as such; it could have been done even without Guava, but it's a lot simpler that way. The example provided here with findFirst is great and the simplest to understand; this is the entire point of laziness: elements are pulled only when needed, they are not passed from one intermediate operation to another in chunks, but pass from one stage to the next one at a time.
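For reference, a small sketch of that Guava approach, assuming a reasonably recent Guava version is on the classpath (the class and variable names are mine):
import com.google.common.collect.Lists;
import java.util.Arrays;
import java.util.List;

public class GuavaLazyTransformDemo {
    public static void main(String[] args) {
        List<Integer> naturals = Arrays.asList(1, 2, 3, 4, 5);
        // The returned list is a lazy view: the function runs on each get(),
        // not when transform() is called.
        List<Integer> doubled = Lists.transform(naturals, n -> n * 2);
        System.out.println(doubled.get(0)); // only this element is transformed here
    }
}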
One interesting use case that hasn't been mentioned is arbitrary composition of operations on streams, coming from different parts of the code base, responding to different sorts of business or technical requisites.
For example, say you have an application where certain users can see all the data but certain other users can only see part of it. The part of the code that checks user permissions can simply impose a filter on whatever stream is being handed about.
Without lazy streams, that same part of the code could be filtering the already realized full collection, but that may have been expensive to obtain, for no real gain.
Alternatively, that same part of the code might want to append its filter to a data source, but now it has to know whether the data comes from a database, so it can impose an additional WHERE clause, or some other source.
With lazy streams, it's just a filter that can be implemented whichever way fits. Filters imposed on streams coming from the database can translate into the aforementioned WHERE clause, with obvious performance gains over filtering in-memory collections resulting from whole-table reads.
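A hypothetical sketch of that kind of composition; the Row type and the permission rule are invented purely for illustration:
import java.util.function.Predicate;
import java.util.stream.Stream;

class PermissionFilter {
    // Hypothetical record type, purely illustrative
    static class Row {
        String owner;
    }

    // This code knows nothing about where the stream comes from; it just
    // appends its own filter to whatever (still unevaluated) pipeline it gets.
    static Stream<Row> visibleTo(String user, Stream<Row> rows) {
        Predicate<Row> maySee = r -> "admin".equals(user) || user.equals(r.owner);
        return rows.filter(maySee); // still lazy, nothing has been read yet
    }
}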
So, a better abstraction, better performance, better code readability and maintainability, sounds like a win to me. :)
A non-lazy implementation would process all input and collect the output into a new collection after each operation. Obviously, that's impossible for unbounded or large enough sources, memory-consuming otherwise, and needlessly memory-consuming in the case of reducing and short-circuiting operations, so there are great benefits.
Check the following example
Stream.of("0","0","1","2","3","4")
.distinct()
.peek(a->System.out.println("after distinct: "+a))
.anyMatch("1"::equals);
If it were not lazy, you would expect all elements to pass through the distinct filtering first. But because of lazy execution it behaves differently: it streams the minimum number of elements needed to compute the result.
The above example will print
after distinct: 0
after distinct: 1
How it works analytically:
First "0" goes until the terminal operation but does not satisfy it. Another element must be streamed.
Second "0" is filtered through .distinct() and never reaches terminal operation.
Since the terminal operation is not satisfied yet, next element is streamed.
"1" goes through terminal operation and satisfies it.
No more elements need to be streamed.
collection.stream().map(x -> {
    if (condition) {
        x.setY(true);
    }
    return x;
}).collect(Collectors.toList());
I have a collection, and I want to set something on the x variable when a certain condition is met. I'm doing it as shown in the code above. It does its job and works. However, IntelliJ suggests that I swap .map for .peek, so the code would look like:
collection.stream().peek(x -> {
    if (condition) {
        x.setY(true);
    }
}).collect(Collectors.toList());
It's one line shorter, but from what I read in the peek documentation:
API Note:
This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline:
So, is IntelliJ's suggestion misleading?
Example:
If you use peek() with count() in Java 8, peek() will run; but if you use it in Java 9, it won't unless you also have a filter(), because count() in Java 9 doesn't need to go over all the elements.
You should prefer forEach to peek.
List<Integer> l = new ArrayList<>(Arrays.asList(1,2,3));
long c = l.stream().peek(System.out::println).count();
System.out.println(c);
Try the code above in Java 8 and 9 and see the difference. My point is that you should follow what the API documentation says and use it accordingly.
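For instance, a minimal sketch of the forEach-based alternative to the peek()/count() snippet above (the class name is mine):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachInsteadOfPeek {
    public static void main(String[] args) {
        List<Integer> l = new ArrayList<>(Arrays.asList(1, 2, 3));
        // forEach is a terminal operation, so the side effect is guaranteed to run,
        // and the element count is simply read from the list itself.
        l.forEach(System.out::println);
        System.out.println(l.size());
    }
}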
No, it is not misleading.
The JavaDoc states:
peek
Stream<T> peek(Consumer<? super T> action)
Returns a stream consisting of the elements of this stream, additionally
performing the provided action on each element as elements are consumed from the
resulting stream.
The JavaDoc is very clear that peek has the ability to modify the stream.
I'm wondering if I can add an operation to a stream, based on some sort of condition set outside of the stream. For example, I want to add a limit operation to the stream if my limit variable is not equal to -1.
My code currently looks like this, but I have yet to see other examples of streams being used this way, where a Stream object is reassigned to the result of an intermediate operation applied on itself:
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
stream.collect(Collectors.toList());
As stated in this stackoverflow post, the filter isn't actually applied until a terminal operation is called. Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
There is no semantic difference between a chained series of invocations and a series of invocations storing the intermediate return values. Thus, the following code fragments are equivalent:
a = object.foo();
b = a.bar();
c = b.baz();
and
c = object.foo().bar().baz();
In either case, each method is invoked on the result of the previous invocation. But in the latter case, the intermediate results are not stored but lost on the next invocation. In the case of the stream API, the intermediate results must not be used after you have called the next method on it, thus chaining is the natural way of using stream as it intrinsically ensures that you don’t invoke more than one method on a returned reference.
Still, it is not wrong to store the reference to a stream as long as you obey the contract of not using a returned reference more than once. By using it the way you do in your question, i.e. overwriting the variable with the result of the next invocation, you also ensure that you don’t invoke more than one method on a returned reference; thus, it’s a correct usage. Of course, this only works with intermediate results of the same type, so when you are using map or flatMap, getting a stream of a different reference type, you can’t overwrite the local variable. Then you have to be careful not to use the old local variable again, but, as said, as long as you are not using it after the next invocation, there is nothing wrong with the intermediate storage.
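A small sketch of both situations described above, with invented variable names:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IntermediateVariableDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);

        Stream<Integer> s = numbers.stream();
        s = s.filter(n -> n % 2 == 0);              // same element type: overwriting is fine

        Stream<String> t = s.map(String::valueOf);  // new element type: needs a new variable
        // From here on, only 't' may be used; 's' must not be touched again.
        System.out.println(t.collect(Collectors.joining(", ")));
    }
}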
Sometimes, you have to store it, e.g.
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt"))) {
stream.filter(s -> !s.isEmpty()).forEach(System.out::println);
}
Note that the code is equivalent to the following alternatives:
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt")).filter(s->!s.isEmpty())) {
stream.forEach(System.out::println);
}
and
try(Stream<String> srcStream = Files.lines(Paths.get("myFile.txt"))) {
Stream<String> tmp = srcStream.filter(s -> !s.isEmpty());
// must not use the variable srcStream here:
tmp.forEach(System.out::println);
}
They are equivalent because forEach is always invoked on the result of filter which is always invoked on the result of Files.lines and it doesn’t matter on which result the final close() operation is invoked as closing affects the entire stream pipeline.
To put it in one sentence, the way you use it, is correct.
I even prefer to do it that way, as not chaining a limit operation when you don’t want to apply a limit is the cleanest way of expressing your intent. It’s also worth noting that the suggested alternatives may work in a lot of cases, but they are not semantically equivalent:
.limit(condition? aLimit: Long.MAX_VALUE)
assumes that the maximum number of elements, you can ever encounter, is Long.MAX_VALUE but streams can have more elements than that, they even might be infinite.
.limit(condition? aLimit: list.size())
when the stream source is the list, breaks the lazy evaluation of the stream. In principle, a mutable stream source might legally get arbitrarily changed up to the point when the terminal action is commenced. The result will reflect all modifications made up to this point. When you add an intermediate operation incorporating list.size(), i.e. the actual size of the list at this point, subsequent modifications applied to the collection between this point and the terminal operation may cause this value to have a different meaning than the intended “actually no limit” semantics.
Compare with “Non Interference” section of the API documentation:
For well-behaved stream sources, the source can be modified before the terminal operation commences and those modifications will be reflected in the covered elements. For example, consider the following code:
List<String> l = new ArrayList(Arrays.asList("one", "two"));
Stream<String> sl = l.stream();
l.add("three");
String s = sl.collect(joining(" "));
First a list is created consisting of two strings: "one"; and "two". Then a stream is created from that list. Next the list is modified by adding a third string: "three". Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation commenced the result will be a string of "one two three".
Of course, this is a rare corner case as normally, a programmer will formulate an entire stream pipeline without modifying the source collection in between. Still, the different semantic remains and it might turn into a very hard to find bug when you once enter such a corner case.
Further, since they are not equivalent, the stream API will never recognize these values as “actually no limit”. Even specifying Long.MAX_VALUE implies that the stream implementation has to track the number of processed elements to ensure that the limit has been obeyed. Thus, not adding a limit operation can have a significant performance advantage over adding a limit with a number that the programmer expects to never be exceeded.
There are two ways you can do this:
// Do some stream stuff
List<E> results = list.stream()
    .filter(e -> e.getTimestamp() < max)
    .limit(limit > 0 ? limit : list.size())
    .collect(Collectors.toList());
OR
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
List<E> results = stream.collect(Collectors.toList());
As this is functional programming, you should always work on the result of each function. You should specifically avoid modifying anything in this style of programming and treat everything as if it were immutable, if possible.
Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
It should work; however, it reads as a mix of imperative and functional coding. I suggest writing it as a single fixed pipeline, as in my first example.
I think your first line needs to be:
stream = stream.filter(e -> e.getTimestamp() < max);
so that you're using the stream returned by filter in subsequent operations rather than the original stream.
I know it is a bit late, but I had the same question myself and didn't find a satisfying answer; however, inspired by this question and its answers, I came to the following solution:
return Stream.of( // wrap the target stream in another stream ;)
    // do regular stream stuff
    stream.filter(e -> e.getTimestamp() < max)
).flatMap(s -> limit != -1 ? s.limit(limit) : s) // apply the limit only if necessary and unwrap the stream of streams back to a "normal" stream
    .collect(Collectors.toList()); // do final stuff
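If this pattern comes up often, a small generic helper can hide the wrapping. This is a sketch under my own naming (applyIf is not a standard API):
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConditionalStreamOps {
    // Applies the operator only when the condition holds; otherwise returns the stream untouched.
    static <T> Stream<T> applyIf(Stream<T> stream, boolean condition, UnaryOperator<Stream<T>> op) {
        return condition ? op.apply(stream) : stream;
    }

    public static void main(String[] args) {
        long limit = 2;
        List<Integer> result = applyIf(Stream.of(5, 1, 9, 3), limit != -1, s -> s.limit(limit))
                .collect(Collectors.toList());
        System.out.println(result); // [5, 1] because the limit was applied
    }
}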