Debugging RxJava zip operator that halts - java

Writing in Java, I call the zip() method, which receives a few methods that each return an Observable<...>.
Currently I am not able to progress to the following map(), probably because one of the Observables hasn't emitted a value yet. (Though it seems all the methods were called.)
Is there a way to debug the process and see why it is stuck?
Thanks.

Suppose you have:
result = Observable.zip(sourceA, sourceB, sourceC)
Just add a .doOnNext() on each of the sources to log what they are emitting (or instead of doOnNext, subscribe to each). For instance:
result = Observable.zip(sourceA.doOnNext(/*logging...*/),
                        sourceB.doOnNext(/*logging...*/),
                        sourceC.doOnNext(/*logging...*/))
What's probably happening is that one of these sources isn't emitting at the same frequency as the others. zip should only be used when you know for sure that all the sources emit events at the same pace/frequency. You might want to try combineLatest instead. The difference between the two is:
zip: the returned Observable emits the n-th 'combination' item only when all the n-th items of the sources have been emitted. See a diagram.
combineLatest: the returned Observable emits a 'combination' item whenever any of its sources emits an item. See a diagram.
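If it turns out the sources really do emit at different rates and combineLatest fits your use case, here's a hedged sketch of what that could look like, keeping the same logging idea (the combining lambda below just gathers the three latest values into a List as a stand-in for whatever zip function you were using):

result = Observable.combineLatest(
        sourceA.doOnNext(a -> System.out.println("A emitted: " + a)),
        sourceB.doOnNext(b -> System.out.println("B emitted: " + b)),
        sourceC.doOnNext(c -> System.out.println("C emitted: " + c)),
        (a, b, c) -> Arrays.asList(a, b, c)); // stand-in for your real combining function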

Related

Why is the same Comparator acting differently in unit tests vs. when run as a web app?

TL;DR: After much trial and error, it appears as though the issue is Tomcat-related, maybe in regard to the configured Java versions, rather than the Java language itself. See 'Edit 3' below for more details.
I have been using Java 8 streams and Comparators for a while now and have never seen this type of behavior before, so I'm asking out of curiosity to see if anyone can find what's wrong with my stream.
I'm working on "Java-8-ifying" a legacy project by replacing our antiquated collection processing with streams (I was asked why I'm doing this, and the short answer is that we're essentially re-writing the project, but only have the time budget to do it incrementally. I'm doing the first step - updating the Java version. There is a lot of messy code around collection logic, so "Java-8-ifying" is serving to clean up a lot of code and make things easier to read and maintain). For now, we're still using the old data types, so any dates mentioned are java.util.Date instances instead of the new Java 8 types.
This is my Comparator in ServiceRequest.java (which is a POJO):
public static final Comparator<ServiceRequest> BY_ACTIVITY_DATE_DESC = Comparator.comparing(
        ServiceRequest::getActivityDate, Comparator.nullsLast(Comparator.reverseOrder()));
When unit tested, this Comparator works as expected. ServiceRequests with a later activityDate are first in the resulting list, ones with an earlier activityDate are further down the list, and ones with a null activityDate are at the bottom. For reference, here is a complete copy of the unit test:
@Test
public void testComparator_BY_ACTIVITY_DATE_DESC() {
    ServiceRequest olderRequest = new ServiceRequest();
    olderRequest.setActivityDate(DateUtil.yesterday());

    ServiceRequest newerRequest = new ServiceRequest();
    newerRequest.setActivityDate(DateUtil.tomorrow());

    ServiceRequest noActivityDateRequest = new ServiceRequest();

    List<ServiceRequest> sortedRequests = Arrays.asList(olderRequest, noActivityDateRequest, newerRequest).stream()
            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
            .collect(Collectors.toList());

    assertEquals(sortedRequests.get(0), newerRequest);
    assertEquals(sortedRequests.get(1), olderRequest);
    assertEquals(sortedRequests.get(2), noActivityDateRequest);
}
Note: DateUtil is a legacy utility that creates java.util.Date instances for our testing purposes.
This test always passes with flying colors, as I would expect. However, I have a controller that assembles a list of open service requests, groups them by requester identifier, and selects only the most recent request for each user into a map. I attempted to convert this logic into the following stream:
private Map<Long, ServiceRequestViewBean> ServiceRequestsByUser(List<ServiceRequest> serviceRequests) {
    return serviceRequests.stream()
            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
            .collect(Collectors.toMap(
                    serviceRequest -> serviceRequest.getRequester().getId(),
                    serviceRequest -> new ServiceRequestViewBean(serviceRequest),
                    (firstServiceRequest, secondServiceRequest) -> firstServiceRequest));
}
My logic was that after the requests were sorted by most recent requests first, whenever multiple requests made by the same user were processed, only the most recent one would be put into the map.
However, the observed behavior was that the OLDEST request was being put in the map instead. Note: I've since verified that when the controller code is invoked via a JUnit test, the behavior is as expected; the erroneous behavior only shows up when the endpoint on the controller is called while running on Tomcat. See 'Edit 3' for more details.
I added some peeks to view the ServiceRequest IDs (not the requester IDs, which in this case would be the same when the merge function is encountered) before sorting, after sorting, and in the merge function for the map collecting. For simplicity, I limited the data to a single requester's 4 requests.
The expected order of ServiceRequest IDs:
ID ACTIVITY DATE
365668 06-JUL-18 09:01:44
365649 05-JUL-18 15:41:40
365648 05-JUL-18 15:37:43
365647 05-JUL-18 15:31:47
Output of my peeks:
Before Sorting: 365647
Before Sorting: 365648
Before Sorting: 365649
Before Sorting: 365668
After Sorting: 365647
After Sorting: 365648
First request: 365647, Second request: 365648
After Sorting: 365649
First request: 365647, Second request: 365649
After Sorting: 365668
First request: 365647, Second request: 365668
I thought the interspersion of the map merge output with the after-sort peek was interesting, but I guess since there were no more stateful-intermediate operations, it just decided to add things to the map as it was peeking at them.
Since the peek output was the same before and after sorting, I concluded that either the sorting had no effect on the encounter order, or the comparator was sorting in ascending order for some reason (contrary to the intended design) and the input from the database just happened to already be in that order, or the stream resolved the sorting before either of the peeks (though I'm not sure that's possible...). Out of curiosity, I added a sort to the database call to see if it would change the outcome of this stream. I told the database to sort by activity date descending so that the order of the input going into the stream would be guaranteed; if the comparator was somehow inverted, it should flip the order of the items back to ascending.
However, the output of the DB-ordered stream was much like the first, only this time the order matched the original order produced by the database sort... which leads me to believe that my comparator is having absolutely NO effect on this stream.
My question is WHY is this? Does the toMap collector ignore encounter order? If so, why does this cause the sorted call to be ineffective? I thought that sorted, as a stateful intermediate step, forced subsequent steps to observe encounter order (with the exception of forEach, as there is a forEachOrdered).
When I looked up the javadoc for toMap, it had a note about concurrency:
The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are merged into the Map in encounter order, using toConcurrentMap(Function, Function, BinaryOperator) may offer better parallel performance.
This leads me to believe that encounter order SHOULD be preserved by the toMap collector. I'm very lost and confused as to why I'm observing this particular behavior. I know I can solve this issue by doing a date comparison in my merge function, but I seek to understand WHY my comparator seems to work when used with a toList collector, but not with a toMap collector.
Thank you in advance for your insight!
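(For what it's worth, the "date comparison in the merge function" workaround mentioned above can be written without a hand-rolled lambda. The following is only a sketch, assuming the same ServiceRequest/ServiceRequestViewBean types as in this question: collect the raw requests keyed by requester, merge with BinaryOperator.minBy over the same comparator, and build the view beans afterwards.)

Map<Long, ServiceRequestViewBean> byUser = serviceRequests.stream()
        .collect(Collectors.toMap(
                request -> request.getRequester().getId(),
                Function.identity(),
                // minBy with a descending comparator keeps the request that sorts first,
                // i.e. the one with the most recent (non-null) activity date
                BinaryOperator.minBy(ServiceRequest.BY_ACTIVITY_DATE_DESC)))
        .entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey,
                entry -> new ServiceRequestViewBean(entry.getValue())));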
Edit 1: Many have suggested using a LinkedHashMap to solve the problem, so I implemented that solution like so:
return serviceRequests.stream()
        .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
        .collect(Collectors.toMap(
                serviceRequest -> serviceRequest.getRequester().getId(),
                serviceRequest -> new ServiceRequestViewBean(serviceRequest),
                (serviceRequestA, serviceRequestB) -> serviceRequestA,
                LinkedHashMap::new));
But when tested, it's actually resolving to an older one instead of the desired newest one that the comparator should be enforcing. I'm still confused.
Note: I've since verified that the erroneous behavior only exhibits when run as a webapp on Tomcat. When this code is called via a jUnit test, it functions as one would expect it to. See 'Edit 3' for more details
Edit 2: Interestingly enough, when I implemented the solution I thought would work (sorting in the merge function), it also did not work:
return serviceRequests.stream()
        .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
        .collect(Collectors.toMap(
                serviceRequest -> serviceRequest.getRequester().getId(),
                serviceRequest -> new ServiceRequestViewBean(serviceRequest),
                (firstServiceRequest, secondServiceRequest) -> {
                    return Stream.of(firstServiceRequest, secondServiceRequest)
                            .peek(request -> System.out.println("- Before Sort -\n\tRequester ID: "
                                    + request.getRequester().getId() + "\n\tRequest ID: " + request.getId()))
                            .sorted(ServiceRequest.BY_ACTIVITY_DATE_DESC)
                            .peek(request -> System.out.println("- After sort -\n\tRequester ID: "
                                    + request.getRequester().getId() + "\n\tRequest ID: " + request.getId()))
                            .findFirst().get();
                }));
Which produces the following output:
- Before Sort -
Requester ID: 67200307
Request ID: 365647
- Before Sort -
Requester ID: 67200307
Request ID: 365648
- After sort -
Requester ID: 67200307
Request ID: 365647
- Before Sort -
Requester ID: 67200307
Request ID: 365647
- Before Sort -
Requester ID: 67200307
Request ID: 365649
- After sort -
Requester ID: 67200307
Request ID: 365647
- Before Sort -
Requester ID: 67200307
Request ID: 365647
- Before Sort -
Requester ID: 67200307
Request ID: 365668
- After sort -
Requester ID: 67200307
Request ID: 365647
Note: I've since verified that this erroneous output is only produced when running as a web app on Tomcat. When the code is invoked via jUnit test, it functions correctly as one would expect. See 'Edit 3' for more details
This seems to indicate that my comparator really is doing nothing, or is aggressively sorting in the opposite order that it does in the unit test, or perhaps findFirst is doing the same thing as toMap was doing. However the javadoc for findFirst would suggest that findFirst respects encounter order when an intermediate step such as sorted is used.
Edit 3: I was asked to make a minimum, complete, and verifiable sample project, so I did: https://github.com/zepuka/ecounter-order-map-collect
I tried a couple different tactics to try and reproduce the issue (each tagged in the repo), but was unable to reproduce the erroneous behavior I've been getting in my controller. My first solution, and all suggestions I've since tried, have all produced the desired, correct behavior! So why then when the application is run do I get different behavior? For kicks and giggles, I exposed the method on my controller as public so I could unit test it and used the exact same data that's been giving me trouble during the run in the unit test - it functions normally in the jUnit test. There must be something different that allows this code to run correctly in unit tests and in a normal Java main method, but incorrectly when run on my tomcat server.
The Java version I'm compiling with and the one I'm running my server on are the same, though: 1.8.0_171-b11 (Oracle). At first I was building and running from within NetBeans, but I did a command-line build and Tomcat startup just to be sure there wasn't some weird NetBeans setting interfering. When I look at the run properties in NetBeans, though, it does say it's using 'Java EE 7 Web' as the Java EE version alongside the server setting (which is Apache Tomcat 8.5.29, running Java 8), and I'll admit I have no idea what the 'Java EE version' is all about.
So I've added the Tomcat tag to this post as my problem seems to be Tomcat-related rather than Java language-related. At this point, the only way to solve my problem seems to be using a non-stream approach to building the map, but I would still love to know what people's thoughts are on what configuration I could look into to fix the issue.
Edit 4: I tried to get around the problem by using the old ways of doing things. When I avoid using a Comparator in a Stream, things are fine, but as soon as I introduce a Comparator into a Stream anywhere in the process, the web app fails to behave correctly. I tried processing the list without a stream and only using a stream on the two requests to apply the comparator when merging into the map, but that doesn't work. I tried making the Comparator old-fashioned with an inline class definition, using plain old Java instead of Comparator.comparing, but using that in a stream fails too. Only when I avoid the streams and the comparator entirely does it seem to work.
Finally got to the bottom of it!
I was able to isolate the issue by first applying the new comparator everywhere it was needed. Most of those places behaved as I would expect; only on a certain page was the problem surfacing.
My prior debugging output only included the IDs, but this time I included the activity dates for convenience, and when I hit that one JSP, they were null!
The root of the problem is that in ONE case, in a single DAO method (which did some regex parsing to call different internal methods - yeah, it's a mess), it wasn't using the row mapper I had checked before... this particular helper method contained a nefarious inline 'row mapper' that primitively used a loop and index to get the results of the query and put them into the object, and it was missing the activity date column. It seems that the development history of this particular page (as documented by the commit history I dug through) suffered from performance issues, so when they 'improved performance', they made that inline row mapper include only the most crucial pieces of data needed at the time. And it turns out that this was the only page that fell into that particular branch of the regular-expression logic, which I had not previously noticed.
And that's just another reason why this project is earning the 'worst case of spaghetti code' award. This one was particularly hard to track down, as it was impossible to confirm whether a 'working' result was truly working or just happened to work that time, because order wasn't guaranteed from the database, nor was it guaranteed when all the dates were null.
TL;DR: Not Tomcat's fault, but a rogue inline row mapper in a corner branch of DAO logic, only triggered by a specific JSP.
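To illustrate why the missing column hid the problem so completely, here is a small standalone demo (my own snippet, not project code): with nullsLast(reverseOrder()), two null dates compare as equal, so sorting a batch of all-null requests is a no-op and the 'first wins' merge simply keeps whatever order the database happened to return.

import java.util.Comparator;
import java.util.Date;

public class NullDateDemo {
    public static void main(String[] args) {
        // Same shape as BY_ACTIVITY_DATE_DESC, but applied to the raw dates
        Comparator<Date> byDateDesc = Comparator.nullsLast(Comparator.<Date>reverseOrder());

        System.out.println(byDateDesc.compare(null, null));       // 0  -> nulls are "equal", so sorting changes nothing
        System.out.println(byDateDesc.compare(new Date(), null)); // <0 -> a non-null date would sort first
    }
}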

Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?

Getting a Spliterator from a Stream pipeline may return an instance of a StreamSpliterators.WrappingSpliterator. For example, getting the following Spliterator:
Spliterator<String> source = new Random()
        .ints(11, 0, 7) // size, origin, bound
        .filter(nr -> nr % 2 != 0)
        .mapToObj(Integer::toString)
        .spliterator();
Given the above Spliterator<String> source, when we traverse the elements individually through the tryAdvance(Consumer<? super P_OUT> consumer) method of the Spliterator, which in this case is an instance of StreamSpliterators.WrappingSpliterator, it will first accumulate items into an internal buffer before consuming those items, as we can see in StreamSpliterators.java#298. From a simple point of view, doAdvance() first inserts items into the buffer, then gets the next item and passes it to consumer.accept(…).
public boolean tryAdvance(Consumer<? super P_OUT> consumer) {
    boolean hasNext = doAdvance();
    if (hasNext)
        consumer.accept(buffer.get(nextToConsume));
    return hasNext;
}
However, I cannot figure out the need for this buffer.
In this case, why isn't the consumer parameter of tryAdvance simply used as a terminal Sink of the pipeline?
Keep in mind that this is the Spliterator returned by the public method Stream.spliterator(), so no assumptions about the caller can be made (as long as it is within the contract).
The tryAdvance method may get called once for each of the stream's elements and once more to detect the end of the stream; well, actually, it might get called an arbitrary number of times even after hitting the end. And there is no guarantee that the caller will always pass the same consumer.
To pass a consumer directly to the source spliterator without buffering, you will have to compose a consumer that will perform all pipeline stages, i.e. call a mapping function and use its result or test a predicate and not call the downstream consumer if negative and so on. The consumer passed to the source spliterator would also be responsible to notify the WrappingSpliterator somehow about a value being rejected by the filter as the source spliterator’s tryAdvance method still returns true in that case and the operation would have to be repeated then.
As Eugene correctly mentioned, this is the one-fits-all implementation that doesn’t consider how many or what kind of pipeline stages are there. The costs of composing such a consumer could be heavy and might have to be reapplied for each tryAdvance call, read for every stream element, e.g. when different consumers are passed to tryAdvance or when equality checks do not work. Keep in mind that consumers are often implemented as lambda expressions and the identity or equality of the instances produced by lambda expressions is unspecified.
So the tryAdvance implementation avoids these costs by composing only one consumer instance on the first invocation that will always store the element into the same buffer, also allocated on the first invocation, if not rejected by a filter. Note that under normal circumstances, the buffer will only hold one element. Afaik, flatMap is the only operation that may push more elements to the buffer. But note that the existence of this non-lazy behavior of flatMap is also the reason why this buffering strategy is required, at least when flatMap is involved, to ensure that the Spliterator implementation handed out by a public method will fulfill the contract of passing at most one element to the consumer during one invocation of tryAdvance.
In contrast, when you call forEachRemaining, these problems do not exist. There is only one Consumer instance during the entire operation and the non-laziness of flatMap doesn’t matter either, as all elements will get consumed anyway. Therefore, a non-buffering transfer will be attempted, as long as no previous tryAdvance call was made that could have caused buffering of some elements:
public void forEachRemaining(Consumer<? super P_OUT> consumer) {
    if (buffer == null && !finished) {
        Objects.requireNonNull(consumer);
        init();
        ph.wrapAndCopyInto((Sink<P_OUT>) consumer::accept, spliterator);
        finished = true;
    }
    else {
        do { } while (tryAdvance(consumer));
    }
}
As you can see, as long as the buffer has not been initialized, i.e. no previous tryAdvance call was made, consumer::accept is bound as the Sink and a complete, direct transfer is made.
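A tiny experiment makes that buffer visible (my own snippet; the exact buffering behaviour is the OpenJDK implementation detail described above): flatMap pushes a burst of three elements when the wrapping spliterator pulls the first source item, and subsequent tryAdvance calls are served from the buffer.

import java.util.Spliterator;
import java.util.stream.Stream;

public class BufferDemo {
    public static void main(String[] args) {
        Spliterator<Integer> sp = Stream.of(1, 2)
                .flatMap(n -> Stream.of(n * 10, n * 10 + 1, n * 10 + 2))
                .spliterator();

        sp.tryAdvance(x -> System.out.println("got " + x)); // got 10 (11 and 12 end up buffered)
        sp.tryAdvance(x -> System.out.println("got " + x)); // got 11 (served from the buffer)
    }
}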
I mostly agree with @Holger's great answer, but I would put the accents differently. I think it is hard for you to understand the need for a buffer because you have a very simplistic mental model of what the Stream API allows. If one thinks about a Stream as a sequence of map and filter, there is no need for an additional buffer because those operations have 2 important "good" properties:
Work on one element at a time
Produce 0 or 1 element as a result
However, those are not true in the general case. As @Holger (and I in my original answer) mentioned, there is already flatMap in Java 8 that breaks rule #2, and in Java 9 they've finally added takeWhile, which actually transforms a whole Stream -> Stream rather than working on a per-element basis (and is, AFAIK, the first intermediate short-circuiting operation).
Another point where I don't quite agree with @Holger is that I think the most fundamental reason is a bit different than the one he puts in his second paragraph (i.e. a) that you may call tryAdvance past the end of the Stream many times and b) that "there is no guarantee that the caller will always pass the same consumer"). I think the most important reason is that a Spliterator, being functionally identical to a Stream, has to support short-circuiting and laziness (i.e. the ability not to process the whole Stream, or else it can't support unbounded streams). In other words, even if the Spliterator API (quite strangely) required that you use the same Consumer object for all calls of all methods of a given Spliterator, you would still need tryAdvance, and that tryAdvance implementation would still have to use some buffer. You just can't stop processing data if all you've got is forEachRemaining(Consumer<? super T>), so you can't implement anything similar to findFirst or takeWhile with it. Actually, this is one of the reasons why the JDK implementation internally uses the Sink interface rather than Consumer (and what the "wrap" in wrapAndCopyInto stands for): Sink has an additional boolean cancellationRequested() method.
So to sum up: a buffer is required because we want Spliterator:
To use simple Consumer that provides no means to report back end of processing/cancellation
To provide means to stop processing of the data by a request of the (logical) consumer.
Note that those two are actually slightly contradictory requirements.
Example and some code
Here I'd like to provide an example of code that I believe is impossible to implement without an additional buffer, given the current API contract (interfaces). This example is based on your example.
There is the simple Collatz sequence of integers that is conjectured to always eventually hit 1. AFAIK this conjecture is not proven yet, but it has been verified for many integers (at least for the whole 32-bit int range).
So assume the problem we are trying to solve is the following: from a stream of Collatz sequences for random start numbers in the range from 1 to 1,000,000, find the first number that contains "123" in its decimal representation.
Here is a solution that uses just Stream (not a Spliterator):
static String findGoodNumber() {
    return new Random()
            .ints(1, 1_000_000) // unbound!
            .flatMap(nr -> collatzSequence(nr))
            .mapToObj(Integer::toString)
            .filter(s -> s.contains("123"))
            .findFirst().get();
}
where collatzSequence is a function that returns a Stream containing the Collatz sequence until the first 1 (and, for nitpickers, let it also stop when the current value is bigger than Integer.MAX_VALUE / 3 so we don't hit an overflow).
Every such Stream returned by collatzSequence is bounded. Also, the standard Random will eventually generate every number in the provided range. That means we are guaranteed that there will eventually be some "good" number in the stream (for example, just 123), and findFirst is short-circuiting, so the whole operation will actually terminate. However, no reasonable Stream API implementation can predict this.
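collatzSequence itself isn't shown in the answer; one possible shape for it is sketched below (my sketch, relying on the Java 9 takeWhile mentioned earlier; a Java 8 version would need a custom iterator or Spliterator, and it needs import java.util.stream.IntStream):

// Collatz sequence for nr, stopping at 1 and before any value that could
// overflow when the next step computes 3 * n + 1.
static IntStream collatzSequence(int nr) {
    return IntStream.iterate(nr, n -> n % 2 == 0 ? n / 2 : 3 * n + 1)
            .takeWhile(n -> n > 1 && n <= Integer.MAX_VALUE / 3);
}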
Now let's assume that for some strange reason you want to perform the same thing using intermediate Spliterator. Even though you have only one piece of logic and no need for different Consumers, you can't use forEachRemaining. So you'll have to do something like this:
static Spliterator<String> createCollatzRandomSpliterator() {
    return new Random()
            .ints(1, 1_000_000) // unbound!
            .flatMap(nr -> collatzSequence(nr))
            .mapToObj(Integer::toString)
            .spliterator();
}

static String findGoodNumberWithSpliterator() {
    Spliterator<String> source = createCollatzRandomSpliterator();
    String[] res = new String[1]; // work-around for the "effectively final" closure restriction
    while (source.tryAdvance(s -> {
        if (s.contains("123")) {
            res[0] = s;
        }
    })) {
        if (res[0] != null)
            return res[0];
    }
    throw new IllegalStateException("Impossible");
}
It is also important that for some starting numbers the Collatz sequence will contain several matching numbers. For example, both 41123 and 123370 (= 41123*3+1) contain "123". It means that we really don't want our Consumer to be called after the first matching hit. But since Consumer doesn't expose any means to report the end of processing, WrappingSpliterator can't just pass our Consumer to the inner Spliterator. The only solution is to accumulate all results of the inner flatMap (with all the post-processing) into some buffer and then iterate over that buffer one element at a time.
Spliterators are designed to handle sequential processing of each item in encounter order, and parallel processing of items in some order. Each method of the Spliterator must be able to support both early binding and late binding. The buffering is intended to gather data into suitable, processable chunks that follow the requirements for ordering, parallelization and mutability.
In other words, tryAdvance() is not the only method in the class, and other methods have to work with each other to deliver the external contract. To do that, in the face of sub-classes that may override some or all of the methods, requires that each method obey its internal contract.
This is something that I've read from Holger in quite a few posts and I'll just sum it up here; if there's an exact duplicate (I'll try to find one), I will close and delete my answer in deference to that one.
First is why a WrappingSpliterator is needed in the first place - for stateful operations like sorted, distinct, etc. - but I think you already understood that. I assume the same holds for flatMap, since it is eager.
Now, when you call spliterator, IFF there are no stateful operations there is obviously no real reason to wrap it into a WrappingSpliterator, but at the moment this is not done. This could be changed in a future release - they could detect whether there are stateful operations before you call spliterator; but they don't do that now and simply treat every operation as stateful, thus wrapping it into a WrappingSpliterator.
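A quick way to see that "wrap everything" behaviour for yourself (a small experiment of mine; the printed class names are OpenJDK implementation details and may differ between versions):

import java.util.stream.Stream;

public class WrapDemo {
    public static void main(String[] args) {
        // No intermediate operation: the source spliterator is handed out directly.
        System.out.println(Stream.of(1, 2, 3).spliterator().getClass());

        // Any intermediate operation, even a stateless map, currently gets wrapped
        // into a StreamSpliterators$WrappingSpliterator.
        System.out.println(Stream.of(1, 2, 3).map(x -> x).spliterator().getClass());
    }
}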

Rx Java subscribe only to one item emitted

I have a BehaviorSubject data that contains actual data (or maybe nothing, if nothing has been emitted to it). I want to subscribe to only one item it emits, i.e. either the current observed value or the first one to be passed to it from somewhere else. I'm currently doing it the following way:
Subscription firstItemSubscription = data.subscribe(item -> {
    firstItemSubscription.unsubscribe();
    processItem(item);
});
Is there any operator or transformer that I could use instead? Or probably there is completely different, more Rx approach that would allow me to do the thing I want to?
Yes, you just need to use take(1):
Observable observable = // some observable
observable.take(1).subscribe(/* do your thing */);
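A short sketch of how that looks with the question's BehaviorSubject (RxJava 1.x; String stands in for whatever item type the subject actually carries, and processItem is the question's own method):

BehaviorSubject<String> data = BehaviorSubject.create();

// Emits either the currently cached value or the next one pushed into the subject,
// then completes and unsubscribes on its own - no manual unsubscribe() needed.
data.take(1).subscribe(item -> processItem(item));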

Why does my RxJava Observable not emit or complete unless it's blocking?

Background
I have a number of RxJava Observables (either generated from Jersey clients, or stubs using Observable.just(someObject)). All of them should emit exactly one value. I have a component test that mocks out all the Jersey clients and uses Observable.just(someObject), and I see the same behaviour there as when running the production code.
I have several classes that act upon these observables, perform some calculations (& some side-effects - I might make them direct return values later) and return empty void observables.
At one point, in one such class, I'm trying to zip several of my source observables up and then map them - something like the below:
public Observable<Void> doCalculation() {
    return Observable.zip(
            getObservable1(),
            getObservable2(),
            getObservable3(),
            UnifyingObject::new
    ).concatMap(unifyingObject -> unifyingObject.processToNewObservable());
}

// in UnifyingObject
public Observable<Void> processToNewObservable() {
    // ... do some calculation ...
    return Observable.empty();
}
The calculating classes are then all combined and waited on:
// Wait for rule computations to complete
List<Observable<Void>> calculations = ...;
Observable.zip(calculations, results -> results)
.toBlocking().lastOrDefault(null);
The problem
The trouble is, processToNewObservable() is never being executed. By process of elimination, I can see it's getObservable1() that's the trouble - if I replace it with Observable.just(null), everything's executed as I'd imagine (but with a null value where I want a real one).
To reiterate, getObservable1() returns an Observable from a Jersey client in production code, but that client is a Mockito mock returning Observable.just(someValue) in my test.
Investigation
If I convert getObservable1() to blocking, then wrap the first value in just(), again, everything executes as I'd imagine (but I don't want to introduce the blocking step):
Observable.zip(
        Observable.just(getObservable1().toBlocking().first()),
        getObservable2(),
        getObservable3(),
        UnifyingObject::new
).concatMap(unifyingObject -> unifyingObject.processToNewObservable())
My first thought was that perhaps something else was consuming the value emitted from my observable, and zip was seeing that it was already complete, thus determining that the result of zipping them should be an empty observable. I've tried adding .cache() onto every observable source I can think of that is related, but that hasn't altered the behaviour.
I've also tried adding next / error / complete / finally handlers on getObservable1 (without converting it to blocking) just before the zip, but none of them executed either:
getObservable1()
        .doOnNext(...)
        .doOnCompleted(...)
        .doOnError(...)
        .finallyDo(...);

Observable.zip(
        getObservable1(),
        getObservable2(),
        getObservable3(),
        UnifyingObject::new
).concatMap(unifyingObject -> unifyingObject.processToNewObservable())
The question
I'm very new to RxJava, so I'm pretty sure I'm missing something fundamental. The question is: what stupid thing could I be doing? If that's not obvious from what I've said so far, what can I do to help diagnose the issue?
The Observable must emit to start the chain. You have to think of your pipeline as a declaration of what will happen when the Observable emits.
You didn't share what was actually being observed, but Observable.just() causes the Observable to emit the wrapped object immediately.
Based on the response in the comment, either one of the getObservable calls doesn't return any value but just completes, or the Mockito mocking does something wrong. The following standalone example works for me. Could you check it and start slowly mutating it to see where things break?
Observable.zip(
        Observable.just(1),
        Observable.just(2),
        Observable.just(3),
        (a, b, c) -> new Integer[] { a, b, c })
        .concatMap(a -> Observable.from(a))
        .subscribe(System.out::println);
Note: I didn't find my answer here very satisfying, so I dug in a bit further and found a much smaller reproduction case, so I've asked a new question here: Why does my RxJava Observable emit only to the first consumer?
I've figured out at least part of my troubles (and, apologies to all who tried to answer, I don't think you stood much of a chance given my explanation).
The various classes which perform these calculations were all returning Observable.empty() (as per processToNewObservable() in my original example). As far as I can tell, Observable.zip() doesn't subscribe to the Nth observable it's zipping until the N-1th observable has emitted a value.
My original example claimed it was getObservable1() that was misbehaving - that was actually slightly inaccurate; it was a later observable in the parameter list. As I understand it, the reason that making it blocking and then turning the value into an Observable again worked is that making it blocking and calling first forced its execution, and I got the side effects I wanted.
If I change all my calculating classes to return Observable.just(null) instead, everything works: the final zip() of all the calculation classes' observables works through them all, so all the expected side effects happen.
Returning a null Void seems like I'm definitely Doing Something Wrong from a design point of view, but at least this particular question is answered.
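For anyone hitting the same thing, here is a minimal standalone illustration of that behaviour (my own snippet, RxJava 1.x syntax): when any zipped source completes without emitting, the zip function and everything downstream never run; the zipped Observable simply completes.

Observable.zip(
        Observable.just(1),
        Observable.<Integer>empty(),  // completes without ever emitting
        (a, b) -> a + b)              // never invoked
        .subscribe(
                value -> System.out.println("next: " + value),   // never called
                error -> System.out.println("error: " + error),
                () -> System.out.println("completed"));          // the only output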

Mutable parameters in Java 8 Streams

Looking at this question: How to dynamically do filtering in Java 8?
The issue is to truncate a stream after a filter has been executed. I can't use limit because I don't know how long the list is after the filter. So, could we count the elements after the filter?
So I thought I could create a class that counts elements while passing the stream through a map. The code is in this answer.
I created a class that counts but leaves the elements unaltered. I use a Function here to avoid the lambdas I used in the other answer:
class DoNothingButCount<T> implements Function<T, T> {
    AtomicInteger i;

    public DoNothingButCount() {
        i = new AtomicInteger(0);
    }

    public T apply(T p) {
        i.incrementAndGet();
        return p;
    }
}
So my Stream was finally:
persons.stream()
        .filter(u -> u.size > 12)
        .filter(u -> u.weitght > 12)
        .map(counter)
        .sorted((p1, p2) -> p1.age - p2.age)
        .collect(Collectors.toList())
        .stream()
        .limit((int) (counter.i.intValue() * 0.5))
        .sorted((p1, p2) -> p2.length - p1.length)
        .limit((int) (counter.i.intValue() * 0.5 * 0.2))
        .forEach((p) -> System.out.println(p));
But my question is about another part of my example.
collect(Collectors.toList()).stream().
If I remove that line, the consequence is that the counter is ZERO when I try to execute limit. I am somehow cheating the "effectively final" requirement by using a mutable object.
I may be wrong, but I understand that the stream is built first, so if we use mutable objects to pass parameters to any of the steps in the stream, their values will be taken when the stream is created.
My question is, if my assumption is right, why is this needed? The stream (if non parallel) could be pass sequentially through all the steps (filter, map..) so this limitation is not needed.
Short answer
My question is, if my assumption is right, why is this needed? The
stream (if non parallel) could be pass sequentially through all the
steps (filter, map..) so this limitation is not needed.
As you already know, for parallel streams, this sounds pretty obvious: this limitation is needed because otherwise the result would be non deterministic.
Regarding non-parallel streams, it is not possible because of their current design: each item is only visited once. If streams did work as you suggest, they would do each step on the whole collection before going to the next step, which would probably have an impact on performance, I think. I suspect that's why the language designers made that decision.
Why it technically does not work without collect
You already know that, but here is the explanation for other readers.
From the docs:
Streams are lazy; computation on the source data is only performed
when the terminal operation is initiated, and source elements are
consumed only as needed.
Every intermediate operation of Stream, such as filter() or limit(), is actually just some kind of setter that initializes the stream's options.
When you call a terminal operation, such as forEach(), collect() or count(), that's when the computation happens, processing items following the pipeline previously built.
This is why limit()'s argument is evaluated before a single item has gone through the first step of the stream. That's why you need to end the stream with a terminal operation and start a new one, now that you know the value to pass to limit().
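A hedged sketch of that two-pass idea, reusing the question's persons example (Person stands for whatever the element type is, with the fields used in the question): finish the first pipeline with a terminal operation so the size is known, then stream again with a real limit().

// First pass: a terminal operation, so the filtered size is now known.
List<Person> filtered = persons.stream()
        .filter(u -> u.size > 12)
        .filter(u -> u.weitght > 12)
        .collect(Collectors.toList());

// Second pass: limit() can use the known size instead of a mutable counter.
filtered.stream()
        .sorted((p1, p2) -> p1.age - p2.age)
        .limit((long) (filtered.size() * 0.5))
        .forEach(System.out::println);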
More detailed answer about why not allow it for parallel streams
Let your stream pipeline be step X > step Y > step Z.
We want parallel treatment of our items. Therefore, if we allow step Y's behavior to depend on the items that already went through X, then Y is non deterministic. This is because at the moment an item arrives at step Y, the set of items that have already gone through X won't be the same across multiple executions (because of the threading).
More detailed answer about why not allow it for non-parallel streams
A stream, by definition, is used to process the items in a flow. You could think of a non-parallel stream as follows: one single item goes through all the steps, then the next one goes through all the steps, etc. In fact, the doc says it all:
The elements of a stream are only visited once during the life of a
stream. Like an Iterator, a new stream must be generated to revisit
the same elements of the source.
If streams didn't work like this, they wouldn't be any better than just doing each step on the whole collection before going to the next step. That would actually allow mutable parameters in non-parallel streams, but it would probably have a performance impact (because we would iterate multiple times over the collection). Anyway, their current behavior does not allow what you want.
