How to Concurrently Process Lines of Text with RxJava - java

I was wondering how I could concurrently process lines of text with RxJava. Right now, what I have is an observable from an entry set, and a subscriber that on each entry, processes the entry in the onNext call. The subscriber subscribes to the observable with this line of code
obs.observeOn(Schedulers.io()).subscribe(sub);
But when I run it, it runs as slowly as the sequential version, and seems to be processing it sequentially. How would I make this concurrent?

Your observeOn(Schedulers.io()) call means that all emissions will be observed on that one thread. You want to get them onto their own threads.
Here I use flatMap to create a new observable for each item emitted from the source. Inside the mapping function I have to defer the processing work until subscription, else the entire chain is blocked while processing completes. I also have to ensure that subscription happens on a new thread via subscribeOn.
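import java.util.Random;
import rx.Observable;
import rx.functions.Action1;
import rx.functions.Func0;
import rx.functions.Func1;
import rx.schedulers.Schedulers;

final Random r = new Random();
Observable.from(new String[]{"First", "Second", "Third", "Fourth", "Fifth"})
    .flatMap(new Func1<String, Observable<String>>() {
        @Override
        public Observable<String> call(final String s) {
            // defer postpones the work until subscription; subscribeOn moves that subscription to a new thread
            return Observable.defer(new Func0<Observable<String>>() {
                @Override
                public Observable<String> call() {
                    try {
                        Thread.sleep(r.nextInt(1000)); // simulated processing time
                    } catch (InterruptedException e) {
                        return Observable.error(e);
                    }
                    return Observable.just(s);
                }
            }).subscribeOn(Schedulers.newThread());
        }
    })
    .subscribe(new Action1<String>() {
        @Override
        public void call(String s) {
            System.out.println("Observed " + s + " on thread " + Thread.currentThread().getId());
        }
    });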
This gives me output like the following (note that the items arrive out of order and on different threads, i.e., they were processed in parallel):
Observed Fourth on thread 17
Observed Second on thread 15
Observed Fifth on thread 18
Observed First on thread 14
Observed Third on thread 16

Related

Has CompletableFuture.allOf() any advantage over a loop with CompletableFuture.join() when just waiting for completion?

I am making multiple async calls to my database. I store all those async calls on a List<CompletableFuture<X>> list. I want to collect all the results together, so I need to wait for all of those calls to complete.
One way is to create a CompletableFuture.allOf(list.toArray(...))...
Another way is to use: list.stream.map(cf -> cf.join())...
I was just wondering if there are any advantages of creating the global CompletableFuture and waiting for it to complete (when all the individual CompletableFuture complete) over directly waiting for the individual CompletableFutures to complete.
The main thread gets blocked either way.
static CompletableFuture<Void> getFailingCF() {
return CompletableFuture.runAsync(() -> {
System.out.println("getFailingCF :: Started getFailingCF.. ");
throw new RuntimeException("getFailingCF:: Failed");
});
}
static CompletableFuture<Void> getOkCF() {
return CompletableFuture.runAsync(() -> {
System.out.println("getOkCF :: Started getOkCF.. ");
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(3));
System.out.println("getOkCF :: Completed getOkCF.. ");
});
}
public static void main(String[] args) {
List<CompletableFuture<Void>> futures = new ArrayList<>();
futures.add(getFailingCF());
futures.add(getOkCF());
// using CompletableFuture.allOf
var allOfCF = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
allOfCF.join();
// invoking join on individual CF
futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
}
In the code snippet above, the difference lies in exception handling: the future returned by CompletableFuture.allOf(..) completes only after every CompletableFuture has completed, so an exception thrown by any one of them is captured (wrapped) while the rest of the tasks continue their execution.
The list.stream.map(cf -> cf.join())... way throws as soon as it reaches a failed future. If that exception is not caught, main terminates, and because the default executor's worker threads are typically daemon threads, the JVM can exit while the remaining tasks are still running.
Note that invoking join() on allOf throws the wrapped exception, too, and will also terminate the app if it is not caught. But by this time, unlike with list.stream.map(cf -> cf.join())..., the rest of the tasks have completed their processing.
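To make the timing difference concrete, here is a minimal sketch. It assumes an explicit two-thread pool and a CountDownLatch that holds the slow task open; both are my additions so the behaviour is deterministic, and FailureTiming and observe are hypothetical names, not part of the original snippet.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FailureTiming {
    /** Returns {was allOf done while the slow task was still running?, was the slow task done when allOf.join() threw?}. */
    static boolean[] observe() {
        ExecutorService pool = Executors.newFixedThreadPool(2); // assumption: explicit pool instead of the default one
        CountDownLatch gate = new CountDownLatch(1);            // keeps the slow task running until we release it
        try {
            CompletableFuture<Void> failing = CompletableFuture.runAsync(() -> {
                throw new IllegalStateException("failed");
            }, pool);
            CompletableFuture<Void> slow = CompletableFuture.runAsync(() -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, pool);
            CompletableFuture<Void> all = CompletableFuture.allOf(failing, slow);

            try {
                failing.join(); // joining the failed future surfaces the exception right away
            } catch (CompletionException expected) {
                // the slow task is still waiting on the gate at this point
            }
            boolean allDoneEarly = all.isDone(); // allOf has not completed yet

            gate.countDown(); // let the slow task finish
            boolean slowDoneWhenAllThrew = false;
            try {
                all.join();
            } catch (CompletionException expected) {
                slowDoneWhenAllThrew = slow.isDone(); // allOf only fails after every future is done
            }
            return new boolean[] {allDoneEarly, slowDoneWhenAllThrew};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("allOf done early: " + r[0] + ", slow done when allOf threw: " + r[1]);
    }
}
```

The first flag comes back false and the second true: the individual join() reports the failure while the other task is still running, whereas allOf reports it only once nothing is left running.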
allOfCF.whenComplete(..) is one of the graceful ways to handle the execution result (normal or exceptional) of all the CFs:
allOfCF.whenComplete((v, ex) -> {
System.out.println("In whenComplete...");
System.out.println("----------- Exception Status ------------");
System.out.println(" 1: " + futures.get(0).isCompletedExceptionally());
System.out.println(" 2: " + futures.get(1).isCompletedExceptionally());
});
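Along the same lines, if you actually need the values rather than just the completion status, a common idiom is to chain thenApply onto the allOf future and join each element there. This is only a sketch; AllOfCollect and collectAll are hypothetical names of my own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class AllOfCollect {
    // Returns a future that completes with every result once all inputs have completed normally.
    static <T> CompletableFuture<List<T>> collectAll(List<CompletableFuture<T>> futures) {
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                // join() does not block here: thenApply runs only after every future has completed
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> calls = Arrays.asList(
                CompletableFuture.supplyAsync(() -> 1),
                CompletableFuture.supplyAsync(() -> 2));
        System.out.println(collectAll(calls).join()); // prints [1, 2]
    }
}
```

If any input fails, the returned future completes exceptionally instead, so the whenComplete handling shown above still applies to it.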
In the list.stream.map(cf -> cf.join())... way, one needs to wrap the join() call in try/catch.
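A minimal sketch of that try/catch variant, assuming you want to keep the successful results and record the failures separately (JoinEach and joinAll are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class JoinEach {
    // Joins every future in list order, keeping successes and recording the cause of each failure.
    static <T> List<T> joinAll(List<CompletableFuture<T>> futures, List<Throwable> failures) {
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            try {
                results.add(future.join());
            } catch (CompletionException e) {
                failures.add(e.getCause()); // unwrap the original exception
            }
            // note: a cancelled future throws CancellationException, which is not handled in this sketch
        }
        return results;
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("db call failed"));
        List<Throwable> failures = new ArrayList<>();
        List<String> ok = joinAll(Arrays.asList(CompletableFuture.completedFuture("row"), failed), failures);
        System.out.println(ok + " / " + failures.size() + " failure(s)");
    }
}
```

Because each join() is guarded, one failed call no longer prevents the loop from waiting for the remaining futures.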

Thread execution of value emitting code and value receiving code in RxJava

I have following code:
private static void log(Object msg) {
System.out.println(
Thread.currentThread().getName() +
": " + msg);
}
Observable<Integer> naturalNumbers = Observable.create(emitter -> {
log("Invoked"); // on main thread
Runnable r = () -> {
log("Invoked on another thread");
int i = 0;
while(!emitter.isDisposed()) {
log("Emitting "+ i);
emitter.onNext(i);
i += 1;
}
};
new Thread(r).start();
});
Disposable disposable = naturalNumbers.subscribe(i -> log("Received "+i));
So here we have 2 important lambda expressions. The first is the one we pass to Observable.create; the second is the callback we pass to Observable.subscribe(). In the first lambda, we create a new thread and then emit values on that thread. In the second lambda, we have the code that receives the values emitted by the first. I observe that both pieces of code are executed on the same thread.
Thread-0: Invoked on another thread
Thread-0: Emitting 0
Thread-0: Received 0
Thread-0: Emitting 1
Thread-0: Received 1
Thread-0: Emitting 2
Thread-0: Received 2
Why is it so? Does RxJava by default run code emitting values(observable) and the code receiving values(observer) on same thread?
Let's see, what happens, if you use a Thread to execute a runnable:
Test
@Test
void threadTest() throws Exception {
log("main");
CountDownLatch countDownLatch = new CountDownLatch(1);
new Thread(
() -> {
log("thread");
countDownLatch.countDown();
})
.start();
countDownLatch.await();
}
Output
main: main
Thread-0: thread
It seems, that the main entry point is called from main thread and the newly created Thread is called Thread-0.
Why is it so? Does RxJava by default run code emitting values(observable) and the code receiving values(observer) on same thread?
By default RxJava is single-threaded. Therefore the producer, unless configured otherwise via observeOn, subscribeOn, or its own threading, will emit values on the consumer's (subscriber's) thread. This is because RxJava runs everything on the subscribing thread's stack by default.
Example 2
@Test
void fdskfkjsj() throws Exception {
log("main");
Observable<Integer> naturalNumbers =
Observable.create(
emitter -> {
log("Invoked"); // on main thread
Runnable r =
() -> {
log("Invoked on another thread");
int i = 0;
while (!emitter.isDisposed()) {
log("Emitting " + i);
emitter.onNext(i);
i += 1;
}
};
new Thread(r).start();
});
Disposable disposable = naturalNumbers.subscribe(i -> log("Received " + i));
Thread.sleep(100);
}
Output2
main: main
main: Invoked
Thread-0: Invoked on another thread
Thread-0: Emitting 0
Thread-0: Received 0
Thread-0: Emitting 1
In your example it is apparent that the main method is called from the main thread. Furthermore, the subscribeActual call also runs on the calling thread (main). But the Observable#create lambda calls onNext from the newly created thread Thread-0. A value is pushed to the subscriber on whichever thread calls onNext. In this case that thread is Thread-0, because it is the one that calls onNext on the downstream subscriber.
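The same effect can be reproduced without RxJava at all. The following is a plain-Java stand-in, not RxJava code: emitFromNewThread is a hypothetical helper that plays the role of the create lambda, and the Consumer plays the role of the subscriber callback.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class CallbackThreadDemo {
    // Hypothetical stand-in for a source: calls the callback from a freshly started thread.
    static void emitFromNewThread(String threadName, Consumer<Integer> onNext) {
        Thread emitter = new Thread(() -> onNext.accept(0), threadName);
        emitter.start();
        try {
            emitter.join(); // wait so the caller can safely read what the callback recorded
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns the name of the thread on which the callback actually ran.
    static String receivingThreadName() {
        AtomicReference<String> seen = new AtomicReference<>();
        emitFromNewThread("emitter-thread", value -> seen.set(Thread.currentThread().getName()));
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(receivingThreadName()); // prints emitter-thread
    }
}
```

The callback was defined on the main thread, yet it runs on emitter-thread, because a callback simply executes on whichever thread invokes it. Without a Scheduler in between, onNext behaves the same way.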
How to separate producer from consumer?
Use observeOn/ subscribeOn operators in order to handle concurrency in RxJava.
Should I use low-level Thread constructs with RxJava?
No, you should not use new Thread to separate the producer from the consumer. It is quite easy to break the contract that onNext must not be called concurrently (interleaved). This is why RxJava provides a construct called Scheduler, with Workers, to help avoid such mistakes.
Note:
I think this article describes it quite well: http://introtorx.com/Content/v1.0.10621.0/15_SchedulingAndThreading.html . Please note that it covers Rx.NET, but the principle is much the same. If you want to read about concurrency with RxJava, you could also look into David's blog (https://akarnokd.blogspot.com/2015/05/schedulers-part-1.html) or read this book (Reactive Programming with RxJava https://www.oreilly.com/library/view/reactive-programming-with/9781491931646/)

How to create blocking backpressure with rxjava Flowables?

I have a Flowable that we are returning in a function that will continually read from a database and add it to a Flowable.
public Flowable<String> scan() {
Flowable<String> flow = Flowable.create((FlowableOnSubscribe<String>) emitter -> {
Result result = new Result();
while (!result.hasData()) {
result = request.query(skip, limit);
partialResult.getResult()
.getFeatures().forEach(feature -> emmitter.emit(feature));
}
}, BackpressureStrategy.BUFFER)
.subscribeOn(Schedulers.io());
return flow;
}
Then I have another object that can call this method.
myObj.scan()
.parallel()
.runOn(Schedulers.computation())
.map(feature -> {
//Heavy Computation
})
.sequential()
.blockingSubscribe(msg -> {
logger.debug("Successfully processed " + msg);
}, (e) -> {
logger.error("Failed to process features because of error with scan", e);
});
My heavy computation section could potentially take a very long time. So long in fact that there is a good chance that the database requests will load the whole database into memory before the consumer finishes the first couple entries.
I have read up on backpressure with rxjava but the only 4 options essentially make me drop data or replace it with the last.
Is there a way to make it so that when I call emmitter.emit(feature) the call blocks until there is more room in the Flowable?
I.e., I want to treat the Flowable as a blocking queue where a push will sleep while the queue is at capacity.
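To pin down the semantics being asked for, here is a minimal sketch of such a bounded hand-off using only java.util.concurrent. It is not an RxJava operator: BoundedHandoff, run, and the DONE marker are hypothetical names, and the generated strings stand in for the database features. Within RxJava 2 itself, Flowable.generate is one operator usually pointed to for this kind of source, since it invokes its generator only in response to downstream requests; it is not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedHandoff {
    private static final String DONE = "<done>"; // hypothetical end-of-data marker

    // Pushes `total` items through a queue with `capacity` slots; returns {items consumed, largest backlog seen}.
    static int[] run(int total, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    queue.put("feature-" + i); // blocks while the queue is full
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int consumed = 0;
        int maxBacklog = 0;
        try {
            while (true) {
                maxBacklog = Math.max(maxBacklog, queue.size());
                String item = queue.take(); // blocks while the queue is empty
                if (DONE.equals(item)) {
                    break;
                }
                consumed++; // the heavy computation would go here
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {consumed, maxBacklog};
    }

    public static void main(String[] args) {
        int[] r = run(1000, 8);
        System.out.println("consumed=" + r[0] + ", max backlog=" + r[1]);
    }
}
```

However slow the consumer is, the producer can never get more than capacity items ahead, which is the property BackpressureStrategy.BUFFER does not give you.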

RxJava: subscribeOn and observeOn not working as expected

Maybe I just don't really understand the inner workings of subscribeOn and observeOn, but I recently encountered something really odd. I was under the impression that subscribeOn determines the Scheduler on which processing initially starts (especially when we have, e.g., a lot of maps that change the stream of data), and that observeOn can then be used anywhere between those maps to change Schedulers when appropriate (first do networking, then computation, finally switch to the UI thread).
However, I noticed that when not directly chaining those calls to my Observable or Single, it won't work. Here's a minimal working Example JUnit Test:
import org.junit.Test;
import rx.Single;
import rx.schedulers.Schedulers;
public class SubscribeOnTest {
@Test public void not_working_as_expected() throws Exception {
Single<Integer> single = Single.<Integer>create(singleSubscriber -> {
System.out.println("Doing some computation on thread " + Thread.currentThread().getName());
int i = 1;
singleSubscriber.onSuccess(i);
});
single.subscribeOn(Schedulers.computation()).observeOn(Schedulers.io());
single.subscribe(integer -> {
System.out.println("Observing on thread " + Thread.currentThread().getName());
});
System.out.println("Doing test on thread " + Thread.currentThread().getName());
Thread.sleep(1000);
}
@Test public void working_as_expected() throws Exception {
Single<Integer> single = Single.<Integer>create(singleSubscriber -> {
System.out.println("Doing some computation on thread " + Thread.currentThread().getName());
int i = 1;
singleSubscriber.onSuccess(i);
}).subscribeOn(Schedulers.computation()).observeOn(Schedulers.io());
single.subscribe(integer -> {
System.out.println("Observing on thread " + Thread.currentThread().getName());
});
System.out.println("Doing test on thread " + Thread.currentThread().getName());
Thread.sleep(1000);
}
}
The test not_working_as_expected() gives me following output
Doing some computation on thread main
Observing on thread main
Doing test on thread main
whereas working_as_expected() gives me
Doing some computation on thread RxComputationScheduler-1
Doing test on thread main
Observing on thread RxIoScheduler-2
The only difference being that in the first test, after the creation of the single there is a semicolon and only then the schedulers are applied, and in the working example the method calls are directly chained to the creation of the Single. But shouldn't that be irrelevant?
All "modifications" performed by operators are immutable, meaning that they return a new stream that receives notifications in an altered manner from the previous one. Since you just called subscribeOn and observeOn operators and didn't store their result, the subscription made later is on the unaltered stream.
One side note: I didn't quite understand your definition of subscribeOn behavior. If you meant that map operators are somehow affected by it, this is not true. subscribeOn defines a Scheduler, on which the OnSubscribe function is called. In your case the function you pass to the create() method. On the other hand, observeOn defines the Scheduler on which each successive stream (streams returned by applied operators) is handling emissions coming from an upstream.
.subscribeOn(*) returns a new instance of the stream (here, a Single), but in the first test you ignore that result and then subscribe to the original Single, which by default subscribes on the calling thread, here main.
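The pitfall is the same one you hit with any immutable type. The sketch below uses a hypothetical FakeSingle class that only remembers a scheduler name. It is a plain-Java illustration of the "operator returns a new object" behaviour, not RxJava's implementation.

```java
final class ImmutableChainDemo {
    // Hypothetical immutable stand-in for a stream; it only remembers which scheduler it was given.
    static final class FakeSingle {
        final String scheduler;

        FakeSingle(String scheduler) {
            this.scheduler = scheduler;
        }

        // Like an Rx operator: leaves this object untouched and returns a NEW one.
        FakeSingle subscribeOn(String name) {
            return new FakeSingle(name);
        }
    }

    static String discarded() {
        FakeSingle single = new FakeSingle("main");
        single.subscribeOn("computation"); // result thrown away, as in not_working_as_expected()
        return single.scheduler;           // still "main"
    }

    static String kept() {
        FakeSingle single = new FakeSingle("main").subscribeOn("computation"); // result kept
        return single.scheduler;           // "computation"
    }

    public static void main(String[] args) {
        System.out.println(discarded() + " vs " + kept()); // prints main vs computation
    }
}
```

In the failing test, the equivalent fix is to assign the result back, e.g. single = single.subscribeOn(Schedulers.computation()).observeOn(Schedulers.io()); before calling subscribe.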

Proper termination of a stuck Couchbase Observable

I'm trying to delete a batch of couchbase documents in rapid fashion according to some constraint (or update the document if the constraint isn't satisfied). Each deletion is dubbed a "parcel" according to my terminology.
When executing, I run into a very strange behavior - the thread in charge of this task starts working as expected for a few iterations (at best). After this "grace period", couchbase gets "stuck" and the Observable doesn't call any of its Subscriber's methods (onNext, onComplete, onError) within the defined period of 30 seconds.
When the latch timeout occurs (see implementation below), the method returns but the Observable keeps executing (I noticed that when it kept printing debug messages when stopped with a breakpoint outside the scope of this method).
I suspect couchbase is stuck because after a few seconds, many Observables are left in some kind of a "ghost" state - alive and reporting to their Subscriber, which in turn have nothing to do because the method in which they were created has already finished, eventually leading to java.lang.OutOfMemoryError: GC overhead limit exceeded.
I don't know if what I claim here makes sense, but I can't think of another reason for this behavior.
How should I properly terminate an Observable upon timeout? Should I? Any other way around?
public List<InfoParcel> upsertParcels(final Collection<InfoParcel> parcels) {
final CountDownLatch latch = new CountDownLatch(parcels.size());
final List<JsonDocument> docRetList = new LinkedList<JsonDocument>();
Observable<JsonDocument> obs = Observable
.from(parcels)
.flatMap(parcel ->
Observable.defer(() ->
{
return bucket.async().get(parcel.key).firstOrDefault(null);
})
.map(doc -> {
// In-memory manipulation of the document
return updateDocs(doc, parcel);
})
.flatMap(doc -> {
boolean shouldDelete = ... // Decide by inner logic
if (shouldDelete) {
if (doc.cas() == 0) {
return Observable.just(doc);
}
return bucket.async().remove(doc);
}
return (doc.cas() == 0 ? bucket.async().insert(doc) : bucket.async().replace(doc));
})
);
obs.subscribe(new Subscriber<JsonDocument>() {
@Override
public void onNext(JsonDocument doc) {
docRetList.add(doc);
latch.countDown();
}
@Override
public void onCompleted() {
// Due to a bug in RxJava, onError() / retryWhen() does not intercept exceptions thrown from within the map/flatMap methods.
// Therefore, we need to recalculate the "conflicted" parcels and send them for update again.
while(latch.getCount() > 0) {
latch.countDown();
}
}
@Override
public void onError(Throwable e) {
// Same reason as above
while (latch.getCount() > 0) {
latch.countDown();
}
}
});
latch.await(30, TimeUnit.SECONDS);
// Recalculating remaining failed parcels and returning them for another cycle of this method (there's a loop outside)
}
I think this is indeed due to the fact that using a countdown latch doesn't signal the source that the flow of data processing should stop.
You could lean more on RxJava here by using toList().timeout(30, TimeUnit.SECONDS).toBlocking().single() instead of collecting into an external list (which is unsynchronized and thus unsafe) and using the CountDownLatch.
This will block until a List of your documents is returned.
When you create your Couchbase environment in code, set computationPoolSize to something large. When the Couchbase client runs out of threads while using async, it just stops working and won't ever call the callback.
