Creating stream of elements from collection


I am using JUnit 5 dynamic tests.
My intention is to create a stream of elements from the collection and pass them on to a test in JUnit 5.
However, with this code I am only able to run 1000 records. How do I make this work seamlessly and non-blocking?
MongoCollection<Document> collection = mydatabase.getCollection("mycoll");
final List<Document> cache = Collections.synchronizedList(new ArrayList<Document>());
final CountDownLatch latch = new CountDownLatch(1);
final AtomicReference<Throwable> th = new AtomicReference<>();
FindIterable<Document> f = collection.find().batchSize(1000);
f.batchCursor(new SingleResultCallback<AsyncBatchCursor<Document>>() {
    @Override
    public void onResult(AsyncBatchCursor<Document> t, Throwable thrwbl) {
        t.next(new SingleResultCallback<List<Document>>() {
            @Override
            public void onResult(List<Document> t, Throwable thrwbl) {
                if (thrwbl != null) {
                    th.set(thrwbl);
                } else {
                    cache.addAll(t);
                }
                latch.countDown();
            }
        });
    }
});
latch.await();
return cache.stream().map(doc -> process(doc));
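In the snippet above, next() is called exactly once, so only the first batch (1000 documents, the batchSize) ever reaches the cache. The usual fix is to request the next batch from inside the callback until the cursor reports that it is exhausted. The sketch below shows that pattern without MongoDB: BatchCursor, drainAll and fakeCursor are hypothetical names, and the rule that a null batch means "exhausted" is an assumption meant to mirror the async driver's AsyncBatchCursor, so check it against the driver documentation. Note that this only removes the 1000-record limit; it still collects everything in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;

public class DrainCursor {

    // Hypothetical stand-in for a callback-based batch cursor;
    // a null batch is assumed to mean "no more data".
    interface BatchCursor<T> {
        void next(BiConsumer<List<T>, Throwable> callback);
    }

    // Requests one batch and, from inside the callback, requests the next one until exhaustion.
    static <T> void drain(BatchCursor<T> cursor, List<T> sink,
                          AtomicReference<Throwable> failure, CountDownLatch done) {
        cursor.next((batch, error) -> {
            if (error != null) {
                failure.set(error);
                done.countDown();
            } else if (batch == null) {
                done.countDown(); // cursor exhausted
            } else {
                sink.addAll(batch);
                drain(cursor, sink, failure, done); // ask for the following batch
            }
        });
    }

    // Drains the whole cursor, waiting (with a timeout) for the final callback.
    static <T> List<T> drainAll(BatchCursor<T> cursor, long timeoutSeconds) {
        List<T> sink = Collections.synchronizedList(new ArrayList<>());
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        drain(cursor, sink, failure, done);
        try {
            if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out while draining the cursor");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException(failure.get());
        }
        return sink;
    }

    // Fake cursor serving 0..total-1 in batches of batchSize, each callback on a new thread.
    static BatchCursor<Integer> fakeCursor(int total, int batchSize) {
        int[] position = {0};
        return callback -> new Thread(() -> {
            if (position[0] >= total) {
                callback.accept(null, null);
                return;
            }
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < batchSize && position[0] < total) {
                batch.add(position[0]++);
            }
            callback.accept(batch, null);
        }).start();
    }

    public static void main(String[] args) {
        System.out.println(drainAll(fakeCursor(2500, 1000), 10).size()); // 2500
    }
}
```

Because each callback arrives on a fresh thread, re-requesting from inside the callback does not grow the stack.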
Updated code:
@ParameterizedTest
@MethodSource("setUp")
void cacheTest(MyClazz myclass) throws Exception {
    assertTrue(doTest(myclass));
}

public static MongoClient getMongoClient() {
    // get client here
}

private static Stream<MyClazz> setUp() throws Exception {
    MongoDatabase mydatabase = getMongoClient().getDatabase("test");
    List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
    CountDownLatch latch = new CountDownLatch(1);
    List<MyClazz> list = Collections.synchronizedList(new ArrayList<>());
    mydatabase.getCollection("testcollection").find()
        .toObservable().subscribe(
            document -> {
                list.add(process(document));
            },
            throwable -> {
                failures.add(throwable);
                latch.countDown();
            },
            () -> {
                latch.countDown();
            });
    latch.await();
    return list.stream();
}

public boolean doTest(MyClazz myclass) {
    // processing goes here
}

// static, because it is called from the static setUp() method
public static MyClazz process(Document doc) {
    // doc gets converted to MyClazz
    return myClazz;
}
Even now, I see that all the data is loaded first, and only then does the unit testing happen.
I think this is because of latch.await(). However, if I remove it, there is a chance that no test cases run at all, because the DB could still be loading the collection.
My use case: I have a million records in Mongo and am running a sort of integration test over them. It wouldn't be feasible to load all of them into memory, hence I am attempting the streaming solution.
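If the goal is for tests to start before the whole collection has arrived, without holding every document in memory, one option is to bridge the push-based callbacks into a pull-based java.util.stream.Stream through a bounded BlockingQueue. This is a minimal JDK-only sketch, not the MongoDB driver's API: the producer argument is a hypothetical stand-in for a subscription's onNext callback, the capacity bounds memory, and the producer blocks whenever the queue is full. Two assumptions to verify: that the test engine consumes the returned stream lazily, and that blocking inside a driver callback thread is acceptable (a driver with its own backpressure support may be the better lever). Errors are not forwarded here; items must be non-null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class QueueBackedStream {

    // Sentinel marking the end of the data; items themselves must be non-null.
    private static final Object END = new Object();

    // Runs 'producer' on a daemon thread. Each item it emits goes into a bounded queue
    // (the producer blocks while the queue is full); the returned Stream takes items lazily.
    static <T> Stream<T> stream(Consumer<Consumer<T>> producer, int capacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        Thread thread = new Thread(() -> {
            try {
                producer.accept(item -> put(queue, item));
            } finally {
                put(queue, END); // always signal the end, even if the producer throws
            }
        });
        thread.setDaemon(true);
        thread.start();

        Iterator<T> iterator = new Iterator<T>() {
            private Object next; // item taken from the queue but not yet returned

            @Override
            public boolean hasNext() {
                if (next == null) {
                    next = take(queue); // blocks until the producer emits something
                }
                return next != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T item = (T) next;
                next = null;
                return item;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
    }

    private static void put(BlockingQueue<Object> queue, Object item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static Object take(BlockingQueue<Object> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical producer standing in for the MongoDB subscription: emits 0..9_999.
        Stream<Integer> numbers = stream(emit -> {
            for (int i = 0; i < 10_000; i++) {
                emit.accept(i);
            }
        }, 100);
        System.out.println(numbers.mapToLong(Integer::longValue).sum()); // 49995000
    }
}
```

At most 100 items are buffered at any time, regardless of how many the producer emits in total. If a consumer abandons the stream early, the producer stays blocked on its daemon thread; a real version would need a way to cancel the subscription.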

I don't think I fully understand your use case, but given that your question is tagged with java and mongo-asyc-driver, this requirement is certainly achievable:
create a stream of elements from the collection to pass it on to test ... make this work seamlessly non-blocking
The following code:
Uses the MongoDB RxJava driver to query a collection
Creates a Rx Observable from that collection
Subscribes to that Observable
Records exceptions
Marks completion
CountDownLatch latch = new CountDownLatch(1);
List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
collection.find()
    .toObservable().subscribe(
        // on next, this is invoked for each document returned by your find call
        document -> {
            // presumably you'll want to do something here to meet this requirement: "pass it on to test in JUnit5"
            System.out.println(document);
        },
        // on error: this is a terminal event too, so release the latch here as well
        throwable -> {
            failures.add(throwable);
            latch.countDown();
        },
        // on completion
        () -> {
            latch.countDown();
        });
// await the completion (or error) event
latch.await();
Notes:
This requires use of the MongoDB RxJava driver (i.e. classes in the com.mongodb.rx.client namespace ... the org.mongodb::mongodb-driver-rx Maven artifact)
In your question you are invoking collection.find().batchSize(), which clearly indicates that you are not currently using the Rx driver (since batchSize is not really an Rx-friendly concept :)
The above code is verified with v1.4.0 of the MongoDB RxJava driver and v1.1.10 of io.reactivex::rxjava
Update 1: based on the change to your question (made after my original answer), you asked: "I see that all the data is loaded after which the unit testing happens. I think this is because of latch.await()". I think you are populating a list from the observable stream, and only after the observable is exhausted do you start invoking doTest(). This approach involves (1) streaming results from MongoDB; (2) storing those results in memory; (3) running doTest() for each stored result. If you really want to stream all the way, then you should call doTest() from within your observable's subscription. For example:
mydatabase.getCollection("testcollection").find()
    .toObservable().subscribe(
        document -> {
            doTest(process(document));
        },
        throwable -> {
            failures.add(throwable);
            latch.countDown();
        },
        () -> {
            latch.countDown();
        });
latch.await();
The above code will invoke doTest() as it receives each document from MongoDB and when the entire observable is exhausted the latch will be decremented and your code will complete.
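One caveat with calling doTest() inside the subscription: the question's test wraps it in assertTrue, and an AssertionError thrown inside onNext is raised on the driver's thread rather than on the JUnit thread. Depending on the Rx implementation it is either routed to onError or lost, and in neither case does it fail the JUnit test by itself. A JDK-only sketch of a workaround: record failures in a thread-safe list, treat onError as terminal so it also releases the latch (in Rx, onError and onCompleted are mutually exclusive), bound the wait with a timeout, and let the test thread assert on the collected failures. The subscribe method is a hypothetical stand-in for collection.find().toObservable().subscribe(...).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.IntPredicate;

public class StreamingTestSketch {

    // Hypothetical push-based source: emits the ints 0..count-1 on a background thread,
    // then signals completion (or error, if a callback throws).
    static void subscribe(int count, Consumer<Integer> onNext, Consumer<Throwable> onError, Runnable onComplete) {
        new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    onNext.accept(i);
                }
            } catch (RuntimeException e) {
                onError.accept(e);
                return;
            }
            onComplete.run();
        }).start();
    }

    // Runs 'doTest' on each document as it arrives and returns every failure seen.
    static List<Throwable> runStreamingTest(int count, IntPredicate doTest, long timeoutSeconds) {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        CountDownLatch latch = new CountDownLatch(1);
        subscribe(count,
                document -> {
                    // Record the failure instead of throwing it on the source's thread.
                    if (!doTest.test(document)) {
                        failures.add(new AssertionError("doTest failed for document " + document));
                    }
                },
                error -> {
                    failures.add(error);
                    latch.countDown(); // an error is terminal, so release the waiting thread
                },
                latch::countDown);
        try {
            if (!latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
                failures.add(new AssertionError("timed out waiting for the source to complete"));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failures.add(e);
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical check: every odd document "fails".
        List<Throwable> failures = runStreamingTest(10, document -> document % 2 == 0, 5);
        System.out.println(failures.size()); // 5
    }
}
```

In a JUnit test, the final step would then be something like asserting that the returned list is empty, which runs on the test thread and therefore does fail the test.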

Related

Retry stream in akka on failure of any stage in flow

I am using Akka Streams to process my data. I have one Source that emits element UUIDs.
The flow is as follows:
1. Fetch the element from a third-party HTTP service, which returns the complete element with its properties.
2. Retrieve the required data from that element and convert it to an object that my application understands.
3. Write the data from that object to the DB.
4. Finally, update the DB with the status of all the elements in the stream.
Now I want to add a retry mechanism to this flow so that if any stage fails, it is retried a number of times (say 3), and only if it still fails after that should the stream fail. For example, when the third-party service has issues such as HTTP 504 errors, retrying the element usually succeeds. Is there any way to achieve this in Akka?
Currently, I am maintaining a list to store all the failed element IDs, like below.
Code:
List<UUID> failedId = new CopyOnWriteArrayList<>();
Source.from(elementIdToProcess).map(f -> {
failedId.add(f);
return f;
}).via(featureFlow()).via(filterNullFeaturesFlow())
.via(mapToFeatureFlow()).via(setFeaturesAdditionalInfo())
.runForeach(features -> {
features.parallelStream().forEach(feature -> {
try {
featureCommitter.save(feature);
failedId.remove(UUID.fromString(feature.getFeatureId()));
} catch (Exception e) {
throw new RuntimeException(e);
}
});
}, materializer);

You can retry like this
source.recoverWithRetries(attempts = 2, {
case _: RuntimeException ⇒ source
})
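Or you can use a backoff strategy with RestartSource, RestartSink or RestartFlow.
All of this is documented in the Akka docs. Note that recoverWithRetries switches to the given source when the upstream fails, so passing the same source re-runs it from the beginning rather than retrying only the failed element.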

I had a similar issue and solved it with Akka's future retry pattern (Patterns.retry).
Based on your code, I would implement it this way:
@BeforeClass
public static void setUpClass() {
    system = ActorSystem.create();
    mat = ActorMaterializer.create(system);
}

@AfterClass
public static void tearDown() {
    TestKit.shutdownActorSystem(system);
    system = null;
}

@Test
public void shouldRetryArbitraryNumberOfTimes() {
    // every save fails, so the feature is tried once and then retried 3 times
    doThrow(new RuntimeException()).when(featureCommitter).save(any(Feature.class));
    TestSubscriber.Probe<List<Feature>> probe = getFeatureSource().runWith(TestSink.probe(system), mat);
    Throwable throwable = probe.request(1).expectError();
    verify(featureCommitter, timeout(5000).times(4)).save(any(Feature.class));
}

private Source<List<Feature>, NotUsed> getFeatureSource() {
    return Source.from(elementIdToProcess)
        .via(featureFlow())
        .via(filterNullFeaturesFlow())
        .via(mapToFeatureFlow())
        .via(setFeaturesAdditionalInfo())
        // parallelism 1: one group of features is saved at a time
        .mapAsync(1, features -> {
            CompletableFuture<?>[] saves = features.stream()
                .map(feature -> Patterns.retry(getCompletionStageCallable(feature), 3,
                        Duration.ofMillis(200), system.scheduler(), system.dispatcher())
                    .toCompletableFuture())
                .toArray(CompletableFuture[]::new);
            // completes once every save has finished; fails if any feature exhausted its retries
            return CompletableFuture.allOf(saves).thenApply(done -> features);
        });
}

private Callable<CompletionStage<Feature>> getCompletionStageCallable(Feature feature) {
    return () -> CompletableFuture.supplyAsync(() -> {
        try {
            featureCommitter.save(feature);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return feature;
    });
}
The main thing to take into account here is that we are not handling the retry as part of the stream but rather as a way to handle the future.
As you can see I moved the saving out of the Sink and into a mapAsync in which I set the parallelism to 1. This is so that I can use a TestSubscriber.Probe to validate the stream behavior.
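For readers not on Akka, the same "retry the future, not the stream" idea can be sketched with plain JDK futures: run the attempt, and on failure schedule another one after a delay until the retry budget is used up. This is a hypothetical helper (retry, flakySave), not Akka's implementation; the count semantics (one initial attempt plus retries more, so 3 retries means up to 4 calls) are chosen to match the times(4) verification in the test above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetrySketch {

    // Tries 'attempt' once; on failure retries up to 'retries' more times, 'delayMillis' apart.
    static <T> CompletableFuture<T> retry(Supplier<CompletionStage<T>> attempt, int retries,
                                          long delayMillis, ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        invoke(attempt).whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retries <= 0) {
                result.completeExceptionally(error); // budget exhausted: surface the last failure
            } else {
                scheduler.schedule(() -> {
                    retry(attempt, retries - 1, delayMillis, scheduler).whenComplete((v, e) -> {
                        if (e == null) {
                            result.complete(v);
                        } else {
                            result.completeExceptionally(e);
                        }
                    });
                }, delayMillis, TimeUnit.MILLISECONDS);
            }
        });
        return result;
    }

    // Turns a synchronously thrown exception into a failed stage, so it is retried like any other failure.
    private static <T> CompletionStage<T> invoke(Supplier<CompletionStage<T>> attempt) {
        try {
            return attempt.get();
        } catch (RuntimeException e) {
            CompletableFuture<T> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }

    // Hypothetical save operation: fails its first 'failures' calls, then succeeds.
    static Supplier<CompletionStage<String>> flakySave(int failures, AtomicInteger calls) {
        return () -> {
            if (calls.incrementAndGet() <= failures) {
                throw new RuntimeException("simulated HTTP 504");
            }
            return CompletableFuture.completedFuture("saved");
        };
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        String outcome = retry(flakySave(2, calls), 3, 50, scheduler).join();
        System.out.println(outcome + " after " + calls.get() + " calls"); // saved after 3 calls
        scheduler.shutdown();
    }
}
```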
I hope this helps.
Regards

Equivalent of VertX CompositeFuture in RxJava

The VertX example for when you need to query multiple asynchronous resources and use them all in a single operation is:
Future<HttpServer> httpServerFuture = Future.future();
httpServer.listen(httpServerFuture.completer());
Future<NetServer> netServerFuture = Future.future();
netServer.listen(netServerFuture.completer());
CompositeFuture.all(httpServerFuture, netServerFuture).setHandler(ar -> {
if (ar.succeeded()) {
// All servers started
} else {
// At least one server failed
}
});
We need to query two different databases and then use the results in business logic, but the flow is equivalent.
What's the VertX/RxJava equivalent?
Currently people are doing this by nesting a new .flatMap() call every time they need a new variable. I'm left feeling there must be a better way...
We don't actually need the queries to be concurrent, but we need to cache both results and pass them to the business logic at the same time somehow.

There are many ways to do this, but I've tried to pick an approach that sticks closely to your sample:
@Override
public void start(Future<Void> startFuture) throws Exception {
    final HttpServer httpServer = vertx.createHttpServer();
    final Completable initializeHttpServer = httpServer.rxListen().toCompletable();
    final NetServer netServer = vertx.createNetServer();
    final Completable initializeNetServer = netServer.rxListen().toCompletable();
    initializeHttpServer.andThen(initializeNetServer)
        .subscribe(
            () -> startFuture.complete(),     // all servers started
            error -> startFuture.fail(error)  // at least one server failed
        );
}
The rxListen() invocations are converted into Completable instances, which are then run serially upon subscription.
The subscriber's onComplete callback will be invoked when both servers are done binding to their respective ports, or...
the onError callback will be invoked if an exception occurs.
(Also, FWIW, "nesting" flatMap operations for something as trivial as this shouldn't be necessary; "chaining" such operations, however, would be idiomatic usage.)
Hope that helps!
--UPDATE--
Having read the question more carefully, I now see that you were actually asking how to handle the results of two discrete asynchronous operations.
An alternative to flatMap'ing your way to combined results would be to use the zip operator, like so:
@Override
public void start(Future<Void> startFuture) throws Exception {
    final Single<String> dbQuery1 = Single.fromCallable(() -> "db-query-result-1");
    final Single<String> dbQuery2 = Single.fromCallable(() -> "db-query-result-2");
    Single.zip(dbQuery1, dbQuery2, (result1, result2) -> {
        // handle the results from both db queries
        // (Pair stands for whatever tuple-like class you have at hand)
        return new Pair<>(result1, result2);
    })
    .subscribe(
        pair -> {
            // handle the results
            startFuture.complete();
        },
        error -> {
            // something went wrong
            startFuture.fail(error);
        }
    );
}
Per the docs, zip allows you to specify a series of reactive types (Single, Observable, etc.) along with a function that combines all the results at once; the central idea is that it will not emit anything until every source has emitted once (or more, depending on the reactive type).
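Outside Vert.x and RxJava, plain JDK futures offer the same "wait for both, then combine" shape through CompletableFuture.thenCombine. A minimal sketch, with the two queries as hypothetical stand-ins that return fixed strings; unlike the serial andThen chain above, both queries here start immediately and may run concurrently.

```java
import java.util.concurrent.CompletableFuture;

public class CombineSketch {

    // Hypothetical stand-ins for the two database queries.
    static CompletableFuture<String> dbQuery1() {
        return CompletableFuture.supplyAsync(() -> "db-query-result-1");
    }

    static CompletableFuture<String> dbQuery2() {
        return CompletableFuture.supplyAsync(() -> "db-query-result-2");
    }

    // thenCombine waits for both futures, then hands both results to the function at once.
    static CompletableFuture<String> combined() {
        return dbQuery1().thenCombine(dbQuery2(), (result1, result2) -> result1 + " + " + result2);
    }

    public static void main(String[] args) {
        System.out.println(combined().join()); // db-query-result-1 + db-query-result-2
    }
}
```

If either query fails, the combined future fails too, which plays the role of the error callback in the zip version.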

Unit test of method with CompletableFuture inside

I have a method that asynchronously calls connector.runSomeService(data) and handles the response in the method handleServiceResponse(res, node).
public void runServiceOnAllNodes(Collection<Node> nodes, Object data) {
    nodes.parallelStream().forEach(node -> {
        CompletableFuture<ResponseEntity> response = CompletableFuture
            .supplyAsync(() -> connector.runSomeService(data));
        response.exceptionally(ex -> {
            log.error("OMG...OMG!!!");
            return null;
        })
        .thenAcceptAsync(res -> handleServiceResponse(res, node));
    });
}
private void handleServiceResponse(ResponseEntity res, Node node) {
if (res.isOK) {
node.setOKStatus();
} else {
node.setFailStatus();
}
dbService.saveNode(node);
}
I tried to create a unit test, but when I verify whether the response is properly handled, the test result is non-deterministic.
@Test
public void testRunServiceOnAllNodes() {
    // given
    List<Node> nodes = Arrays.asList(node1, node2, node3);
    when(connector.runSomeService(eq(node1), eq(data))).thenReturn(ResponseEntity.ok().body("{message:OK}"));
    when(connector.runSomeService(eq(node2), eq(data))).thenReturn(ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(""));
    when(connector.runSomeService(eq(node3), eq(data))).thenThrow(new ResourceAccessException(""));
    // when
    engine.runServiceOnAllNodes(nodes, data);
    // then
    verify(connector, times(1)).runSomeService(eq(node1), eq(data));
    verify(connector, times(1)).runSomeService(eq(node2), eq(data));
    verify(connector, times(1)).runSomeService(eq(node3), eq(data));
    verifyNoMoreInteractions(connector);
    assertEquals("OK", node1.getStatus());
    assertEquals("Fail", node2.getStatus());
}
It can end with a few different results, e.g.:
Wanted but not invoked:
connector.runSomeService(node2);
However, there were other interactions with this mock:
connector.runSomeService(node1);
or
Argument(s) are different! Wanted:
connector.runSomeService(node1);
Actual invocation has different arguments:
connector.deployFileset(node2);
or sometimes it ends with success.
Clearly, the execution of connector.runSomeService() and the verification can interleave; the order of these two actions is not deterministic.
Using sleep sucks. I tried gathering all the responses and calling future.get():
// when
engine.runServiceOnAllNodes(nodes, data);
for (CompletableFuture<?> future : engine.getResponses()) {
    future.get();
}
but I'm getting an exception, and I still have the feeling that this approach also sucks, doesn't it?

I would suggest changing the runServiceOnAllNodes method to return a Future, so that your test (and, as a bonus, normal clients too) can explicitly wait for the asynchronous work to finish.
public Future<Void> runServiceOnAllNodes(Collection<Node> nodes, Object data) {
return nodes.parallelStream().map(node -> {
CompletableFuture<ResponseEntity> response = CompletableFuture
.supplyAsync(()-> connector.runSomeService(data));
return response.exceptionally(ex -> {
LOGGER.error("OMG...OMG!!!");
return null;
})
.thenAcceptAsync(res -> handleServiceResponse(res, node));
})
.reduce(CompletableFuture::allOf).orElseGet(() -> CompletableFuture.completedFuture(null));
}
In your test, it is then simply a matter of calling get() on the future prior to making assertions and verifications.
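The same idea can be sketched with the per-node futures collected into an array and passed to CompletableFuture.allOf, which some may find easier to read than reduce. This is a JDK-only illustration; Node and the connector function are hypothetical stand-ins for the question's classes. One related observation: in the question's code, exceptionally returns null, which handleServiceResponse would then dereference, which may explain the exception seen when calling get(); here the exception is mapped to a failure value instead.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class AllNodesSketch {

    // Hypothetical minimal node: just a name and a status.
    static class Node {
        final String name;
        volatile String status = "UNKNOWN";

        Node(String name) {
            this.name = name;
        }
    }

    // 'connector' is a hypothetical stand-in for connector.runSomeService; true means "OK".
    static CompletableFuture<Void> runServiceOnAllNodes(Collection<Node> nodes, Function<Node, Boolean> connector) {
        CompletableFuture<?>[] perNode = nodes.stream()
                .map(node -> CompletableFuture.supplyAsync(() -> connector.apply(node))
                        .exceptionally(ex -> false) // treat a thrown exception as a failed response
                        .thenAccept(ok -> node.status = ok ? "OK" : "Fail"))
                .toArray(CompletableFuture[]::new);
        // allOf completes once every per-node future has completed
        return CompletableFuture.allOf(perNode);
    }

    public static void main(String[] args) {
        Node node1 = new Node("node1"), node2 = new Node("node2"), node3 = new Node("node3");
        List<Node> nodes = Arrays.asList(node1, node2, node3);
        runServiceOnAllNodes(nodes, node -> {
            if (node == node3) {
                throw new IllegalStateException("unreachable");
            }
            return node == node1;
        }).join(); // wait for all async work before asserting
        System.out.println(node1.status + " " + node2.status + " " + node3.status); // OK Fail Fail
    }
}
```

Because join() only returns after every node has been handled, the assertions and Mockito verifications in a test can follow it without any sleeping.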

How to test non-RxJava observables or async code in general?

I'm playing around with implementing my own observables or porting them from other languages for fun and profit.
The problem I've run into is that there's very little info on how to properly test observables or async code in general.
Consider the following test code:
// Create a stream of values emitted every 100 milliseconds
// `interval` uses Timer internally
final Stream<Number> stream =
Streams.interval(100).map(number -> number.intValue() * 10);
ArrayList<Number> expected = new ArrayList<>();
expected.add(0);
expected.add(10);
expected.add(20);
IObserver<Number> observer = new IObserver<Number>() {
public void next(Number x) {
assertEquals(x, expected.get(0));
expected.remove(0);
if(expected.size() == 0) {
stream.unsubscribe(this);
}
}
public void error(Exception e) {}
public void complete() {}
};
stream.subscribe(observer);
As soon as the stream is subscribed to, it emits the first value. onNext is called... And then the test exits successfully.
In JavaScript most test frameworks nowadays provide an optional Promise to the test case that you can call asynchronously on success/failure. Is anything similar available for Java?

Since the execution is asynchronous, you have to wait until it finishes. You can simply wait for some time, the old-fashioned way:
your_code
Thread.sleep(1000)
check results
Or, if you use Observables, you can use TestSubscriber.
In this example you can see how, with an async operation, we wait until the observer has consumed all items.
@Test
public void testObservableAsync() throws InterruptedException {
    TestSubscriber<Integer> testSubscriber = new TestSubscriber<>();
    Observable.from(numbers)
        .doOnNext(increaseTotalItemsEmitted())
        .subscribeOn(Schedulers.newThread())
        .subscribe(testSubscriber);
    System.out.println("I finish before the observable finishes. Items emitted: " + total);
    // block until onCompleted/onError arrives, or 100 ms have passed
    testSubscriber.awaitTerminalEvent(100, TimeUnit.MILLISECONDS);
    testSubscriber.assertCompleted();
    System.out.println("Items emitted: " + total);
}
You can see more Asynchronous examples here https://github.com/politrons/reactive/blob/master/src/test/java/rx/observables/scheduler/ObservableAsynchronous.java
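For hand-rolled observables, the closest Java analogue to giving a JS test a Promise is to complete a CompletableFuture from the asynchronous callbacks and have the test thread block on it with a timeout. A minimal JDK-only sketch: firstValues is a hypothetical helper whose interval source stands in for the question's Streams.interval and is built on a ScheduledExecutorService. An error callback would call completeExceptionally on the same future.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncTestSketch {

    // Collects the first 'n' values of a hypothetical interval source (0, 10, 20, ...
    // every periodMillis) and completes the returned future with them.
    static CompletableFuture<List<Integer>> firstValues(int n, long periodMillis) {
        CompletableFuture<List<Integer>> done = new CompletableFuture<>();
        List<Integer> received = new ArrayList<>();
        AtomicInteger counter = new AtomicInteger();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            received.add(counter.getAndIncrement() * 10); // the "next" callback
            if (received.size() == n) {
                timer.shutdown();        // the "unsubscribe"
                done.complete(received); // signal success to the waiting test
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return done;
    }

    public static void main(String[] args) throws Exception {
        // The test thread blocks here, with a timeout, instead of exiting after the first value.
        List<Integer> values = firstValues(3, 100).get(2, TimeUnit.SECONDS);
        System.out.println(values.equals(Arrays.asList(0, 10, 20))); // true
    }
}
```

Assertions then run on the test thread after get() returns, so a mismatch fails the test instead of being thrown on the timer thread.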

RxJava: How to get all results AND errors from an Observable

I'm working on a project that involves Hystrix, and I decided to use RxJava. Now, forget Hystrix for the rest of this, because I believe the main problem is that I completely messed up writing the Observable code.
Need:
I need a way to return an observable that represents a number of observables, each running a user task. I want that Observable to be able to return all results from the tasks, even errors.
Problem:
Observable streams die on errors. If I have three tasks and the second task throws an exception, I never receive the third task even if it would have succeeded.
My Code:
public <T> Observable<T> observeManagedAsync(String groupName,List<EspTask<T>> tasks) {
return Observable
.from(tasks)
.flatMap(task -> {
try {
return new MyCommand(task.getTaskId(),groupName,task).toObservable().subscribeOn(this.schedulerFactory.get(groupName));
} catch(Exception ex) {
return Observable.error(ex);
}
});
}
Given that MyCommand is a class that extends HystrixObservableCommand, it returns an Observable and so shouldn't figure into the problems I'm seeing.
Attempt 1:
Used Observable.flatMap as above
Good: Each Command is scheduled on its own thread and the tasks run asynchronously.
Bad: On first Command exception, Observable completes having emitted previous successful results and emitting the Exception. Any in-flight Commands are ignored.
Attempt 2:
Used Observable.concatMapDelayError instead of flatMap
Bad: For some reason, tasks run synchronously. Why??
Good: I get all the successful results.
~Good: onError gets a CompositeException with a list of the exceptions thrown.
Any help will be greatly appreciated and probably result in me being very embarrassed for not having thought of it myself.
Additional Code
This test succeeds with Observable.flatMap, but fails when using Observable.concatMapDelayError because the tasks do not run asynchronously:
java.lang.AssertionError: Execution time ran over the 350ms limit: 608
@Test
public void shouldRunManagedAsyncTasksConcurrently() throws Exception {
Observable<String> testObserver = executor.observeManagedAsync("asyncThreadPool",getTimedTasks());
TestSubscriber<String> testSubscriber = new TestSubscriber<>();
long startTime = System.currentTimeMillis();
testObserver.doOnError(throwable -> {
System.out.println("error: " + throwable.getMessage());
}).subscribe(testSubscriber);
System.out.println("Test execution time: "+(System.currentTimeMillis()-startTime));
testSubscriber.awaitTerminalEvent();
long execTime = (System.currentTimeMillis()-startTime);
System.out.println("Test execution time: "+execTime);
testSubscriber.assertCompleted();
System.out.println("Errors: "+testSubscriber.getOnErrorEvents());
System.out.println("Results: "+testSubscriber.getOnNextEvents());
testSubscriber.assertNoErrors();
assertTrue("Execution time ran under the 300ms limit: "+execTime,execTime>=300);
assertTrue("Execution time ran over the 350ms limit: "+execTime,execTime<=350);
testSubscriber.assertValueCount(3);
assertThat(testSubscriber.getOnNextEvents(),containsInAnyOrder("hello","wait","world"));
verify(this.mockSchedulerFactory, times(3)).get("asyncThreadPool");
}
Tasks for the above unit test:
protected List<EspTask<String>> getTimedTasks() {
EspTask longTask = new EspTask("helloTask") {
@Override
public Object doCall() throws Exception {
Thread.sleep(100);
return "hello";
}
};
EspTask longerTask = new EspTask("waitTask") {
@Override
public Object doCall() throws Exception {
Thread.sleep(150);
return "wait";
}
};
EspTask longestTask = new EspTask("worldTask") {
@Override
public Object doCall() throws Exception {
Thread.sleep(300);
return "world";
}
};
return Arrays.asList(longTask, longerTask, longestTask);
}

You can use Observable.onErrorReturn() and return a special value (e.g. null), then filter out the special values downstream. Keep in mind that the source observable still terminates on error. Depending on the use case, the Observable.onErrorResumeNext() methods can be useful as well. If you are interested in the error notifications themselves, use Observable.materialize(): it converts items, onError() and onCompleted() into Notifications, which can then be filtered by Notification.getKind().
Edit:
All operators mentioned above should be added right after .toObservable().subscribeOn(this.schedulerFactory.get(groupName)), assuming the try/catch is removed.

You want to use mergeDelayError:
public <T> Observable<T> observeManagedAsync(String groupName,List<EspTask<T>> tasks) {
return Observable.mergeDelayError(Observable
.from(tasks)
.map(task -> {
try {
return new MyCommand(task.getTaskId(),groupName,task).toObservable().subscribeOn(this.schedulerFactory.get(groupName));
} catch(Exception ex) {
return Observable.error(ex);
}
}));
}
Note that your MyCommand constructor should not throw any exceptions; if it doesn't, your code can be written more concisely:
public <T> Observable<T> observeManagedAsync(String groupName,List<EspTask<T>> tasks) {
return from(tasks)
.map(task -> new MyCommand(task.getTaskId(), groupName, task)
.toObservable()
.subscribeOn(this.schedulerFactory.get(groupName)))
.compose(Observable::mergeDelayError);
}
Keep in mind that this will still invoke onError at most once; if you need explicit handling of all errors, use something like an Either<CommandResult, Throwable> as the return type (or handle the errors and return an empty observable).

Use .materialize() to allow all emissions and errors to come through as wrapped notifications then deal with them as you wish:
.flatMap(task -> {
try {
return new MyCommand(task.getTaskId(),groupName,task)
.toObservable()
.subscribeOn(this.schedulerFactory.get(groupName))
.materialize();
} catch(Exception ex) {
return Observable.<T>error(ex).materialize();
}
});
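For comparison, the "keep going and collect every result and error" behaviour can be sketched with plain JDK futures: each task's future is mapped through handle() to an Either-like Outcome, so one failure neither cancels nor hides the others, and the tasks still run concurrently. This is not Hystrix or RxJava code; Outcome and runAll are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class AllOutcomesSketch {

    // Hypothetical Either-like holder: exactly one of value/error is meaningful.
    static final class Outcome<T> {
        final T value;
        final Throwable error;

        Outcome(T value, Throwable error) {
            this.value = value;
            this.error = error;
        }

        @Override
        public String toString() {
            return error == null ? "ok:" + value : "error:" + error.getMessage();
        }
    }

    // Starts every task first (concurrently), then waits for each; failures become Outcomes.
    static <T> List<Outcome<T>> runAll(List<Supplier<T>> tasks, ExecutorService pool) {
        List<CompletableFuture<Outcome<T>>> futures = tasks.stream()
                .map(task -> CompletableFuture.supplyAsync(task, pool)
                        .handle((value, error) -> new Outcome<T>(value, unwrap(error))))
                .collect(Collectors.toList());
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    // supplyAsync failures arrive wrapped in CompletionException; expose the original cause.
    private static Throwable unwrap(Throwable error) {
        return error instanceof CompletionException && error.getCause() != null ? error.getCause() : error;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Supplier<String>> tasks = Arrays.asList(
                () -> "hello",
                () -> { throw new IllegalStateException("boom"); },
                () -> "world");
        System.out.println(runAll(tasks, pool)); // [ok:hello, error:boom, ok:world]
        pool.shutdown();
    }
}
```

Results come back in task order because the futures are joined in the order they were created, even though the tasks themselves run concurrently.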
