I have the following setup using RxJava:
             Observable1
                  |
     ---------------------------
     |            |            |
    S1(*)        S2(*)        S3 (onCompleted() starts another observable)
                               |
                          Observable2
                               |
                             S4(*)
I need to know when all of the subscribers marked with * (S1, S2, S4) have finished their work. As they can execute on different threads, do I need some synchronization mechanism, or is there an out-of-the-box solution in RxJava?
Some sample code to illustrate the current design:
@Component
public class BatchReader {
    @Autowired List<Subscriber<List<Integer>>> subscribers;
    public void start() {
        ConnectableObservable<List<Integer>> observable1 = createObservable();
        subscribers.forEach(observable1::subscribe);
        observable1.connect();
    }
    private ConnectableObservable<List<Integer>> createObservable() {
        return Observable.create((Subscriber<? super List<Integer>> subscriber) -> {
            try {
                subscriber.onStart();
                while (someCondition) {
                    List<Integer> numbers = ...;
                    subscriber.onNext(numbers);
                }
                subscriber.onCompleted();
            } catch (Exception ex) {
                subscriber.onError(ex);
            }
        }).observeOn(Schedulers.newThread()).publish();
    }
}
In S3 I have the following logic:
@Component
public class S3 extends Subscriber<List<Integer>> {
    @Autowired AnotherBatchReader anotherBatchReader;
    ....
    @Override
    public void onCompleted() {
        anotherBatchReader.start();
    }
    ...
}
And S4 subscribes in AnotherBatchReader:
@Component
public class AnotherBatchReader {
    @Autowired S4<List<Foo>> subscriber4;
    public void start() {
        Observable<List<Foo>> observable2 = createObservable();
        observable2.subscribe(subscriber4);
    }
    private Observable<List<Foo>> createObservable() {
        return Observable.create(subscriber -> {
            try {
                subscriber.onStart();
                while (someConditionBar) {
                    List<Foo> foo = ...;
                    subscriber.onNext(foo);
                }
                subscriber.onCompleted();
            } catch (RuntimeException ex) {
                subscriber.onError(ex);
            }
        });
    }
}
So is there a way to be properly notified when all the subscribers I'm interested in have finished their work? Does Rx support this out of the box? Or is there a better design that would support it?
EDIT:
I have separate subscribers because each one has different behaviour. In the end, the subscribers marked with * (S1, S2, S4) write their data to XML files. But:
S1 receives data in onNext(), does some work, and writes the results directly to files.
S2 receives data in onNext(), does some work, accumulates the results in a field, and then writes them in onCompleted().
S3 receives data in onNext(), does some work, writes the results to a DB, and, once onCompleted() is called, starts another observable that reads the data back from the DB and pushes it to S4.
S4 receives data in onNext(), does some work, and writes the results to files.
The reason I need to write data to the DB in S3 is that the results generated from the data received in onNext() have to be unique, but since I'm getting data from Observable1 in batches I can't guarantee this uniqueness, so the DB takes care of it.
And of course in S3 I can't just do what S2 does (accumulate all results in memory), because S3 produces significantly more results than S2.
Thanks for the clarifications. It seems to me that judicious application of existing operators will minimize your code. Now, I don't have all the details, but what you're doing feels a lot like this:
Observable<T> observable1 = ...
        .share();
Observable<?> S1 = S1(observable1);
Observable<?> S2 = S2(observable1);
Observable<?> S3 = S3(observable1);
Observable<?> S4 = Observable.defer(() -> readFromDatabase()).compose(S4::generate);
Observable.merge(S1, S2, S3.ignoreElements().switchIfEmpty(S4))
        .ignoreElements()
        .doOnCompleted(...)
        .subscribe();
Of course, the details will be different depending on whether the original observable is hot or cold, and the details of S[1-4].
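For comparison, the same "wait until S1, S2 and S4 are done, where S4 only starts after S3" shape can be sketched without Rx. The sketch below is a swapped-in JDK technique (CompletableFuture.allOf), not an RxJava API; the class name and the task bodies are hypothetical placeholders that only record which stage finished.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// JDK-only model of the S1/S2/S3 -> S4 dependency graph (not RxJava).
public class CompletionGraph {
    static List<String> runAll() {
        List<String> finished = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Hypothetical stand-ins for each subscriber's work, running on different threads.
            CompletableFuture<Void> s1 = CompletableFuture.runAsync(() -> finished.add("S1"), pool);
            CompletableFuture<Void> s2 = CompletableFuture.runAsync(() -> finished.add("S2"), pool);
            CompletableFuture<Void> s3 = CompletableFuture.runAsync(() -> finished.add("S3"), pool);
            // S4 starts only after S3 completes, like Observable2 started from S3.onCompleted().
            CompletableFuture<Void> s4 = s3.thenRunAsync(() -> finished.add("S4"), pool);
            // Completes once all starred stages are done; completes exceptionally if any of them failed.
            CompletableFuture.allOf(s1, s2, s4).join();
        } finally {
            pool.shutdown();
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```

The Rx version above expresses the same graph declaratively; the JDK version just makes the "join on all leaves" step explicit.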
Also, don't try to drive all the Subscribers yourself; let the framework do that for you, and you will get much more out of it, e.g.:
S4 = Observable.create(SyncOnSubscribe.createStateless(
        observer -> observer.onNext(<get List<Foo>>)
     ))
     .takeWhile(list -> someConditionBar);
Edit: this is a case of the XY problem; we've all gone through it...
I have a DTO class like this:
public class User {
    @Field("id")
    private String id;
    private String userName;
    private String emailId;
}
I have to provide update and delete features through an API.
I have written the following code to delete the record:
public Mono<String> userData(User userObj) {
    repo.removeUserDetails(userObj).subscribe();
    return Mono.just("Remove Successful");
}
The removeUserDetails method looks something like this:
public Mono<User> removeUserDetails(User userObj) {
    return findByUsername(userObj.getUsername())
            .flatMap(existingUser -> {
                // logic to delete the data from the database, which works as expected
            }).switchIfEmpty(
                    Mono.defer(() -> {
                        return Mono.error(new Exception("User Name " + userObj.getUsername() + " doesn't exist."));
                    })
            );
}
The problem with this code is that even if the user doesn't exist, the Mono error I'm returning is never surfaced. In every case, this always returns "Remove Successful".
How can I change my service-layer method so that it returns whatever the repo method produces? I'm new to Reactor, so I can't figure out how to write it.
Whenever you call subscribe() yourself, consider it an immediate red flag. Subscription is something that should be handled by the framework you're using (WebFlux in this case).
If you subscribe yourself, such as in this example:
public Mono<String> userData(User userObj) {
    repo.removeUserDetails(userObj).subscribe();
    return Mono.just("Remove Successful");
}
...then you've essentially created a "fire and forget" type subscription, where you have no way of knowing if that publisher completed successfully, if it caused an error, how long it took to complete, whether it completed at all, or whether it emitted an element. So in this case, you're saying "send a request to remove user details, forget you sent it, and then before waiting for any kind of result, always return 'Remove successful'." This is almost never what you want.
You could use something like:
public Mono<String> userData(User userObj) {
    return repo.removeUserDetails(userObj)
            .then(Mono.just("Remove Successful"));
}
...which is much better as it includes everything as part of the reactive chain. In this case, you'll either get an error signal, or you'll get "Remove Successful".
However, chances are you don't need that String returned at all; you just need to know whether the deletion succeeded. The standard way of doing that (when you only need to know whether it completed successfully and don't need a value back) is to use Mono<Void> as the return type together with then(), something like:
public Mono<Void> userData(User userObj) {
    return repo.removeUserDetails(userObj).then();
}
...which will give you a standard completion if the deletion was successful, and an error signal otherwise.
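The difference between fire-and-forget and keeping the call in the chain is not specific to Reactor. As a JDK-only analogy (CompletableFuture standing in for Mono, not Reactor's API), the sketch below shows how discarding the inner result loses the error while chaining propagates it. The class name, the method names and the hard-coded "alice" user are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// JDK-only analogy of the two userData variants discussed above.
public class ChainVsFireAndForget {

    // Hypothetical stand-in for repo.removeUserDetails(): fails for any user except "alice".
    static CompletableFuture<Void> removeUserDetails(String userName) {
        if (!"alice".equals(userName)) {
            return CompletableFuture.failedFuture(
                    new IllegalStateException("User Name " + userName + " doesn't exist."));
        }
        return CompletableFuture.completedFuture(null);
    }

    // Like calling subscribe() and returning Mono.just(...): the outcome is discarded.
    static CompletableFuture<String> fireAndForget(String userName) {
        removeUserDetails(userName); // result ignored, so any error is lost
        return CompletableFuture.completedFuture("Remove Successful");
    }

    // Like .then(Mono.just(...)): the success value only appears if the removal succeeded.
    static CompletableFuture<String> chained(String userName) {
        return removeUserDetails(userName).thenApply(ignored -> "Remove Successful");
    }

    public static void main(String[] args) {
        System.out.println(fireAndForget("bob").join()); // Remove Successful
        try {
            chained("bob").join();
        } catch (CompletionException e) {
            System.out.println("Error: " + e.getCause().getMessage());
        }
    }
}
```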
A common problem in reactive Java code is handling nulls while collecting a list.
The following code is a simple example showing how to handle a null Location by wrapping the lookup in Mono.defer and then turning the resulting error into a default value with onErrorReturn.
The test code:
List<String> items = inventory.testList().block();
items.forEach(System.out::println);
prints:
USA
Not Found
SPAIN
private List<Integer> clusters;
private List<Mono<Location>> locations;
private List<String> countryCodes;
public Mono<List<String>> testList() {
    clusters = Arrays.asList(0, 1, 2);
    locations = Arrays.asList(Mono.just(new Location(0)), null, Mono.just(new Location(2)));
    countryCodes = Arrays.asList("USA", "FRANCE", "SPAIN");
    return Flux.fromIterable(clusters)
            .flatMap(cluster -> getLocation(cluster))
            .collectList();
}
public Mono<String> getLocation(int clusterID) {
    // Mono.defer signals a NullPointerException if the supplier returns null,
    // which onErrorReturn then maps to "Not Found".
    return Mono.defer(() -> locations.get(clusterID))
            .flatMap(location -> Mono.just(location.id))
            .flatMap(id -> Mono.just(countryCodes.get(id)))
            .onErrorReturn(Exception.class, "Not Found");
}
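Note that onErrorReturn(Exception.class, ...) also turns unrelated failures (for example an IndexOutOfBoundsException from countryCodes.get) into "Not Found". If you would rather treat null as "absent" than catch an exception, the same lookup can be sketched with java.util.Optional. This is a plain-JDK alternative, not Reactor code; the class name and the simplified Location type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Plain-JDK sketch: null entries become "Not Found" without using exceptions for control flow.
public class NullSafeLookup {
    // Hypothetical minimal Location, mirroring the public id field used above.
    static final class Location {
        final int id;
        Location(int id) { this.id = id; }
    }

    static final List<Location> LOCATIONS = Arrays.asList(new Location(0), null, new Location(2));
    static final List<String> COUNTRY_CODES = Arrays.asList("USA", "FRANCE", "SPAIN");

    static String getLocation(int clusterId) {
        return Optional.ofNullable(LOCATIONS.get(clusterId)) // empty if the slot holds null
                .map(location -> COUNTRY_CODES.get(location.id))
                .orElse("Not Found");
    }

    static List<String> testList() {
        List<String> result = new ArrayList<>();
        for (int cluster : List.of(0, 1, 2)) {
            result.add(getLocation(cluster));
        }
        return result;
    }

    public static void main(String[] args) {
        testList().forEach(System.out::println);
    }
}
```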
Currently we have two separate API endpoints:
public Mono<ServerResponse> get(ServerRequest request) {
    Sinks.StandaloneMonoSink<String> sink = Sinks.promise();
    sinkMap.putIfAbsent(randomID, sink);
    return sink.asMono().timeout(Duration.ofSeconds(60))
            .flatMap(val -> ServerResponse.ok().body(BodyInserters.fromValue(val)));
}
public Mono<ServerResponse> push(ServerRequest request) {
    Sinks.StandaloneMonoSink<String> sink = sinkMap.remove(randomID);
    if (sink == null) {
        return ServerResponse.notFound().build();
    } else {
        return request.bodyToMono(String.class)
                .flatMap(data -> {
                    sink.success(data);
                    return ServerResponse.ok().build();
                });
    }
}
The intention is for the client to make a GET request and keep the connection open for a minute or so, waiting for data to arrive. Then, on a push request, the data is published to the open GET connection, and that connection closes upon receipt of the first element.
The issue with the current approach is that the data may be emitted after the GET request has timed out and its subscription has been canceled, so the data is lost. When there are no subscribers, is it possible to throw an error, or perform another action with the data, when I try to emit it (from the push-request side)?
Thanks.
I had to read this question multiple times to understand what you are looking for!
I tried something like this, and it seems to work:
private final DirectProcessor<String> processor = DirectProcessor.create();
private final FluxSink<String> sink = processor.sink();
// processor has a method to check whether there are any active subscribers.
// If there are, push the data; otherwise we can throw an exception or store it somewhere.
@GetMapping("/push/{val}")
public boolean push(@PathVariable String val) {
    boolean status = processor.hasDownstreams();
    if (status) {
        sink.next(val);
    }
    return status;
}
@GetMapping("/get")
public Mono<String> get() {
    return processor
            .next()
            .timeout(Duration.ofMinutes(1));
}
Question:
Will you be running only one instance of this application? What will happen when you run multiple instances of this application?
For example, User A might push data to app-instance-1 while User B subscribes to app-instance-2, in which case User B might not get the data. You might then need something like Redis to store the data and share it among all instances for pub/sub behavior.
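For the single-instance case, the "only emit if someone is still waiting" check can also be made race-free by keeping one pending future per id and relying on CompletableFuture.complete() returning false once the waiter has already timed out. The sketch below is a plain-JDK alternative (Java 9+ for orTimeout), not a Reactor or Spring API; PendingRequests, await() and push() are hypothetical names. In WebFlux the returned future could be bridged with Mono.fromFuture(...).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// JDK-only sketch: one waiting GET per id; push reports whether anyone actually received the data.
public class PendingRequests {
    private final Map<String, CompletableFuture<String>> waiting = new ConcurrentHashMap<>();

    // GET side: register a waiter that fails with a TimeoutException after the given timeout.
    public CompletableFuture<String> await(String id, long timeout, TimeUnit unit) {
        CompletableFuture<String> future = new CompletableFuture<>();
        waiting.put(id, future);
        // On completion (value, timeout or error) drop the entry, but only if it is still this waiter.
        future.orTimeout(timeout, unit)
              .whenComplete((value, error) -> waiting.remove(id, future));
        return future;
    }

    // Push side: returns false if no live waiter took the value, so the caller can
    // store the data elsewhere or report an error instead of silently losing it.
    public boolean push(String id, String data) {
        CompletableFuture<String> future = waiting.remove(id);
        return future != null && future.complete(data);
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        CompletableFuture<String> response = requests.await("42", 60, TimeUnit.SECONDS);
        System.out.println(requests.push("42", "hello") + " " + response.join());
    }
}
```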
I've got code that looks similar to this:
List<String> ids = expensiveMethod();
List<String> filteredIds = cheapFilterMethod(ids);
if (!filteredIds.isEmpty()) {
    List<SomeEntity> fullEntities = expensiveDatabaseCall(filteredIds);
    List<SomeEntity> filteredFullEntities = anotherCheapFilterFunction(fullEntities);
    if (!filteredFullEntities.isEmpty()) {
        List<AnotherEntity> finalResults = stupidlyExpensiveDatabaseCall(filteredFullEntities);
        relativelyCheapMethod(finalResults);
    }
}
It's basically a waterfall of expensive methods that each either fetch something from a database or filter previous database results. This is because of stupidlyExpensiveDatabaseCall, which needs as few leftover entities as possible, hence the exhaustive filtering.
My problem is that the other functions aren't all that cheap either, so they block the thread for a couple of seconds while stupidlyExpensiveDatabaseCall sits idle until it gets the whole batch at once.
I'd like to process the results from each method as they come in. I know I could write a thread for each method and connect them with concurrent queues, but that's a lot of boilerplate I'd like to avoid. Is there a more elegant solution?
There's a post about different ways to parallelize: not only the parallelStream() way, but also running consecutive steps in parallel, linked by queues, as you described. RxJava may suit your needs here. It's a more complete alternative to the rather minimal Reactive Streams API in Java 9 (java.util.concurrent.Flow). But I think you only really get the full benefit if you also use a reactive DB API.
Here's the RxJava way:
public class FlowStream {
    @Test
    public void flowStream() {
        int items = 10;
        print("\nflow");
        Flowable.range(0, items)
                .map(this::expensiveCall)
                .map(this::expensiveCall)
                .forEach(i -> print("flowed %d", i));
        print("\nparallel flow");
        Flowable.range(0, items)
                .flatMap(v ->
                        Flowable.just(v)
                                .subscribeOn(Schedulers.computation())
                                .map(this::expensiveCall)
                )
                .flatMap(v ->
                        Flowable.just(v)
                                .subscribeOn(Schedulers.computation())
                                .map(this::expensiveCall)
                ).forEach(i -> print("flowed parallel %d", i));
        await(5000);
    }
    private Integer expensiveCall(Integer i) {
        print("making %d more expensive", i);
        await(Math.round(10f / (Math.abs(i) + 1)) * 50);
        return i;
    }
    private void await(int i) {
        try {
            Thread.sleep(i);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
    private void print(String pattern, Object... values) {
        System.out.println(String.format(pattern, values));
    }
}
The Maven dependency:
<!-- https://mvnrepository.com/artifact/io.reactivex.rxjava2/rxjava -->
<dependency>
<groupId>io.reactivex.rxjava2</groupId>
<artifactId>rxjava</artifactId>
<version>2.2.13</version>
</dependency>
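The "parallel flow" part of the RxJava example (each item moves to the next step on its own) can be approximated with plain CompletableFuture chains per item, if you would rather not add a dependency. This is a swapped-in JDK sketch, not RxJava; stage1 and stage2 are hypothetical stand-ins for expensive per-item calls. Unlike flatMap, it collects results in input order.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Per-item pipelining with the JDK: item i enters stage2 as soon as its own stage1 is done.
public class PerItemPipeline {
    // Hypothetical stand-ins for two expensive per-item steps.
    static int stage1(int i) { return i * 10; }
    static int stage2(int i) { return i + 1; }

    static List<Integer> run(int items) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Integer>> futures = IntStream.range(0, items)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> stage1(i), pool)
                            .thenApplyAsync(PerItemPipeline::stage2, pool))
                    .collect(Collectors.toList());
            // Joining in list order keeps the results in input order.
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```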
You could use CompletableFuture to split up each non-CPU-bound step. The usage is similar to the JavaScript Promise API.
public void loadEntities() {
    CompletableFuture.supplyAsync(this::expensiveMethod, Executors.newCachedThreadPool())
            .thenApply(this::cheapFilterMethod)
            .thenApplyAsync(this::expensiveDatabaseCall)
            .thenApply(this::anotherCheapFilterFunction)
            .thenApplyAsync(this::stupidlyExpensiveDatabaseCall)
            .thenAccept(this::relativelyCheapMethod);
}
private List<String> expensiveMethod() { ... }
private List<String> cheapFilterMethod(List<String> ids) { ... }
private List<SomeEntity> expensiveDatabaseCall(List<String> ids) { ... }
private List<SomeEntity> anotherCheapFilterFunction(List<SomeEntity> entities) { ... }
private List<AnotherEntity> stupidlyExpensiveDatabaseCall(List<SomeEntity> entities) { ... }
private void relativelyCheapMethod(List<AnotherEntity> entities) { ... }
You can also pass your own thread pool at each step if you'd like to have more control over execution.
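A runnable sketch of that chain, assuming one fixed pool for the blocking steps that is shut down when the pipeline finishes. The method bodies are hypothetical stubs operating on strings; relativelyCheapMethod is replaced by returning the list, and the question's empty-list short-circuits are omitted. Note that this still hands whole lists from step to step; it moves the blocking off the caller thread rather than streaming individual items.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class LoadEntitiesSketch {
    // Hypothetical stand-ins for the question's methods; entities are plain strings here.
    static List<String> expensiveMethod() { return List.of("1", "2", "3", "4"); }
    static List<String> cheapFilterMethod(List<String> ids) {
        return ids.stream().filter(id -> !id.equals("2")).collect(Collectors.toList());
    }
    static List<String> expensiveDatabaseCall(List<String> ids) {
        return ids.stream().map(id -> "entity-" + id).collect(Collectors.toList());
    }
    static List<String> anotherCheapFilterFunction(List<String> entities) {
        return entities.stream().filter(e -> !e.endsWith("4")).collect(Collectors.toList());
    }
    static List<String> stupidlyExpensiveDatabaseCall(List<String> entities) {
        return entities.stream().map(e -> e + "-final").collect(Collectors.toList());
    }

    static List<String> loadEntities() {
        // One explicit pool for the blocking steps, shut down once the pipeline has finished.
        ExecutorService io = Executors.newFixedThreadPool(4);
        try {
            return CompletableFuture.supplyAsync(LoadEntitiesSketch::expensiveMethod, io)
                    .thenApply(LoadEntitiesSketch::cheapFilterMethod)
                    .thenApplyAsync(LoadEntitiesSketch::expensiveDatabaseCall, io)
                    .thenApply(LoadEntitiesSketch::anotherCheapFilterFunction)
                    .thenApplyAsync(LoadEntitiesSketch::stupidlyExpensiveDatabaseCall, io)
                    .join(); // blocks only here, e.g. in a test or main()
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadEntities()); // [entity-1-final, entity-3-final]
    }
}
```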
You can use the Java 8 Stream API. It's impossible to process a DB query's results "as they come in" because the result set arrives all at once, so you'd have to change your methods to handle single entities.
expensiveMethod().parallelStream()
        .filter(this::cheapFilterMethod)                 // Returns boolean
        .map(this::expensiveDatabaseCallSingle)          // Returns SomeEntity
        .filter(this::anotherCheapFilterFunction)        // Returns boolean for filtered entities
        .map(this::stupidlyExpensiveDatabaseCallSingle)  // Returns AnotherEntity
        .forEach(this::relativelyCheapMethod);           // void method
I would also suggest using an ExecutorService to manage your threads so you don't exhaust resources by creating lots of threads:
ExecutorService threadPool = Executors.newFixedThreadPool(8);
threadPool.submit(this::methodForParallelStream);
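One caveat: submitting the method to a fixed thread pool does not by itself bound the parallel stream, because parallel streams normally run their work on ForkJoinPool.commonPool(). A commonly used workaround is to start the stream from inside a dedicated ForkJoinPool. This relies on current JDK behaviour rather than a documented guarantee, and cheapFilter / expensiveCallSingle below are hypothetical per-item stand-ins.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class BoundedParallelStream {
    // Hypothetical per-item stand-ins for the single-entity methods above.
    static boolean cheapFilter(int id) { return id % 2 == 1; }
    static String expensiveCallSingle(int id) { return "entity-" + id; }

    static List<String> run(List<Integer> ids, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // A parallel stream started from a task running in this pool uses this pool's
            // workers instead of the common pool (current JDK behaviour, not a spec guarantee).
            return pool.submit(() -> ids.parallelStream()
                            .filter(BoundedParallelStream::cheapFilter)
                            .map(BoundedParallelStream::expensiveCallSingle)
                            .collect(Collectors.toList()))
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4, 5, 6), 2));
    }
}
```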
I am trying to build a reactive pipeline using Java and Project Reactor, where the application generates flow statuses (INIT, PROCESSING, SAVED, DONE) at different stages. The statuses must be emitted asynchronously to a Flux that is handled independently of, and separately from, the main flow. I came across this link:
Spring WebFlux (Flux): how to publish dynamically
My sample flow looks something like this:
public class StatusEmitterImpl implements StatusEmitter {
    private final FluxProcessor<String, String> processor;
    private final FluxSink<String> sink;
    public StatusEmitterImpl() {
        this.processor = DirectProcessor.<String>create().serialize();
        this.sink = processor.sink();
    }
    @Override
    public Flux<String> publisher() {
        return this.processor.map(x -> x);
    }
    public void publishStatus(String status) {
        sink.next(status);
    }
}
public class Try {
    public static void main(String[] args) {
        StatusEmitterImpl statusEmitter = new StatusEmitterImpl();
        Flux.fromIterable(Arrays.asList("INIT", "DONE")).subscribe(x ->
                statusEmitter.publishStatus(x));
        statusEmitter.publisher().subscribe(x -> System.out.println(x));
    }
}
The problem is that nothing gets printed to the console. I can't understand what I'm missing.
DirectProcessor passes values to its registered Subscribers directly, without caching the signals. If there is no Subscriber, then the value is "forgotten". If a Subscriber comes in late, then it will only receive signals emitted after it subscribed.
That's what is happening here: because fromIterable works on an in-memory collection, it pushes all the values to the DirectProcessor synchronously, before any Subscriber has been registered.
If you swap the last two lines, you should see the statuses printed.
DirectProcessor is a hot publisher and doesn't buffer elements, so you should produce elements only after subscribing, like this:
public static void main(String[] args) {
    StatusEmitterImpl statusEmitter = new StatusEmitterImpl();
    statusEmitter.publisher().subscribe(x -> System.out.println(x));
    Flux.fromIterable(Arrays.asList("INIT", "DONE")).subscribe(x -> statusEmitter.publishStatus(x));
}
Alternatively, use EmitterProcessor or UnicastProcessor instead of DirectProcessor.
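To make the "no subscriber, no delivery" behaviour concrete, here is a toy model in plain Java. ToyDirectProcessor is a hypothetical class that only imitates DirectProcessor's delivery semantics (forward to current subscribers, buffer nothing); it is not Reactor and ignores backpressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Toy model (not Reactor) of DirectProcessor's delivery semantics.
public class ToyDirectProcessor<T> {
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    public void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    // Delivers only to subscribers registered right now; nothing is buffered for later ones.
    public void next(T value) {
        for (Consumer<T> s : subscribers) {
            s.accept(value);
        }
    }

    // Publishes INIT and DONE either after or before subscribing, like the two main() versions.
    static List<String> demo(boolean subscribeFirst) {
        ToyDirectProcessor<String> processor = new ToyDirectProcessor<>();
        List<String> received = new ArrayList<>();
        if (subscribeFirst) processor.subscribe(received::add);
        for (String status : List.of("INIT", "DONE")) processor.next(status);
        if (!subscribeFirst) processor.subscribe(received::add);
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // []
        System.out.println(demo(true));  // [INIT, DONE]
    }
}
```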
I have provided a callback to a third-party library that calls the provided method at various times, passing me an object that has changed. I then make an async web request to fetch further details and set them on that object. Below is a made-up but similar example:
public void update(Person person) {
    if (person.getId() == -1) {
        mService.getPersonDetails()
                .flatMap(..)
                .skip(..)
                .subscribe(personResult -> person.setId(personResult.getId()));
    }
}
update() is called quite a few times and should only execute the query if the object has no ID. The problem is that at least two requests get sent, because the first query hasn't completed yet.
How can I synchronize this method so that only one request is sent for each object passed via the callback? I only want to block requests for that exact object, so if update() is given different objects, it's fine for new requests to be sent.
The solution provided by Adam S looks good, but sooner or later it will cause OOM problems, because the distinct operator has to store all unique values.
Another option that comes to mind is using a ConcurrentMap to store the persons being processed, and doOnTerminate to clean it up.
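private Map<Person, Boolean> map = new ConcurrentHashMap<>();
public void update(final Person person) {
    if (person.getId() == -1) {
        // putIfAbsent returns null only for the first caller, so one request per person is in flight
        if (map.putIfAbsent(person, true) == null) {
            mService.getPersonDetails()
                    .flatMap(..)
                    .skip(..)
                    // remove the entry on completion or error so a later update can retry
                    .doOnTerminate(() -> map.remove(person))
                    .subscribe(personResult -> person.setId(personResult.getId()));
        }
    }
}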
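The same "at most one in-flight request per object" idea can be exercised without RxJava by using a concurrent set and CompletableFuture.whenComplete as the equivalent of doOnTerminate. This is a swapped-in JDK sketch: Person, getPersonDetails() and the pendingRequests list (which lets a test complete requests by hand) are hypothetical simplifications of the question's types.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class InFlightDedup {
    // Hypothetical minimal Person; default identity equals/hashCode, so each object is its own key.
    static class Person {
        volatile int id = -1;
    }

    // Persons with a request currently in flight.
    private final Set<Person> inFlight = ConcurrentHashMap.newKeySet();
    // Hypothetical async service: requests stay pending until someone completes them.
    final List<CompletableFuture<Integer>> pendingRequests = new CopyOnWriteArrayList<>();

    CompletableFuture<Integer> getPersonDetails() {
        CompletableFuture<Integer> request = new CompletableFuture<>();
        pendingRequests.add(request);
        return request;
    }

    public void update(Person person) {
        // add() returns false if a request for this exact object is already running.
        if (person.id == -1 && inFlight.add(person)) {
            getPersonDetails().whenComplete((id, error) -> {
                if (error == null) person.id = id;
                inFlight.remove(person); // like doOnTerminate: runs on success and on failure
            });
        }
    }

    boolean isInFlight(Person person) { return inFlight.contains(person); }

    public static void main(String[] args) {
        InFlightDedup dedup = new InFlightDedup();
        Person person = new Person();
        dedup.update(person);
        dedup.update(person); // ignored: a request is already in flight
        dedup.pendingRequests.get(0).complete(42);
        System.out.println(dedup.pendingRequests.size() + " request(s), id=" + person.id);
    }
}
```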
You can filter the inputs to your observable using the distinct operator. Here's a general idea of how you could do that using a PublishSubject (JavaDoc) (note: this is written from memory, and I haven't tested it):
private PublishSubject<Person> personSubject;
public void update(Person person) {
    if (personSubject == null) {
        personSubject = PublishSubject.create();
        personSubject
                .filter(p -> p.getId() == -1)
                .distinct()
                // keep each Person in scope so the result is applied to the right object
                .flatMap(p -> mService.getPersonDetails()
                        .skip(..)
                        .doOnNext(personResult -> p.setId(personResult.getId())))
                .subscribe();
    }
    personSubject.onNext(person);
}
You will, of course, have to either implement equals (and hashCode) on your Person class (which, as Marek points out, will result in every object passed in being kept in memory) or use the distinct(Func1) variant.
That method takes a "key selector" function used to distinguish between objects. If your objects are fairly heavy and you're concerned about memory (on Android, for example), this might be a better path. Something like this:
.distinct(new Func1<Person, Integer>() {
    @Override
    public Integer call(Person person) {
        // Only the Integer keys are retained; persons with equal hash codes are treated as duplicates.
        return person.hashCode();
    }
})
Well, you can simply synchronize it:
public synchronized void update(Person person)