Combination of RxJava and RxAndroid? - java

My scenario is very similar to this image:
The flow of the app will be like this:
1. The view needs to be updated.
2. Create an observable using RxAndroid to fetch the data from the cache / local file.
3. Update the view.
4. Make another network call using Retrofit and RxJava to update the view again with new data coming from the web services.
5. Update the local file with the new data.
So, I am updating the view twice (once from the local file and, just after that, from the web services).
How can I achieve this using RxJava and RxAndroid? What I was thinking is:
1. Create an observable1 to get the data from the local file system.
2. In the onNext method of observable1, create another observable2.
3. In observable2's onNext(), update the local file.
Now, how will I update the view with the updated data (loaded into the file)?
What would be a good approach?

I wrote a blog post about exactly this scenario. I used the merge operator (as suggested by sockeqwe) to address your points 2 and 4 in parallel, and doOnNext to address point 5:
// NetworkRepository.java
public Observable<Data> getData() {
    // implementation
}
// DiskRepository.java
public Observable<Data> getData() {
    // implementation
}
// DiskRepository.java
public void saveData(Data data) {
    // implementation
}
// DomainService.java
public Observable<Data> getMergedData() {
    return Observable.merge(
        diskRepository.getData().subscribeOn(Schedulers.io()),
        networkRepository.getData()
            .doOnNext(new Action1<Data>() {
                @Override
                public void call(Data data) {
                    diskRepository.saveData(data); // <-- save to cache
                }
            }).subscribeOn(Schedulers.io())
    );
}
In my blog post I additionally used filter and Timestamp to skip updating the UI if the data is unchanged or if the cache is empty (you didn't specify this, but you will likely run into this issue as well).
Link to the post: https://medium.com/@murki/chaining-multiple-sources-with-rxjava-20eb6850e5d9
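A minimal sketch of the "skip stale or empty updates" check described in that post, written as a plain java.util.function.Predicate so it runs without RxJava. TimestampedData and FreshnessFilter are hypothetical names, not the blog's code and not RxJava types; the sketch assumes each emission carries the time its data was produced (for example the cache's last-write time versus the network response time).

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical wrapper: a list of items plus the time the data was produced. */
final class TimestampedData<T> {
    final long producedAtMillis;
    final List<T> items;

    TimestampedData(long producedAtMillis, List<T> items) {
        this.producedAtMillis = producedAtMillis;
        this.items = items;
    }
}

/**
 * Stateful filter: accepts an emission only if it has items and is strictly
 * newer than the last emission it accepted.
 */
final class FreshnessFilter<T> implements Predicate<TimestampedData<T>> {
    private long lastAcceptedMillis = Long.MIN_VALUE;

    @Override
    public synchronized boolean test(TimestampedData<T> candidate) {
        if (candidate.items.isEmpty()) {
            return false; // empty cache: nothing worth rendering
        }
        if (candidate.producedAtMillis <= lastAcceptedMillis) {
            return false; // same age or older than what the view already shows
        }
        lastAcceptedMillis = candidate.producedAtMillis;
        return true;
    }
}
```

In the merged chain you could apply it with filter(freshness::test) before subscribing. Because the filter keeps state, it is safer to create a new instance per subscription (for example inside Observable.defer) than to share one.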

It looks like the concat or merge operator is what you are looking for.
The difference between them is:
Merge may interleave the items emitted by the merged Observables (a similar operator, Concat, does not interleave items, but emits all of each source Observable’s items in turn before beginning to emit items from the next source Observable).
Observable<Foo> retroFitBackendObservable = retrofitBackend.getFoo()
        .doOnNext(foo -> saveToLocalFile(foo)); // saveToLocalFile: placeholder for your file-writing code
Observable<Foo> mergedObservable = Observable.merge(cacheObservable, retroFitBackendObservable);
mergedObservable.subscribe( ... );
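Then subscribe to mergedObservable and update your view in onNext(). On Android, add observeOn(AndroidSchedulers.mainThread()) before subscribe() so that onNext() runs on the UI thread. Keep in mind that with merge a fast network response can arrive before the cached data and then be overwritten by it; concat avoids this by always emitting the cache first, at the cost of starting the network call only after the cache observable completes.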

Related

Java Reactive. How to wait for all data in flux and then process them

I'm getting data from a reactive Mongo repository and updating it. Then I have to collect all the data into one collection and pass it to another service to get more info. Then I should map everything back into one Flux. My code is:
Flux<Views> views = someRepository.findAllByUsersIn(userId).doOnNext(v -> {
    v.setInterlocutor(v.getUsers().stream().filter(u -> !userId.equals(u)).findFirst().orElse(null));
});
return Flux.zip(views.map(view -> conversionService.convert(view, ResponseViewDto.class)), getUserInfo(views).flux())
        .flatMap(fZip -> {
            ResponseViewDto dto = fZip.getT1();
            dto.setInterlocutor(fZip.getT2().get(dto.getInter()));
            return Flux.just(dto);
        });
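getUserInfo collects the user IDs, sends them to another service and returns the expanded info.
I found that the database is queried twice, and I understand why, but is there any way to do it only once and still not block?
Thanks to Adhika Setya Pramudita for the help. The way to do what I need is simply to use the cache() method on views: both views.map(...) and getUserInfo(views) subscribe to views, and each subscription to this cold Flux runs the query again, whereas with .cache() the first subscription runs the query and the stored elements are replayed to the second.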

Applying a Single to an ObservableSource and not over-reading

I'm pretty new to Rx in general, and RxJava in particular, so pardon any mistakes.
This operation depends on two async operations.
The first uses a filter function to attempt to get a single entity from a list returned by an async Observable.
The second is an async operation that communicates with a device and produces an Observable of status updates.
I want to take the Single that is created from the filter function, apply it to pairReader(...), and subscribe to its Observable for updates. I can get this to work as shown, but only if I include the commented take(1); otherwise I get an exception because the chain tries to pull another value from the Single.
Observable<DeviceCredential> getCredentials() {
    return deviceCredentialService()
            .getCredentials()
            .flatMapIterable(event -> event.getData());
}
Single<Organization> getOrgFromCreds(String orgid) {
    return getCredentials()
            // A device is logically constrained to only have a single cred per org
            .map(DeviceCredential::getOrganization)
            .filter(org -> org.getId().equals(orgid))
            .take(1) // Without this I get an exception
            .singleOrError();
}
Function<Organization, Observable<Reader.EnrollmentState>> pairReader(String name) {
    return org -> readerService().pair(name, org);
}
getOrgFromCreds(orgid)
        .flatMapObservable(pairReader(readerid))
        .subscribe(state -> {
            switch (state) {
                case BEGUN:
                    LOG.d(TAG, "Pairing begun");
                    break;
                case PAIRED:
                    LOG.d(TAG, "Pairing success");
                    callback.success();
                    break;
                case NOTIFIED_SERVER:
                    LOG.d(TAG, "Pairing server notified");
                    break;
            }
        },
        error -> {
            Crashlytics.logException(error);
            callback.error(error.getLocalizedMessage());
        });
If the source stream emits more than one item, singleOrError() is supposed to emit an error (see its documentation).
For your case, use either first(defaultItem) or firstOrError() instead.
Single<Organization> getOrgFromCreds(String orgid) {
    return getCredentials()
            .map(DeviceCredential::getOrganization)
            .filter(org -> org.getId().equals(orgid))
            .firstOrError();
}
If I understand you correctly, you need to perform some action using previously retrieved async data, so you could use the .zip() operator.
Here is an example:
Observable.zip(
        getOrgFromCreds(orgid).toObservable(),
        getCredentials(),
        (first, second) -> /* create output object here */
)
.subscribe(
        n -> { /* do onNext */ },
        e -> { /* do onError */ }
);
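Note that the .zip() operator waits for an emission from each of the two streams and then creates the output emission using the function you provide at "create output object here".
If you don't want to pair items one-to-one, you can use .combineLatest(), which emits whenever either stream emits (combining it with the latest item from the other, once both have emitted at least once).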
The problem here turned out to be that the API was designed in an odd way (and unfortunately has extremely poor documentation). I couldn't figure out why I was getting duplicates, and thought I was using flatMapIterable incorrectly.
What the deviceCredentialService.getCredentials() call actually creates is an observable that emits DataEvent objects, which are simple wrappers around a list of results plus a flag indicating where the results came from.
The API designer wanted to allow the user to use locally cached data to fill the UI immediately while a longer request to a REST API executes. The DataEvent.from property is an enum that flags the source, either from the local device cache or from the remote API call.
The way I solved this was to simply ignore the results coming from local cache and only emit results from the API:
Observable<DeviceCredential> getCredentials() {
    return deviceCredentialService()
            .getCredentials()
            // Only get creds from network
            .filter(e -> e.getFrom() == SyncedDataSourceObservableFactory.From.SOURCE)
            .flatMapIterable(e -> e.getData());
}
Single<Organization> getOrgFromCreds(String orgid) {
    return getCredentials()
            // A device is logically constrained to only have a single cred per org
            .map(DeviceCredential::getOrganization)
            .filter(org -> org.getId().equals(orgid))
            .singleOrError();
}
The plan then is to use memoization to cache entities in a way that gives the implementing app access to cache invalidation. Since the provided interface doesn't allow suppressing the API call, there is no way to work only with the cache when the app considers it fresh.
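A minimal in-memory sketch of that memoization-with-invalidation idea, assuming entities are looked up by a key such as an org ID. InvalidatingMemoizer is a hypothetical class, not part of the credential library; the loader would wrap whatever call actually fetches the entity, and the app decides when an entry is stale by calling invalidate().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical memoizer: loads each key once and keeps it until invalidated. */
final class InvalidatingMemoizer<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    InvalidatingMemoizer(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Forgets one entry so the next get() reloads it. */
    void invalidate(K key) {
        cache.remove(key);
    }

    /** Forgets every entry. */
    void invalidateAll() {
        cache.clear();
    }
}
```

The design keeps the freshness decision in the app: nothing expires on its own, so stale data is only replaced when the app explicitly invalidates it.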

Flux endpoint from infinite java stream

I have an issue while processing a flux that is built from a Stream.generate construct.
The Java stream is fetching some data from a remote source, hence I implemented a custom supplier that has the data fetching logic embedded, and then used it to populate the Stream.
Stream.generate(new SearchSupplier(...))
My idea is to detect an empty list and use the Java 9 takeWhile feature:
Stream.generate(new SearchSupplier(this, queryBody))
        .takeWhile(either -> either.isRight() && either.get().nonEmpty())
(using Vavr's Either construct)
The repository-layer Flux will then do:
return Flux.fromStream(
            this.searchStream(...) // this is where the stream gets generated
        )
        .map(Either::get)
        .flatMap(Flux::fromIterable);
The "service" layer is composed of some transformation steps on the flux, but the method signature is something like Flux<JsonObject> search(...).
Finally, the controller layer has a GetMapping:
@GetMapping(produces = "application/stream+json")
public Flux<JsonObject> search(...) {
    return searchService.search(...) // this is the Flux<JsonObject> part
            .subscriberContext(...) // stuff I need available during processing
            .doOnComplete(() -> log.debug("DONE"));
}
My problem is that the Flux seems to never terminate.
When I make a call from Postman, for example, it just shows the 'Loading...' indicator in the response section. When I terminate the process from my IDE, the results are flushed to Postman and I see what I'm expecting. Also, the doOnComplete lambda never gets called.
What I noticed is that if I change the source of a Flux:
Flux.fromArray(...) // hardcoded array of lists of JSONs
the doOnComplete lambda is called, the HTTP connection closes, and the results are displayed in Postman.
Any idea of what might be the issue?
Thanks.
You could create the Flux directly using code that looks like this. Note that I'm using some assumed methods (next(), isNotLast() and anyCleanupOperations()) which you would need to implement based on how your SearchSupplier works:
Flux<SearchResultType> flux = Flux.generate(
        () -> new SearchSupplier(this, queryBody),
        (supplier, sink) -> {
            SearchResultType current = supplier.next();
            if (isNotLast(current)) {
                sink.next(current);
            } else {
                sink.complete();
            }
            return supplier;
        },
        supplier -> anyCleanupOperations(supplier)
);

Can Spark Streaming do Anything Other Than Word Count?

I'm trying to get to grips with Spark Streaming but I'm having difficulty. Despite reading the documentation and analysing the examples, I can't get beyond a word count on a text file/stream/Kafka queue, which seems to be the only thing the docs let us understand.
I wish to listen to an incoming Kafka message stream, group messages by key and then process them. The code below is a simplified version of the process: get the stream of messages from Kafka, reduce by key to group messages by message key, then process them.
JavaPairDStream<String, byte[]> groupByKeyList = kafkaStream.reduceByKey((bytes, bytes2) -> bytes);
groupByKeyList.foreachRDD(rdd -> {
    List<MyThing> myThingsList = new ArrayList<>();
    MyCalculationCode myCalc = new MyCalculationCode();
    rdd.foreachPartition(partition -> {
        while (partition.hasNext()) {
            Tuple2<String, byte[]> keyAndMessage = partition.next();
            MyThing aSingleMyThing = MyThing.parseFrom(keyAndMessage._2); // parse from protobuf format
            myThingsList.add(aSingleMyThing);
        }
    });
    List<MyResult> results = myCalc.doTheStuff(myThingsList);
    // other code here to write results to file
});
When debugging, I see that inside the while (partition.hasNext()) loop myThingsList has a different memory address than the List<MyThing> myThingsList declared in the outer foreachRDD.
When List<MyResult> results = myCalc.doTheStuff(myThingsList); is called, there are no results because myThingsList is a different instance of the list.
I'd like a solution to this problem, but I would prefer a reference to documentation to help me understand why this is not working as anticipated and how I can solve it myself (I don't mean just a link to the single page of Spark documentation, but also the section/paragraph or, preferably, a link to 'JavaDoc' that does not provide Scala examples with non-functional commented code).
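The reason you're seeing different list addresses is that Spark doesn't execute foreachPartition locally on the driver: it has to serialize the function and send it to the executor handling the processing of the partition. The list captured by the closure is deserialized there as a separate copy, so the items added to it never reach the driver's list (this is described in the "Understanding closures" section of the Spark RDD Programming Guide). You have to remember that although working with the code feels like everything runs in a single location, the calculation is actually distributed.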
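The copying can be reproduced without Spark. The sketch below uses plain Java serialization as a stand-in for shipping a closure to an executor (Spark's own closure serializer is not shown): the captured list is serialized along with the lambda, so the deserialized copy appends to its own list and the original stays empty. ClosureCopyDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Models shipping a closure to another JVM with plain Java serialization. */
final class ClosureCopyDemo {
    /** A serializable task type, standing in for the function sent to an executor. */
    interface SerializableRunnable extends Runnable, Serializable {}

    /** Serializes and deserializes an object, producing an independent copy. */
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    /** Runs a shipped copy of a closure that appends to a captured list; returns the original list. */
    static List<String> runShippedClosure() {
        List<String> driverList = new ArrayList<>();
        SerializableRunnable task = () -> driverList.add("parsed item");
        SerializableRunnable shipped = roundTrip(task); // the "executor" gets a copy
        shipped.run(); // appends to the copy's deserialized list
        return driverList; // the driver-side list is untouched
    }
}
```

In Spark itself, the usual way to get parsed items back to the driver is to return them from a transformation and call collect() on the result (only when they fit in driver memory), or to keep the calculation on the executors.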
The first problem I see with your code has to do with your reduceByKey, which takes two byte arrays and returns the first. Is that really what you want to do? It means you're effectively dropping parts of the data. Perhaps you're looking for combineByKey, which will allow you to return a JavaPairDStream<String, List<byte[]>>.
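To illustrate the data loss with plain Java collections (an analogy, not the Spark API): a merge function (a, b) -> a keeps just one value per key, like reduceByKey((bytes, bytes2) -> bytes), whereas grouping keeps every value, which is what combining into a list per key gives you. ReduceVsGroup is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper contrasting "reduce to one value" with "group all values" per key. */
final class ReduceVsGroup {
    /** Like reduceByKey((a, b) -> a): one value per key, later values silently dropped. */
    static Map<String, String> keepFirst(List<String[]> keyValuePairs) {
        return keyValuePairs.stream().collect(Collectors.toMap(
                pair -> pair[0], pair -> pair[1], (first, second) -> first));
    }

    /** Like combining into a list per key: every value is kept. */
    static Map<String, List<String>> keepAll(List<String[]> keyValuePairs) {
        return keyValuePairs.stream().collect(Collectors.groupingBy(
                pair -> pair[0], Collectors.mapping(pair -> pair[1], Collectors.toList())));
    }
}
```

Note that in Spark the reduce runs across partitions, so which value survives is not guaranteed to be the first one; only the fact that the others are dropped carries over.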
Regarding parsing your protobuf, it looks to me like you don't want foreachRDD; you need an additional map to parse the data:
kafkaStream
        .combineByKey(/* implement logic */)
        .flatMap(x -> x._2)
        .map(proto -> MyThing.parseFrom(proto))
        .map(myThing -> myCalc.doStuff(myThing))
        .foreachRDD(/* After all the processing, do stuff with result */)

Spark on Java - What is the right way to have a static object on all workers

I need to use a non-serialisable 3rd party class in my functions on all executors in Spark, for example:
JavaRDD<String> resRdd = origRdd
        .flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String t) throws Exception {
                // A DynamoDB mapper I don't want to initialise every time
                DynamoDBMapper mapper = new DynamoDBMapper(new AmazonDynamoDBClient(credentials));
                Set<String> userFav = mapper.load(userDataDocument.class, userId).getFav();
                return userFav;
            }
        });
I would like to have a static DynamoDBMapper mapper which I initialise once for every executor and be able to use it over and over again.
Since it's not serialisable, I can't initialise it once in the driver and broadcast it.
Note: there is an answer here (What is the right way to have a static object on all workers), but it's only for Scala.
You can use mapPartitions or foreachPartition. Here is a snippet taken from Learning Spark:
By using partition-based operations, we can share a connection pool to this database to avoid setting up many connections, and reuse our JSON parser. As Examples 6-10 through 6-12 show, we use the mapPartitions() function, which gives us an iterator of the elements in each partition of the input RDD and expects us to return an iterator of our results.
This allows you to initialize one connection per partition (rather than one per element) and then iterate over the elements in the partition however you like. This is very useful for saving data into an external database or for expensive, reusable object creation.
Here is a simple Scala example taken from the book. It can be translated to Java if needed; it is just here to show a simple use case of foreachPartition (mapPartitions follows the same pattern but returns an iterator of results).
ipAddressRequestCount.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Open connection to storage system (e.g. a database connection)
    partition.foreach { item =>
      // Use connection to push item to system
    }
    // Close connection
  }
}
Here is a link to a java example.
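For the original question, a common Java counterpart is a lazily initialized static holder: Spark ships the function, not the holder's static field, so each executor JVM creates its own instance on first use and reuses it for every record and partition it processes. The sketch below uses the initialization-on-demand holder idiom; ExpensiveClient and ClientHolder are hypothetical stand-ins for DynamoDBMapper and your own wrapper, and how the real client obtains its credentials on the executor is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a non-serializable client such as DynamoDBMapper. */
final class ExpensiveClient {
    // Counts constructions so the "created once per JVM" behaviour can be checked.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    ExpensiveClient() {
        CONSTRUCTED.incrementAndGet();
    }

    String lookup(String key) {
        return "value-for-" + key;
    }
}

/** One lazily created client per JVM, i.e. per Spark executor. */
final class ClientHolder {
    private ClientHolder() {}

    // The JVM initializes Lazy (running its static initializer) only on the
    // first call to get(), and class initialization is thread-safe.
    private static final class Lazy {
        static final ExpensiveClient INSTANCE = new ExpensiveClient();
    }

    static ExpensiveClient get() {
        return Lazy.INSTANCE;
    }
}
```

Inside the flatMap (or mapPartitions) function you would then call ClientHolder.get() instead of constructing a new mapper per element. Since one executor can run several tasks at once, the shared client must be safe to use from multiple threads.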
