Generating a Spring Flux from a sequence of paged network calls - java

I am calling an API, api.magicthegathering.io/v1/cards, using the Spring reactive WebFlux client. The response is a page of 100 cards, along with headers containing links for the "next" and "last" pages, e.g. "last" is api.magicthegathering.io/v1/cards?page=426 (and "next" is simply n+1). I want to generate a Flux<Card> that feeds out each card individually, with a single entry point, e.g. Flux<Card> getAllCards().
I currently have a CardsClient component that returns a Mono<CardPage>. The CardPage has a cards() method on it that returns all cards therein (this is a 1:1 representation of the API's response model). On top of that, I have a CardCatalog component with that getAllCards() method on it.
I have tried using Flux::expand and Flux::generate, which works somewhat, but these implementations have flaws.
Here is a snippet of my current iteration of CardCatalog::getAllCards(). The problem is that the recursive nature of expand is causing redundant calls to client::getNextPage; clearly I am not using the proper method.
@Override
public Flux<Card> getAllCards() {
    return client.getFirstPage().flux().expand(client::getNextPage)
        .map(Page::cards)
        .flatMap(Flux::fromIterable)
        .map(mapper::convert)
        .cache();
}
Previously I was using generate, but the issue with this is that it would always grab all pages (pretty slow), even if the subscriber only decides to take(20) cards:
@Override
public Flux<Card> getAllCards() {
    final Flux<Page> pageFlux =
        generate(client::getFirstPage, (response, sink) -> {
            final var page = response.block();
            sink.next(page);
            if (page.next().isPresent()) {
                return client.getNextPage(page);
            }
            sink.complete();
            return null;
        });
    return pageFlux.flatMapIterable(Page::cards).map(mapper::convert);
}
The full code is here: https://github.com/myersadamk/mtg-api-client
Using expand, I added a print statement to client::getNextPage(). As you can see, the call graph it creates makes redundant calls.
Getting https://api.magicthegathering.io/v1/cards?page=1
Getting https://api.magicthegathering.io/v1/cards?page=7
Getting https://api.magicthegathering.io/v1/cards?page=2
Getting https://api.magicthegathering.io/v1/cards?page=8
Getting https://api.magicthegathering.io/v1/cards?page=3
Getting https://api.magicthegathering.io/v1/cards?page=9
Getting https://api.magicthegathering.io/v1/cards?page=4
Getting https://api.magicthegathering.io/v1/cards?page=10
Getting https://api.magicthegathering.io/v1/cards?page=5
Getting https://api.magicthegathering.io/v1/cards?page=11
Getting https://api.magicthegathering.io/v1/cards?page=6
Getting https://api.magicthegathering.io/v1/cards?page=12
Getting https://api.magicthegathering.io/v1/cards?page=7
I want something more like this:
Getting https://api.magicthegathering.io/v1/cards?page=1
Getting https://api.magicthegathering.io/v1/cards?page=2
Getting https://api.magicthegathering.io/v1/cards?page=3
(Final note: it would certainly be faster to parallelize this and call the URIs directly, but it feels a little silly to bypass the next/last mechanic and hard-code the URIs. I may end up doing that, but still want to crack this nut.)

I think this is the sequential non-blocking way of doing this:
public Flux<Card> getAllCards() {
    PaginationParams paginationParams = new PaginationParams();
    final Flux<Page> pageFlux = Mono
        .defer(() -> client.getPage(paginationParams))
        .doOnNext(page -> {
            if (page.next().isPresent()) {
                paginationParams.setPage(page.next().get());
            } else {
                paginationParams.setPage(null);
            }
        })
        .repeat(() -> paginationParams.getPage() != null);
    return pageFlux.flatMapIterable(Page::cards).map(mapper::convert);
}
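For reference, the same next-link walk can also be written without the mutable PaginationParams holder by expanding from the first page and terminating explicitly. This is only a sketch, assuming (as in the question) that client.getFirstPage() returns a Mono<Page> and client.getNextPage(Page) returns a Mono<Page>; whether it avoids the duplicate requests observed with the original expand-based version would need checking against the real client:
public Flux<Card> getAllCards() {
    return client.getFirstPage()
        .flux()
        // keep requesting the next page until there is no "next" link;
        // returning Mono.empty() ends the expansion
        .expand(page -> page.next().isPresent()
            ? client.getNextPage(page)
            : Mono.empty())
        .flatMapIterable(Page::cards)
        .map(mapper::convert);
}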

Alright, I've come up with something that works. I decided to use the page count approach to try parallelization, although it is admittedly not faster since network IO remains the bottleneck. I'll probably go back to the header link crawling and use caching. Roughly, magic numbers and all, this is what it looks like:
@Override
public Flux<Card> getAllCards() {
    return client.getPageCount().flatMapMany(pageCount ->
        Flux.concat(
            range(1, pageCount)
                .parallel(pageCount / 6).runOn(Schedulers.parallel())
                .map(client::getPage)
                .sequential()
        ).map(Page::cards).flatMap(Flux::fromIterable).map(mapper::convert)
    );
}
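If the goal is bounded concurrency without the ParallelFlux machinery, roughly the same page-count approach can be written with flatMapSequential, which keeps a fixed number of requests in flight while emitting pages in order. A sketch, assuming client.getPage accepts a page number and returns a Mono<Page>, as the snippet above suggests:
@Override
public Flux<Card> getAllCards() {
    return client.getPageCount()
        .flatMapMany(pageCount -> Flux.range(1, pageCount)
            // up to 6 page requests in flight at a time, results kept in page order
            .flatMapSequential(client::getPage, 6))
        .flatMapIterable(Page::cards)
        .map(mapper::convert);
}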

Related

IBKR TWS API - How to tell when reqOptionsMktData is complete for all strikes?

I am just getting started with IBKR API on Java. I am following the API sample code, specifically the options chain example, to figure out how to get options chains for specific stocks.
The example works well for this, but I have one question - how do I know once ALL data has been loaded? There does not seem to be a way to tell. The sample code is able to tell when each individual row has been loaded, but there doesn't seem to be a way to tell when ALL strikes have been successfully loaded.
I thought that using tickSnapshotEnd() would be beneficial, but it does not seem to work as I would expect. I would expect it to be called once for every request that completes. For example, if I do a query for a stock like SOFI on the 2022/03/18 expiry, I see that there are 35 strikes but tickSnapshotEnd() is called 40+ times, with some strikes repeated more than once.
Note that I am doing requests for snapshot data, not live/streaming data
reqOptionsMktData is obviously a method in the sample code you are using. I'm not sure which particular code you're using, so this is a general response.
Firstly you are correct, there is no way to tell via the API, this must be done by the client. Of course it will provide the requestID that was used when the request was made. The client needs to remember what each requestID was for and decide how to process that information when it is received in the callbacks.
This can be done with a dictionary or hashtable; upon receiving data in the callback, check whether the chain is complete.
Message delivery from the API often has unexpected results, receiving extra messages is common and is something that needs to be taken into account by the client. Consider the API stateless, and track everything in the client.
Seems you are referring to Regulatory Snapshots; I would encourage you to look at the cost. It could quite quickly add up to the price of streaming live data. Add to that the 1/sec request limit, which will make a chain take a long time to load. I wouldn't even recommend using snapshots with live data; cancelling the request yourself is trivial and much faster.
Something like (this is obviously incomplete C#, just a starting point)
class OptionData
{
public int ReqId { get; }
public double Strike { get; }
public string Expiry { get; }
public double? Bid { get; set; } = null;
public double? Ask { get; set; } = null;
public bool IsComplete()
{
return Bid != null && Ask != null;
}
public OptionData(int reqId, double strike, ....
{ ...
}
...
class MyData
{
// Create somewhere to store our data, indexed by reqId.
Dictionary<int, OptionData> optChain = new();
public MyData()
{
// We would want to call reqSecDefOptParams to get a list of strikes etc.
// Choose which part of the chain you want, likely you'll want to
// get the current price of the underlying to decide.
int reqId = 1;
...
optChain.Add(++reqId, new OptionData(reqId,strike,expiry));
...
// Request data for each contract
// Note the 50 msg/sec limit https://interactivebrokers.github.io/tws-api/introduction.html#fifty_messages
// Only 1/sec for Reg snapshot
foreach(OptionData opt in optChain.Values)
{
Contract con = new()
{
Symbol = "SPY",
Currency = "USD"
Exchange = "SMART",
Right = "C",
SecType = "OPT",
Strike = opt.Strike,
Expiry = opt.Expiry
};
ibClient.ClientSocket.reqMktData(opt.ReqId, con, "", false, true, new List<TagValue>());
}
}
...
private void Recv_TickPrice(TickPriceMessage msg)
{
if(optChain.ContainsKey(msg.RequestId))
{
if (msg.Field == 2) optChain[msg.RequestId].Ask = msg.Price;
if (msg.Field == 1) optChain[msg.RequestId].Bid = msg.Price;
// You may want other tick types as well
// see https://interactivebrokers.github.io/tws-api/tick_types.html
if(optChain[msg.RequestId].IsComplete())
{
// This won't apply for reg snapshot.
ibClient.ClientSocket.cancelMktData(msg.RequestId);
// You have the data, and have cancelled the request.
// Maybe request more data or update display etc...
// Check if the whole chain is complete
bool complete=true;
foreach(OptionData opt in optChain.Values)
if(!opt.IsComplete()) complete=false;
if(complete)
// do whatever
}
}
}

Java Reactive stream how to map an object when the object being mapped is also needed on the next step of the stream

I am using Java 11 and project Reactor (from Spring). I need to make a http call to a rest api (I can only make it once in the whole flow).
With the response I need to compute two things:
Check if a document exists in the database (MongoDB). If it does not exist, then create it and return it. Otherwise just return it.
Compute some logic on the response and we are done.
In pseudo code it is something like this:
public void computeData(String id) {
httpClient.getData(id) // Returns a Mono<Data>
.flatMap(data -> getDocument(data.getDocumenId()))
// Issue here is we need access to the data object consumed in the previous flatMap but at the same time we also need the document object we get from the previous flatMap
.flatMap(document -> calculateValue(document, data))
.subscribe();
}
public Mono<Document> getDocument(String id) {
// Check if document exists
// If not create document
return document;
}
public Mono<Value> calculateValue(Document doc, Data data) {
// Do something...
return value;
}
The issue is that calculateValue needs the return value from httpClient.getData, but that value was already consumed in the first flatMap, and at the same time we also need the document object produced by that flatMap.
I tried to solve this issue using Mono.zip like below:
public void computeData(String id) {
final Mono<Data> dataMono = httpClient.getData(id);
Mono.zip(
new Mono<Mono<Document>>() {
@Override
public void subscribe(CoreSubscriber<? super Mono<Document>> actual) {
final Mono<Document> documentMono = dataMono.flatMap(data -> getDocument(data.getDocumentId()));
actual.onNext(documentMono);
}
},
new Mono<Mono<Value>>() {
#Override
public void subscribe(CoreSubscriber<? super Mono<Value>> actual) {
actual.onNext(dataMono);
}
}
)
.flatMap(objects -> {
final Mono<Document> documentMono = objects.getT1();
final Mono<Data> dataMono = objects.getT2();
return Mono.zip(documentMono, dataMono, (document, data) -> calculateValue(document, data));
});
}
But this is executing httpClient.getData(id) twice, which goes against my constraint of only calling it once. I understand why it is being executed twice (I subscribe to it twice).
Maybe my solution design can be improved somewhere but I do not see where. To me this sounds like a "normal" issue when designing reactive code but I could not find a suitable solution to it so far.
My question is, how can accomplish this flow in a reactive and non blocking way and only making one call to the rest api?
PS: I could add all the logic inside a single map, but that would force me to subscribe to one of the Monos inside the map, which is not recommended and which I want to avoid.
EDIT regarding @caco3's comment
I need to subscribe inside the map because both getDocument and calculateValue methods return a Mono.
So, if I wanted to put all the logic inside one single map it would be something like:
public void computeData(String id) {
httpClient.getData(id)
.map(data -> getDocument(data).subscribe(s -> calculateValue(s, data)))
.subscribe();
}
You do not have to subscribe inside map, just continue building the reactive chain inside the flatMap:
getData(id) // Mono<Data>
.flatMap(data -> getDocument(data.getDocumentId()) // Mono<Document>
.switchIfEmpty(createDocument(data.getDocumentId())) // Mono<Document>
.flatMap(document -> calculateValue(document, data)) // Mono<Value>
)
.subscribe()
Boiling it down, your problem is analogous to:
Mono.just(1)
.flatMap(original -> process(original))
.flatMap(processed -> I need access to the original value and the processed value!
System.out.println(original); //Won't work
);
private static Mono<String> process(int in) {
return Mono.just(in + " is an integer").delayElement(Duration.ofSeconds(2));
}
(Silly example, I know.)
The problem is that map() (and by extension, flatMap()) are transformations - you get access to the new value, and the old one goes away. So in your second flatMap() call, you've got access to 1 is an integer, but not the original value (1).
The solution here is to, instead of mapping to the new value, map to some kind of merged result that contains both the original and new values. Reactor provides a built in type for that - a Tuple. So editing our original example, we'd have:
Mono.just(1)
.flatMap(original -> operation(original))
.flatMap(processed -> //Help - I need access to the original value and the processed value!
System.out.println(processed.getT1()); //Original
System.out.println(processed.getT2()); //Processed
///etc.
);
private static Mono<Tuple2<Integer, String>> operation(int in) {
return Mono.just(in + " is an integer").delayElement(Duration.ofSeconds(2))
.map(newValue -> Tuples.of(in, newValue));
}
You can use the same strategy to "hold on" to both document and data - no need for inner subscribes or anything of the sort :-)
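Applied to the types in the question, that strategy looks roughly like the sketch below (reusing the getDocument and calculateValue signatures from the question; Tuples comes from reactor.util.function):
public void computeData(String id) {
    httpClient.getData(id)                                    // Mono<Data>
        .flatMap(data -> getDocument(data.getDocumentId())    // Mono<Document>
            .map(document -> Tuples.of(data, document)))      // Mono<Tuple2<Data, Document>>
        .flatMap(tuple -> calculateValue(tuple.getT2(), tuple.getT1()))
        .subscribe();
}
Either this Tuple-based version or the nested-flatMap version above subscribes to httpClient.getData(id) exactly once, satisfying the single-call constraint.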

How can I preserve clean code flow after going async with CompletableFuture?

I'm running into an issue with CompletableFutures. I have a JAX RS-based REST endpoint that reaches out to an API, and I need to make 3 sequential calls to this API. Current flow looks like this:
FruitBasket fruitBasket = RestGateway.GetFruitBasket("today");
Fruit chosenFruit = chooseFruitFromBasket(fruitBasket);
Boolean success = RestGateway.RemoveFromBasket(chosenFruit);
if (success) {
    RestGateway.WhatWasRemoved(chosenFruit.getName());
} else {
    throw new RuntimeException("Could not remove fruit from basket.");
}
return chosenFruit;
Of course, each of the calls to RestGateway.SomeEndpoint() is blocking because it does not use .async() in building my request.
So now let's add .async() and return a CompletableFuture from each of the RestGateway interactions.
My initial thought is to do this:
Fruit chosenFruit;
RestGateway.GetFruitBasket("today")
    .thenCompose(fruitBasket -> {
        chosenFruit = chooseFruitFromBasket(fruitBasket);
        return RestGateway.RemoveFromBasket(chosenFruit);
    })
    .thenCompose(success -> {
        if (success) {
            return RestGateway.WhatWasRemoved(chosenFruit);
        } else {
            throw new RuntimeException("Could not remove fruit from basket.");
        }
    });
return chosenFruit;
Because this seems to guarantee me that execution will happen sequentially, and if a previous stage fails then the rest will fail.
Unfortunately, this example is simple and has many fewer stages than my actual use case. It feels like I'm writing lots of nested conditionals inside of stacked .thenCompose() blocks. Is there any way to write this in a more sectioned/compartmentalized way?
What I'm looking for is something like the original code:
FruitBasket fruitBasket = RestGateway.GetFruitBasket("today").get();
Fruit chosenFruit = chooseFruitFromBasket(fruitBasket);
Boolean success = RestGateway.RemoveFromBasket(chosenFruit).get();
if (success) {
    RestGateway.WhatWasRemoved(chosenFruit.getName()).get();
} else {
    throw new RuntimeException("Could not remove fruit from basket.");
}
return chosenFruit;
But the calls to .get() are blocking! So there is absolutely no benefit from the asynchronous re-write of the RestGateway!
TL;DR - Is there any way to preserve the original code flow, while capturing the benefits of asynchronous non-blocking web interactions? The only way I see it is to cascade lots of .thenApply() or .thenCompose methods from the CompletableFuture library, but there must be a better way!
I think this problem is solved by async/await in JavaScript, but I don't think Java has anything of that sort.
As the folks in the comments section mentioned, there's not much that can be done.
I will however close this off with a link to another Stack Overflow question that proved tremendously helpful and solved another problem that I didn't explicitly mention: namely, how to pass values forward from previous stages while dealing with this warning:
variable used in lambda expression should be final or effectively final.
Using values from previously chained thenCompose lambdas in Java 8
I ended up with something like this:
CompletableFuture<Boolean> stepOne(String input) {
    return RestGateway.GetFruitBasket("today")
        .thenCompose(fruitBasket -> {
            Fruit chosenFruit = chooseFruitFromBasket(fruitBasket);
            return stepTwo(chosenFruit);
        });
}

CompletableFuture<Boolean> stepTwo(Fruit chosenFruit) {
    return RestGateway.RemoveFromBasket(chosenFruit)
        .thenCompose(success -> {
            if (success) {
                return stepThree(chosenFruit.getName());
            } else {
                throw new RuntimeException("Could not remove fruit from basket.");
            }
        });
}

CompletableFuture<Boolean> stepThree(String fruitName) {
    return RestGateway.WhatWasRemoved(fruitName);
}
Assuming that RestGateway.WhatWasRemoved() returns a Boolean.
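To keep the endpoint itself non-blocking, the chain can then be driven from the JAX-RS resource by resuming a suspended response when the future completes. This is only a hypothetical sketch (the resource path and method names are illustrative, not from the question), assuming an injected @Suspended AsyncResponse:
// hypothetical resource method; path and names are illustrative only
@GET
@Path("/fruit")
public void pickFruit(@Suspended final AsyncResponse asyncResponse) {
    stepOne("today").whenComplete((result, error) -> {
        if (error != null) {
            asyncResponse.resume(error);   // propagate the failure to the client
        } else {
            asyncResponse.resume(result);  // complete the suspended request with the result
        }
    });
}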

Method reference and boolean

So I have been having a go at using method references in Java 8 (Object::Method). What I am attempting to do, which I have done before but have forgotten (the last time I used a method reference was about 4 months ago), is find the players that are not online using a method reference.
public static Set<Friend> getOnlineFriends(UUID playerUUID)
{
Set<Friend> friends = new HashSet<>(Arrays.asList(ZMFriends.getFriends(playerUUID)));
return friends.stream().filter(Friend::isOnline).collect(Collectors.toSet());
}
public static Set<Friend> getOfflineFriends(UUID playerUUID)
{
Set<Friend> friends = new HashSet<>(Arrays.asList(ZMFriends.getFriends(playerUUID)));
return friends.stream().filter(Friend::isOnline).collect(Collectors.toSet());
}
As you can see I managed to do it when the player (friend) is online, but I cannot figure out how to filter through the Set and collect the offline players. I'm missing something obvious, but what is it?!?!
Thanks,
Duke.
In your code
public static Set<Friend> getOnlineFriends(UUID playerUUID)
{
Set<Friend> friends = new HashSet<>(Arrays.asList(ZMFriends.getFriends(playerUUID)));
return friends.stream().filter(Friend::isOnline).collect(Collectors.toSet());
}
you are creating a List view of the array returned by ZMFriends.getFriends(playerUUID), copying its contents to a HashSet, just to call stream() on it.
That’s a waste of resources, as the source type is irrelevant to the subsequent stream operation. You don’t need to have a Set source to get a Set result. So you can implement your operation simply as
public static Set<Friend> getOnlineFriends(UUID playerUUID)
{
return Arrays.stream(ZMFriends.getFriends(playerUUID))
.filter(Friend::isOnline).collect(Collectors.toSet());
}
Further, you should consider whether you really need both, getOnlineFriends and getOfflineFriends in your actual implementation. Creating utility methods in advance, just because you might need them, rarely pays off. See also “You aren’t gonna need it”.
But if you really need both operations, it’s still an unnecessary code duplication. Just consider:
public static Set<Friend> getFriends(UUID playerUUID, boolean online)
{
return Arrays.stream(ZMFriends.getFriends(playerUUID))
.filter(f -> f.isOnline()==online).collect(Collectors.toSet());
}
solving both tasks. It still wastes resources if the application really needs both Sets, as it still has to perform the same operation twice to get both Sets. Consider:
public static Map<Boolean,Set<Friend>> getOnlineFriends(UUID playerUUID)
{
return Arrays.stream(ZMFriends.getFriends(playerUUID))
.collect(Collectors.partitioningBy(Friend::isOnline, Collectors.toSet()));
}
This provides you both Sets at once, the online friends being associated to true, the offline friends being associated to false.
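For illustration, consuming the partitioned result is then a single call followed by two lookups (a sketch; getFriendsByStatus is just a stand-in name for the partitioning method above):
// getFriendsByStatus stands in for the partitioningBy method shown above
Map<Boolean, Set<Friend>> friendsByStatus = getFriendsByStatus(playerUUID);
Set<Friend> onlineFriends  = friendsByStatus.get(true);   // friends where isOnline() == true
Set<Friend> offlineFriends = friendsByStatus.get(false);  // friends where isOnline() == false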
There are 2 ways I can think of:
friends.stream().filter(i -> !i.isOnline()).collect(Collectors.toSet());
But I guess that's not what you want, since it's not using a method reference. So maybe something like this:
public static <T> Predicate<T> negation(Predicate<T> predicate) {
return predicate.negate();
}
...
friends.stream().filter(negation(Friend::isOnline)).collect(Collectors.toSet());
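(If you can move past Java 8: since Java 11, java.util.function.Predicate.not does the same thing without a custom helper:)
friends.stream().filter(Predicate.not(Friend::isOnline)).collect(Collectors.toSet()); // Predicate.not requires Java 11+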

RxJava - Observable chain with concatWith and map

I'm having trouble properly implementing the following scenario using RxJava (v1.2.1):
I need to handle a request for some data object. I have a meta-data copy of this object which I can return immediately, while making an API call to a remote server to retrieve the whole object data. When I receive the data from the API call I need to process the data before emitting it.
My solution currently looks like this:
return Observable.just(localDataCall())
.concatWith(externalAPICall().map(new DataProcessFunction()));
The first Observable, localDataCall(), should emit the local data, which is then concatenated with the remote API call, externalAPICall(), mapped to the DataProcessFunction.
This solution works but it has a behavior that is not clear to me. When the local data call returns its value, this value goes through the DataProcessFunction even though it's not connected to the first call.
Any idea why this is happening? Is there a better implementation for my use case?
I believe that the issue lies in some part of your code that has not been provided. The data returned from localDataCall() is independent of the new DataProcessFunction() object, unless somewhere within localDataCall you use another DataProcessFunction.
To prove this to you I will create a small example using io.reactivex:rxjava:1.2.1:
public static void main(String[] args){
Observable.just(foo())
.concatWith(bar().map(new IntMapper()))
.subscribe(System.out::println);
}
static int foo() {
System.out.println("foo");
return 0;
}
static Observable<Integer> bar() {
System.out.println("bar");
return Observable.just(1, 2);
}
static class IntMapper implements Func1<Integer, Integer>
{
@Override
public Integer call(Integer integer)
{
System.out.println("IntMapper " + integer);
return integer + 5;
}
}
This prints to the console:
foo
bar
0
IntMapper 1
6
IntMapper 2
7
As can be seen, the value 0 created in foo never gets processed by IntMapper; IntMapper#call is only called twice for the values created in bar. The same can be said for the value created by localDataCall. It will not be mapped by the DataProcessFunction object passed to your map call. Just like bar and IntMapper, only values returned from externalAPICall will be processed by DataProcessFunction.
.concatWith() concatenates all items emitted by one observable with all items emitted by the other observable, so no wonder that .map() is being called twice.
But I do not understand why you need localDataCall() at all in this scenario. Perhaps you might want to use .switchIfEmpty() or .switchOnNext() instead.
