Java WebClient - collects all objects from a paginated website

I want to iterate through all pages of a given URL and collect the JSON objects. With this code I get a list of only 10 objects:
List<EzamowieniaDto> ezam = WebClient
.create("https://ezamowienia.gov.pl/mo-board/api/v1/Board/Search?noticeType=ContractNotice&isTenderAmountBelowEU=true" +
"&publicationDateFrom=2022-03-16T00:00:00.000Z&orderType=Delivery&SortingColumnName=PublicationDate&SortingDirection=DESC" +
"&PageNumber=1")
.get()
.retrieve()
.bodyToMono(new ParameterizedTypeReference<List<EzamowieniaDto>>(){})
.block();
I've tried simply removing "PageNumber" from the request, but the pagination seems to be hard-coded for this page.
(X-Pagination header from response: [{"TotalCount":88,"PageSize":10,"CurrentPage":1,"TotalPages":9,"HasNext":true,"HasPrevious":false}])
The question is: how can I iterate through the number of pages given in the response header and collect all the data?

Here is how you could handle paginated requests with WebClient.
Create a method to retrieve a single page of data. Typically you would use bodyToFlux(EzamowieniaDto.class) and return Flux<EzamowieniaDto>, but because we need the response headers we have to use toEntityFlux(EzamowieniaDto.class), which wraps the response in a Mono<ResponseEntity<Flux<EzamowieniaDto>>>.
Mono<ResponseEntity<Flux<EzamowieniaDto>>> getPage(String url, int pageNumber) {
    return webClient.get()
        .uri(url + "&PageNumber={pageNum}", pageNumber)
        .retrieve()
        .toEntityFlux(EzamowieniaDto.class);
}
Use expand to fetch pages until we reach the end:
Flux<EzamowieniaDto> getData(String url) {
    return getPage(url, 1)
        .expand(response -> {
            // fromJson is a helper (not shown) that parses the X-Pagination header
            Pagination pagination = fromJson(response.getHeaders().getFirst("X-Pagination"));
            if (!pagination.hasNext()) {
                // stop
                return Mono.empty();
            }
            // fetch next page
            return getPage(url, pagination.getCurrentPage() + 1);
        })
        .flatMap(response -> response.getBody());
}
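The snippet above relies on a Pagination class and a JSON-parsing helper that the answer does not show. A minimal sketch of both follows, assuming the X-Pagination header always has the shape quoted in the question. It uses regular expressions only to stay dependency-free; a JSON library such as Jackson would be the more robust choice in a real project. The class, field, and method names are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for the X-Pagination header; all names are assumptions.
final class Pagination {
    private static final Pattern CURRENT_PAGE = Pattern.compile("\"CurrentPage\"\\s*:\\s*(\\d+)");
    private static final Pattern TOTAL_PAGES = Pattern.compile("\"TotalPages\"\\s*:\\s*(\\d+)");
    private static final Pattern HAS_NEXT = Pattern.compile("\"HasNext\"\\s*:\\s*(true|false)");

    private final int currentPage;
    private final int totalPages;
    private final boolean hasNext;

    private Pagination(int currentPage, int totalPages, boolean hasNext) {
        this.currentPage = currentPage;
        this.totalPages = totalPages;
        this.hasNext = hasNext;
    }

    // Extracts the three fields the paging loop needs from the raw header value.
    static Pagination fromJson(String header) {
        if (header == null) {
            throw new IllegalArgumentException("X-Pagination header is missing");
        }
        return new Pagination(
                Integer.parseInt(firstGroup(CURRENT_PAGE, header)),
                Integer.parseInt(firstGroup(TOTAL_PAGES, header)),
                Boolean.parseBoolean(firstGroup(HAS_NEXT, header)));
    }

    // Returns the first capture group, or fails loudly if the field is absent.
    private static String firstGroup(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Field not found in header: " + input);
        }
        return matcher.group(1);
    }

    int getCurrentPage() { return currentPage; }
    int getTotalPages() { return totalPages; }
    boolean hasNext() { return hasNext; }
}
```

With this in place, the lambda passed to expand can read hasNext() and getCurrentPage() from each response's header to decide whether to request another page.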

First, avoid .block(): in short, it interrupts the asynchronous stream and makes it synchronous, so there is little point in using WebClient that way. If you want synchronous calls, you can use a blocking client such as RestTemplate or Retrofit instead. In your case, to keep the asynchronous pattern, you can use the following:
Flux<EzamowieniaDto> ezam = WebClient
    .create("https://ezamowienia.gov.pl/mo-board/api/v1/Board/Search?noticeType=ContractNotice&isTenderAmountBelowEU=true" +
        "&publicationDateFrom=2022-03-16T00:00:00.000Z&orderType=Delivery&SortingColumnName=PublicationDate&SortingDirection=DESC" +
        "&PageNumber=1")
    .get()
    .retrieve()
    .bodyToFlux(EzamowieniaDto.class) // a Flux is, roughly, the asynchronous counterpart of a List
    .map(body -> body /* body is an EzamowieniaDto; do any work with it here */)
...
Example
...
List<EzamowieniaDto> dtos = new ArrayList<>();
Disposable subscription = WebClient
    .create("http://some-url.com")
    .get()
    .retrieve()
    .bodyToFlux(EzamowieniaDto.class)
    .filter(body -> body.getId().equals(1L)) // keep only the emitted elements that match
    .subscribe(
        dto -> dtos.add(dto), // called for each element as it arrives
        error -> System.err.println(error), // called if the request fails
        () -> System.out.println(dtos.get(0).getId().equals(1L))); // runs on completion; a simple check, use dtos as you need
Additionally
When you mix synchronous constructs (Lists, a Mono of a List, etc.) with asynchronous ones, the code becomes synchronous at the point where they meet. Reactive programming implies staying asynchronous (and mostly declarative) through the whole process, from fetching the response asynchronously to writing to the database asynchronously (with Hibernate Reactive, for example).
Hope this helps. If you haven't started yet, I suggest learning reactive programming (Reactor or Spring WebFlux, for example) to understand the basics of asynchronous programming.
Best Regards, Anton.

Related

What is the proper way to wait till all Mono responses are returned from downstream APIs

I'm quite new to Mono and Flux. I'm trying to join several downstream API responses in a traditional blocking application. I don't want to collect a list of Mono; I want a List of the payloads returned by the downstream APIs, which I extract from each Mono. However, the 'result' returned to the controller sometimes contains only some, or none, of the downstream API responses. What is the correct way to do this? I've read several posts; "How to iterate Flux and mix with Mono" states:
you should not call subscribe anywhere in a web application. If this is bound to an HTTP request, you're basically triggering the
reactive pipeline with no guarantee about resources or completion.
Calling subscribe triggers the pipeline but does not wait until it's
complete
Should I be using CompletableFuture?
In my Service I attempted
var result = new ArrayList<List<X>>();
List<Mono<X>> monoList = apiCall();
Flux.fromIterable(monoList)
.flatMap(m -> m.doOnSuccess(
x -> {
result.add(x.getData());
}
)).subscribe();
I also attempted the following in controller, but the method returns without waiting for subscribe to complete
var result = new ArrayList<List<X>>();
Flux.concat(
this.service.callApis(result, ...)
).subscribe();
return result;
In my service
public Mono<Void> callApis(List<List<X>> result, ..) {
    ...
    return Flux.fromIterable(monoList)
        .flatMap(m -> m.doOnSuccess(
            x -> {
                result.add(x.getData()...);
            }
        )).then();
}

The Project Reactor documentation (which is very good) has a section called Which operator do I need?. You need to create a Flux from your API calls, combine the results, and then return to the synchronous world.
In your case, it looks like all your downstream services have the same API, so they all return the same type and it doesn't really matter what order those responses appear in your application. Also, I'm assuming that apiCall() returns a List<Mono<Response>>. You probably want something like
Flux.fromIterable(apiCall()) // Flux<Mono<Response>>
.flatMap(mono -> mono) // Flux<Response>
.map(response -> response.getData()) // Flux<List<X>>
.collectList() // Mono<List<List<X>>>
.block(); // List<List<X>>
The fromIterable(...).flatMap(x->x) construct just converts your List<Mono<R>> into a Flux<R>.
map() is used to extract the data part of your response.
collectList() creates a Mono that waits until the Flux completes, and gives a single result containing all the data lists.
block() subscribes to the Mono returned by the previous operator, and blocks until it is complete, which will (in this case) be when all the Monos returned by apiCall() have completed.
There are many possible alternatives here, and which is most suitable will depend on your exact use case.
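Since the question also asks about CompletableFuture, here is a rough plain-JDK sketch of the same "start everything, wait for all, collect the payloads" shape, for comparison only. The Supplier objects stand in for the downstream calls, and the class and method names are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical helper; each Supplier stands in for one downstream API call.
final class JoinAll {
    // Starts every call asynchronously, waits for all of them, and returns the payloads in call order.
    static <T> List<T> callAll(List<Supplier<T>> calls) {
        // First pass: start all calls. Collecting to a list here matters, because a single
        // lazy stream would start and join each call one after the other.
        List<CompletableFuture<T>> futures = calls.stream()
                .map(call -> CompletableFuture.supplyAsync(call))
                .collect(Collectors.toList());
        // Second pass: join() blocks until each future completes and rethrows a failure
        // wrapped in a CompletionException.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

As with collectList().block(), the caller gets a complete list or an exception, never a partially filled one.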

How to limit concurrent http requests with Mono & Flux

I want to use Flux to limit the number of concurrent HTTP requests made from a List of Mono.
When some requests complete (their responses are received), the service should send further requests, so that at most 15 requests are waiting at any time.
A single request returns a list and triggers further requests depending on the result.
At that point, I want to send the requests with limited concurrency,
because too many simultaneous HTTP requests from the consumer side put the remote server in trouble.
I used flatMapMany as shown below.
public Flux<JsonNode> syncData() {
return service1
.getData(param1)
.flatMapMany(res -> {
List<Mono<JsonNode>> totalTask = new ArrayList<>();
Map<String, Object> originData = service2.getDataFromDB(param2);
res.withArray("data").forEach(row -> {
String id = row.get("id").asText();
if (originData.containsKey(id)) {
totalTask.add(service1.updateRequest(param3));
} else {
totalTask.add(service1.deleteRequest(param4));
}
originData.remove(id);
});
for (left) {
totalTask.add(service1.createRequest(param5));
}
return Flux.merge(totalTask);
});
}
void syncData() {
syncDataService.syncData().????;
}
I tried chaining .window(15), but it doesn't work. All the requests are sent simultaneously.
How can I handle Flux for my goal?

I am afraid Project Reactor doesn't provide any implementation of either rate or time limit.
However, you can find a number of third-party libraries that provide such functionality and are compatible with Project Reactor. As far as I know, resilience4j-reactor supports this and is also compatible with the Spring and Spring Boot frameworks.
The RateLimiterOperator checks if a downstream subscriber/observer can acquire a permission to subscribe to an upstream Publisher. If the rate limit would be exceeded, the RateLimiterOperator could either delay requesting data from the upstream or it can emit a RequestNotPermitted error to the downstream subscriber.
RateLimiter rateLimiter = RateLimiter.ofDefaults("name");
Mono.fromCallable(backendService::doSomething)
    .transformDeferred(RateLimiterOperator.of(rateLimiter));
More about RateLimiter module itself here: https://resilience4j.readme.io/docs/ratelimiter

You can use limitRate on a Flux. You will probably need to restructure your code a bit; see the docs here: https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#limitRate-int-

flatMap takes a concurrency parameter: https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#flatMap-java.util.function.Function-int-
Mono<User> getById(int userId) { ... }
Flux.just(1, 2, 3, 4).flatMap(client::getById, 2)
will limit the number of concurrent requests to 2.
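Applied to the question, this would mean replacing Flux.merge(totalTask) with Flux.fromIterable(totalTask).flatMap(task -> task, 15), so that at most 15 of the Monos are subscribed at once (assuming they are cold, as WebClient Monos are). To make the effect of such a cap visible without a Reactor dependency, here is a plain-JDK analogue, not Reactor code: a fixed-size thread pool bounds how many stand-in requests run at the same time. The class and method names are hypothetical, and Thread.sleep stands in for network latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing a concurrency cap with a fixed-size pool.
final class BoundedCalls {
    // Runs every task, never more than maxConcurrent at once; results come back in task order.
    static <T> List<T> runAll(List<Callable<T>> tasks, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Demo: runs `count` fake requests and reports the highest number seen in flight.
    static int peakInFlight(int count, int maxConcurrent) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int id = i;
            tasks.add(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                Thread.sleep(20); // stand-in for network latency
                inFlight.decrementAndGet();
                return id;
            });
        }
        runAll(tasks, maxConcurrent);
        return peak.get();
    }
}
```

Calling BoundedCalls.peakInFlight(8, 2) never reports more than 2, which is the same guarantee the concurrency argument of flatMap gives for inner subscriptions.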

Retrofit Kotlin - make an API request followed by two more concurrent ones

I want to make an api request, then I need to make two more requests after I receive the data. I found a great SO answer that uses rxjava2 to make two concurrent requests here:
How to make multiple request and wait until data is come from all the requests in retrofit 2.0 - android
I suppose I could just chain the logic for this after the first request, but my intuition tells me that's a bad idea because I'd be duplicating some code (I'd have separate logic for the first request in one function, then similar logic for the second two requests in another).
Is there a better way to accomplish this? I'd prefer Kotlin, but Java is ok.
Here is the code for concurrent requests from the SO answer.
val retrofit = Retrofit.Builder()
.baseUrl("https://api.example.com/")
.build()
val backendApi = retrofit.create(MyBackendAPI::class.java)
val requests = ArrayList<Observable<*>>()
requests.add(backendApi.getUser())
requests.add(backendApi.listPhotos())
requests.add(backendApi.listFriends())
Observable
.zip(requests) {
// do something with those results and emit new event
Any() // <-- Here we emit just new empty Object(), but you can emit anything
}
// Triggered once all requests have completed (4xx and 5xx responses also count as completed requests)
.subscribe({
//Do something on successful completion of all requests
}) {
//Do something on error completion of requests
}
Thanks

I have a complicated set of tasks using Java Web-Client requests that need to run in parallel and finally block to return a single response

I am new to the Web Client reactive library.
Here is my problem :
It starts with a user submitting a request to post a packet of documents. They wait for a response.
A service consuming this request needs to run several tasks in parallel. Some of the sub-tasks within each task have to finish first ( 2 different Get requests ) before attempting the last sub-task which is the main Post request. Then I want to wait for the collection of all tasks 'Post sub-tasks' to finish (representing the packet), and then collect and reconcile the responses.
I need to reconcile at the end and make sure the entire parallel process succeeds (sending the packet) and then respond to a server (user) indicating that the process was successful or not.
My pseudo flow:
1. Create a set of documents to post to a server one at a time. A packet can contain up to 10 documents (List of DocumentAndMetaData). Initially each document would contain some pre-filled known values like file path and document name.
2. For each document in a packet (run in parallel):
   a. Do file I/O and create a metadata object; call it getDocumentAndMetadata. To create a Metadata object I must do some steps first within getDocumentAndMetadata:
      - Do a GET request to get Key A; call it getKeyA(requestA).
      - Do a GET request to get Key B; call it getKeyB(requestB).
      - Merge the Key A and Key B requests and use the responses from those requests to update the metadata object.
   b. Then read the file to get a byte array; call it getFile.
   c. Then pass the byte array (document) and metadata object to a function that does an HTTP POST to a server, sending the byte array and metadata object in the POST request.
3. Accumulate the responses for each POST, which are strings.
4. Then block until all the documents are sent.
5. Finally, evaluate all the string responses returned from the POST requests and make sure the number of responses matches the number of documents posted to the server. Track any errors. If any GET or POST request fails, log the error.
I figured out how to do all these steps running block() on each sub-task 'Get request' and then block() on the main 'Post request', but I am afraid the performance will suffer using this approach.
I need help with how to generate the flow using Web-Client and reactive non blocking parallel processes.
Thanks for any help.

'I am afraid the performance will suffer using this approach.' You are right. After all, the whole purpose of using WebFlux is to create a non-blocking application.
I have tried to mock most of the logic. I hope you can map the solution to your use case.
@RestController
public class MyController {
@Autowired
private WebClient webClient;
@PostMapping(value = "/postPacketOfDocs")
public Mono<ResponseEntity<String>> upload(@RequestBody Flux<String> documentAndMetaDataList) {
return documentAndMetaDataList
.flatMap(documentAndMetaData -> {
//do File I/O
return getDocumentAndMetadata(documentAndMetaData);
})
.map(String::getBytes) //Read File to get a Byte array
.flatMap(fileBytes -> {
return webClient.post().uri("/send/byte/and/metadata")
.retrieve().bodyToMono(String.class);
})
.collectList()
.flatMap(allResponsesFromEachPOST -> {
//Do some validation
boolean allValidationsSuccessful = true;
if (allValidationsSuccessful) {
return Mono.just("Success");
} else {
return Mono.error(new RuntimeException()); //some custom exception which can be handled by @ExceptionHandler
}
})
.flatMap(msg -> Mono.just(ResponseEntity.ok().body(msg)));
}
private Mono<String> getDocumentAndMetadata(String documentAndMetaData) {
String metadata = "";//get metadata object from documentAndMetaData
Mono<String> keyAResponse = webClient.get().uri("/get/keyA").retrieve().bodyToMono(String.class);
Mono<String> keyBResponse = webClient.get().uri("/get/keyB").retrieve().bodyToMono(String.class);
return keyAResponse.concatWith(keyBResponse)
.collectList()
.flatMap(responses -> updateMetadata(responses, metadata));
}
private Mono<String> updateMetadata(List<String> responses, String metadata) {
String newMedataData = metadata + responses.get(0) + responses.get(1); //some update logic
return Mono.just(newMedataData);
}
}
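One detail worth noting in the controller above: concatWith subscribes to keyBResponse only after keyAResponse completes, so the two GETs run one after the other, whereas Mono.zip(keyAResponse, keyBResponse) would subscribe to both eagerly. The dependency shape the question describes (two GETs side by side, then the POST, for every document in parallel, with a single wait at the end) can also be sketched with the JDK alone. The following is such a sketch using CompletableFuture rather than WebFlux; the suppliers and the function stand in for the real HTTP calls, and all names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch; the suppliers and function stand in for the real HTTP calls.
final class PacketSender {
    private final Supplier<String> getKeyA;      // stand-in for the first GET
    private final Supplier<String> getKeyB;      // stand-in for the second GET
    private final Function<String, String> post; // stand-in for the POST: payload in, response out

    PacketSender(Supplier<String> getKeyA, Supplier<String> getKeyB, Function<String, String> post) {
        this.getKeyA = getKeyA;
        this.getKeyB = getKeyB;
        this.post = post;
    }

    // Both GETs are started right away; the POST starts only after both keys have arrived.
    CompletableFuture<String> sendOne(String document) {
        CompletableFuture<String> keyA = CompletableFuture.supplyAsync(getKeyA);
        CompletableFuture<String> keyB = CompletableFuture.supplyAsync(getKeyB);
        return keyA
                .thenCombine(keyB, (a, b) -> document + "|" + a + "|" + b) // merge keys into the payload
                .thenApplyAsync(post);
    }

    // Starts all documents in parallel and blocks once, at the end, to collect the responses.
    List<String> sendPacket(List<String> documents) {
        List<CompletableFuture<String>> inFlight = documents.stream()
                .map(this::sendOne)
                .collect(Collectors.toList());
        return inFlight.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

The returned list has one response per document, in document order, so comparing its size with the number of documents posted is a simple final reconciliation step.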

Webflux Webclient re-try with different URL

I am using WebClient for a REST call, and what I need is this: if the primary URL fails for the n-th time, the next retry should go to a secondary URL. Below is sample code for the logic I am using. But it seems we cannot change the URL once the client is created, and even if I change the URL, it's not taking effect and requests are still fired at the initial URL.
ClientHttpConnector connector;//initiate
WebClient webClient = WebClient.builder().clientConnector(connector).build();
WebClient.RequestBodyUriSpec client = webClient.post();
client.uri("http://primaryUrl/").body(BodyInserters.fromObject("hi")).retrieve().bodyToMono(String.class).retryWhen(Retry.anyOf(Exception.class)
.exponentialBackoff(Duration.ofSeconds(2), Duration.ofSeconds(10)).doOnRetry(x ->
{
if (x.iteration() == 2) {
client.uri("http://fail_over_url/");//this does not work
}
})
.retryMax(2)).subscribe(WebClientTest::logCompletion, WebClientTest::handleError);
Is there any way to change the URL in the middle of a retry cycle?

But it seems we cannot change the URL once the client is created
You cannot - it's immutable.
even if I change the URL, it's not taking effect
You're not actually changing the URL. Take a look at the uri() method - it's returning a new instance with a URI set. Since you're not doing anything with that new instance, nothing happens (as expected.)
The route I'd probably suggest is to create a separate method to form & return your basic WebClient publisher:
private Mono<String> fromUrl(String url) {
    return WebClient.builder().clientConnector(connector).build()
        .post()
        .uri(url) // set the URI before the body: body(...) returns a spec that no longer offers uri(...)
        .body(BodyInserters.fromValue("hi"))
        .retrieve()
        .bodyToMono(String.class);
}
...and then do something like:
fromUrl("https://httpstat.us/400").retryWhen(Retry.backoff(2, Duration.ofSeconds(1)))
.onErrorResume(t -> Exceptions.isRetryExhausted(t), t -> fromUrl("https://httpstat.us/500").retryWhen(Retry.backoff(5, Duration.ofSeconds(1))))
.onErrorResume(t -> Exceptions.isRetryExhausted(t), t -> fromUrl("https://httpstat.us/200").retryWhen(Retry.backoff(7, Duration.ofSeconds(1))))
...which will try /400 up to 3 times (the first attempt plus 2 retries), then /500 up to 6 times, then /200 up to 8 times (but unless it's down, that will of course return on the first try.)
Note that the above example uses the latest version of reactor-core which has the retry functionality built in, rather than the retry functionality in reactor addons. Translating it to the reactor addons functionality should be reasonably straightforward.
This isn't strictly changing the URL within the same retry cycle; instead, it chains requests together with configurable retries per request. That lets you set different retry strategies for different URLs, which is advantageous if you don't necessarily want the retry to "carry on" from its previous point (it could make sense to reset the backoff to one second for a fresh URL, for example.)
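The control flow of that chain (each URL gets its own retry budget, and the next URL is tried only once the current budget is used up) can be sketched without Reactor as a plain loop. This is an illustration of the logic only: it is synchronous, it leaves out the backoff delays, and the Target class and call method are hypothetical names.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of per-URL retry budgets with fall-through to the next URL.
final class FallbackCaller {
    // One endpoint plus the number of retries it gets after its first attempt.
    static final class Target {
        final String url;
        final int maxRetries;

        Target(String url, int maxRetries) {
            this.url = url;
            this.maxRetries = maxRetries;
        }
    }

    // Tries each target in order; moves on only when the current target has used up its retries.
    static String call(List<Target> targets, Function<String, String> request) {
        RuntimeException last = null;
        for (Target target : targets) {
            // attempt 0 is the initial try, so a budget of 2 retries means up to 3 attempts
            for (int attempt = 0; attempt <= target.maxRetries; attempt++) {
                try {
                    return request.apply(target.url);
                } catch (RuntimeException e) {
                    last = e; // remember the failure, then retry or fall through to the next target
                }
            }
        }
        throw new IllegalStateException("All targets failed", last);
    }
}
```

Because every target starts with a fresh budget, nothing "carries on" from the previous URL, which mirrors the per-request retry strategies described above.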
