I have a list of documents that I need to save to MongoDB after running some processing on each of them.
I already have the reactive MongoDB dependency, so I would like to use it and, if possible, avoid adding a second (non-reactive) dependency.
Processing of the 2nd document should begin only after the 1st document has been saved.
Mono<List<Index>> deferredCreate = Mono.defer(() -> index
.flatMapMany(Flux::fromIterable)
.flatMapSequential(entity -> {
repository.process(entity).subscribe();
return entity;
})
.collectList());
with
public Mono<IndexDocument> process(Index index) {
if(someCondition) { return mongoOperations.save(index); }
else { return mongoOperations.findAndReplace(query, index); }
}
Here the list is processed index by index, but processing of index n+1 starts as soon as processing of index n has started, whereas I need it to start only after processing of index n has finished.
I cannot call .block() instead of subscribe(), otherwise I get an error (block cannot be used in a parallel stream). I tried concatMap, but the result is the same.
Is there any way to do this?
I use reactive programming because this is part of a larger process that needs to be reactive.
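As I mentioned in the comment, you should not subscribe explicitly. flatMapSequential subscribes behind the scenes and preserves the original order of the results. Note that flatMapSequential subscribes to the inner publishers eagerly and only orders what they emit; if each save must finish before the next one starts, use concatMap with the same lambda.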
Mono<List<Index>> deferredCreate = Mono.defer(() -> index
.flatMapMany(Flux::fromIterable)
.flatMapSequential(entity ->
repository.process(entity)
.thenReturn(entity)
)
.collectList());
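To make the ordering requirement concrete without Reactor on the classpath, here is a JDK-only sketch using CompletableFuture. It is an analogue of the chaining idea, not the Reactor API: save is a hypothetical stand-in for repository.process(entity), and LOG only exists so the start/end order can be inspected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SequentialSaves {
    // Records start/end events so the ordering can be checked afterwards.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical asynchronous save; stands in for repository.process(entity).
    static CompletableFuture<String> save(String entity) {
        return CompletableFuture.supplyAsync(() -> {
            LOG.add("start " + entity);
            try {
                Thread.sleep(20); // simulate I/O latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            LOG.add("end " + entity);
            return entity;
        });
    }

    // Chains the saves: save n+1 is only created once save n has completed.
    static CompletableFuture<List<String>> saveAll(List<String> entities) {
        CompletableFuture<List<String>> chain =
                CompletableFuture.completedFuture(new ArrayList<>());
        for (String entity : entities) {
            chain = chain.thenCompose(saved ->
                    save(entity).thenApply(result -> {
                        saved.add(result);
                        return saved;
                    }));
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(saveAll(List.of("a", "b", "c")).join());
        System.out.println(LOG);
    }
}
```

Nothing is subscribed to or blocked on inside the chain; the caller decides when to wait. Each "start" entry in the log appears only after the previous "end" entry, which is the behaviour the question asks for.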
I'm using the Mutiny extension (for Quarkus) and I don't know how to handle the following problem.
I want to send many requests asynchronously, which is why I looked at Mutiny. But the server closes the connection because it receives thousands of them at once.
So I need to:
Send the requests in blocks
Do something once all requests have been sent.
I've been using a Uni to combine all the responses, like this:
Uni<Map<Integer, String>> uniAll = Uni.combine()
.all()
.unis(list)
.combinedWith(...);
And then:
uniAll.subscribe()
.with(...);
This code sends all the requests in parallel, so the server closes the connection.
I'm trying to use Multi's grouping, but I don't know how to use it (I can't find any example in the Mutiny docs).
This is what I'm doing now:
//Launch 1000 request
for (int i=0;i<1000;i++) {
multi = client.getAbs("https://api.*********.io/jokes/random")
.as(BodyCodec.jsonObject())
.send()
.onItem().transformToMulti(
array -> Multi.createFrom()
.item(array.body().getString("value")))
.group()
.intoLists()
.of(100)
.subscribe()
.with(a->{
System.out.println("Value: "+a);
});
}
I expected the subscription not to fire until a group of 100 items had been collected, but I guess this is not the right approach, because it doesn't work.
Does anybody know how to launch 1000 async requests in blocks of 100?
Thanks in advance.
UPDATED 2021-04-19
I've tried with this approach:
List<Uni<String>> listOfUnis = new ArrayList<>();
for (int i=0;i<1000;i++) {
listOfUnis.add(client
.getAbs("https://api.*******.io/jokes/random")
.as(BodyCodec.jsonObject())
.send()
.onItem()
.transform(item -> item
.body()
.getString("value")));
}
Multi<Uni<String>> multiFormUnis = Multi.createFrom()
.iterable(listOfUnis);
List<String> listOfResponses = new ArrayList<>();
List<String> listOfValues = multiFormUnis.group()
.intoLists()
.of(100)
.onItem()
.transformToMultiAndConcatenate(listOfOneHundred ->
{
System.out.println("Size: "+listOfOneHundred.size());
for (int index=0;index<listOfOneHundred.size();index++) {
listOfResponses.add(listOfOneHundred.get(index)
.await()
.indefinitely());
}
return Multi.createFrom()
.iterable(listOfResponses);
})
.collectItems()
.asList()
.await()
.indefinitely();
for (String value : listOfValues) {
System.out.println(value);
}
When I put this line:
listOfResponses.add(listOfOneHundred.get(index)
.await()
.indefinitely());
the responses are printed one after another, and once the first group of 100 items finishes, the next group is printed. The problem? The requests are sequential, so it takes far too long.
I think I am close to the solution, but I need to know how to send the requests in parallel only within each group of 100, because if I use:
subscribe().with()
all the requests are sent in parallel (and not in groups of 100).
I think you are creating the Multi the wrong way; it would be much easier to use this:
Multi<String> multiOfJokes = Multi.createFrom().emitter(multiEmitter -> {
for (int i=0;i<1000;i++) {
multiEmitter.emit(i);
}
multiEmitter.complete();
}).onItem().transformToUniAndMerge(index -> {
return Uni.createFrom().item("String" + index);
});
With this approach the calls should be made in parallel.
The remaining question is how to turn the results into lists.
The grouping works fine.
I ran it with this code:
Random random = new Random();
Multi<Integer> multiOfInteger = Multi.createFrom().emitter(multiEmitter -> {
for (Integer i=0;i<1000;i++) {
multiEmitter.emit(i);
}
multiEmitter.complete();
});
Multi<String> multiOfJokes = multiOfInteger.onItem().transformToUniAndMerge(index -> {
if (index % 10 == 0 ) {
Duration delay = Duration.ofMillis(random.nextInt(100) + 1);
return Uni.createFrom().item("String " + index + " delayed").onItem()
.delayIt().by(delay);
}
return Uni.createFrom().item("String" + index);
}).onCompletion().invoke(() -> System.out.println("Completed"));
Multi<List<String>> multiListJokes = multiOfJokes
.group().intoLists().of(100)
.onCompletion().invoke(() -> System.out.println("Completed"))
.onItem().invoke(strings -> System.out.println(strings));
multiListJokes.collect().asList().await().indefinitely();
This gives you the strings grouped into lists of 100.
I don't know how you intend to send each list to the backend,
but you can do it either with:
call (executed asynchronously), or
your own subscriber (implementing Subscriber); the methods are straightforward.
Pick whichever suits your bulk request.
I hope this makes things clearer.
PS: link to guide where I learned all of it:
https://smallrye.io/smallrye-mutiny/guides
So in short you want to batch parallel calls to the server, without hitting it with everything at once.
Could this work for you? It uses merge. In my example, it has a parallelism of 2.
Multi.createFrom().range(1, 10)
.onItem()
.transformToUni(integer -> {
return <<my long operation Uni>>
})
.merge(2) //this is the concurrency
.collect()
.asList();
I'm not sure if merge was added later this year, but this seems to do what you want. In my example, the "long operation producing Uni" is actually a call to the Microprofile Rest Client which produces a Uni, and returns a string. After the merge you can put another onItem to perform something with the response (it's a plain Multi after the merge), instead of collecting everything as list.
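For comparison, the "blocks of N" idea can be sketched with the JDK alone. This is a blocking analogue using CompletableFuture and an executor, not Mutiny: fakeRequest is a hypothetical stand-in for the HTTP call, and the inFlight/maxInFlight counters exist only to show that no more than one batch runs at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedRequests {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical request; stands in for the real call to the server.
    static String fakeRequest(int i) {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max); // remember the peak
        try {
            Thread.sleep(5); // simulate network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inFlight.decrementAndGet();
        return "joke " + i;
    }

    // Starts batchSize requests concurrently, waits for all of them,
    // and only then starts the next batch.
    static List<String> runInBatches(int total, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(batchSize);
        List<String> results = new ArrayList<>();
        try {
            for (int start = 0; start < total; start += batchSize) {
                int end = Math.min(start + batchSize, total);
                List<CompletableFuture<String>> batch = new ArrayList<>();
                for (int i = start; i < end; i++) {
                    final int index = i;
                    batch.add(CompletableFuture.supplyAsync(() -> fakeRequest(index), pool));
                }
                for (CompletableFuture<String> future : batch) {
                    results.add(future.join()); // blocks until this request is done
                }
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> jokes = runInBatches(1000, 100);
        System.out.println(jokes.size() + " responses, peak concurrency " + maxInFlight.get());
    }
}
```

Unlike merge(2), which starts a new request as soon as one finishes, this version waits for a whole batch before starting the next one, so it is simpler but leaves some capacity idle at the end of each batch.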
for (Issue issue : issues) {
if (issue.getSubtasks().spliterator().estimateSize() > 0) {
alreadyCreated = false;
for (Subtask subtask : issue.getSubtasks()) {
if (subtask.getSummary().contains("sometext")) {
alreadyCreated = true;
break;
}
}
if (!alreadyCreated) {
ids.add(issue.getKey());
}
} else {
ids.add(issue.getKey());
}
}
I'm no expert in the Java Stream API, but I'm pretty sure there's a way to simplify the code above using lambda expressions. It would be a great help for understanding it better!
Some notes:
Both issues and getSubtasks() return Iterable<> types.
You can use filter to remove unwanted elements and map to get the key, then collect to a List.
StreamSupport.stream(issues.spliterator(), false).filter(x ->
StreamSupport.stream(x.getSubtasks().spliterator(), false)
.noneMatch(y -> y.getSummary().contains("sometext"))
).map(Issue::getKey).collect(Collectors.toList());
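A self-contained version of that pipeline might look like the sketch below. The Issue and Subtask classes here are hypothetical minimal stand-ins (only getKey, getSubtasks and getSummary are taken from the question), so the real types will differ.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class IssueFilter {
    // Hypothetical minimal stand-ins for the real types in the question.
    static class Subtask {
        private final String summary;
        Subtask(String summary) { this.summary = summary; }
        String getSummary() { return summary; }
    }

    static class Issue {
        private final String key;
        private final List<Subtask> subtasks;
        Issue(String key, List<Subtask> subtasks) {
            this.key = key;
            this.subtasks = subtasks;
        }
        String getKey() { return key; }
        Iterable<Subtask> getSubtasks() { return subtasks; } // Iterable, as in the question
    }

    // Keeps the keys of issues that have no subtask whose summary contains the text.
    // An issue without subtasks passes too, because noneMatch is true on an empty stream.
    static List<String> keysWithoutMatchingSubtask(Iterable<Issue> issues, String text) {
        return StreamSupport.stream(issues.spliterator(), false)
                .filter(issue -> StreamSupport.stream(issue.getSubtasks().spliterator(), false)
                        .noneMatch(subtask -> subtask.getSummary().contains(text)))
                .map(Issue::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Issue> issues = List.of(
                new Issue("A-1", List.of()),
                new Issue("A-2", List.of(new Subtask("has sometext inside"))),
                new Issue("A-3", List.of(new Subtask("unrelated"))));
        System.out.println(keysWithoutMatchingSubtask(issues, "sometext"));
    }
}
```

Because noneMatch already returns true for an empty stream, the explicit size check from the original loop is not needed here.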
I can't be certain what is streamable in your example, so here is an alternative that doesn't require streams and is at least as efficient. It uses a different technique (a labeled continue) but essentially the same logic.
If size <= 0, the inner for loop is skipped and the key is added.
If size > 0, the inner for loop is executed. If any of the summaries contains the text, the outer loop continues with the next issue; otherwise the loop falls through and the key is added.
outer:
for (Issue issue : issues) {
if (issue.getSubtasks().spliterator().estimateSize() > 0) {
for (Subtask subtask : issue.getSubtasks()) {
if (subtask.getSummary().contains("sometext")) {
continue outer;
}
}
}
ids.add(issue.getKey());
}
You're adding the issue key to ids if there exists no subtask with a summary containing "sometext."
issues.stream()
.filter(i -> i.getSubtasks().stream()
.noneMatch(s -> s.getSummary().contains("sometext")))
.map(Issue::getKey)
.forEach(ids::add);
I think the subtasks size check is at least superfluous and possibly dangerous, if it's possible for it to estimate the size as zero when subtasks exist. But if getSubtasks() returns some non-collection that can't easily be streamed and this estimateSize() call is necessary then that just changes things a little bit:
issues.stream()
.filter(i -> {
Spliterator<Subtask> split = i.getSubtasks().spliterator();
return split.estimateSize() == 0 ||
StreamSupport.stream(split, false)
.noneMatch(s -> s.getSummary().contains("sometext"));
})
.map(Issue::getKey)
.forEach(ids::add);
The following code throws the IllegalArgumentException roughly once in every 10-15 runs for the same input:
AllDirectedPaths<Vertex, Edge> allDirectedPaths = new AllDirectedPaths<>(graph);
List<GraphPath<Vertex, Edge>> paths = allDirectedPaths.getAllPaths(entry, exit, true, null);
return paths.parallelStream().map(path -> path.getEdgeList().parallelStream()
.map(edge -> {
Vertex source = edge.getSource();
Vertex target = edge.getTarget();
if (source.containsInstruction(method, instructionIndex)) {
return source;
} else if (target.containsInstruction(method, instructionIndex)) {
return target;
} else {
return null;
}
}).filter(Objects::nonNull)).findAny().flatMap(Stream::findAny)
.orElseThrow(() -> new IllegalArgumentException("Given trace refers to no vertex in graph!"));
The idea of the code is to find a vertex that wraps a certain instruction (see containsInstruction()), where the vertex lies on at least one path from the entry to the exit vertex. I'm aware that the code is not optimal in terms of performance (every intermediate vertex on a path is looked up twice), but that doesn't matter.
The input is simply a trace (String) from which the method and instructionIndex can be derived. All other variables are fixed in that sense. Moreover, the method containsInstruction() doesn't have any side effects.
Does it matter where to put the 'findAny()' stream operation? Should I place it directly following the filter operation? Or are nested parallel streams the problem?
You should use .flatMap(path -> ... ) on the outer stream and remove .flatMap(Stream::findAny).
Your code fails intermittently because the first findAny() returns one of the inner streams. That stream is never null, but it may be empty, since every edge of that particular path may have been filtered out.
When the second findAny() is then applied through Optional.flatMap(Stream::findAny), it returns an empty Optional for such a stream, even though another path may contain a matching vertex. With parallel streams, which inner stream the first findAny() picks can vary from run to run, hence the sporadic exception.
This is how the code should look:
return paths.stream()
.flatMap(path -> path.getEdgeList().stream()
.map(edge ->
edge.getSource().containsInstruction(method, instructionIndex) ?
edge.getSource() :
edge.getTarget().containsInstruction(method, instructionIndex) ?
edge.getTarget() :
null)
.filter(Objects::nonNull))
.findAny()
.orElseThrow(() -> new IllegalArgumentException("whatever"));
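The difference between the two shapes can be reproduced with plain integers. The sketch below is a simplified illustration, not the original graph code: each inner list plays the role of a path, "even number" plays the role of a matching vertex, and findFirst on sequential streams is used instead of findAny so the behaviour is deterministic.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class FlattenBeforeFind {
    // Nested shape: picks one inner stream first, then searches only that one.
    static Optional<Integer> nested(List<List<Integer>> paths) {
        return paths.stream()
                .map(path -> path.stream().filter(n -> n % 2 == 0))
                .findFirst()                  // an inner stream, possibly empty
                .flatMap(Stream::findFirst);  // empty if that stream has no match
    }

    // Flattened shape: searches the elements of all inner streams.
    static Optional<Integer> flattened(List<List<Integer>> paths) {
        return paths.stream()
                .flatMap(path -> path.stream().filter(n -> n % 2 == 0))
                .findFirst();
    }

    public static void main(String[] args) {
        // The first "path" has no match, the second one does.
        List<List<Integer>> paths = List.of(List.of(1, 3), List.of(2, 5));
        System.out.println(nested(paths));    // Optional.empty
        System.out.println(flattened(paths)); // Optional[2]
    }
}
```

The nested version gives up as soon as the inner stream it happened to pick is empty, which corresponds to the sporadic IllegalArgumentException in the question.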
As an aside: why parallel streams? There don't seem to be any CPU-bound tasks in your pipeline, and parallel streams add considerable overhead. They pay off in relatively few scenarios, e.g. tens of thousands of elements combined with CPU-intensive operations along the pipeline.
EDIT: As suggested in the comments, the map and filter operations of the inner stream could be safely moved to the outer stream. This way, readability is improved and there's no difference performance-wise:
return paths.stream()
.flatMap(path -> path.getEdgeList().stream())
.map(edge ->
edge.getSource().containsInstruction(method, instructionIndex) ?
edge.getSource() :
edge.getTarget().containsInstruction(method, instructionIndex) ?
edge.getTarget() :
null)
.filter(Objects::nonNull)
.findAny()
.orElseThrow(() -> new IllegalArgumentException("whatever"));
Another note: maybe refactoring the code inside map to a method of the Edge class would be better, so that the logic to return either the source, the target or null is in the class that already has all the information.
I am using Java 11 and Project Reactor (from Spring). I need to make an HTTP call to a REST API (I can only make it once in the whole flow).
With the response I need to do two things:
Check if a document exists in the database (MongoDB). If it does not exist, create it and return it. Otherwise just return it.
Compute some logic on the response and we are done.
In pseudo code it is something like this:
public void computeData(String id) {
httpClient.getData(id) // Returns a Mono<Data>
.flatMap(data -> getDocument(data.getDocumentId()))
// Issue: the next step needs both the data object consumed by the previous flatMap and the document it produced
.flatMap(document -> calculateValue(document, data))
.subscribe();
}
public Mono<Document> getDocument(String id) {
// Check if document exists
// If not create document
return document;
}
public Mono<Value> calculateValue(Document doc, Data data) {
// Do something...
return value;
}
The issue is that calculateValue needs the return value from http.getData but this was already consumed on the first flatMap but we also need the document object we get from the previous flatMap.
I tried to solve this issue using Mono.zip like below:
public void computeData(String id) {
final Mono<Data> dataMono = httpClient.getData(id);
Mono.zip(
new Mono<Mono<Document>>() {
@Override
public void subscribe(CoreSubscriber<? super Mono<Document>> actual) {
final Mono<Document> documentMono = dataMono.flatMap(data -> getDocument(data.getDocumentId()))
actual.onNext(documentMono);
}
},
new Mono<Mono<Value>>() {
@Override
public void subscribe(CoreSubscriber<? super Mono<Value>> actual) {
actual.onNext(dataMono);
}
}
)
.flatMap(objects -> {
final Mono<Document> documentMono = objects.getT1();
final Mono<Data> dataMono = objects.getT2();
return Mono.zip(documentMono, dataMono, (document, data) -> calculateValue(document, data))
})
}
But this executes httpClient.getData(id) twice, which violates my constraint of calling it only once. I understand why it is executed twice (I subscribe to it twice).
Maybe my solution design can be improved somewhere but I do not see where. To me this sounds like a "normal" issue when designing reactive code but I could not find a suitable solution to it so far.
My question is: how can I accomplish this flow in a reactive, non-blocking way while making only one call to the REST API?
PS: I could put all the logic inside one single map, but that would force me to subscribe to one of the Monos inside the map, which is not recommended, so I want to avoid that approach.
EDIT regarding @caco3's comment
I need to subscribe inside the map because both getDocument and calculateValue methods return a Mono.
So, if I wanted to put all the logic inside one single map it would be something like:
public void computeData(String id) {
httpClient.getData(id)
.map(data -> getDocument(data).subscribe(s -> calculateValue(s, data)))
.subscribe();
}
You do not have to subscribe inside map, just continue building the reactive chain inside the flatMap:
getData(id) // Mono<Data>
.flatMap(data -> getDocument(data.getDocumentId()) // Mono<Document>
.switchIfEmpty(createDocument(data.getDocumentId())) // Mono<Document>
.flatMap(document -> calculateValue(document, data)) // Mono<Value>
)
.subscribe()
Boiling it down, your problem is analogous to:
Mono.just(1)
.flatMap(original -> process(original))
.flatMap(processed -> I need access to the original value and the processed value!
System.out.println(original); //Won't work
);
private static Mono<String> process(int in) {
return Mono.just(in + " is an integer").delayElement(Duration.ofSeconds(2));
}
(Silly example, I know.)
The problem is that map() (and by extension, flatMap()) are transformations - you get access to the new value, and the old one goes away. So in your second flatMap() call, you've got access to 1 is an integer, but not the original value (1.)
The solution here is to, instead of mapping to the new value, map to some kind of merged result that contains both the original and new values. Reactor provides a built in type for that - a Tuple. So editing our original example, we'd have:
Mono.just(1)
.flatMap(original -> operation(original))
.flatMap(processed -> {
// Both values are now available on the tuple
System.out.println(processed.getT1()); // Original
System.out.println(processed.getT2()); // Processed
// etc.
return Mono.just(processed);
});
private static Mono<Tuple2<Integer, String>> operation(int in) {
return Mono.just(in + " is an integer").delayElement(Duration.ofSeconds(2))
.map(newValue -> Tuples.of(in, newValue));
}
You can use the same strategy to "hold on" to both document and data - no need for inner subscribes or anything of the sort :-)
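The same "carry both values forward" idea can be tried out with the JDK alone. The sketch below is an analogue, not Reactor code: CompletableFuture plays the role of Mono, Map.Entry plays the role of Tuple2, and process/operation are hypothetical helpers mirroring the ones above.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class KeepOriginalValue {
    // Hypothetical asynchronous step; only the processed value comes back.
    static CompletableFuture<String> process(int in) {
        return CompletableFuture.supplyAsync(() -> in + " is an integer");
    }

    // Wraps the step so that the input travels along with the result.
    static CompletableFuture<Map.Entry<Integer, String>> operation(int in) {
        return process(in).thenApply(newValue -> Map.entry(in, newValue));
    }

    // The later stage sees both the original and the processed value.
    static CompletableFuture<String> describe(int value) {
        return CompletableFuture.completedFuture(value)
                .thenCompose(KeepOriginalValue::operation)
                .thenApply(pair -> pair.getKey() + " -> " + pair.getValue());
    }

    public static void main(String[] args) {
        System.out.println(describe(1).join()); // 1 -> 1 is an integer
    }
}
```

In the original question, the pair would hold the Data and the Document, and the final stage would be the call to calculateValue.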
I have a /download API that returns a generic Object (based on an index number, which is its own parameter), and I then have to save that object to my database. If the transaction is successful, I have to increase the index and repeat the same process; otherwise retry().
I need to repeat this about 50 times.
How can I achieve this process using Rx-Java?
I'm stuck right now. Any help would be awesome. Thank You.
Observable.range(1, 50)
.flatMap(index -> // for every index make new request
makeRequest(index) // this shall return Observable<Response>
.retry(N) // on error => retry this request N times
)
.subscribe(response -> saveToDb(response));
Answer to comment (make new request only after previous response is saved to db):
Observable.range(1, 50)
.flatMap(index -> // for every index make new request
makeRequest(index) // this shall return Observable<Response>
.retry(N) // on error => retry this request N times
.map(response -> saveToDb(response)), // save and report success
1 // limit concurrency to single request-save
)
.subscribe();
If I understand you correctly, this piece of code should point you in the right direction.
BehaviorSubject<Integer> indexes = BehaviorSubject.createDefault(0);
indexes.flatMap(integer -> Observable.just(integer)) // download operation
.flatMap(downloadedObject -> Observable.just(downloadedObject)) // save operation
.doOnNext(ind -> indexes.onNext(ind + 1))
.subscribe(ind -> System.out.println("Index " + ind));
What happens is:
The BehaviorSubject is the initiator of the whole process; it feeds indexes to the chain.
The first flatMap is where you do the download operation.
The second flatMap is where you save the result to the DB.
In doOnNext you have to issue onNext or onComplete to the subject to continue with or finish processing. (This can also be done in the subscriber.)
Remember to add a stop condition around the onNext call to avoid ending up with an infinite loop.
I'll need to repeat this for about 50 times.
You can use range operator and handle each Int emitted.
if the transaction is successful, I have to increase my index
In that case you need to use concatMap operator. It handles each Observable sequentially.
Observable<Response> makeRequest(int i) {...}
Completable saveToDb(Response response) {...}
Observable.range(1, 50)
.concatMap(i -> makeRequest(i)
//I assume that you save your response to DB asynchronously
//if not - use doOnNext operator instead of flatMapCompletable
.flatMapCompletable(response -> saveToDb(response))
.toObservable()
.retry()
...