How to parallelize database queries in Spring Flux?
I want to expose aggregated results from a MySQL database as a Flux<JSONObject> stream in Spring.
@RestController
public class FluxController {

    @GetMapping(value = "/", produces = TEXT_EVENT_STREAM_VALUE)
    public Flux<JSONObject> stream() {
        return service.getJson();
    }
}
@Service
public class DatabaseService {

    public List<JSONObject> getJson() {
        List<Long> refs = jdbc.queryForList(...);
        MapSqlParameterSource params = new MapSqlParameterSource();
        params.addValue("refs", refs);

        // of course real-world SQL is much more complex
        Map<Long, Product> products = jdbc.query("SELECT * FROM products WHERE ref IN (:refs)", params); // row-to-map extraction elided
        Map<Long, Item> items = jdbc.query("SELECT * FROM items WHERE ref IN (:refs)", params);
        Map<Long, Warehouse> warehouses = jdbc.query("SELECT * FROM warehouses WHERE ref IN (:refs)", params);

        List<JSONObject> results = new ArrayList<>();
        for (Long ref : refs) {
            JSONObject json = new JSONObject();
            json.put("ref", ref);
            json.put("product", products.get(ref));
            json.put("item", items.get(ref));
            json.put("warehouse", warehouses.get(ref));
            results.add(json);
        }
        return results;
    }
}
Now I want to convert this to a Flux to expose it as an event stream. But how can I parallelize the database lookups and chain them together into a Flux?
public Flux<JSONObject> getJsonFlux() {
    // I need this as the source
    List<Long> refs = jdbc.queryForList(...);
    return Flux.fromIterable(refs).map(ref -> {
        // TODO how to aggregate the different database calls concurrently,
        // and then expose each JSONObject into the stream as soon as it is built?
    });
}
Sidenote: I know this will still be blocking. But in my real application I'm applying pagination and chunking, so each chunk will be exposed to the stream when ready.
The main problem is that I don't know how to parallelize the lookups and then aggregate/merge the results, e.g. in the last Flux step.
The idea is to first fetch the complete list of refs, then concurrently fetch the Products, Items, and Warehouses (the Tuple3 of lookups below), and finally combine each ref with the lookups and convert it to a JSONObject, one by one.
return Mono.fromCallable(jdbc::queryForList) // fetches refs
    .subscribeOn(Schedulers.elastic())
    .flatMapMany(refList -> { // flatMapMany allows converting the Mono to a Flux in a flatMap operation
        Flux<Tuple3<Map<Long, Product>, Map<Long, Item>, Map<Long, Warehouse>>> lookups =
            Mono.zip(fetchProducts(refList), fetchItems(refList), fetchWarehouses(refList))
                .cache()   // notice cache: it makes sure Mono.zip is executed only once, not for each zipWith call
                .repeat(); // re-emits the cached tuple so zipWith can pair it with every ref
        return Flux.fromIterable(refList)
            .zipWith(lookups);
    })
    .map(t -> {
        Long ref = t.getT1();
        Tuple3<Map<Long, Product>, Map<Long, Item>, Map<Long, Warehouse>> lookups = t.getT2();
        JSONObject json = new JSONObject();
        json.put("ref", ref);
        json.put("product", lookups.getT1().get(ref));
        json.put("item", lookups.getT2().get(ref));
        json.put("warehouse", lookups.getT3().get(ref));
        return json;
    });
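A note on why the cache().repeat() combination works: zipWith pairs elements index by index, so the lookups publisher must keep re-emitting the same tuple, once per ref. cache() guarantees the expensive zip executes only once, and repeat() replays the cached value indefinitely. A minimal runnable sketch of that pairing, with a plain string standing in for the real Tuple3 (class name and values are illustrative only):

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.util.List;

public class ZipWithCachedLookupDemo {
    public static void main(String[] args) {
        // The "expensive lookup" runs once thanks to cache();
        // repeat() turns the single cached value into an endless stream
        // so zipWith can pair it with every ref.
        Flux<String> lookups = Mono.fromCallable(() -> {
            System.out.println("lookup executed"); // printed exactly once
            return "LOOKUP";
        }).cache().repeat();

        Flux.fromIterable(List.of(1L, 2L, 3L))
            .zipWith(lookups)
            .subscribe(t -> System.out.println("ref=" + t.getT1() + " paired with " + t.getT2()));
    }
}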
Methods for each database call:
Mono<Map<Long, Product>> fetchProducts(List<Long> refs) {
    return Mono.fromCallable(() -> jdbc.query("SELECT * FROM products WHERE ref IN (:refs)", params))
        .subscribeOn(Schedulers.elastic());
}

Mono<Map<Long, Item>> fetchItems(List<Long> refs) {
    return Mono.fromCallable(() -> jdbc.query("SELECT * FROM items WHERE ref IN (:refs)", params))
        .subscribeOn(Schedulers.elastic());
}

Mono<Map<Long, Warehouse>> fetchWarehouses(List<Long> refs) {
    return Mono.fromCallable(() -> jdbc.query("SELECT * FROM warehouses WHERE ref IN (:refs)", params))
        .subscribeOn(Schedulers.elastic());
}
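One detail these helpers gloss over is how jdbc.query produces a Map<Long, Product> rather than a List. A hedged sketch of that mapping, assuming a NamedParameterJdbcTemplate and a hypothetical Product(long ref, String name) constructor (the column names are assumptions too), uses a ResultSetExtractor keyed by ref:

import org.springframework.jdbc.core.ResultSetExtractor;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the Product constructor and column names are assumptions.
Map<Long, Product> queryProductMap(NamedParameterJdbcTemplate jdbc, List<Long> refs) {
    MapSqlParameterSource params = new MapSqlParameterSource("refs", refs);
    ResultSetExtractor<Map<Long, Product>> byRef = rs -> {
        Map<Long, Product> result = new HashMap<>();
        while (rs.next()) {
            long ref = rs.getLong("ref");
            result.put(ref, new Product(ref, rs.getString("name")));
        }
        return result;
    };
    return jdbc.query("SELECT * FROM products WHERE ref IN (:refs)", params, byRef);
}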
Why do I need subscribeOn?
I put it there for two reasons:
1. It executes the database query on a thread from a dedicated thread pool, which prevents blocking the main thread: https://projectreactor.io/docs/core/release/reference/#faq.wrap-blocking
2. It truly parallelizes Mono.zip. See this answer; it concerns flatMap, but it applies to zip as well: When FlatMap will listen to multiple sources concurrently?
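To make the second point concrete, here is a small self-contained timing sketch, with Thread.sleep standing in for the blocking JDBC calls: with subscribeOn, each callable gets its own worker and Mono.zip completes in roughly one second instead of three. (In recent Reactor versions Schedulers.elastic() is deprecated in favor of Schedulers.boundedElastic(), which the sketch uses.)

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;
import java.time.Duration;
import java.time.Instant;

public class ZipParallelismDemo {
    static Mono<String> slowQuery(String name) {
        return Mono.fromCallable(() -> {
            Thread.sleep(1000); // stand-in for a blocking JDBC query
            return name;
        }).subscribeOn(Schedulers.boundedElastic()); // each call gets its own worker thread
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        Mono.zip(slowQuery("products"), slowQuery("items"), slowQuery("warehouses"))
            .block(); // ~1s in total; remove subscribeOn and the calls run sequentially (~3s)
        System.out.println("took " + Duration.between(start, Instant.now()).toMillis() + " ms");
    }
}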
For completeness, the same is possible by flat-mapping the zip result instead of using zipWith (note it must be flatMapMany, since a Mono is being expanded into a Flux). Though I'm not sure whether .cache() is still necessary here.
.flatMapMany(refList ->
    Mono.zip(fetchProducts(refList), fetchItems(refList), fetchWarehouses(refList))
        .cache()
        .flatMapMany(tuple -> Flux.fromIterable(refList)
            .map(refId -> Tuples.of(refId, tuple))))
.map(tuple -> {
    Long refId = tuple.getT1();
    Tuple3<Map<Long, Product>, Map<Long, Item>, Map<Long, Warehouse>> lookups = tuple.getT2();
    // build the JSONObject from refId and lookups as above
});
If I understand correctly, you would like to execute the queries by passing all refs as a parameter.
It will not really be an event stream this way, since it waits until all queries are finished and all JSON objects are in memory, and only starts streaming them after that.
public Flux<JSONObject> getJsonFlux()
{
return Mono.fromCallable(jdbc::queryForList)
.subscribeOn(Schedulers.elastic()) // elastic thread pool meant for blocking IO, you can use a custom one
.flatMap(this::queryEntities)
.map(this::createJsonObjects)
.flatMapMany(Flux::fromIterable);
}
private Mono<Tuple4<List<Long>, List<Product>, List<Item>, List<Warehouse>>> queryEntities(List<Long> refs)
{
Mono<List<Product>> products = Mono.fromCallable(() -> jdbc.queryProducts(refs)).subscribeOn(Schedulers.elastic());
Mono<List<Item>> items = Mono.fromCallable(() -> jdbc.queryItems(refs)).subscribeOn(Schedulers.elastic());
Mono<List<Warehouse>> warehouses = Mono.fromCallable(() -> jdbc.queryWarehouses(refs)).subscribeOn(Schedulers.elastic());
return Mono.zip(Mono.just(refs), products, items, warehouses); // query calls will be concurrent
}
private List<JSONObject> createJsonObjects(Tuple4<List<Long>, List<Product>, List<Item>, List<Warehouse>> tuple)
{
List<Long> refs = tuple.getT1();
List<Product> products = tuple.getT2();
List<Item> items = tuple.getT3();
List<Warehouse> warehouses = tuple.getT4();
List<JSONObject> jsonObjects = new ArrayList<>();
for (Long ref : refs)
{
JSONObject json = new JSONObject();
// build json object here
jsonObjects.add(json);
}
return jsonObjects;
}
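As a middle ground between streaming nothing until everything is done and querying per ref: the question's sidenote mentions chunking, and the same helpers can be reused per chunk, so each buffer's JSON objects reach the stream as soon as its batched queries complete. A hedged sketch (the chunk size of 100 is arbitrary):

public Flux<JSONObject> getJsonFluxChunked()
{
    return Mono.fromCallable(jdbc::queryForList)
        .subscribeOn(Schedulers.elastic())
        .flatMapMany(Flux::fromIterable)
        .buffer(100) // Flux<List<Long>>: refs in chunks of 100
        .concatMap(chunk -> queryEntities(chunk) // concurrent batched queries per chunk
            .map(this::createJsonObjects)
            .flatMapMany(Flux::fromIterable)); // emit each chunk's objects when ready
}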
The alternative is to query the entities for each ref separately. This way each JSONObject is queried individually, and they can interleave in the stream. I'm not sure how the database handles that kind of load; that's something you should consider.
public Flux<JSONObject> getJsonFlux()
{
return Mono.fromCallable(jdbc::queryForList)
.flatMapMany(Flux::fromIterable)
.subscribeOn(Schedulers.elastic()) // elastic thread pool meant for blocking IO, you can use a custom one
.flatMap(this::queryEntities)
.map(this::createJsonObject);
}
private Mono<Tuple4<Long, Product, Item, Warehouse>> queryEntities(Long ref)
{
Mono<Product> product = Mono.fromCallable(() -> jdbc.queryProduct(ref)).subscribeOn(Schedulers.elastic());
Mono<Item> item = Mono.fromCallable(() -> jdbc.queryItem(ref)).subscribeOn(Schedulers.elastic());
Mono<Warehouse> warehouse = Mono.fromCallable(() -> jdbc.queryWarehouse(ref))
.subscribeOn(Schedulers.elastic());
return Mono.zip(Mono.just(ref), product, item, warehouse); // query calls will be concurrent
}
private JSONObject createJsonObject(Tuple4<Long, Product, Item, Warehouse> tuple)
{
Long ref = tuple.getT1();
Product product = tuple.getT2();
Item item = tuple.getT3();
Warehouse warehouse = tuple.getT4();
JSONObject json = new JSONObject();
// build json object here
return json;
}
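On the load concern mentioned above: flatMap subscribes to the per-ref queries eagerly, up to Reactor's default concurrency of 256. If that is too aggressive for the database, the concurrency can be capped with flatMap's second argument; a hedged variant:

public Flux<JSONObject> getJsonFlux()
{
    return Mono.fromCallable(jdbc::queryForList)
        .flatMapMany(Flux::fromIterable)
        .subscribeOn(Schedulers.elastic())
        .flatMap(this::queryEntities, 4) // at most 4 refs queried concurrently; tune to what the DB tolerates
        .map(this::createJsonObject);
}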