Passing results from expensive methods as they come for multiple layers - java

I've got code that looks similar to this:
List<String> ids = expensiveMethod();
List<String> filteredIds = cheapFilterMethod(ids);
if (!filteredIds.isEmpty()) {
    List<SomeEntity> fullEntities = expensiveDatabaseCall(filteredIds);
    List<SomeEntity> filteredFullEntities = anotherCheapFilterFunction(fullEntities);
    if (!filteredFullEntities.isEmpty()) {
        List<AnotherEntity> finalResults = stupidlyExpensiveDatabaseCall(filteredFullEntities);
        relativelyCheapMethod(finalResults);
    }
}
It's basically a waterfall of a couple of expensive methods that, on their own, all either grab something from a database or filter previous database results. The structure is driven by stupidlyExpensiveDatabaseCall, which should receive as few leftover entities as possible, hence the exhaustive filtering.
My problem is that the other methods aren't exactly cheap either, so they block the thread for a couple of seconds while stupidlyExpensiveDatabaseCall sits idle, waiting to receive the whole batch at once.
I'd like to process the results from each method as they come in. I know I could write a thread for each individual method and have some concurrent queue working between them, but that's a load of boilerplate that I'd like to avoid. Is there a more elegant solution?

There are different ways to parallelize this: not only the parallelStream() way, but also the one you described, where consecutive steps run in parallel, linked by queues. RxJava may suit your need in this respect. It's a more complete variant of the rather minimal Reactive Streams API in Java 9. But I think you only get the full benefit if you use a reactive database API along with it.
Here's the RxJava way:
public class FlowStream {

    @Test
    public void flowStream() {
        int items = 10;

        print("\nflow");
        Flowable.range(0, items)
            .map(this::expensiveCall)
            .map(this::expensiveCall)
            .forEach(i -> print("flowed %d", i));

        print("\nparallel flow");
        Flowable.range(0, items)
            .flatMap(v ->
                Flowable.just(v)
                    .subscribeOn(Schedulers.computation())
                    .map(this::expensiveCall)
            )
            .flatMap(v ->
                Flowable.just(v)
                    .subscribeOn(Schedulers.computation())
                    .map(this::expensiveCall)
            )
            .forEach(i -> print("flowed parallel %d", i));

        await(5000);
    }

    private Integer expensiveCall(Integer i) {
        print("making %d more expensive", i);
        await(Math.round(10f / (Math.abs(i) + 1)) * 50);
        return i;
    }

    private void await(int i) {
        try {
            Thread.sleep(i);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    private void print(String pattern, Object... values) {
        System.out.println(String.format(pattern, values));
    }
}
The Maven dependency:
<!-- https://mvnrepository.com/artifact/io.reactivex.rxjava2/rxjava -->
<dependency>
    <groupId>io.reactivex.rxjava2</groupId>
    <artifactId>rxjava</artifactId>
    <version>2.2.13</version>
</dependency>
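Mapped onto the question's concrete waterfall, a rough sketch of how this could look (my own adaptation, assuming hypothetical single-entity variants of the filter and database-call methods, e.g. expensiveDatabaseCallSingle, and re-batching before the expensive call):

// requires io.reactivex.Flowable and io.reactivex.schedulers.Schedulers (RxJava 2)
Flowable.fromIterable(expensiveMethod())
    .filter(this::cheapFilterMethod)                      // assumed per-id predicate variant
    .flatMap(id -> Flowable.just(id)
        .subscribeOn(Schedulers.io())
        .map(this::expensiveDatabaseCallSingle))          // one DB call per id, run in parallel
    .filter(this::anotherCheapFilterFunction)             // assumed per-entity predicate variant
    .buffer(50)                                           // re-batch so stupidlyExpensiveDatabaseCall still gets a list
    .map(this::stupidlyExpensiveDatabaseCall)
    .subscribe(this::relativelyCheapMethod);

This way each id flows through the cheap filters and the per-id database call as soon as it is available, and only the final expensive call still works on batches.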

You could use CompletableFuture to divide up each non-CPU-bound step. The usage is similar to the JavaScript Promise API.
public void loadEntities() {
    CompletableFuture.supplyAsync(this::expensiveMethod, Executors.newCachedThreadPool())
        .thenApply(this::cheapFilterMethod)
        .thenApplyAsync(this::expensiveDatabaseCall)
        .thenApply(this::anotherCheapFilterFunction)
        .thenApplyAsync(this::stupidlyExpensiveDatabaseCall)
        .thenAccept(this::relativelyCheapMethod);
}

private List<String> expensiveMethod() { ... }
private List<String> cheapFilterMethod(List<String> ids) { ... }
private List<SomeEntity> expensiveDatabaseCall(List<String> ids) { ... }
private List<SomeEntity> anotherCheapFilterFunction(List<SomeEntity> entities) { ... }
private List<AnotherEntity> stupidlyExpensiveDatabaseCall(List<SomeEntity> entities) { ... }
private void relativelyCheapMethod(List<AnotherEntity> entities) { ... }
You can also pass your own thread pool at each step if you'd like to have more control over execution.
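A minimal sketch of that (the pools and their sizes here are illustrative, not part of the original answer):

ExecutorService loadPool = Executors.newFixedThreadPool(2);   // hypothetical pool for the initial load
ExecutorService dbPool = Executors.newFixedThreadPool(4);     // hypothetical pool for database work

CompletableFuture.supplyAsync(this::expensiveMethod, loadPool)
    .thenApply(this::cheapFilterMethod)
    .thenApplyAsync(this::expensiveDatabaseCall, dbPool)      // runs on dbPool
    .thenApply(this::anotherCheapFilterFunction)
    .thenApplyAsync(this::stupidlyExpensiveDatabaseCall, dbPool)
    .thenAccept(this::relativelyCheapMethod);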

You can use the Java 8 Stream API. It's impossible to process a DB query "as the results come in" because the result set arrives all at once; you'd have to change your methods to handle single entities.
expensiveMethod().parallelStream()
    .filter(this::cheapFilterMethod)                 // returns boolean
    .map(this::expensiveDatabaseCallSingle)          // returns SomeEntity
    .filter(this::anotherCheapFilterFunction)        // returns boolean for filtered entities
    .map(this::stupidlyExpensiveDatabaseCallSingle)  // returns AnotherEntity
    .forEach(this::relativelyCheapMethod);           // void method
I would also suggest using an ExecutorService to manage your threads so you don't consume all resources just creating a bunch of threads:
ExecutorService threadPool = Executors.newFixedThreadPool(8);
threadPool.submit(this::methodForParallelStream);
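Note that parallelStream() runs on the common ForkJoinPool by default, so wrapping it in a fixed-size ExecutorService does not bound the stream's own parallelism. A commonly used (if not officially documented) workaround is to run the whole pipeline inside your own ForkJoinPool - a sketch:

// requires java.util.concurrent.*
ForkJoinPool customPool = new ForkJoinPool(8);   // bounds the stream's parallelism
try {
    customPool.submit(() ->
        expensiveMethod().parallelStream()
            .filter(this::cheapFilterMethod)
            .map(this::expensiveDatabaseCallSingle)
            .filter(this::anotherCheapFilterFunction)
            .map(this::stupidlyExpensiveDatabaseCallSingle)
            .forEach(this::relativelyCheapMethod)
    ).get();                                     // wait for the whole pipeline to finish
} catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
}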

Related

Logic using functional-style exception handling with Java and Vavr

I'm trying to get into the basics of functional programming with Java 8, and I have a simple task: set a property on an object and then persist it. The database column type is ltree, so persisting might fail if the path contains disallowed characters. I want to process items one by one and log exceptions/successes.
I chose to use the Vavr library because of its Try.of() exception handling, and I want to learn to use it as it seems very helpful.
Here is what I came up with, but I'm not quite satisfied:
public class PathHandler {

    private final DocVersionDAO dao;

    public void processWithHandling() {
        Try.of(this::process)
            .recover(x -> Match(x).of(
                Case($(instanceOf(Exception.class)), this::logException)
            ));
    }

    private Stream<Try<DocVersion>> logException(Exception e) {
        // log the exception now, but what to return? Also I would like to have the DocVersion here too...
        return null;
    }

    public Stream<Try<DocVersion>> process() {
        return dao.getAllForPathProcessing() // returns Stream<DocVersion>
            .map(this::justSetIt)
            .map(this::save);
    }

    public DocVersion justSetIt(DocVersion v) {
        String path = Optional.ofNullable(v.getMetadata().getAdditionals().get(Vedantas.PATH))
            .orElse(null);
        log.info(String.format("document of uuid %s has metadata path %s; setting it", v.getDocument2().getUUID(), path));
        v.getDocument2().setPath(path);
        return v;
    }

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public Try<DocVersion> save(DocVersion v) {
        return Try.of(() -> dao.save(v));
    }
}
The goal is quite simple, so could you teach me the proper way to do it?
I'm afraid this will become highly opinionated. Anyway, I'll try something.
... which happened before I realized what Vavr actually provides. It attempts to cover everything mentioned here, like immutable data structures and monad syntax sugaring (with the For construct), and goes beyond that by even offering pattern matching. It takes a comprehensive set of FP concepts and rebuilds them in Java, and it is no surprise that Scala comes to mind when seeing this ("Vavr is greatly inspired by Scala").
Now the foundations of functional programming can't be covered by a single SO post. And it might be problematic to get familiar with them in a language like Java, which isn't geared towards them. So perhaps it is better to approach them in their natural habitat, like the Scala language, which is still in some proximity to Java, or Haskell, which is not.
Coming back from this detour, applying the features of Vavr may be more straightforward for the initiated. But likely not for the Java developer sitting next to you in the office, who is less willing to go the extra mile and comes up with arguments that can't just be dismissed, like this one: "If we wanted to do it that way, we would be a Scala shop". Therefore I'd say applying Vavr asks for a pragmatic attitude.
To corroborate the Vavr-Scala argument, let's take Vavr's For construct (all Lists mentioned are io.vavr.collection.List); it looks like this:
Iterator<Tuple2<Integer, String>> tuples =
    For(List.of(1, 2, 3), i ->
        For(List.of(4, 5, 6))
            .yield(a -> Tuple.of(i, String.valueOf(a))));
In Scala you'd encounter for and yield this way:
val tuples = for {
  i <- 1 to 3
  a <- 4 to 6
} yield (i, String.valueOf(a))
In Scala all the monad machinery remains under the hood, whereas Vavr brings more of an approximation, necessarily leaking some internals. For the purpose of learning it might be puzzling to start with Vavr's hybrid constructs.
So what remains of my post is a small-scale treatment of some FP basics, using the OP's example, elaborating on immutability and Try at a practical level, but omitting pattern matching. Here we go:
One of the defining characteristics of FP is functions free of side effects ("pure functions"), which naturally (so to speak) goes along with immutable data structures/objects, which may sound kind of weird. One obvious payoff is that you don't have to worry that your operations create unintended changes somewhere else. But Java doesn't enforce that in any way, and its immutable collections are only immutable on a superficial level. Of the signature FP features, Java only offers higher-order functions, via lambdas.
I used the functional style quite a bit on the job, manipulating complicated structures while sticking to those two principles. E.g. load a tree T of objects from a DB, do some transformations on it (which means producing another tree of objects T', sort of one big map operation), and place the changes in front of the user to accept or reject. If accepted, apply the changes to the related JPA entities and persist them. So after the functional transformation, two mutations are applied.
I'd propose applying FP in this sense, and I've tried to formulate a corresponding version of your code, using an immutable DocVersion class. I chose to simplify the Metadata part for the sake of the example.
I also tried to highlight how the "exception-free" Try approach (some of it poached from here) could be formulated and utilized some more. It's a small-scale version of Vavr's Try, hopefully focusing on the essentials. Note its proximity to Java's Optional and the map and flatMap methods in there, which make it an incarnation of the FP concept called a monad. That concept became notorious in a sweep of highly confusing blog posts some years ago, usually starting with "What is a monad?" (e.g. this one). They cost me some weeks of my life, while it is rather easy to get a good intuition just by using Java streams or Optionals. Miran Lipovača's "Learn You a Haskell for Great Good!" later made up for it to some extent, as did Martin Odersky's Scala language.
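As a tiny illustration of that intuition (my aside, with made-up lookups): chaining steps that may each fail to produce a value reads the same with Optional as with Try; only the "failure" case is absence instead of an exception.

// findUser and findAddress are hypothetical lookups returning Optional;
// flatMap short-circuits the rest of the chain on the first empty result.
Optional<String> country = findUser("alice")              // Optional<User>
    .flatMap(user -> findAddress(user.getId()))           // Optional<Address>
    .map(Address::getCountry);                            // Optional<String>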
Offering of, map and flatMap, Try would, roughly speaking, qualify for syntax sugaring like you find in C# (LINQ expressions) or Scala's for-expressions. In Java there is no equivalent, but some attempts to at least compensate a bit are listed here, and Vavr looks like another one. Personally I use the jool library occasionally.
Passing streams around as function results seems not quite canonical to me, since streams are not supposed to be reused. That's also the reason for creating a List as an intermediate result in process().
public class PathHandler {

    class DocVersionDAO {
        public void save(DocVersion v) {
        }

        public DocVersion validate(DocVersion v) {
            return v;
        }

        public Stream<DocVersion> getAllForPathProcessing() {
            return null;
        }
    }

    class Metadata {
        @Id
        private final Long id;
        private final String value;

        Metadata() {
            this.id = null;
            this.value = null;
        }

        Metadata(Long id, String value) {
            this.id = id;
            this.value = value;
        }

        public Optional<String> getValue() {
            // ofNullable, since the no-arg constructor leaves value null
            return Optional.ofNullable(value);
        }

        public Metadata withValue(String value) {
            return new Metadata(id, value);
        }
    }

    public @interface Id {
    }

    class DocVersion {
        @Id
        private Long id;
        private final Metadata metadatata;

        public Metadata getMetadatata() {
            return metadatata;
        }

        public DocVersion(Long id) {
            this.id = id;
            this.metadatata = new Metadata();
        }

        public DocVersion(Long id, Metadata metadatata) {
            this.id = id;
            this.metadatata = metadatata;
        }

        public DocVersion withMetadatata(Metadata metadatata) {
            return new DocVersion(id, metadatata);
        }

        public DocVersion withMetadatata(String metadatata) {
            return new DocVersion(id, this.metadatata.withValue(metadatata));
        }
    }

    private DocVersionDAO dao;

    public List<DocVersion> process() {
        List<Tuple2<DocVersion, Try<DocVersion>>> maybePersisted = dao.getAllForPathProcessing()
            .map(d -> augmentMetadata(d, LocalDateTime.now().toString()))
            .map(d -> Tuple.of(d, Try.of(() -> dao.validate(d))
                .flatMap(this::trySave)))
            .peek(i -> i._2.onException(this::logExceptionWithBadPracticeOfUsingPeek))
            .collect(Collectors.toList());

        maybePersisted.stream()
            .filter(i -> i._2.getException().isPresent())
            .map(e -> String.format("Item %s caused exception %s", e._1.toString(), fmtException(e._2.getException().get())))
            .forEach(this::log);

        return maybePersisted.stream()
            .filter(i -> !i._2.getException().isPresent())
            .map(i -> i._2.get())
            .collect(Collectors.toList());
    }

    private void logExceptionWithBadPracticeOfUsingPeek(Exception exception) {
        logException(exception);
    }

    private String fmtException(Exception e) {
        return null;
    }

    private void logException(Exception e) {
        log(fmtException(e));
    }

    public DocVersion augmentMetadata(DocVersion v, String augment) {
        v.getMetadatata().getValue()
            .ifPresent(m -> log(String.format("Doc %d has metadata %s, augmenting it with %s", v.id, m, augment)));
        return v.withMetadatata(v.metadatata.withValue(v.getMetadatata().value + augment));
    }

    public Try<DocVersion> trySave(DocVersion v) {
        return new Try<>(() -> {
            dao.save(v);
            return v;
        });
    }

    private void log(String what) {
    }
}
Try looks like this:
public class Try<T> {

    private T result;
    private Exception exception;

    private Try(T result, Exception exception) {
        this.result = result;
        this.exception = exception;
    }

    public static <T> Try<T> of(Supplier<T> f) {
        return new Try<>(f);
    }

    T get() {
        if (result == null) {
            throw new IllegalStateException();
        }
        return result;
    }

    public void onException(Consumer<Exception> handler) {
        if (exception != null) {
            handler.accept(exception);
        }
    }

    public <U> Try<U> map(Function<T, U> mapper) {
        return exception != null ? new Try<>(null, exception) : new Try<>(() -> mapper.apply(result));
    }

    public <U> Try<U> flatMap(Function<T, Try<U>> mapper) {
        // propagate the failure instead of returning null
        return exception != null ? new Try<>(null, exception) : mapper.apply(result);
    }

    public void onError(Consumer<Exception> exceptionHandler) {
        if (exception != null) {
            exceptionHandler.accept(exception);
        }
    }

    public Optional<Exception> getException() {
        // ofNullable: there is no exception in the success case
        return Optional.ofNullable(exception);
    }

    public Try(Supplier<T> r) {
        try {
            result = r.get();
        } catch (Exception e) {
            exception = e;
        }
    }
}

Consolidate/flatten nested lists in Android with RXJava2

I'm struggling to come up with an RxJava 2 solution to "a simple problem". I am not very experienced with RxJava beyond simple use cases.
Suppose I have a Container that looks like:
class Container {
    List<A> listOfA;
}
The rest of the model is a series of nested lists like this:
class Base {
    // irrelevant content
}

class A extends Base {
    List<B> listOfB;
}

class B extends Base {
    // irrelevant content
}
Somewhere in my code, I obtain a Single<Container> like so:
(note: the code/types/etc. have been obfuscated/simplified for easier reading)
disposables = new CompositeDisposable(); // not important here
disposables.add(
    interactor.getTheContainer() // This returns a Single<Container>
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread())
        .subscribeWith(new DisposableSingleObserver<Container>() {
            // onError omitted for clarity
            @Override
            public void onSuccess(final Container value) {
                process(value);
            }
        })
);
private void process(final Container container) {
    List<Base> items = new ArrayList<>();
    List<A> listOfA = container.getListOfA();
    for (A a : listOfA) {
        items.add(a);
        items.addAll(a.getListOfB());
    }
    // do something with "items" - omitted for clarity
}
I have been unsuccessfully trying to convert the process(Container) method to RxJava (maybe I shouldn't, but now I want to know).
I can't even begin to list all the stuff I've experimented with, but I'm really new to RxJava 2 (most of my Rx usage in past years was simple Observables from Retrofit, nothing too fancy, or an event bus to replace Otto/Guava), so I am really not well versed in the art of making good use of the Rx toolset. I think some sort of map should work, but the whole Java syntax gets confusing really fast for me when it comes to anonymous methods.
The question is:
Where should I read/look for ideas on how to perform the same operation as the process method, but with RxJava 2?
Order is important; the final list looks like this with the current method, and I need it this way:
0. A1
1. B1.1
2. B1.2
3. B1.nn…
4. A2
5. B2.1
6. B2.2
7. B2.nn…
8. A3
9. B3.1
…
You get the idea.
Any hints? I do not have Retrolambda or Java 8 (nor can use it, it's not my decision and I can't do anything about it).
You were almost there:
List<Base> process(List<A> list) {
    List<Base> result = new ArrayList<>();
    for (A a : list) {
        result.add(a);
        result.addAll(a.getListOfB());
    }
    return result;
}
interactor.getTheContainer() // This returns a Single<Container>
    .subscribeOn(Schedulers.io())
    .map(new Function<Container, List<Base>>() {
        @Override public List<Base> apply(Container c) {
            return process(c.getListOfA());
        }
    })
    .observeOn(AndroidSchedulers.mainThread())
    .subscribeWith(new DisposableSingleObserver<List<Base>>() {
        @Override public void onSuccess(final List<Base> value) {
            /* display the list */
        }
    })
A more "convoluted" solution could replace the map above with some Iterable transformation via IxJava:
.flatMapIterable(new Function<Container, Iterable<A>>() {
    @Override public Iterable<A> apply(Container c) {
        return c.getListOfA();
    }
})
.flatMapIterable(new Function<A, Iterable<Base>>() {
    @Override public Iterable<Base> apply(A a) {
        return Ix.<Base>just(a).concatWith(a.getListOfB());
    }
})
.toList()

How to check if all needed subscribers finished their work?

I have the following scheme of Rx entities:
            Observable1
                 |
      --------------------------
      |          |             |
    S1(*)      S2(*)          S3 (onCompleted() starts another observable)
                               |
                          Observable2
                               |
                             S4(*)
I need to know when all the subscribers marked with * (S1, S2, S4) have finished their work. As they can execute on different threads, I need some sync mechanism, or is there an out-of-the-box solution in RxJava?
Some sample code to illustrate current design:
@Component
public class BatchReader {

    @Autowired List<Subscriber<List<Integer>>> subscribers;

    public void start() {
        ConnectableObservable<List<Integer>> observable1 = createObservable();
        subscribers.forEach(observable1::subscribe);
        observable1.connect();
    }

    private ConnectableObservable<List<Integer>> createObservable() {
        return Observable.create((Subscriber<? super List<Integer>> subscriber) -> {
            try {
                subscriber.onStart();
                while (someCondition) {
                    List<Integer> numbers = ...;
                    subscriber.onNext(numbers);
                }
                subscriber.onCompleted();
            } catch (Exception ex) {
                subscriber.onError(ex);
            }
        }).observeOn(Schedulers.newThread()).publish();
    }
}
In S3 I have the following logic:
@Component
public class S3 extends Subscriber<List<Integer>> {

    @Autowired AnotherBatchReader anotherBatchReader;
    ....

    @Override
    public void onCompleted() {
        anotherBatchReader.start();
    }
    ...
}
And S4 subscribes in AnotherBatchReader:
@Component
public class AnotherBatchReader {

    @Autowired S4<List<Foo>> subscriber4;

    public void start() {
        Observable<List<Foo>> observable2 = createObservable();
        observable2.subscribe(subscriber4);
    }

    private Observable<List<Foo>> createObservable() {
        return Observable.create(subscriber -> {
            try {
                subscriber.onStart();
                while (someConditionBar) {
                    List<Foo> foo = ...;
                    subscriber.onNext(foo);
                }
                subscriber.onCompleted();
            } catch (RuntimeException ex) {
                subscriber.onError(ex);
            }
        });
    }
}
So is there a way to be properly notified when all the subscribers I'm interested in have finished their work? Does Rx support this out of the box? Or maybe there is a better design that would support it?
EDIT:
I have separate subscribers because each one has different behaviour. In the end, the subscribers marked with * (S1, S2, S4) will write their data to XML files. But:
S1 receives data in onNext(), does some work, and writes the results directly to files
S2 receives data in onNext(), does some work, accumulates results in a field and then writes them in onCompleted()
S3 receives data in onNext(), does some work, writes results to the DB, and after onCompleted() is called it starts another observable which begins to read data from the DB and push it to S4
S4 receives data in onNext(), does some work and writes to files
The reason I need to write data to the DB in S3 is that the results generated from the data received in onNext() have to be unique, but as I'm getting data in batches from Observable1 I can't guarantee this uniqueness myself, so the DB takes care of it.
And of course in S3 I can't just do the same as in S2 (accumulate all results in memory), because the volume of results produced in S3 is significant compared to S2.
Thanks for the clarifications. It seems to me that judicious application of existing Operators will minimize your code. Now, I don't have all the details, but what you're doing feels a lot like this:
Observable<T> observable1 = ...
    .share();

Observable<?> S1 = S1(observable1);
Observable<?> S2 = S2(observable1);
Observable<?> S3 = S3(observable1);
Observable<?> S4 = defer(() -> readFromDatabase()).compose(S4::generate);

Observable.merge(S1, S2, S3.ignoreElements().switchIfEmpty(S4))
    .ignoreElements()
    .doOnCompleted(...)
    .subscribe();
Of course, the details will be different depending on whether the original observable is hot or cold, and the details of S[1-4].
Also, don't try to drive all the Subscribers yourself; let the framework do that for you, and you will get so much more out of it - for example:
S4 = Observable.create(SyncOnSubscribe.generateStateless(
        observer -> observer.onNext(<get List<Foo>>)
    ))
    .takeWhile(list -> someConditionBar);
Edit: this is a case of the XY problem - we've all gone through it...
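To make the completion notification concrete, here is a minimal RxJava 1.x sketch of the merge idea (my own; writeS1File, writeS2File, writeS4File and readFromDatabase are placeholders, and the S3-to-S4 chaining is elided). merge() only completes when all of its sources have completed, which is exactly the "all done" signal being asked for:

// Each "done" observable performs its side effects and is only used for its completion signal.
Observable<Object> s1Done = observable1.doOnNext(batch -> writeS1File(batch)).cast(Object.class);
Observable<Object> s2Done = observable1.toList().doOnNext(all -> writeS2File(all)).cast(Object.class);
Observable<Object> s4Done = Observable.defer(() -> readFromDatabase())
    .doOnNext(row -> writeS4File(row)).cast(Object.class);

Observable.merge(s1Done, s2Done, s4Done)
    .ignoreElements()                           // drop the data, keep only the completion
    .doOnCompleted(() -> System.out.println("S1, S2 and S4 have all finished"))
    .subscribe();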

How can I use fetchMap() with a RecordMapper?

I know I can fetch a map something like this:
this.ctx.select(
        shopSubscription.field(SHOP_SUBSCRIPTION.SHOP_ID),
        shopSubscription.field(SHOP_SUBSCRIPTION.PAYMENT_GATEWAY_SUBSCRIPTION_ID),
        shopSubscription.field(SHOP_SUBSCRIPTION.ADMIN_TOOL_FEATURE_TYPE_ID),
        PAYMENT_GATEWAY_SUBSCRIPTION.SUBSCRIPTION_ID_TOKEN
    )
    .from(PAYMENT_GATEWAY_SUBSCRIPTION)
    .join(shopSubscription)
    .on(PAYMENT_GATEWAY_SUBSCRIPTION.ID.eq(shopSubscription.field(SHOP_SUBSCRIPTION.PAYMENT_GATEWAY_SUBSCRIPTION_ID))
        .and(PAYMENT_GATEWAY_SUBSCRIPTION.PAYMENT_GATEWAY_TYPE_ID.eq(paymentGatewayType)))
    .fetchMap(PAYMENT_GATEWAY_SUBSCRIPTION.PAYMENT_GATEWAY_TYPE_ID, ShopSubscriptionDTO.class);
but to detect issues at compile time I'd prefer if I could additionally add a RecordMapper to this query.
So is there a way to call fetchMap() but also provide a RecordMapper?
What I'm thinking of would look something like this:
this.ctx.select(
        shopSubscription.field(SHOP_SUBSCRIPTION.SHOP_ID),
        shopSubscription.field(SHOP_SUBSCRIPTION.PAYMENT_GATEWAY_SUBSCRIPTION_ID),
        shopSubscription.field(SHOP_SUBSCRIPTION.ADMIN_TOOL_FEATURE_TYPE_ID),
        PAYMENT_GATEWAY_SUBSCRIPTION.SUBSCRIPTION_ID_TOKEN
    )
    .from(PAYMENT_GATEWAY_SUBSCRIPTION)
    .join(shopSubscription)
    .on(PAYMENT_GATEWAY_SUBSCRIPTION.ID.eq(shopSubscription.field(SHOP_SUBSCRIPTION.PAYMENT_GATEWAY_SUBSCRIPTION_ID))
        .and(PAYMENT_GATEWAY_SUBSCRIPTION.PAYMENT_GATEWAY_TYPE_ID.eq(paymentGatewayType)))
    // For each record apply the map() function
    .map(new RecordMapper<Record, ShopSubscriptionDTO>() {
        @Override
        public ShopSubscriptionDTO map(Record record) {
            ShopSubscriptionDTO shopSubscriptionDto = new ShopSubscriptionDTO();
            shopSubscriptionDto.setShopId(record.getValue(SHOP_SUBSCRIPTION.SHOP_ID));
            // ...
            return shopSubscriptionDto;
        }
    })
    // Fetch the result into a map where the key is SHOP_SUBSCRIPTION.ADMIN_TOOL_FEATURE_TYPE_ID
    .fetchMap(SHOP_SUBSCRIPTION.ADMIN_TOOL_FEATURE_TYPE_ID);
Since there are quite a lot of different overloads of fetchMap(), I didn't see that there is a fetchMap(Field<K>, RecordMapper<? super R, E>) too. Just going with that solves this issue:
// ...
.fetchMap(ADMIN_TOOL_ADD_ON.ADMIN_TOOL_ADD_ON_TYPE_ID, new RecordMapper<Record, AdminToolAddOnDTO>() {
    @Override
    public AdminToolAddOnDTO map(Record record) {
        AdminToolAddOnDTO dto = new AdminToolAddOnDTO();
        dto.setId(record.getValue(ADMIN_TOOL_ADD_ON.ID));
        dto.setAdminToolFeatureTypeId(record.getValue(ADMIN_TOOL_ADD_ON.ADMIN_TOOL_FEATURE_TYPE_ID));
        dto.setAdminToolAddOnTypeId(record.getValue(ADMIN_TOOL_ADD_ON.ADMIN_TOOL_ADD_ON_TYPE_ID));
        dto.setPrice(record.getValue(ADMIN_TOOL_ADD_ON.PRICE));
        dto.setCountryId(record.getValue(ADMIN_TOOL_ADD_ON.COUNTRY_ID));
        dto.setAddOnIdToken(record.getValue(ADMIN_TOOL_ADD_ON_TYPE.ADD_ON_ID_TOKEN));
        return dto;
    }
});
With Java 8 or higher:
.fetchMap(CN_TASKS.AGENTID,
    r -> new CnTaskMessage(r.getValue(CN_TASKS.CN_TASKID), r.getValue(CN_TASKS.TASK_TYPE),
        r.getValue(CN_TASKS.STATUS)));

Thread-safe cache of one object in java

Let's say we have a CountryList object in our application that should return the list of countries. Loading the countries is a heavy operation, so the list should be cached.
Additional requirements:
CountryList should be thread-safe
CountryList should load lazily (only on demand)
CountryList should support the invalidation of the cache
CountryList should be optimized considering that the cache will be invalidated very rarely
I came up with the following solution:
public class CountryList {

    private static final Object ONE = new Integer(1);

    // MapMaker is from the Google Collections Library
    private Map<Object, List<String>> cache = new MapMaker()
        .initialCapacity(1)
        .makeComputingMap(
            new Function<Object, List<String>>() {
                @Override
                public List<String> apply(Object from) {
                    return loadCountryList();
                }
            });

    private List<String> loadCountryList() {
        // HEAVY OPERATION TO LOAD DATA
    }

    public List<String> list() {
        return cache.get(ONE);
    }

    public void invalidateCache() {
        cache.remove(ONE);
    }
}
What do you think about it? Do you see anything bad about it? Is there another way to do it? How can I make it better? Should I look for a totally different solution in this case?
Thanks.
Google Collections actually supplies just the thing for this sort of thing: Supplier
Your code would be something like:
private Supplier<List<String>> supplier = new Supplier<List<String>>() {
    public List<String> get() {
        return loadCountryList();
    }
};

// volatile reference so that changes are published correctly, see invalidate()
private volatile Supplier<List<String>> memorized = Suppliers.memoize(supplier);

public List<String> list() {
    return memorized.get();
}

public void invalidate() {
    memorized = Suppliers.memoize(supplier);
}
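Pieced together, the whole class might look like this (my own sketch, reusing the OP's method names; Supplier and Suppliers are from com.google.common.base):

public class CountryList {

    private final Supplier<List<String>> supplier = new Supplier<List<String>>() {
        @Override
        public List<String> get() {
            return loadCountryList();
        }
    };

    // volatile so a re-memoized supplier set by invalidateCache() is visible to all threads
    private volatile Supplier<List<String>> memoized = Suppliers.memoize(supplier);

    public List<String> list() {
        return memoized.get();                   // lazy on first call, cached afterwards
    }

    public void invalidateCache() {
        memoized = Suppliers.memoize(supplier);  // the next list() call reloads
    }

    private List<String> loadCountryList() {
        // HEAVY OPERATION TO LOAD DATA
        return null;
    }
}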
Thank you all, especially user "gid" who gave the idea.
My goal was to optimize the performance of the get() operation, considering that the invalidate() operation will be called very rarely.
I wrote a test class that starts 16 threads, each calling the get() operation one million times. With this class I profiled several implementations on my 2-core machine.
Testing results:

Implementation            Time
no synchronisation        0.6 sec
normal synchronisation    7.5 sec
with MapMaker             26.3 sec
with Suppliers.memoize    8.2 sec
with optimized memoize    1.5 sec
1) "No synchronisation" is not thread-safe, but gives us the best performance that we can compare to.
@Override
public List<String> list() {
    if (cache == null) {
        cache = loadCountryList();
    }
    return cache;
}

@Override
public void invalidateCache() {
    cache = null;
}
2) "Normal synchronisation" - pretty good performace, standard no-brainer implementation
@Override
public synchronized List<String> list() {
    if (cache == null) {
        cache = loadCountryList();
    }
    return cache;
}

@Override
public synchronized void invalidateCache() {
    cache = null;
}
3) "with MapMaker" - very poor performance.
See my question at the top for the code.
4) "with Suppliers.memoize" - good performance. But as the performance the same "Normal synchronisation" we need to optimize it or just use the "Normal synchronisation".
See the answer of the user "gid" for code.
5) "with optimized memoize" - the performnce comparable to "no sync"-implementation, but thread-safe one. This is the one we need.
The cache-class itself:
(The Supplier interfaces used here is from Google Collections Library and it has just one method get(). see http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/base/Supplier.html)
public class LazyCache<T> implements Supplier<T> {

    private final Supplier<T> supplier;
    private volatile Supplier<T> cache;

    public LazyCache(Supplier<T> supplier) {
        this.supplier = supplier;
        reset();
    }

    private void reset() {
        cache = new MemoizingSupplier<T>(supplier);
    }

    @Override
    public T get() {
        return cache.get();
    }

    public void invalidate() {
        reset();
    }

    private static class MemoizingSupplier<T> implements Supplier<T> {
        final Supplier<T> delegate;
        volatile T value;

        MemoizingSupplier(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public T get() {
            if (value == null) {
                synchronized (this) {
                    if (value == null) {
                        value = delegate.get();
                    }
                }
            }
            return value;
        }
    }
}
Example use:
public class BetterMemoizeCountryList implements ICountryList {

    LazyCache<List<String>> cache = new LazyCache<List<String>>(new Supplier<List<String>>() {
        @Override
        public List<String> get() {
            return loadCountryList();
        }
    });

    @Override
    public List<String> list() {
        return cache.get();
    }

    @Override
    public void invalidateCache() {
        cache.invalidate();
    }

    private List<String> loadCountryList() {
        // this should normally load a full list from the database,
        // but just for this instance we mock it with:
        return Arrays.asList("Germany", "Russia", "China");
    }
}
Whenever I need to cache something, I like to use the Proxy pattern. Doing it with this pattern offers separation of concerns. Your original object can be concerned with lazy loading. Your proxy (or guardian) object can be responsible for validation of the cache.
In detail:
Define a CountryList class which is thread-safe, preferably using synchronization blocks or other semaphore locks.
Extract this class's interface into a CountryQueryable interface.
Define another object, CountryListProxy, that implements CountryQueryable.
Only allow the CountryListProxy to be instantiated, and only allow it to be referenced through its interface.
From here, you can insert your cache invalidation strategy into the proxy object. Save the time of the last load, and upon the next request to see the data, compare the current time to the cache time. Define a tolerance level, where, if too much time has passed, the data is reloaded.
As far as lazy loading goes, refer here.
Now for some good down-home sample code:
public interface CountryQueryable {

    public void operationA();

    public String operationB();
}

public class CountryList implements CountryQueryable {

    private boolean loaded;

    public CountryList() {
        loaded = false;
    }

    // This particular operation might be able to function without
    // the extra loading.
    @Override
    public void operationA() {
        // Do whatever.
    }

    // This operation may need to load the extra stuff.
    @Override
    public String operationB() {
        if (!loaded) {
            load();
            loaded = true;
        }
        // Do whatever.
        return whatever;
    }

    private void load() {
        // Do the lazy loading here.
    }
}
public class CountryListProxy implements CountryQueryable {

    // In accordance with the Proxy pattern, we hide the target
    // instance inside of our Proxy instance.
    private CountryQueryable actualList;

    // Keep track of the last time we cached.
    private long lastCached;

    // Define a tolerance time, 2000 milliseconds, before refreshing
    // the cache.
    private static final long TOLERANCE = 2000L;

    public CountryListProxy() {
        // You might even retrieve this object from a Registry.
        actualList = new CountryList();
        // Initialize it to something stupid.
        lastCached = Long.MIN_VALUE;
    }

    @Override
    public synchronized void operationA() {
        if ((System.currentTimeMillis() - lastCached) > TOLERANCE) {
            // Refresh the cache.
            lastCached = System.currentTimeMillis();
        } else {
            // Cache is okay.
        }
    }

    @Override
    public synchronized String operationB() {
        if ((System.currentTimeMillis() - lastCached) > TOLERANCE) {
            // Refresh the cache.
            lastCached = System.currentTimeMillis();
        } else {
            // Cache is okay.
        }
        return whatever;
    }
}
public class Client {

    public static void main(String[] args) {
        CountryQueryable queryable = new CountryListProxy();
        // Do your thing.
    }
}
Your needs seem pretty simple here. The use of MapMaker makes the implementation more complicated than it has to be. The whole double-checked locking idiom is tricky to get right, and only works on 1.5+. And to be honest, it's breaking one of the most important rules of programming:
Premature optimization is the root of all evil.
The double-checked locking idiom tries to avoid the cost of synchronization in the case where the cache is already loaded. But is that overhead really causing problems? Is it worth the cost of more complex code? I say assume it is not until profiling tells you otherwise.
Here's a very simple solution that requires no 3rd-party code (ignoring the JCIP annotation). It does make the assumption that an empty list means the cache hasn't been loaded yet. It also prevents the contents of the country list from escaping to client code that could potentially modify the returned list. If this is not a concern for you, you could remove the call to Collections.unmodifiableList().
public class CountryList {

    @GuardedBy("cache")
    private final List<String> cache = new ArrayList<String>();

    private List<String> loadCountryList() {
        // HEAVY OPERATION TO LOAD DATA
    }

    public List<String> list() {
        synchronized (cache) {
            if (cache.isEmpty()) {
                cache.addAll(loadCountryList());
            }
            return Collections.unmodifiableList(cache);
        }
    }

    public void invalidateCache() {
        synchronized (cache) {
            cache.clear();
        }
    }
}
I'm not sure what the map is for. When I need a lazy, cached object, I usually do it like this:
public class CountryList
{
    private static List<Country> countryList;

    public static synchronized List<Country> get()
    {
        if (countryList == null)
            countryList = load();
        return countryList;
    }

    private static List<Country> load()
    {
        ... whatever ...
    }

    public static synchronized void forget()
    {
        countryList = null;
    }
}
I think this is similar to what you're doing but a little simpler. If you have a need for the map and the ONE that you've simplified away for the question, okay.
If you want it thread-safe, you should synchronize the get and the forget.
What do you think about it? Do you see something bad about it?
Bleah - you are using a complex data structure, MapMaker, with several features (map access, concurrency-friendly access, deferred construction of values, etc) because of a single feature you are after (deferred creation of a single construction-expensive object).
While reusing code is a good goal, this approach adds additional overhead and complexity. In addition, it misleads future maintainers when they see a map data structure there into thinking that there's a map of keys/values in there when there is really only 1 thing (list of countries). Simplicity, readability, and clarity are key to future maintainability.
Is there other way to do it? How can i make it better? Should i look for totally another solution in this cases?
Seems like you are after lazy-loading. Look at solutions to other SO lazy-loading questions. For example, this one covers the classic double-check approach (make sure you are using Java 1.5 or later):
How to solve the "Double-Checked Locking is Broken" Declaration in Java?
Rather than simply repeating the solution code here, I think it is useful to read the discussion about lazy loading via double-check there, to grow your knowledge base. (Sorry if that comes off as pompous - just trying to teach fishing rather than handing out fish...)
There is a library out there (from Atlassian) with a util class called LazyReference. LazyReference is a reference to an object that can be lazily created (on first get). It is guaranteed thread-safe, and the init is also guaranteed to only occur once - if two threads call get() at the same time, one thread will compute and the other thread will block and wait.
See some sample code:
final LazyReference<MyObject> ref = new LazyReference() {
    protected MyObject create() throws Exception {
        // Do some useful object construction here
        return new MyObject();
    }
};

// thread1
MyObject myObject = ref.get();

// thread2
MyObject myObject = ref.get();
This looks OK to me (I assume MapMaker is from Google Collections?). Ideally you wouldn't need to use a Map because you don't really have keys, but as the implementation is hidden from any callers I don't see this as a big deal.
This is way too simple to need the ComputingMap stuff. You only need a dead-simple implementation where all methods are synchronized, and you should be fine. This will obviously block the first thread hitting it (getting it), and any other thread hitting it while the first thread loads the cache (and the same again if anyone calls invalidateCache - where you should also decide whether invalidateCache should load the cache anew or just null it out, letting the first subsequent get block), but then all threads should go through nicely.
Use the initialization-on-demand holder idiom:
public class CountryList {

    private CountryList() {}

    private static class CountryListHolder {
        // List is an interface; the holder needs a concrete type
        static final List<Country> INSTANCE = new ArrayList<Country>();
    }

    public static List<Country> getInstance() {
        return CountryListHolder.INSTANCE;
    }

    ...
}
Follow up to Mike's solution above. My comment didn't format as expected... :(
Watch out for synchronization issues in operationB, especially since load() is slow:
public String operationB() {
    if (!loaded) {
        load();
        loaded = true;
    }
    //Do whatever.
    return whatever;
}
You could fix it this way:
public String operationB() {
    synchronized (this) {   // 'loaded' is a primitive boolean, so it can't serve as a monitor itself
        if (!loaded) {
            load();
            loaded = true;
        }
    }
    //Do whatever.
    return whatever;
}
Make sure you ALWAYS synchronize on every access to the loaded variable.
