Closing external process in CompletableFuture chain - java

I'm looking for a better way to "close" a resource, here to destroy an external Process, in a CompletableFuture chain. Right now my code looks roughly like this:
public CompletableFuture<ExecutionContext> createFuture()
{
final Process[] processHolder = new Process[1];
return CompletableFuture.supplyAsync(
() -> {
try {
processHolder[0] = new ProcessBuilder(COMMAND)
.redirectErrorStream(true)
.start();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
return PARSER.parse(processHolder[0].getInputStream());
}, SCHEDULER)
.applyToEither(createTimeoutFuture(DURATION), Function.identity())
.exceptionally(throwable -> {
processHolder[0].destroyForcibly();
if (throwable instanceof TimeoutException) {
throw new DatasourceTimeoutException(throwable);
}
Throwables.propagateIfInstanceOf(throwable, DatasourceException.class);
throw new DatasourceException(throwable);
});
}
The problem I see is the "hacky" one-element array that holds a reference to the process, so that it can be destroyed in case of error. Is there some CompletableFuture API that allows passing some "context" to exceptionally (or some other method to achieve that)?
I was considering a custom CompletionStage implementation, but that looks like a big task just to get rid of the "holder" variable.

There is no need to have a linear chain of CompletableFutures. Actually, you already don't have one, because of createTimeoutFuture(DURATION), which is quite a convoluted way of implementing a timeout. You can simply put it this way:
public CompletableFuture<ExecutionContext> createFuture() {
    CompletableFuture<Process> proc = CompletableFuture.supplyAsync(
        () -> {
            try {
                return new ProcessBuilder(COMMAND).redirectErrorStream(true).start();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, SCHEDULER);
    CompletableFuture<ExecutionContext> result
        = proc.thenApplyAsync(process -> PARSER.parse(process.getInputStream()), SCHEDULER);
    proc.thenAcceptAsync(process -> {
        try {
            if (!process.waitFor(DURATION, TimeUnit.WHATEVER_DURATION_REFERS_TO)) {
                process.destroyForcibly();
                result.completeExceptionally(
                    new DatasourceTimeoutException(new TimeoutException()));
            }
        } catch (InterruptedException e) {
            // waitFor is interruptible: restore the flag and treat it as a failure
            Thread.currentThread().interrupt();
            process.destroyForcibly();
            result.completeExceptionally(e);
        }
    });
    return result;
}
If you want to keep the timeout future, perhaps because you consider the process startup time significant, you could use
public CompletableFuture<ExecutionContext> createFuture() {
CompletableFuture<Throwable> timeout=createTimeoutFuture(DURATION);
CompletableFuture<Process> proc=CompletableFuture.supplyAsync(
() -> {
try {
return new ProcessBuilder(COMMAND).redirectErrorStream(true).start();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}, SCHEDULER);
CompletableFuture<ExecutionContext> result
=proc.thenApplyAsync(process -> PARSER.parse(process.getInputStream()), SCHEDULER);
timeout.exceptionally(t -> new DatasourceTimeoutException(t))
.thenAcceptBoth(proc, (x, process) -> {
if(process.isAlive()) {
process.destroyForcibly();
result.completeExceptionally(x);
}
});
return result;
}

I've used the one-item array myself to emulate what would be proper closures in Java.
Another option is a private static class with fields. The advantages are that it makes the purpose clearer and has a bit less impact on the garbage collector with big closures, i.e. one object with N fields versus N arrays of length 1. It also becomes useful if you need to close over the same fields in other methods.
This is a de facto pattern, even outside the scope of CompletableFuture and it has been (ab)used long before lambdas were a thing in Java, e.g. anonymous classes. So, don't feel so bad, it's just that Java's evolution didn't provide us with proper closures (yet? ever?).
If you want, you may return values from CompletableFutures inside .handle(), so you can wrap the full completion result and return a wrapper. In my opinion, this is not any better than manual closures, and on top of that you'll create such a wrapper per future.
Subclassing CompletableFuture is not necessary. You're not interested in altering its behavior, only in attaching data to it, which you can do with current Java's final variable capturing. That is, unless you profile and see that creating these closures is actually affecting performance somehow, which I highly doubt.
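To illustrate the private-static-class suggestion, here is a minimal sketch under a few assumptions: Resource is a hypothetical stand-in for Process (so it runs without an external command), and HolderDemo, Context and run are made-up names. The Context object plays the role of the one-element array. The null check covers failures that happen before the resource exists, which in the question's version (start() throwing IOException) would turn into a NullPointerException inside exceptionally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

class HolderDemo {
    // Hypothetical stand-in for java.lang.Process: something that must be destroyed on failure.
    static final class Resource {
        volatile boolean destroyed;

        void destroyForcibly() {
            destroyed = true;
        }
    }

    // Named holder replacing `final Process[] processHolder = new Process[1]`.
    // volatile is a cautious choice: the field is written and read on different threads.
    private static final class Context {
        volatile Resource resource;
    }

    // Hypothetical pipeline: "start" the resource, "parse" (optionally failing), clean up on error.
    static CompletableFuture<String> run(Supplier<Resource> start, boolean failParsing) {
        Context ctx = new Context();
        return CompletableFuture.supplyAsync(() -> {
            ctx.resource = start.get();
            if (failParsing) {
                throw new IllegalStateException("parse failed");
            }
            return "parsed";
        }).exceptionally(t -> {
            Resource r = ctx.resource;
            if (r != null) { // the failure may have happened before the resource existed
                r.destroyForcibly();
            }
            // Rethrow so callers still see the failure (returning a value would "recover").
            throw t instanceof CompletionException
                    ? (CompletionException) t
                    : new CompletionException(t);
        });
    }
}
```

The volatile modifier is probably not strictly needed, since a stage's completion is ordered before its dependent actions run, but it costs little and makes the cross-thread intent explicit.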

Related

Java CompletableFuture.allOf().whenComplete() with multiple exceptions

Problem
The official Java documentation says:
public static CompletableFuture<Void> allOf(CompletableFuture<?>... cfs)
Returns a new CompletableFuture that is completed when all of the given CompletableFutures complete. If any of the given CompletableFutures complete exceptionally, then the returned CompletableFuture also does so, with a CompletionException holding this exception as its cause.
The doc doesn't specify the case when multiple given CompletableFutures complete exceptionally. For example, in the following code snippet, what will the exception and its cause be if c1, c2, c3 all complete exceptionally?
CompletableFuture.allOf(c1, c2, c3)
.whenComplete((result, exception) -> {
if (exception != null) {
System.out.println("exception occurs");
System.err.println(exception);
} else {
System.out.println("no exception, got result: " + result);
}
});
My experiment 1
I create two CompletableFutures, signal_1 and signal_2, that both complete exceptionally right away. The output shows that signal_1's exception gets passed to .whenComplete() as the cause.
package com.company;
import java.util.concurrent.*;
public class Main {
private static void runTasks(int i) {
System.out.printf("-- input: %s --%n", i);
CompletableFuture<Void> signal_1 = new CompletableFuture<>();
signal_1.completeExceptionally(new RuntimeException("Oh noes!"));
CompletableFuture<Integer> signal_2 = CompletableFuture.supplyAsync(() -> 16 / i);
CompletableFuture.allOf(signal_1, signal_2)
.thenApplyAsync(justVoid -> {
final int num = signal_2.join();
System.out.println(num);
return num;
})
.whenComplete((result, exception) -> {
if (exception != null) {
System.out.println("exception occurs");
System.err.println(exception);
} else {
System.out.println("no exception, got result: " + result);
}
})
.thenApplyAsync(input -> input * 3)
.thenAccept(System.out::println);
}
public static void main(String[] args) {
runTasks(0);
}
}
Output
-- input: 0 --
exception occurs
java.util.concurrent.CompletionException: java.lang.RuntimeException: Oh noes!
Process finished with exit code 0
My experiment 2
I added a 3-second sleep before signal_1 completes exceptionally, so signal_1 should complete after signal_2. However, the output still shows signal_1's exception passed to .whenComplete() as the cause.
package com.company;
import java.util.concurrent.*;
public class Main {
static ExecutorService customExecutorService = Executors.newSingleThreadExecutor();
private static void runTasks(int i) {
System.out.printf("-- input: %s --%n", i);
CompletableFuture<Void> signal_1 = CompletableFuture.supplyAsync(() -> {
try {
TimeUnit.SECONDS.sleep(3);
} catch (InterruptedException e) {
e.printStackTrace();
}
throw new RuntimeException("Oh noes!");
}, customExecutorService);
CompletableFuture<Integer> signal_2 = CompletableFuture.supplyAsync(() -> 16 / i);
CompletableFuture.allOf(signal_1, signal_2)
.thenApplyAsync(justVoid -> {
final int num = signal_2.join();
System.out.println(num);
return num;
})
.whenComplete((result, exception) -> {
if (exception != null) {
System.out.println("exception occurs");
System.err.println(exception);
} else {
System.out.println("no exception, got result: " + result);
}
})
.thenApplyAsync(input -> input * 3)
.thenAccept(System.out::println);
}
public static void main(String[] args) {
runTasks(0);
customExecutorService.shutdown();
}
}
Output
-- input: 0 --
exception occurs
java.util.concurrent.CompletionException: java.lang.RuntimeException: Oh noes!
Process finished with exit code 0
This is largely a repeat of what VGR said in the comments, but it is an important rule of thumb that deserves a full write-up.
In Java, there is an important concept called Unspecified Behaviour. In short, if the docs do not explicitly define what happens in a specific scenario, then the implementation is free to do literally whatever it chooses to, within reason and within the bounds of the other rules that are explicitly defined. This is important because there are several different manifestations of that.
For starters, the result could be platform specific. For some machines, leaving the behaviour undefined allows for some special optimizations that still return a "correct" result. And since Java prioritizes both similar/same behaviour on all platforms as well as performance, choosing not to specify certain aspects of execution allows them to stay true to that promise while still getting the optimization benefits that come with specific platforms.
Another example is when unifying the behaviour into one specific action is not currently feasible. If I had to guess, this is most likely what Java is actually doing. In certain instances, Java will design a component with the potential for certain functionality, but will stop short of actually defining and implementing it. This is usually done where building out a full-blown solution would be more effort than it is worth, amongst other reasons. Ironically enough, CompletableFuture itself is a good example of this. Java 5 introduced Futures, but only as an interface with generic functionality. CompletableFuture, an implementation that came in Java 8, later fleshed out and defined much of the behaviour left unspecified by the Java 5 interface.
And lastly, they may avoid specifying behaviour if doing so would stifle the flexibility of possible implementations. Currently, the method you showed does not specify which exception will be reported when several futures fail. This allows any class that later extends CompletableFuture to specify that behaviour for itself while still maintaining the Liskov Substitution Principle. If you don't already know it, LSP says that if a Child class extends a Parent class, then the Child class must follow all the rules of the Parent class. As a result, if the rules (specified behaviour) of a class are too restrictive, you prevent future implementations/extensions of that class from functioning without breaking LSP. There are likely some extensions of CompletableFuture that let you define exactly which Exception is reported. But that's the point: they are extensions that you can choose to opt in to. If the standard library defines it for you, you are stuck with it unless you implement it yourself or go outside the language's standard library.
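Since the choice is unspecified, code that needs every failure should not rely on whichever one allOf happens to report. A minimal sketch (AllFailures and collectFailures are hypothetical names) that waits for all futures and then inspects each one individually:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class AllFailures {
    // Waits until every future is done, then returns all failures (unwrapped from
    // CompletionException) in argument order, instead of the single exception allOf reports.
    static List<Throwable> collectFailures(List<? extends CompletableFuture<?>> futures) {
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .handle((ignored, oneFailure) -> null) // swallow allOf's own exception
                .join();                               // block until all are complete
        List<Throwable> failures = new ArrayList<>();
        for (CompletableFuture<?> f : futures) {
            if (!f.isCompletedExceptionally()) {
                continue;
            }
            try {
                f.join();
            } catch (CompletionException e) {
                failures.add(e.getCause());
            } catch (CancellationException e) {
                failures.add(e);
            }
        }
        return failures;
    }
}
```

This does not change what allOf itself reports; it only avoids depending on it.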

Completable futures. What's the best way to handle business "exceptions"?

I'm just starting to get familiar with Java's CompletableFuture. I've created a little toy application to model a recurring use case that almost any dev will face.
In this example I simply want to save a thing in a DB, but before doing so I want to check if the thing was already saved.
If the thing is already in the DB, the flow (the chain of completable futures) should stop and not save the thing. What I'm doing is throwing an exception, so that eventually I can handle it and give a good message to the client of the service so they know what happened.
This is what I've tried so far:
First, the code that tries to save the thing, or throws an error if the thing is already in the table:
repository
    .query(thing.getId())
    .thenCompose(
        mayBeThing -> {
            if (mayBeThing.isDefined()) throw new CompletionException(new ThingAlreadyExists());
            else return repository.insert(new ThingDTO(thing.getId(), thing.getName()));
        });
And this is the test I'm trying to run:
CompletableFuture<Integer> eventuallyMayBeThing =
service.save(thing).thenCompose(i -> service.save(thing));
try {
eventuallyMayBeThing.get();
} catch (CompletionException ce) {
System.out.println("Completion exception " + ce.getMessage());
try {
throw ce.getCause();
} catch (ThingAlreadyExists tae) {
assert (true);
} catch (Throwable t) {
throw new AssertionError(t);
}
}
I took this approach from this answer: Throwing exception from CompletableFuture (the first part of the most-voted answer).
However, this is not working. ThingAlreadyExists is indeed being thrown, but it's never handled by my try-catch block.
I mean, this:
catch (CompletionException ce) {
System.out.println("Completion exception " + ce.getMessage());
try {
throw ce.getCause();
} catch (ThingAlreadyExists tae) {
assert (true);
} catch (Throwable t) {
throw new AssertionError(t);
}
is never executed.
I have two questions:
Is there a better way?
If not, am I missing something? Why can't I handle the exception in my test?
Thanks!
Update (06-06-2019)
Thanks VGR, you are right. This is the working code:
try {
eventuallyMayBeThing.get();
} catch (ExecutionException ce) {
assertThat(ce.getCause(), instanceOf(ThingAlreadyExists.class));
}
By unit testing your code wrapped up in a Future, you're testing Java's Future framework. You shouldn't test libraries: you either trust them or you don't.
Instead, test that your code, in isolation, throws the right exceptions when it should. Break out the logic and test that.
You can also integration test your app to assert that your entire app behaves correctly (regardless of implementation).
You have to be aware of the differences between get() and join().
The method get() is inherited from the Future interface and will wrap exceptions in an ExecutionException.
The method join() is specific to CompletableFuture and will wrap exceptions in a CompletionException. That is an unchecked exception, which makes join() more suitable for use inside functional interfaces that do not declare checked exceptions.
That being said, the linked answer addresses use cases where the function has to either return a value or throw an unchecked exception, whereas your use case involves thenCompose, where the function returns a new CompletionStage. This allows an alternative solution like
.thenCompose(mayBeThing -> mayBeThing.isDefined()?
    CompletableFuture.failedFuture(new ThingAlreadyExists()):
    repository.insert(new ThingDTO(thing.getId(), thing.getName())))
CompletableFuture.failedFuture was added in Java 9. If you still need Java 8 support, you can add it to your own code base
public static <T> CompletableFuture<T> failedFuture(Throwable t) {
final CompletableFuture<T> cf = new CompletableFuture<>();
cf.completeExceptionally(t);
return cf;
}
which allows an easy migration to a newer Java version in the future.
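To see both wrapping behaviours side by side, here is a small self-contained sketch. ThingAlreadyExists, InMemoryThingService and its query/insert methods are hypothetical stand-ins for the question's types (an in-memory map instead of a DB), and the check-then-insert is not atomic. Saving the same id twice fails through the failed-future branch: get() reports it as an ExecutionException, join() as a CompletionException, and in both cases getCause() is the ThingAlreadyExists.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the question's business exception.
class ThingAlreadyExists extends RuntimeException {
    ThingAlreadyExists(String id) {
        super("Thing already exists: " + id);
    }
}

// Hypothetical in-memory service; the real repository is asynchronous and DB-backed.
class InMemoryThingService {
    private final Map<String, String> table = new ConcurrentHashMap<>();

    private CompletableFuture<Optional<String>> query(String id) {
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(table.get(id)));
    }

    private CompletableFuture<Integer> insert(String id, String name) {
        return CompletableFuture.supplyAsync(() -> {
            table.put(id, name);
            return 1;
        });
    }

    // Java 8 version of the helper; on Java 9+ use CompletableFuture.failedFuture.
    static <T> CompletableFuture<T> failedFuture(Throwable t) {
        CompletableFuture<T> cf = new CompletableFuture<>();
        cf.completeExceptionally(t);
        return cf;
    }

    // Check-then-insert is not atomic here; it only illustrates the failed-future style.
    CompletableFuture<Integer> save(String id, String name) {
        return query(id).thenCompose(existing -> existing.isPresent()
                ? InMemoryThingService.<Integer>failedFuture(new ThingAlreadyExists(id))
                : insert(id, name));
    }
}
```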

Java REST optimize data structure access

I have a Java REST application where one endpoint always deals with a ConcurrentMap. I am running load tests, and performance gets really bad as the load increases.
What strategies can I implement in order to improve the efficiency of the application?
Should I play around with Jetty threads, as it is the server I'm using? Or is it mainly code? Or both?
The method that becomes the bottleneck is the one below.
Basically, I need to read a given line from a given file. I can't store it in a DB, so I came up with this processing using a Map. However, I'm aware that for large files it will take long to get to the line, and that the Map risks consuming a lot of memory when it has many entries...
dict is the ConcurrentMap.
public String getLine(int lineNr) throws IllegalArgumentException {
    if (lineNr > nrLines) {
        throw new IllegalArgumentException();
    }
    if (dict.containsKey(lineNr)) {
        return dict.get(lineNr);
    }
    synchronized (this) {
        try (Stream<String> st = Files.lines(doc.toPath())) {
            Optional<String> optionalLine = st.skip(lineNr - 1).findFirst();
            if (optionalLine.isPresent()) {
                dict.put(lineNr, optionalLine.get());
            } else {
                nrLines = nrLines > lineNr ? lineNr : nrLines;
                throw new IllegalArgumentException();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return dict.get(lineNr);
    }
}
Mixing up ConcurrentMap with synchronized(this) is probably not the right approach. Classes from java.util.concurrent package are designed for specific use cases and try to optimize synchronization internally.
Instead, I'd suggest first trying a well-designed caching library and seeing if the performance is good enough. One example would be Caffeine. As described in its Population docs, it gives you a way to declare how to load the data, even asynchronously:
AsyncLoadingCache<Key, Graph> cache = Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(10, TimeUnit.MINUTES)
// Either: Build with a synchronous computation that is wrapped as asynchronous
.buildAsync(key -> createExpensiveGraph(key));
// Or: Build with an asynchronous computation that returns a future
// .buildAsync((key, executor) -> createExpensiveGraphAsync(key, executor));
This solution is based on ConcurrentHashMap#computeIfAbsent, with two assumptions:
Multiple threads reading the same file is not a problem.
While the documentation says the computation should be short and simple because it blocks, I believe this is only a problem for access to the same key (or bucket/stripe), and only for updates (not reads)? In this scenario it is not a problem, as we either successfully compute the value or throw IllegalArgumentException.
This way, the file is opened only once per key, because reading it is the computation that puts the key into the map.
public String getLine(int lineNr) throws IllegalArgumentException {
if (lineNr > nrLines) {
throw new IllegalArgumentException();
}
return cache.computeIfAbsent(lineNr, (l) -> {
try (Stream<String> st = Files.lines(path)) {
Optional<String> optionalLine = st.skip(lineNr - 1).findFirst();
if (optionalLine.isPresent()) {
return optionalLine.get();
} else {
nrLines = nrLines > lineNr ? lineNr : nrLines;
throw new IllegalArgumentException();
}
} catch (IOException e) {
e.printStackTrace();
}
return null;
});
}
I "verified" the second assumption by spawning 3 threads, where:
Thread1 computes key 0 by looping infinitely (blocks forever).
Thread2 attempts to put at key 0, but never does because Thread1 blocks.
Thread3 attempts to put at key 1, and does so immediately.
Try it out, maybe it works or maybe assumptions are wrong and it sucks. The Map uses buckets internally, so the computation may become a bottleneck even with different keys, as it locks the bucket/stripe.
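For comparison, a self-contained variant of the computeIfAbsent approach, written as a sketch: LineCache is a hypothetical name, line numbers are assumed to be 1-based, and the nrLines bookkeeping is left out. One deliberate difference is that an IOException is rethrown as UncheckedIOException instead of returning null; either way nothing is cached, but the caller learns that the read failed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Stream;

// Hypothetical class: lazily caches single lines of a file, reading each line at most once.
class LineCache {
    private final Path path;
    private final ConcurrentMap<Integer, String> cache = new ConcurrentHashMap<>();

    LineCache(Path path) {
        this.path = path;
    }

    // Returns the 1-based line lineNr, or throws IllegalArgumentException if it doesn't exist.
    String getLine(int lineNr) {
        if (lineNr < 1) {
            throw new IllegalArgumentException("line numbers start at 1: " + lineNr);
        }
        // The mapping function runs at most once per absent key. Callers asking for the same
        // key wait for it; other keys usually proceed in parallel but may block if they
        // share a hash bin with this one.
        return cache.computeIfAbsent(lineNr, n -> {
            try (Stream<String> lines = Files.lines(path)) {
                return lines.skip(n - 1L)
                        .findFirst()
                        .orElseThrow(() -> new IllegalArgumentException("no line " + n));
            } catch (IOException e) {
                // Nothing is cached when the function throws, so a later call can retry.
                throw new UncheckedIOException(e);
            }
        });
    }
}
```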

Is it possible to write a Java Collector that does early exit when it has a result?

Is it possible to implement a Collector that stops processing of the stream as soon as an answer is available?
For example, if the Collector is computing an average, and one of the values is NaN, I know the answer is going to be NaN without seeing any more values, so further computation is pointless.
Thanks for the responses. The comments pointed the way to a solution, which I will describe here. It's very much inspired by StreamEx, but adapted to my particular situation.
Firstly, I define an implementation of Stream called XdmStream which in general delegates all methods to an underlying Stream which it wraps.
This immediately gives me the opportunity to define new methods, so for example my users can do stream.last() instead of stream.reduce((first,second)->second), which is a useful convenience.
As an example of a short-circuiting method I have implemented XdmStream.untilFirst(Predicate) as follows (base is the wrapped Stream). The idea of this method is to return a stream that delivers the same results as the original stream, except that when a predicate is satisfied, no more results are delivered.
public XdmStream<T> untilFirst(Predicate<? super XdmItem> predicate) {
Stream<T> stoppable = base.peek(item -> {
if (predicate.test(item)) {
base.close();
}
});
return new XdmStream<T>(stoppable);
}
When I first create the base Stream I call its onClose() method so that a call on close() triggers the supplier of data to stop supplying data.
The close() mechanism doesn't seem particularly well documented (it relies on the concept of a "stream pipeline" and it's not entirely clear when a new stream returned by some method is part of the same pipeline as the original stream) - but it's working for me. I guess I should probably ensure that this is only an optimization, so that the results will still be correct even if the flow of data isn't immediately turned off (e.g. if there is any buffering in the stream).
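If relying on close() feels fragile, the same untilFirst behaviour can be sketched on Java 8 with a wrapping Spliterator (Java 9 added Stream.takeWhile, which is similar but excludes the matching element). This is an alternative technique, not the XdmStream code: UntilFirst is a hypothetical helper, it assumes the matching element should still be delivered, and it is written for sequential streams only.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class UntilFirst {
    // Returns a stream with the elements of source up to and including the first one that
    // matches stop; after that match, the source is not asked for any more elements.
    static <T> Stream<T> untilFirst(Stream<T> source, Predicate<? super T> stop) {
        Spliterator<T> base = source.spliterator();
        Spliterator<T> cut = new Spliterators.AbstractSpliterator<T>(
                base.estimateSize(), base.characteristics() & Spliterator.ORDERED) {
            private boolean done;

            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                if (done) {
                    return false;
                }
                boolean advanced = base.tryAdvance(item -> {
                    if (stop.test(item)) {
                        done = true; // deliver this element, then stop pulling
                    }
                    action.accept(item);
                });
                if (!advanced) {
                    done = true;
                }
                return advanced;
            }
        };
        // Sequential only; closing the result also closes the original pipeline.
        return StreamSupport.stream(cut, false).onClose(source::close);
    }
}
```

Because the cut happens in tryAdvance, correctness does not depend on any buffering upstream, which addresses the concern about results still being correct if the flow is not turned off immediately.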
In addition to Federico's comment, it is possible to emulate a short-circuiting Collector by ceasing accumulation once a certain condition has been met. However, this is only beneficial if accumulation is expensive, since the stream still hands every remaining element to the accumulator. Here's an example, but keep in mind that there are flaws with this implementation:
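import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
public class AveragingCollector implements Collector<Double, double[], Double> {
    private final AtomicBoolean hasFoundNaN = new AtomicBoolean();
    @Override
    public Supplier<double[]> supplier() {
        return () -> new double[2];
    }
    @Override
    public BiConsumer<double[], Double> accumulator() {
        return (a, b) -> {
            if (hasFoundNaN.get()) {
                return;
            }
            if (b.isNaN()) {
                hasFoundNaN.set(true);
                return;
            }
            a[0] += b;
            a[1]++;
        };
    }
    @Override
    public BinaryOperator<double[]> combiner() {
        return (a, b) -> {
            a[0] += b[0];
            a[1] += b[1];
            return a;
        };
    }
    @Override
    public Function<double[], Double> finisher() {
        // Once a NaN has been seen the average is NaN, whatever the partial sums are.
        return average -> hasFoundNaN.get() ? Double.NaN : average[0] / average[1];
    }
    @Override
    public Set<Characteristics> characteristics() {
        return new HashSet<>();
    }
}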
The following use-case returns Double.NaN, as expected:
public static void main(String[] args) {
    System.out.println(DoubleStream.of(1, 2, 3, 4, 5, 6, 7, Double.NaN)
            .boxed()
            .collect(new AveragingCollector()));
}
Instead of using a Collector, you could use Stream.allMatch(..) to terminate the Stream early and use utility classes like LongSummaryStatistics directly. If all values are present (and there is at least one), you return the statistics, e.g.:
Optional<LongSummaryStatistics> toLongStats(Stream<OptionalLong> stream) {
LongSummaryStatistics stat = new LongSummaryStatistics();
boolean allPresent = stream.allMatch(opt -> {
if (opt.isEmpty()) return false;
stat.accept(opt.getAsLong());
return true;
});
return allPresent && stat.getCount() > 0 ? Optional.of(stat) : Optional.empty();
}
Instead of a Stream<OptionalLong> you might use a DoubleStream and check for your NaN case.
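Applied to the NaN case from the question, that suggestion might look like this sketch (NaNAwareAverage is a hypothetical name). Because allMatch short-circuits on the first false, elements after the first NaN are never pulled from the stream. The predicate mutates a DoubleSummaryStatistics, so this assumes a sequential stream.

```java
import java.util.DoubleSummaryStatistics;
import java.util.OptionalDouble;
import java.util.stream.DoubleStream;

class NaNAwareAverage {
    // Returns NaN as soon as a NaN is seen, empty for an empty stream, else the average.
    // Assumes a sequential stream: the predicate below has a side effect on stats.
    static OptionalDouble average(DoubleStream values) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        boolean noNaN = values.allMatch(v -> {
            if (Double.isNaN(v)) {
                return false; // allMatch stops here; later elements are not consumed
            }
            stats.accept(v);
            return true;
        });
        if (!noNaN) {
            return OptionalDouble.of(Double.NaN);
        }
        return stats.getCount() > 0 ? OptionalDouble.of(stats.getAverage()) : OptionalDouble.empty();
    }
}
```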
For the case of NaN, it might be acceptable to consider this an exceptional outcome, and so throw a custom NaNAverageException, short-circuiting the collection operation. Using exceptions for normal control flow is usually bad practice; however, it may be justified in this case.
Stream<String> s = Stream.of("1","2","ABC", "3");
try
{
double result = s.collect(Collectors.averagingInt(n -> Integer.parseInt(n)));
System.err.println("Average :"+ result);
}
catch (NumberFormatException e)
{
// the exception is thrown when it encounters "ABC", so the collector never sees "3"
e.printStackTrace();
}

Per-key blocking Map in Java

I'm dealing with some third-party library code that involves creating expensive objects and caching them in a Map. The existing implementation is something like
lock.lock();
try {
Foo result = cache.get(key);
if (result == null) {
result = createFooExpensively(key);
cache.put(key, result);
}
return result;
} finally {
lock.unlock();
}
Obviously this is not the best design when Foos for different keys can be created independently.
My current hack is to use a Map of Futures:
lock.lock();
Future<Foo> future;
try {
future = allFutures.get(key);
if (future == null) {
future = executorService.submit(new Callable<Foo>() {
public Foo call() {
return createFooExpensively(key);
}
});
allFutures.put(key, future);
}
} finally {
lock.unlock();
}
try {
return future.get();
} catch (InterruptedException e) {
throw new MyRuntimeException(e);
} catch (ExecutionException e) {
throw new MyRuntimeException(e);
}
But this seems... a little hacky, for two reasons:
The work is done on an arbitrary pooled thread. I'd be happy to have the work done on the first thread that tries to get that particular key, especially since it's going to be blocked anyway.
Even when the Map is fully populated, we still go through Future.get() to get the results. I expect this is pretty cheap, but it's ugly.
What I'd like is to replace cache with a Map that will block gets for a given key until that key has a value, but allow other gets meanwhile. Does any such thing exist? Or does someone have a cleaner alternative to the Map of Futures?
Creating a lock per key sounds tempting, but it may not be what you want, especially when the number of keys is large.
As you would probably need to create a dedicated (read-write) lock for each key, it has impact on your memory usage. Also, that fine granularity may hit a point of diminishing returns given a finite number of cores if concurrency is truly high.
ConcurrentHashMap is often a good enough solution in a situation like this. It normally provides full reader concurrency (readers do not block), and updates can be concurrent up to the desired concurrency level. This gives you pretty good scalability. The above code may be expressed with ConcurrentHashMap like the following:
ConcurrentMap<Key,Foo> cache = new ConcurrentHashMap<>();
...
Foo result = cache.get(key);
if (result == null) {
result = createFooExpensively(key);
Foo old = cache.putIfAbsent(key, result);
if (old != null) {
result = old;
}
}
The straightforward use of ConcurrentHashMap does have one drawback, which is that multiple threads may find that the key is not cached, and each may invoke createFooExpensively(). As a result, some threads may do throw-away work. To avoid this, you would want to use the memoizer pattern that's mentioned in "Java Concurrency in Practice".
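For reference, the memoizer pattern can be sketched as follows, loosely after the Java Concurrency in Practice version (Memoizer and its Function-based constructor are choices made here, not a library API). It also addresses the first concern from the question: the expensive computation runs on the first thread that asks for the key rather than on a pooled thread, and putIfAbsent ensures only one FutureTask per key ever gets into the map. Completed lookups still go through Future.get(), which is cheap for a finished task.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class Memoizer<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    Memoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) throws InterruptedException {
        Future<V> f = cache.get(key);
        if (f == null) {
            FutureTask<V> task = new FutureTask<>(() -> compute.apply(key));
            f = cache.putIfAbsent(key, task);
            if (f == null) { // this thread won the race: run the computation right here
                f = task;
                task.run();
            }
        }
        try {
            return f.get(); // other threads asking for the same key block here
        } catch (ExecutionException e) {
            cache.remove(key, f); // don't cache failures; a later call can retry
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw new IllegalStateException(cause);
        }
    }
}
```

Unlike the book's version, this one does not retry on CancellationException, since nothing here cancels tasks.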
But then again, the nice folks at Google already solved these problems for you in the form of CacheBuilder:
LoadingCache<Key,Foo> cache = CacheBuilder.newBuilder().
concurrencyLevel(32).
build(new CacheLoader<Key,Foo>() {
public Foo load(Key key) {
return createFooExpensively(key);
}
});
...
Foo result = cache.getUnchecked(key); // get(key) also works but throws the checked ExecutionException
You can use funtom-java-utils - PerKeySynchronizedExecutor.
It will create a lock for each key but will clear it for you immediately when it becomes unused.
It also guarantees memory visibility between invocations with the same key, and is designed to be very fast and to minimize contention between invocations for different keys.
Declare it in your class:
final PerKeySynchronizedExecutor<KEY_CLASS> executor = new PerKeySynchronizedExecutor<>();
Use it:
Foo foo = executor.execute(key, () -> createFooExpensively(key));
public class Cache {
private static final Set<String> lockedKeys = new HashSet<>();
private void lock(String key) {
synchronized (lockedKeys) {
while (!lockedKeys.add(key)) {
try {
lockedKeys.wait();
} catch (InterruptedException e) {
log.error("...");
throw new RuntimeException(e);
}
}
}
}
private void unlock(String key) {
synchronized (lockedKeys) {
lockedKeys.remove(key);
lockedKeys.notifyAll();
}
}
public Foo getFromCache(String key) {
lock(key); // acquire before try: if lock() throws, finally must not release a key we never held
try {
Foo result = cache.get(key);
if (result == null) {
result = createFooExpensively(key);
cache.put(key, result);
}
return result;
//For different keys it is executed in parallel.
//For the same key it is executed synchronously.
} finally {
unlock(key);
}
}
}
The key can be not only a String but any class with correctly overridden equals and hashCode methods. Because different keys are processed in parallel, cache itself must be thread-safe (e.g. a ConcurrentHashMap).
The try-finally is very important: you must guarantee that waiting threads are unlocked after your operation, even if it threw an exception.
It will not work if your back-end is distributed across multiple servers/JVMs.
