I'm looking for the Java equivalent of Scala's futures.
I'm looking for a type of construct that allows me to submit tasks (Runnables / Callables) to a specific thread-pool of my choice, returning futures allowing me to chain some logic (in a non-blocking way) to it when it gets completed. Something like this:
var executor = Executors.newCachedThreadPool();
executor.submit(() -> {
    Thread.sleep(5000);
    return 666;
}).onComplete(v -> System.out.println(v));
Java's thread-pools (available through the Executors factory class) seem to always return standard Java Futures, which only allow me to call a blocking get(). On the other hand, the CompletableFuture, from what I can understand, is more akin to Scala's promises, and is not tied to a thread-pool.
Does Java provide what I'm looking for? How do people in Java-land deal with these kinds of operations?
You want callback hell? That's a new one.
from what I can understand, is more akin to Scala's promises, and is not tied to a thread-pool.
Incorrect. I think CompletableFuture is precisely what you want :)
There is a default executor that will be used, but you can also specify one explicitly if you prefer: the supplyAsync and runAsync methods have overloads that take an explicit Executor, and the *Async chaining methods (thenApplyAsync, whenCompleteAsync, and friends) have overloads that take one as well (see the sketch after the example output below).
CompletableFuture.supplyAsync(() -> {
    // Thread.sleep throws a checked exception, which a Supplier lambda can't, so wrap it.
    try { Thread.sleep(5000); } catch (InterruptedException e) { throw new RuntimeException(e); }
    System.out.println("Helloooo there, from stage 1!");
    return 666;
}).whenCompleteAsync((result, exception) -> {
    try { Thread.sleep(5000); } catch (InterruptedException e) { throw new RuntimeException(e); }
    System.out.println("Coming to you live from stage 2: " + result);
    // result is null if an exception occurred in stage 1.
    // exception is null if stage 1 completed normally.
});
NB: If you toss this into a plain public static void main, make sure to add a .get() at the very end; without that, the VM will exit before the future gets a chance to actually do the work. Then you will see:
> Helloooo there, from stage 1!
> Coming to you live from stage 2: 666
The first string appears after ~5 seconds, the second after ~10, and then the VM exits.
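If you want the whole chain pinned to a pool of your own, the same idea with the executor-taking overloads looks something like this (the pool size and the doubling step are just for illustration; the usual java.util.concurrent imports are assumed):
ExecutorService myPool = Executors.newFixedThreadPool(4); // any executor you like

CompletableFuture
        .supplyAsync(() -> 666, myPool)               // stage 1 runs on myPool
        .thenApplyAsync(v -> v * 2, myPool)           // stage 2 explicitly on myPool as well
        .whenComplete((result, exception) ->
                System.out.println("Got: " + result)) // non-blocking callback
        .join();                                      // again, only needed so a plain main doesn't exit early

myPool.shutdown();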
Note that it looks like Java's future is not, heh, futures. Futures in Java suffer from callback hell and do not solve the 'red/blue function' problem: invoking anything that (potentially) blocks from async code is a serious bug that is hard to detect statically, almost impossible to test for, and will nevertheless completely ruin your performance in production. Unfortunately, it is relatively hard to realize you've accidentally called something that blocks, and few existing APIs and libraries both document whether they block and then commit to never changing that without treating the change as backwards incompatible and versioning accordingly.
These are solvable problems, but it won't be easy: Java will need 'async' or similar, and also a serious effort to add documentation and possibly something that can be compile-time checked, e.g. with an annotation.
But none of that is anywhere on the horizon of Java's future. What IS on the horizon, however (real soon now, a timespan of a year or two at most), is 'Project Loom', which adds lightweight threads ('fibers') to Java: they represent execution state but are not, by themselves, backed by an OS thread, so you can make millions of them, no problem. Then you just write:
int priceyOp1 = doSomethingThatTakesLong();
int priceyOp2 = thisIsAlsoSlow(priceyOp1);
and shove the whole thing into a fiber, and even have a hopper concept where a thread pool realizes that the fiber it is currently running is now blocked, fishes another fiber out of the pool, and runs that for a while. That doesn't make futures completely pointless, but it does probably mean that futures will remain niche, and async/callback hell will not be solved anytime soon.
I am using AWS Lambda with Java. Due to a requirement, I have to sleep in my Lambda function for 2-3 seconds, or in some cases up to 12 seconds. Is it a good idea to put Thread.sleep() in a Lambda function, or does it have technical consequences?
There are a few cases in which doing Thread.sleep is justified.
Polling every few seconds to check whether some status that is not under your code's control has changed, e.g. checking whether a remote process somewhere has finished.
You want to mock a certain piece of code so that it "takes" more time than it actually does.
Throttling down a piece of code that does multiple operations per second, e.g. requesting multiple resources from a remote server but spacing out your requests so that you don't overload it.
I'm sure there are quite a few more justifiable reasons. Don't be afraid to sleep in your code; just make sure you're sleeping for a justifiable reason, and make sure the threading model in which you need to sleep does not cause deadlocks.
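For the first (polling) case, a minimal sketch could look like the following; isRemoteJobFinished is a made-up stand-in for whatever status check you actually have:
// isRemoteJobFinished is hypothetical; substitute your real status check.
void waitForRemoteJob(String jobId) throws InterruptedException {
    while (!isRemoteJobFinished(jobId)) {
        Thread.sleep(3_000); // back off a few seconds between checks instead of hammering the remote side
    }
}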
Note that when running in AWS Lambda you should keep your sleeps as short as possible, as you pay for that sweet, sweet CPU time.
If your Lambda uses a large amount of memory, it would be better (and cheaper) to start two different Lambdas than to wait for 12 seconds.
If you have some sort of workflow, or you need to wait for a specific condition, you could evaluate introducing AWS Step Functions or (maybe better) send the context to an SQS queue with the visibility timeout set to twelve seconds. That way, the second Lambda will wait at least 12 seconds before it starts.
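As a sketch of the SQS variant, using the AWS SDK v2 and the per-message DelaySeconds setting (a close relative of the visibility-timeout approach); the queue URL and the contextAsJson payload below are placeholders:
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

SqsClient sqs = SqsClient.create();
sqs.sendMessage(SendMessageRequest.builder()
        .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/second-lambda-queue") // placeholder URL
        .messageBody(contextAsJson)  // whatever context the second Lambda needs
        .delaySeconds(12)            // message stays invisible for 12 seconds before delivery
        .build());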
Basically you can do whatever you want; in this case you will just pay more :-)
The whole idea of a Lambda function is to have a function that takes input, produces output, and has a single responsibility, similar to plain old functions.
Let's think about why you need to use Thread#sleep:
You perform action #1.
Wait until this action is completed.
Perform action #2.
These are 3 different responsibilities. It's too much for any function, including Lambda :-)
Both actions can be separate Lambda functions. With the recent addition of Destinations, your Lambda #1 can trigger Lambda #2 (see the sketch below).
In this case there is no need for polling at all.
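A sketch of what the split can look like on the Java side; the handler below performs only action #1 and returns its result, and a Destination (or an SQS/Step Functions hop) configured outside the code forwards that result to the Lambda performing action #2. The payload types and the action itself are placeholders:
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Hypothetical handler for action #1: no Thread.sleep, no waiting for action #2.
public class ActionOneHandler implements RequestHandler<String, String> {
    @Override
    public String handleRequest(String input, Context context) {
        // ... perform action #1 ...
        return "result-of-action-1"; // the configured Destination passes this to Lambda #2
    }
}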
I am very new to RxJava, although I'm familiar with streams and somewhat familiar with JavaScript promises. I'm working with some existing code using RxJava, and with some comments people have made about that code. I'd like to get more background on this while I continue to absorb the documentation.
The block in question is this (some names changed):
public ShippingMethodHolder callClientsAsync(ShippingMethodContext shippingContext) {

    Single<ShippingMethodResponse> productOneResponseEntity = Single.<ShippingMethodResponse>create(source -> {
        source.onSuccess(getProductOneowShippingMethodResponse(shippingContext));
    }).subscribeOn(Schedulers.io());

    Single<ShippingMethodResponse> productTwoResponseEntity = Single.<ShippingMethodResponse>create(source -> {
        source.onSuccess(getProductTwoShippingMethodResponse(shippingContext));
    }).subscribeOn(Schedulers.io());

    Single<ShippingMethodHolder> singleProductCartResponseHolder = Single.zip(productOneResponseEntity, productTwoResponseEntity,
            (dtvResponse, productTwoResponse) -> {
                return new ShippingMethodHolder(dtvResponse, productTwoResponse);
            });

    return singleProductCartResponseHolder.blockingGet();
}
The comment made about this code from people more informed about this essentially says that this is missing RxJava exception handling "and will cause blocking or crashing of the stream". I imagine this refers to the fact that the two async calls have "onSuccess()" calls, but no "onError()" calls.
However, this seems odd to me. The scope in which "onSuccess()" is called isn't about business-logic success or failure, but seemingly about RxJava's attempt to make the asynchronous call.
I could use some advice on whether this is really a problem from RxJava's point of view.
create is there mainly to bridge an asynchronous source into the reactive world, but your code calls something blocking just to signal its value. For that, fromCallable is more appropriate and communicates the intent to the reader much better:
Single<ShippingMethodResponse> productOneResponseEntity =
        Single.<ShippingMethodResponse>fromCallable(() ->
                getProductOneowShippingMethodResponse(shippingContext)
        )
        .subscribeOn(Schedulers.io());
Depending on the type of application, blocking to wait for the result may not be desirable, especially if the method is called from the UI thread. You could instead return the zipped Single and keep composing until a final subscribe() can be issued.
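For example, a sketch of returning the composed Single and leaving the subscription to the caller; handleHolder and handleError below are placeholders for whatever your application does with the result:
public Single<ShippingMethodHolder> callClients(ShippingMethodContext shippingContext) {
    Single<ShippingMethodResponse> productOne = Single
            .fromCallable(() -> getProductOneowShippingMethodResponse(shippingContext))
            .subscribeOn(Schedulers.io());
    Single<ShippingMethodResponse> productTwo = Single
            .fromCallable(() -> getProductTwoShippingMethodResponse(shippingContext))
            .subscribeOn(Schedulers.io());
    return Single.zip(productOne, productTwo, ShippingMethodHolder::new);
}

// Somewhere further up the call chain, once no more composition is needed:
// callClients(shippingContext).subscribe(
//         holder -> handleHolder(holder),   // onSuccess
//         error  -> handleError(error));    // onError, instead of the global RxJavaPlugins handler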
The comment made about this code from people more informed about this essentially says that this is missing RxJava exception handling "and will cause blocking or crashing of the stream"
The original create and fromCallable will catch your exception and try to signal it to the consumer. In this case, blockingGet will rethrow one of the source exceptions on the caller thread, and the other (if any) will be routed to the global RxJavaPlugins.onError handler. They probably meant that the caller of your method generally doesn't expect it to throw, so they may omit a try-catch around it and fail badly at runtime. Resolving it really depends on what kind of error management you intended in the application.
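One possible resolution, as a sketch inside the original method: recover within the stream so that blockingGet can never throw. ShippingMethodHolder.empty() is a made-up fallback; substitute whatever "no result" means in your domain:
Single<ShippingMethodHolder> guarded = singleProductCartResponseHolder
        .onErrorReturn(error -> ShippingMethodHolder.empty()); // log and map the error to a default instead of rethrowing

return guarded.blockingGet(); // only the fallback, never an exception, reaches the caller now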
I need to implement an infinite operation in a Clojure program. I hope the syntax is reasonably clear for Java programmers too:
(defn- my-operation []
  (future
    (while @continue
      (do things)
      (Thread/sleep polling-time))))
That gives me exactly what I want: the operation runs in a different thread so it does not block the main one, with a pretty clear and straightforward syntax, and without having to deal with native Java methods, which would force me to use the dot special form.
But the definition of a Java future is a "representation of the result of an asynchronous computation" and in this case, well, I'm not really doing anything with the result.
Is it wrong to use them this way?
Is there any technical difference that should worry me compared to starting my own Thread?
Is this semantically wrong?
This will tie up a thread in the thread pool that Clojure manages and uses for such things. The intended use of that pool is for short-running operations, so this is kind of a misuse. Also, the intention behind a future is to calculate a value once so that it can be dereferenced multiple times, so that's kind of a misuse too.
There are lots of other options for long-running tasks, including using Java's Executor framework, core.async, or starting your own thread:
(defonce background-thread (doto (Thread. (fn []
                                             (while @continue
                                               (do things)
                                               (Thread/sleep polling-time))))
                             (.setDaemon true)
                             (.start)))
As others have mentioned, it might be worth thinking about unhandled exceptions. Stuart Sierra's blog post is a great place to start.
The docstring for future says that it returns something you can call deref on. It is a reference type in the same way that delay or atom is, with its own semantics for how it gets a value, and while it's true that you could just create a future and let it run forever, if you see a future in some code it implies that you care about the value it produces.
(clojure.repl/doc future)
-------------------------
clojure.core/future
([& body])
Macro
Takes a body of expressions and yields a future object that will
invoke the body in another thread, and will cache the result and
return it on all subsequent calls to deref/@. If the computation has
not yet finished, calls to deref/@ will block, unless the variant of
deref with timeout is used. See also - realized?.
It should also be noted that Futures will swallow all exceptions that take place in them, until they're dereferenced. If you never plan on dereferencing them, be prepared for tears.
I've been bitten multiple times by this, where suddenly everything just stops working for no apparent reason. It's only later that I realize that an exception occurred that I had no idea about.
In the same vein as what @l0st suggests, this is what I've been using for simple cases:
(defmacro thread [& body]
  `(doto (Thread. (fn [] ~@body))
     (.start)))
Everything you then pass to thread will be implicitly run in a new thread, which allows you to write:
(thread
  (while @continue
    (stuff)))
Which is basically what you had before in terms of syntax. Note, though, that this will of course shadow the macro with the same name if you ever decide to use core.async.
Ya, dealing with Java interop is a pain, but if you just tuck it away somewhere, it can lead to some nice custom code.
For some reason I can't wrap my head around implementing this. I've got an application running with Play that calls out to Elasticsearch. As part of my design, my service uses the Java API wrapped with Scala futures, as shown in this blog post. I've updated the code from that post to hint to the ExecutionContext that it will be doing some blocking I/O, like so:
import scala.concurrent.{blocking, Future, Promise}
import org.elasticsearch.action.{ActionRequestBuilder, ActionListener, ActionResponse}
def execute[RB <: ActionRequestBuilder[_, T, _, _]](request: RB): Future[T] = {
  blocking {
    request.execute(this)  // `this` is the ActionListener from the blog post's wrapper class
    promise.future         // `promise` is the Promise[T] that the listener completes
  }
}
My actual service that constructs the queries to send to ES takes an ExecutionContext as a constructor parameter, which it then uses for calls to Elasticsearch. I did this so that the global execution context Play uses won't have its threads tied down by the blocking calls to ES. This S.O. comment mentions that only the global context is blocking-aware, so that leaves me to create my own. In that same post/answer there's a lot of information about using a ForkJoin pool, but I'm not sure how to take what's written in those docs and combine it with the hints in the blocking documentation to create an execution context that responds to blocking hints.
I think one of the issues I have is that I'm not sure exactly how to respond to the blocking context in the first place. I was reading the best practices, and the example it uses is an unbounded cache of threads:
Note that here I prefer to use an unbounded "cached thread-pool", so it doesn't have a limit. When doing blocking I/O the idea is that you've got to have enough threads that you can block. But if unbounded is too much, depending on use-case, you can later fine-tune it, the idea with this sample being that you get the ball rolling.
So does this mean that with my ForkJoin-backed thread pool I should try to use a cached thread when dealing with non-blocking I/O and create a new thread for blocking I/O? Or something else? Pretty much every resource I find online about using separate thread pools tends to do what the Neophyte's Guide does, which is to say:
How to tune your various thread pools is highly dependent on your individual application and beyond the scope of this article.
I know it depends on your application, but in this case I just want to create some type of blocking-aware ExecutionContext and understand a decent strategy for managing its threads. If the context is specifically for a single part of the application, should I just make a fixed-size thread pool and not use (or ignore) the blocking keyword in the first place?
I tend to ramble, so I'll try to break down what I'm looking for in an answer:
Code! Reading all these docs still leaves me feeling just out of reach of being able to code a blocking-aware context, and I'd really appreciate an example.
Any links or tips on how to handle blocking threads, i.e. endlessly make a new thread for them, check the number of available threads and reject if there are too many, or some other strategy.
I'm not looking for performance tips here; I know I'll only get those through testing, but I can't test if I can't figure out how to code the contexts in the first place! I did find an example of ForkJoins vs thread pools, but I'm missing the crucial part about blocking there.
Sorry for the long question here, I'm just trying to give you a sense of what I'm looking at and that I have been trying to wrap my head around this for over a day and need some outside help.
Edit: Just to make this clear, the ElasticSearch Service's constructor signature is:
//Note that these are not implicit parameters!
class ElasticSearchService(otherParams ..., val executionContext: ExecutionContext)
And in my application start up code I have something like this:
object Global extends GlobalSettings {
val elasticSearchContext = //Custom Context goes here
...
val elasticSearchService = new ElasticSearchService(params, elasticSearchContext);
...
}
I am also reading through Play's recommendations for contexts, but have yet to see anything about blocking hints, and I suspect I might have to look into the source to see if they extend the BlockContext trait.
So I dug into the documentation, and Play's best practice for the situation I'm dealing with is:
In certain circumstances, you may wish to dispatch work to other thread pools. This may include CPU heavy work, or IO work, such as database access. To do this, you should first create a thread pool, this can be done easily in Scala:
And provides some code:
object Contexts {
implicit val myExecutionContext: ExecutionContext = Akka.system.dispatchers.lookup("my-context")
}
The context is from Akka, so I went digging there for the defaults and types of contexts they offer, which eventually led me to the documentation on dispatchers. The default is a ForkJoinPool, whose default mechanism for managing a block is to call managedBlock(blocker). This led me to the documentation, which states:
Blocks in accord with the given blocker. If the current thread is a ForkJoinWorkerThread, this method possibly arranges for a spare thread to be activated if necessary to ensure sufficient parallelism while the current thread is blocked.
So it seems like if I have a ForkJoinWorkerThread then the behavior I think I want will take place. Looking at the source of ForkJoinPool some more I noted that the default thread factory is:
val defaultForkJoinWorkerThreadFactory: ForkJoinWorkerThreadFactory = juc.ForkJoinPool.defaultForkJoinWorkerThreadFactory
Which implies to me that if I use the defaults in Akka, that I'll get a context which handles blocking in the way I expect.
So reading the Akka documentation again it would seem that specifying my context something like this:
my-context {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 8
    parallelism-factor = 3.0
    parallelism-max = 64
    task-peeking-mode = "FIFO"
  }
  throughput = 100
}
would be what I want.
While searching the source code, I looked for uses of blocking or calls to managedBlock and found an example of overriding the ForkJoin behavior in ThreadPoolBuilder:
private[akka] class AkkaForkJoinWorkerThread(_pool: ForkJoinPool) extends ForkJoinWorkerThread(_pool) with BlockContext {
  override def blockOn[T](thunk: ⇒ T)(implicit permission: CanAwait): T = {
    val result = new AtomicReference[Option[T]](None)
    ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker {
      def block(): Boolean = {
        result.set(Some(thunk))
        true
      }
      def isReleasable = result.get.isDefined
    })
    result.get.get // Exception intended if None
  }
}
This seems like what I originally asked for: an example of how to make something that implements BlockContext. That file also has code showing how to make an ExecutorServiceFactory, which I believe is what is referenced by the executor part of the configuration. So I think, if I wanted a totally custom context, I would extend some type of WorkerThread and write my own ExecutorServiceFactory that uses the custom worker thread, and then specify the fully qualified class name in the property, as this post advises.
I'm probably going to go with using Akka's forkjoin :)
I need to use a memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods; the asynchronous ones return java.util.concurrent.Future. There is already a question about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Using the synchronous API and wrapping the blocking code in a Future, marking it with blocking:
Future {
  blocking {
    cache.get(key) // synchronous blocking call
  }
}
Using the asynchronous Java API and polling the Java Future every n ms to check whether it has completed (as described in one of the answers to the linked question above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
I always go with the first option, but I am doing it in a slightly different way: I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// I create a separate ec for each blocking client/resource/api I use
Future {
cache.get(key) //synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls will use a dedicated execution context (i.e. a thread pool), separate from your main execution context, which stays responsible for the non-blocking stuff.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the blocking approach and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext that provides a naive answer to the question of how many threads you will need: it spawns a new thread every time all the other threads in the pool are busy. So it is actually an uncontrolled ExecutionContext; it could create lots of threads and lead to problems like an out-of-memory error. The solution with a custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case where this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the “blocking
problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either
dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded
number of tasks of this nature will exhaust your memory or thread
limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the
hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they
occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.
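For reference, the "thread pool with an upper limit" from the second and third suggestions can be as simple as the following at the java.util.concurrent level (the numbers are placeholders to tune for your hardware); on the Scala side it can be wrapped with ExecutionContext.fromExecutorService:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor blockingPool = new ThreadPoolExecutor(
        16, 16,                                    // hard cap on threads doing blocking calls
        60, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(100),             // bounded backlog of pending calls
        new ThreadPoolExecutor.CallerRunsPolicy()  // push back on submitters when saturated
);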