I need to use the memcached Java API in my Scala/Akka code. This API provides both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There is an existing question about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Use the synchronous API, wrap the blocking call in a Future, and mark it as blocking:
Future {
  blocking {
    cache.get(key) // synchronous blocking call
  }
}
Use the asynchronous Java API and poll the Java Future every n ms to check whether it has completed (as described in one of the answers to the linked question).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
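To make the second option concrete, here is a minimal Java sketch of the polling approach (Java because the memcached client is a Java API). FuturePoller and poll are hypothetical names, not part of any library, and the polling period is whatever you pass in. It only illustrates where the extra latency comes from: a result that arrives just after a check is not seen until the next one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FuturePoller {
    // Hypothetical helper: re-checks `source` every `periodMillis` ms and completes
    // the returned CompletableFuture once `source` is done.
    static <T> CompletableFuture<T> poll(Future<T> source,
                                         ScheduledExecutorService timer,
                                         long periodMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Runnable check = new Runnable() {
            @Override
            public void run() {
                if (!source.isDone()) {
                    // Not ready yet: schedule the next check. This delay is the cost of polling.
                    timer.schedule(this, periodMillis, TimeUnit.MILLISECONDS);
                    return;
                }
                try {
                    result.complete(source.get()); // returns immediately, the future is done
                } catch (Exception e) {
                    result.completeExceptionally(e);
                }
            }
        };
        timer.execute(check);
        return result;
    }
}
```

No pool thread is parked while waiting, but on average the caller sees the result about half a polling period late.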
I always go with the first option, but I do it in a slightly different way. I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
// I create a separate ec for each blocking client/resource/API I use
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
Future {
  cache.get(key) // synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls will use a dedicated execution context (= thread pool). It is separated from your main execution context, which is responsible for the non-blocking stuff.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the approach with blocking and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext. It takes a naive approach to the question of how many threads you will need: it spawns a new thread whenever all the other existing threads in the pool are busy. So it is effectively an uncontrolled ExecutionContext. It could create lots of threads and lead to problems like an out-of-memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking in case this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
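For callers on the Java side, the same idea can be sketched with a CompletableFuture and a dedicated executor. This is only an illustration: BlockingCacheCalls is a hypothetical wrapper, the pool size of 16 is an arbitrary assumption, and blockingGet stands in for the real synchronous client call (e.g. key -> cache.get(key)).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class BlockingCacheCalls {
    // Dedicated, fixed-size pool used only for blocking cache calls (size is an assumption to tune).
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(16);
    // Stand-in for the synchronous client call, e.g. key -> cache.get(key).
    private final Function<String, Object> blockingGet;

    BlockingCacheCalls(Function<String, Object> blockingGet) {
        this.blockingGet = blockingGet;
    }

    // The blocking call runs on blockingPool, never on the caller's thread or the default pool.
    CompletableFuture<Object> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> blockingGet.apply(key), blockingPool);
    }

    void shutdown() {
        blockingPool.shutdown();
    }
}
```

Because the pool is fixed-size, a slow cache can at most tie up those 16 threads; further requests wait in the executor's queue rather than starving the rest of the application.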
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the “blocking
problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either
dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded
number of tasks of this nature will exhaust your memory or thread
limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the
hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they
occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.
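The second suggestion in the quoted list (an upper bound on the number of blocking calls in flight) could be sketched in Java roughly as follows. BoundedBlockingCalls is a hypothetical name, and rejecting the call when the limit is reached is one possible policy assumed here; the Akka documentation does not prescribe it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class BoundedBlockingCalls {
    private final Semaphore permits;   // one permit per allowed in-flight call
    private final Executor executor;   // pool that runs the blocking calls

    BoundedBlockingCalls(int maxInFlight, Executor executor) {
        this.permits = new Semaphore(maxInFlight);
        this.executor = executor;
    }

    <T> CompletableFuture<T> submit(Supplier<T> blockingCall) {
        if (!permits.tryAcquire()) {
            // Limit reached: fail fast instead of queueing without bound.
            CompletableFuture<T> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(
                    new RejectedExecutionException("too many blocking calls in flight"));
            return rejected;
        }
        try {
            return CompletableFuture.supplyAsync(blockingCall, executor)
                    .whenComplete((value, error) -> permits.release()); // free the slot either way
        } catch (RuntimeException e) {
            permits.release(); // the executor refused the task, so give the permit back
            throw e;
        }
    }
}
```

Failing fast keeps memory use bounded under overload and gives the caller a clear signal it can react to, for example with a fallback value.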
I recently read about Quasar which provides "lightweight" / Go-like "user mode" threads to the JVM (it also has an Erlang inspired Actor system like Akka but that's not the main question)
For example:
package jmodern;

import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.strands.Strand;
import co.paralleluniverse.strands.channels.Channel;
import co.paralleluniverse.strands.channels.Channels;

public class Main {
    public static void main(String[] args) throws Exception {
        final Channel<Integer> ch = Channels.newChannel(0);

        new Fiber<Void>(() -> {
            for (int i = 0; i < 10; i++) {
                Strand.sleep(100);
                ch.send(i);
            }
            ch.close();
        }).start();

        new Fiber<Void>(() -> {
            Integer x;
            while ((x = ch.receive()) != null)
                System.out.println("--> " + x);
        }).start().join(); // join waits for this fiber to finish
    }
}
As far as I understand, the code above doesn't spawn any JVM/kernel threads; everything runs in user-mode threads (or so they claim), which is supposed to be cheaper (just like Go's goroutines, if I understood correctly).
My question is this: as far as I understand, in Akka everything is still based on JVM threads, which most of the time map to native OS kernel threads (e.g. pthreads on POSIX systems). In other words, to the best of my understanding there are no user-mode threads / Go-like coroutines / lightweight threads in Akka. Did I understand correctly?
If so, do you know whether it's a design choice? Or is there a plan for Go-like lightweight threads in Akka in the future?
My understanding is that if you have a million actors but most of them are blocking, then Akka can handle it with far fewer physical threads. But if most of them are non-blocking and you still need some responsiveness from the system (e.g. servicing millions of small requests for streaming some data), then I can see the benefits of a user-mode threading implementation, which allows many more "threads" to be alive at a lower cost of creating, switching and terminating (of course the only benefit is evenly dividing responsiveness among many clients, but responsiveness is still important).
Is my understanding more or less correct? Please correct me if I'm wrong.
*I might be completely confusing user-mode threads with goroutines/coroutines and lightweight threads; the question above relies on my poor understanding that they are all one and the same.
Akka is a very flexible library. Actors are scheduled (through a chain of traits, but essentially it boils down to this) via the simple ExecutionContext trait, which, as you can see, accepts Runnables and somehow executes them. So, as far as I can see, it is likely possible to write a binding to something like Quasar and use it as a "backend" for Akka actors.
However, Quasar and similar libraries are likely to provide special utilities for communication between fibers. I also don't know how they would handle blocking tasks like I/O; they probably have a mechanism for that too. Because of this, I'm not sure whether Akka would run correctly over green threads. Quasar also seems to rely on bytecode instrumentation, a rather advanced technique which can have a lot of implications that prevent it from backing Akka.
However, you shouldn't really worry about the lightweightness of threads when using Akka actors. In fact, Akka is perfectly able to create millions of actors on a single system (see here), and all these actors will work just fine.
This is achieved via clever scheduling over special kinds of thread pools, like the fork-join thread pool. This means that unless actors are blocked on some long-running computation, they can run on a number of threads significantly smaller than the number of actors. For example, you can create a thread pool which will use at most 8 threads (one for each core of an 8-core processor), and all actor activity will be scheduled on these threads.
Akka's flexibility allows you to configure the exact dispatcher to use for specific actors if needed. You can create dedicated dispatchers for actors which spend their time in long-running tasks, like database access. See here for more information.
So, in short, no, you don't need userland threads for actors, because actors don't map one-to-one to native threads (unless you force them to, that is, but this should be avoided at all costs).
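The point that many small units of work do not need many threads can be illustrated without Akka. The plain-Java sketch below is not how Akka's dispatchers are implemented; ManyTasksFewThreads is a hypothetical name, and short non-blocking tasks stand in for actors processing messages.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ManyTasksFewThreads {
    // Runs `tasks` short non-blocking tasks on a pool of `threads` threads and
    // returns how many distinct threads actually executed them.
    static int distinctThreadsUsed(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CompletableFuture<?>[] pending = new CompletableFuture<?>[tasks];
        for (int i = 0; i < tasks; i++) {
            pending[i] = CompletableFuture.runAsync(
                    () -> threadNames.add(Thread.currentThread().getName()), pool);
        }
        CompletableFuture.allOf(pending).join(); // wait until every task has run
        pool.shutdown();
        return threadNames.size();
    }

    public static void main(String[] args) {
        // The result can never exceed the pool size, however many tasks are submitted.
        System.out.println(distinctThreadsUsed(100_000, 8));
    }
}
```

As long as no task blocks its thread, the pool size limits parallelism, not the number of tasks that can exist.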
Akka actors are essentially asynchronous, and that's why you can have a lot of them. Quasar actors (yes, Quasar offers an actor implementation too), which are very close to Erlang's, are synchronous or blocking, but they use fibers rather than Java (i.e. at present OS) threads. So you can still have a lot of them with the same performance as async, but with a programming style just as straightforward as when using regular threads and blocking calls.
I/O is handled through integrations: Quasar includes a framework to convert both async and sync APIs into fiber-blocking ones, and Comsat already includes many such integrations (one with Java NIO is included in Quasar directly).
My reply to another question contains further info.
I have a web service that writes files to disk and other data to a database. The entire operation takes 1-2 seconds for each write.
The service can, although it is unlikely, be called from several clients at the same time. Let's assume that 20 clients call the web service at the same time; the write operations must be synchronized. In that case, some clients can get a timeout exception because they have to wait too many seconds.
Are there any good practices for solving this kind of situation? As it is now, the methods are synchronized (and that can cause the starvation/timeouts).
Should I let all threads get into the write method by removing the synchronized keyword and put their tasks into a task queue to avoid a timeout? Is that the correct way to get around this?
Removing the synchronized keyword and putting the work into a task queue will not, by itself, help you (because that's effectively what synchronized is doing for you). However, if you respond to the web request as soon as you put the task on the queue, you will reduce your response time, but at the cost of some reliability: the user will get a confirmation that the work is done when it has not really been done yet (the system could crash before the work is done).
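A minimal sketch of that trade-off, assuming a single worker thread is enough to serialize the writes. SerializedWriter is a hypothetical name, and the in-memory list stands in for the real disk and database work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SerializedWriter {
    // One worker thread: queued writes run one at a time, in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the disk/database; only the worker thread touches it.
    private final List<String> written = new ArrayList<>();

    // Returns immediately, so the request thread can answer the client
    // before the write has actually happened.
    CompletableFuture<Void> enqueue(String payload) {
        return CompletableFuture.runAsync(() -> written.add(payload), writer);
    }

    // Runs on the worker too, so it sees every write queued before it.
    List<String> snapshot() {
        return CompletableFuture.supplyAsync(() -> new ArrayList<>(written), writer).join();
    }

    void shutdown() {
        writer.shutdown();
    }
}
```

The reliability cost mentioned above is visible here: anything still sitting in the executor's queue is lost if the process dies, unless the queue itself is made durable.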
Francis Upton's practice is indeed an accepted practice.
Another one is making the synchronization more fine-grained. Instead of synchronizing all read/write methods of a class, you can synchronize access to exactly the invariants that need it.
Even better is to get rid of synchronization altogether. This is possible using the java.util.concurrent package. This package introduces new collections that use non-blocking algorithms (implemented in Java using compare-and-swap atomic instructions). These collections, such as ConcurrentHashMap, enable much better throughput when scaling.
You can read more about it in this article.
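As a small illustration of replacing a synchronized method with a java.util.concurrent collection, here is a sketch of a per-key counter. HitCounter is a hypothetical class, and ConcurrentHashMap.merge requires Java 8 or later. How much locking the map does internally varies between JDK versions; the point is only that the caller needs no synchronized block.

```java
import java.util.concurrent.ConcurrentHashMap;

final class HitCounter {
    private final ConcurrentHashMap<String, Long> hits = new ConcurrentHashMap<>();

    // Atomic per-key read-modify-write: concurrent callers never lose an update,
    // and updates to different keys do not wait on one shared lock.
    void record(String key) {
        hits.merge(key, 1L, Long::sum);
    }

    long count(String key) {
        return hits.getOrDefault(key, 0L);
    }
}
```

Compared with a synchronized method around a plain HashMap, threads working on different keys rarely contend with each other.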
In this type of implementation (slow service under increasing load) you want to make as much as possible async, including the timeout processing (if server-based) and the required I/O. Don't hold up your client response threads waiting for either of these time-consuming operations, to preserve the server's responsiveness to new requests, but instead fire off the required operations (maybe to a dynamic thread pool) and let callbacks process the results, whether timeout, complete I/O, or errors.
Send the appropriate response depending on what happens first, but be prepared to roll back I/O if you send an error/timeout message and then a completed I/O arrives (due to a race condition between I/O and timer). This implies transactional semantics are required in the server.
This is an area that gets increasingly complex as your load grows, but good design early on should allow you to scale with it. Ideally, the client-servicing threads should not block at all.
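The race between the timer and the I/O described above can be sketched with a CompletableFuture that is completed by whichever event happens first. TimedWrite, the "TIMEOUT" response string and the rollback callback are all hypothetical; a real server would need proper transactional rollback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class TimedWrite {
    // Starts the slow write on ioPool and a timeout on timer. Whichever finishes
    // first decides the response. If the write succeeds after the timeout has
    // already answered the client, rollback runs to undo it.
    static CompletableFuture<String> respond(Supplier<String> slowWrite, Runnable rollback,
                                             Executor ioPool, ScheduledExecutorService timer,
                                             long timeoutMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();

        ScheduledFuture<?> timeout = timer.schedule(
                () -> { response.complete("TIMEOUT"); }, timeoutMillis, TimeUnit.MILLISECONDS);
        // Once a response exists, the timer is no longer needed.
        response.whenComplete((value, error) -> timeout.cancel(false));

        CompletableFuture.supplyAsync(slowWrite, ioPool).whenComplete((result, error) -> {
            // complete(...) returns false if the timeout already won the race.
            boolean won = (error == null)
                    ? response.complete(result)
                    : response.completeExceptionally(error);
            if (!won && error == null) {
                rollback.run(); // the client was told "TIMEOUT", so undo the late write
            }
        });
        return response;
    }
}
```

No request thread blocks here: the caller attaches a callback to the returned future and sends whatever response it completes with.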
I am also thinking of integrating the disruptor pattern into our application. I am a bit unsure about a few things before I start using the Disruptor.
I have 3 producers: a FIX thread which de-serialises the requests, another thread which continuously modifies order prices as the market moves, and one more thread which is responsible for de-serialising the requests sent from a GUI application. All three threads currently write to a blocking queue (hence we see a lot of contention on the queue).
The Disruptor talks about a single writer principle, and from what I have read that approach scales the best. Is there any way we could make the above three threads obey the single writer principle?
Also, in a typical request/response application, and especially in our case, we have contention on an in-memory cache, as we need to lock the cache when we update it with the response, whilst a request might be happening for the same order. How do we handle this through the Disruptor, i.e. how do I tie a response to a particular request? Can I eliminate the lock on the cache, and if yes, how?
Any suggestions/pointers would be highly appreciated. We are currently using Java 1.6.
I'm new to the Disruptor and am trying to understand as many use cases as possible. I have tried to answer your questions.
Yes, the Disruptor can be used to sequence calls from multiple producers. I understand that all 3 threads try to update the state of a shared object, and that a single consumer takes the necessary action on the shared object. Internally, you can have the single consumer delegate calls to the appropriate single-threaded handler based on responsibility.
The Disruptor does exactly this. It sequences the calls such that the state is accessed by only one thread at a time. If there's a specific order in which the event handlers are to be invoked, set up the memory barrier. The latest version of the Disruptor has a DSL that lets you set up the order easily.
The cache can be abstracted and accessed through the Disruptor. At any time, only one reader or writer would get access to the cache, since all calls to the cache are sequential.
Let's say I'm running a server, and set client SocketChannels that I accept as non blocking, and read them through a thread pool's threads. But what does that buy me? I anyway need to read the full client request before processing it, which means I need to make multiple read calls.
I've also come across articles saying that threads should block naturally so it gives a chance to other threads to run. However this won't happen in the aforementioned case as these threads will not block.
So how would non blocking IO be efficient? How to make sense of this all? Some multi-core CPU angle to it perhaps? But how?
EDIT: found a pretty good link that explains it programmatically:
http://rox-xmlrpc.sourceforge.net/niotut/
The problem with blocking IO starts when you want to scale your server program. You'd have to hold a blocking thread per request. Many, many requests will introduce many, many threads. This can be hard on a server application that serves thousands or more concurrent requests involving IO.
With NIO non-blocking IO, this request-to-thread coupling is no longer needed. You can use any thread to complete the IO operation of any request. This lets you use the pooling pattern for your IO-handling threads and significantly decreases the thread creation and management overhead. On the other hand, you have to work harder to maintain data consistency, but that is the price of scalability.
Unless you want to use busy waiting (which sounds unlikely), with non-blocking IO you usually use a small number of threads (maybe only one) and a Selector.
If you are going to use blocking IO, that is when you dedicate one or two threads per connection.
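A minimal sketch of the Selector approach: one thread sleeps in select() and serves every connection that becomes ready. SelectorEchoServer and its roundTrip demo helper are hypothetical, the server simply echoes bytes back, and error handling is reduced to stopping the loop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    // One thread for all connections: it only wakes up when some channel is ready.
    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select();
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) { client.close(); continue; }
                        buffer.flip();
                        // Simplification: fine for small replies; a real server would register OP_WRITE.
                        while (buffer.hasRemaining()) client.write(buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException | CancelledKeyException e) {
            // Selector closed or I/O failure: stop the loop.
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }

    // Demo helper: starts the loop, sends one message with a blocking client, returns the echo.
    static String roundTrip(String message) {
        try (SelectorEchoServer echo = new SelectorEchoServer()) {
            Thread loop = new Thread(echo, "selector-loop");
            loop.setDaemon(true);
            loop.start();
            try (SocketChannel client =
                         SocketChannel.open(new InetSocketAddress("127.0.0.1", echo.port()))) {
                byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                client.write(ByteBuffer.wrap(bytes));
                ByteBuffer reply = ByteBuffer.allocate(bytes.length);
                while (reply.hasRemaining() && client.read(reply) >= 0) { }
                return new String(reply.array(), 0, reply.position(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A request that arrives in several pieces is handled by accumulating bytes per connection between wake-ups; the thread never waits on one slow client while others have data ready.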
I have multiple threads each one with its own private concurrent queue and all they do is run an infinite loop retrieving messages from it. It could happen that one of the queues doesn't receive messages for a period of time (maybe a couple seconds), and also they could come in big bursts and fast processing is necessary.
I would like to know what would be the most appropriate to do in the first case: use a blocking queue and block the thread until I have more input or do a Thread.yield()?
I want to have as many CPU resources available as possible at any given time, as the number of concurrent threads may increase with time, but I also don't want the message processing to fall behind, as there is no guarantee of when the thread will be rescheduled for execution after a yield(). I know that hardware, operating system and other factors play an important role here, but setting that aside and looking at it from a Java (JVM?) point of view, what would be optimal?
Always just block on the queues. Java yields in the queues internally.
In other words: You cannot get any performance benefit in the other threads if you yield in one of them rather than just block.
You certainly want to use a blocking queue - they are designed for exactly this purpose (you want your threads to not use CPU time when there is no work to do).
Thread.yield() is an extremely temperamental beast - the scheduler plays a large role in exactly what it does; and one simple but valid implementation is to simply do nothing.
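A minimal sketch of the blocking-queue variant: the worker parks inside take() while its queue is empty and wakes up as soon as a message arrives. QueueWorker and the STOP poison-pill value are hypothetical, and the list stands in for real message processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueWorker implements Runnable {
    static final String STOP = "__stop__"; // hypothetical poison pill that ends the loop
    final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    final List<String> processed = new ArrayList<>(); // written only by the worker thread

    @Override
    public void run() {
        try {
            while (true) {
                // Parks the thread while the queue is empty, so an idle worker uses no CPU.
                String message = inbox.take();
                if (STOP.equals(message)) {
                    return;
                }
                processed.add(message); // stand-in for the real message handling
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }
}
```

During a burst the loop runs back to back without parking, because take() returns immediately whenever the queue is non-empty.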
Alternatively, consider converting your implementation to use one of the managed ExecutorService implementations - probably ThreadPoolExecutor.
This may not be appropriate for your use case, but if it is, it removes the whole burden of worrying about thread management from your own code - and these questions about yielding or not simply vanish.
In addition, if better thread management algorithms emerge in future - for example, something akin to Apple's Grand Central Dispatch - you may be able to convert your application to use it with almost no effort.
Another thing you could do is use a concurrent hash map for your queue. When you do a read, it gives you a reference to the object you were looking for, so it is possible you may miss a message that was just put into the queue. But if all this is doing is listening for messages, you will catch it on the next iteration. It would be different if the messages could be updated by other threads. But there doesn't really seem to be a reason to block that I can see.