I've searched the site a bit for help understanding this, but haven't found anything super clear, so I thought I'd post my use case and see if anybody could shed some light.
I have a question about the scaling of jvm threads vs os threads when used in akka for io operations. From the akka site:
Akka supports dispatchers for both event-driven lightweight threads, allowing creation of millions threads on a single workstation, and thread-based Actors, where each dispatcher is bound to a dedicated OS thread.
The event-based Actors currently consume ~600 bytes per Actor which means that you can create more than 6.5 million Actors on 4 G RAM.
In this context, can you all help me understand how that matters on a workstation with only 1 processor (for simplicity). So, for my example use case, I want to take a list of say 1000 'Users' and then go query a database (or several) for various information about each user. So if I were to dispatch each of these 'get' tasks to an actor, and that actor is going to do IO, wouldn't that actor block based on the os thread limit for the workstation?
How does the akka actor model give me lift in a scenario like this? I know that I am probably missing something as I am not wildly knowledgeable on the interworkings of vm threads vs os threads, so if one of the smart folks here could spell it out for me, that would be great.
If I use Futures, don't I need to use await() or get() to block and wait for the reply?
In my use case, regardless of actors, would it end up just 'feeling' like I'm making 1000 sequential database requests?
If code snips are useful in helping me understand this, Java would be preferred as I am still coming up to speed on scala syntax - but a nice clear textual explanation of how these millions of threads can interoperate on a single processor machine while doing database IO would be fine too.
It is really hard to figure out what you are actually asking here, but here are some pointers:
If you are running on a modern JVM, there is typically a one-to-one relationship between Java threads and OS threads. (IIRC, Solaris allows you to do this differently ... but that's the exception.)
The amount of real parallelism you will get using threads, or anything built on top of threads is limited by the number of processors / cores that are available to the application. Beyond that, you will find that not all threads are actually executing at any given instant.
If you have 1000 Actors all trying to access the database "at the same time", then most of them will actually be waiting on the database itself, or on the thread scheduler. Whether this amounts to making 1000 sequential requests (i.e. strict serialization) will depend on the database and the queries / updates that the actors are doing.
The bottom line is that a computer system has hard limits on the resources available for doing stuff; e.g. number of processors, speed of processors, memory bandwidth, disc access times, network bandwidth, etc. You can design an application to be smart about the way it uses available resources, but you can't get it to use more resources than there actually are.
On reading the text that you quoted, it seems to me that it is talking about two different kinds of actors:
Thread-based actors have a 1 to 1 relationship with threads. There's no way you could have millions of this kind of actor in 4Gb memory.
Event-based actors work differently. Instead of having a thread at all times, they would mostly be sitting in a queue waiting for an event to happen. When that happened, an event processing thread would grab the actor from the queue and execute the "action" associated with the event. When the action finished, the thread moves onto another actor / event pair.
The quoted text is saying that the memory overhead of an event-based actor is ~600 bytes. They don't include the event thread ... because the event thread is shared by multiple actors.
Now I'm not an expert on Scala / Actors, but it is pretty obvious that there are certain things that you should avoid when using event-based actors. For instance, you should probably avoid talking directly to an external database because that is liable to block the event processing thread.
I think there may be a typo there. I think they meant to say:
Akka supports dispatchers for both event-driven lightweight actors,
allowing creation of millions actors on a single workstation, and thread-based Actors, where each actor is bound to a dedicated OS thread.
The event-driven actors use a thread pool - all of the (potentially millions of) actors share the same pool of threads. I'm not that familiar with Akka actors but generally you would not want to do blocking I/O with event-driven actors, otherwise you could cause starvation.
Related
I am in the process of designing a system where there's a main stream of objects and there are multiple workers which produces some result from that object. Finally, there is some special/unique worker (sort of a "sink", in terms of graph theory) which takes all the results, and process them to some final object which is written to some DB.
It is possible for a worker to be dependent on the result of some other workers (hence, waiting for their results)
Now, I'm facing several problems:
It could be that one worker is much slower than another. How do you deal with that? Adding more workers (= scaling) of the slower type? (maybe dynamically)
Suppose W_B is dependent on W_A. If W_B is down for some reason then the flow will stop and the system will stop working. So I'd like the system to bypass this worker, somehow.
Moreover, how do the final worker decide when to operate on the set of results? Suppose it has the results of A and B but lacking the result of C. It may be that C is down or it's just very slow at the moment. How can it make a decision?
It is worth mentioning that it's not a realtime application but rather an offline processing system (i.e. you may access the DB and alter a record), but at the same time, it has to deal with relatively large amount of objects in an "high pace".
Regarding technologies,
I'm developing the system with Java but I'm not bounded to a specific technology.
I'd be glad if you could help me with the general design of the system.
Thanks a lot!
As Peter said, it really depends on the use case. Some general remarks though:
If a worker is slower than the other, maybe create more instances of that type; eg Kubernetes allows dynamic Node creation, and Kafka allows to partition a topic so more than one instance can read off and process it.
If B depends on A and A is down, B can't work and that's it. Maybe restart A? Maybe you can do a regular health check on it.
If the final worker needs the results of A, B and C, how would it process without C being available? If it can, it can store the results of A and B, install a timer, and if that goes off without C having arrived, continue.
Some additional thoughts:
If you mean to say that some subtasks of the overall application are quicker to execute than others, then it can be a good idea to slice up the application so that each worker is doing a bit of everything -- in other words, a share of the quick work and a share of the slow work. But if you mean to say that some machines are slower than others, then you could run fewer workers on the slow machines, and more on the faster ones, so as to balance things so that each worker has roughly the same resources.
You might want to decouple your architecture with some sort of durable queueing between the workers.
It's common to use heartbeats with timeouts and restarts.
Distributed stream processing quickly becomes very complex. Your life will be much easier if you build on top a stream processing framework that provides high availability and exactly-once semantics out of the box.
This question already has answers here:
How does Actors work compared to threads?
(2 answers)
Closed 5 years ago.
I'm trying to understand the difference in definition of Actors and threads. Some articles say that actors are lightweight threads, while other articles states that actors are not threads. I also understand that a thread can run multiple actors. I'm confused as to how to exactly differentiate actors from threads. Please help me understand thanks!
Actors are at a higher level of abstraction when compared to Threads.
Thread is a JVM concept, whereas an Actor is a normal java class that runs in the JVM and hence the question is not so much about Actor vs Thread, its more about how Actor uses Threads.
What is an Actor?
At a very simple level, an Actor is an entity that receives messages, one at a time, and reacts to those messages.
How does Actor use Threads?
When Actor receives a message, it performs some action in response. How does the action code run in the JVM? Again, if you simplify the situation, you could imagine the Action executing the action task on the current thread. Also, it is possible that the Actor decides to perform the action task on a thread pool. It does not really matter as long as the Actor makes sure that only one message is processed at a time.
I'm trying to understand the difference in definition of Actors and threads.
Honestly, they really don't have much to do with each other, so it is kinda hard to talk about their difference. It's like talking about the difference between a Toyota Camry and the color blue.
An Actor is a self-contained, encapsulated entity that communicates with other self-contained, encapsulated entities purely by sending messages.
Now, if that sounds exactly like the definition of Object to you, you are right! There is a deep connection between Actors and Objects: the Actor Model of Computation was conceived by Carl Hewitt, partially based on the Message-Directed Execution of early Smalltalks (Smalltalk-71, Smalltalk-72). Alan Kay, in turn, cites PLANNER' Goal-Directed Execution as a heavy influence on Smalltalk's Message-Directed Execution … PLANNER in turn had been designed by Carl Hewitt. (PLANNER is also the precursor to Prolog, which in turn was the precursor to Erlang; Erlang is based on Prolog, started out as a library in Prolog, and the first implementations were written in Prolog.) Also, both Carl Hewitt and Alan Kay cite the early work of Vint Cerf and Bob Kahn on what would later become the Internet as inspiration, and both of them were heavily inspired by nature, Carl Hewitt by physics and Alan Kay by microbiology. So, the deep connections and similarities are very much not surprising.
Actors and Objects are pretty much the same thing. We usually expect Message sends in Object-Oriented Systems to be synchronous, instantaneous, reliable, and ordered, while no such guarantees exist for Actors. That is the main difference between Actors and Objects: Actors are Objects where Messages may get lost, re-ordered, take a long time, and are asynchronous. (The only guarantee is that a Message may get delivered At Most Once.) Also, the Receiver of a Message typically has no implicit reference to the Sender, so if the Receiver wants to reply to the Message, the Sender needs to pass its own reference along with the Message (unlike typical Object-Oriented Systems, where I can always return something to the Sender).
An Actor can do three things, and exactly those three things and nothing more:
Create new Actors
Send Messages to Addresses of Actors it already knows (i.e. Addresses of Actors it created, and Addresses it got sent in a Message)
Designate how to handle the next Message (this is essentially how to model mutable state)
Note that in order to send a Message to an Actor, you need to have an Address for the Actor. You can only send Messages to Addresses. An Actor may have multiple Addresses, and multiple Actors may be hidden behind a single Address. Addresses are unforgeable, you cannot construct one yourself, you have to have it handed to you. (This is an important difference to pointers in C, and more like object references in Java.)
There is a very nice video of a whiteboarding session (mostly) between Carl Hewitt himself and Erik Meijer about Actors on Microsoft's Channel9 community website: Hewitt, Meijer and Szyperski: The Actor Model (everything you wanted to know, but were afraid to ask). It explains among other things the fundamental ideas behind Actors as well as the fundamental Axioms of Actors in an accessible way.
Carl Hewitt was also invited to give a lecture at Stanford's EE380 Computer Systems Colloquium, where he talked about How to Program the Many Cores For Inconsistency Robustness, including an introduction to Actors.
Threads, OTOH are just strands of execution. They are essentially nothing but an instruction pointer and a call stack. In particular, Threads don't have memory (of their own). All Threads of a Process share the same memory. Which means among other things, that a single misbehaving Thread can crash all Threads of the Process, and that all access to this shared memory must be carefully controlled.
Some articles say that actors are lightweight threads, while other articles states that actors are not threads. I also understand that a thread can run multiple actors.
You are confusing the concept of an Actor with a possible implementation.
A very simple implementation of Actors in an existing environment would be to make every Actor a Process. That's how BEAM (the dominant implementation of Erlang) works, for example. This works, because on BEAM, Processes are extremely cheap and lightweight; a process only weighs about 300 Byte, and there can be tens of millions of Processes on a cheap 32 Bit single-core laptop from 10 years ago. This would not work with Windows Processes, for example, which are much more expensive.
There is also an implementation of Erlang on the JVM (called Erjang) which uses Kilim, an implementation of extremely lightweight Actors on the JVM using bytecode re-writing. (Note that both Erjang and Kilim appear to be dead.)
If you want to implement Actors on Threads, you have the problem that Actors are completely isolated, whereas Threads are completely shared. So, you need to provide this isolation somehow.
A common way to implement Actors on systems such as the JVM, is to implement Actors as Objects, and then schedule them to run on a Thread Pool to get the asynchronous properties. That's how Akka works.
I highly recommend to read through official documentation first:
https://doc.akka.io/docs/akka/current/scala/general/actors.html
Behind the scenes Akka will run sets of actors on sets of real threads, where typically many actors share one thread, and subsequent invocations of one actor may end up being processed on different threads. Akka ensures that this implementation detail does not affect the single-threadedness of handling the actor’s state.
A thread is a "sequence of activity" - independent of specific objects. It can execute code that belongs to arbitrary objects.
Whereas an actor is about a specific object that "owns" its own "sequence of activity". But please note that this actor sequence isn't necessarily a thread. To the contrary!
Looking at this from the implementation side, one could use one thread to drive many actor objects.
I need to use memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was a question here about dealing with Java Futures in Scala here How do I wrap a java.util.concurrent.Future in an Akka Future?. However in my case I have two options:
Using synchronous API and wrapping blocking code in future and mark blocking:
Future {
blocking {
cache.get(key) //synchronous blocking call
}
}
Using asynchronous Java API and do polling every n ms on Java Future to check if the future completed (like described in one of the answers above in the linked question above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't blocking { } block prevent from blocking the whole pool?
I always go with the first option. But i am doing it in a slightly different way. I don't use the blocking feature. (Actually i have not thought about it yet.) Instead i am providing a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// i create a separate ec for each blocking client/resource/api i use
Future {
cache.get(key) //synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls will use a dedicated execution context (= Threadpool). So it is separated from your main execution context responsible for non blocking stuff.
This approach is also explained in a online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope i spelled it correctly), who is a well known author for Scala books.
Update: I had a discussion with Nilanjan on twitter. He explained what the difference between the approach with blocking and a custom ExecutionContext is. The blocking feature just creates a special ExecutionContext. It provides a naive approach to the question how many threads you will need. It spawns a new thread every time, when all the other existing threads in the pool are busy. So it is actually an uncontrolled ExecutionContext. It could create lots of threads and lead to problems like an out of memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the “blocking
problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either
dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded
number of tasks of this nature will exhaust your memory or thread
limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the
hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they
occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.
I recently read about Quasar which provides "lightweight" / Go-like "user mode" threads to the JVM (it also has an Erlang inspired Actor system like Akka but that's not the main question)
For example:
package jmodern;
import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.strands.Strand;
import co.paralleluniverse.strands.channels.Channel;
import co.paralleluniverse.strands.channels.Channels;
public class Main {
public static void main(String[] args) throws Exception {
final Channel<Integer> ch = Channels.newChannel(0);
new Fiber<Void>(() -> {
for (int i = 0; i < 10; i++) {
Strand.sleep(100);
ch.send(i);
}
ch.close();
}).start();
new Fiber<Void>(() -> {
Integer x;
while((x = ch.receive()) != null)
System.out.println("--> " + x);
}).start().join(); // join waits for this fiber to finish
}
}
As far as I understand the code above doesn't spawn any JVM / Kernel threads, all is done in user mode threads (or so they claim) which is supposed to be cheaper (just like Go co-routines if I understood correctly)
My question is this - as far as I understand, in Akka, everything is still based on JVM Threads which is most of the time maps to native OS kernel threads (e.g. pthreads in POSIX systems), e.g. to the best of my understanding there are no user-mode threads / go like co-routines / lightweight threads in Akka, did I understand correctly?
If so, then do you know if it's a design choice? or there is a plan for go-like lightweight threads in Akka in the future?
My understanding is that if you have a million Actors but most of them are blocking then Akka can handle it with much less physical threads, but if most of them are non blocking and you still need some responsiveness from the system (e.g. service million of small requests for streaming some data) then I can see the benefits of a user mode threading implementation, which can allow many more "threads" to be alive with a lower cost of creating switching and terminating (of course the only benefit is evenly dividing responsiveness for many clients, but responsiveness is still important)
Is my understanding correct more or less? please correct me if I'm wrong.
*I might be completely confusing user-mode threads with go/co-routines and lightweight threads, the question above relies on my poor understanding that they are all one of the same.
Akka is a very flexible library and it allows you to schedule actors using (essentially it boils down to that through a chain of traits) a simple trait ExecutionContext, which, as you can see, accepts Runnables and somehow executes them. So, as far as I can see, it is likely that it is possible to write a binding to something like Quasar and use it as a "backend" for Akka actors.
However, Quasar and similar libraries are likely to provide special utilities for communication between fibers. I also don't know how they would handle blocking tasks like I/O, probably they would have a mechanism for that too. I'm not sure if Akka will be able to run correctly over green threads because of this. Quasar also seems to rely on bytecode instrumentation, and this is a rather advanced technique which can have a lot of implications preventing it from backing Akka.
However, you shouldn't really worry about lightweightness of threads when using Akka actors. In fact, Akka is perfectly able to create millions of actors on the single system (see here), and all these actors will work just fine.
This is achieved via clever scheduling over special kinds of thread pools, like fork-join thread pool. This means that unless actors are blocked on some long-running computation they can run over a number of threads significantly less than the number of these actors. For example, you can create a thread pool which will use at most 8 threads (one for each core of 8-core processor), and all actors activities will be scheduled on these threads.
Akka flexibility allows you to configure exact dispatcher to use for specific actors, if it is needed. You can create dedicated dispatchers for actors which stay in long-running tasks, like database access. See here for more information.
So, in short, no, you don't need userland threads for actors, because actors don't map one-to-one to native threads (unless you force them to, that is, but this should be avoided at all costs).
Akka actors are essentially asynchronous and that's why you can have a lot of them, while Quasar actors (yes, Quasar offers an actors implementation too), that are very close to Erlang's, are synchronous or blocking but they use fibers rather than Java (i.e. at present OS) threads, so still you can have a lot of them with the same performance as async but a programming style just as straightforward as when using regular threads and blocking calls.
I/O is handled through integrations: Quasar includes a framework to convert both async and sync APIs into fiber-blocking and Comsat include many such integrationsalready (one with Java NIO is included in Quasar directly).
My reply to another question contains further info.
I have a problem which I believe is the classic master/worker pattern, and I'm seeking advice on implementation. Here's what I currently am thinking about the problem:
There's a global "queue" of some sort, and it is a central place where "the work to be done" is kept. Presumably this queue will be managed by a kind of "master" object. Threads will be spawned to go find work to do, and when they find work to do, they'll tell the master thing (whatever that is) to "add this to the queue of work to be done".
The master, perhaps on an interval, will spawn other threads that actually perform the work to be done. Once a thread completes its work, I'd like it to notify the master that the work is finished. Then, the master can remove this work from the queue.
I've done a fair amount of thread programming in Java in the past, but it's all been prior to JDK 1.5 and consequently I am not familiar with the appropriate new APIs for handling this case. I understand that JDK7 will have fork-join, and that that might be a solution for me, but I am not able to use an early-access product in this project.
The problems, as I see them, are:
1) how to have the "threads doing the work" communicate back to the master telling them that their work is complete and that the master can now remove the work from the queue
2) how to efficiently have the master guarantee that work is only ever scheduled once. For example, let's say this queue has a million items, and it wants to tell a worker to "go do these 100 things". What's the most efficient way of guaranteeing that when it schedules work to the next worker, it gets "the next 100 things" and not "the 100 things I've already scheduled"?
3) choosing an appropriate data structure for the queue. My thinking here is that the "threads finding work to do" could potentially find the same work to do more than once, and they'd send a message to the master saying "here's work", and the master would realize that the work has already been scheduled and consequently should ignore the message. I want to ensure that I choose the right data structure such that this computation is as cheap as possible.
Traditionally, I would have done this in a database, in sort of a finite-state-machine manner, working "tasks" through from start to complete. However, in this problem, I don't want to use a database because of the high volume and volatility of the queue. In addition, I'd like to keep this as light-weight as possible. I don't want to use any app server if that can be avoided.
It is quite likely that this problem I'm describing is a common problem with a well-known name and accepted set of solutions, but I, with my lowly non-CS degree, do not know what this is called (i.e. please be gentle).
Thanks for any and all pointers.
As far as I understand your requirements, you need ExecutorService. ExecutorService have
submit(Callable task)
method which return value is Future. Future is a blocking way to communicate back from worker to master. You could easily expand this mechanism to work is asynchronous manner. And yes, ExecutorService also maintaining work queue like ThreadPoolExecutor. So you don't need to bother about scheduling, in most cases. java.util.concurrent package already have efficient implementations of thread safe queue (ConcurrentLinked queue - nonblocking, and LinkedBlockedQueue - blocking).
Check out java.util.concurrent in the Java library.
Depending on your application it might be as simple as cobbling together some blocking queue and a ThreadPoolExecutor.
Also, the book Java Concurrency in Practice by Brian Goetz might be helpful.
First, why do you want to hold the items after a worker started doing them? Normally, you would have a queue of work and a worker takes items out of this queue. This would also solve the "how can I prevent workers from getting the same item"-problem.
To your questions:
1) how to have the "threads doing the
work" communicate back to the master
telling them that their work is
complete and that the master can now
remove the work from the queue
The master could listen to the workers using the listener/observer pattern
2) how to efficiently have the master
guarantee that work is only ever
scheduled once. For example, let's say
this queue has a million items, and it
wants to tell a worker to "go do these
100 things". What's the most efficient
way of guaranteeing that when it
schedules work to the next worker, it
gets "the next 100 things" and not
"the 100 things I've already
scheduled"?
See above. I would let the workers pull the items out of the queue.
3) choosing an appropriate data
structure for the queue. My thinking
here is that the "threads finding work
to do" could potentially find the same
work to do more than once, and they'd
send a message to the master saying
"here's work", and the master would
realize that the work has already been
scheduled and consequently should
ignore the message. I want to ensure
that I choose the right data structure
such that this computation is as cheap
as possible.
There are Implementations of a blocking queue since Java 5
Don't forget Jini and Javaspaces. What you're describing sounds very like the classic producer/consumer pattern that space-based architectures excel at.
A producer will write the jobs into the space. 1 or more consumers will take out jobs (under a transaction) and work on that in parallel, and then write the results back. Since it's under a transaction, if a problem occurs the job is made available again for another consumer .
You can scale this trivially by adding more consumers. This works especially well when the consumers are separate VMs and you scale across the network.
If you are open to the idea of Spring, then check out their Spring Integration project. It gives you all the queue/thread-pool boilerplate out of the box and leaves you to focus on the business logic. Configuration is kept to a minimum using #annotations.
btw, the Goetz is very good.
This doesn't sound like a master-worker problem, but a specialized client above a threadpool. Given that you have a lot of scavenging threads and not a lot of processing units, it may be worthwhile simply doing a scavaging pass and then a computing pass. By storing the work items in a Set, the uniqueness constraint will remove duplicates. The second pass can submit all of the work to an ExecutorService to perform the process in parallel.
A master-worker model generally assumes that the data provider has all of the work and supplies it to the master to manage. The master controls the work execution and deals with distributed computation, time-outs, failures, retries, etc. A fork-join abstraction is a recursive rather than iterative data provider. A map-reduce abstraction is a multi-step master-worker that is useful in certain scenarios.
A good example of master-worker is for trivially parallel problems, such as finding prime numbers. Another is a data load where each entry is independant (validate, transform, stage). The need to process a known working set, handle failures, etc. is what makes a master-worker model different than a thread-pool. This is why a master must be in control and pushes the work units out, whereas a threadpool allows workers to pull work from a shared queue.