Implementing event-driven lightweight threads - java

Inspired by libraries like Akka and Quasar, I started wondering how these actually work "under the hood". I'm aware that it is most likely very complex and that they all work quite differently from each other.
I would still like to learn how I would go about implementing a very basic version of my own "event-driven lightweight threads" using Java 8.
I'm quite familiar with Akka as a library, and I have an intermediate understanding about concurrency on the JVM.
Could anyone point me to some literature covering this, or try to describe the concepts involved?

In Akka it works like this:
An actor is a class that bundles a mailbox with the behavior to handle messages
When some code calls ActorRef.tell(msg), the msg is put into the mailbox of the referenced actor (though, this wouldn't be enough to run anything)
A task is queued on the dispatcher (a thread pool basically) to handle messages in the mailbox
When another message comes in and the mailbox is already queued, it doesn't need to be scheduled again
When the dispatcher is executing the task to handle the mailbox, the actor is called to handle one message after the other
Up to akka.actor.throughput messages from this mailbox are handled by this one task in one go. If the mailbox still has messages afterwards, another task is scheduled on the dispatcher to handle the remaining messages. Afterwards the task exits. This ensures fairness, i.e. that the thread this mailbox runs on isn't blocked indefinitely by one actor.
So, there are basically two work queues:
The mailbox of an actor. These messages need to be processed sequentially to ensure the contract of actors.
The queue of the dispatcher. All of the tasks in here can be processed concurrently.
The hardest part of writing this efficiently is the thread pool. In the thread pool a bunch of worker threads need to access their task queue in an efficient way. By default, Akka uses JDK's ForkJoinPool under-the-hood which is a very sophisticated work-stealing thread pool implementation.
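The two-queue design described above can be sketched in plain Java 8. This is only a minimal illustration under stated assumptions, not Akka's actual implementation: `MiniActor`, its fields, and the `throughput` constructor parameter are hypothetical names, and any `java.util.concurrent.Executor` stands in for the dispatcher.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch: one mailbox per actor, one shared executor as "dispatcher".
final class MiniActor<T> implements Runnable {
    private final Queue<T> mailbox = new ConcurrentLinkedQueue<>();
    // True while a mailbox-processing task is queued or running on the dispatcher.
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor dispatcher;
    private final int throughput;
    private final Consumer<T> behavior;

    MiniActor(Executor dispatcher, int throughput, Consumer<T> behavior) {
        this.dispatcher = dispatcher;
        this.throughput = throughput;
        this.behavior = behavior;
    }

    // Enqueue the message, then make sure a task exists to drain the mailbox.
    void tell(T msg) {
        mailbox.add(msg);
        trySchedule();
    }

    // Queue at most one task per actor; a second tell() while scheduled is a no-op here.
    private void trySchedule() {
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            dispatcher.execute(this);
        }
    }

    // Handle up to `throughput` messages sequentially, then give the thread back.
    @Override
    public void run() {
        try {
            for (int i = 0; i < throughput; i++) {
                T msg = mailbox.poll();
                if (msg == null) {
                    break;
                }
                behavior.accept(msg);
            }
        } finally {
            scheduled.set(false);
            trySchedule(); // re-queue if messages remain or arrived in the meantime
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(3);
        MiniActor<String> greeter = new MiniActor<>(pool, 2, name -> {
            System.out.println("hello " + name);
            done.countDown();
        });
        greeter.tell("a");
        greeter.tell("b");
        greeter.tell("c");
        done.await();
        pool.shutdown();
    }
}
```

The `scheduled` flag is what keeps message handling sequential per actor: only one task per mailbox can be queued or running at a time, while tasks of different actors run concurrently on the pool.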

Could anyone point me to some literature covering this,
I am the architect for Chronicle Queue and you can read how it is used and works here on my blog https://vanilla-java.github.io/tag/Microservices/
try to describe the concepts involved?
You have;
above all, make your threads faster and lightweight by doing less work.
try to deal with each event as quickly as possible to keep latency low.
batch when necessary but keep it to a minimum. Batching adds latency but can help improve maximum throughput.
identify the critical path. Keep this as short as possible, moving anything blocking or long running to asynchronous threads/processes.
keep hops to a minimum, whether between threads, processes or machines.
keep allocation rates down to improve throughput between GCs, and reduce the impact of GCs.
For some of the systems I work on you can achieve latencies of 30 microseconds in Java (network packet in to network packet out).

In Akka,
1. The actor system allocates threads from the thread pool to actors that have messages to process.
2. When an actor has no messages to process, its thread is released and allocated to other actors that have messages to process.
This way, asynchronous actor systems can handle many more concurrent requests with the same amount of resources, since the limited number of threads (the thread pool) never sit idle while waiting for I/O operations to complete.
For more information you can download & check this e-book https://info.lightbend.com/COLL-20XX-Designing-Reactive-Systems_RES-LP.html?lst=BL&_ga=1.214533079.1169348714.1482593952

Related

Message latencies with CPU under-utilization

We've got a Java app where we basically use three dispatcher pools to handle processing tasks:
Convert incoming messages (from RabbitMQ queues) into another format
Serialize messages
Push serialized messages to another RabbitMQ server
What we don't know how to start fixing is the latency at the first step. When we measure the time between "tell" and the start of the conversion in an actor, there is (not always, but too often) a delay of up to 500ms. Especially strange is that the CPUs are heavily under-utilized (10-15%) and the mailboxes are pretty much empty all of the time, with no large backlog of messages waiting to be processed. Our understanding was that Akka would typically utilize the CPUs much better than that?
The conversion is non-blocking and does not require I/O. There are approx. 200 actors running on that dispatcher, which is configured with throughput 2 and has 8 threads.
The system itself has 16 CPUs with around 400+ threads running, most of them passive, of course.
Interestingly enough, the other steps do not see such delays, but that can probably be explained by the fact that the first step already "spreads" the messages so that the other steps/pools can easily digest them.
Does anyone have an idea what could cause such latencies and CPU under-utilization, and how you normally go about improving things there?

Threads in a distributed system in Java

I am trying to see whether there is an existing implementation for "distributed threads" in Java.
These days almost everything has moved to the cloud. When I have a queue full of messages, I can use a simple ThreadPoolExecutor and spawn several threads to take over. Of course, all of them belong to the same VM (virtual machine).
What about when I have a system with 3 VMs? Is there any framework that supports such scaling without caring where the threads belong?
Let's say something like a distributed ThreadPoolExecutor, so that the threads might belong to multiple VMs?
You can set up a messaging queue and a simple (scalable) application that listens to that queue. You can then monitor the queue and scale up if things get busy.

Multithreading with websockets

This is more a design question. I have the following implementation
Multiple Client connections -----> Server ------> Corresponding DB conns
The client/server communication is done using web sockets. It's a single-threaded application currently. Evidently, this design does not scale, as the load on the server is too high and responses back to the clients take too long.
Back end operations involve handling large amounts of data.
My question: is it a good idea to create a new thread for every web socket connection? This would imply 500 threads for 500 clients (the number of web sockets would be the same whether it's multi-threaded or single-threaded). This would ease the load on the server and hence would make life a lot easier.
or
Is there a better logic to attain scalability? One of them could be create threads on the merit of the job and get the rest processed by the main thread. This somehow seems to be going back to the same problem again in the future.
Any help here would be greatly appreciated.
There are two approaches to this kind of problem
one thread per request
a fixed number of threads to manage all requests
Actually, you are using the second approach, but with only 1 thread.
You can improve it by using a pool of threads to handle your requests instead of only one.
The number of threads to use for the second approach depends on your application. If you have heavy CPU use combined with a certain number of long I/O operations (reads or writes to disk or network), you can increase this number.
If you have no I/O operations, the number of threads should be close to the number of CPU cores.
Note: existing web servers use these two approaches for HTTP requests. Just as an example, Apache uses the first (one thread per request) and Node.js uses the second (it is event driven).
In any case, use a timeout mechanism to unblock very long requests before the server crashes.
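The fixed-pool approach with a timeout described above could look roughly like the sketch below. The class `RequestPool`, the method names, and the I/O multiplier of 4 are all assumptions made for illustration; the right pool size depends on the application and should be measured.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: a fixed pool shared by all client requests.
final class RequestPool {
    // CPU-bound work: about one thread per core. I/O-heavy work: oversubscribe.
    static int poolSize(int cores, boolean ioHeavy, int ioMultiplier) {
        return ioHeavy ? cores * ioMultiplier : cores;
    }

    static ExecutorService create(boolean ioHeavy) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(poolSize(cores, ioHeavy, 4)); // 4 is an assumed multiplier
    }

    // Run one request on the pool; give up and return `fallback` after the timeout.
    static <T> T handle(ExecutorService pool, Callable<T> request, long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(request);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread becomes free again
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = create(true);
        System.out.println(handle(pool, () -> "done", 1000, "timed out"));
        pool.shutdown();
    }
}
```

Note that `future.cancel(true)` only interrupts the worker; a request that ignores interruption keeps occupying its thread.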
You can have a look at two very good scalable web servers, Apache and Node.js.
Apache, when operating in multi-threaded (worker) mode, will create new threads for new connections (note that requests from the same browser are served from the same thread, via keep-alive).
Node.js is vastly different, and uses an asynchronous workflow by delegating tasks.
Consequently, Apache scales very well for computationally intensive tasks, while Node.js scales well for a huge number of small, event-based requests.
You mention that you do some heavy tasks on the backend. This means that you should create multiple threads. How? Create a thread queue, with a MAX_THREADS limit, and a MAX_THREADS_PER_CLIENT limit, serving repeated requests by a client using the same thread. Your main thread must only spawn new threads.
If you can, you can incorporate some good Node.js features as well. If some task on the thread is taking too long, kill that thread with a callback for the task to create a new one when the job is done. You can do a benchmark to even train a NN to find out when to do this!
Have a blast!

Java: How to build a scalable Job processing mechanism

I need to build a job processing module wherein the incoming rate of jobs is in the order of millions. I have a multiprocessor machine to run these jobs on. In my current solution for Java, I use Java's ThreadPoolExecutor framework to create a job queue, a LinkedBlockingQueue, and the number of threads equals the number of available processors on the system. This design is not able to sustain the incoming rate: the job queue keeps growing and within seconds it reports an overflow, even though the CPU utilization is not maxed out. The CPU utilization remains somewhere in the range of 30-40 percent.
It means that most of the time is lost to thread contention while the other CPUs remain idle. Is there any better way of processing the jobs so that the CPUs are utilized better and the job queue does not overflow?
I suggest you look at Disruptor first. This provides a high performance in memory ring buffer. This works best if you can slow the producer(s) if the consumers cannot keep up.
If you need a persisted or unbounded queue, I suggest using Chronicle (which I wrote). This has the advantage that the producer is not slowed by the consumer (and the queue is entirely off heap).
Both of these are designed to handle millions of messages per second.
You could use a queuing system like RabbitMQ to hold messages for processing. If you combine this with Spring AMQP you can have easy (one line of config) multi-threading, and the messages would be stored on disk until they are ready to be processed by your application.
Your analysis is probably wrong. If the CPU was busy switching jobs, then the CPU utilization would be 100% - anything that the CPU does for a process counts.
My guess is that you have I/O where you could run more jobs. Try to run 4 or 8 times as many threads as you have CPU cores.
If that turns out to be too slow, use a framework like Akka which can process 10 million messages in 23 seconds without any special tuning.
If that's not enough, then look at Disruptor.
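Two of the suggestions in this thread, running several threads per core for I/O-bound jobs and slowing the producer when consumers cannot keep up, can be combined using only the JDK. The following is a minimal sketch, assuming a hypothetical `BoundedJobPool` class; the thread multiplier and queue capacity are illustrative values, not recommendations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: bounded job queue with caller-runs backpressure.
final class BoundedJobPool {
    static ThreadPoolExecutor create(int threadsPerCore, int queueCapacity) {
        int threads = Runtime.getRuntime().availableProcessors() * threadsPerCore;
        return new ThreadPoolExecutor(
                threads, threads,                       // fixed-size pool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded, so it cannot grow without limit
                // When the queue is full, the submitting thread runs the job itself,
                // which naturally slows the producer down.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 1024);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 100_000; i++) {
            pool.execute(processed::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("processed " + processed.get() + " jobs");
    }
}
```

With `CallerRunsPolicy` no job is dropped or rejected; the trade-off is that the producer thread spends time executing jobs whenever the queue is full.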
Magical libraries are very tempting, but they often lead you in the wrong direction and make your solution more complex day by day. The Disruptor people at LMAX say this too :) I think you should take a step back and understand the root cause of the job queue depth. In your case it looks to me like all consumers are of the same type, so I don't think Disruptor is going to help.
You mentioned thread contention.
I would suggest first trying to see if you can reduce the contention. I'm not sure if all your jobs are related, but if not, maybe you can use some partitioning technique for the queue to reduce contention between unrelated jobs. Then you need to know why your consumers are slow. Can you improve your locking strategy by using read-write locks or non-blocking collections in the consumers?

A question about Thread and Process

I read a tutorial about threads and processes. It said that processes are scheduled by the operating system kernel, and that threads can be managed and scheduled in user mode.
I do not understand the statement "threads can be managed and scheduled in a user mode".
For example, the producer and consumer problem: is this an example of "scheduled in a user mode"? Or can anyone explain it to me?
Not sure what tutorial you're looking at, but there are two ways that threads can be scheduled.
The first is user-mode scheduling, which basically means that one process, using green threads or perhaps fibers, schedules different threads to run without involving the operating system in its decision. This can be more portable across operating systems, but usually doesn't allow you to take advantage of multiple processors.
The second is kernel scheduling, which means that the various threads are visible to the kernel and are scheduled by it, possibly simultaneously on different processors. This can make thread creation and scheduling more expensive, however.
So it doesn't really depend on the problem that you are trying to solve. User-mode just means that the scheduling of threads happens without involving the operating system. Some early Java versions used Green/user-mode threads, but I believe most now use native/kernel threads.
EDIT:
Coding Horror has a nice overview of the difference between user and kernel mode.
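To make the idea of user-mode scheduling more concrete, here is a toy sketch in which the program itself, not the kernel, decides which task runs next. It is a deliberate simplification and not how green threads or fibers are really implemented: `CooperativeScheduler` and `counter` are hypothetical names, each task is modelled as a step function that returns true while it has more work, and everything runs on a single OS thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: round-robin scheduling of cooperative tasks in user code.
final class CooperativeScheduler {
    private final Deque<BooleanSupplier> runQueue = new ArrayDeque<>();

    // Register a task; each call to getAsBoolean() runs one step of it.
    void spawn(BooleanSupplier task) {
        runQueue.addLast(task);
    }

    // Run one step of each task in turn until every task reports it is finished.
    void runAll() {
        while (!runQueue.isEmpty()) {
            BooleanSupplier task = runQueue.pollFirst();
            boolean hasMoreWork = task.getAsBoolean();
            if (hasMoreWork) {
                runQueue.addLast(task); // the task "yields" and waits for its next turn
            }
        }
    }

    // Example task: records `steps` entries in the log, one per scheduling turn.
    static BooleanSupplier counter(String name, int steps, List<String> log) {
        int[] next = {0};
        return () -> {
            log.add(name + next[0]);
            next[0]++;
            return next[0] < steps;
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CooperativeScheduler scheduler = new CooperativeScheduler();
        scheduler.spawn(counter("A", 2, log));
        scheduler.spawn(counter("B", 2, log));
        scheduler.runAll();
        System.out.println(log); // prints [A0, B0, A1, B1]
    }
}
```

Because the tasks share one OS thread, a task that never returns from its step starves all the others, and no second processor is ever used, which matches the limitation mentioned above.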
Get a better tutorial? The official Java tutorials are quite good, and contain a lesson on concurrency, that also defines what process and thread mean.
PS: Whether threads are managed/scheduled in user mode is an implementation detail of the Java Virtual Machine that generally need not concern the application programmer.
Scheduled in user mode means you have control over the threads of your software, but they are managed by the operating system kernel. So yes, the producer-consumer problem is an example you normally handle yourself (but it is not directly related to user-mode scheduling) by having two threads, a producer thread and a consumer thread. Both threads access the same shared resource. This resource has to be thread-safe, which means you have to make sure the shared resource does not get corrupted because both threads access it at the same time. Thread safety can be guaranteed either by using thread-safe data types or by manually locking or synchronizing your resource.
However, even if you have some control over your threads, e.g. starting threads, stopping threads, making threads sleep etc., you do not have full control. The operating system is still managing which threads are allowed CPU time etc.
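As an illustration of the producer-consumer setup described above, the sketch below uses a thread-safe data type from the JDK, `ArrayBlockingQueue`, as the shared resource. The class name, the queue capacity of 8, and the `POISON` end marker are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one producer thread, one consumer thread, one shared queue.
final class ProducerConsumerDemo {
    private static final int POISON = -1; // assumed end-of-stream marker

    static List<Integer> run(int items) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(8); // thread-safe shared resource
        List<Integer> consumed = new ArrayList<>();                 // touched only by the consumer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty
                for (int v = queue.take(); v != POISON; v = queue.take()) {
                    consumed.add(v);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // prints [0, 1, 2, 3, 4]
    }
}
```

Which of the two threads runs at any given moment is still decided by the operating system; the program only controls when a thread has to wait on the queue.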
