I am trying to see whether there is an existing implementation for "distributed threads" in Java.
In our days almost everything is moved to cloud. So to say when I have a queue full o messages i can use a simple ThreadPoolExecutor and spawn various threads to take over. Off course all of them belong to the same VM (virtual machine).
What about when i have a system with 3 VMs ? Is there any framework that will support such a scaling without caring where the threads belong ?
Let's say something like a distributed ThreadPool executor so the treads might belong to multipe VMs ?
You can set up a messaging queue. and a simple (scalable) application that listens to that queue. you can then monitor the queue and scale up if things get busy.
Related
We can use nodejs cluster to run multiple processes...
While the equivalent in java is multi-thread...
I have a http listener running on nodejs (without clustering), and I'm using Java to call this nodejs http (using java.lang.Thread class)
If I have concurrently 300 request, will it create multiple instances of nodejs? Will nodejs be a bottle neck?
NodeJS is single-threaded. It means that whatever number of http calls you make, it will queue them and process them. You'll have a longer response time thought if you overload Node JS with hundreds on call in a few seconds.
See this guide about the event loop for further informations
Edit : I did not see the cluster part. It'll allow you to use multiple instances, hence using more cores in your processor and processing more actions at the same time. I would say that the best thing to do is to benchmark a lot of operations to see if it's enough to process hundreds of call in a few seconds
Even though NodeJS is single-threaded, asynchronous operations are run in separate threads thanks to its Event Loop architecture.
If I have concurrently 300 request, will it create multiple instances of nodejs?
No, unless you are running a node cluster, only a single Node proccess (and thread) will handle the requests.
Will nodejs be a bottle neck?
If most of your work is asynchronous, then it will be able to perform those tasks in parallel and shouldn't be a bottleneck. Also, you can scale the application by creating a node process for each available core in the CPU and/or by deploying the process in multiple computer instances.
However, it's important to note the distinctions between a Java multithread application and a Node cluster application (or multiproccess).
processes are typically independent, while threads exist as subsets of a process
processes carry considerably more state information than threads, whereas multiple
threads within a process share process state as well as memory and other resources
processes have separate address spaces, whereas threads share their address space
processes interact only through system-provided inter-process communication mechanisms
context switching between threads in the same process is typically faster than context switching between processes.
Therefore, if memory is scarce in your context, and if your instance has a multi-core processor, then NodeJS might indeed become a bottleneck.
I have a Java application named 'X'. In Windows environment, at a given point of time there might be more than one instance of the application.
I want a common piece of code to be executed sequentially in the Application 'X' no matter how many instances of the application are running. Is that something possible and can be achieved ? Any suggestions will help.
Example :- I have a class named Executor where a method execute() will be invoked. Assuming there might be two or more instances of the application at any given point of time, how can i have the method execute() run sequential from different instances ?
Is there something like a lock which can be accessed from two instances and see if the lock is currently active or not ? Any help ?
I think what you are looking for is a distributed lock (i.e. a lock which is visible and controllable from many processes). There are quite a few 3rd party libraries that have been developed with this in mind and some of them are discussed on this page.
Distributed Lock Service
There are also some other suggestions in this post which use a file on the underlying system as a synchornization mechanism.
Cross process synchronization in Java
To my knowledge, you cannot do this that easily. You could implement TCP calls between processes... but well I wouldn't advice it.
You should better create an external process in charge of executing the task and a request all the the tasks to execute by sending a message to a JMS queue that your executor process would consume.
...Or maybe you don't really need to have several processes running in the same time but what you might require is just an application that would have several threads performing things in the same time and having one thread dedicated to the Executor. That way, synchronizing the execute()method (or the whole Executor) would be enough and spare you some time.
You cannot achieve this with Executors or anything like that because Java virtual machines will be separate.
If you really need to synchronize between multiple independent instances, one of the approaches would be to dedicate internal port and implement a simple internal server within the application. Look into ServerSocket or RMI is full blown solution if you need extensive communications. First instance binds to the dedicated application port and becomes the master node. All later instances find the application port taken but then can use it to make HTTP (or just TCP/IP) call to the master node reporting about activities they need to do.
As you only need to execute some action sequentially, any slave node may ask master to do this rather than executing itself.
A potential problem with this approach is that if the user shuts down the master node, it may be complex to implement approach how another running node could take its place. If only one node is active at any time (receiving input from the user), it may take a role of the master node after discovering that the master is not responding and then the port is not occupied.
A distributed queue, could be used for this type of load-balancing. You put one or more 'request messages' into a queue, and the next available consumer application picks it up and processes it. Each such request message could describe your task to process.
This type of queue could be implemented as JMS queue (e.g. using ActiveMQ http://activemq.apache.org/), or on Windows there is also MSMQ: https://msdn.microsoft.com/en-us/library/ms711472(v=vs.85).aspx.
If performance is an issue and you can have C/C++ develepors, also the 'shared memory queue' could be interesting: shmemq API
Inspired by libraries like Akka and Quasar I started wondering how these actually work "under the hood". I'm aware that it is most likely very complex and that they all work quite different from each other.
I would still like to learn how I would go to implement a (at most) very basic version of my own "event-driven lightweight threads" using Java 8.
I'm quite familiar with Akka as a library, and I have an intermediate understanding about concurrency on the JVM.
Could anyone point me to some literature covering this, or try to describe the concepts involved?
In Akka it works like this:
An actor is a class that bundles a mailbox with the behavior to handle messages
When some code calls ActorRef.tell(msg), the msg is put into the mailbox of the referenced actor (though, this wouldn't be enough to run anything)
A task is queued on the dispatcher (a thread pool basically) to handle messages in the mailbox
When another message comes in and the mailbox is already queued, it doesn't need to be scheduled again
When the dispatcher is executing the task to handle the mailbox, the actor is called to handle one message after the other
Messages in this mailbox up to the count specified in akka.actor.throughput are handled by this one task in one go. If the mailbox still has messages afterwards, another task is scheduled on the dispatcher to handle the remaining messages. Afterwards the tasks exits. This ensures fairness, i.e. that the thread this mailbox is run on isn't indefinitely blocked by one actor.
So, there are basically two work queues:
The mailbox of an actor. These messages need to be processed sequentially to ensure the contract of actors.
The queue of the dispatcher. All of the tasks in here can be processed concurrently.
The hardest part of writing this efficiently is the thread pool. In the thread pool a bunch of worker threads need to access their task queue in an efficient way. By default, Akka uses JDK's ForkJoinPool under-the-hood which is a very sophisticated work-stealing thread pool implementation.
Could anyone point me to some literature covering this,
I am the architect for Chronicle Queue and you can read how it is used and works here on my blog https://vanilla-java.github.io/tag/Microservices/
try to describe the concepts involved?
You have;
above all, make your threads faster and light weight by doing less work.
try to deal with each event as quickly as possible to keep latency low.
batch when necessary but keep it to a minimum. Batching add latency but can help improve maximum throughput.
Identify the critical path. Keep this as short as possible, moving anything blocking or long running to asynchronous thread/processes.
keep hops to a minimum, either between threads, processes or machines.
keep allocation rates down to improve throughput between GCs, and reduce the impact of GCs.
For some of the systems I work on you can achieve latencies of 30 micro-seconds in Java (network packet in to network packet out)
In Akka,
1.Actor system allocates the threads from thread pool to actors that have messages to process.
2.When the actor has no messages to process,thread is released and allocated to other actors that have messages to process
This way asynchronous actor systems can handle many
more concurrent requests with the same amount of resources since
the limited number of threads(thread pool) never sit idle while waiting for I/O
operations to complete.
For more information you can download & check this e-book https://info.lightbend.com/COLL-20XX-Designing-Reactive-Systems_RES-LP.html?lst=BL&_ga=1.214533079.1169348714.1482593952
This is more a design question. I have the following implementation
Multiple Client connections -----> Server ------> Corresponding DB conns
The client/server communication is done using web sockets. It's a single threaded application currently. Evidently, this design does not scale as the the load on the server is too high and response back to the clients takes too long.
Back end operations involve handling large amounts of data.
My question: is it a good idea to create a new thread for every web socket connection? This would imply 500 threads for 500 clients (the number of web sockets would be the same whether it's multi-threading or single threaded). This would ease the load on the server and hence would make life a lot more easier.
or
Is there a better logic to attain scalability? One of them could be create threads on the merit of the job and get the rest processed by the main thread. This somehow seems to be going back to the same problem again in the future.
Any help here would be greatly appreciated.
There are two approaches to this kind of problem
one thread per request
a fixed number of threads to manage all requests
Actually you are using the second approach but using only 1 thread.
You can improve it using a pool of thread to handle your requests instead of only one.
The number of threads to use for the second approach depends on your application. If you have a strong use of cpu and a certain number of long I/O operations (read or write to disk or network) you can increase this number.
If you haven't I/O operations the number of thread should be closer to the number of cpu cores.
Note: existing web servers use this two approaches for http requests. Just as an example Apache use the first (one thread for one request) and NodeJs use the second (it is event driven).
In any case use a system of timeout to unblock very long requests before server crashes.
You can have a look at two very good scalable web servers, Apache and Node.js.
Apache, when operating in multi-threaded (worker) mode, will create new threads for new connections (note that requests from the same browser are served from the same thread, via keep-alive).
Node.js is vastly different, and uses an asynschronous workflow by delegating tasks.
Consequently, Apache scales very well for computationally intensive tasks, while Node.js scales well for multiple (huge) small, event based requests.
You mention that you do some heavy tasks on the backend. This means that you should create multiple threads. How? Create a thread queue, with a MAX_THREADS limit, and a MAX_THREADS_PER_CLIENT limit, serving repeated requests by a client using the same thread. Your main thread must only spawn new threads.
If you can, you can incorporate some good Node.js features as well. If some task on the thread is taking too long, kill that thread with a callback for the task to create a new one when the job is done. You can do a benchmark to even train a NN to find out when to do this!
Have a blast!
I read some tutorial about threads and processes, it is said that the processes be scheduled by operating system kernel, and the threads can be managed and scheduled in a user mode.
I do not understand the saying "threads can be managed and scheduled in a user mode",
for example: the producer and consumer problem? is this a example for "scheduled in a user mode"? or can anyone explain me?
Not sure what tutorial you're looking at, but there are two ways that threads can be scheduled.
The first is user-mode scheduling, which basically mean that one process, using Green threads or perhaps fibers, schedules different threads to run without involving the operating system in its decision. This can be more portable across operating systems, but usually doesn't allow you to take advantage of multiple processors.
The second is kernel scheduling, which means that the various threads are visible to the kernel and are scheduled by it, possibly simultaneously on different processors. This can make thread creation and scheduling more expensive, however.
So it doesn't really depend on the problem that you are trying to solve. User-mode just means that the scheduling of threads happens without involving the operating system. Some early Java versions used Green/user-mode threads, but I believe most now use native/kernel threads.
EDIT:
Coding Horror has a nice overview of the difference between user and kernel mode.
Get a better tutorial? The official Java tutorials are quite good, and contain a lesson on concurrency, that also defines what process and thread mean.
PS: Whether threads are managed/scheduled in user mode is an implementation detail of the Java Virtual Machine that generally need not concern the application programmer.
Scheduled in user mode means you have control over the threads of your software but they are managed by the operating system kernel. So yes, the producer consumer problem is an example you normally handle yourself (but it is not directly related to user mode scheduling) by having two threads, a producer thread and a consumer thread. Both threads access the same shared recource. This resource has to be thread-safe, this means you have to make sure the shared resource does not get corrupted becouse both threads access it at the same time. Thread safety can either be guaranteed by using thread-safe data types or by manually locking or synchronizing your resource.
However, even if you have some control over your threads e.g. starting threads, stopping threads, make threads sleep etc. you do not have full control. The operating system is still managing which threads are allowed cpu time etc.