Does anyone know of a parallel application/benchmark in Java that simulates checkpointing? I mean, in my cluster there are parallel processes running on different nodes, and I want to make them concurrently perform some specific action (to take a checkpoint, for example). How is this synchronization achieved?
Thanks
If the concurrent threads run in the same VM, just use a CyclicBarrier or a Latch. If they run in different VMs, you can use Terracotta to share a Latch or CyclicBarrier across JVMs, on which all your servers can then synchronize.
Works great, but it needs some work.
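For the same-VM case, here is a minimal sketch of a CyclicBarrier-based checkpoint (the worker count and the barrier action are invented for illustration):

```java
import java.util.concurrent.CyclicBarrier;

public class CheckpointDemo {
    public static void main(String[] args) {
        int workers = 3;
        // The barrier action runs once per generation, after all parties arrive.
        CyclicBarrier checkpoint = new CyclicBarrier(workers,
                () -> System.out.println("checkpoint reached"));

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                try {
                    // ... do one phase of work ...
                    checkpoint.await();   // block until every worker arrives
                    // ... continue with the next phase ...
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```

Because the barrier is cyclic, the same object can be reused for the next checkpoint after all parties have passed it.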
You can try Hazelcast, which offers this functionality but has a lighter touch on the rest of your system than Terracotta (which is more than you need just for this).
Related
I am working on a platform that hosts small Java applications, each of which currently uses a single thread, lives inside a Docker engine, consumes data from a Kafka server, and logs to a central DB.
Now, I need to put another Java application on this platform. This app uses multithreading relatively heavily. I have already tested it inside a Docker container and it works perfectly there, so I'm ready to deploy it on the platform, where it would be scaled manually; that is, a human would define the number of containers to start, each containing an instance of this app.
My Architect has an objection, saying that "In a distributed environment we never use multithreading". So now, I have to refactor my application eliminating any thread related logic from it, making it single threaded. I requested a more detailed reasoning from him, but he yells "If you are not aware of this principle, you have no place near Java".
Is it really a mistake to use a multithreaded Java application in a distributed system - a simple cluster with ten or twenty physical machines, each hosting a number of virtual machines that in turn run Docker containers with Java applications inside them?
Honestly, I don't see the problem of multithreading inside a container.
Is it really a mistake or somehow "forbidden"?
Thanks.
When you write for example a web application that will run in a Java EE application server, then normally you should not start up your own threads in your web application. The application server will manage threads, and will allocate threads to process incoming requests on the server.
However, there is no hard rule or reason why it is never a good idea to use multi-threading in a distributed environment.
There are advantages to making applications single-threaded: the code will be simpler and you won't have to deal with difficult concurrency issues.
But "in a distributed environment we never use multithreading" is not necessarily always true and "if you are not aware of this principle, you have no place near Java" sounds arrogant and condescending.
I suspect he tells you this only because using a single thread eliminates multithreading and data-ordering issues.
There is nothing wrong with multithreading though.
Distributed systems usually have tasks that are heavily I/O bound.
If I/O calls are blocking in your system
The only way to achieve concurrency within the process is to spawn new threads to do other useful work (multithreading). The caveat with this approach is that if too many threads are in flight, the operating system will spend too much time context switching between them, which is wasteful work.
If I/O calls are non-blocking in your system
Then you can avoid multithreading and use a single thread to service all your requests (read about event loops, Java's Netty framework, or NodeJS).
The upsides of the single-thread approach
The OS does not do any wasteful thread context switches.
You will NOT run into concurrency problems like deadlocks or race conditions.
The downsides
It is often harder to code/think in a non-blocking fashion.
You typically end up using more memory in the form of blocking queues.
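The single-thread, non-blocking style above can be sketched with a CompletableFuture pipeline driven by one executor thread, in the spirit of an event loop (the request strings are invented):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SingleThreadLoop {
    public static void main(String[] args) throws Exception {
        // One thread services every callback, in the spirit of an event loop.
        ExecutorService loop = Executors.newSingleThreadExecutor();

        CompletableFuture<String> a = CompletableFuture
                .supplyAsync(() -> "request-1", loop)      // stand-in for a non-blocking I/O result
                .thenApplyAsync(s -> s + ":handled", loop);
        CompletableFuture<String> b = CompletableFuture
                .supplyAsync(() -> "request-2", loop)
                .thenApplyAsync(s -> s + ":handled", loop);

        System.out.println(a.get());
        System.out.println(b.get());
        loop.shutdown();
    }
}
```

Both requests make progress concurrently, yet no user code ever runs on more than one thread, so there is nothing to lock.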
What? We use RxJava and Spring Reactor pretty heavily in our application and it works fine. You can't work with threads across two JVMs anyway, so just make sure that your logic works as you expect on a single JVM.
I have an existing Spring library that performs some blocking tasks (exposed as services), which I intend to wrap using Scala Futures to showcase multiprocessor capabilities. The intention is to get people interested in the Scala/Akka tech stack.
Here is my problem.
Let's say I get two services from the existing Spring library. These services perform different blocking tasks (IO, DB operations).
How do I make sure that these tasks (service calls) are carried out across multiple cores?
For example, how do I make use of custom execution contexts?
Do I need one per service call?
How do the execution context(s) / thread pools relate to multi-core operation?
I'd appreciate any help in understanding this.
You cannot ensure that tasks will be executed on different cores. The workflow for the sample program would be as follows.
Write a program that does two things on two different threads (Futures, Java threads, Actors, you name it).
The JVM sees that you want two threads, so it starts two JVM threads and submits them to the OS process dispatcher (or the other way round; it doesn't matter).
The OS decides on which core to execute each thread. Usually, it will try to put threads on different cores to maximize overall efficiency, but this is not guaranteed; you might end up with all 10 of your JVM threads executing on one core, although this is extreme.
The rule of thumb for writing concurrent and seemingly parallel applications is: "Here, take my e.g. 10 threads and TRY to split them among the cores."
There are some tricks, like tuning CPU affinity (low-level, very risky) or spawning a plethora of threads to make sure they are parallelized (a lot of overhead and work for the GC). In general, though, the OS is usually not that overloaded, and if you create two actors, e.g. one for the DB and one for network IO, they should work well in parallel.
UPDATE:
The global ExecutionContext manages the thread pool. However, you can define your own and submit runnables to it via myThreadPool.submit(runnable: Runnable). Have a look at the links provided in the comment.
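In plain Java terms, the analogue of a custom Scala ExecutionContext is an ExecutorService (a Scala context can be built from one with ExecutionContext.fromExecutorService). The pool size and task bodies below are invented:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CustomPoolDemo {
    public static void main(String[] args) throws Exception {
        // A dedicated pool for blocking calls; in Scala you would wrap this
        // with ExecutionContext.fromExecutorService to get a custom context.
        ExecutorService blockingIoPool = Executors.newFixedThreadPool(4);

        Future<String> db  = blockingIoPool.submit(() -> "db result");
        Future<String> net = blockingIoPool.submit(() -> "network result");

        // Which cores the pool's threads land on is decided by the OS scheduler.
        System.out.println(db.get() + " / " + net.get());
        blockingIoPool.shutdown();
    }
}
```

A separate pool for blocking work keeps slow service calls from starving the default context; you do not need one per service call, just one per class of workload.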
I have a program in Java that performs some computation in parallel. I can either run it on a single machine or using multiple different machines.
When executing on a single machine, thread synchronization is successfully achieved by using the CyclicBarrier class from the java.util.concurrent package. The idea is that all the threads must wait for the others to arrive at the same point before proceeding with the computation.
When executing on multiple different machines, inter-process communication is implemented via RMI (Remote Method Invocation). I have the same problem in this situation: I want the threads of these processes to wait for the others to arrive at the same point before continuing. I cannot use a shared CyclicBarrier object between the different processes because this class is not serializable.
What are my alternatives to get this barrier behavior on threads executing on different processes on multiple machines?
Thanks
You don't need to pass a CyclicBarrier between processes. You can make an RMI call which in turn uses a CyclicBarrier. I also suggest you look at Hazelcast, as it supports distributed locks and many other distributed collections.
IMHO, I would reconsider whether you really need all the processes to checkpoint, and look for a way to avoid needing this in the first place.
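One way to picture the RMI suggestion: expose an await method on a remote interface and let the server delegate to a single CyclicBarrier. The interface and class names below are invented, and registry/export setup is omitted; two local threads stand in for two remote processes:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.CyclicBarrier;

// The interface each process would look up and call over RMI.
interface BarrierService extends Remote {
    void awaitCheckpoint() throws RemoteException;
}

// Server-side implementation: one CyclicBarrier shared by all callers.
class BarrierServiceImpl implements BarrierService {
    private final CyclicBarrier barrier;

    BarrierServiceImpl(int parties) {
        this.barrier = new CyclicBarrier(parties);
    }

    @Override
    public void awaitCheckpoint() throws RemoteException {
        try {
            barrier.await();   // blocks until all parties have called in
        } catch (Exception e) {
            throw new RemoteException("barrier broken", e);
        }
    }
}

public class BarrierDemo {
    public static void main(String[] args) throws Exception {
        BarrierService svc = new BarrierServiceImpl(2);
        // Two local threads stand in for two remote processes.
        Thread other = new Thread(() -> {
            try { svc.awaitCheckpoint(); } catch (Exception ignored) { }
        });
        other.start();
        svc.awaitCheckpoint();
        other.join();
        System.out.println("both parties passed the checkpoint");
    }
}
```

In a real deployment the implementation would be exported (e.g. with UnicastRemoteObject.exportObject) and bound in an RMI registry, and each process would look it up and call awaitCheckpoint() at its checkpoint.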
The project I am working on would trigger various asynchronous jobs to do some work. As I look into it more these asynchronous jobs are actually being run as separate JVMs (separate java processes). Does it mean I would not be able to use any of the following if I need to synchronize between these processes:
synchronized methods/blocks
any lock from the java.util.concurrent.locks package
Because it seems to me they are all thread-level?
Does Java provide support for IPC like semaphores between processes?
That's right. You cannot use any of the standard synchronization mechanisms, because they only work within a single JVM.
Solutions
You can use the file locks introduced in Java 7.
You can synchronize via database entities.
An already-implemented solution like Terracotta may be helpful.
Re-think your design. If you are a beginner in the Java world, talk through the details with more experienced engineers; IMHO, your question suggests you are on the wrong track.
You can use synchronized keyword, locks, atomic objects, etc. - but they are local to the JVM. So if you have two JVMs running the same program, they can still e.g. run the same synchronized method at the same time - one on each JVM, but not more.
Solutions:
Terracotta provides distributed locking
Hazelcast does as well
you can use manual synchronization via the file system or a database
I'm using distributed lock provided by Redisson to synchronize work of different JVMs
they are all thread-level?
That's correct, synchronized etc only work within the context of a single process.
Does Java provide support for IPC like semaphores between processes?
One way to implement communication between Java processes is using RMI.
I have implemented a Java IPC lock using files (FileBasedLock) and an IPC semaphore using a shared DB via JDBC (JdbcSemaphore). Both implementations are part of spf4j.
If you have a ZooKeeper instance, take a look at the ZooKeeper-based lock recipes from Apache Curator.
This is a bit related to this question.
I'm using make to extract some information about some C programs. I'm wrapping the compilation with a bash script that runs my Java program and then gcc. Basically, I'm doing:
make CC=~/my_script.sh
I would like to use several jobs (-j option with make). It's running several processes according to the dependency rules.
If I understood correctly, I would have as many instances of the JVM as jobs, right?
The thing is, I'm using sqlite-jdbc to collect some info. So the problem is how to avoid several processes trying to modify the DB at the same time.
It seems that the SQLite lock is JVM-dependent (I mean, a lock can be "seen" only inside the locking JVM), and that the same is true for RandomAccessFile.lock().
Do you have any idea how to do that? (Creating a tmp file and then checking whether it exists seems to be one possibility, but may be expensive. A locking table in the DB?)
Thanks
java.nio.channels.FileLock allows OS-level cross-process file locking.
However, using make to start a bash scripts that runs several JVMs in parallel before calling gcc sounds altogether too Rube-Goldbergian and brittle to me.
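A minimal FileLock sketch (the lock-file name is invented). Note that FileLock coordinates processes, not threads within one JVM:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FileLockDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("demo.lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {   // OS-level lock, visible to other processes
            System.out.println("lock held; safe to write to the DB");
            // ... exclusive critical section shared between JVMs ...
        }   // the lock is released when the channel closes
    }
}
```

Each of the parallel make jobs would acquire the same lock file before touching the SQLite DB; ch.lock() blocks until the current holder releases it.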
There are several solutions for this.
If your lock only needs to work within the same machine, you can use a server socket to implement it (the process that manages to bind to the port first owns the lock; other processes wait for the port to become available).
If you need a lock that spans multiple machines, you can use a memcached lock. This will require a running memcached server. I can paste some code if you are interested in this solution.
You can get a Java library to connect to memcached here.
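The server-socket trick can be sketched like this (the port number is invented; all cooperating processes must agree on it):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class SocketLock {
    private static final int LOCK_PORT = 49553;   // invented; shared by all processes

    public static void main(String[] args) throws IOException {
        // Whoever binds the port first holds the lock; a second process gets
        // a BindException and must wait and retry until the port frees up.
        try (ServerSocket lock = new ServerSocket(LOCK_PORT, 1,
                InetAddress.getLoopbackAddress())) {
            System.out.println("lock acquired on port " + LOCK_PORT);
            // ... exclusive work ...
        }   // closing the socket releases the lock
    }
}
```

One caveat of this scheme is that waiters must poll (retry the bind), since the OS gives no notification when the port becomes free.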
You may try Terracotta for sharing objects between JVM instances. It may seem too heavy a solution for your needs, but it is at least worth considering.