Does anybody know a mechanism that can capture the state of a running thread and serialize that for further resume?
Is there anything available for the JVM?
How about pthreads?
My main goal is to be able to migrate a running thread to a remote machine.
With the cooperation of that thread, you can do it by any mechanism that thread supports. Without the cooperation of that thread, it is impossible. What happens if that thread holds a lock that your serialize code needs?
What happens if you migrate a running thread that is currently using some kernel resource such as a pipe. Will you migrate that resource?
The right solution to your problem may be to have the thread support a migration mechanism. How you do that depends on precisely what that thread is doing. You'll get answers that are more likely to help you solve your actual problem if you explain precisely what that is.
The answer to this is really going to depend on what constitutes the state of the running thread.
If the state is thread-local data that can be copied and saved, then the mechanism is basically to capture that state in some kind of serializable object, use it to create a new thread with the saved state, and then start that thread running.
However if the thread state depends on external objects or entities, the problem is much tougher. For instance if you have a thread which is acting as a server using TCP and you want to save its state then restart it later, the socket is going to change and the client which was accessing the server thread will know that the server thread stopped communicating for a while.
This means that any external entities that depend on the thread will need to know that the thread is being saved and frozen. They will need a way either to fail over to an alternative or to save and freeze themselves, and there will need to be some kind of protocol so that the restarted thread can let the other entities know that it is back in business and what its current state is.
Also, if the thread depends on some external entities, those entities must be able to deal with the thread being frozen. There may need to be some kind of mechanism in place so that the thread can release various resources, whose states are saved, and then, when restarted, reclaim those resources (or comparable ones) and reset them to the saved state.
If you want to move a running JVM from one machine to another, you will most likely not do it by yourself but instead use the live migration functionality of a VM manager.
The VM managers will move entire virtual machines from one physical machine to another without stopping the virtual machine or processes, but it's quite a bit higher level than serializing/deserializing a thread. Since a thread may use resources that are local to the operating system such as file systems or sockets, the whole operating system needs to follow the thread to the other physical machine.
I'm not aware of any way that you can send a thread, per se. However, you could use a pattern such as the memento pattern to save the state of your thread.
See these references before continuing so you know the terminology:
Memento pattern, oodesign.com
Memento pattern, Wikipedia
Basically, you'll have this:
Design a job (thread) that can run with any starting state, including a state from mid-execution.
When it needs to be migrated, get the state of that thread.
In Java, you could use ThreadLocal variables to store the thread state.
Serialize that state to the other machine.
Use the state to start a new thread with the state you deserialized.
This is a better approach than actually migrating a thread, its state, stack, etc., since you can pick and choose what absolutely needs to be moved instead of moving everything no matter what.
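For example, here is a minimal sketch of that approach; the class and field names are hypothetical. The job reads its starting point from a small serializable memento, and produces a fresh memento once it has been asked to stop, which can then be shipped to another machine to seed a new job there.

    import java.io.Serializable;

    // Hypothetical memento holding the resumable state of a worker.
    class CounterMemento implements Serializable {
        final long nextIndex;
        CounterMemento(long nextIndex) { this.nextIndex = nextIndex; }
    }

    // A job designed so it can start from any saved state, including mid-execution.
    class CounterJob implements Runnable {
        private volatile long index;
        private volatile boolean suspended;

        CounterJob(CounterMemento memento) {
            this.index = (memento == null) ? 0 : memento.nextIndex;
        }

        @Override
        public void run() {
            while (!suspended) {
                process(index);   // the actual unit of work
                index++;
            }
        }

        void requestSuspend() { suspended = true; }

        // Called once the thread has stopped; the result can be serialized
        // and sent to the other machine to construct a new CounterJob there.
        CounterMemento saveState() { return new CounterMemento(index); }

        private void process(long i) { /* application-specific work */ }
    }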
I have a Java application named 'X'. In Windows environment, at a given point of time there might be more than one instance of the application.
I want a common piece of code to be executed sequentially in application 'X' no matter how many instances of the application are running. Is that possible, and how can it be achieved? Any suggestions will help.
Example: I have a class named Executor where a method execute() will be invoked. Assuming there might be two or more instances of the application at any given point in time, how can I have the method execute() run sequentially across the different instances?
Is there something like a lock that can be accessed from two instances to see whether the lock is currently held or not? Any help?
I think what you are looking for is a distributed lock (i.e. a lock which is visible and controllable from many processes). There are quite a few 3rd party libraries that have been developed with this in mind and some of them are discussed on this page.
Distributed Lock Service
There are also some other suggestions in this post which use a file on the underlying system as a synchronization mechanism.
Cross process synchronization in Java
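As an illustration of the file-based approach, here is a minimal sketch using java.nio's FileLock; the lock file name and the Executor class are assumptions standing in for the code from the question. channel.lock() blocks until no other process holds the lock, so the critical section runs sequentially across instances.

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class CrossProcessLock {
        public static void main(String[] args) throws IOException {
            // The lock file path is arbitrary; every instance must use the same one.
            try (FileChannel channel = FileChannel.open(
                    Paths.get("executor.lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                // lock() blocks until no other JVM instance holds the lock,
                // so the section below runs sequentially across processes.
                try (FileLock lock = channel.lock()) {
                    new Executor().execute();   // the shared piece of code from the question
                }
            }
        }
    }

    class Executor {
        void execute() { System.out.println("running exclusively"); }
    }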
To my knowledge, you cannot do this that easily. You could implement TCP calls between processes... but I wouldn't advise it.
A better option is to create an external process in charge of executing the task, and to request each task by sending a message to a JMS queue that your executor process consumes.
...Or maybe you don't really need several processes running at the same time; what you might require is just one application with several threads performing things at the same time and one thread dedicated to the Executor. That way, synchronizing the execute() method (or the whole Executor) would be enough and spare you some time.
You cannot achieve this with Executors or anything like that because Java virtual machines will be separate.
If you really need to synchronize between multiple independent instances, one approach is to dedicate an internal port and implement a simple internal server within the application. Look into ServerSocket, or into RMI for a full-blown solution if you need extensive communication. The first instance binds to the dedicated application port and becomes the master node. All later instances find the application port taken, but can then use it to make an HTTP (or just TCP/IP) call to the master node reporting the activities they need done.
As you only need to execute some action sequentially, any slave node may ask the master to do this rather than executing it itself.
A potential problem with this approach is that if the user shuts down the master node, it may be complex to arrange for another running node to take its place. If only one node is active at any time (receiving input from the user), it can take over the role of the master node after discovering that the master is not responding and the port is no longer occupied.
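A minimal sketch of the port-binding idea, assuming an arbitrary dedicated port and a placeholder runSharedTask() method: the instance that manages to bind becomes the master; any instance that finds the port taken forwards its request to the master over a plain socket.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class SingleMasterStartup {
        // Arbitrary dedicated port; any free, fixed port works.
        private static final int APP_PORT = 52321;

        public static void main(String[] args) {
            try (ServerSocket server = new ServerSocket(APP_PORT)) {
                // Bind succeeded: this instance is the master node.
                System.out.println("Acting as master, listening for requests");
                while (true) {
                    try (Socket client = server.accept()) {
                        // A slave asked us to run the shared task on its behalf.
                        runSharedTask();
                    }
                }
            } catch (IOException portTaken) {
                // Port already bound: another instance is the master, so forward the request.
                try (Socket toMaster = new Socket("localhost", APP_PORT);
                     PrintWriter out = new PrintWriter(toMaster.getOutputStream(), true)) {
                    out.println("please-run-task");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        private static void runSharedTask() { /* the execute() work goes here */ }
    }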
A distributed queue could be used for this type of load balancing. You put one or more 'request messages' into a queue, and the next available consumer application picks one up and processes it. Each such request message could describe the task to process.
This type of queue could be implemented as a JMS queue (e.g. using ActiveMQ http://activemq.apache.org/), or on Windows there is also MSMQ: https://msdn.microsoft.com/en-us/library/ms711472(v=vs.85).aspx.
If performance is an issue and you have C/C++ developers available, the 'shared memory queue' could also be interesting: shmemq API
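For the JMS option, a minimal producer-side sketch using the classic JMS API with ActiveMQ; the broker URL, queue name, and message contents are assumptions to adjust for your setup.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class TaskRequestSender {
        public static void main(String[] args) throws Exception {
            // Broker URL and queue name are placeholders for your ActiveMQ setup.
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("task.requests");
            MessageProducer producer = session.createProducer(queue);

            // Each message describes one task; whichever consumer instance is free picks it up.
            producer.send(session.createTextMessage("process-order:42"));

            connection.close();
        }
    }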
I like the fact that threads can share some resources, but they need to release their private resources in the end. Meanwhile, some computations may become obsolete and you kill them using Task Manager. I am looking for the same thing done within one JVM process.
You say that thread.stop is unreliable and propose that my thread poll the interrupted flag instead. But polling is not efficient, right? You want neither a performance penalty nor your code polluted with ubiquitous if (interrupted) blocks. What is going to be the most appropriate option here?
Killing one process in an application that is composed of several interacting processes can be dangerous, but it is not usually as dangerous as killing a thread.
The ways in which one process depends on the state of another often are more limited. If the processes only interact through network communication, it probably won't be hard to design an application that can gracefully recover from having one of its processes killed. Likewise, if the processes interact through a shared transactional database.
It gets harder to handle a KILL safely if the processes interact through shared files. And it gets pretty close to impossible to guarantee the safety of an arbitrary KILL if the processes interact via shared memory.
Threads always interact via shared memory. A KILL that can't be handled could come at any time. There's just no way to guarantee that killing some thread won't leave the whole program in a corrupt state.
That's why t.stop() is deprecated in Java. The real question should be, "why did they ever implement t.stop() in the first place?"
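As an illustration of the cooperative alternative the question mentions, here is a minimal sketch of a worker that checks the interrupt flag between units of work and releases its private resources on the way out; the method names are placeholders.

    public class CancellableWorker implements Runnable {
        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    doUnitOfWork();          // keep each unit short so cancellation is prompt
                }
            } catch (InterruptedException e) {
                // A blocking call (sleep, wait, queue take, ...) was interrupted:
                // restore the flag and fall through to cleanup.
                Thread.currentThread().interrupt();
            } finally {
                releasePrivateResources();   // the thread cleans up after itself
            }
        }

        private void doUnitOfWork() throws InterruptedException {
            Thread.sleep(100);               // stand-in for real work
        }

        private void releasePrivateResources() { }
    }

    // Usage: start it, then cancel cooperatively instead of calling t.stop():
    //   Thread t = new Thread(new CancellableWorker());
    //   t.start();
    //   ...
    //   t.interrupt();
    //   t.join();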
I am trying to test an application with JMeter. The application uses a proprietary library that creates multiple threads. In JMeter I have created an AbstractJavaSamplerClient, which does not seem to wait for the other threads that the application may spawn. Instead it just runs its own default method and exits, leaving the other threads running in the background; since I am connecting to a server in the application, I can see through the server logs that it is still connected. Since I don't have references to the other threads, as they are instantiated through the proprietary library, I can't use the common solutions of wait() or join(). How can I get the main thread to wait for all the threads (none of which I have references to)?
Put all work with the library in a separate thread in a specially created thread group. The library will create new threads in that thread group and its descendants. List all threads of that group recursively with group.enumerate(new Thread[group.activeCount()*2],true). Then you can join() them all.
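A minimal sketch of that idea, assuming the library call can be wrapped in a Runnable; threads the library spawns inherit the group of the thread that created them, so they end up in the dedicated group.

    public class LibraryThreadWaiter {
        public static void runAndWait(Runnable libraryCall) throws InterruptedException {
            // Run the library call inside a dedicated ThreadGroup, so any threads it
            // spawns become members of that group (or of its child groups).
            ThreadGroup group = new ThreadGroup("library-work");
            Thread starter = new Thread(group, libraryCall);
            starter.start();
            starter.join();

            // Enumerate whatever threads the group still contains and join them all.
            // The array is oversized because more threads may appear while we count.
            Thread[] threads = new Thread[group.activeCount() * 2 + 1];
            int filled = group.enumerate(threads, true);
            for (int i = 0; i < filled; i++) {
                threads[i].join();
            }
        }
    }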
You can start with
Thread.getAllStackTraces().keySet();
which will give you a set of all running threads. This will include those you are interested in and all the other threads, including those internal to the JVM.
Hopefully you'll be able to filter out the ones you are interested in by name, and then you can join() them.
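A minimal sketch of that filtering, assuming the library's threads carry a recognizable fragment in their names:

    import java.util.Set;

    public class NamedThreadWaiter {
        // Joins every live thread whose name contains the given marker.
        // The marker is an assumption: this only works if the library names
        // its threads in a recognizable way.
        public static void joinThreadsNamed(String marker) throws InterruptedException {
            Set<Thread> all = Thread.getAllStackTraces().keySet();
            for (Thread t : all) {
                if (t != Thread.currentThread() && t.getName().contains(marker)) {
                    t.join();
                }
            }
        }
    }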
I have been poring over some research on multithreaded programming, such as this academic webpage, and have noticed that it is quite popular to create one Thread for each client that connects to a given server. In fact, I have found a sample client-server program that does just this. (I attempted to adopt the idea, but now I am in doubt.) According to Java: How to Program, it is recommended that I use the ExecutorService to create and manage Threads, since the programmer cannot predict when a Thread will actually be dispatched by the system, despite the order in which threads are created and started.
What I intend to do
As mentioned earlier, I am creating a server that creates a Thread for each client. The clients are to send data to me, and the Thread will fetch the data, store it in a file, and log the data.
My question
Would using the ExecutorService to create Threads (and manage them!) be effectively the same as giving each client a Thread, but more manageable? Also, would it eliminate the overhead caused by the famous "one-thread-per-client" idea?
Would using the ExecutorService to create Threads (and manage them!) be effectively the same as giving each client a Thread, but more manageable?
Yes.
Also, would it eliminate the overhead caused by the famous "one-thread-per-client" idea?
No. The overhead is usually in terms of the number of active threads, which is not changed by using a thread pool.
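For illustration, a minimal sketch of an ExecutorService-based accept loop for the server described in the question; the port number, pool size, and the body of handle() are assumptions.

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PooledServer {
        public static void main(String[] args) throws IOException {
            // A fixed pool caps the number of live threads; a cached pool would
            // behave much like plain one-thread-per-client. Pool size is a guess.
            ExecutorService pool = Executors.newFixedThreadPool(50);
            try (ServerSocket server = new ServerSocket(9000)) {   // port is arbitrary
                while (true) {
                    Socket client = server.accept();
                    pool.submit(() -> handle(client));             // same work, managed threads
                }
            }
        }

        private static void handle(Socket client) {
            try (Socket c = client) {
                // read the client's data from c, store it in a file, log it
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }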
I read a tutorial about threads and processes which said that processes are scheduled by the operating system kernel, while threads can be managed and scheduled in user mode.
I do not understand the statement "threads can be managed and scheduled in user mode".
For example, the producer-consumer problem: is this an example of "scheduled in user mode"? Or can anyone explain it to me?
Not sure what tutorial you're looking at, but there are two ways that threads can be scheduled.
The first is user-mode scheduling, which basically means that one process, using Green threads or perhaps fibers, schedules different threads to run without involving the operating system in its decision. This can be more portable across operating systems, but usually doesn't allow you to take advantage of multiple processors.
The second is kernel scheduling, which means that the various threads are visible to the kernel and are scheduled by it, possibly simultaneously on different processors. This can make thread creation and scheduling more expensive, however.
So it doesn't really depend on the problem that you are trying to solve. User-mode just means that the scheduling of threads happens without involving the operating system. Some early Java versions used Green/user-mode threads, but I believe most now use native/kernel threads.
EDIT:
Coding Horror has a nice overview of the difference between user and kernel mode.
Get a better tutorial? The official Java tutorials are quite good, and contain a lesson on concurrency, that also defines what process and thread mean.
PS: Whether threads are managed/scheduled in user mode is an implementation detail of the Java Virtual Machine that generally need not concern the application programmer.
Scheduled in user mode means you have control over the threads of your software, but they are managed by the operating system kernel. So yes, the producer-consumer problem is an example you normally handle yourself (but it is not directly related to user-mode scheduling) by having two threads, a producer thread and a consumer thread. Both threads access the same shared resource. This resource has to be thread-safe; that means you have to make sure the shared resource does not get corrupted because both threads access it at the same time. Thread safety can either be guaranteed by using thread-safe data types or by manually locking or synchronizing your resource.
However, even if you have some control over your threads, e.g. starting threads, stopping threads, making threads sleep, etc., you do not have full control. The operating system still manages which threads are allowed CPU time, etc.
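For what it's worth, here is a minimal producer-consumer sketch using a thread-safe BlockingQueue as the shared resource; all scheduling is still left to the JVM and the operating system.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumerDemo {
        public static void main(String[] args) {
            // The bounded queue is the thread-safe shared resource; it does the
            // locking internally, so neither thread corrupts the other's view.
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        queue.put(i);          // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread consumer = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        System.out.println("consumed " + queue.take());  // blocks while empty
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
        }
    }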