Confused on how Java shares variables during multiprocessing

I just started using Java, so sorry if this question's answer is obvious. I can't really figure out how to share variables in Java. I have been playing around with Python and wanted to try to port some code over to Java to learn the language a bit better. A lot of my code is ported, but I'm unsure how exactly multiprocessing and sharing of variables works in Java (my process is not disk bound, and uses a lot of CPU and searching of a list).
In Python, I can do this:
from multiprocessing import Pool, Manager
manager = Manager()
shared_list = manager.list()
pool = Pool(processes=4)
for variables_to_send in list_of_data_to_process:
    pool.apply_async(function_or_class, (variables_to_send, shared_list))
pool.close()
pool.join()
I've been having a bit of trouble figuring out how to do multiprocessing and sharing like this in Java. This question helped me understand a bit (via the code) how implementing Runnable can help, and I'm starting to think Java might automatically multiprocess threads (correct me if I'm wrong on this; I read that once threads exceed the capacity of a CPU they are moved to another CPU? The Oracle docs seem to be more focused on threads than multiprocessing). But it doesn't explain how to share lists or other variables between processes (and keep them in close enough sync).
Any suggestions or resources? I am hoping I'm searching for the wrong thing (multiprocessing java) and that this is as easy (or similarly straightforward) as it is in my code above.
Thanks!

There is an important difference between a thread and a process, and you are running into it now: with some exceptions, threads share memory, but processes do not.
Note that real operating systems have ways around just about everything I'm about to say, but these features aren't used in the typical case. So, to fire up a new process on *nix, you first clone the current process with a system call (fork()), and then replace the code, stack, command-line arguments, etc. of the child process with another system call (the exec() family). Windows does not split this into two steps (CreateProcess creates the child and loads the new program in a single call), but the end result is the same, so everything I'm saying is cross-platform. Also, the Java Runtime Environment takes care of all these system calls under the covers, and without JNI or some other interop technology you can't really execute them yourself.
There are two important things to note about this model: the child process doesn't share the address space of the parent process, and the entire address space of the child process gets replaced on the exec() call. So, variables in the parent process are unavailable to the child process, and vice versa.
The thread model is quite different. Threads are kind of like lite processes, in that each thread has its own instruction pointer, and (on most systems) threads are scheduled by the operating system scheduler. However, a thread is a part of a process. Each process has at least one thread, and all the threads in the process share memory.
Now to your problem:
The Python multiprocessing module spawns processes with very little effort, as your code example shows. In Java, spawning a new process takes a little more work. It involves creating a new Process object using ProcessBuilder.start() or Runtime.exec(). Then, you can pipe strings to the child process, get back its output, wait for it to exit, and a few other communication primitives. I would recommend writing one program to act as the coordinator and fire up each of the child processes, and writing a worker program that roughly corresponds to function_or_class in your example. The coordinator can open multiple copies of the worker program, give each a task, and wait for all the workers to finish.
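As a rough sketch of the coordinator side described above, the following starts one child process with ProcessBuilder, reads its output, and waits for it to exit. The class name Coordinator and both method names are made up for illustration, and the "worker" here is just the java launcher asked for its version, standing in for a real worker program.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Coordinator {
    // Path of the java launcher running this JVM (only a stand-in for a real worker program).
    public static String javaLauncher() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    // Starts one child process, appends its output lines to `output`, and returns its exit code.
    public static int runWorker(List<String> command, List<String> output) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // merge the child's stderr into its stdout
            Process child = builder.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.add(line);
                }
            }
            return child.waitFor(); // block until the child exits
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        int exitCode = runWorker(List.of(javaLauncher(), "-version"), output);
        System.out.println("exit code " + exitCode + ", output: " + output);
    }
}
```

Note that nothing is shared here: the parent only sees what the child writes to its output stream, which is the main difference from the thread-based answers.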

You can use Java threads for this purpose. Create a user-defined class with a setter method through which you can set the shared_list object. Have it implement the Runnable interface and perform the processing task in its run() method. You can find good examples on the internet. If you are sharing the same instance of shared_list between threads, you need to make sure that access to it is synchronized.
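A minimal sketch of that recipe might look like the following. The class name SharedListWorker, the squares helper, and the choice of squaring a number as the "processing task" are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedListWorker implements Runnable {
    private List<Integer> sharedList;
    private final int value;

    public SharedListWorker(int value) {
        this.value = value;
    }

    // Setter through which every worker receives the same list instance.
    public void setSharedList(List<Integer> sharedList) {
        this.sharedList = sharedList;
    }

    @Override
    public void run() {
        int result = value * value;   // the "processing task"
        synchronized (sharedList) {   // only one thread may modify the list at a time
            sharedList.add(result);
        }
    }

    // Starts one thread per value 0..count-1 and returns the shared list once all have finished.
    public static List<Integer> squares(int count) {
        List<Integer> shared = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            SharedListWorker worker = new SharedListWorker(i);
            worker.setSharedList(shared);
            Thread thread = new Thread(worker);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }
}
```

The order of the results depends on thread scheduling, so only the contents of the list are predictable, not the positions.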

This is not the easiest way to work with threads in Java, but it's the closest to the Python code you posted. The Task class implements the Callable interface, so it has a call method. When we create each of the 10000 Task instances we pass them a reference to the same list, so when the call method of each of those objects is invoked, they all use the same list.
We are using a fixed-size thread pool of 4 threads here, so all the tasks we submit get queued and wait for a thread to become available.
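import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedListRunner {
    public void runList() {
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        // List is an interface; wrap an ArrayList so that concurrent access is safe
        List<String> sharedList = Collections.synchronizedList(new ArrayList<String>());
        sharedList.add("Hello");
        for (int i = 0; i < 10000; i++) {
            executorService.submit(new Task(sharedList));
        }
        // Stop accepting new tasks; the queued tasks still run to completion
        executorService.shutdown();
    }
}

class Task implements Callable<String> {
    private final List<String> sharedList;

    Task(List<String> sharedList) {
        this.sharedList = sharedList;
    }

    @Override
    public String call() throws Exception {
        // Do something to the shared list
        sharedList.size();
        return "World";
    }
}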
At any one time up to 4 threads are accessing the list. If you want to dig further: on mainstream JVMs each of those 4 Java threads is backed by its own OS thread, and the OS schedules them onto the hardware threads of your CPU, of which there are normally 1 or 2 per core.

Related

Balancing multiple queues

I suspect this is really easy but I’m unsure if there’s a naïve way of doing it in Java. Here’s my problem, I have two scripts for processing data and both have the same inputs/outputs except one is written for the single CPU and the other is for GPUs. The work comes from a queue server and I’m trying to write a program that sends the data to either the CPU or GPU script depending on which one is free.
I do not understand how to do this.
I know that with ExecutorService I can specify how many threads I want to keep running, but I'm not sure how to balance between two different ones. I have 2 GPUs and 8 CPU cores on the system and thought I could have an ExecutorService keep 2 GPU and 8 CPU processes running, but I'm unsure how to balance between them since the GPU tasks will be done a lot quicker than the CPU tasks.
Any suggestions on how to approach this? Should I create two queues and keep polling them to see which one is less busy? Or is there a way to just put all the work units (all the same) into one queue and have the GPU or CPU process take from the same queue as they are free?
UPDATE: just to clarify, the CPU/GPU programs are outside the scope of the program I'm making; they are simply scripts that I call via two different methods. I guess the simplified version of what I'm asking is whether two methods can take work from the same queue.

Can two methods take work from the same queue?
Yes, but you should use a BlockingQueue to save yourself some synchronization heartache.
Basically, one option would be to have a producer which places tasks into the queue via BlockingQueue.offer. Then design your CPU/GPU threads to call BlockingQueue.take and perform work on whatever they receive.
For example:
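main (...) {
    BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < CPUs; i++) {
        new CPUThread(queue).start();
    }
    for (int i = 0; i < GPUs; i++) {
        new GPUThread(queue).start();
    }
    for (/*all data*/) {
        queue.offer(task);
    }
}
class CPUThread extends Thread {
    private final BlockingQueue<Task> queue;
    CPUThread(BlockingQueue<Task> queue) {
        this.queue = queue;
    }
    public void run() {
        try {
            while (/*some condition*/) {
                Task task = queue.take(); // blocks until a task is available
                //do task work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and exit
        }
    }
}
//etc...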
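A self-contained variant of the outline above could look like this. It is only a sketch: the class SharedQueueDemo, the STOP sentinel used to shut the workers down, and the use of plain strings as work items are assumptions made for the example, and the "work" is just recording which kind of worker took each item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

public class SharedQueueDemo {
    // Hypothetical sentinel value that tells a worker to stop.
    private static final String STOP = "__STOP__";

    // Feeds all items through one queue shared by CPU and GPU workers; returns "kind:item" entries.
    public static List<String> process(List<String> items, int cpuWorkers, int gpuWorkers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> log = new CopyOnWriteArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < cpuWorkers + gpuWorkers; i++) {
            String kind = i < cpuWorkers ? "cpu" : "gpu";
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take(); // blocks until work is available
                        if (STOP.equals(item)) {
                            return;
                        }
                        log.add(kind + ":" + item); // stand-in for calling the CPU or GPU script
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        queue.addAll(items);
        for (int i = 0; i < workers.size(); i++) {
            queue.add(STOP); // one sentinel per worker
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log;
    }
}
```

Because whichever worker is free takes the next item, a faster GPU worker naturally ends up processing more items without any explicit balancing logic.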

Obviously there is more than one way to do it; usually the simplest is the best. I would suggest two thread pools: one with 2 threads for the GPU tasks and a second with 8 threads for the CPU tasks (matching the 2 GPUs and 8 CPU cores in the question). Your work unit manager can submit work to whichever pool has idle threads at the moment (I would recommend synchronizing that block of code). The standard Java ThreadPoolExecutor has a getActiveCount() method you can use for this, see
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#getActiveCount().
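One possible shape for such a work unit manager is sketched below. IdlePoolDispatcher and its methods are hypothetical names, and the policy (prefer the GPU pool whenever it has an idle thread) is an assumption. Keep in mind that getActiveCount() is documented as returning an approximate number, so this is a heuristic rather than an exact balance.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IdlePoolDispatcher {
    private final ThreadPoolExecutor gpuPool;
    private final ThreadPoolExecutor cpuPool;

    public IdlePoolDispatcher(int gpuThreads, int cpuThreads) {
        gpuPool = fixedPool(gpuThreads);
        cpuPool = fixedPool(cpuThreads);
    }

    private static ThreadPoolExecutor fixedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // synchronized so that two submitters cannot both claim the same idle GPU thread.
    public synchronized String submit(Runnable gpuVersion, Runnable cpuVersion) {
        if (gpuPool.getActiveCount() < gpuPool.getMaximumPoolSize()) {
            gpuPool.execute(gpuVersion); // a GPU thread looks idle: use it
            return "gpu";
        }
        cpuPool.execute(cpuVersion);     // otherwise queue the work on the CPU pool
        return "cpu";
    }

    public void shutdown() {
        gpuPool.shutdown();
        cpuPool.shutdown();
    }
}
```

The returned string is only there so the caller can see where a work unit went.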

Use Runnables like this:
class CPUGPURunnable implements Runnable {
    public void run() {
        if (Thread.currentThread() instanceof CPUGPUThread) {
            CPUGPUThread t = (CPUGPUThread) Thread.currentThread();
            if (t.isGPU())
                runGPU();
            else
                runCPU();
        }
    }
}
CPUGPUThread is a Thread subclass that knows, via a flag, whether it runs in CPU or GPU mode. Have a ThreadFactory for the ThreadPoolExecutor that creates either a CPU or a GPU thread. Set up a ThreadPoolExecutor with two workers. Make sure the ThreadFactory creates a CPU and then a GPU thread instance.

I suppose you have two objects that represent the two GPUs, with methods like boolean isFree() and void execute(Runnable). Then you should start 8 threads which, in a loop, take the next job from the queue and hand it to a free GPU if there is one, or otherwise execute the job themselves.

Multi-Threading in Perl vs Java

I am new to Perl and I am writing a program that requires the use of threading. I have found the threads feature in Perl, but I find that I am still a bit confused. As stated in the title of this post, Java is probably the most common language in which to use threads. I am not saying that it is perfect, but I can get the job done using the Thread class.
In Java I have a method called startThreads, and in that method I create the threads and then start them. After starting them there is a while loop that checks whether the threads are done. If all the threads have exited properly then the while loop exits, but if not then the while loop watches for threads that have timed out, safely interrupts those threads, and sets their shutdown flags to true.
The problem:
In Perl I want to use the same algorithm that I have stated above, but of course there are differences in Perl and I am new to Perl. Is it possible to have the while loop running while the other threads are running? How is it done?

You can implement your while loop in Perl using threads->list() but you should consider a different approach, first.
Instead of waiting for threads, how about waiting for results?
The basic idea here is that you have code which takes work units from a queue and which puts the results of the work units (either objects or exceptions) into an output queue.
Start a couple of threads that wait for work units in the input queue. When one shows up, run the work unit and put the result in the output queue.
In your main code, you just need to put all the N work units into the input queue (make sure it's large enough). After that, you can wait for N outputs in the output queue and you're done without needing to worry about threads, joins and exceptions.
[EDIT] All your questions should be answered in http://perldoc.perl.org/perlthrtut.html

Be careful when using threads in perl. There are a lot of nuances about safely accessing shared data. Because most Perl code does not do threading, very few modules are thread-safe.

After reading the link that Michael Slade provided, I found that Cyber-Guard Enterprise was correct. By detaching the thread the main still is performing work. I haven't tested it out yet, but it looks like $thr->is_running() can tell me if the thread is still running.
This was taken from the url that was provided and shows how detach is used.
perldoc.perl.org/perlthrtut.html?
use threads;
my $thr = threads->create(\&sub1); # Spawn the thread
$thr->detach(); # Now we officially don't care any more
sleep(15); # Let thread run for awhile
sub sub1 {
$a = 0;
while (1) {
$a++;
print("\$a is $a\n");
sleep(1);
}
}

What does Thread-Safe mean in java or when do we call Thread-Safe?

I am not understanding this concept in any manner.
public class SomeName {
public static void main(String args[]) {
}
}
This is my class SomeName. Now what is the thread here?
Do we call the class a thread?
Do we call this class a thread when some other object is trying to access its methods or members?
Do we call this class a thread when some other object is trying to access this object?
What does it mean when we call something in Java thread-safe?

Being thread-safe means avoiding several problems. The most common and probably the worst is called deadlock. The old analogy is the story of the dining philosophers. They are very polite and will never reach out their chopsticks to take food when someone else is doing the same. If they all reach out at the same time, then they all stop at the same time, and wait...and nothing ever happens, because they're all too polite to go first.
As someone else pointed out, if your app never creates additional threads, but merely runs from a main method, then there is only one thread, or one "dining philosopher," so deadlock can't occur. When you have multiple threads, the simplest way to keep them from colliding is to use a "monitor", which is just an object that's set aside. In effect, your methods have to obtain a "lock" on this monitor before accessing shared data, so there are no collisions. However, you can still have deadlock, because there might be two threads trying to access two different resources, each with its own monitor. Thread A has to wait for Thread B to release its lock on monitor object 1; Thread B has to wait for Thread A to release its lock on monitor object 2. So now you're deadlocked.
In short, thread safety is not terribly difficult to understand, but it does take time, practice and experience. The first time you write a multi-threaded app, you will run into deadlock. Then you will learn, and it soon becomes pretty intuitive. The biggest caveat is that you need to keep the multi-threaded parts of an app as simple as possible. If you have lots of threads, with lots of monitors and locks, it becomes exponentially more difficult to ensure that your dining philosophers never freeze.
The Java tutorial goes over threading extremely well; it was the only resource I ever needed.
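One common way to avoid the two-monitor standoff described above is to make every thread acquire the monitors in the same order. The sketch below illustrates that idea with a made-up Account class and transfer method; it is one technique among several, not the only fix.

```java
public class LockOrdering {
    static final class Account {
        final int id;
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always lock the account with the smaller id first, whichever direction the money flows.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the final balances {a, b}.
    public static long[] run(int iterations) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                transfer(a, b, 1);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                transfer(b, a, 1);
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.balance, b.balance};
    }
}
```

If transfer instead locked `from` and then `to`, the two threads could each hold one monitor and wait forever for the other.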

You might want to think of thread as CPU executing the code that you wrote.
What is thread?
A thread is a single sequential flow of control within a program.
From Java concurrency in practice:
Thread-safe classes encapsulate any needed synchronization so that
clients need not provide their own.
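To make that quoted definition concrete, here is a small sketch of a class that encapsulates its own synchronization. SafeCounter and the countWith helper are illustrative names, not taken from the book.

```java
import java.util.ArrayList;
import java.util.List;

public class SafeCounter {
    private long count;

    // Callers never need their own locks: both methods synchronize on this object.
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }

    // Increments one shared counter from several threads and returns the final value.
    public static long countWith(int threads, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Without the synchronized keyword, count++ is a read-modify-write sequence and concurrent increments could be lost.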

At any time you have "execution points" where the JVM is running your code, stepping through methods and doing what your program tells it to do.
For simple programs you only have one. For more complex programs you can have several, usually started with new Thread(...).start() or an Executor.
"Thread-safe" means that your code is written in such a way that it still behaves correctly when several execution points run through it at once; in particular, one execution point cannot unexpectedly change what another execution point sees. This is usually very desirable, as such interference can be very hard to debug, but as you only have one execution point, there is no other one to interfere, so this does not apply.
Threads are an advanced subject which you will come back to later, but for now just remember that if you do not do anything special with threads or Swing, this will not apply to you. It will later, but not now.

Well, in your specific example, when your program runs, it has just 1 thread.
The main thread.

A class is thread safe when an object of that class can be accessed in parallel from multiple threads (and hence from multiple CPUs) without breaking any of the guarantees it would provide when used from a single thread.
You should read first about what exactly threads are, for instance on Wikipedia, which might make it then easier to understand the relation between classes and threads and the notion of threadsafety.

Every piece of code in Java is executed on some thread. By default, there is a "main" thread that calls your main method. All code in your program executes on the main thread unless you create another thread and start it. Threads start when you explicitly call the Thread.start() method; they can also start implicitly when you call an API that indirectly calls Thread.start(). (API calls that start a thread are generally documented to do so.) When Thread.start() is called, it creates a new thread of execution and calls the Thread object's run() method. The thread exits when its run() method returns.
There are other ways to affect threads, but that's the basics. You can read more details in the Java concurrency tutorial.
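The difference between calling start() and calling run() directly can be seen in a small experiment like this one. StartVsRun and executedOn are invented names, as is the thread name "worker-1".

```java
public class StartVsRun {
    // Returns the name of the thread that actually executed the task.
    public static String executedOn(boolean useStart) {
        String[] name = new String[1];
        Thread thread = new Thread(() -> name[0] = Thread.currentThread().getName(), "worker-1");
        if (useStart) {
            thread.start();          // creates a new thread of execution, which calls run()
            try {
                thread.join();       // wait for the new thread to exit
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } else {
            thread.run();            // plain method call: runs on the caller's thread
        }
        return name[0];
    }

    public static void main(String[] args) {
        System.out.println("start(): " + executedOn(true));
        System.out.println("run():   " + executedOn(false));
    }
}
```

With start() the task reports the new thread's name; with run() it reports the name of whichever thread made the call.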

Java multi-threading - what is the best way to monitor the activity of a number of threads?

I have a number of threads that are performing a long running task. These threads themselves have child threads that do further subdivisions of work. What is the best way for me to track the following:
How many total threads my process has created
What the state of each thread currently is
What part of my process each thread has currently got to
I want to do it in as efficient a way as possible, and once threads finish, I don't want any references to them hanging around because I need to be freeing up memory as early as possible.
Any advice?

Don't think in terms of threads, which are OS objects and carry no application semantics, but in terms of tasks. A Thread cannot know it is 50% complete, a task can. Look at the facilities in java.util.concurrent for managing tasks in terms of executors and callable objects.
In most cases where you're using Java (i.e. non-embedded systems) you should not care how many threads your process has created any more (or any less) than how many objects it has created - you don't want to run out, but if you are explicitly managing OS resources in a high-level language you're probably working at the wrong level of abstraction.
For intermediate feedback, create a progress listener interface containing a method for informing the listener where the task has got to, pass it to the task on creation and call it during your task when the progress changes. Make sure any implementation of the interface is thread safe.
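The progress listener idea could be sketched roughly as follows. ProgressListener, progressChanged, sumTask and runWithProgress are all hypothetical names, and summing an array stands in for the real long-running task.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ProgressDemo {
    // Implementations must be thread safe: the task calls this from a pool thread.
    public interface ProgressListener {
        void progressChanged(String taskName, int percent);
    }

    // A task that reports its own progress after each element it processes.
    static Callable<Integer> sumTask(String name, int[] data, ProgressListener listener) {
        return () -> {
            int sum = 0;
            for (int i = 0; i < data.length; i++) {
                sum += data[i];
                listener.progressChanged(name, (i + 1) * 100 / data.length);
            }
            return sum;
        };
    }

    // Runs the task on an executor and returns every percentage the listener received.
    public static List<Integer> runWithProgress(int[] data) {
        List<Integer> seen = new CopyOnWriteArrayList<>(); // thread-safe collection
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(sumTask("sum", data, (task, percent) -> seen.add(percent))).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return seen;
    }
}
```

The task knows how far along it is, so it is the task, not the thread, that reports progress.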

It seems that the information you are looking for is mostly app specific ("what part of my process each thread currently does?"). Even, "how many total threads my process has created" is app specific because you are not interested in all sort of threads that the JVM has created (GUI, GC, etc.).
Thus, the best course of action is to create your own dedicated subclass of Thread. When the thread starts or finishes processing a job, your class registers the necessary details with some central registry.
[EDIT]
Here's a typical implementation (can be refined further):
public class MyThread extends Thread
{
private Runnable runnable;
private String description;
private Registry reg;
public MyThread(Runnable runnable, String description, Registry reg) {
this.runnable = runnable;
this.description = description;
this.reg = reg;
}
public void run() {
int id = reg.jobStarting(description);
try {
runnable.run();
reg.jobEnded(id);
}
catch(Throwable t) {
reg.jobFailed(id, t);
}
}
}
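The Registry type used above is not shown in the answer. One way it might be written, purely as an assumption that matches the three calls made by MyThread, is the following; entries are removed as soon as a job ends, so no references to finished jobs are kept.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Registry {
    private final AtomicInteger nextId = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();
    private final Map<Integer, String> active = new ConcurrentHashMap<>();

    // Records a newly started job and returns its id.
    public int jobStarting(String description) {
        int id = nextId.incrementAndGet();
        active.put(id, description);
        return id;
    }

    // Forgets the job so that nothing about it stays reachable.
    public void jobEnded(int id) {
        active.remove(id);
    }

    public void jobFailed(int id, Throwable cause) {
        active.remove(id);
        failed.incrementAndGet();
    }

    // Total number of jobs ever started.
    public int totalStarted() {
        return nextId.get();
    }

    public int totalFailed() {
        return failed.get();
    }

    // Snapshot of the jobs currently running, keyed by id.
    public Map<Integer, String> activeJobs() {
        return new HashMap<>(active);
    }
}
```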

Use JMX: http://java.sun.com/j2se/1.5.0/docs/guide/management/agent.html
Add the following parameters to your JVM (after java ...):
-Dcom.sun.management.jmxremote.port=8086 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
Then connect using JConsole. JConsole is in the bin folder of your JDK. I am sure it comes with the JDK, but I'm not sure if it's bundled with the JRE. When JConsole pops up, enter localhost:8086 as the address. Change the port if needed.
In JConsole, click on the Threads tab. This will show you the number of running and started threads with a nice graph. You can also click on a thread to see its current stack trace. You even have a button to detect deadlocks!
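The same thread information that JConsole displays can also be read from inside the process through the standard java.lang.management API. The sketch below is only illustrative; the class name ThreadSnapshot and its method names are made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadSnapshot {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // Number of live threads in this JVM right now.
    public static int liveThreads() {
        return THREADS.getThreadCount();
    }

    // Number of threads created and started since the JVM started.
    public static long totalStarted() {
        return THREADS.getTotalStartedThreadCount();
    }

    // Number of threads currently stuck in a deadlock (0 if none).
    public static int deadlockedThreads() {
        long[] ids = THREADS.findDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("live=" + liveThreads() + " started=" + totalStarted()
                + " deadlocked=" + deadlockedThreads());
        for (ThreadInfo info : THREADS.getThreadInfo(THREADS.getAllThreadIds())) {
            if (info != null) { // a thread may have exited in the meantime
                System.out.println(info.getThreadName() + " -> " + info.getThreadState());
            }
        }
    }
}
```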

I would change the design approach altogether and reason with Work instead of Threads.
You chunk your big task into units of work that you submit to executors/workers (see also the thread pool pattern). Then you can register listeners that get notified when a unit of work is started/completed/aborted.
The JCA specification implements this pattern in the WorkManager; you can draw some inspiration from it:
void scheduleWork(Work work,
long startTimeout,
ExecutionContext execContext,
WorkListener workListener)
And the listener
void workAccepted(WorkEvent e);
void workCompleted(WorkEvent e);
void workRejected(WorkEvent e);
void workStarted(WorkEvent e);
Otherwise have a look at java.util.concurrent; there is also some interesting stuff in it.

Use VisualVM from Sun/Oracle, a free tool. It's pretty good and gives you a lot of detail about the threads, memory and CPU used by the running process.

Java threads query

I'm working on a Java application that involves threads, so I wrote a piece of code to familiarize myself with the execution of multiple concurrent threads:
public class thready implements Runnable{
private int num;
public thready(int a) {
this.num=a;
}
public void run() {
System.out.println("This is thread num"+num);
for (int i=num;i<100;i++)
{
System.out.println(i);
}
}
public static void main(String [] args)
{
Runnable runnable =new thready(1);
Runnable run= new thready(2);
Thread t1=new Thread(runnable);
Thread t2=new Thread(run);
t1.start();
t2.start();
}}
Now, from the output of this code, I think that at any point in time only 1 thread is executing, and the execution seems to alternate between the threads. I would like to know if my understanding of the situation is correct. And if it is, I would like to know if there is any way I could get both threads to execute simultaneously, as I want to write a TCP/IP socket listener that listens on 2 ports at the same time. Such a scenario can't have any downtime.
Any suggestions/advice would be of great help.
Cheers

How many processors does your machine have? If you have multiple cores, then both threads should be running at the same time. However, console output may well be buffered and will require locking internally - that's likely to be the effect you're seeing.
The easiest way to test this is to make the threads do some real work, and time them. First run the two tasks sequentially, then run them in parallel on two different threads. If the two tasks don't interact with each other at all (including "hidden" interactions like the console) then you should see a roughly 2x performance improvement using two threads - if you have two cores or more.
As Thilo said though, this may well not be relevant for your real scenario anyway. Even a single-threaded system can still listen on two sockets, although it's easier to have one thread responsible for each socket. In most situations where you're listening on sockets, you'll spend a lot of the time waiting for more data anyway - in which case it doesn't matter whether you've got more than one core or not.
EDIT: As you're running on a machine with a single core (and assuming no hyperthreading) you will only get one thread executing at a time, pretty much by definition. The scheduler will make sure that both threads get CPU time, but they'll basically have to take turns.
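The timing experiment suggested above (run two independent tasks sequentially, then on two threads) might be set up along these lines. ParallelTiming and busyWork are made-up names, the arithmetic is arbitrary CPU-bound filler, and the timings printed will vary from machine to machine.

```java
public class ParallelTiming {
    // CPU-bound filler work with no I/O and no shared state.
    static long busyWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static long[] sequential(int n) {
        return new long[] {busyWork(n), busyWork(n)};
    }

    public static long[] parallel(int n) {
        long[] results = new long[2];
        Thread t1 = new Thread(() -> results[0] = busyWork(n));
        Thread t2 = new Thread(() -> results[1] = busyWork(n));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        int n = 200_000_000;
        long start = System.nanoTime();
        sequential(n);
        long middle = System.nanoTime();
        parallel(n);
        long end = System.nanoTime();
        System.out.println("sequential ms: " + (middle - start) / 1_000_000);
        System.out.println("parallel ms:   " + (end - middle) / 1_000_000);
    }
}
```

On a machine with two or more cores the second figure should come out noticeably smaller; on a single core the two should be roughly equal.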

If you have more than one CPU, both threads can run simultaneously. Even if you have only one CPU, as soon as one of the threads waits for I/O, the other can use the CPU. The JVM will most likely also try to dice out CPU time slices fairly. So for all practical purposes (unless all they do is use the CPU), your threads will run simultaneously (as in: within a given second, each of them had access to the CPU).
So even with a single CPU, you can have two threads listening on a TCP/IP socket each.

Make the threads sleep in between the println statements. What you have executes too fast for you to see the effect.

Threads are just a method of virtualizing the CPU so that it can be used by several applications/threads simultaneously. But as the CPU can only execute one program at a time, the Operating System switches between the different threads/processes very fast.
If you have a CPU with just one core (leaving aside hyperthreading) then your observation, that only one thread is executing at a time, is completely correct. And it's not possible in any other way, you're not doing anything wrong.

If the threads each take less than a single CPU quantum, they will appear to run sequentially. Printing 100 numbers in a row is probably not intensive enough to use up an entire quantum, so you're probably seeing sequential running of threads.
As well, like others have suggested, you probably have two CPUs, or a hyperthreaded CPU at least. The last pure single-core systems were produced around a decade ago, so it's unlikely that your threads aren't running side-by-side.
Try increasing the amount of processing that you do, and you might see the output intermingle. Be aware that when you do, lines from the two threads can appear in any order. As far as I know, each individual System.out.println call is synchronized internally in the standard PrintStream implementation, so a single line is normally not split mid-line, but output built up from several print calls can be interrupted by another thread.

They do run simultaneously; they just can't use the output stream at the same time.
Replace your run method with this:
public void run() {
for (int i=num;i<100;i++) {
try {
Thread.sleep(100);
System.out.println("Thread " + num + ": " + i);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

If you are getting many messages per second and processing each piece of data takes from a few milliseconds to a few seconds, it is not a good idea to start one thread per message. Ultimately the number of threads you can spawn is limited by the underlying OS. You may get an out-of-threads error or something like that.
Java 5 introduced the thread pool framework, where you can allocate a fixed number of threads and submit jobs (instances of Runnable). The framework will run each job on one of the available threads in the pool. It is more efficient, as there is not as much context switching. I wrote a blog entry to jump-start on this framework.
http://dudefrommangalore.blogspot.com/2010/01/concurrency-in-java.html
Cheers,
-- baliga

On the question of listening on 2 ports: clients have to send their messages to one of them. But since both ports are opened to accept connections within a single JVM, having 2 ports does not give you high availability if the JVM fails.
The usual pattern for writing a server which listens on a port is to have one thread listen on the port. As soon as a connection arrives, spawn another thread, hand over the client socket and its content to the newly spawned thread, and continue accepting new connections.
Another pattern is to have multiple threads listen on the same socket. When a client connects, the connection is made to one of the threads.
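The first pattern (one accepting thread that hands each client to a worker) might be sketched as below. EchoServer, its echo behaviour, the pool size of 4, and the roundTrip helper are all assumptions for illustration; here a thread pool replaces "spawn another thread" to cap the number of threads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EchoServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final ExecutorService handlers = Executors.newFixedThreadPool(4);

    // Port 0 asks the OS for any free port.
    public EchoServer(int port) throws IOException {
        serverSocket = new ServerSocket(port);
        new Thread(this::acceptLoop, "acceptor-" + serverSocket.getLocalPort()).start();
    }

    public int port() {
        return serverSocket.getLocalPort();
    }

    // The single listening thread: accept, hand off, go straight back to accepting.
    private void acceptLoop() {
        while (!serverSocket.isClosed()) {
            try {
                Socket client = serverSocket.accept();
                handlers.execute(() -> handle(client));
            } catch (IOException e) {
                // accept() fails once the server socket is closed; the loop condition then ends the thread
            }
        }
    }

    private static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            // the client went away; nothing more to do for this connection
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
        handlers.shutdownNow();
    }

    // Starts a server on a free port, sends one line over loopback, and returns the reply.
    public static String roundTrip(String message) {
        try (EchoServer server = new EchoServer(0);
             Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.port());
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Listening on two ports at once then amounts to creating two EchoServer instances with different port numbers, each with its own accepting thread.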

Two ways this could go wrong:
System.out.println() may use a buffer; you should call flush() to get it to the screen.
There has to be some synchronisation built into the System.out object, or you couldn't use it in a multithreaded application without messing up the output, so it is likely that one thread holds a lock for most of the time, making the other thread wait. Try using System.out in one thread and System.err in the other.

Go and read up on multitasking and multiprogramming. http://en.wikipedia.org/wiki/Computer_multitasking
