Multi-Threading in Perl vs Java

I am new to Perl and I am writing a program that requires the use of threading. I have found the threads feature in Perl, but I am still a bit confused. As the title of this post suggests, Java is probably the language most commonly used for threading. I am not saying that it is perfect, but I can get the job done using the Thread class.
In Java I have a method called startThreads, and in that method I create the threads and then start them. After starting them, a while loop checks whether the threads are done. If all the threads have exited properly, the while loop exits; if not, the while loop watches which threads have timed out, safely interrupts those threads, and sets their shutdown flags to true.
The problem:
In Perl I want to use the same algorithm that I have stated above, but of course there are differences in Perl and I am new to Perl. Is it possible to have the while loop running while the other threads are running? How is it done?

You can implement your while loop in Perl using threads->list() but you should consider a different approach, first.
Instead of waiting for threads, how about waiting for results?
The basic idea here is that you have code which takes work units from a queue and which puts the results of the work units (either objects or exceptions) into an output queue.
Start a couple of threads that wait for work units in the input queue. When one shows up, run the work unit and put the result in the output queue.
In your main code, you just need to put all the N work units into the input queue (make sure it's large enough). After that, you can wait for N outputs in the output queue and you're done without needing to worry about threads, joins and exceptions.
[EDIT] All your questions should be answered in http://perldoc.perl.org/perlthrtut.html
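Since the asker already knows Java, here is a rough Java sketch of the queue idea above (the same shape carries over to Perl threads with a queue module). The class name WorkQueueSketch, the squaring "work" and the POISON stop marker are all made up for illustration, and work units are assumed to be non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: workers take units from an input queue and put results
// on an output queue; the main thread only counts results, never joins threads.
class WorkQueueSketch {
    private static final int POISON = -1; // assumed stop marker; real units must be >= 0

    static List<Integer> squareAll(List<Integer> units, int workers) throws InterruptedException {
        BlockingQueue<Integer> input = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> output = new LinkedBlockingQueue<>();

        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int unit = input.take();   // blocks until a work unit shows up
                        if (unit == POISON) return;
                        output.put(unit * unit);   // the result of this work unit
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        for (int unit : units) input.put(unit);    // enqueue all N work units
        List<Integer> results = new ArrayList<>();
        for (int i = 0; i < units.size(); i++) {
            results.add(output.take());            // wait for exactly N results
        }
        for (int w = 0; w < workers; w++) input.put(POISON); // let the workers exit
        return results;
    }
}
```

Results arrive in completion order, not submission order, so carry an id with each result if the order matters.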

Be careful when using threads in perl. There are a lot of nuances about safely accessing shared data. Because most Perl code does not do threading, very few modules are thread-safe.

After reading the link that Michael Slade provided, I found that Cyber-Guard Enterprise was correct. By detaching the thread the main still is performing work. I haven't tested it out yet, but it looks like $thr->is_running() can tell me if the thread is still running.
This was taken from the url that was provided and shows how detach is used.
http://perldoc.perl.org/perlthrtut.html
use threads;
my $thr = threads->create(\&sub1);   # Spawn the thread
$thr->detach();                      # Now we officially don't care any more
sleep(15);                           # Let thread run for awhile
sub sub1 {
    $a = 0;
    while (1) {
        $a++;
        print("\$a is $a\n");
        sleep(1);
    }
}

Related

ForkJoinPool stalls during invokeAll/join

I am trying to use a ForkJoinPool to parallelize my CPU-intensive calculations.
My understanding of a ForkJoinPool is that it continues to work as long as any task is available to be executed. Unfortunately, I frequently observed worker threads idling or waiting, so not all CPUs are kept busy. Sometimes I even observed additional worker threads.
I did not expect this, as I strictly tried to use non-blocking tasks.
My observation is very similar to the one in the question "ForkJoinPool seems to waste a thread".
After debugging a lot into ForkJoinPool I have a guess:
I used invokeAll() to distribute work over a list of subtasks. After invokeAll() has finished executing the first task itself, it starts joining the other ones. This works fine as long as the next task to join is on top of the execution queue. Unfortunately, I submitted additional tasks asynchronously without joining them. I expected the ForkJoin framework to continue executing those tasks first and then turn back to joining any remaining tasks.
But it does not seem to work this way. Instead, the worker thread stalls, calling wait() until the task it is waiting for becomes ready (presumably executed by another worker thread). I did not verify this, but it seems to be a general flaw of calling join().
ForkJoinPool provides an asyncMode, but this is a global parameter and cannot be used for individual submissions. But I would like to see my asynchronously forked tasks executed soon.
So, why does ForkJoinTask.doJoin() not simply execute any available task on top of its queue until the joined task is ready (either executed by itself or stolen by others)?

Since nobody else seems to understand my question I try to explain what I found after some nights of debugging:
The current implementation of ForkJoinTasks works well if all fork/join calls are strictly paired. Illustrating a fork by an opening bracket and join by a closing one a perfect binary fork join pattern may look like this:
{([][]) ([][])} {([][]) ([][])}
If you use invokeAll() you may also submit a list of subtasks like this:
{([][][][]) ([][][][]) ([][][][])}
What I did however looks like this pattern:
{([) ([)} ... ]]
You may argue this looks ill or is a misuse of the fork-join framework. But the only constraint is that the tasks' completion dependencies are acyclic; otherwise you may run into a deadlock. As long as my [] tasks are not dependent on the () tasks, I don't see any problem with it. The offending ]]'s just express that I do not wait for them explicitly; they may finish some day, it does not matter to me (at that point).
Indeed the current implementation is able to execute my interlocked tasks, but only by spawning additional helper threads which is quite inefficient.
The flaw seems to be the current implementation of join(): joining an ) expects to see its corresponding ( on top of its execution queue, but it finds a [ and is perplexed. Instead of simply executing [] to get rid of it, the current thread suspends (calling wait()) until someone else comes around to execute the unexpected task. This causes a drastic performance break down.
My primary intent was to put additional work onto the queue to prevent the worker thread from suspending if the queue runs empty. Unfortunately the opposite happens :-(
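For comparison, here is a minimal sketch of the strictly paired pattern that the framework handles well, where every fork() is matched by a join() in the same compute() call. The class name PairedSum and the THRESHOLD value are assumptions made for this example, not anything from the code in the question.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a slice of an array with strictly paired fork/join.
class PairedSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off for sequential work
    private final long[] data;
    private final int from;
    private final int to;

    PairedSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        PairedSum left = new PairedSum(data, from, mid);
        PairedSum right = new PairedSum(data, mid, to);
        left.fork();                      // opening bracket: push left onto this worker's deque
        long rightSum = right.compute();  // work on the other half in the current thread
        return left.join() + rightSum;    // closing bracket: left is on top unless it was stolen
    }

    // Convenience entry point that runs the task on the common pool.
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new PairedSum(data, 0, data.length));
    }
}
```

Because nothing else is pushed between fork() and join(), the joining worker finds the task it expects on top of its own deque (or finds it stolen), which is the situation the question's interleaved pattern breaks.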

You are dead right about join(). I wrote this article two years ago that points out the problem with join().
As I said there, the framework cannot execute newly submitted requests until it finishes the earlier ones. And each WorkThread cannot steal until its current request finishes, which results in the wait().
The additional threads you see are "continuation threads." Since join() eventually issues a wait(), these threads are needed so the entire framework doesn't stall.

You’re not using this framework for the very narrow purpose for which it was intended.
The framework started life as the experiment in the 2000 research paper. It’s been modified since then but the basic design, fork-and-join on large arrays, remains the same. The basic purpose is to teach undergraduates how to walk down the leaves of a balanced tree. When people use it for other than simple array-processing weird things happen. What it is doing in Java7 is beyond me; which is the purpose of the article.
The problems only get worse in Java8. There it’s the engine to drive all stream parallel work. Have a read in part two of that article. The lambda interest lists are filled with reports of thread stalls, stack overflow, and out of memory errors.
You use it at your own risk when you don’t use it for pure recursive decomposition of large data structures. Even then, the excessive threads it creates can cause havoc. I’m not going to pursue this discussion any further.

Set Java exit code without exiting yet

In a highly concurrent program with lots of shutdown operations, how can I set the exit code without prematurely calling System.exit()? Is it possible to register an "execute this code when everything else is done" method? What I'd really like is simply to set the exit code ahead of time.

If I understand correctly what you want is to somehow keep the exit code, run some methods and then call System.exit with the pre-decided exit code.
IMO what you should do is use shutdown hooks instead. Your code will run before the JVM shuts down and (if I got your requirement correctly) you get the same result with a straightforward implementation, instead of using a state variable and unusual coding logic to achieve what you are trying to do.
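A minimal sketch of the shutdown hook idea, using Runtime.addShutdownHook from the standard library. The class name ShutdownHookSketch and the thread name "cleanup-hook" are made up for illustration.

```java
// Hypothetical helper: registers cleanup code that the JVM runs while shutting down.
class ShutdownHookSketch {
    // Wraps the cleanup in a not-yet-started thread and hands it to the runtime.
    static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook); // started by the JVM during shutdown
        return hook;
    }
}
```

With this in place, any thread can call System.exit(3) and the registered cleanup still runs before the process ends with status 3. The hook itself should not call System.exit(), because that call blocks while shutdown hooks are running.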

Have a master thread spawn off all other threads such that it only shuts down when all other threads are complete.

In a highly concurrent program with lots of shutdown operations
This is a code smell to me.
I can understand how multiple threads might want to shut down, but they shouldn't be allowed to do so.
Instead, I would create a global method called initiateShutdown(int code). This method would contain logic to determine when it's appropriate to actually shut down. Since you may not want a thread returning from this method, you could implement some sort of never-returning lock, and consign the thread to waiting on this lock.

Just store the result somewhere and use any suitable synchronization tool to tell that you are done. When you are done, just read the stored result and exit using System.exit(result).
I'm curious, if several threads set the result, which should you use?
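One possible answer to "which should you use" is "the first one wins". A sketch under that assumption is below; the class name ExitCode and the UNSET marker are invented for this example, and other policies (for instance keeping the highest code) would work just as well.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder: threads record an exit code now, the main thread exits later.
class ExitCode {
    private static final int UNSET = Integer.MIN_VALUE; // assumed "nothing recorded yet" marker
    private final AtomicInteger code = new AtomicInteger(UNSET);

    // First writer wins; later calls are ignored and return false.
    boolean set(int value) {
        return code.compareAndSet(UNSET, value);
    }

    // Returns the recorded code, or 0 if no thread recorded one.
    int get() {
        int value = code.get();
        return value == UNSET ? 0 : value;
    }
}
```

Share one instance between all threads, and when everything has finished call System.exit(exitCode.get()) from the main thread.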

Confused on how Java shares variables during multiprocessing

I just started using Java, so sorry if this question's answer is obvious. I can't really figure out how to share variables in Java. I have been playing around with Python and wanted to try to port some code over to Java to learn the language a bit better. A lot of my code is ported, but I'm unsure how exactly multiprocessing and sharing of variables works in Java (my process is not disk bound, and uses a lot of CPU and searching of a list).
In Python, I can do this:
from multiprocessing import Pool, Manager
manager = Manager()
shared_list = manager.list()
pool = Pool(processes=4)
for variables_to_send in list_of_data_to_process:
    pool.apply_async(function_or_class, (variables_to_send, shared_list))
pool.close()
pool.join()
I've been having a bit of trouble figuring out how to do multiprocessing and sharing like this in Java. This question helped me understand a bit (via the code) how implementing Runnable can help, and I'm starting to think Java might automatically spread threads across processors (correct me if I'm wrong on this; I read that once threads exceed the capacity of a CPU they are moved to another CPU? The Oracle docs seem to be more focused on threads than multiprocessing). But it doesn't explain how to share lists or other variables between processes (and keep them in close enough sync).
Any suggestions or resources? I am hoping I'm searching for the wrong thing(multiprocessing java) and that this is hopefully as easy(or similarly straightforward) as it is in my above code.
Thanks!

There is an important difference between a thread and a process, and you are running into it now: with some exceptions, threads share memory, but processes do not.
Note that real operating systems have ways around just about everything I'm about to say, but these features aren't used in the typical case. So, to fire up a new process, you must clone the current process in some way with a system call (on *nix, this is fork()), and then replace the code, stack, command-line arguments, etc. of the child process with another system call (on *nix, this is the exec() family of system calls). Windows has rough equivalents of both these system calls, so everything I'm saying is cross-platform. Also, the Java Runtime Environment takes care of all these system calls under the covers, and without JNI or some other interop technology you can't really execute them yourself.
There are two important things to note about this model: the child process doesn't share the address space of the parent process, and the entire address space of the child process gets replaced on the exec() call. So, variables in the parent process are unavailable to the child process, and vice versa.
The thread model is quite different. Threads are kind of like lite processes, in that each thread has its own instruction pointer, and (on most systems) threads are scheduled by the operating system scheduler. However, a thread is a part of a process. Each process has at least one thread, and all the threads in the process share memory.
Now to your problem:
The Python multiprocessing module spawns processes with very little effort, as your code example shows. In Java, spawning a new process takes a little more work. It involves creating a new Process object using ProcessBuilder.start() or Runtime.exec(). Then, you can pipe strings to the child process, get back its output, wait for it to exit, and a few other communication primitives. I would recommend writing one program to act as the coordinator and fire up each of the child processes, and writing a worker program that roughly corresponds to function_or_class in your example. The coordinator can open multiple copies of the worker program, give each a task, and wait for all the workers to finish.
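A rough sketch of the coordinator side, using ProcessBuilder from the standard library. The class name Coordinator and both helper methods are made up for illustration, and the worker program it would launch is assumed to exist separately.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical coordinator: starts child JVMs and waits for each one to finish.
class Coordinator {
    // Builds a command line that launches the same java binary this JVM is running on.
    static List<String> javaCommand(String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Starts one child process that shares our stdout/stderr and returns its exit code.
    static int runChild(List<String> command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        return child.waitFor();
    }
}
```

A call such as Coordinator.runChild(Coordinator.javaCommand("-cp", "workers.jar", "Worker", "task-1")) would start one worker, where workers.jar and Worker are placeholders. Nothing in memory is shared with the child, so any results have to come back through its output, files, or sockets.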

You can use Java threads for this purpose. Create a user-defined class with a setter method through which you can set the shared_list object. Have the class implement the Runnable interface and perform the processing task in its run() method. You can find good examples on the internet. If you are sharing the same instance of shared_list between threads, you need to make sure that access to it is synchronized.

This is not the easiest way to work with threads in Java, but it's the closest to the Python code you posted. The Task class implements the Callable interface and has a call method. When we create each of the 10000 Task instances we pass them a reference to the same list, so when the call method of all those objects is called they will use the same list.
We are using a fixed-size thread pool of 4 threads here, so all the tasks we submit get queued and wait for a thread to be available.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedListRunner {
    public void runList() {
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        // A plain ArrayList is not thread-safe, so wrap it before sharing it
        List<String> sharedList = Collections.synchronizedList(new ArrayList<String>());
        sharedList.add("Hello");
        for (int i = 0; i < 10000; i++)
            executorService.submit(new Task(sharedList));
        executorService.shutdown(); // let queued tasks finish, then release the pool threads
    }
}

public class Task implements Callable<String> {
    private final List<String> sharedList;

    public Task(List<String> sharedList) {
        this.sharedList = sharedList;
    }

    @Override
    public String call() throws Exception {
        // Do something to the shared list
        sharedList.size();
        return "World";
    }
}
At any one time, at most 4 threads are accessing the list. If you want to dig further: on current mainstream JVMs each of those 4 Java threads is backed by its own OS thread, and the OS schedules them onto the hardware threads of your CPU, typically 1 or 2 per core.

Analysing a multi-threaded Java application

In an open source application I'm participating in, we've got a bug where the application doesn't always close properly. That's what I'd like to solve.
Experience has shown that this happens most of the time when threads and processes are being started, but not managed correctly (e.g. a thread is waiting on a socket connection, the application is being shut down and the thread keeps on waiting).
With this in mind I've searched for '.start()' in the entire source and found 53 occurrences (which scared me a bit).
As a first step, I wanted to create a helper class (ThreadExecutor) where the current code 'thread.start()' would be replaced by 'ThreadExecutor.Execute(thread)' to have a) only a few changes in the existing source and b) a single class where I can easily check which threads don't end as they should. To do this I wanted to:
1. add the thread to be executed to a list called activeThreads when calling the Execute method
2. start the thread
3. remove it from the activeThreads list when it ends.
This way I'd have an up to date list of all executing threads and when the app hangs on shutdown I could see in there which thread(s) is(are) causing it.
Questions:
What do you think about the concept? I'm usually coding c# and know how I'd do it using .NET with workers, but am not too sure what's best in Java (I'd like to modify as few lines of code as possible in the existing source).
If the concept seems ok, how can I get notified of a thread terminating. I'd like to avoid having an additional thread checking every once in a while what the state of all threads contained in activeThreads is, to remove them if they terminated.
Just to clarify: Before figuring out how to terminate the application properly, what I'm asking here is what's the best/easiest way to find which threads are at cause for certain test cases which are pretty hard to reproduce.

I would attempt to analyze your application's behavior before changing any code. Code changes can introduce new problems - not what you want to do if you're trying to solve problems.
The easiest way to introspect the state of your application with regard to which threads are currently running is to obtain a thread dump. You said that your problem is that the application hangs on shutdown. This is the perfect scenario to apply a thread dump. You'll be able to see which threads are blocked.
You can read more about thread dumps here.
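A thread dump is normally taken from outside the process (for example with the JDK's jstack tool), but if it is easier to call something from the shutdown code, Thread.getAllStackTraces() gives a similar picture from inside. This is only a sketch; the class name LiveThreads and its two methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an in-process approximation of what a thread dump shows.
class LiveThreads {
    // Names of live non-daemon threads; these are the ones that keep the JVM from exiting.
    static List<String> nonDaemonNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) names.add(t.getName());
        }
        return names;
    }

    // Prints every live thread with its state and current stack frames.
    static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.println(t.getName() + " daemon=" + t.isDaemon() + " state=" + t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

Calling LiveThreads.dump() at the point where the application should have exited shows which threads are still blocked and where.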

Try to make all threads daemon threads (when all remaining threads are daemon threads, the JVM terminates). Call thread.setDaemon(true) before starting each thread.
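A small sketch of that suggestion; the class name Daemons and the helper startDaemon are made up for illustration.

```java
// Hypothetical helper: starts a task on a daemon thread, which never blocks JVM exit.
class Daemons {
    static Thread startDaemon(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true); // must happen before start(), otherwise IllegalThreadStateException
        t.start();
        return t;
    }
}
```

Keep in mind that daemon threads are simply abandoned when the JVM exits, so this hides a thread that is stuck rather than telling you which one it is.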

You could try to look into your application using jvisualvm (which is shipped with the JDK; find it in the bin folder of your JDK). JVisualVM can connect to your application and display a lot of interesting information, including which threads are still running. I'd give that a shot before starting down the road you describe.
Here is some documentation on JVisualVM should you need it.

The best way in Java is to use thread pools instead of creating threads directly (although using threads directly is acceptable). Thread pools accept Runnable objects, which you can think of as tasks. The idea is that most threads do a small task and then end; because creating a Thread is expensive and threads are harder to manage, you can use a thread pool instead, which gives you things like ThreadPoolExecutor.awaitTermination(). If you have more tasks than threads in the pool, the remaining tasks are simply queued.
Changing a Thread into a Runnable is easy, and you can even execute a runnable on a Thread you make yourself.
Note that this might not work for threads that run a long time, but your question seems to suggest that they will eventually finish.
As for your second question, the best way to find out which threads are running at a certain point is to run the application in a debugger (such as Eclipse) and pause all threads on a breakpoint in the close function.

I would try the trial edition of jprofiler or something similar, which gives you a lot of insight into what your application and its threads actually do.
Don't change the code yet, but try to reproduce and understand when this happens.

Create yourself a static thread pool.
static ExecutorService threads = Executors.newCachedThreadPool();
For every start of thread change:
new Thread(new AThread()).start();
to
threads.submit(new AThread ());
When your code exits, shut the pool down with:
List<Runnable> pending = threads.shutdownNow();
for ( Runnable t : pending ) {
System.out.println("Task not started at shutdown: "+t.toString());
}
This interrupts all tasks that are still running. Note that shutdownNow() returns only the tasks that were queued but never started, so with a cached thread pool the list is usually empty; to see which tasks are still running, keep the Future of each submission as described below.
EDIT: Added
If you want to keep track of all running threads use:
Future<?> f = threads.submit(new AThread());
and store it in a list somewhere. You can then find out about its state with calls like:
f.isDone();
... etc.
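The tracking idea from the question can also be done without a polling thread by letting each thread remove itself when it ends. A rough sketch, assuming every thread gets a unique name; this ThreadExecutor is a hypothetical version of the helper class described in the question, not existing code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical version of the ThreadExecutor helper from the question.
class ThreadExecutor {
    private static final Set<String> activeThreads = ConcurrentHashMap.newKeySet();

    // Starts the task on a new thread and tracks it until run() returns or throws.
    static Thread execute(Runnable task, String name) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } finally {
                activeThreads.remove(name); // the thread cleans up after itself
            }
        }, name);
        activeThreads.add(name);            // register before start() so nothing is missed
        t.start();
        return t;
    }

    // Snapshot of the names of threads that have not finished yet.
    static Set<String> active() {
        return new HashSet<>(activeThreads);
    }
}
```

Because Thread implements Runnable, an existing Thread object can be passed as the task, which keeps the change at each call site small. Printing ThreadExecutor.active() when shutdown hangs then lists the threads that are still running.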

Java threads query

I'm working on a Java application that involves threads, so I wrote a piece of code to familiarize myself with the execution of multiple concurrent threads:
public class thready implements Runnable {
    private int num;

    public thready(int a) {
        this.num = a;
    }

    public void run() {
        System.out.println("This is thread num" + num);
        for (int i = num; i < 100; i++) {
            System.out.println(i);
        }
    }

    public static void main(String[] args) {
        Runnable runnable = new thready(1);
        Runnable run = new thready(2);
        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(run);
        t1.start();
        t2.start();
    }
}
From the output of this code, I think that at any point in time only one thread is executing, and that execution alternates between the threads. I would like to know if my understanding of the situation is correct. If it is, I would like to know if there is any way to get both threads executing simultaneously, because I want to write a TCP/IP socket listener that listens on 2 ports at the same time. Such a scenario can't have any downtime.
Any suggestions/advice would be of great help.
Cheers

How many processors does your machine have? If you have multiple cores, then both threads should be running at the same time. However, console output may well be buffered and will require locking internally - that's likely to be the effect you're seeing.
The easiest way to test this is to make the threads do some real work, and time them. First run the two tasks sequentially, then run them in parallel on two different threads. If the two tasks don't interact with each other at all (including "hidden" interactions like the console) then you should see a roughly 2x performance improvement using two threads - if you have two cores or more.
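A sketch of that experiment is below. The class name TimingSketch and the arithmetic in burn() are arbitrary choices made for this example; any CPU-bound work without console output or shared state would do.

```java
// Hypothetical benchmark: the same CPU-bound task run twice in sequence, then on two threads.
class TimingSketch {
    // Pure CPU work with no shared state and no console output.
    static long burn(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) acc += i % 7;
        return acc;
    }

    // Runs the two tasks one after the other on the calling thread.
    static long[] sequential(long n) {
        return new long[] { burn(n), burn(n) };
    }

    // Runs the two tasks on two separate threads and waits for both.
    static long[] parallel(long n) {
        long[] out = new long[2];
        Thread a = new Thread(() -> out[0] = burn(n));
        Thread b = new Thread(() -> out[1] = burn(n));
        a.start();
        b.start();
        try {
            a.join(); // join() also makes the writes to out visible to this thread
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    // Wall-clock time of one run, in milliseconds.
    static long millis(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Comparing TimingSketch.millis(() -> TimingSketch.sequential(500_000_000L)) with the same call for parallel gives a rough idea; run each a few times first so the JIT has warmed up.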
As Thilo said though, this may well not be relevant for your real scenario anyway. Even a single-threaded system can still listen on two sockets, although it's easier to have one thread responsible for each socket. In most situations where you're listening on sockets, you'll spend a lot of the time waiting for more data anyway - in which case it doesn't matter whether you've got more than one core or not.
EDIT: As you're running on a machine with a single core (and assuming no hyperthreading) you will only get one thread executing at a time, pretty much by definition. The scheduler will make sure that both threads get CPU time, but they'll basically have to take turns.

If you have more than one CPU, both threads can run simultaneously. Even if you have only one CPU, as soon as one of the threads waits for I/O, the other can use the CPU. The scheduler will most likely also try to dole out CPU time slices fairly. So for all practical purposes (unless all they do is use the CPU), your threads will run simultaneously (as in: within a given second, each of them had access to the CPU).
So even with a single CPU, you can have two threads listening on a TCP/IP socket each.
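A minimal sketch of the one-thread-per-socket approach. The class name TwoListeners, the greeting it sends and the choice of daemon threads are all assumptions made for this example; a real listener would read from the client as well.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch: one daemon thread per listening socket.
class TwoListeners {
    // Binds to the given port (0 = any free port) and greets every client from its own thread.
    static ServerSocket listen(int port, String greeting) throws IOException {
        ServerSocket server = new ServerSocket(port);
        Thread t = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket client = server.accept(); // blocks without using CPU
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println(greeting);
                } catch (IOException e) {
                    // accept() throws once the server socket is closed, which ends the loop;
                    // other I/O errors are simply ignored in this sketch
                }
            }
        }, "listener-" + server.getLocalPort());
        t.setDaemon(true);
        t.start();
        return server;
    }
}
```

Calling listen twice with two different ports gives two independent listeners; while one thread is blocked in accept() the other is free to serve its own clients, even on a single core.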

Make the threads sleep in between the println statements. What you have executes too fast for you to see the effect.

Threads are just a method of virtualizing the CPU so that it can be used by several applications/threads simultaneously. But as a single CPU core can only execute one program at a time, the operating system switches between the different threads/processes very fast.
If you have a CPU with just one core (leaving aside hyperthreading) then your observation, that only one thread is executing at a time, is completely correct. And it's not possible in any other way, you're not doing anything wrong.

If the threads each take less than a single CPU quantum, they will appear to run sequentially. Printing 100 numbers in a row is probably not intensive enough to use up an entire quantum, so you're probably seeing sequential running of threads.
As well, like others have suggested, you probably have two CPUs, or a hyperthreaded CPU at least. The last pure single-core systems were produced around a decade ago, so it's unlikely that your threads aren't running side by side.
Try increasing the amount of processing that you do, and you might see the output intermingle. Be aware that when you do, System.out.println is NOT threadsafe, as far as I know. You'll get one thread interrupting the output of another mid-line.

They do run simultaneously; they just can't use the output stream at the same time.
Replace your run() method with this:
public void run() {
    for (int i = num; i < 100; i++) {
        try {
            Thread.sleep(100);
            System.out.println("Thread " + num + ": " + i);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

If you are getting many messages per second and processing each piece of data takes a few milliseconds to a few seconds, it is not a good idea to start one thread per message. Ultimately the number of threads you can spawn is limited by the underlying OS, and you may get an out-of-threads error or something like that.
Java 5 introduced a thread pool framework where you can allocate a fixed number of threads and submit jobs (instances of Runnable). The framework runs each job on one of the available threads in the pool. It is more efficient, as there is not much context switching. I wrote a blog entry to jump-start on this framework.
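A minimal sketch of that framework using the standard java.util.concurrent classes. The class name PoolSketch and the trivial counting job are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: many small jobs share a fixed number of pooled threads.
class PoolSketch {
    // Runs `jobs` trivial jobs on `threads` pooled threads and returns how many completed.
    static int runJobs(int jobs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            // Each job is a Runnable; it waits in the pool's queue until a thread is free.
            pool.submit(() -> { completed.incrementAndGet(); });
        }
        pool.shutdown(); // accept no new jobs, but finish the queued ones
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

However many jobs are submitted, no more than `threads` of them run at once, which is what keeps the thread count bounded under a high message rate.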
http://dudefrommangalore.blogspot.com/2010/01/concurrency-in-java.html
Cheers,
-- baliga

As for listening on 2 ports: clients have to send their messages to one of them. But since both ports are opened to accept connections within a single JVM, having 2 ports does not give you high availability if the JVM fails.
The usual pattern for writing a server that listens on a port is to have one thread listen on the port. As soon as data arrives, spawn another thread, hand over the content as well as the client socket to the newly spawned thread, and continue accepting new messages.
Another pattern is to have multiple threads listen on the same socket. When a client connects, the connection is made to one of the threads.

Two ways this could go wrong:
System.out.println() may use a buffer; you should call System.out.flush() to get the output to the screen.
There has to be some synchronisation built into the System.out object, or you couldn't use it in a multithreaded application without messing up the output, so it is likely that one thread holds a lock for most of the time, making the other thread wait. Try using System.out in one thread and System.err in the other.

Go and read up on multitasking and multiprogramming. http://en.wikipedia.org/wiki/Computer_multitasking
