Multithreading -- Why is one thread doing all of the work? - java

I am multiplying two matrices using two threads (however, the program is written to scale up as well, so I could possibly use three, four, etc threads instead). Each thread calculates/does the work for one row (or column) of the final matrix. If one thread is doing work on a row, the other one(s) should not work on that row. It/they should move on to the next available row.
First of all, I am not certain if the way I implemented the problem is correct. If you can see a better way, please let me know.
Secondly, the way I have done it, every time I test it (with different size matrices--even huge ones), only one thread does the work. That is, each time, the same thread is getting access to the synchronized block of the run() method. The other threads are entering the run() method, but why is only one thread always gaining the lock and doing all of the work?
This is my run method:
public void run() {
    System.out.println(Thread.currentThread().getName());
    while (i < number of columns in final matrix) {
        synchronized (this) {
            if (i < number of columns in final matrix) {
                for (int j = 0; j < Main.B[0].length; j++) {
                    for (int k = 0; k < Main.A[0].length; k++) {
                        Main.C[i][j] += Main.A[i][k] * Main.B[k][j];
                    }
                }
                i++;
            }
        }
    }
}
This is the code in my driver class that creates the threads and starts the program:
MyRunnable r = new MyRunnable();
Thread thread1 = new Thread(r);
Thread thread2 = new Thread(r);

thread1.start();
thread2.start();

try {
    thread1.join();
    thread2.join();
} catch (InterruptedException ie) {
    System.out.println("\nThe following error occurred: " + ie);
}
I guess my question is two-fold: is my approach correct for the problem at hand? And either way, why is one thread always grabbing the lock and doing all of the work? I have checked the program with up to 6 threads on 20x20 matrices, and only one thread ever does the work.

As some of the comments suggested, the problem is in the locking (i.e. the synchronized(this) part). Synchronizing is done on this, which in your case is a single instance of MyRunnable, so while one thread is doing the work inside the synchronized block, all other threads will wait until that work is finished. So effectively, only one thread is doing real work at a time.
Here's how to solve the problem. Since you need your threads to work on different rows in parallel, this work must not be synchronized by a lock (because locking means the opposite: only one thread can do the work at a time). What you do need to synchronize is the part where each thread decides which row it will work on.
Here's a sample pseudo code:
public void run() {
    int workRow;
    synchronized (this) {
        workRow = findNextUnprocessedRow();
    }
    for (int i = 0; i < matrix[workRow].length; i++) {
        // do the work
    }
}
Note that the actual work is intentionally not synchronized, for reasons given above.
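To make that concrete for the matrix example in the question, here is a rough sketch (not a drop-in replacement): it assumes double matrices A, B and C as in your Main class and keeps a shared nextRow counter inside the Runnable. The class and field names are illustrative.
public class RowMultiplier implements Runnable {
    private final double[][] A, B, C;
    private int nextRow = 0; // guarded by synchronized(this)

    public RowMultiplier(double[][] A, double[][] B, double[][] C) {
        this.A = A;
        this.B = B;
        this.C = C;
    }

    @Override
    public void run() {
        while (true) {
            int row;
            // Only the row assignment is synchronized...
            synchronized (this) {
                if (nextRow >= C.length) {
                    return; // no rows left
                }
                row = nextRow++;
            }
            // ...the actual multiplication runs in parallel.
            for (int j = 0; j < B[0].length; j++) {
                for (int k = 0; k < A[0].length; k++) {
                    C[row][j] += A[row][k] * B[k][j];
                }
            }
        }
    }
}
Just like in your driver code, every thread must be given the same RowMultiplier instance; otherwise each thread would have its own nextRow counter and they would all redo the same rows.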
The way you are using threads is correct, so there is no problem with that; however, I would suggest you have a look at Java's concurrency API, in particular thread pools. Here's an example of how to use it in your context:
// Creates a pool of 5 concurrent thread workers
ExecutorService es = Executors.newFixedThreadPool(5);
// List of results for each row computation task
List<Future<Void>> results = new ArrayList<Future<Void>>();
try {
    for (int row = 0; row < matrix.length; row++) {
        final int workRow = row;
        // The main part. You can submit Callable or Runnable
        // tasks to the ExecutorService, and it will run them
        // for you in the number of threads you have allocated.
        // If you submit more than 5 tasks, they will just patiently
        // wait for a task to finish and release a thread, then run.
        Future<Void> task = es.submit(new Callable<Void>() {
            @Override
            public Void call() {
                for (int col = 0; col < matrix[workRow].length; col++) {
                    // do something for each column of workRow
                }
                return null;
            }
        });
        // Store the work task in the list.
        results.add(task);
    }
} finally {
    // Make sure the thread pool is shut down and all worker
    // threads are released.
    es.shutdown();
}
for (Future<Void> task : results) {
    try {
        // This will wait for the task's thread to finish,
        // i.e. same as Thread.join()
        task.get();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    } catch (ExecutionException e) {
        // One of the tasks threw an exception!
        throw new RuntimeException(e);
    }
}
This approach is a lot cleaner, because the work distribution is done by the main thread (the outer for-loop), and therefore there is no need to synchronize it.
You also get a few bonuses when working with thread pools:
It nicely takes care of any exceptions during the computations in each
of the threads. When working with bare threads, like in your approach, it is easy
to "lose" an exception.
Threads are pooled. That is, they get automatically reused so you don't need to worry about the cost of spawning new threads. This is particularly useful in your case, since you will need to spawn a thread per row in your matrix, which may be fairly large, I suspect.
Tasks submitted to an ExecutorService are wrapped in a useful Future<Result> object, which is most useful when each computation task actually returns some kind of result. In your case, if you needed to sum up all values in the matrix, then each computation task could return the sum for its row, and you'd just need to add those up (see the sketch below).
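As a sketch of that last point (assuming a long[][] matrix field and the es pool from the example above; the variable names are made up):
List<Future<Long>> rowSums = new ArrayList<Future<Long>>();
for (int row = 0; row < matrix.length; row++) {
    final int workRow = row;
    rowSums.add(es.submit(new Callable<Long>() {
        @Override
        public Long call() {
            long sum = 0;
            for (int col = 0; col < matrix[workRow].length; col++) {
                sum += matrix[workRow][col];
            }
            return sum;
        }
    }));
}
long total = 0;
for (Future<Long> f : rowSums) {
    try {
        total += f.get(); // blocks until that row's sum is ready
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    } catch (ExecutionException e) {
        throw new RuntimeException(e);
    }
}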
Got a bit long, but hope it clears some things up.

Your problem is that you synchronize the whole region with synchronized(this). This means that only one thread at a time is allowed to enter the loop doing the calculation. Different threads may end up calculating different parts, but never at the same time. This also means your "parallel" solution is not faster than a single thread.
If you want to do the calculation in parallel, have a look at Parallel Matrix Multiplication in Java 6 and Fork Join Matrix Multiplication in Java, which should cover the topic.

Thread scheduling depends on the particular VM implementation. In some implementations a thread will continue to run until it blocks in some way or is preempted by a higher priority thread. In your case all the threads have the same priority, so the first thread to enter the synchronized block never blocks, and it does not get preempted. Some schedulers implement priority aging, such that a starved thread will eventually increase in priority, but you may not be running long enough for that to have an effect.
Add a Thread.yield() call just after the end of the synchronized block. This tells the scheduler to pick a new thread to run (maybe the same one, but probably a different one).
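Applied to the run() method from the question, that would look roughly like this (the pseudocode condition is kept as-is; only the Thread.yield() line is new):
while (i < number of columns in final matrix) {
    synchronized (this) {
        if (i < number of columns in final matrix) {
            // ... compute row i as before ...
            i++;
        }
    }
    Thread.yield(); // hint to the scheduler to let another thread run
}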

Your run function has the first thread to get the lock do all the work on a row while still owning the lock. For the next row, maybe another thread will get the lock, but it will block all other threads until it is done.
What I would do is have an array of booleans that is the same as the number of rows, and use these to claim the task of processing each individual row. It would be something like the following pseudocode:
// before creating the threads, pre-fill BoolList with trues
function run()
{
    while (true)
    {
        lock(BoolList)
        {
            // find the first true value and set it to false
            // if no true is found, return
        }
        // do the actual math of multiplying the row we claimed above
    }
}
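A minimal Java version of that pseudocode might look like the following sketch. It assumes a shared rowAvailable array pre-filled with true (one flag per row) and a hypothetical multiplyRow() helper that does the actual math:
private final boolean[] rowAvailable; // one flag per row, all true initially

@Override
public void run() {
    while (true) {
        int claimed = -1;
        synchronized (rowAvailable) {
            for (int r = 0; r < rowAvailable.length; r++) {
                if (rowAvailable[r]) {       // first row nobody has claimed yet
                    rowAvailable[r] = false; // claim it
                    claimed = r;
                    break;
                }
            }
        }
        if (claimed == -1) {
            return; // every row is taken; nothing left to do
        }
        multiplyRow(claimed); // the actual math, done outside the lock
    }
}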
Also keep in mind that the overhead of creating a new thread is sufficient that multi-threading this program would only be worth it for large matrices.

As mru already stated in his comment, your problem is that all row calculation is performed inside the "synchronized (this)" block. Because of this, all threads will wait for one row to be processed before starting on the next one, and the same thread always acquiring the lock is probably a result of scheduler/lock optimization, since you have effectively made the calculation single-threaded. You might consider putting only the decision of which row to process inside the synchronized block:
int rowToProcess;
synchronized (this) {
    if (i < number of columns in final matrix) {
        rowToProcess = i;
        i++;
    } else {
        return;
    }
}

Related

Run functions concurrently in java with wait [duplicate]

During the course of my program execution, a number of threads are started. The amount of threads varies depending on user defined settings, but they are all executing the same method with different variables.
In some situations, a cleanup is required mid-execution. Part of this is stopping all the threads; I don't want them to stop immediately, though, so I just set a variable that they check and that terminates them. The problem is that it can take up to half a second before a thread stops. However, I need to be sure that all threads have stopped before the cleanup can continue. The cleanup is executed from another thread, so technically I need this thread to wait for the other threads to finish.
I have thought of several ways of doing this, but they all seem to be overly complex. I was hoping there would be some method that can wait for a group of threads to complete. Does anything like this exist?
Just join them one by one:
for (Thread thread : threads) {
    thread.join();
}
(You'll need to do something with InterruptedException, and you may well want to provide a time-out in case things go wrong, but that's the basic idea...)
If you are using java 1.5 or higher, you can try CyclicBarrier. You can pass the cleanup operation as its constructor parameter, and just call barrier.await() on all threads when there is a need for cleanup.
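A rough sketch of that idea, assuming you know the number of worker threads up front and that cleanup() is your cleanup routine (both names are placeholders):
// One barrier shared by all workers; the barrier action runs once all parties arrive.
final CyclicBarrier barrier = new CyclicBarrier(numberOfWorkers, new Runnable() {
    public void run() {
        cleanup(); // executed by the last thread to arrive
    }
});

// Inside each worker thread, once it notices the stop flag:
try {
    barrier.await(); // blocks until every worker has reached this point
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} catch (BrokenBarrierException e) {
    // another worker was interrupted or the barrier was reset
}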
Have you seen the Executor classes in java.util.concurrent? You could run your threads through an ExecutorService. It gives you a single object you can use to cancel the threads or wait for them to complete.
Define a utility method (or methods) yourself:
public static void waitFor(Collection<? extends Thread> c) throws InterruptedException {
    for (Thread t : c) t.join();
}
Or, if you have an array:
public static void waitFor(Thread[] ts) throws InterruptedException {
    waitFor(Arrays.asList(ts));
}
Alternatively you could look at using a CyclicBarrier in the java.util.concurrent library to implement an arbitrary rendezvous point between multiple threads.
If you control the creation of the Threads (submission to an ExecutorService) then it appears you can use an ExecutorCompletionService
see ExecutorCompletionService? Why do need one if we have invokeAll? for various answers there.
If you don't control thread creation, here is an approach that allows you to join the threads "one by one as they finish" (and know which one finishes first, etc.), inspired by the ruby ThreadWait class.
Basically by newing up "watching threads" which alert when the other threads terminate, you can know when the "next" thread out of many terminates.
You'd use it something like this:
JoinThreads join = new JoinThreads(threads);
for (int i = 0; i < threads.size(); i++) {
    Thread justJoined = join.joinNextThread();
    System.out.println("Done with a thread, just joined=" + justJoined);
}
And the source:
public static class JoinThreads {
    java.util.concurrent.LinkedBlockingQueue<Thread> doneThreads =
        new LinkedBlockingQueue<Thread>();

    public JoinThreads(List<Thread> threads) {
        for (Thread t : threads) {
            final Thread joinThis = t;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        joinThis.join();
                        doneThreads.add(joinThis);
                    } catch (InterruptedException e) {
                        // "should" never get here, since we control this thread
                        // and don't call interrupt on it
                    }
                }
            }).start();
        }
    }

    Thread joinNextThread() throws InterruptedException {
        return doneThreads.take();
    }
}
The nice part of this is that it works with generic Java threads without modification; any thread can be joined. The caveat is that it requires some extra thread creation. Also, this particular implementation "leaves threads behind" if you don't call joinNextThread() the full number of times, and it doesn't have a "close" method, etc. Comment here if you'd like a more polished version created. You could also use this same type of pattern with "Futures" instead of Thread objects, etc.

Looping in Threads

Consider the following two designs of run method:
Approach A
public void run() {
    do {
        // do something
    } while (condition);
}
Approach B
public void run() {
    // do something...
    if (condition) {
        new Thread(this).start();
    }
}
The second approach seems cleaner to me, but after some debate, I have been told it's not a good idea to use approach two.
Question:
What are reasons (if there is any) that I shouldn't be using approach 2?
You have two things here. A loop, and a method that continuously runs itself again in a new thread until a condition is met (not a loop).
If you need a loop, you would choose the standard normal loop that everyone understands and works perfectly.
If you need to write a weird piece of code that creates new threads for no reason, and makes other developers doubt your skills and understanding, you would go for option B.
There's absolutely no sense in your choice B unless there would be something additional like a queue or ThreadPoolExecutor involved for re-invoking the method, so the method would add this at the end for invocation at a later time, sort of like a "lazy loop".
Because approach B creates a new thread for every iteration, while approach A needs only one. Creating threads is expensive, for a number of reasons; see Why is creating a Thread said to be expensive?
Approach A is also a little clearer to the reader, IMO. The simplest option usually is.
The 2nd option creates a new thread every time it is iterated, so it ends up being unnecessarily costly, especially when option A does the same thing but doesn't create new threads for every iteration.
The only good use-case I can find for Pattern B is if there is a significant delay before you want to re-run the method. For example for some kind of polling system that is supposed to run every X minutes until the system is being shut down.
In that case, using a scheduler instead of a Thread.sleep(fiveMinutes) makes sense to avoid tieing up resources unnecessarily (maybe you are holding on to a database connections or such).
Note that in that case, you'd be using a scheduler, not just Thread#start, so I am allowing for a rather liberal interpretation of Pattern B.
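For that polling case, the scheduler variant could be sketched like this (poll() stands in for whatever work is repeated):
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
// Re-runs poll() every five minutes without keeping a thread blocked in sleep().
scheduler.scheduleWithFixedDelay(new Runnable() {
    public void run() {
        poll();
    }
}, 0, 5, TimeUnit.MINUTES);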
They will behave very differently.
The first solution will loop until condition is false and then terminate.
The second solution will start a new thread and then die, repeating until the condition is false. It will likely accomplish what you want to do, but it will waste a lot of resources allocating and destroying new threads.
Here's an example that loops over 5 values and prints the value and current thread name:
Loop:
Runnable loop = new Runnable() {
    int i = 0;

    @Override
    public void run() {
        do {
            System.out.printf("%s: %s%n", Thread.currentThread().getName(), i);
            i++;
        } while (i < 5);
    }
};
loop.run();
main: 0
main: 1
main: 2
main: 3
main: 4
Threaded:
Runnable thread = new Runnable() {
    int i = 0;

    @Override
    public void run() {
        System.out.printf("%s: %s%n", Thread.currentThread().getName(), i);
        i++;
        if (i < 5) {
            new Thread(this).start();
        }
    }
};
thread.run();
main: 0
Thread-0: 1
Thread-1: 2
Thread-2: 3
Thread-3: 4
As you can see, the threaded example prints each line on a different thread which is very wasteful and probably not what you want to accomplish.

how to safely increment while using threads in java

Hi guys, I was wondering if I could get a little advice. I'm trying to write a program that counts how many threads are waiting to process a function and then, once a certain number is reached, releases all the threads. My problem is that I can't increment the counter properly: all the threads can run the increment code at the same time, so it doesn't increment correctly.
protected synchronized boolean isOpen()
{
    // this code just calls my method interested(), where the problem lies
    lock.interested();
    while (!lock.isReady())
    {
    }
    return true; // this statement releases all my threads
}

public synchronized void interested()
{
    count++; // how do I make this increment correctly with threads?
    System.out.println(count + "-" + lanes + "-" + ThreadID.get());
    if (count == lanes)
    {
        count = 0;
        ready = true;
    }
}
The problem with your approach is that only one thread can enter the synchronized method at a time, and hence you will never proceed: all but the first thread are waiting to enter the synchronized method while the first thread is performing a busy-wait loop. You have to use wait, which not only avoids the wasted CPU cycles of your busy wait but also releases the lock associated with the synchronized code so that the next thread can proceed:
protected synchronized boolean isOpen() throws InterruptedException
{
    lock.interested();
    while (!lock.isReady())
    {
        wait(); // now the next thread can enter isOpen()
    }
    notify(); // releases the previous thread wait()ing in this method
    return true;
}
However, note that this works quite unreliably due to your code being split over multiple different objects. It's strongly recommended to put the code maintaining the counter and the code implementing the waiting for the counter into one object, so that they run under the same lock. Your code structure must ensure that interested() can't be invoked on the lock instance without isOpen noticing. From the two code fragments you have posted, it's impossible to deduce whether this is the case.
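One possible way to keep the counting and the waiting in a single object, as a sketch (the Gate name, the lanes count and the awaitOpen() method are all made up for illustration):
public class Gate {
    private final int lanes;
    private int count = 0;

    public Gate(int lanes) {
        this.lanes = lanes;
    }

    // Each thread calls this once; it returns only after 'lanes' threads have arrived.
    public synchronized void awaitOpen() throws InterruptedException {
        count++;
        if (count == lanes) {
            notifyAll(); // the last thread in wakes everyone up
        } else {
            while (count < lanes) {
                wait(); // releases the monitor so the other threads can enter
            }
        }
    }
}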
write a program that counts how many threads are waiting to process a function, and then once a certain number is achieved it releases all the threads
A good solution will be to use CountDownLatch.
From the manual:
A CountDownLatch is initialized with a given count. The await methods
block until the current count reaches zero due to invocations of the
countDown() method, after which all waiting threads are released and
any subsequent invocations of await return immediately. This is a
one-shot phenomenon -- the count cannot be reset. If you need a
version that resets the count, consider using a CyclicBarrier.
You can find a good code example here
You should not use synchronized here, because only one thread can acquire the monitor at a time.
You can use CountDownLatch instead. Just specify the number of threads when initialising the CountDownLatch.
private CountDownLatch countDownLatch = new CountDownLatch(no_of_threads);

protected boolean isOpen() throws InterruptedException
{
    countDownLatch.countDown();
    countDownLatch.await();
    return true; // this statement releases all my threads
}
All the threads wait in countDownLatch.await(). Once the required number of threads has arrived (i.e. countDownLatch.countDown() has been called enough times), they are all allowed to proceed.
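Putting it together as a small sketch (NO_OF_THREADS is a placeholder for however many threads you expect; note that await() throws InterruptedException):
final CountDownLatch latch = new CountDownLatch(NO_OF_THREADS);

Runnable worker = new Runnable() {
    public void run() {
        try {
            latch.countDown(); // register this thread as having arrived
            latch.await();     // block until all threads have arrived
            // ... proceed: all threads are released together ...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
};

for (int i = 0; i < NO_OF_THREADS; i++) {
    new Thread(worker).start();
}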

Weak performance of CyclicBarrier with many threads: Would a tree-like synchronization structure be an alternative?

Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help building a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, thread 4 and 5 wait for thread 2, etc..
1
| \
2 3
|\ |\
4 5 6 7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?
With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.
You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
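One reading of that suggestion, sketched as a wait/notify-based counter (the class and method names are invented for illustration):
public class WorkCounter {
    private int outstanding = 0;

    // Call when a new unit of work is created.
    public synchronized void workAdded() {
        outstanding++;
        notifyAll();
    }

    // Call when a unit of work has been completed.
    public synchronized void workDone() {
        outstanding--;
        if (outstanding == 0) {
            notifyAll(); // wake anyone waiting for "all work finished"
        }
    }

    // Blocks until every outstanding unit of work has completed.
    public synchronized void awaitAllDone() throws InterruptedException {
        while (outstanding > 0) {
            wait();
        }
    }
}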
If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
    threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
    thread.start();
}
for (CustomThread thread : threads) {
    thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
    results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
    private final Input input;

    public SubJob(Input input) {
        this.input = input;
    }

    public Result call() {
        // Actually process the input here and return a result
        return JobWorker.processInput(input);
    }
}
Then to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {
    // Adjust the pool to the appropriate number of concurrent
    // threads you want running at the same time
    private static final ExecutorService pool = Executors.newFixedThreadPool(30);
    private final List<Input> inputs;

    public MainJob(List<Input> inputs) {
        super("MainJob");
        this.inputs = new ArrayList<Input>(inputs);
    }

    public void run() {
        CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
        List<Result> results = new ArrayList<Result>();
        int submittedJobs = inputs.size();
        for (Input input : inputs) {
            // Starts the job when a thread is available
            compService.submit(new SubJob(input));
        }
        try {
            for (int i = 0; i < submittedJobs; i++) {
                // Blocks until a job is completed
                results.add(compService.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        }
        // Do something with results
    }
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse threads if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there's so many jobs in the fixed thread pool waiting that sub jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shut down the ExecutorService manually, perhaps via a shutdown hook, or the threads it contains will not allow the JVM to terminate.
Hope that helps/makes sense.
If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
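A sketch of that invokeAll() approach, assuming each generation's work can be expressed as a list of Callables (buildTasksForGeneration() is a hypothetical helper):
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

List<Callable<Void>> generationTasks = buildTasksForGeneration(); // hypothetical helper
// invokeAll() blocks until every task in the list has completed, which acts as
// the "wait for this generation" barrier. It throws InterruptedException.
List<Future<Void>> generationResults = pool.invokeAll(generationTasks);
// ... build and submit the next generation ...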
If your tasks are not generational but instead form more of a tree-like structure, in which only a subset needs to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool, and you don't need Java 7 to do that: a reference implementation for Java 6 is available from JSR 166 (the jsr166y project), which is where the ForkJoinPool code was developed.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
    int n;
    Executor exec;

    Fib(final int n, final Executor exec) {
        this.n = n;
        this.exec = exec;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public Integer call() throws Exception {
        if (n == 0 || n == 1) {
            return n;
        }
        // Divide the problem
        final Fib n1 = new Fib(n - 1, exec);
        final Fib n2 = new Fib(n - 2, exec);
        // FutureTask only allows run to complete once
        final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
        // Ask the Executor for help
        exec.execute(n2Task);
        // Do half the work ourselves
        final int partialResult = n1.call();
        // Do the other half of the work if the Executor hasn't
        n2Task.run();
        // Return the combined result
        return partialResult + n2Task.get();
    }
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to solve Fibonacci.

ThreadPoolExecutor's getActiveCount()

I have a ThreadPoolExecutor that seems to be lying to me when I call getActiveCount(). I haven't done a lot of multithreaded programming however, so perhaps I'm doing something incorrectly.
Here's my TPE
@Override
public void afterPropertiesSet() throws Exception {
    BlockingQueue<Runnable> workQueue;
    int maxQueueLength = threadPoolConfiguration.getMaximumQueueLength();
    if (maxQueueLength == 0) {
        workQueue = new LinkedBlockingQueue<Runnable>();
    } else {
        workQueue = new LinkedBlockingQueue<Runnable>(maxQueueLength);
    }

    pool = new ThreadPoolExecutor(
            threadPoolConfiguration.getCorePoolSize(),
            threadPoolConfiguration.getMaximumPoolSize(),
            threadPoolConfiguration.getKeepAliveTime(),
            TimeUnit.valueOf(threadPoolConfiguration.getTimeUnit()),
            workQueue,
            // Default thread factory creates normal-priority,
            // non-daemon threads.
            Executors.defaultThreadFactory(),
            // Run any rejected task directly in the calling thread.
            // In this way no records will be lost due to rejection;
            // however, no records will be added to the workQueue
            // while the calling thread is processing a Task, so set
            // your queue-size appropriately.
            //
            // This also means MaxThreadCount+1 tasks may run
            // concurrently. If you REALLY want a max of MaxThreadCount
            // threads don't use this.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
In this class I also have a DAO that I pass into my Runnable (FooWorker), like so:
@Override
public void addTask(FooRecord record) {
    if (pool == null) {
        throw new FooException(ERROR_THREAD_POOL_CONFIGURATION_NOT_SET);
    }
    pool.execute(new FooWorker(context, calculator, dao, record));
}
FooWorker runs record (the only non-singleton) through a state machine via calculator then sends the transitions to the database via dao, like so:
public void run() {
    calculator.calculate(record);
    dao.save(record);
}
Once my main thread is done creating new tasks I try and wait to make sure all threads finished successfully:
while (pool.getActiveCount() > 0) {
    recordHandler.awaitTermination(terminationTimeout, terminationTimeoutUnit);
}
What I'm seeing from output logs (which are presumably unreliable due to the threading) is that getActiveCount() is returning zero too early, and the while() loop is exiting while my last threads are still printing output from calculator.
Note I've also tried calling pool.shutdown() then using awaitTermination but then the next time my job runs the pool is still shut down.
My only guess is that inside a thread, when I send data into the dao (since it's a singleton created by Spring in the main thread...), java is considering the thread inactive since (I assume) it's processing in/waiting on the main thread.
Intuitively, based only on what I'm seeing, that's my guess. But... is that really what's happening? Is there a way to "do it right" without putting a manually incremented variable at the top of run() and decrementing it at the end to track the number of running threads?
If the answer is "don't pass in the dao", then wouldn't I have to "new" a DAO for every thread? My process is already a (beautiful, efficient) beast, but that would really suck.
As the JavaDoc of getActiveCount states, it's an approximate value: you should not base any major business logic decisions on this.
If you want to wait for all scheduled tasks to complete, then you should simply use
pool.shutdown();
pool.awaitTermination(terminationTimeout, terminationTimeoutUnit);
If you need to wait for a specific task to finish, you should use submit() instead of execute() and then check the Future object for completion (either using isDone() if you want to do it non-blocking or by simply calling get() which blocks until the task is done).
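For example, sticking with the classes from the question (a sketch only):
Future<?> result = pool.submit(new FooWorker(context, calculator, dao, record));
// ... later, when you need to be sure this particular record was processed:
try {
    result.get(); // blocks until the task has finished (or rethrows its exception)
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} catch (ExecutionException e) {
    throw new RuntimeException(e);
}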
The documentation suggests that the method getActiveCount() on ThreadPoolExecutor is not an exact number:
getActiveCount
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Returns: the number of threads
Personally, when I am doing multithreaded work such as this, I use a variable that I increment as I add tasks, and decrement as I grab their output.
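For what it's worth, an AtomicInteger works well for that kind of in-flight counter; a rough sketch (not from the original answer):
final AtomicInteger inFlight = new AtomicInteger();

// When submitting a task:
inFlight.incrementAndGet();
pool.execute(new Runnable() {
    public void run() {
        try {
            // ... the actual work ...
        } finally {
            inFlight.decrementAndGet(); // always count the task as finished
        }
    }
});

// Waiting for everything to drain (simple polling):
while (inFlight.get() > 0) {
    Thread.sleep(50); // throws InterruptedException; handle or propagate it
}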
