Let's say I have 5 threads that must make a combined total of 1,000,000 function calls for a parallel Monte Carlo Method program. I assigned 1,000,000 / 5 function calls to each of the 5 threads. However, after many tests (some ranging up to 1 trillion iterations) I realized that some threads were finishing much faster than others, so instead I would like to dynamically assign workload to each of these threads. My first approach involved an AtomicLong variable that was set to an initial value of, let's say, 1 billion. After each function call, I would decrement the AtomicLong by 1. Before every function call, the program would check whether the AtomicLong was greater than 0, like this:
AtomicLong remainingIterations = new AtomicLong(1000000000);
ExecutorService threadPool = Executors.newFixedThreadPool(5);
for (int i = 0; i < 5; i++) { // create 5 threads
    threadPool.submit(new Runnable() {
        public void run() {
            while (remainingIterations.get() > 0) { // do a function call if necessary
                remainingIterations.decrementAndGet(); // decrement # of remaining calls needed
                doOneFunctionCall(); // perform a function call
            }
        }
    });
} // more unrelated code is not shown (thread shutdown, etc.)
This approach seemed to be extremely slow, am I using AtomicLong correctly? Is there a better approach?
am I using AtomicLong correctly?
Not quite. The way you are using it, two threads could each check remainingIterations, each see 1, then each decrement it, putting you at -1 total.
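If you do want to keep a shared counter, one way to make the check and the decrement a single atomic step is to claim an iteration with getAndDecrement() and only proceed if the claimed value was positive. This is a minimal sketch of that idea, not the approach recommended below:
while (remainingIterations.getAndDecrement() > 0) { // atomically claim one iteration
    doOneFunctionCall();
}
// The counter may dip slightly below zero at the end, but no extra calls are made.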
As for your slowness issue, it is possible that, if doOneFunctionCall() completes quickly, your app is being bogged down by the contention on your AtomicLong.
The nice thing about an ExecutorService is that it logically decouples the work being done from the threads that are doing it. You can submit more jobs than you have threads, and the ExecutorService will execute them as soon as it is able:
ExecutorService threadPool = Executors.newFixedThreadPool(5);
for (int i = 0; i < 1000000; i++) {
    threadPool.submit(new Runnable() {
        public void run() {
            doOneFunctionCall();
        }
    });
}
This might be balancing your work a bit too much in the other direction: creating too many short-lived Runnable objects. You can experiment to see what gives you the best balance between distributing the work and performing the work quickly:
ExecutorService threadPool = Executors.newFixedThreadPool(5);
for (int i = 0; i < 1000; i++) {
    threadPool.submit(new Runnable() {
        public void run() {
            for (int j = 0; j < 1000; j++) {
                doOneFunctionCall();
            }
        }
    });
}
Look at ForkJoinPool. What you are attempting is called divide-and-conquer. In F/J you set the number of threads to 5. Each thread has a queue of pending tasks. You can spread the tasks evenly across the threads' queues, and when a thread runs out of work it steals work from another thread's queue. This way you don't need the AtomicLong.
There are many examples of using this class. If you need more info, let me know.
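For illustration, here is a rough sketch of what that divide-and-conquer shape could look like with a RecursiveTask; the class name, threshold, and placeholder doOneFunctionCall() are assumptions, not code from the question:
class MonteCarloTask extends RecursiveTask<Long> {
    private static final long THRESHOLD = 10_000; // assumed chunk size, tune as needed
    private final long iterations;

    MonteCarloTask(long iterations) { this.iterations = iterations; }

    @Override
    protected Long compute() {
        if (iterations <= THRESHOLD) {
            for (long i = 0; i < iterations; i++) {
                doOneFunctionCall();        // the Monte Carlo sample
            }
            return iterations;
        }
        MonteCarloTask left = new MonteCarloTask(iterations / 2);
        MonteCarloTask right = new MonteCarloTask(iterations - iterations / 2);
        left.fork();                        // queued; an idle thread may steal it
        return right.compute() + left.join();
    }

    private void doOneFunctionCall() { /* placeholder */ }
}
// Usage: new ForkJoinPool(5).invoke(new MonteCarloTask(1_000_000_000L));
Work-stealing then balances the queues automatically when one thread finishes early.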
An elegant approach that avoids creating 1B task objects is to use a SynchronousQueue with a ThreadPoolExecutor; doing so, submit will block until a thread becomes available.
I didn't test actual performance, though.
BlockingQueue<Runnable> queue = new SynchronousQueue<>();
ExecutorService threadPool = new ThreadPoolExecutor(5, 5,
        0L, TimeUnit.MILLISECONDS, queue);
for (int i = 0; i < 1000000000; i++) {
    threadPool.submit(new Runnable() {
        public void run() {
            doOneFunctionCall();
        }
    });
}
Related
I'm wondering whether there is any advantage to re-using the same Thread objects over the course of the execution of an object, rather than creating new ones every time. I have an object for which a single (frequently used) method is parallelized using local Thread variables, such that every time the method is called, new Threads (and Runnables) are instantiated. Because the method is called so frequently, a single execution may instantiate upwards of a hundred thousand Thread objects, even though there are never more than a few (~4-6) active at any given time.
Following is a cut down example of how this method is currently implemented, to give a sense of what I mean. For reference, n is of course the pre-determined number of threads to use, whereas this.dataStructure is a (thread-safe) Map which serves as the input to the computation, as well as being modified by the computation. There are other inputs involved, but as they are not relevant to this question, I've omitted their usage. I've also omitted exception handling for the same reason.
Runnable[] tasks = new Runnable[n];
Thread[] threads = new Thread[n];
ArrayBlockingQueue<MyObject> inputs = new ArrayBlockingQueue<>(this.dataStructure.size());
inputs.addAll(this.dataStructure.values());
for (int i = 0; i < n; i++) {
    tasks[i] = () -> {
        while (true) {
            MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
            if (input == null) return;
            // run computations over this.dataStructure
        }
    };
    threads[i] = new Thread(tasks[i]);
    threads[i].start();
}
for (int i = 0; i < n; i++)
    threads[i].join();
Because these Threads (and their Runnables) always execute the same way, using a single ArrayBlockingQueue as input, an alternative would be to just "refill the queue" every time the method is called and re-start the same Threads. This is easily implemented, but I'm unsure whether it would make any difference one way or the other. I'm not too familiar with concurrency, so any help is appreciated.
PS.: If there is a more elegant way to handle the polling, that would also be helpful.
It is not possible to start a Thread more than once, but conceptually, the answer to your question is yes.
This is normally accomplished with a thread pool. A thread pool is a set of Threads that rarely actually terminate. Instead, an application passes its tasks to the thread pool, which picks a Thread to run each one. The thread pool then decides whether the Thread should be terminated or reused after the task completes.
Java has some classes that make using thread pools quite easy: ExecutorService and CompletableFuture.
ExecutorService usage typically looks like this:
ExecutorService executor = Executors.newCachedThreadPool();
for (int i = 0; i < n; i++) {
    tasks[i] = () -> {
        while (true) {
            MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
            if (input == null) return;
            // run computations over this.dataStructure
        }
    };
    executor.submit(tasks[i]);
}
// Doesn't interrupt or halt any tasks. Will wait for them all to finish
// before terminating its threads.
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
Executors has other methods which can create thread pools, like newFixedThreadPool() and newWorkStealingPool(). You can decide for yourself which one best suits your needs.
CompletableFuture use might look like this:
Runnable[] tasks = new Runnable[n];
CompletableFuture<?>[] futures = new CompletableFuture<?>[n];
for (int i = 0; i < n; i++) {
    tasks[i] = () -> {
        while (true) {
            MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
            if (input == null) return;
            // run computations over this.dataStructure
        }
    };
    futures[i] = CompletableFuture.runAsync(tasks[i]);
}
CompletableFuture.allOf(futures).get();
The disadvantage of CompletableFuture is that the tasks cannot be canceled or interrupted. (Calling cancel will mark the task as completing with an exception instead of completing successfully, but the task will not be interrupted.)
By definition, you cannot restart a thread. According to the documentation:
It is never legal to start a thread more than once. In particular, a thread may not be restarted once it has completed execution.
Nevertheless, a thread is a valuable resource, and there are implementations that reuse threads. Have a look at the Java Tutorial about Executors.
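As a quick illustration (not from the tutorial), submitting several tasks to a single-thread pool from java.util.concurrent shows the same pooled thread being reused rather than restarted:
ExecutorService pool = Executors.newSingleThreadExecutor();
for (int i = 0; i < 3; i++) {
    pool.submit(() -> System.out.println(Thread.currentThread().getName()));
}
pool.shutdown(); // typically prints "pool-1-thread-1" three times: one thread, reused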
I want the final count to be 10000 always, but even though I have used synchronized here, I'm getting values other than 10000. Java concurrency newbie.
public class test1 {
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        int numThreads = 10;
        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    synchronized (this) {
                        for (int i = 0; i < 1000; i++) {
                            count++;
                        }
                    }
                }
            });
        }
        for (int i = 0; i < numThreads; i++) {
            threads[i].start();
        }
        for (int i = 0; i < numThreads; i++)
            threads[i].join();
        System.out.println(count);
    }
}
Boris told you how to make your program print the right answer, but the reason it prints the right answer is that your program is effectively single-threaded.
If you implemented Boris's suggestion, then your run() method probably looks like this:
public void run() {
    synchronized (test1.class) {
        for (int i = 0; i < 1000; i++) {
            count++;
        }
    }
}
No two threads can ever be synchronized on the same object at the same time, and there's only one test1.class in your program. That's good because there's also only one count. You always want the number of lock objects and their lifetimes to match the number and lifetimes of the data that they are supposed to protect.
The problem is, you have synchronized the entire body of the run() method. That means, no two threads can run() at the same time. The synchronized block ensures that they all will have to execute in sequence—just as if you had simply called them one-by-one instead of running them in separate threads.
This would be better:
public void run() {
    for (int i = 0; i < 1000; i++) {
        synchronized (test1.class) {
            count++;
        }
    }
}
If each thread releases the lock after each increment operation, then that gives other threads a chance to run concurrently.
On the other hand, all that locking and unlocking is expensive. The multi-threaded version almost certainly will take a lot longer to count to 10000 than a single threaded program would do. There's not much you can do about that. Using multiple CPUs to gain speed only works when there's big computations that each CPU can do independently of the others.
For your simple example, you can use AtomicInteger instead of static int and synchronized.
final AtomicInteger count = new AtomicInteger(0);
And inside the Runnable, only this one line:
count.incrementAndGet();
Synchronizing on the class object blocks the whole class from being used by other threads, which matters if you have more complex code with many methods used in a multithreaded environment.
This code doesn't run faster because incrementing the same counter one at a time is inherently a serial operation that cannot be performed by more than one thread at a moment.
So if you want it to run close to 10x faster, each thread should count into its own counter, and the results should be summed at the end. You can do this with a thread pool, using ExecutorService and Future tasks, which can return a result for you, as in the sketch below.
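A rough sketch of that idea, with illustrative names and exception handling omitted (assume the surrounding method declares throws Exception): each task increments its own local counter and the partial results are summed from the returned Futures.
ExecutorService pool = Executors.newFixedThreadPool(10);
List<Future<Integer>> partials = new ArrayList<>();
for (int t = 0; t < 10; t++) {
    partials.add(pool.submit(() -> {
        int local = 0;                  // no sharing, so no locking needed
        for (int i = 0; i < 1000; i++) {
            local++;
        }
        return local;
    }));
}
int total = 0;
for (Future<Integer> f : partials) {
    total += f.get();                   // blocks until that task has finished
}
pool.shutdown();
System.out.println(total);              // 10000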
I'm wondering how I can replace this CyclicBarrier with something else, like the usual joining of all threads in an array. This is the snippet (try-catch blocks etc. omitted for clarity):
protected void execute(int nGens)
{
    CyclicBarrier thread_barrier = new CyclicBarrier(n_threads + 1);
    ExecutorService thread_pool = Executors.newFixedThreadPool(n_threads);
    for (int i = 0; i < thread_array.length; i++) // all threads are in this array
        thread_pool.execute(thread_array[i]);
    for (int i = 0; i < total_generations; i++)
        thread_barrier.await();
    thread_pool.shutdown();
    while (!thread_pool.isTerminated()) {}
}
and this is the code executed by the threads
public void run()
{
    for (int i = 0; i < total_generations; i++)
    {
        next_generation(); // parallel computation (aka the 'task')
        thread_barrier.await();
    }
}
As you can see, all threads are launched on startup and then perform a certain task a number of times. Each time they finish a task, they wait until all other threads have finished theirs, and then they perform it again. Are there any lower level ways to achieve this kind of synchronization?
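For comparison, here is a rough sketch of the join-based alternative mentioned above (reusing the question's names; exception handling omitted as in the snippet): a fresh batch of threads is started per generation, and join() acts as the implicit barrier.
for (int gen = 0; gen < total_generations; gen++) {
    Thread[] workers = new Thread[n_threads];
    for (int i = 0; i < n_threads; i++) {
        workers[i] = new Thread(() -> next_generation()); // one generation's worth of work
        workers[i].start();
    }
    for (Thread w : workers) {
        w.join(); // wait for every worker before moving to the next generation
    }
}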
I am fairly new with java executors, so this maybe an easy question.
ExecutorService executorService = Executors.newFixedThreadPool(NumberOfThreads - 1);
do_work();
for (int i = 1; i < NumberOfThreads; i++)
{
    executorService.execute(new Runnable()
    {
        public void run()
        {
            do_work();
        }
    });
}
My question is:
If I create a fixed thread pool with 'N' threads, and if I want to execute 'N' tasks, like the code above. Do I have guarantees that each thread will only execute one task (do_work())?
No. It's a pool, and the assignment of threads to tasks doesn't make such guarantees.
e.g. imagine your do_work() method completes immediately. By the time you submit your 2nd Runnable, all the threads in the pool will be available, and any one of them will be a candidate for your job.
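A quick way to see this for yourself (purely illustrative) is to print which pooled thread runs each task; with fast tasks, one thread often ends up running several of them:
ExecutorService pool = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; i++) {
    final int task = i;
    pool.execute(() ->
            System.out.println("task " + task + " ran on " + Thread.currentThread().getName()));
}
pool.shutdown();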
Suppose I have the following code, which I want to optimize by spreading the workload over the multiple CPU cores of my PC:
double[] largeArray = getMyLargeArray();
double result = 0;
for (double d : largeArray)
    result += d;
System.out.println(result);
In this example I could distribute the work done within the for-loop over multiple threads and verify that the threads have all terminated before proceeding to printing the result. I therefore came up with something that looks like this:
final double[] largeArray = getMyLargeArray();
int nThreads = 5;
final double[] intermediateResults = new double[nThreads];
Thread[] threads = new Thread[nThreads];
final int nItemsPerThread = largeArray.length / nThreads;
for (int t = 0; t < nThreads; t++) {
    final int t2 = t;
    threads[t] = new Thread() {
        @Override public void run() {
            for (int d = t2 * nItemsPerThread; d < (t2 + 1) * nItemsPerThread; d++)
                intermediateResults[t2] += largeArray[d];
        }
    };
}
for (Thread t : threads)
    t.start();
for (Thread t : threads)
    try {
        t.join();
    } catch (InterruptedException e) { }
double result = 0;
for (double d : intermediateResults)
    result += d;
System.out.println(result);
Assume that the length of largeArray is divisible by nThreads. This solution works correctly.
However, I am encountering the problem that the above threading of for-loops occurs a lot in my program, which causes a lot of overhead due to the creation and garbage collection of threads. I am therefore looking at modifying my code by using a ThreadPoolExecutor. The threads giving the intermediate results would then be reused in the next execution (summation, in this example).
Since I store my intermediate results in an array of a size which has to be known beforehand, I was thinking of using a thread pool of fixed size.
I am having trouble, however, with letting a thread know at which place in the array it should store its result.
Should I define my own ThreadFactory?
Or am I better off using an array of ExecutorServices created by the method Executors.newSingleThreadExecutor(ThreadFactory myNumberedThreadFactory)?
Note that in my actual program it is very hard to replace the double[] intermediateResults with something of another type. I would prefer a solution which is confined to creating the right kind of thread pool.
I am having trouble, however, with letting a thread know at which place in the array it should store its result. Should I define my own ThreadFactory?
No need for that. The interfaces used by executors (Runnable and Callable) are run by threads, and you can pass whatever arguments you want to your implementations (for instance, an array, a begin index, and an end index).
A ThreadPoolExecutor is indeed a good solution. Also look at FutureTask if you have runnables bearing results.
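For example, a task that carries its slice of the array through its constructor might look like this (the class name and fields are illustrative, not part of the question's code):
class PartialSum implements Callable<Double> {
    private final double[] array;
    private final int begin, end;       // sums the half-open range [begin, end)

    PartialSum(double[] array, int begin, int end) {
        this.array = array;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public Double call() {
        double sum = 0;
        for (int i = begin; i < end; i++) {
            sum += array[i];
        }
        return sum;
    }
}
Each submitted PartialSum then returns its chunk's total through a Future, so no shared intermediateResults array is needed.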
ExecutorService provides you with an API to get results from the thread pool via the Future interface:
Future<Double> futureResult = executorService.submit(new Callable<Double>() {
    public Double call() {
        double totalForChunk = 0.0;
        // do calculation here
        return totalForChunk;
    }
});
Now all you need to do is to submit tasks (Callable instances) and wait for result to be available:
List<Future<Double>> results = new ArrayList<>();
for (int i = 0; i < nChunks; i++) {
    results.add(executorService.submit(callableTask));
}
Or even simpler:
List<Future<Double>> results = executorService.invokeAll(callableTaskList);
The rest is easy, iterate over results and collect total:
double total = 0.0;
for (Future<Double> result : results) {
    total += result.get(); // this will block until your task is completed by the executor service
}
That said, you do not need to care how many threads there are in the executor service. You just submit tasks and collect results when they are available.
You would be better off creating "worker" threads that take information about work to be performed from a queue. Your process would then be to create an initially empty WorkQueue and then create and start the worker threads. Each worker thread would pick up its work from the queue, do the work, and put the work on a "finished" queue for the master to pick up and handle.
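A loose sketch of that worker/queue pattern (WorkItem, its process() method, and nThreads are hypothetical placeholders): workers block on the input queue, process items, and hand them to a finished queue for the master to collect.
BlockingQueue<WorkItem> workQueue = new LinkedBlockingQueue<>();
BlockingQueue<WorkItem> finishedQueue = new LinkedBlockingQueue<>();
for (int i = 0; i < nThreads; i++) {
    new Thread(() -> {
        try {
            while (true) {
                WorkItem item = workQueue.take();  // blocks until work is available
                item.process();                    // hypothetical: do the actual work
                finishedQueue.put(item);           // master picks results up from here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // exit when asked to shut down
        }
    }).start();
}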