Why does a FIFO array queue lock not seem fair? - java

In §7.5.1 of The Art of Multiprocessor Programming by Herlihy et al. (2nd ed., 2020), the authors present a simple lock that uses an array queue to achieve FIFO locking. Intuitively, the nth thread has a (thread-local) index into an array, and then spins on that array element until the n - 1 thread unlocks the lock. Its code looks like this:
import java.util.concurrent.atomic.AtomicInteger;

public class ALock {
    ThreadLocal<Integer> mySlotIndex = new ThreadLocal<>() {
        @Override protected Integer initialValue() { return 0; }
    };
    AtomicInteger tail;
    volatile boolean[] flag;
    int size;

    public ALock(int capacity) {
        size = capacity;
        tail = new AtomicInteger(0);
        flag = new boolean[capacity];
        flag[0] = true;
    }

    public void lock() {
        int slot = tail.getAndIncrement() % size;
        mySlotIndex.set(slot);
        while (!flag[slot]) {}
    }

    public void unlock() {
        int slot = mySlotIndex.get();
        flag[slot] = false;
        flag[(slot + 1) % size] = true;
    }
}
I am using a minimal test program to check that this lock is fair. In a nutshell, I create NUM_THREADS threads and map each one to an array index id. Each thread tries to acquire the same lock. Once it succeeds, it increments a global COUNT and also increments RUNS_PER_THREAD[id].
If the lock is correct, the final value of COUNT should equal the sum of the values in RUNS_PER_THREAD. If the lock is fair, the elements of RUNS_PER_THREAD should be approximately equal.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

public class Main {
    static long COUNT = 0;
    static int NUM_THREADS = 16;
    // static Lock LOCK = new ReentrantLock(true);
    static ALock LOCK = new ALock(NUM_THREADS);
    static long[] RUNS_PER_THREAD = new long[NUM_THREADS];
    static Map<Long, Integer> THREAD_IDS = new HashMap<>();

    public static void main(String[] args) {
        var threads = IntStream.range(0, NUM_THREADS).mapToObj(Main::makeWorker).toArray(Thread[]::new);
        for (int i = 0; i < threads.length; i++) THREAD_IDS.put(threads[i].getId(), i);
        for (var thread : threads) thread.start();
        try { Thread.sleep(300L); } catch (InterruptedException e) {}
        for (var thread : threads) thread.interrupt();
        try { Thread.sleep(100L); } catch (InterruptedException e) {}
        for (int i = 0; i < NUM_THREADS; i++) System.out.printf("Thread %d:\t%12d%n", i, RUNS_PER_THREAD[i]);
        System.out.println("Counted up to: \t\t\t" + COUNT);
        System.out.println("Sum for all threads: \t" + Arrays.stream(RUNS_PER_THREAD).sum());
    }

    private static Thread makeWorker(int i) {
        return new Thread(() -> {
            while (true) {
                if (Thread.interrupted()) return;
                LOCK.lock();
                try {
                    COUNT++;
                    var id = THREAD_IDS.get(Thread.currentThread().getId());
                    RUNS_PER_THREAD[id]++;
                } finally {
                    LOCK.unlock();
                }
            }
        });
    }
}
If the test program is run with a fair ReentrantLock, the final count of runs per thread with 16 threads (on my M1 Max Mac with Java 17) is almost exactly equal. If the same test is run with ALock, the first few threads seem to acquire the lock approximately 10 times more frequently than the last few threads.
Is ALock, as presented, unfair, and if so, why? Alternatively, is my minimal test flawed, and if so, why does it seem to demonstrate the fairness of ReentrantLock?

Your test code updates COUNT with a non-thread-safe COUNT++. Switch to COUNT.incrementAndGet() and declare the counter as:
static AtomicLong COUNT = new AtomicLong();
ALock will give unfair results, especially when the number of threads exceeds the number of CPUs. The implementation relies on the busy spin loop while (!flag[slot]), and not all threads get the same opportunity to be scheduled into their spin loops, so the first few threads end up performing more of the lock-unlock cycles. Adding Thread.yield() inside the spin loop should balance access to the boolean array so that all threads get a similar opportunity to run through their own spin loop:
while (!flag[slot]) {
    Thread.yield();
}
You should see different results if you set NUM_THREADS to be the same as or less than Runtime.getRuntime().availableProcessors(); in that case Thread.yield() may not make a difference, unlike when NUM_THREADS > Runtime.getRuntime().availableProcessors().
Using this lock class will also lead to lower throughput, because at any one time up to N-1 threads sit in a CPU-intensive spin loop waiting for the current lock holder to call unlock(). In a good lock implementation, the N-1 waiters do not consume CPU.
The ALock strategy only works if no more threads use the lock than the capacity passed to new ALock(NUM_THREADS); otherwise int slot = tail.getAndIncrement() % size; may leave two threads spinning on the same slot.
Note that any code that relies on a spin loop or Thread.yield() to work correctly is not an effective implementation and should not be used in production code. Both can be avoided with the classes in java.util.concurrent.
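For comparison, here is a minimal sketch (not from the original answer) of the same counter-update pattern guarded by a fair ReentrantLock from java.util.concurrent; waiting threads are queued and parked by the lock instead of busy-spinning. The FairCounter class name is illustrative.
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class FairCounter {
    // fair mode: waiting threads acquire the lock roughly in the order they requested it
    private static final Lock LOCK = new ReentrantLock(true);
    private static long count = 0;

    static void increment() {
        LOCK.lock();        // parks the thread if the lock is held, no spin loop
        try {
            count++;        // safe: only the lock holder touches count
        } finally {
            LOCK.unlock();  // always release, even if the critical section throws
        }
    }
}
This corresponds to the commented-out static Lock LOCK = new ReentrantLock(true); line in the test program above, which is what the question's fair ReentrantLock comparison already uses.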

Related

Why is synchronization for count not working?

I want the final count to be 10000 always, but even though I have used synchronized here, I'm getting different values other than 10000. Java concurrency newbie.
public class test1 {
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        int numThreads = 10;
        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    synchronized (this) {
                        for (int i = 0; i < 1000; i++) {
                            count++;
                        }
                    }
                }
            });
        }
        for (int i = 0; i < numThreads; i++) {
            threads[i].start();
        }
        for (int i = 0; i < numThreads; i++)
            threads[i].join();
        System.out.println(count);
    }
}
Boris told you how to make your program print the right answer, but the reason it prints the right answer is that your program effectively is single-threaded.
If you implemented Boris's suggestion, then your run() method probably looks like this:
public void run() {
synchronized (test1.class) {
for (int i = 0; i < 1000; i++) {
count++;
}
}
}
No two threads can ever be synchronized on the same object at the same time, and there's only one test1.class in your program. That's good because there's also only one count. You always want the number of lock objects and their lifetimes to match the number and lifetimes of the data that they are supposed to protect.
The problem is that you have synchronized the entire body of the run() method. That means no two threads can be inside run() at the same time. The synchronized block ensures that they all execute in sequence, just as if you had simply called them one by one instead of running them in separate threads.
This would be better:
public void run() {
for (int i = 0; i < 1000; i++) {
synchronized (test1.class) {
count++;
}
}
}
If each thread releases the lock after each increment operation, then that gives other threads a chance to run concurrently.
On the other hand, all that locking and unlocking is expensive. The multi-threaded version almost certainly will take a lot longer to count to 10000 than a single threaded program would do. There's not much you can do about that. Using multiple CPUs to gain speed only works when there's big computations that each CPU can do independently of the others.
For your simple example, you can use AtomicInteger instead of static int and synchronized.
static final AtomicInteger count = new AtomicInteger(0);
And inside the Runnable's loop, only this one line:
count.incrementAndGet();
Using synchronized on the class object blocks every other thread that synchronizes on the same class, which matters if you have more complex code with many methods used in a multithreaded environment.
This code doesn't run any faster with threads, because incrementing the same counter one by one is inherently a serial operation that cannot run more than once at a time.
So if you want it to run close to 10x faster, each thread should count into its own counter, and you sum the results at the end. You can do this with a thread pool, using an ExecutorService and Future tasks, which can return a result for you (see the sketch below).
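A minimal sketch of that idea, assuming a fixed-size pool and a hypothetical PartitionedCount class (not from the original post): each task counts into its own local variable, and the partial results are summed from the returned Futures, so no shared counter is touched while the threads run.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedCount {
    public static void main(String[] args) throws Exception {
        int numThreads = 10;
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        Callable<Integer> task = () -> {
            int local = 0;                           // each task counts into its own local variable
            for (int i = 0; i < 1000; i++) local++;
            return local;                            // partial result handed back through the Future
        };
        List<Future<Integer>> partials = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) partials.add(pool.submit(task));
        int total = 0;
        for (Future<Integer> f : partials) total += f.get(); // sum the partial results
        pool.shutdown();
        System.out.println(total);                   // 10000, with no shared counter during the loop
    }
}
Because each task only touches local state, no synchronization is needed until the single-threaded summation at the end.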

AtomicInteger is not working properly in java

I have writer and reader threads. Although I use AtomicBoolean and AtomicInteger, I can see duplicate values in the reader thread. Please help me find what's wrong with my code.
package automic;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
public class AutomicTest {
public volatile AtomicBoolean isStopped = new AtomicBoolean(false);
public AtomicInteger count = new AtomicInteger(0);
public static void main(String[] args) throws InterruptedException {
AutomicTest test = new AutomicTest();
Thread writerThread = new Thread(() ->{
while(!test.isStopped.get()) {
test.count.incrementAndGet();
try {
Thread.sleep(100);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
});
Thread readerThread = new Thread(() ->{
while(!test.isStopped.get()) {
System.out.println("Counter :"+test.count.get());
try {
Thread.sleep(100);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
});
writerThread.start();
readerThread.start();
Thread.sleep(4000);
test.isStopped.getAndSet(true);
writerThread.join();
readerThread.join();
}
}
Counter :1
Counter :2
Counter :3 // duplicate
Counter :3 // duplicate
Counter :4
Counter :5
Counter :7
Counter :8
Counter :9
Counter :10
Counter :11
Counter :12 // duplicate
Counter :12 // duplicate
Counter :13
Counter :15 // duplicate
Counter :15 // duplicate
Counter :17
Counter :18
Counter :19
Counter :20
Counter :21
Counter :22
Counter :23
Counter :24
Counter :25
Counter :26
Counter :27
Counter :28
Counter :29
Counter :30
Counter :31
Counter :32
Counter :33
Counter :34
Counter :35
Counter :36
Counter :37
Counter :38
Counter :39
Counter :40
The two big take-aways from this are that:
Thread.sleep(100) does not mean "sleep for 100 ms, exactly to the nanosecond". It is a little less exact than that: it depends on the granularity and accuracy of the internal OS clock, on native thread scheduling, and on other tasks running on the computer. Even the sleep-wakeup cycle itself takes a (surprisingly large) amount of time.
Atomics are good when multiple threads cooperate independently. If you need them to be somehow dependent, and/or react to other threads' actions, atomics are not what you want. You'll need to use an actual synchronization mechanism.
Therefore, you cannot use sleep() and atomics to schedule two threads to run in a perfectly balanced tick-tock cycle.
This happens in your code:
Writer thread writes value 1 at time 100, then goes to sleep.
Reader thread reads value 1 at time 100, then goes to sleep. They can both easily run on the same ms, on a multicore system they can even run at exactly the same time, and they probably do.
Reader wakes up at time 200.0 and reads value 1 again.
Writer wakes up at time 200.02 and writes value 2. Oops, we just got a duplicate.
Do note that the threads can even flip back, in that case you'll see a missing number in the sequence, and occasionally you do. To balance the threads to run in a perfect A-B-A-B scheme you can do e.g. something like this:
public class AutomicTest {
private volatile boolean isStopped = false;
private final CyclicBarrier barrier = new CyclicBarrier(2);
private int count = 0;
public static void main(String[] args) throws InterruptedException {
AutomicTest test = new AutomicTest();
Thread writerThread = new Thread(() -> {
while (!test.isStopped) {
test.count++;
try {
test.barrier.await();
Thread.sleep(100);
} catch (InterruptedException | BrokenBarrierException ignored) {
Thread.currentThread().interrupt();
break;
}
}
});
Thread readerThread = new Thread(() -> {
while (!test.isStopped) {
try {
test.barrier.await();
System.out.println("Counter: " + test.count);
Thread.sleep(100);
} catch (InterruptedException | BrokenBarrierException ignored) {
Thread.currentThread().interrupt();
break;
}
}
});
writerThread.start();
readerThread.start();
Thread.sleep(4000);
test.isStopped = true;
writerThread.join();
readerThread.join();
}
}
The key here is a CyclicBarrier which is:
A synchronization aid that allows a set of threads to all wait for each other to reach a common barrier point. CyclicBarriers are useful in programs involving a fixed sized party of threads that must occasionally wait for each other. The barrier is called cyclic because it can be re-used after the waiting threads are released.
In this case the barrier is set up to have two synchronized parties - the Writer and the Reader:
The Writer first writes its value, then waits for all parties to arrive to the barrier (in other words, it waits for the Reader to read the value).
The Reader first waits for all parties to arrive to the barrier (in other words, it waits for the Writer to write a new value), only then reads the value.
In this scheme, the count value's visibility is enforced by the CyclicBarrier, so you do not even need an AtomicInteger here. More specifically:
Actions in a thread prior to calling await() happen-before [...] actions following a successful return from the corresponding await() in other threads.
Oh, and the isStopped also does not need an AtomicBoolean, a volatile is enough. But it will work either way. Sorry, I understand this was supposed to be a task to practice atomics, but they're not a good tool if you need the threads to wait for each other.
Footnote: The mechanism above is still not exactly correct when you remove the sleep() calls. The reason for that is that, once released, the Reader races with the Writer on the next loop iteration. To fix that, the Writer must wait for the previous Reader to finish and the Reader must wait for its Writer to finish. This can be achieved by using a second barrier, or perhaps a Phaser, which I intentionally did not use in the example above as it is more advanced and you need to learn CyclicBarriers and CountDownLatches before moving on to Phasers. Also, the shutdown mechanism needs to be tuned. Good luck!
EDIT: I actually wrote the no-sleep() double-Phaser solution and found out that it is much easier to read (if you do not care about long-running-task interruption which you normally should!) and much faster than an equivalent CyclicBarrier solution. So we both learned something today. Here it is:
public class AutomicTest {
private volatile boolean isStopped = false;
private final Phaser valueWritten = new Phaser(2);
private final Phaser valueRead = new Phaser(2);
private int count = 0;
public static void main(String[] args) throws InterruptedException {
AutomicTest test = new AutomicTest();
Thread writerThread = new Thread(() -> {
while (!test.isStopped) {
// wait for the previous value to be read
test.valueRead.arriveAndAwaitAdvance();
test.count++;
// acknowledge the write
test.valueWritten.arrive();
}
});
Thread readerThread = new Thread(() -> {
while (!test.isStopped) {
// wait for the value to be written
test.valueWritten.arriveAndAwaitAdvance();
System.out.println("Counter: " + test.count);
// acknowledge the read
test.valueRead.arrive();
}
});
writerThread.start();
readerThread.start();
test.valueRead.arrive(); // start the writer
Thread.sleep(4000);
test.isStopped = true;
test.valueRead.forceTermination();
test.valueWritten.forceTermination();
writerThread.join();
readerThread.join();
}
}

Multithreading issue in Java, different results at runtime

Whenever I run this program it gives me a different result. Can someone explain it to me, or give me some topics where I could find an answer, in order to understand what happens in this code?
class IntCell {
private int n = 0;
public int getN() {return n;}
public void setN(int n) {this.n = n;}
}
public class Count extends Thread {
static IntCell n = new IntCell();
public void run() {
int temp;
for (int i = 0; i < 200000; i++) {
temp = n.getN();
n.setN(temp + 1);
}
}
public static void main(String[] args) {
Count p = new Count();
Count q = new Count();
p.start();
q.start();
try { p.join(); q.join(); }
catch (InterruptedException e) { }
System.out.println("The value of n is " + n.getN());
}
}
The reason is simple: you don't read and update your counter atomically, so your code is prone to race conditions.
Here is an example that illustrates the problem:
Thread #1 calls n.getN() gets 0
Thread #2 calls n.getN() gets 0
Thread #1 calls n.setN(1) to set n to 1
Thread #2 is not aware that thread #1 has already set n to 1, so it still calls n.setN(1), setting n to 1 instead of 2 as you would expect; this is called a race condition.
Your final result then depends on the total number of race conditions hit while executing your code, which is unpredictable, so it changes from one run to another.
One way to fix it is to get and set your counter inside a synchronized block so that the read-modify-write happens atomically, as shown next; this forces each thread to acquire an exclusive lock on the IntCell instance assigned to n before it can execute this section of code.
synchronized (n) {
temp = n.getN();
n.setN(temp + 1);
}
Output:
The value of n is 400000
You could also consider using an AtomicInteger instead of an int for your counter, in order to rely on methods such as addAndGet(int delta) or incrementAndGet() to increment it atomically.
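For illustration, a minimal sketch of IntCell backed by an AtomicInteger; the increment() method is a name introduced here for the combined read-modify-write, not part of the original class.
import java.util.concurrent.atomic.AtomicInteger;

class IntCell {
    private final AtomicInteger n = new AtomicInteger(0);
    public int getN() { return n.get(); }
    public void increment() { n.incrementAndGet(); } // one atomic read-modify-write, no lock needed
}
In run(), the two-step temp = n.getN(); n.setN(temp + 1); then collapses into a single n.increment() call, which removes the window in which the other thread can interleave.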
Access to the static IntCell n variable is concurrent between your two threads:
static IntCell n = new IntCell();
public void run() {
int temp;
for (int i = 0; i < 200000; i++) {
temp = n.getN();
n.setN(temp + 1);
}
}
Race conditions mean that you cannot have predictable behavior when n.setN(temp + 1); is performed, because the result depends on which thread previously called temp = n.getN();.
If it was the current thread, you get the value it put there; otherwise you get the last value put by the other thread.
You could add a synchronization mechanism to avoid this unexpected behavior.
You are running 2 threads in parallel and updating a shared variable from both of them; that is why your answer is always different. It is not good practice to update a shared variable like this.
To understand this, you should first read about multithreading in general, and then about wait and notify, starting with simple cases.
You modify the same number n with two concurrent threads. If Thread1 reads n = 2, and Thread2 also reads n = 2 before Thread1 has written its increment, then Thread1 will increment n to 3, but Thread2 will not add a second increment; it will just write another 3 to n. If Thread1 finishes its increment before Thread2 reads, both increments take effect.
Both threads run concurrently and you can never tell which one gets which CPU cycle; that depends on what else is running on your machine. So you will always lose a different number of increments to the overwriting situation described above.
To solve it, the increment of n must be made atomic, for example with a synchronized block around the read and the write, or by using an AtomicInteger; note that even a plain n++ is not atomic, since it still compiles to a separate read and write.

Is synchronization better option for multithreading shared resources?

public class MyResource {
private int count = 0;
void increment() {
count++;
}
void insert() { // incrementing shared resource count
for (int i = 0; i < 100000000; i++) {
increment();
}
}
void insert1() { //incrementing shared resource count
for (int i = 0; i < 100000000; i++) {
increment();
}
}
void startThread() {
Thread t1 = new Thread(new Runnable() { //thread incrementing count using insert()
@Override
public void run() {
insert();
}
});
Thread t2 = new Thread(new Runnable() { //thread incrementing count using insert1()
@Override
public void run() {
insert1();
}
});
t1.start();
t2.start();
try {
t1.join(); //t1 and t2 race to increment count by telling current thread to wait
t2.join();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
void entry() {
long start = System.currentTimeMillis();
startThread(); //commenting insert(); insert1() gives output as time taken = 452(approx) 110318544 (obvious)
// insert(); insert1(); //commenting startThread() gives output as time taken = 452(approx) 200000000
long end = System.currentTimeMillis();
long time = end - start;
System.out.println("time taken = " + time);
System.out.println(count);
}
}
The program entry point is the entry() method.
1. Using only insert(); insert1(); (normal method calls) and commenting out startThread() (which runs the threads) gives the result shown in the code comment.
2. Commenting out insert(); insert1(); and using startThread() (which runs the threads) gives the result shown in the code comment.
3. Synchronizing increment() gives the output: time taken = 35738 200000000
As shown above, synchronizing protects access to the shared resource, but on the other hand it takes a lot more time to process.
So what's the use of synchronizing if it decreases performance?
Sometimes you just want two or more things to go on at the same time. Imagine the server of a chat application, or a program that updates the GUI while a long task is running to let the user know that processing is going on.
You are not supposed to use synchronization to increase performance; you are supposed to use it to protect shared resources.
Is this a real code example? Because if you want to use threads here in order to split the work, synchronizing increment() is not the best approach...
EDIT
As described here, you can change the design of this specific code to divide the work between the 2 threads more efficiently. I altered their example to fit your needs, but all the methods described there are good.
import java.util.*;
import java.util.concurrent.*;
import static java.util.Arrays.asList;

public class Sums {
    static class Counter implements Callable<Long> {
        private final long _limit;

        Counter(long limit) {
            _limit = limit;
        }

        @Override
        public Long call() {
            long counter = 0;
            for (long i = 0; i <= _limit; i++) {
                counter++;
            }
            return counter;
        }
    }

    public static void main(String[] args) throws Exception {
        long counter = 0;
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Future<Long>> results = executor.invokeAll(asList(
                new Counter(500000), new Counter(500000)));
        executor.shutdown();
        for (Future<Long> result : results) {
            counter += result.get();
        }
    }
}
and if you must use synchronisation, AtomicLong will do a better job.
Performance is not the only factor. Correctness can also be very important. Here is another question that has some low level details about the keyword synchronized.
If you are looking for performance, consider using the java.util.concurrent.atomic.AtomicLong class. It has been optimized for fast, atomic access.
EDIT:
Synchronized is overkill in this use case. Synchronized would be much more useful for file I/O or network I/O, where the calls are much longer and correctness is much more important. Here is the source code for AtomicLong. A volatile field was chosen there because it is much more performant for short operations that change shared memory.
Adding the synchronized keyword adds extra bytecode that does a lot of bookkeeping to acquire the lock safely. A volatile write forces the data out to shared memory, which is slower than a plain cached write, but the atomic classes then rely on the CPU's atomic instructions instead of the JVM's monitor machinery under the hood.
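As a rough sketch of that suggestion (not the poster's original code), MyResource's counter could be held in an AtomicLong so that each increment is a single lock-free operation; the current() accessor is added here only for reading the result.
import java.util.concurrent.atomic.AtomicLong;

public class MyResource {
    private final AtomicLong count = new AtomicLong();

    void increment() {
        count.incrementAndGet(); // atomic increment via compare-and-swap, no monitor acquired
    }

    long current() {
        return count.get();
    }
}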

Concurrent checking if collection is empty

I have this piece of code:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue();

@Override
public void run() {
    while (!intervals.isEmpty()) {
        //remove one interval
        //do calculations
        //add some intervals
    }
}
This code is executed by a specific number of threads at the same time. As you can see, the loop should go on until there are no more intervals left in the collection, but there is a problem: at the beginning of each iteration an interval gets removed from the collection, and at the end some number of intervals might get added back into the same collection.
The problem is that while one thread is inside the loop the collection might become empty, so other threads trying to enter the loop won't be able to and will finish their work prematurely, even though the collection might be filled with values again after the first thread finishes its iteration. I want the thread count to remain constant (or not exceed some number n) until all work is really finished.
That means that no threads are currently working in the loop and there are no elements left in the collection. What are possible ways of accomplishing that? Any ideas are welcome.
One way to solve this problem in my specific case is to give every thread a different piece of the original collection. But after one thread finished its work it wouldn't be used by the program anymore, even though it could help other threads with their calculations, so I don't like this solution, because it's important for my problem to utilize all cores of the machine.
This is the simplest minimal working example I could come up with. It might be too lengthy.
public class Test{
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue();
private int threadNumber;
private Thread[] threads;
private double result;
public Test(int threadNumber){
intervals.add(new Interval(0, 1));
this.threadNumber = threadNumber;
threads = new Thread[threadNumber];
}
public double find(){
for(int i = 0; i < threadNumber; i++){
threads[i] = new Thread(new Finder());
threads[i].start();
}
try{
for(int i = 0; i < threadNumber; i++){
threads[i].join();
}
}
catch(InterruptedException e){
System.err.println(e);
}
return result;
}
private class Finder implements Runnable{
@Override
public void run(){
while(!intervals.isEmpty()){
Interval interval = intervals.poll();
if(interval.high - interval.low > 1e-6){
double middle = (interval.high + interval.low) / 2;
boolean something = true;
if(something){
intervals.add(new Interval(interval.low + 0.1, middle - 0.1));
intervals.add(new Interval(middle + 0.1, interval.high - 0.1));
}
else{
intervals.add(new Interval(interval.low + 0.1, interval.high - 0.1));
}
}
}
}
}
private class Interval{
double low;
double high;
public Interval(double low, double high){
this.low = low;
this.high = high;
}
}
}
What you might need to know about the program: after every iteration an interval should either disappear (because it's too small), become smaller, or split into two smaller intervals. Work is finished when no intervals are left. Also, I should be able to limit the number of threads doing this work to some number n. The actual program looks for a maximum value of some function by dividing the intervals and throwing away the parts of those intervals that can't contain the maximum value, using some rules, but this shouldn't really be relevant to my problem.
The CompletableFuture class is also an interesting solution for this kind of task.
It automatically distributes the workload over a number of worker threads.
static CompletableFuture<Integer> fibonacci(int n) {
if(n < 2) return CompletableFuture.completedFuture(n);
else {
return CompletableFuture.supplyAsync(() -> {
System.out.println(Thread.currentThread());
CompletableFuture<Integer> f1 = fibonacci(n - 1);
CompletableFuture<Integer> f2 = fibonacci(n - 2);
return f1.thenCombineAsync(f2, (a, b) -> a + b);
}).thenComposeAsync(f -> f);
}
}
public static void main(String[] args) throws Exception {
int fib = fibonacci(10).get();
System.out.println(fib);
}
You can use an atomic flag, e.g.:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private AtomicBoolean inUse = new AtomicBoolean();

@Override
public void run() {
    while (!intervals.isEmpty() && inUse.compareAndSet(false, true)) {
        // work
        inUse.set(false);
    }
}
UPD
The question has been updated, so I will give you a better solution. It is a more "classic" solution using a blocking queue:
private BlockingQueue<Interval> intervals = new ArrayBlockingQueue<>(1024); // ArrayBlockingQueue needs a fixed capacity
private volatile boolean finished = false;

@Override
public void run() {
    try {
        while (!finished) {
            Interval next = intervals.take();
            // put work there
            // after you decide work is finished just set finished = true
            intervals.put(next); // anyway, return the interval to the queue
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
UPD2
Now it seems better to rewrite the solution and divide the range into sub-ranges for each thread.
Your problem looks like a recursive one - processing one task (interval) might produce some sub-tasks (sub intervals).
For that purpose I would use ForkJoinPool and RecursiveAction:
class Interval {
...
}
class IntervalAction extends RecursiveAction {
private Interval interval;
private IntervalAction(Interval interval) {
this.interval = interval;
}
@Override
protected void compute() {
if (...) {
// we need two sub-tasks
IntervalAction sub1 = new IntervalAction(new Interval(...));
IntervalAction sub2 = new IntervalAction(new Interval(...));
sub1.fork();
sub2.fork();
sub1.join();
sub2.join();
} else if (...) {
// we need just one sub-task
IntervalAction sub3 = new IntervalAction(new Interval(...));
sub3.fork();
sub3.join();
} else {
// current task doesn't need any sub-tasks, just return
}
}
}
public static void compute(Interval initial) {
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new IntervalAction(initial));
// invoke will return when all the processing is completed
}
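A more concrete, hedged sketch of the same approach, filling in the conditions with the splitting rule from the question's Finder (the 1e-6 threshold and the +/-0.1 offsets); the class names and the always-split-in-two simplification are illustrative, not the answer's exact code.
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class IntervalCompute {
    static class Interval {
        final double low, high;
        Interval(double low, double high) { this.low = low; this.high = high; }
    }

    static class IntervalAction extends RecursiveAction {
        private final Interval interval;
        IntervalAction(Interval interval) { this.interval = interval; }

        @Override
        protected void compute() {
            // same stopping rule as the Finder in the question
            if (interval.high - interval.low <= 1e-6) {
                return; // interval too small: nothing more to do
            }
            double middle = (interval.high + interval.low) / 2;
            // split the interval as the question's Finder does, then process both halves
            IntervalAction left = new IntervalAction(new Interval(interval.low + 0.1, middle - 0.1));
            IntervalAction right = new IntervalAction(new Interval(middle + 0.1, interval.high - 0.1));
            invokeAll(left, right); // forks one sub-task and runs the other in the current worker
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(); // defaults to the number of available processors
        pool.invoke(new IntervalAction(new Interval(0, 1)));
        System.out.println("all intervals processed");
    }
}
pool.invoke(...) only returns once the whole task tree has completed, which is exactly the "all work is really finished" condition the question asks for.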
I had the same problem, and I tested the following solution.
In my test example I have a queue (the equivalent of your intervals) filled with integers. For the test, at each iteration one number is taken from the queue, incremented and placed back in the queue if the new value is below 7 (arbitrary). This has the same impact as your interval generation on the mechanism.
Here is an example of working code (note that I develop in Java 1.8 and use the Executor framework to handle my thread pool):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
public class Test {
final int numberOfThreads;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
final BlockingQueue<Integer> sleepingThreadsTokens;
final ThreadPoolExecutor executor;
public static void main(String[] args) {
final Test test = new Test(2); // arbitrary number of thread => 2
test.launch();
}
private Test(int numberOfThreads){
this.numberOfThreads = numberOfThreads;
this.queue = new PriorityBlockingQueue<Integer>();
this.availableThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.sleepingThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
}
public void launch() {
// put some elements in queue at the beginning
queue.add(1);
queue.add(2);
queue.add(3);
for(int i = 0; i < numberOfThreads; i++){
availableThreadsTokens.add(1);
}
System.out.println("Start");
boolean algorithmIsFinished = false;
while(!algorithmIsFinished){
if(sleepingThreadsTokens.size() != numberOfThreads){
try {
availableThreadsTokens.take();
} catch (final InterruptedException e) {
e.printStackTrace();
// some treatment should be put there in case of failure
break;
}
if(!queue.isEmpty()){ // Continuation condition
sleepingThreadsTokens.drainTo(availableThreadsTokens);
executor.submit(new Loop(queue.poll(), queue, availableThreadsTokens));
}
else{
sleepingThreadsTokens.add(1);
}
}
else{
algorithmIsFinished = true;
}
}
executor.shutdown();
System.out.println("Finished");
}
public static class Loop implements Runnable{
int element;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
public Loop(Integer element, BlockingQueue<Integer> queue, BlockingQueue<Integer> availableThreadsTokens){
this.element = element;
this.queue = queue;
this.availableThreadsTokens = availableThreadsTokens;
}
@Override
public void run(){
System.out.println("taking element "+element);
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
if(element < 7){
this.queue.add(element+1);
System.out.println("Inserted element"+(element + 1));
}
else{
System.out.println("no insertion");
}
this.availableThreadsTokens.offer(1);
}
}
}
I ran this code as a check, and it seems to work properly. However, there are certainly some improvements that can be made:
sleepingThreadsTokens does not have to be a BlockingQueue, since only the main thread accesses it. I used this interface because it allowed a nice sleepingThreadsTokens.drainTo(availableThreadsTokens);
I'm not sure whether queue has to be blocking or not, since only the main thread takes from it and does not wait for elements (it waits only for tokens).
...
The idea is that the main thread checks for termination, and for this it has to know how many threads are currently working (so that it does not prematurely stop the algorithm just because the queue is empty). To do so, two specific queues are created: availableThreadsTokens and sleepingThreadsTokens. Each element in availableThreadsTokens symbolizes a thread that has finished an iteration and is waiting to be given another one. Each element in sleepingThreadsTokens symbolizes a thread that was available to take a new iteration, but the queue was empty, so it had no job and went to "sleep". So at each moment availableThreadsTokens.size() + sleepingThreadsTokens.size() = numberOfThreads - threadsExecutingIterations.
Note that the elements in availableThreadsTokens and sleepingThreadsTokens only symbolize thread activity; they are not threads, nor do they designate a specific thread.
Case of termination: let's suppose we have N threads (an arbitrary, fixed number). The N threads are waiting for work (N tokens in availableThreadsTokens), there is only 1 remaining element in the queue, and the treatment of this element won't generate any other element. Main takes the first token, finds that the queue is not empty, polls the element and sends the thread to work. The next N-1 tokens are consumed one by one, and since the queue is empty the tokens are moved into sleepingThreadsTokens one by one. Main knows that there is 1 thread working in the loop, since there is no token in availableThreadsTokens and only N-1 in sleepingThreadsTokens, so it waits (.take()). When the thread finishes and releases the token, Main consumes it, discovers that the queue is now empty and puts the last token in sleepingThreadsTokens. Since all tokens are now in sleepingThreadsTokens, Main knows that 1) all threads are inactive and 2) the queue is empty (otherwise the last token wouldn't have been transferred to sleepingThreadsTokens, since the thread would have taken the job).
Note that if the working thread finishes its treatment before all the availableThreadsTokens are moved to sleepingThreadsTokens, it makes no difference.
Now, if we suppose that the treatment of the last element had generated M new elements in the queue, then Main would have put all the tokens from sleepingThreadsTokens back into availableThreadsTokens, and started to assign them treatments again. We put all the tokens back even if M < N because we don't know how many elements will be inserted in the future, so we have to keep all the threads available.
I would suggest a master/worker approach then.
The master process goes through the intervals and assigns the calculations of that interval to a different process. It also removes/adds as necessary. This way, all the cores are utilized, and only when all intervals are finished, the process is done. This is also known as dynamic work allocation.
A possible example:
public void run() {
    while (!intervals.isEmpty()) {
        //remove one interval
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                //do calculations
            }
        });
        t.start(); // start(), not run(), so the calculation happens on the new thread
        //add some intervals
    }
}
The possible solution you provided is known as static allocation, and you're correct, it will only finish as fast as the slowest worker; the dynamic approach keeps all the cores utilized.
I've run into this problem as well. The way I solved it was to use an AtomicInteger to track what is in the queue: before each offer(), increment the integer; after each poll(), decrement it. The CLQ has no isEmpty() you can rely on for termination, since it must look at the head/tail nodes and these can change concurrently (CAS).
This doesn't give a 100% guarantee, since some thread may increment right after another thread decrements, so you need to check again before ending the thread. It is still better than relying on while(...isEmpty()). A rough sketch of this bookkeeping follows below.
Other than that, you may need to synchronize.
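A rough sketch of that bookkeeping, assuming a small wrapper class (TrackedQueue is an illustrative name, not from the answer): the counter is incremented before every offer() and decremented after every successful poll(), and a worker re-checks pending() before it really exits.
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class TrackedQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger();

    void offer(T item) {
        pending.incrementAndGet();     // count the item before it becomes visible to other workers
        queue.offer(item);
    }

    T poll() {
        T item = queue.poll();
        if (item != null) {
            pending.decrementAndGet(); // count it out only if we actually removed something
        }
        return item;
    }

    int pending() {
        return pending.get();
    }
}
A worker loop would then look roughly like while (tracked.pending() > 0) { ... }, with one more pending() check before the thread actually terminates, as described above.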
