To simplify my case, let's assume that I'm implementing a Binary Search using Java's Fork-Join framework. My goal is to find a specific integer value (the target integer) in an array of integers. This can be done by breaking the array by half until it's small enough to perform a serial search. The result of the algorithm needs to be a boolean value indicating whether the target integer was found in the array or not.
A similar problem is explored in Klaus Kreft's presentation in slide 28 onward. However, Kreft's goal is to find the largest number in the array so all entries have to be scanned. In my case, it is not necessary to scan the full array because once the target integer was found, the search can be stopped.
My problem is that once I encounter the target integer, many tasks have already been submitted to the thread pool and I need to cancel them, since there is no point in continuing the search. I tried calling getPool().terminate() from inside a RecursiveTask, but that didn't help much: many tasks were already queued, and I even noticed that new ones were queued after shutdown was called.
My current solution is to use a static volatile boolean that is initialized to 'false' and to check its value at the beginning of the task. If it's still 'false', the task begins its work; if it's 'true', the task returns immediately. I can actually use a RecursiveAction for that.
So I think that this solution should work, but I wonder if the framework offers some standard way of handling cases like that - i.e. defining a stop condition to the recursion that consequently cancels all queued tasks.
Note that if I want to stop all running tasks immediately when the target integer was found (by one of the running tasks) I have to check the boolean after each line in these tasks and that can affect performance since the value of that boolean cannot be cached (it's defined as volatile).
So indeed, I think that some standard solution is needed, and it could be provided in the form of clearing the queue and interrupting the running tasks. But I haven't found such a solution, and I wonder if anyone else knows about one or has a better idea.
Thank you for your time,
Assaf
EDIT: here is my testing code:
package xxx;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
public class ForkJoinTest {
static final int ARRAY_SIZE = 1000;
static final int THRESHOLD = 10;
static final int MIN_VALUE = 0;
static final int MAX_VALUE = 100;
static Random rand = new Random();
// a function for retrieving a random int in a specific range
public static int randInt(int min, int max) {
return rand.nextInt((max - min) + 1) + min;
}
static volatile boolean result = false;
static int[] array = new int[ARRAY_SIZE];
static int target;
@SuppressWarnings("serial")
static class MyAction extends RecursiveAction {
int startIndex, endIndex;
public MyAction(int startIndex, int endIndex) {
this.startIndex = startIndex;
this.endIndex = endIndex;
}
// if the target integer was not found yet: we first check whether
// the entries to search are too few. In that case, we perform a
// sequential search and update the result if the target was found.
// Otherwise, we break the search into two parts and invoke the
// search in these two tasks.
@Override
protected void compute() {
if (!result) {
if (endIndex-startIndex<THRESHOLD) {
//
for (int i=startIndex ; i<endIndex ; i++) {
if (array[i]==target) {
result = true;
}
}
} else {
int middleIndex = (startIndex + endIndex) / 2;
RecursiveAction action1 = new MyAction(startIndex, middleIndex);
// endIndex is exclusive, so the second half must start at middleIndex (not middleIndex+1)
RecursiveAction action2 = new MyAction(middleIndex, endIndex);
invokeAll(Arrays.asList(action1,action2));
}
}
}
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
for (int i=0 ; i<ARRAY_SIZE ; i++) {
array[i] = randInt(MIN_VALUE, MAX_VALUE);
}
target = randInt(MIN_VALUE, MAX_VALUE);
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new MyAction(0,ARRAY_SIZE));
System.out.println(result);
}
}
I think you may be inventing a barrier to the correct solution.
You say that your boolean stop flag must be volatile and so will interfere with the speed of the solution - well, yes and no - accessing a volatile does indeed do cache flushing but have you considered an AtomicBoolean?
I believe the correct solution is to use an AtomicBoolean flag to get all processes to stop. You should check it in as finely grained a fashion as is reasonable to get your system to stop quickly.
It would be a mistake to attempt to clear all queues and interrupt all threads - this would lead to a horrible mess.
static AtomicBoolean finished = new AtomicBoolean();
....
protected void compute() {
if (!finished.get()) {
if (endIndex - startIndex < THRESHOLD) {
//
for (int i = startIndex; i < endIndex && !finished.get(); i++) {
if (array[i] == target) {
finished.set(true);
System.out.print("Found at " + i);
}
}
} else {
...
}
}
}
I left a comment above on how to do this by looking at an open source product that does this in many built-in-functions. Let me put some detail here.
If you want to cancel tasks that are beginning or are currently executing, then each task needs to know about every other task. When one task finds what it wants, that task needs to inform every other task to stop. You cannot do this with dyadic recursive division (RecursiveTask, etc.) since you create new tasks recursively and the old tasks will never know about the new ones. I’m sure you could pass a reference to a stop-me field to each new task, but it will get very messy and debugging would be “interesting.”
You can do this with the Java 8 CountedCompleter class. The framework was butchered to support this class, so things that should be done by the framework need to be done manually, but it can work.
Each task needs a volatile boolean and a method to set it to true. Each task needs an array of references to all the other tasks. Create all the tasks up front, each with an empty array of to-be references to the other tasks. Fill in the array of references to every other task. Now submit each task (see the doc for this class: fork(), addToPendingCount(), etc.)
When one task finds what it wants, it uses the array of references to the other tasks to set their boolean to true. If there is a race condition with multiple threads, it doesn’t matter since all threads set “true.” You will also need to handle tryComplete(), onCompletion(), etc. This class is very muddled. It is used for the Java 8 stream processing, which is a story in itself.
What you cannot do is purge pending tasks from the deques before they begin. You need to wait until the task starts and check the boolean for true. If the execution is lengthy, then you may also want to check the boolean for true periodically. The overhead of a volatile read is not that bad and there really is no other way.
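A simpler variation on the CountedCompleter idea, loosely following the searcher pattern in that class's documentation, is to share one atomic result among all subtasks instead of wiring tasks to each other. The sketch below is only an illustration under assumptions: the class name IntSearcher, the NOT_FOUND constant and the indexOf helper are made up for this example. Each subtask re-checks the shared value before doing work, and the finder calls quietlyCompleteRoot() so the caller can return while leftover subtasks drain quickly.

```java
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: search an int[] in parallel and stop early once found.
class IntSearcher extends CountedCompleter<Integer> {
    static final int NOT_FOUND = -1;

    final int[] array;
    final int target;
    final AtomicInteger foundIndex; // shared by every subtask of one search
    final int lo, hi;               // half-open range [lo, hi)

    IntSearcher(CountedCompleter<?> parent, int[] array, int target,
                AtomicInteger foundIndex, int lo, int hi) {
        super(parent);
        this.array = array;
        this.target = target;
        this.foundIndex = foundIndex;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public Integer getRawResult() {
        return foundIndex.get();
    }

    @Override
    public void compute() {
        int l = lo, h = hi;
        // Stop splitting and scanning as soon as any task has published a result.
        while (foundIndex.get() == NOT_FOUND && h > l) {
            if (h - l >= 2) {
                int mid = (l + h) >>> 1;
                addToPendingCount(1); // one more child must finish before this task completes
                new IntSearcher(this, array, target, foundIndex, mid, h).fork();
                h = mid;              // keep the left half for this task
            } else {
                if (array[l] == target && foundIndex.compareAndSet(NOT_FOUND, l)) {
                    quietlyCompleteRoot(); // the root becomes joinable immediately
                }
                break;
            }
        }
        tryComplete(); // normal completion path, whether or not something was found
    }

    // Returns the index of some occurrence of target, or NOT_FOUND.
    static int indexOf(int[] array, int target) {
        return new IntSearcher(null, array, target,
                new AtomicInteger(NOT_FOUND), 0, array.length).invoke();
    }
}
```

Already forked subtasks are not removed from the deques; they still start, see the published result and return at once, which matches the point above that pending tasks cannot be purged.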
TL;DR: When several CompletableFutures are waiting to get executed, how can I prioritize those whose values I'm interested in?
I have a list of 10,000 CompletableFutures (which calculate the data rows for an internal report over the product database):
List<Product> products = ...;
List<CompletableFuture<DataRow>> dataRows = products
.stream()
.map(p -> CompletableFuture.supplyAsync(() -> calculateDataRowForProduct(p), singleThreadedExecutor))
.collect(Collectors.toList());
Each takes around 50ms to complete, so the entire thing finishes in 500sec. (they all share the same DB connection, so cannot run in parallel).
Let's say I want to access the data row of the 9000th product:
dataRows.get(9000).join()
The problem is, all these CompletableFutures are executed in the order they have been created, not in the order they are accessed. Which means I have to wait 450sec for it to calculate stuff that at the moment I don't care about, to finally get to the data row I want.
Question:
Is there any way to change this behaviour, so that the Futures I try to access get priority over those I don't care about at the moment?
First thoughts:
I noticed that a ThreadPoolExecutor uses a BlockingQueue<Runnable> to queue up entries waiting for an available Thread.
So I thought about using a PriorityBlockingQueue, to change the priority of the Runnable when I access its CompletableFuture but:
PriorityBlockingQueue does not have a method to reprioritize an existing element, and
I need to figure out a way to get from the CompletableFuture to the corresponding Runnable entry in the queue.
Before I go further down this road, do you think this sounds like the correct approach? Have others ever had this kind of requirement? I tried to search for it, but found exactly nothing. Maybe CompletableFuture is not the correct way of doing this?
Background:
We have an internal report which displays 100 products per page. Initially we precalculated all DataRows for the report, which took way too long if someone has that many products.
So first optimization was to wrap the calculation in a memoized supplier:
List<Supplier<DataRow>> dataRows = products
.stream()
.map(p -> Suppliers.memoize(() -> calculateDataRowForProduct(p)))
.collect(Collectors.toList());
This means that initial display of first 100 entries now takes 5sec instead of 500sec (which is great), but when the user switches to the next pages, it takes another 5sec for each single one of them.
So the idea is, while the user is staring at the first screen, why not precalculate the next pages in the background. Which leads me to my question above.
Interesting problem :)
One way is to roll out a custom FutureTask class that makes it possible to change the priorities of tasks dynamically.
DataRow and Product are both taken as just String here for simplicity.
import java.util.*;
import java.util.concurrent.*;
public class Testing {
private static String calculateDataRowForProduct(String product) {
try {
// Dummy operation.
Thread.sleep(200);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("Computation done for " + product);
return "data row for " + product;
}
public static void main(String[] args) throws ExecutionException, InterruptedException {
PriorityBlockingQueue<Runnable> customQueue = new PriorityBlockingQueue<Runnable>(1, new CustomRunnableComparator());
ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, customQueue);
List<String> products = new ArrayList<>();
for (int i = 0; i < 10; i++) {
products.add("product" + i);
}
Map<Integer, PrioritizedFutureTask<String>> taskIndexMap = new HashMap<>();
for (int i = 0; i < products.size(); i++) {
String product = products.get(i);
Callable<String> callable = () -> calculateDataRowForProduct(product);
PrioritizedFutureTask<String> dataRowFutureTask = new PrioritizedFutureTask<>(callable, i);
taskIndexMap.put(i, dataRowFutureTask);
executor.execute(dataRowFutureTask);
}
List<Integer> accessOrder = new ArrayList<>();
accessOrder.add(4);
accessOrder.add(7);
accessOrder.add(2);
accessOrder.add(9);
int priority = -1 * accessOrder.size();
for (Integer nextIndex : accessOrder) {
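PrioritizedFutureTask<String> taskAtIndex = taskIndexMap.get(nextIndex);
// Keep the removal out of an assert: it would be skipped when assertions are disabled.
// remove() returns false if the worker thread has already taken the task.
if (customQueue.remove(taskAtIndex)) {
customQueue.offer(taskAtIndex.set_priority(priority++));
}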
// Now this task will be at the front of the thread pool queue.
// Hence this task will execute next.
}
for (Integer nextIndex : accessOrder) {
PrioritizedFutureTask<String> dataRowFutureTask = taskIndexMap.get(nextIndex);
String dataRow = dataRowFutureTask.get();
System.out.println("Data row for index " + nextIndex + " = " + dataRow);
}
}
}
class PrioritizedFutureTask<T> extends FutureTask<T> implements Comparable<PrioritizedFutureTask<T>> {
private Integer _priority = 0;
private Callable<T> callable;
public PrioritizedFutureTask(Callable<T> callable, Integer priority) {
super(callable);
this.callable = callable;
_priority = priority;
}
public Integer get_priority() {
return _priority;
}
public PrioritizedFutureTask set_priority(Integer priority) {
_priority = priority;
return this;
}
@Override
public int compareTo(PrioritizedFutureTask<T> other) {
if (other == null) {
throw new NullPointerException();
}
return get_priority().compareTo(other.get_priority());
}
}
class CustomRunnableComparator implements Comparator<Runnable> {
@Override
public int compare(Runnable task1, Runnable task2) {
return ((PrioritizedFutureTask)task1).compareTo((PrioritizedFutureTask)task2);
}
}
Output:
Computation done for product0
Computation done for product4
Data row for index 4 = data row for product4
Computation done for product7
Data row for index 7 = data row for product7
Computation done for product2
Data row for index 2 = data row for product2
Computation done for product9
Data row for index 9 = data row for product9
Computation done for product1
Computation done for product3
Computation done for product5
Computation done for product6
Computation done for product8
There is one more possible optimization here.
The customQueue.remove(taskAtIndex) operation has O(n) time complexity with respect to the size of the queue (or the total number of products).
It might not matter much if the number of products is small (<= 10^5).
But it might result in a performance issue otherwise.
One solution to that is to extend PriorityBlockingQueue and roll out functionality to remove an element from a priority queue in O(log n) rather than O(n).
We can achieve that by keeping a hashmap inside the PriorityQueue structure. This hashmap maps each element to its index (or indices, in case of duplicates) in the underlying array.
Fortunately, I had already implemented such a heap in Python sometime back.
If you have more questions on this optimization, it's probably better to ask a new question altogether.
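As a rough Java illustration of that idea, here is a minimal sketch of a binary min-heap that remembers each element's array index in a HashMap, so that remove(element) costs O(log n). The class name IndexedMinHeap is hypothetical and not part of java.util. The sketch assumes distinct elements, is not thread-safe and does not block, so it would still need locking and a BlockingQueue wrapper before it could back a ThreadPoolExecutor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: min-heap with an element -> index map for O(log n) removal.
class IndexedMinHeap<T> {
    private final List<T> heap = new ArrayList<>();
    private final Map<T, Integer> position = new HashMap<>();
    private final Comparator<? super T> order;

    IndexedMinHeap(Comparator<? super T> order) {
        this.order = order;
    }

    int size() {
        return heap.size();
    }

    // Assumes distinct elements; returns false if the element is already present.
    boolean add(T element) {
        if (position.containsKey(element)) return false;
        heap.add(element);
        position.put(element, heap.size() - 1);
        siftUp(heap.size() - 1);
        return true;
    }

    // Removes and returns the smallest element, or null if the heap is empty.
    T poll() {
        if (heap.isEmpty()) return null;
        T top = heap.get(0);
        remove(top);
        return top;
    }

    // O(log n): look up the index, move the last element into the hole, restore order.
    boolean remove(T element) {
        Integer index = position.remove(element);
        if (index == null) return false;
        T last = heap.remove(heap.size() - 1);
        if (index < heap.size()) { // the removed element was not in the last slot
            heap.set(index, last);
            position.put(last, index);
            siftDown(index);
            siftUp(index);
        }
        return true;
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (order.compare(heap.get(i), heap.get(parent)) >= 0) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        int n = heap.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && order.compare(heap.get(left), heap.get(smallest)) < 0) smallest = left;
            if (right < n && order.compare(heap.get(right), heap.get(smallest)) < 0) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {
        T a = heap.get(i), b = heap.get(j);
        heap.set(i, b);
        heap.set(j, a);
        position.put(b, i);
        position.put(a, j);
    }
}
```

An element's priority must not change while it sits in the heap; to reprioritize, remove it, change the priority and add it again, as the code above does with the executor queue.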
You could avoid submitting all of the tasks to the executor at the start and instead submit only one background task, then submit the next one when it finishes. If you want to get the 9000th row, submit it immediately (if it has not already been submitted):
static class FutureDataRow {
CompletableFuture<DataRow> future;
int index;
List<FutureDataRow> list;
Product product;
FutureDataRow(List<FutureDataRow> list, Product product){
this.list = list;
index = list.size();
list.add(this);
this.product = product;
}
public DataRow get(){
submit();
return future.join();
}
private synchronized void submit(){
if(future == null) future = CompletableFuture.supplyAsync(() ->
calculateDataRowForProduct(product), singleThreadedExecutor);
}
private void background(){
submit();
if(index >= list.size() - 1) return;
future.whenComplete((dr, t) -> list.get(index + 1).background());
}
}
...
List<FutureDataRow> dataRows = new ArrayList<>();
products.forEach(p -> new FutureDataRow(dataRows, p));
dataRows.get(0).background();
If you want you could also submit the next row inside the get method if you expect that they will navigate to the next page afterwards.
If you were instead using a multithreaded executor and you wanted to run multiple background tasks concurrently you could modify the background method to find the next unsubmitted task in the list and start it when the current background task has finished.
private synchronized boolean background(){
if(future != null) return false;
submit();
future.whenComplete((dr, t) -> {
for(int i = index + 1; i < list.size(); i++){
if(list.get(i).background()) return;
}
});
return true;
}
You would also need to start the first n tasks in the background instead of just the first one.
int n = 8; //number of active background tasks
for(int i = 0; i < dataRows.size() && n > 0; i++){
if(dataRows.get(i).background()) n--;
}
To answer my own question...
There is a surprisingly simple (and surprisingly boring) solution to my problem. I have no idea why it took me three days to find it, I guess it required the right mindset, that you only have when walking along an endless tranquilizing beach looking into the sunset on a quiet Sunday evening.
So, ah, it's a little bit embarrassing to write this, but when I need to fetch a certain value (say for 9000th product), and the future has not yet computed that value, I can, instead of somehow forcing the future to produce that value asap (by doing all this repriorisation and scheduling magic), I can, well, I can, ... simply ... compute that value myself! Yes! Wait, what? Seriously, that's it?
It's something like this: if (!future.isDone()) {future.complete(supplier.get());}
I just need to store the original Supplier alongside the CompletableFuture in some wrapper class. This is the wrapper class, which works like a charm, all it needs is a better name:
public static class FuturizedMemoizedSupplier<T> implements Supplier<T> {
private CompletableFuture<T> future;
private Supplier<T> supplier;
public FuturizedMemoizedSupplier(Supplier<T> supplier) {
this.supplier = supplier;
this.future = CompletableFuture.supplyAsync(supplier, singleThreadExecutor);
}
public T get() {
// if the future is not yet completed, we just calculate the value ourselves, and set it into the future
if (!future.isDone()) {
future.complete(supplier.get());
}
supplier = null;
return future.join();
}
}
Now, I think, there is a small chance for a race condition here, which could lead to the supplier being executed twice. But actually, I don't care, it produces the same value anyway.
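If the double execution ever did matter, one possible way to close that race is to let the background task and the caller compete for an atomic "claim", so that only the winner runs the supplier. This is a sketch under assumptions, not the wrapper I actually use: the name ClaimOnceSupplier and the explicit Executor parameter are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical variant: the supplier runs at most once, on whichever thread claims it first.
class ClaimOnceSupplier<T> implements Supplier<T> {
    private final CompletableFuture<T> future = new CompletableFuture<>();
    private final AtomicBoolean claimed = new AtomicBoolean(false);
    private final Supplier<T> supplier;

    ClaimOnceSupplier(Supplier<T> supplier, Executor backgroundExecutor) {
        this.supplier = supplier;
        // Queue the background attempt; it becomes a no-op if get() wins the claim.
        backgroundExecutor.execute(this::computeIfUnclaimed);
    }

    private void computeIfUnclaimed() {
        if (!claimed.compareAndSet(false, true)) {
            return; // someone else is computing, or has already computed, the value
        }
        try {
            future.complete(supplier.get());
        } catch (Throwable t) {
            future.completeExceptionally(t);
        }
    }

    @Override
    public T get() {
        computeIfUnclaimed(); // compute on the calling thread if still unclaimed
        return future.join(); // otherwise wait for the thread that claimed it
    }
}
```

As with the original wrapper, a value computed inside get() runs on the caller's thread, possibly while the background thread is busy with another row.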
Afterthoughts:
I have no idea why I didn't think of this earlier, I was completely fixated on the idea, it has to be the CompletableFuture which calculates the value, and it has to run in one of these background threads, and whatnot, and, well, none of these mattered or were in any way a requirement.
I think this whole question is a classic example of Ask what problem you really want to solve instead of coming up with a half baked broken solution, and ask how to fix that. In the end, I didn't care about CompletableFuture or any of its features at all, it was just the easiest way that came to my mind to run something in the background.
Thanks for your help!
I have this piece of code:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
@Override
public void run(){
while(!intervals.isEmpty()){
//remove one interval
//do calculations
//add some intervals
}
}
This code is being executed by a specific number of threads at the same time. As you see, loop should go on until there are no more intervals left in the collection, but there is a problem. In the beginning of each iteration an interval gets removed from collection and in the end some number of intervals might get added back into same collection.
Problem is, that while one thread is inside the loop the collection might become empty, so other threads that are trying to enter the loop won't be able to do that and will finish their work prematurely, even though collection might be filled with values after the first thread will finish the iteration. I want the thread count to remain constant (or not more than some number n) until all work is really finished.
That means that no threads are currently working in the loop and there are no elements left in the collection. What are possible ways of accomplishing that? Any ideas are welcomed.
One way to solve this problem in my specific case is to give every thread a different piece of the original collection. But after one thread would finish its work it wouldn't be used by the program anymore, even though it could help other threads with their calculations, so I don't like this solution, because it's important to utilize all cores of the machine in my problem.
This is the simplest minimal working example I could come up with. It might be too lengthy.
public class Test{
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private int threadNumber;
private Thread[] threads;
private double result;
public Test(int threadNumber){
intervals.add(new Interval(0, 1));
this.threadNumber = threadNumber;
threads = new Thread[threadNumber];
}
public double find(){
for(int i = 0; i < threadNumber; i++){
threads[i] = new Thread(new Finder());
threads[i].start();
}
try{
for(int i = 0; i < threadNumber; i++){
threads[i].join();
}
}
catch(InterruptedException e){
System.err.println(e);
}
return result;
}
private class Finder implements Runnable{
@Override
public void run(){
while(!intervals.isEmpty()){
Interval interval = intervals.poll();
if(interval.high - interval.low > 1e-6){
double middle = (interval.high + interval.low) / 2;
boolean something = true;
if(something){
intervals.add(new Interval(interval.low + 0.1, middle - 0.1));
intervals.add(new Interval(middle + 0.1, interval.high - 0.1));
}
else{
intervals.add(new Interval(interval.low + 0.1, interval.high - 0.1));
}
}
}
}
}
private class Interval{
double low;
double high;
public Interval(double low, double high){
this.low = low;
this.high = high;
}
}
}
What you might need to know about the program: After every iteration interval should either disappear (because it's too small), become smaller or split into two smaller intervals. Work is finished after no intervals are left. Also, I should be able to limit number of threads that are doing this work with some number n. The actual program looks for a maximum value of some function by dividing the intervals and throwing away the parts of those intervals that can't contain the maximum value using some rules, but this shouldn't really be relevant to my problem.
The CompletableFuture class is also an interesting solution for this kind of task.
It automatically distributes workload over a number of worker threads.
static CompletableFuture<Integer> fibonacci(int n) {
if(n < 2) return CompletableFuture.completedFuture(n);
else {
return CompletableFuture.supplyAsync(() -> {
System.out.println(Thread.currentThread());
CompletableFuture<Integer> f1 = fibonacci(n - 1);
CompletableFuture<Integer> f2 = fibonacci(n - 2);
return f1.thenCombineAsync(f2, (a, b) -> a + b);
}).thenComposeAsync(f -> f);
}
}
public static void main(String[] args) throws Exception {
int fib = fibonacci(10).get();
System.out.println(fib);
}
You can use an atomic flag, i.e.:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private AtomicBoolean inUse = new AtomicBoolean();
@Override
public void run() {
while (!intervals.isEmpty() && inUse.compareAndSet(false, true)) {
// work
inUse.set(false);
}
}
UPD
The question has been updated, so here is a better solution. It is a more "classic" solution using a blocking queue:
private BlockingQueue<Interval> intervals = new LinkedBlockingQueue<>(); // unbounded; an ArrayBlockingQueue would need an explicit capacity
private volatile boolean finished = false;
@Override
public void run() {
try {
while (!finished) {
Interval next = intervals.take();
// put work there
// after you decide work is finished just set finished = true
intervals.put(next); // anyway, return interval to queue
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
UPD2
Now it seems better to rewrite the solution and divide the range into sub-ranges for each thread.
Your problem looks like a recursive one - processing one task (interval) might produce some sub-tasks (sub intervals).
For that purpose I would use ForkJoinPool and RecursiveAction:
class Interval {
...
}
class IntervalAction extends RecursiveAction {
private Interval interval;
private IntervalAction(Interval interval) {
this.interval = interval;
}
@Override
protected void compute() {
if (...) {
// we need two sub-tasks
IntervalAction sub1 = new IntervalAction(new Interval(...));
IntervalAction sub2 = new IntervalAction(new Interval(...));
sub1.fork();
sub2.fork();
sub1.join();
sub2.join();
} else if (...) {
// we need just one sub-task
IntervalAction sub3 = new IntervalAction(new Interval(...));
sub3.fork();
sub3.join();
} else {
// current task doesn't need any sub-tasks, just return
}
}
}
public static void compute(Interval initial) {
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new IntervalAction(initial));
// invoke will return when all the processing is completed
}
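To make the skeleton concrete, here is a small runnable sketch. It borrows the placeholder splitting rule from the question's example (split while the interval is wider than 1e-6, shrinking each half by 0.1); the class name IntervalSplitDemo, the countProcessed helper and the counter are assumptions added only so the result can be observed.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts how many intervals get processed in total.
class IntervalSplitDemo {

    static final class IntervalAction extends RecursiveAction {
        private final double low, high;
        private final AtomicInteger processed; // shared counter, for observation only

        IntervalAction(double low, double high, AtomicInteger processed) {
            this.low = low;
            this.high = high;
            this.processed = processed;
        }

        @Override
        protected void compute() {
            processed.incrementAndGet();
            if (high - low > 1e-6) {
                double middle = (high + low) / 2;
                // invokeAll forks one subtask, runs the other, and waits for both.
                invokeAll(new IntervalAction(low + 0.1, middle - 0.1, processed),
                          new IntervalAction(middle + 0.1, high - 0.1, processed));
            }
            // otherwise the interval is too small and simply disappears
        }
    }

    static int countProcessed(double low, double high) {
        AtomicInteger processed = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // invoke() returns only after the whole task tree has completed
            pool.invoke(new IntervalAction(low, high, processed));
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(countProcessed(0, 1));
    }
}
```

No explicit termination check is needed here: the pool keeps its worker threads busy through work stealing, and invoke() returns once every subtask has finished.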
I had the same problem, and I tested the following solution.
In my test example I have a queue (the equivalent of your intervals) filled with integers. For the test, at each iteration one number is taken from the queue, incremented and placed back in the queue if the new value is below 7 (arbitrary). This has the same impact as your interval generation on the mechanism.
Here is working example code (note that I develop in Java 1.8 and use the Executor framework to handle my thread pool):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
public class Test {
final int numberOfThreads;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
final BlockingQueue<Integer> sleepingThreadsTokens;
final ThreadPoolExecutor executor;
public static void main(String[] args) {
final Test test = new Test(2); // arbitrary number of thread => 2
test.launch();
}
private Test(int numberOfThreads){
this.numberOfThreads = numberOfThreads;
this.queue = new PriorityBlockingQueue<Integer>();
this.availableThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.sleepingThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
}
public void launch() {
// put some elements in queue at the beginning
queue.add(1);
queue.add(2);
queue.add(3);
for(int i = 0; i < numberOfThreads; i++){
availableThreadsTokens.add(1);
}
System.out.println("Start");
boolean algorithmIsFinished = false;
while(!algorithmIsFinished){
if(sleepingThreadsTokens.size() != numberOfThreads){
try {
availableThreadsTokens.take();
} catch (final InterruptedException e) {
e.printStackTrace();
// some treatment should be put there in case of failure
break;
}
if(!queue.isEmpty()){ // Continuation condition
sleepingThreadsTokens.drainTo(availableThreadsTokens);
executor.submit(new Loop(queue.poll(), queue, availableThreadsTokens));
}
else{
sleepingThreadsTokens.add(1);
}
}
else{
algorithmIsFinished = true;
}
}
executor.shutdown();
System.out.println("Finished");
}
public static class Loop implements Runnable{
int element;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
public Loop(Integer element, BlockingQueue<Integer> queue, BlockingQueue<Integer> availableThreadsTokens){
this.element = element;
this.queue = queue;
this.availableThreadsTokens = availableThreadsTokens;
}
@Override
public void run(){
System.out.println("taking element "+element);
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
if(element < 7){
this.queue.add(element+1);
System.out.println("Inserted element"+(element + 1));
}
else{
System.out.println("no insertion");
}
this.availableThreadsTokens.offer(1);
}
}
}
I ran this code as a check, and it seems to work properly. However, there are certainly some improvements that can be made:
sleepingThreadsTokens does not have to be a BlockingQueue, since only the main thread accesses it. I used this interface because it allowed a nice sleepingThreadsTokens.drainTo(availableThreadsTokens);
I'm not sure whether queue has to be blocking or not, since only main takes from it and does not wait for elements (it waits only for tokens).
...
The idea is that the main thread checks for termination, and for this it has to know how many threads are currently working (so that it does not prematurely stop the algorithm just because the queue is empty). To do so, two specific queues are created: availableThreadsTokens and sleepingThreadsTokens. Each element in availableThreadsTokens symbolizes a thread that has finished an iteration and is waiting to be given another one. Each element in sleepingThreadsTokens symbolizes a thread that was available to take a new iteration, but the queue was empty, so it had no job and went to "sleep". So at each moment availableThreadsTokens.size() + sleepingThreadsTokens.size() = numberOfThreads - threadsExecutingIteration.
Note that the elements in availableThreadsTokens and sleepingThreadsTokens only symbolize thread activity; they are not threads, nor do they designate a specific thread.
Case of termination: let's suppose we have N threads (an arbitrary, fixed number). The N threads are waiting for work (N tokens in availableThreadsTokens), there is only 1 remaining element in the queue, and the treatment of this element won't generate any other element. Main takes the first token, finds that the queue is not empty, polls the element and sends the thread to work. The next N-1 tokens are consumed one by one, and since the queue is empty the tokens are moved into sleepingThreadsTokens one by one. Main knows that there is 1 thread working in the loop, since there is no token in availableThreadsTokens and only N-1 in sleepingThreadsTokens, so it waits (.take()). When the thread finishes and releases the token, Main consumes it, discovers that the queue is now empty and puts the last token in sleepingThreadsTokens. Since all tokens are now in sleepingThreadsTokens, Main knows that 1) all threads are inactive and 2) the queue is empty (otherwise the last token wouldn't have been transferred to sleepingThreadsTokens, since the thread would have taken the job).
Note that if the working thread finishes the treatment before all the availableThreadsTokens are moved to sleepingThreadsTokens, it makes no difference.
Now if we suppose that the treatment of the last element had generated M new elements in the queue, then Main would have put all the tokens from sleepingThreadsTokens back into availableThreadsTokens and started to assign them treatments again. We put all the tokens back even if M < N because we don't know how many elements will be inserted in the future, so we have to keep all the threads available.
I would suggest a master/worker approach then.
The master process goes through the intervals and assigns the calculations of that interval to a different process. It also removes/adds as necessary. This way, all the cores are utilized, and only when all intervals are finished, the process is done. This is also known as dynamic work allocation.
A possible example:
public void run(){
while(!intervals.isEmpty()){
//remove one interval
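Thread t = new Thread(new Runnable() {
@Override
public void run() {
//do calculations
}
});
t.start(); // start(), not run(): run() would execute the work on the calling thread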
//add some intervals
}
}
The possible solution you described is known as static allocation, and you're correct that it will only finish as fast as the slowest processor, whereas the dynamic approach keeps all cores busy.
I've run into this problem as well. The way I solved it was to use an AtomicInteger to know what is in the queue. Before each offer() increment the integer. After each poll() decrement the integer. The CLQ has no real isEmpty() since it must look at head/tail nodes and this can change atomically (CAS).
This is not a 100% guarantee, since one thread may increment after another thread decrements, so you need to check again before ending the thread. It is still better than relying on while(...isEmpty()).
Other than that, you may need to synchronize.
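A variation on this counter idea that avoids the re-check is to count items that are either queued or still being processed: increment before adding new items, and decrement only after the item that produced them has been fully handled. The sketch below is an illustration under assumptions; the class name PendingCountDemo and the toy workload (an item of depth d > 0 produces two items of depth d - 1) are invented for the example, and idle workers simply spin with Thread.yield().

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of termination detection with a "pending work" counter.
class PendingCountDemo {

    // Returns how many work items were processed in total.
    static int run(int threadCount, int initialDepth) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger();   // items queued or in progress
        AtomicInteger processed = new AtomicInteger();

        pending.incrementAndGet(); // count the item before it becomes visible
        queue.add(initialDepth);

        Runnable worker = () -> {
            // pending == 0 means: nothing queued and nobody can still add anything
            while (pending.get() > 0) {
                Integer depth = queue.poll();
                if (depth == null) {
                    Thread.yield(); // queue momentarily empty, but work is in flight
                    continue;
                }
                try {
                    processed.incrementAndGet();
                    if (depth > 0) {
                        pending.addAndGet(2); // count the children first
                        queue.add(depth - 1);
                        queue.add(depth - 1);
                    }
                } finally {
                    pending.decrementAndGet(); // this item is completely done
                }
            }
        };

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10)); // prints 2047, i.e. 2^11 - 1 items
    }
}
```

Because children are counted before their parent is decremented, the counter can only reach zero when the queue is empty and no thread is mid-iteration, so no thread exits prematurely.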
I am learning threading in Java in order to write programs that run in parallel. Designing programs with parallelism is something I never had a chance to learn in my school programming classes. I know how to create threads and make them run, but I have no idea how to use them efficiently. After all, I know that it is not the use of threads itself that makes a program fast, but a good parallel design. So I did an experiment to test my knowledge. However, my parallel version actually runs slower than the non-parallel one, and I am starting to doubt whether I really get the idea. If you would be so kind, please have a look at the following program:
I made a program to fill an array in a divide-and-conquer fashion (I know Java has an Arrays.fill utility, but I just want to test my knowledge of multithreading):
public class ParalledFill
{
    private static void fill(final double [] array,
                             final double value,
                             final int start,
                             final int size)
    {
        if (size > 1000)
        { // Each thread handles at most 1000 elements
            Runnable task = new Runnable() { // Fork the task
                public void run() {
                    fill(array, value, start, 1000); // Fill the first 1000 elements
                }};
            // Create the thread
            Thread fork = new Thread(task);
            fork.start();
            // Fill the rest of the array
            fill(array, value, start+1000, size-1000);
            // Join the task
            try {
                fork.join();
            }
            catch (InterruptedException except)
            {
                System.err.println(except);
            }
        }
        else
        { // The array is small enough, fill it via a normal loop
            for (int i = start; i < start + size; ++i)
                array[i] = value;
        }
    } // fill
    public static void main(String [] args)
    {
        double [] bigArray = new double[1000*1000];
        double value = 3;
        fill(bigArray, value, 0, bigArray.length);
    }
}
I tested this program, but it turns out to be even slower than just doing something like:
for (int i = 0; i < bigArray.length; ++i)
    bigArray[i] = value;
I had my guess: it could be that Java does some optimisation for filling an array using a loop, which makes it much faster than my threaded version. But beyond that, I feel more strongly that my way of handling threads/parallelism could be wrong. I have never designed anything using threads (I always relied on compiler optimisation or OpenMP in C). Could anyone help explain why my parallel version isn't faster? Was the program simply badly designed as a parallel program?
Thanks,
Xing.
Unless you have multiple CPUs, or long running tasks like I/O, I'm guessing that all you're doing is time slicing between threads. If there's a single CPU that has so much work to do, adding threads doesn't decrease the work that has to be done. All you end up doing is adding overhead due to context switching.
You ought to read "Java Concurrency In Practice". Better to learn how to do things with the modern concurrency package rather than raw threads.
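For comparison, here is a sketch of the same fill using the concurrency package, with one large chunk per thread instead of one thread per 1000 elements. The class name `ChunkedFill` and the chunking scheme are illustrative assumptions, not something from the question, and for work as cheap as filling an array it may still not beat the plain loop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedFill {

    /** Fills the array using nThreads pooled threads, one contiguous chunk each. */
    public static void fill(final double[] array, final double value, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            int chunk = (array.length + nThreads - 1) / nThreads; // ceiling division
            for (int t = 0; t < nThreads; t++) {
                final int start = Math.min(array.length, t * chunk);
                final int end = Math.min(array.length, start + chunk);
                futures.add(pool.submit(() -> {
                    for (int i = start; i < end; i++) array[i] = value;
                }));
            }
            // Wait for every chunk and surface any failure from the workers
            for (Future<?> f : futures) f.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] bigArray = new double[1000 * 1000];
        fill(bigArray, 3, Runtime.getRuntime().availableProcessors());
        System.out.println(bigArray[bigArray.length - 1]);
    }
}
```

The point of the pool is that the number of threads matches the number of cores, so thread creation and context switching stay small relative to the work done.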
When is it a good idea to use AtomicReferenceArray? Please explain with an example.
It looks like it's functionally equivalent to AtomicReference[], though it occupies a little less memory.
So it's useful when you need more than a million atomic references; I can't think of any use case.
If you have a shared array of object references, an AtomicReferenceArray gives each element volatile read/write semantics and an atomic compareAndSet, so two threads racing to update the same element cannot silently overwrite each other. It does not lock the array as a whole: different threads can still update different elements simultaneously.
An AtomicReference[] (array of AtomicReference) behaves the same way in that respect, because the atomicity is on the elements, not on the array as a whole; the difference is that it needs one extra object per element.
It could be useful if you have a large number of objects that are updated concurrently, for example in a large multiplayer game.
An update of reference i would follow the pattern
boolean success = false;
while (!success)
{
    E previous = atomicReferenceArray.get(i);
    E next = ... // compute updated object
    success = atomicReferenceArray.compareAndSet(i, previous, next);
}
Depending on the circumstances this may be faster and/or easier to use than locking (synchronized).
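To make the retry pattern concrete, here is a small runnable sketch. The immutable `Player` class and the `addScore` helper are hypothetical, chosen only to fit the multiplayer-game example:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasUpdateSketch {

    /** Hypothetical immutable value; every update creates a new instance. */
    public static final class Player {
        public final int score;
        public Player(int score) { this.score = score; }
    }

    /** Atomically adds delta to the score stored in slot i. */
    public static void addScore(AtomicReferenceArray<Player> players, int i, int delta) {
        boolean success = false;
        while (!success) {
            Player previous = players.get(i);
            Player next = new Player(previous.score + delta);
            // Succeeds only if slot i still holds the instance we read; otherwise retry
            success = players.compareAndSet(i, previous, next);
        }
    }

    /** Four threads each add 1 to slot 0 ten thousand times. */
    public static int demo() throws InterruptedException {
        final AtomicReferenceArray<Player> players = new AtomicReferenceArray<>(1);
        players.set(0, new Player(0));
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < 10_000; n++) addScore(players, 0, 1);
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        return players.get(0).score;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Because a failed compareAndSet just triggers another read and retry, no update is lost even though no lock is held.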
One possible use case would have been ConcurrentHashMap, which uses arrays extensively internally. An array reference can be declared volatile, but its individual elements cannot be given volatile semantics. That is one of the reasons the atomic array classes came into existence.
Some notes from a C++ programmer below, please don't condemn my Java too much :)
AtomicReferenceArray lets you avoid false sharing, which occurs when multiple logical CPU cores access the same cache line while one of the threads is modifying it. Invalidating and re-fetching the cache line is very expensive. Unfortunately there is no sizeof in Java, so we don't know how many bytes each AtomicReference takes, but assuming it's at least 8 bytes (the size of a pointer on 64-bit architectures), you can allocate as follows:
// a lower bound is enough
private final int sizeofAtomicReference = 8;
// good for x86/x64
private final int sizeofCacheLine = 64;
// the number of CPU cores
private final int nLogicalCores = Runtime.getRuntime().availableProcessors();
private final int refsPerCacheLine = (sizeofCacheLine + sizeofAtomicReference - 1) / sizeofAtomicReference;
private AtomicReferenceArray<Task> tasks = new AtomicReferenceArray<Task>(nLogicalCores * refsPerCacheLine);
Now if you assign a task to i-th thread via
tasks.compareAndSet(i*refsPerCacheLine, null, new Task(/*problem definition here*/));
you guarantee that the task references are assigned to different CPU cache lines. Thus there is no expensive false sharing. So the latency of passing tasks from the producer thread to the consumer threads is minimal (for Java, but not for C++/Assembly).
Bonus:
You then poll the tasks array in the worker threads like this:
// consider iWorker is the 0-based index of the logical core this thread is assigned to
final int myIndex = iWorker*refsPerCacheLine;
while(true) {
    Task curTask = tasks.get(myIndex);
    if(curTask == null) continue;
    if(curTask.isTerminator()) {
        return; // exit the thread
    }
    // ... Process the task here ...
    // Signal the producer thread that the current worker is free
    tasks.set(myIndex, null);
}
import java.util.concurrent.atomic.AtomicReferenceArray;
public class AtomicReferenceArrayExample {
    AtomicReferenceArray<String> arr = new AtomicReferenceArray<String>(10);
    public static void main(String... args) {
        // Each thread gets its own example instance, and therefore its own array
        new Thread(new AtomicReferenceArrayExample().new AddThread()).start();
        new Thread(new AtomicReferenceArrayExample().new AddThread()).start();
    }
    class AddThread implements Runnable {
        @Override
        public void run() {
            // Sets the value at index 0
            arr.set(0, "A");
            // At index 0, if the current reference is "A", change it to "B"
            arr.compareAndSet(0, "A", "B");
            // At index 0, if the current reference is "B", set it to "C"
            // (weakCompareAndSet is allowed to fail spuriously)
            arr.weakCompareAndSet(0, "B", "C");
            System.out.println(arr.get(0));
        }
    }
}
// Typical result:
// C
// C
This code should produce even and uneven output because there is no synchronized on any methods. Yet the output on my JVM is always even. I am really confused as this example comes straight out of Doug Lea.
public class TestMethod implements Runnable {
    private int index = 0;
    public void testThisMethod() {
        index++;
        index++;
        System.out.println(Thread.currentThread().toString() + " "
                + index );
    }
    public void run() {
        while(true) {
            this.testThisMethod();
        }
    }
    public static void main(String args[]) {
        int i = 0;
        TestMethod method = new TestMethod();
        while(i < 20) {
            new Thread(method).start();
            i++;
        }
    }
}
Output
Thread[Thread-8,5,main] 135134
Thread[Thread-8,5,main] 135136
Thread[Thread-8,5,main] 135138
Thread[Thread-8,5,main] 135140
Thread[Thread-8,5,main] 135142
Thread[Thread-8,5,main] 135144
I tried with volatile and got the following (with an if to print only if odd):
Thread[Thread-12,5,main] 122229779
Thread[Thread-12,5,main] 122229781
Thread[Thread-12,5,main] 122229783
Thread[Thread-12,5,main] 122229785
Thread[Thread-12,5,main] 122229787
Answer to comments:
The index is in fact shared, because we have one TestMethod instance but many Threads that call testThisMethod() on that one TestMethod instance.
Code (no changes besides the mentioned above):
public class TestMethod implements Runnable {
    volatile private int index = 0;
    public void testThisMethod() {
        index++;
        index++;
        if(index % 2 != 0){
            System.out.println(Thread.currentThread().toString() + " "
                    + index );
        }
    }
    public void run() {
        while(true) {
            this.testThisMethod();
        }
    }
    public static void main(String args[]) {
        int i = 0;
        TestMethod method = new TestMethod();
        while(i < 20) {
            new Thread(method).start();
            i++;
        }
    }
}
First of all: as others have noted, there's no guarantee at all that your threads get interrupted between the two increment operations.
Note that printing to System.out very likely forces some kind of synchronization on your threads. A thread has therefore probably just started a time slice when it returns from println, so it will likely complete both increments and then wait for the shared System.out resource again.
Try replacing the System.out.println() with something like this:
int snapshot = index;
if (snapshot % 2 != 0) {
    System.out.println("Oh noes! " + snapshot);
}
You don't know that. The point of automatic scheduling is that it makes no guarantees. It might treat two threads that run the same code completely different. Or completely the same. Or completely the same for an hour and then suddenly different...
The point is, even if you fix the problems mentioned in the other answers, you still cannot rely on things coming out a particular way; you must always be prepared for any possible interleaving that the Java memory and threading model allows, and that includes the possibility that the println always happens after an even number of increments, even if that seems unlikely to you on the face of it.
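If the goal is the opposite, namely to rule out odd values under every possible interleaving, one option is to make the two increments and the read a single synchronized step. The sketch below is an illustration under that assumption; the class `PairedCounter` and its methods are hypothetical and not part of the original example:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PairedCounter {
    private int index = 0;

    /** Both increments happen under one lock, so no thread sees the value in between. */
    public synchronized int incrementTwice() {
        index++;
        index++;
        return index; // snapshot taken while still holding the lock
    }

    /** Runs nThreads threads and returns how many odd snapshots were observed. */
    public static int countOddSnapshots(int nThreads, int iterations) throws InterruptedException {
        final PairedCounter counter = new PairedCounter();
        final AtomicInteger odd = new AtomicInteger();
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    if (counter.incrementTwice() % 2 != 0) odd.incrementAndGet();
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        return odd.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("odd snapshots: " + countOddSnapshots(20, 100_000));
    }
}
```

Here the even result no longer depends on how the scheduler happens to behave, which is the difference between code that usually works and code that is guaranteed to.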
The result is exactly as I would expect. index is being incremented twice between outputs, and there is no interaction between threads.
To turn the question around - why would you expect odd outputs?
EDIT: Whoops. I wrongly assumed a new runnable was being created per Thread, and therefore there was a distinct index per thread, rather than shared. Disturbing how such a flawed answer got 3 upvotes though...
You have not marked index as volatile. This means that the compiler is allowed to optimize accesses to it, and it probably merges your 2 increments to one addition.
You get the output of the very first thread you start, because this thread loops and gives no chance to other threads to run.
So you should Thread.sleep() or (not recommended) Thread.yield() in the loop.