I'm doing a few concurrency experiments in Java.
I have this prime calculation method, which just mimics a semi-expensive operation:
static boolean isprime(int n) {
    if (n < 2)          // 0 and 1 are not prime
        return false;
    for (int i = 2; i <= n / 2; ++i) {
        if (n % i == 0)
            return false;
    }
    return true;
}
And then I have this main method, which computes primality for every number from 0 to N and stores the results in an array of booleans:
public class Main {
    public static void main(String[] args) {
        final int N = 100_000;
        boolean[] bool = new boolean[N];
        long start = System.nanoTime();
        for (int j = 0; j < N; j++) {
            bool[j] = isprime(j);
        }
        System.out.println(System.nanoTime() - start);
    }
}
This gives me results like: 893888901 ns, 848995600 ns.
And I also have this driver code, where I use an ExecutorService with one thread to do the same:
public class Main {
    public static void main(String[] args) {
        final int N = 100_000;
        int T = 1;
        boolean[] bool = new boolean[N];
        ExecutorService es = Executors.newFixedThreadPool(T);
        final int partition = N / T;
        long start = System.nanoTime();
        for (int i = 0; i < T; i++) {
            final int current = i;
            es.execute(new Runnable() {
                @Override
                public void run() {
                    for (int j = current * partition; j < current * partition + partition; j++) {
                        bool[j] = isprime(j);
                    }
                }
            });
        }
        es.shutdown();
        try {
            es.awaitTermination(1, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            System.out.println("what?");
        }
        System.out.println(System.nanoTime() - start);
    }
}
This gives results like: 9523201 ns, 15485300 ns.
Now the second example is, as you can see, much faster than the first. I can't really understand why that is. Shouldn't the ExecutorService (with one thread) be slower, since it's basically doing the work sequentially plus the overhead of waking the thread, compared to the main thread?
I was expecting the ExecutorService to be faster once I started adding multiple threads, but this is a little counterintuitive.
It's the timeout at the bottom of your code. If you set it higher, you arrive at pretty similar execution times:
es.awaitTermination(1000, TimeUnit.MILLISECONDS);
The execution times you mention for the first main are much higher than the single millisecond you allow the second main to wait for the threads to finish, so you stop the clock before the work is done.
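A minimal sketch of the corrected benchmark (the class name TimedExecutorDemo is illustrative, and the isprime method is the one from the question, with the n &lt; 2 guard): with a generous awaitTermination timeout, the clock only stops after the worker has actually finished, so the measurement should come out comparable to the sequential version.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TimedExecutorDemo {
    // Same semi-expensive primality check as in the question.
    static boolean isprime(int n) {
        if (n < 2) return false;
        for (int i = 2; i <= n / 2; ++i) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        final int N = 100_000;
        final int T = 1;
        final boolean[] results = new boolean[N];
        ExecutorService es = Executors.newFixedThreadPool(T);
        final int partition = N / T;
        long start = System.nanoTime();
        for (int i = 0; i < T; i++) {
            final int current = i;
            es.execute(() -> {
                for (int j = current * partition; j < current * partition + partition; j++) {
                    results[j] = isprime(j);
                }
            });
        }
        es.shutdown();
        // Wait long enough for the tasks to actually finish before stopping the clock.
        es.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println((System.nanoTime() - start) + " ns");
    }
}
```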
I have to write a program that finds the sum of a 2D array of ints.
I coded everything as best I know, and there is no syntax error, but when I check my code the threads are not working at all, or sometimes only some of them work.
I filled the array with the number 1 to check the summation, I added a lock to make sure no two threads are in the summation method at the same time, and I use n to see how many times the add method is entered.
public class extend extends Thread {
    int a, b;
    private static int sum = 0;
    static int n;
    boolean lock;
    int[][] arr;

    public extend() {
        arr = new int[45][45];
        for (int i = 0; i < 45; i++) {
            for (int j = 0; j < 45; j++)
                arr[i][j] = 1;
        }
        n = 0;
        lock = false;
    }

    public extend(int a, int b) {
        arr = new int[45][45];
        for (int i = 0; i < 45; i++) {
            for (int j = 0; j < 45; j++)
                arr[i][j] = 1;
        }
        n = 0;
        lock = false;
        this.a = a;
        this.b = b;
    }

    public void run() {
        add(a, b);
    }

    public void add(int st, int e) {
        n++;
        while (lock) ;
        lock = true;
        int sums = 0;
        synchronized (this) {
            for (int i = st; i < e; i++) {
                for (int j = 0; j < 45; j++) {
                    sums += arr[i][j];
                }
            }
        }
        sum = sums;
        lock = false;
    }

    public int getSum() {
        return sum;
    }

    public static void main(String[] args) {
        long ss = System.currentTimeMillis();
        Thread t1 = new Thread(new extend(0, 9));
        Thread t2 = new Thread(new extend(9, 18));
        Thread t3 = new Thread(new extend(18, 27));
        Thread t4 = new Thread(new extend(27, 36));
        Thread t5 = new Thread(new extend(36, 45));
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t5.start();
        long se = System.currentTimeMillis();
        System.out.println("The sum for 45*45 array is: " + sum);
        System.out.println("time start;" + (se - ss));
        System.out.print(n);
    }
}
I'm sorry to say, but there's so much wrong with this code that it's hard to point at one problem:
You start your threads, but you don't wait for them to finish using .join()
Extending Thread when you actually meant implementing Runnable
Busy-waiting in your threads with while (lock) ;
Using a static int for counting
But, if there's only one thing you must fix, wait for your threads:
t1.join();
...
t5.join();
Your locking of the sum variable may not even result in a speedup once you take into account the overhead of creating threads, but your main problem is that you are not adding sums to sum.
Change:
sum = sums;
to:
sum += sums;
This will make your code work some of the time. It is not guaranteed to work and will sometimes output weird results like 1620 instead of 2025. You should learn more about how to properly handle multithreading, race conditions, and atomic operations.
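Putting the suggestions together — join() before reading, a per-thread local sum, and a single atomic accumulator instead of the hand-rolled lock — a corrected sketch of the 45×45 sum might look like this (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class ArraySum {
    static final int SIZE = 45;
    static final int THREADS = 5;
    static int[][] arr = new int[SIZE][SIZE];
    static AtomicInteger sum = new AtomicInteger(0);

    // Each thread sums its own block of rows into a local variable,
    // then publishes the partial result once, atomically.
    static Thread summer(int start, int end) {
        return new Thread(() -> {
            int partial = 0;
            for (int i = start; i < end; i++)
                for (int j = 0; j < SIZE; j++)
                    partial += arr[i][j];
            sum.addAndGet(partial);
        });
    }

    static int compute() throws InterruptedException {
        sum.set(0);
        for (int[] row : arr) Arrays.fill(row, 1);
        int rowsPerThread = SIZE / THREADS;
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            threads[t] = summer(t * rowsPerThread, (t + 1) * rowsPerThread);
            threads[t].start();
        }
        for (Thread t : threads) t.join(); // wait before reading the result
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("The sum for 45*45 array is: " + compute()); // 2025
    }
}
```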
I split the array and sum its parts separately; at the end, everything is added into a single variable after join.
Main class:
int partArray = array.length / THREAD;
int first = 0;
AtomicInteger result = new AtomicInteger(0);
Thread[] thr = new Thread[THREAD];
for (i = 0; i < THREAD; ++i) {
    thr[i] = new Thread(new ThreadSum(first, first + partArray, array, result));
    thr[i].start();
    first += partArray;
}
for (i = 0; i < THREAD; ++i) {
    thr[i].join();
}
ThreadSum class:
int first;
int end;
private int[] array;
private AtomicInteger result;

public ThreadSum(int first, int end, int[] array, AtomicInteger result) {
    this.first = first;
    this.end = end;
    this.array = array;
    this.result = result;
}

public synchronized void run() {
    int sum = 0;
    for (int i = first; i < end; ++i) {
        sum += array[i];
    }
    result.getAndAdd(sum);
}
How do I implement this without using join()?
Any help is appreciated.
In the end, all the answers and comments from @Tudor and @JBNizet helped me solve the problem. I used CountDownLatch.
CountDownLatch countDownLatch = new CountDownLatch(THREAD);
for (i = 0; i < THREAD; ++i) {
    thr[i] = new Thread(new ThreadSum(first, first + partArray, array, result, countDownLatch));
    thr[i].start();
    first += partArray;
}
countDownLatch.await();
ThreadSum class:
CountDownLatch countDownLatch;

public ThreadSum(int first, int end, int[] array, AtomicInteger result, CountDownLatch countDownLatch) {
    this.first = first;
    this.end = end;
    this.array = array;
    this.result = result;
    this.countDownLatch = countDownLatch;
}

@Override
public void run() {
    int sum = 0;
    System.out.println(currentThread().getName());
    for (int i = first; i < end; ++i) {
        sum += array[i];
    }
    result.getAndAdd(sum);   // add the partial sum before releasing the latch
    countDownLatch.countDown();
}
Here is a version of the code using a thread pool that technically fulfills the requirement of not using join():
int partArray = array.length / THREAD;
int first = 0;
AtomicInteger result = new AtomicInteger(0);
ExecutorService threadPool = Executors.newCachedThreadPool();
for (i = 0; i < THREAD; ++i) {
    threadPool.execute(new ThreadSum(first, first + partArray, array, result));
    first += partArray;
}
threadPool.shutdown();
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
There are a couple of things not ok here:
Using join() is required because otherwise there is no precise point in the program execution where you can safely retrieve the computed sum and know that the parallel summation is finished.
synchronized on the run method is not required because individual array chunks can be summed up in parallel and you are already using AtomicInteger for synchronization.
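One way to sidestep both join() and shared mutable state entirely is to have each task return its partial sum from a Callable; invokeAll() blocks until every task has completed, so reading the futures afterwards is safe. A sketch under those assumptions (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class CallableSum {
    public static int parallelSum(int[] array, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = array.length / threads;
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = t * chunk;
                // The last chunk absorbs any remainder.
                final int to = (t == threads - 1) ? array.length : from + chunk;
                tasks.add(() -> {
                    int partial = 0;
                    for (int i = from; i < to; i++) partial += array[i];
                    return partial;
                });
            }
            int total = 0;
            // invokeAll returns only when all tasks are done, so no join() is needed.
            for (Future<Integer> f : pool.invokeAll(tasks)) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[45 * 45];
        java.util.Arrays.fill(data, 1);
        System.out.println(parallelSum(data, 5)); // prints 2025
    }
}
```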
Consider the following illustrative example. There is a class B implementing some numerical procedure, for example the factorial. The computation runs in a separate thread:
public class B implements Callable<Integer> {
    private int n;

    public B(int n_) { n = n_; }

    public Integer call() { return f(); }

    public Integer f() {
        if (n == 1) return 1;
        else {
            int fn = 1;
            for (int i = n; i > 1; i--) fn *= i;
            return fn;
        }
    }
}
The next class, A, uses the factorial to evaluate the remainder r = x^n / n!:
public class A {
    public double rem(double x, int n) {
        B b = new B(n);
        ExecutorService es = Executors.newFixedThreadPool(5);
        Future<Integer> nf = es.submit(b); // factorial
        es.submit(() -> {
            double r = 1; // remainder x^n/n!
            for (int i = 1; i <= n; i++) r = r * x;
            try { r = r / nf.get(); }
            catch (Exception e) { e.printStackTrace(); }
            return r;
        });
        return 0;
    }
}
How can I ensure that rem() returns the value only after the submitted task has finished? Unfortunately, this does not work:
public static void main(String[] args) {
    A a = new A();
    double r = a.rem(0.5, 10);
}
Is it necessary to run A in another thread and modify A so that:
public class A implements Callable<Double> {
    private int n;
    private double x;

    public A(double x_, int n_) { x = x_; n = n_; }

    public Double call() { return rem(x, n); }
    ....
}
and run A.rem() in a separate thread ?
public static void main(String[] args) throws Exception {
    A a = new A(0.5, 10);
    ExecutorService es = Executors.newFixedThreadPool(5);
    Future<Double> nf = es.submit(a); // remainder
    double r = nf.get();
}
Is there any simpler solution avoiding two different threads?
Could I ask for a short sample code?
Using Future.get() inside a task submitted to a thread pool is dangerous: the current thread is blocked and cannot run other tasks. This may lead to thread starvation, a specific kind of deadlock.
The correct approach is to build an acyclic graph where each node is an asynchronous function call of type CompletableFuture, which runs only after all of its arguments have been computed. Only the final result is extracted with Future.get(), called on the main thread.
Here is an example of such a graph, made close to what you wanted to implement: first, the factorial and power functions run in parallel; as soon as both complete, the function computing the remainder is called.
public static long fact(int n) {
    long res = 1;
    for (int i = n; i > 1; i--) res *= i;
    return res;
}

public static double pow(double base, int pow) {
    double r = 1;
    for (int i = 0; i < pow; i++) r *= base;
    return r;
}

public static double rem(double val1, long val2) {
    return val1 / val2;
}

public static void main(String[] args) throws ExecutionException, InterruptedException {
    ExecutorService es = Executors.newFixedThreadPool(5);
    double base = 0.5;
    int n = 10;
    CompletableFuture<Double> f1 = CompletableFuture.supplyAsync(() -> pow(base, n), es);
    CompletableFuture<Long> f2 = CompletableFuture.supplyAsync(() -> fact(n), es);
    CompletableFuture<Double> f3 = f1.thenCombineAsync(f2, (v1, v2) -> rem(v1, v2), es);
    double r1 = f3.get();
    System.out.println("r1=" + r1);
    // compare with the result of synchronous execution:
    double r2 = rem(pow(base, n), fact(n));
    System.out.println("r2=" + r2);
}
Callable objects implement call(), and the Future you get back from submitting one exposes get(), which returns the value calculated by the thread — have a look at the following link:
https://blogs.oracle.com/corejavatechtips/using-callable-to-return-results-from-runnables
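A minimal sketch of that Callable/Future pattern (all names are illustrative): the Future returned by submit() delivers the value once the worker thread has finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableDemo {
    // Submit a Callable; the returned Future's get() blocks until the result is ready.
    static int factorial(int n) throws Exception {
        ExecutorService es = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = es.submit(() -> {
                int f = 1;
                for (int i = 2; i <= n; i++) f *= i;
                return f;
            });
            return result.get(); // blocks until the worker thread finishes
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(factorial(10)); // prints 3628800
    }
}
```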
I am experimenting with techniques for ensuring the visibility of side effects produced by concurrent tasks executed using the Java Executor framework.
As a simple scenario, consider the hypothetical problem of matrix multiplication.
Let's say that the matrices to multiply could be considerably large (e.g., a few thousand rows and columns) and that, to speed up their multiplication, I implement a concurrent algorithm where the calculation of each cell in the result matrix is treated as an independent (i.e., parallelizable) task.
To simplify a bit, let's ignore that for small input matrices this parallelization may not be such a good idea.
So consider below the first version of my program:
public class MatrixMultiplier {
private final int[][] m;
private final int[][] n;
private volatile int[][] result; //the (lazily computed) result of the matrix multiplication
private final int numberOfMRows; //number of rows in M
private final int numberOfNColumns; //number of columns in N
private final int commonMatrixDimension; //number of columns in M and rows in N
public MatrixMultiplier(int[][] m, int[][] n) {
if(m[0].length != n.length)
throw new IllegalArgumentException("Incompatible arguments: " + Arrays.toString(m) + " and " + Arrays.toString(n));
this.m = m;
this.n = n;
this.numberOfMRows = m.length;
this.numberOfNColumns = n[0].length;
this.commonMatrixDimension = n.length;
}
public synchronized int[][] multiply() {
if (result == null) {
result = new int[numberOfMRows][numberOfNColumns];
ExecutorService executor = createExecutor();
Collection<Callable<Void>> tasks = new ArrayList<>();
for (int i = 0; i < numberOfMRows; i++) {
final int finalI = i;
for (int j = 0; j < numberOfNColumns; j++) {
final int finalJ = j;
tasks.add(new Callable<Void>() {
@Override
public Void call() throws Exception {
calculateCell(finalI, finalJ);
return null;
}
});
}
}
try {
executor.invokeAll(tasks);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} finally {
executor.shutdownNow();
}
}
return result;
}
private ExecutorService createExecutor() {
final int availableProcessors = Runtime.getRuntime().availableProcessors();
final int processorsBound = availableProcessors + 1;
final int maxConcurrency = numberOfMRows * numberOfNColumns;
final int threadPoolSize = maxConcurrency < processorsBound ? maxConcurrency : processorsBound;
return Executors.newFixedThreadPool(threadPoolSize);
}
private void calculateCell(int mRow, int nColumn) {
int sum = 0;
for (int k = 0; k < commonMatrixDimension; k++) {
sum += m[mRow][k] * n[k][nColumn];
}
result[mRow][nColumn] = sum;
}
}
As far as I understand, there is a problem with this implementation: some modifications to the result matrix made by the executed tasks may not be visible to the thread invoking multiply().
Assuming the previous is correct, consider the alternative implementation of multiply() relying on explicit locks (the new lock-related code is commented with //<LRC>):
public synchronized int[][] multiply() {
if (result == null) {
result = new int[numberOfMRows][numberOfNColumns];
final Lock[][] locks = new Lock[numberOfMRows][numberOfNColumns]; //<LRC>
for (int i = 0; i < numberOfMRows; i++) { //<LRC>
for (int j = 0; j < numberOfNColumns; j++) { //<LRC>
locks[i][j] = new ReentrantLock(); //<LRC>
} //<LRC>
} //<LRC>
ExecutorService executor = createExecutor();
Collection<Callable<Void>> tasks = new ArrayList<>();
for (int i = 0; i < numberOfMRows; i++) {
final int finalI = i;
for (int j = 0; j < numberOfNColumns; j++) {
final int finalJ = j;
tasks.add(new Callable<Void>() {
@Override
public Void call() throws Exception {
try { //<LRC>
locks[finalI][finalJ].lock(); //<LRC>
calculateCell(finalI, finalJ);
} finally { //<LRC>
locks[finalI][finalJ].unlock(); //<LRC>
} //<LRC>
return null;
}
});
}
}
try {
executor.invokeAll(tasks);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} finally {
executor.shutdownNow();
}
for (int i = 0; i < numberOfMRows; i++) { //<LRC>
for (int j = 0; j < numberOfNColumns; j++) { //<LRC>
locks[i][j].lock(); //<LRC>
locks[i][j].unlock(); //<LRC>
} //<LRC>
} //<LRC>
}
return result;
}
The explicit locks above serve a single purpose: to ensure that the changes are published to the invoking thread, since there is no possibility of contention.
My main question is whether this is a valid solution to the problem of publishing side effects in my scenario.
As a secondary question: is there a more efficient/elegant way to solve this problem? Please note that I am not looking for alternative algorithms (e.g., Strassen's algorithm) for parallelizing matrix multiplication, since mine is just a simple case study. I am rather interested in alternatives for ensuring the visibility of changes in an algorithm like the one presented here.
UPDATE
I think the alternative implementation below improves on the previous one. It makes use of a single internal lock without affecting concurrency much:
public class MatrixMultiplier {
...
private final Object internalLock = new Object();
public synchronized int[][] multiply() {
if (result == null) {
result = new int[numberOfMRows][numberOfNColumns];
ExecutorService executor = createExecutor();
Collection<Callable<Void>> tasks = new ArrayList<>();
for (int i = 0; i < numberOfMRows; i++) {
final int finalI = i;
for (int j = 0; j < numberOfNColumns; j++) {
final int finalJ = j;
tasks.add(new Callable<Void>() {
@Override
public Void call() throws Exception {
calculateCell(finalI, finalJ);
synchronized (internalLock){}
return null;
}
});
}
}
try {
executor.invokeAll(tasks);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} finally {
executor.shutdownNow();
}
}
synchronized (internalLock){}
return result;
}
...
}
This alternative is just more efficient, but both it and the previous implementation using many locks look correct to me. Are all my observations correct? Is there a more efficient/elegant way to deal with the synchronization problem in my scenario?
Declaring result as volatile only ensures that changes to the reference itself (i.e., result = ...; assignments) are visible to everyone — not writes to the array elements.
The most obvious way to resolve this is to eliminate the side effect. In this case that is easy: just make calculateCell() and the Callable invoking it return the value, and let the main thread write the values into the array.
You could of course use explicit locking, as you did in your second example, but it seems overkill to use n×m locks when a single lock would do. Of course one lock would kill the parallelism in your example, so once again the solution is to make calculateCell() return the value and lock only for the duration of writing the result into the result array.
Or indeed you can use ForkJoin and forget about the whole thing, because it will do this for you.
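A sketch of the side-effect-free variant suggested above (class and method names are illustrative): each task returns its cell value, and only the main thread writes into the result array after invokeAll() has returned, so no extra locking is needed for visibility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class MatrixMultiplierSketch {
    // Each task computes one cell and returns {row, col, value}; only the
    // submitting thread writes into the result array afterwards.
    public static int[][] multiply(int[][] m, int[][] n) throws InterruptedException, ExecutionException {
        int rows = m.length, cols = n[0].length, common = n.length;
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<int[]>> tasks = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    final int r = i, c = j;
                    tasks.add(() -> {
                        int sum = 0;
                        for (int k = 0; k < common; k++) sum += m[r][k] * n[k][c];
                        return new int[]{r, c, sum};
                    });
                }
            }
            int[][] result = new int[rows][cols];
            // invokeAll returns completed futures, so the get() calls below
            // see every task's result on the main thread.
            for (Future<int[]> f : pool.invokeAll(tasks)) {
                int[] cell = f.get();
                result[cell[0]][cell[1]] = cell[2];
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        System.out.println(java.util.Arrays.deepToString(multiply(a, b))); // [[19, 22], [43, 50]]
    }
}
```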
This is a parallel implementation of Levenshtein distance that I was writing for fun. I'm disappointed in the results. I am running this on a Core i7 processor, so I have plenty of available cores. However, as I increase the thread count, the performance degrades significantly: it actually runs slower with more threads for input of the same size.
I was hoping that someone could look at the way I am using threads, and the java.util.concurrent package, and tell me if I am doing anything wrong. I'm really only interested in reasons why the parallelism is not working as I would expect. I don't expect the reader to look at the complicated indexing going on here. I believe the calculations I'm doing are correct. But even if they are not, I think I should still be seeing a close to linear speed-up as I increase the number of threads in the threadpool.
I've included the benchmarking code I used. I'm using libraries found here for benchmarking. The second code block is what I used for benchmarking.
Thanks for any help :).
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
public class EditDistance {
private static final int MIN_CHUNK_SIZE = 5;
private final ExecutorService threadPool;
private final int threadCount;
private final String maxStr;
private final String minStr;
private final int maxLen;
private final int minLen;
public EditDistance(String s1, String s2, ExecutorService threadPool,
int threadCount) {
this.threadCount = threadCount;
this.threadPool = threadPool;
if (s1.length() < s2.length()) {
minStr = s1;
maxStr = s2;
} else {
minStr = s2;
maxStr = s1;
}
maxLen = maxStr.length();
minLen = minStr.length();
}
public int editDist() {
int iterations = maxLen + minLen - 1;
int[] prev = new int[0];
int[] current = null;
for (int i = 0; i < iterations; i++) {
int currentLen;
if (i < minLen) {
currentLen = i + 1;
} else if (i < maxLen) {
currentLen = minLen;
} else {
currentLen = iterations - i;
}
current = new int[currentLen * 2 - 1];
parallelize(prev, current, currentLen, i);
prev = current;
}
return current[0];
}
private void parallelize(int[] prev, int[] current, int currentLen,
int iteration) {
int chunkSize = Math.max(current.length / threadCount, MIN_CHUNK_SIZE);
List<Future<?>> futures = new ArrayList<Future<?>>(currentLen);
for (int i = 0; i < currentLen; i += chunkSize) {
int stopIdx = Math.min(currentLen, i + chunkSize);
Runnable worker = new Worker(prev, current, currentLen, iteration,
i, stopIdx);
futures.add(threadPool.submit(worker));
}
for (Future<?> future : futures) {
try {
Object result = future.get();
if (result != null) {
throw new RuntimeException(result.toString());
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} catch (ExecutionException e) {
// We can only finish the computation if we complete
// all subproblems
throw new RuntimeException(e);
}
}
}
private void doChunk(int[] prev, int[] current, int currentLen,
int iteration, int startIdx, int stopIdx) {
int mergeStartIdx = (iteration < minLen) ? 0 : 2;
for (int i = startIdx; i < stopIdx; i++) {
// Edit distance
int x;
int y;
int leftIdx;
int downIdx;
int diagonalIdx;
if (iteration < minLen) {
x = i;
y = currentLen - i - 1;
leftIdx = i * 2 - 2;
downIdx = i * 2;
diagonalIdx = i * 2 - 1;
} else {
x = i + iteration - minLen + 1;
y = minLen - i - 1;
leftIdx = i * 2;
downIdx = i * 2 + 2;
diagonalIdx = i * 2 + 1;
}
int left = 1 + ((leftIdx < 0) ? iteration + 1 : prev[leftIdx]);
int down = 1 + ((downIdx < prev.length) ? prev[downIdx]
: iteration + 1);
int diagonal = penalty(x, y)
+ ((diagonalIdx < 0 || diagonalIdx >= prev.length) ? iteration
: prev[diagonalIdx]);
int dist = Math.min(left, Math.min(down, diagonal));
current[i * 2] = dist;
// Merge prev
int mergeIdx = i * 2 + 1;
if (mergeIdx < current.length) {
current[mergeIdx] = prev[mergeStartIdx + i * 2];
}
}
}
private int penalty(int maxIdx, int minIdx) {
return (maxStr.charAt(maxIdx) == minStr.charAt(minIdx)) ? 0 : 1;
}
private class Worker implements Runnable {
private final int[] prev;
private final int[] current;
private final int currentLen;
private final int iteration;
private final int startIdx;
private final int stopIdx;
Worker(int[] prev, int[] current, int currentLen, int iteration,
int startIdx, int stopIdx) {
this.prev = prev;
this.current = current;
this.currentLen = currentLen;
this.iteration = iteration;
this.startIdx = startIdx;
this.stopIdx = stopIdx;
}
@Override
public void run() {
doChunk(prev, current, currentLen, iteration, startIdx, stopIdx);
}
}
public static void main(String args[]) {
int threadCount = 4;
ExecutorService threadPool = Executors.newFixedThreadPool(threadCount);
EditDistance ed = new EditDistance("Saturday", "Sunday", threadPool,
threadCount);
System.out.println(ed.editDist());
threadPool.shutdown();
}
}
There is a private inner class Worker inside EditDistance. Each worker is responsible for filling in a range of the current array using EditDistance.doChunk. EditDistance.parallelize is responsible for creating those workers, and waiting for them to finish their tasks.
And the code I am using for benchmarks:
import java.io.PrintStream;
import java.util.concurrent.*;
import org.apache.commons.lang3.RandomStringUtils;
import bb.util.Benchmark;
public class EditDistanceBenchmark {
public static void main(String[] args) {
if (args.length != 2) {
System.out.println("Usage: <string length> <thread count>");
System.exit(1);
}
PrintStream oldOut = System.out;
System.setOut(System.err);
int strLen = Integer.parseInt(args[0]);
int threadCount = Integer.parseInt(args[1]);
String s1 = RandomStringUtils.randomAlphabetic(strLen);
String s2 = RandomStringUtils.randomAlphabetic(strLen);
ExecutorService threadPool = Executors.newFixedThreadPool(threadCount);
Benchmark b = new Benchmark(new Benchmarker(s1, s2, threadPool,threadCount));
System.setOut(oldOut);
System.out.println("threadCount: " + threadCount +
" string length: "+ strLen + "\n\n" + b);
System.out.println("s1: " + s1 + "\ns2: " + s2);
threadPool.shutdown();
}
private static class Benchmarker implements Runnable {
private final String s1, s2;
private final int threadCount;
private final ExecutorService threadPool;
private Benchmarker(String s1, String s2, ExecutorService threadPool, int threadCount) {
this.s1 = s1;
this.s2 = s2;
this.threadPool = threadPool;
this.threadCount = threadCount;
}
@Override
public void run() {
EditDistance d = new EditDistance(s1, s2, threadPool, threadCount);
d.editDist();
}
}
}
It's very easy to accidentally write code that does not parallelize very well. A main culprit is when your threads compete for underlying system resources (e.g. a cache line). Since this algorithm inherently acts on things that are close to each other in physical memory, I suspect pretty strongly that may be the culprit.
I suggest you review this excellent article on False Sharing
http://www.drdobbs.com/go-parallel/article/217500206?pgno=3
and then carefully review your code for cases where threads would block one another.
Additionally, running more threads than you have CPU cores will slow performance if your threads are CPU-bound: if you're already using all cores at near 100%, adding more threads only adds context-switching overhead.
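The false-sharing effect is easy to reproduce. In the sketch below (all names illustrative, cache-line size assumed to be at most 128 bytes), several threads increment counters that either sit in the same cache line or are padded apart; on typical hardware the padded version runs noticeably faster, though the exact ratio depends on the CPU.

```java
public class FalseSharingDemo {
    static final int ITERATIONS = 20_000_000;

    // Each thread repeatedly increments its own slot. With stride 1 all
    // slots sit in the same cache line; with stride 16 (16 longs = 128
    // bytes) each slot typically gets its own line.
    static long time(int threads, int stride) throws InterruptedException {
        long[] slots = new long[threads * stride];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int idx = t * stride;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < ITERATIONS; i++) slots[idx]++;
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) w.start();
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("adjacent slots: " + time(4, 1) / 1_000_000 + " ms");
        System.out.println("padded slots:   " + time(4, 16) / 1_000_000 + " ms");
    }
}
```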