Why doesn't the concurrent threads limit work as expected?

Though there are similar questions, I couldn't find an example like mine. I'd really appreciate any help understanding where my implementation went wrong.
What I'm trying to do:
I have a main class, Driver, which can instantiate an unknown number of threads. Each thread calls a singleton class that simulates a 'fake' file transfer.
I need to limit the concurrent transfers to 2, regardless of the number of concurrent requests.
I tried to solve this by adding each new thread to a ConcurrentLinkedQueue and managing it with Executors.newFixedThreadPool(POOL_SIZE) to limit the number of concurrent threads to 2. On every iteration I submit a new thread to the pool using pool.submit.
The problem is that my output looks like this:
[Thread1], [Thread1, Thread2], [Thread1, Thread2, Thread3]...
While it should be:
[Thread1, Thread2], [Thread3, Thread4]
Why doesn't the limit work here?
My implementation:
Copier - this is my singleton class.
public class Copier {
private final int POOL_SIZE = 2;
private static volatile Copier instance = null;
private Queue<Reportable> threadQuere = new ConcurrentLinkedQueue<>();
private static FileCopier fileCopier = new FileCopier();
private Copier() {
}
public static Copier getInstance() {
if (instance == null) {
synchronized (Copier.class) {
if (instance == null) {
instance = new Copier();
}
}
}
return instance;
}
public void fileTransfer(Reportable reportable) {
threadQuere.add(reportable);
ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
for (int i=0; i < threadQuere.size(); i++) {
System.out.println("This is the " + (i+1) + " thread");
pool.submit(new CopyThread());
}
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
CopyThread - represents a thread class
public class CopyThread implements Reportable, Runnable {
private static FileCopier fileCopier = new FileCopier();
@Override
public void report(String bitrate) {
System.out.println(bitrate);
}
@Override
public void run() {
synchronized(fileCopier) {
long startTime = System.nanoTime();
long bytes = fileCopier.copyFile();
long endTime = System.nanoTime();
double duration = (double)(endTime - startTime) / 1000000000; // get in seconds
double bytesInMegas = (double) bytes / 1000000;
report(bytesInMegas + "MB were transferred in " + duration + " seconds");
}
}
}
Driver - my main class, where I create all the threads
public class Driver {
public static void main(String[] args) {
Copier copier = Copier.getInstance();
CopyThread copyThread1 = new CopyThread();
CopyThread copyThread2 = new CopyThread();
CopyThread copyThread3 = new CopyThread();
CopyThread copyThread4 = new CopyThread();
copier.fileTransfer(copyThread1);
copier.fileTransfer(copyThread2);
copier.fileTransfer(copyThread3);
copier.fileTransfer(copyThread4);
int q = 0;
}
}

A simpler solution would be a Semaphore with 2 permits.
This makes sure that "outside" threads can't bypass the limit either, since your solution expects that the simultaneous tasks are limited by the size of the threadpool.
Your solution uses several concurrency tools when a single one would suffice. Your DCL singleton is a bit outdated too.
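A minimal sketch of the Semaphore idea, not the asker's actual code: fakeTransfer is a hypothetical stand-in for the file copy, and the peak counter exists only to show that no more than two callers are ever inside the guarded section, however many threads call it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLimitDemo {
    // Two permits: at most two transfers may be in flight at any moment.
    private static final Semaphore PERMITS = new Semaphore(2);
    private static final AtomicInteger active = new AtomicInteger();
    private static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for the real file transfer.
    static void fakeTransfer() {
        try {
            PERMITS.acquire(); // blocks while two transfers are already running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        try {
            // Record how many transfers are running right now.
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            Thread.sleep(50); // simulated work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
            PERMITS.release();
        }
    }

    // Starts threadCount callers at once and returns the peak number of simultaneous transfers.
    static int runDemo(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(SemaphoreLimitDemo::fakeTransfer);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrent transfers: " + runDemo(6));
    }
}
```

Because the limit lives inside the transfer method itself, it holds even for callers that do not go through a thread pool.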

Everything is probably fine here (although a bit odd). You print the thread numbers before submitting; that print runs on the calling thread and has nothing to do with the executor, so it always fires immediately. Put the print in the run method and you will see that the limit works. There are more problems with your code, but I assume you wrote it this way for testing/learning.
So, as I said, put the prints in the run method (you can use a static variable in the CopyThread class to count threads). The output will probably be two prints with thread numbers (1 and 2), two prints about how long each transfer took, and then the prints for threads 3 and 4 (I say probably because with threads the exact order is never guaranteed). All of this happens on the fourth call, of course, when your fileTransfer submits 4 runnables.
Your singleton uses double-checked locking, which is an outdated idiom. It was broken before Java 5; since your instance field is volatile it is safe on Java 5 and later, but simpler alternatives exist. That's not ruining your program, so worry about it later.
About everything else (the odd queue usage, fileTransfer creating a new thread pool on every call, etc.): as I said, it's probably for learning. If it's not, the queue may as well be deleted, because you only use it for counting, which could be done with a counter variable. And your fileTransfer method should just submit one new runnable to a pool held in an instance variable, rather than creating a pool and submitting several runnables, which is counter-intuitive.
Edit: check this. I put everything in Cat.java for simplicity and changed a few things I had to change (I don't have the FileCopier class, etc.), but the answer to your problem is here:
import java.util.*;
import java.util.concurrent.*;
class Copier {
private final int POOL_SIZE = 2;
private static volatile Copier instance = null;
private Copier() {
}
public static Copier getInstance() {
if (instance == null) {
synchronized (Copier.class) {
if (instance == null) {
instance = new Copier();
}
}
}
return instance;
}
public void fileTransfer() {
ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
for (int i=0; i < 4; i++) {
pool.submit(new CopyThread());
}
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
class CopyThread implements Runnable {
private static int counter = 0;
public void report(String bitrate) {
System.out.println(bitrate);
}
Object obj = new Object();
@Override
public void run() {
synchronized(obj) {
System.out.println("This is the " + (++counter) + " thread");
long startTime = System.nanoTime();
long bytes = 0;
for(int i=0; i<100000; i++)
bytes+=1;
long endTime = System.nanoTime();
double duration = (double)(endTime - startTime) / 1000000000; // get in seconds
double bytesInMegas = (double) bytes / 1000000;
report(bytesInMegas + "MB were transferred in " + duration + " seconds");
}
}
}
public class Cat {
public static void main(String[] args) {
Copier copier = Copier.getInstance();
copier.fileTransfer();
}
}
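The suggestion above, keeping one pool as an instance variable and having fileTransfer submit a single task, could look roughly like the sketch below. The class name SharedPoolCopier, the peak counter and sleepQuietly are illustrative assumptions and not part of the original code. The singleton is initialised eagerly in a static final field, which avoids double-checked locking altogether.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolCopier {
    private static final int POOL_SIZE = 2;
    // Eager initialisation: the JVM guarantees this runs once, thread-safely.
    private static final SharedPoolCopier INSTANCE = new SharedPoolCopier();

    // One pool for the lifetime of the singleton, so the limit of 2 is global.
    private final ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    private SharedPoolCopier() {
    }

    static SharedPoolCopier getInstance() {
        return INSTANCE;
    }

    // Submits exactly one transfer; the pool queues it if two are already running.
    Future<?> fileTransfer(Runnable transfer) {
        return pool.submit(() -> {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                transfer.run();
            } finally {
                active.decrementAndGet();
            }
        });
    }

    int peakConcurrency() {
        return peak.get();
    }

    void shutdownAndWait() {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical stand-in for the file copy.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SharedPoolCopier copier = getInstance();
        for (int i = 0; i < 4; i++) {
            copier.fileTransfer(() -> sleepQuietly(50));
        }
        copier.shutdownAndWait();
        System.out.println("Peak concurrent transfers: " + copier.peakConcurrency());
    }
}
```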

Related

Why do consumers decrease the producer's performance

I'm currently trying to increase the performance of my software by implementing the producer-consumer pattern. In my particular case I have a producer that sequentially creates Rows and multiple consumers that perform some task for a given batch of rows.
The problem I'm facing now is that when I measure the performance of my Producer-Consumer pattern, I can see that the producer's running time massively increases and I don't understand why this is the case.
So far I have mainly profiled my code and done micro-benchmarking, yet the results have not led me to the actual problem.
public class ProdCons {
static class Row {
String[] _cols;
Row() {
_cols = Stream.generate(() -> "Row-Entry").limit(5).toArray(String[]::new);
}
}
static class Producer {
private static final int N_ITER = 8000000;
final ExecutorService _execService;
final int _batchSize;
final Function<Row[], Consumer> _f;
Producer(final int batchSize, final int nThreads, Function<Row[], Consumer> f) throws InterruptedException {
_execService = Executors.newFixedThreadPool(nThreads);
_batchSize = batchSize;
_f = f;
// init all threads to exclude their creation time
startThreads();
}
private void startThreads() throws InterruptedException {
List<Callable<Void>> l = Stream.generate(() -> new Callable<Void>() {
@Override
public Void call() throws Exception {
Thread.sleep(10);
return null;
}
}).limit(4).collect(Collectors.toList());
_execService.invokeAll(l);
}
long run() throws InterruptedException {
final long start = System.nanoTime();
int idx = 0;
Row[] batch = new Row[_batchSize];
for (int i = 0; i < N_ITER; i++) {
batch[idx++] = new Row();
if (idx == _batchSize) {
_execService.submit(_f.apply(batch));
batch = new Row[_batchSize];
idx = 0;
}
}
final long time = System.nanoTime() - start;
_execService.shutdownNow();
_execService.awaitTermination(100, TimeUnit.MILLISECONDS);
return time;
}
}
static abstract class Consumer implements Callable<String> {
final Row[] _rowBatch;
Consumer(final Row[] data) {
_rowBatch = data;
}
}
static class NoOpConsumer extends Consumer {
NoOpConsumer(Row[] data) {
super(data);
}
@Override
public String call() throws Exception {
return null;
}
}
static class SomeConsumer extends Consumer {
SomeConsumer(Row[] data) {
super(data);
}
@Override
public String call() throws Exception {
String res = null;
for (int i = 0; i < 1000; i++) {
res = "";
for (final Row r : _rowBatch) {
for (final String s : r._cols) {
res += s;
}
}
}
return res;
}
}
public static void main(String[] args) throws InterruptedException {
final int nRuns = 10;
long totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new NoOpConsumer(data)).run();
}
System.out.println("Avg time with NoOpConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new SomeConsumer(data)).run();
}
System.out.println("Avg time with SomeConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
}
}
Since the consumers run in different threads than the producer, I would expect the producer's running time not to be affected by the consumers' workload. However, running the program I get the following output:
#1 Thread, #100 batch size
Avg time with NoOpConsumer: 0.7507254368s
Avg time with SomeConsumer: 1.5334749871s
Note that the time measurement covers only the production time, not the consumer time, and that not submitting any jobs takes ~0.6 s on average.
Even more surprising, when I increase the number of threads from 1 to 4, I get the following results (4 cores with hyper-threading):
#4 Threads, #100 batch size
Avg time with NoOpConsumer: 0.7741189636s
Avg time with SomeConsumer: 2.5561667638s
Am I doing something wrong? What am I missing? At the moment I can only assume that the running time differences are due to context switches or something else related to my system.

Threads are not completely isolated from one another.
It looks like your SomeConsumer class allocates a lot of memory, and this produces garbage collection work that is shared between all threads, including your producer thread.
It also accesses a lot of memory, which can knock the memory used by the producer out of L1 or L2 cache. Accessing real memory takes a lot longer than accessing cache, so this can make your producer take longer as well.
Note also that I didn't actually verify that you're measuring the producer time properly, and it's easy to make mistakes there.
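One way to check whether garbage collection plays a role is to read the JVM's collector counters before and after the measured section. The sketch below uses the standard java.lang.management API; the class name GcProbe and the string-concatenation loop (which only imitates SomeConsumer's allocation pattern) are assumptions for illustration. The counters are JVM-wide, so they include collections triggered by any thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcProbe {
    // Sum of collection counts over all collectors (a collector may report -1 if undefined).
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    // Sum of approximate accumulated collection time in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();

        // Allocation-heavy loop in the style of SomeConsumer: each += creates a new String.
        int length = 0;
        for (int run = 0; run < 200; run++) {
            String res = "";
            for (int i = 0; i < 500; i++) {
                res += "Row-Entry";
            }
            length += res.length();
        }

        System.out.println("Characters built: " + length);
        System.out.println("GC runs during loop: " + (totalGcCount() - countBefore));
        System.out.println("GC time during loop (ms): " + (totalGcMillis() - millisBefore));
    }
}
```

Taking the same two readings around Producer.run() for the NoOpConsumer and SomeConsumer cases would show whether the slower runs coincide with more collector activity.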

How can I schedule some work in n threads separately

Let's say I have n threads concurrently taking values from a shared queue:
public class WorkerThread implements Runnable{
private BlockingQueue queue;
private ArrayList<Integer> counts = new ArrayList<>();
private int count=0;
public void run(){
while(true) {
queue.pop();
count++;
}
}
}
Then, for each thread, I want to record every 5 seconds how many items it has dequeued and store that in its own list (counts).
I've seen in "Print "hello world" every X seconds" how you can run some code every x seconds:
Timer t = new Timer();
t.scheduleAtFixedRate(new TimerTask(){
@Override
public void run(){
counts.add(count);
count = 0;
}
}, 0, 5000);
The problem with this is that I can't access the count variable and the list of counts unless they are static. But I don't want them to be static, because I don't want the different threads to share those variables.
Any ideas of how to handle this?

I don't think it's possible to use scheduled execution for your case (neither Timer nor ScheduledExecutorService), because each new scheduled invocation would create a new task with a while loop, so the number of tasks would increase constantly.
If you don't need to access the list of counts at runtime, I would suggest something like this:
static class Task implements Runnable {
private final ThreadLocal<List<Integer>> counts = ThreadLocal.withInitial(ArrayList::new);
private volatile List<Integer> result = new ArrayList<>();
private BlockingQueue<Object> queue;
public Task(BlockingQueue<Object> queue) {
this.queue = queue;
}
@Override
public void run() {
int count = 0;
long start = System.nanoTime();
try {
while (!Thread.currentThread().isInterrupted()) {
queue.take();
count++;
long end = System.nanoTime();
if ((end - start) >= TimeUnit.SECONDS.toNanos(1)) {
counts.get().add(count);
count = 0;
start = end;
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// the last value
counts.get().add(count);
// copy the result cause it's not possible
// to access thread local variable outside of this thread
result = counts.get();
}
public List<Integer> getCounts() {
return result;
}
}
public static void main(String[] args) throws Exception {
ExecutorService executorService = Executors.newFixedThreadPool(3);
BlockingQueue<Object> blockingQueue = new LinkedBlockingQueue<>();
Task t1 = new Task(blockingQueue);
Task t2 = new Task(blockingQueue);
Task t3 = new Task(blockingQueue);
executorService.submit(t1);
executorService.submit(t2);
executorService.submit(t3);
for (int i = 0; i < 50; i++) {
blockingQueue.add(new Object());
Thread.sleep(100);
}
// unlike shutdown() interrupts running threads
executorService.shutdownNow();
executorService.awaitTermination(1, TimeUnit.SECONDS);
System.out.println("t1 " + t1.getCounts());
System.out.println("t2 " + t2.getCounts());
System.out.println("t3 " + t3.getCounts());
int total = Stream.concat(Stream.concat(t1.getCounts().stream(), t2.getCounts().stream()), t3.getCounts().stream())
.reduce(0, (a, b) -> a + b);
// 50 as expected
System.out.println(total);
}

Why not a static AtomicLong?
Or the WorkerThread(s) could publish what they popped to the TimerTask (or somewhere else), and the TimerTask reads that info?
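A hedged sketch of the "publish a counter, let a timer read it" idea, using a per-worker (non-static) AtomicInteger so that workers do not share state. The worker loop is started once; only the sampling is scheduled. The names CountingWorker, CountingDemo, sample and runDemo are hypothetical, and the short interval passed in main stands in for the 5 seconds from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CountingWorker implements Runnable {
    private final BlockingQueue<Object> queue;
    private final AtomicInteger count = new AtomicInteger();           // items since the last sample
    private final List<Integer> counts = new CopyOnWriteArrayList<>(); // one entry per sample

    CountingWorker(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take();
                count.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called from the scheduler thread: atomically read and reset the counter.
    void sample() {
        counts.add(count.getAndSet(0));
    }

    List<Integer> getCounts() {
        return counts;
    }
}

class CountingDemo {
    // Feeds `items` objects to two workers, sampling every sampleMillis; returns the total counted.
    static int runDemo(int items, long sampleMillis) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        List<CountingWorker> workers = new ArrayList<>();
        ExecutorService workerPool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            CountingWorker w = new CountingWorker(queue);
            workers.add(w);
            workerPool.submit(w);
        }
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(() -> workers.forEach(CountingWorker::sample),
                sampleMillis, sampleMillis, TimeUnit.MILLISECONDS);
        try {
            for (int i = 0; i < items; i++) {
                queue.add(new Object());
                Thread.sleep(5);
            }
            while (!queue.isEmpty()) {
                Thread.sleep(5); // wait until every item has been taken
            }
            sampler.shutdown();
            sampler.awaitTermination(1, TimeUnit.SECONDS);
            workerPool.shutdownNow(); // interrupts the workers blocked in take()
            workerPool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (CountingWorker w : workers) {
            w.sample(); // pick up whatever was counted after the last scheduled sample
            total += w.getCounts().stream().mapToInt(Integer::intValue).sum();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Total items counted: " + runDemo(20, 30));
    }
}
```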

How can this loop ever exit?

So, I ran a test and the results make no sense to me. Let's consider the following code:
ThreadStuffCounter counter_1 = new ThreadStuffCounter(1);
while(counter_1.doProceed) {
Thread.sleep(500);
Thread thread = new Thread(counter_1);
thread.start();
}
With the Runnable as follows:
package test;
public class ThreadStuffCounter implements Runnable {
public volatile boolean doProceed = true;
private int id = -1;
public volatile int i = -1;
public ThreadStuffCounter(int id) {
this.id = id;
}
@Override
public void run() {
for (i = 0; i < 10; i++) {
System.out.println("i = " + i + " in runnable id = " + id);
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
e.printStackTrace();
}
}
doProceed = false;
}
}
Only one instance of the counter is shared between threads. Starting another thread takes less time than even one increment of the counter. As I understand it, doProceed should never be set to false, and the loop should continue indefinitely until I get an out-of-memory exception and cannot start any more threads.
How is it possible for the loop to exit?
EDIT: Modified code to make sure the answer below is correct.
package test;
public class ThreadStuffCounter implements Runnable{
public volatile boolean doProceed = true;
private int id = -1;
volatile int i = -1;
public ThreadStuffCounter(int id){
this.id = id;
}
@Override
public void run() {
i = 0;
while (i < 10){
System.out.println("i = " + i + " in runnable id = " + id +
"; from thead id = " + Thread.currentThread().getId());
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
i++;
}
ThreadStuff.doProceed = false;
}
}
And
package test;
public class ThreadStuff {
public static volatile boolean doProceed = true;
public static void main (String[] args) throws InterruptedException{
ThreadStuffCounter counter_1 = new ThreadStuffCounter(1);
while(doProceed){
Thread.sleep(500);
Thread thread = new Thread(counter_1);
thread.start();
}
}
}
Also, it appears more than n threads are needed if you are running for i < n. You need however many it takes for n threads to increment at the same time.

When at least one of the threads executing the for loop sees a value of i greater than or equal to 10, it sets doProceed to false (yes, this can happen), and since the field is volatile, this stops the while loop that creates and starts new threads. After that, it is up to the remaining threads to finish the for loop and terminate. This seems to happen because starting a new thread in your environment takes longer than it takes a running thread to finish its execution. Also, note that several threads may increment i, which speeds up the for loop.
If you loop to a higher value (not tested), this could probably become an infinite loop, and the application would break when there aren't enough resources to create and start new threads.
After some tests using limits of 10, 50 and 1000, I noticed that with a bigger value, since lots of threads are created, all of them increase i at the same time and i slowly gets closer to the limit set in the for loop. Description of my current environment:
OS: Windows 7 Professional 64 bits
Processor: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz (4 CPUs), ~2.5GHz
Ram: 8192MB

Measure Contention Time In Locks

I want to measure the contention times for different kinds of locks. This example is for a TAS lock. I am unable to figure out how to go about it. I have used ctime to measure the amount of time a thread waits on the lock, but it seems I am going about it the wrong way: the thread may be switched out after ctime1 or before ctime2, and as a result I will not get the actual contention time. Am I approaching this correctly, or is there some other way to go about it?
class Stack{
int stack[];
int top, size;
void push(int e);
int pop();
}
class Pusher implements Runnable{
Thread t;
Stack s;
TASLock lock;
long ctime, etime;
Pusher(Stack temp, TASLock tempLock){
t = new Thread(this);
s = temp;
lock = tempLock;
ctime = etime = 0;
t.start();
}
public void run(){
long ctime1, ctime2, etime1, etime2;
int id = (int)t.getId();
int times = 10000;
etime1 = System.nanoTime();
while(times-->0){
//System.out.println("Pushing - "+t.getName());
ctime1 = System.nanoTime();
lock.lock();
ctime2 = System.nanoTime();
ctime += ctime2 - ctime1;
try{
s.push(id);
} finally { lock.unlock(); }
}
etime2 = System.nanoTime();
etime = etime2-etime1;
System.out.println(t.getName()+" Waiting time : "+ctime);
System.out.println(t.getName()+" Execution time : "+etime);
}
}
class Popper implements Runnable{
// Works the same way as Pusher...
}
class StackDriver{
public static void main(String[] args){
Stack s = new Stack(1000);
TASLock lock = new TASLock();
int i, noOfThreads = Integer.parseInt(args[0]);
for(i=0; i<noOfThreads; i++){
if(i%2 == 0)
new Pusher(s,lock);
else
new Popper(s,lock);
}
}
}

As you wrote, you are measuring the execution time between two points, not the lock contention.
You'd be better off using the Threads tab in VisualVM.
Take a look at this:
http://visualvm.java.net/threads.html
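For a programmatic view of blocking time, the JDK's ThreadMXBean can report how often and how long a thread was blocked entering a monitor. The sketch below is an illustration under stated assumptions: ContentionProbe and measure are hypothetical names, and the mechanism only covers threads in the BLOCKED state on synchronized monitors. A thread spinning in a TAS lock stays RUNNABLE, so this does not capture spin time; for a spin lock, timing around lock() as in the question remains the practical option.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ContentionProbe {
    private static final Object MONITOR = new Object();

    // Returns {blockedCount, blockedMillis} for a worker that had to wait for MONITOR.
    // blockedMillis is -1 when contention monitoring is unavailable on this JVM.
    static long[] measure() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean.isThreadContentionMonitoringSupported()) {
            bean.setThreadContentionMonitoringEnabled(true);
        }
        long[] result = new long[2];
        Thread worker = new Thread(() -> {
            synchronized (MONITOR) { // contended: the main thread holds MONITOR at this point
                ThreadInfo info = bean.getThreadInfo(Thread.currentThread().getId());
                result[0] = info.getBlockedCount();
                result[1] = info.getBlockedTime();
            }
        });
        try {
            synchronized (MONITOR) {
                worker.start();
                // Wait until the worker is really blocked on the monitor, then hold it a bit longer.
                while (worker.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                Thread.sleep(50);
            }
            worker.join(); // join makes the worker's writes to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] r = measure();
        System.out.println("Blocked count: " + r[0] + ", blocked time (ms): " + r[1]);
    }
}
```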

comparison of code performance, threaded versus non-threaded

I have some thread-related questions, assuming the following code. Please ignore the possible inefficiency of the code, I'm only interested in the thread part.
//code without thread use
public static int getNextPrime(int from) {
int nextPrime = from+1;
boolean superPrime = false;
while(!superPrime) {
boolean prime = true;
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
}
}
if(prime) {
superPrime = true;
} else {
nextPrime++;
}
}
return nextPrime;
}
public static void main(String[] args) {
int primeStart = 5;
ArrayList list = new ArrayList();
for(int i = 0;i < 10000;i++) {
list.add(primeStart);
primeStart = getNextPrime(primeStart);
}
}
If I run the code like this, it takes about 56 seconds. If, however, I have the following code (as an alternative):
public class PrimeRunnable implements Runnable {
private int from;
private int lastPrime;
public PrimeRunnable(int from) {
this.from = from;
}
public boolean isPrime(int number) {
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return false;
}
}
lastPrime = number;
return true;
}
public int getLastPrime() {
return lastPrime;
}
public void run() {
while(!isPrime(++from))
;
}
}
public static void main(String[] args) {
int primeStart = 5;
ArrayList list = new ArrayList();
for(int i = 0;i < 10000;i++) {
PrimeRunnable pr = new PrimeRunnable(primeStart);
Thread t = new Thread(pr);
t.start();
t.join();
primeStart = pr.getLastPrime();
list.add(primeStart);
}
}
The whole operation takes about 7 seconds. I am almost certain that even though I only create one thread at a time, a thread doesn't always finish when another is created. Is that right? I am also curious: why is the operation ending so fast?
When I'm joining a thread, do other threads keep running in the background, or is the joined thread the only one that's running?

By putting the join() in the loop, you're starting a thread, then waiting for that thread to stop before running the next one. I think you probably want something more like this:
public static void main(String[] args) throws InterruptedException {
int primeStart = 5;
// Make thread-safe list for adding results to
List list = Collections.synchronizedList(new ArrayList());
// Pull thread pool count out into a value so you can easily change it
int threadCount = 10000;
Thread[] threads = new Thread[threadCount];
// Start all threads
for(int i = 0;i < threadCount;i++) {
// Pass list to each Runnable here
// Also, I added +i here as I think the intention is
// to test 10000 possible numbers>5 for primeness -
// was testing 5 in all loops
PrimeRunnable pr = new PrimeRunnable(primeStart+i, list);
threads[i] = new Thread(pr);
threads[i].start(); // thread is now running in parallel
}
// All threads now running in parallel
// Then wait for all threads to complete
for(int i=0; i<threadCount; i++) {
threads[i].join();
}
}
By the way pr.getLastPrime() will return 0 in the case of no prime, so you might want to filter that out before adding it to your list. The PrimeRunnable has to absorb the work of adding to the final results list. Also, I think PrimeRunnable was actually broken by still having incrementing code in it. I think this is fixed, but I'm not actually compiling this.
public class PrimeRunnable implements Runnable {
private int from;
private List results; // shared but thread-safe
public PrimeRunnable(int from, List results) {
this.from = from;
this.results = results;
}
public void isPrime(int number) {
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return;
}
}
// found prime, add to shared results
this.results.add(number);
}
public void run() {
isPrime(from); // don't increment, just check one number
}
}
Running 10000 threads in parallel is not a good idea. It's a much better idea to create a reasonably sized fixed thread pool and have them pull work from a shared queue. Basically every worker pulls tasks from the same queue, works on them and saves the results somewhere. The closest port of this with Java 5+ is to use an ExecutorService backed by a thread pool. You could also use a CompletionService which combines an ExecutorService with a result queue.
An ExecutorService version would look like:
public static void main(String[] args) throws InterruptedException {
int primeStart = 5;
// Make thread-safe list for adding results to
List list = Collections.synchronizedList(new ArrayList());
int threadCount = 16; // Experiment with this to find best on your machine
ExecutorService exec = Executors.newFixedThreadPool(threadCount);
int workCount = 10000; // See how # of work is now separate from # of threads?
for(int i = 0;i < workCount;i++) {
// submit work to the svc for execution across the thread pool
exec.execute(new PrimeRunnable(primeStart+i, list));
}
// Stop accepting new work, then wait for all tasks to be done or the timeout to go off
exec.shutdown();
exec.awaitTermination(1, TimeUnit.DAYS);
}
Hope that gave you some ideas. And I hope the last example seemed a lot better than the first.

You can test this better by making the exact code in your first example run with threads. Replace your main method with this:
private static int currentPrime;
public static void main(String[] args) throws InterruptedException {
for (currentPrime = 0; currentPrime < 10000; currentPrime++) {
Thread t = new Thread(new Runnable() {
public void run() {
getNextPrime(currentPrime);
}});
t.start();
t.join();
}
}
This will run in the same time as the original.
To answer your "join" question: yes, other threads can be running in the background when you use "join", but in this particular case you will only have one active thread at a time, because you are blocking the creation of new threads until the last thread is done executing.

JesperE is right, but I don't believe in only giving hints (at least outside a classroom):
Note this loop in the non-threaded version:
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
}
}
As opposed to this in the threaded version:
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return false;
}
}
The first loop will always run completely through, while the second will exit early if it finds a divisor.
You could make the first loop also exit early by adding a break statement like this:
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
break;
}
}

Read your code carefully. The two cases aren't doing the same thing, and it has nothing to do with threads.
When you join a thread, other threads will run in the background, yes.

Running a test, the second one doesn't seem to take 9 seconds. In fact, it takes at least as long as the first, which is to be expected: threading can't help the way it's implemented in your example.
Thread.join will only return when the joined thread terminates; then the current thread continues, and the thread you called join on will be dead.
As a quick rule of thumb: think of threading when starting one iteration does not depend on the result of the previous one.
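To illustrate that rule of thumb, here is a small sketch in which every candidate number is checked independently, so the checks can run on a fixed pool in any order. ParallelPrimeCheck and countPrimes are hypothetical names, and the trial division stops at the square root, which differs from the loops in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPrimeCheck {
    // Trial division with early exit; each call depends only on its own argument.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts the primes in [from, to) using a fixed pool of `threads` workers.
    static int countPrimes(int from, int to, int threads) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int n = from; n < to; n++) {
                final int candidate = n;
                // No task needs the result of another, so they can run in parallel.
                results.add(exec.submit(() -> isPrime(candidate)));
            }
            int count = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    count++;
                }
            }
            return count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Primes below 100: " + countPrimes(2, 100, 4));
    }
}
```

The question's getNextPrime loop, by contrast, feeds each result into the next call, so those iterations cannot overlap no matter how many threads are created.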
