Java, Thread and Priorities

I'm working on a problem that was supposed to be very simple to solve, but it is turning out to be harder than expected.
The problem is quite simple: I have a Java program running on Linux/x86 that can perform two basic functionalities, F1 and F2. I would like F1 to have a higher priority, but it is a MUST that F2 executes from time to time, i.e., having F1 requests in the queue must not keep F2 requests waiting forever.
My first thought was to have separate queues with a thread pool for each functionality: I gave the F1 pool 8 threads and the F2 pool only 2.
My expectation was that Linux would give each thread a fair time share, so F1 would get 8 quanta while F2 would get just 2. If there were no F1 requests, the F2 pool could take every quantum for itself, and the same would hold for F1 when F2 has no requests.
However, the program is not behaving that way: if I get a burst of F2 requests and just a couple of F1 requests, the latter take a long time to get their turn.
Does that make sense given how Oracle HotSpot and Linux schedule threads? Or should it not be happening, which would point to an implementation error on my part?
PS: I've read about Linux scheduling, and it seems that SCHED_OTHER (TS) gives a time share to each task; however, every time a ready task is not executed it gets a bigger quantum, and if that is happening to the F2 pool, it might explain the behavior described above.
Thanks and Regards.
Sample source code is below.
package test;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
/**
* Created by giscardff on 08/07/18.
*/
public class TestThread {
// Test Program
public static void main(String args[]) throws Exception {
// queues containing jobs to be done
ArrayBlockingQueue<MyDTO> queueA = new ArrayBlockingQueue<>(100);
ArrayBlockingQueue<MyDTO> queueB = new ArrayBlockingQueue<>(100);
// create pool for functionality A
for(int i = 1; i <= 8; i++){
MyThread thread = new MyThread("ThreadA" + i, queueA);
thread.start();
}
// create pool for functionality B
for(int i = 1; i <= 2; i++){
MyThread thread = new MyThread("ThreadB" + i, queueB);
thread.start();
}
// create producer for A
// it will take 100ms between requests
Producer producerA = new Producer(queueA, 100);
producerA.start();
// create producer for B
// it will take 0ms between requests
Producer producerB = new Producer(queueB, 0);
producerB.start();
}
}
/**
* Just put a request into a queue
*/
class Producer extends Thread {
private ArrayBlockingQueue<MyDTO> queue;
private long sleep;
public Producer(ArrayBlockingQueue<MyDTO> queue, long sleep){
this.queue = queue;
this.sleep = sleep;
}
@Override
public void run() {
try {
while (true) {
if(sleep > 0)Thread.sleep(sleep);
queue.put(new MyDTO());
}
}catch(Exception ex){}
}
}
/**
* Retrieve a request from a queue, calculate how long request took to
* be received for each 1M requests
*/
class MyThread extends Thread {
private ArrayBlockingQueue<MyDTO> queue;
private long delay = 0;
private int count = 0;
public MyThread(String name, ArrayBlockingQueue<MyDTO> queue){
super(name);
this.queue = queue;
}
@Override
public void run() {
try {
while (true) {
MyDTO input = queue.take();
delay += System.currentTimeMillis() - Long.parseLong(input.getTime());
if(++count % 1000 == 0){
System.out.printf("%s: %d\n", getName(), delay / 10);
count = 0;
}
}
}catch(Exception ex){ex.printStackTrace();}
}
}
/**
* Just a DTO representing a request
* NOTE: The time was set as String to force CPU to do something more than just math operations
*/
class MyDTO {
private String time;
public MyDTO(){
this.time = "" + System.currentTimeMillis();
}
public String getTime() {
return time;
}
}

It looks like you've got a few issues. I'll try to summarize them and provide a starting point for a path forward:
Thread Contention
Using the BlockingQueue comes with a cost - every write operation (put & take) is lock contended between the producers or consumers. Your "A pool" has 9 threads contending over the write lock for queueA (1 producer, 8 consumers), while your "B pool" has 3 threads contending over the lock for queueB (1 producer, 2 consumers).
This related answer provides a bit more detail about contention. The simplest ways around this are to "use less threads" or use "lock-free" mechanisms to eliminate the contention.
Thread Scheduling
As mentioned in the comments, you're at the mercy of how the JVM (and, underneath it, the operating system) schedules your threads.
If Java thread scheduling used perfectly fair time shares on the CPU, you'd probably see consumption counts on each thread in the same pool extremely close to each other. You've probably noticed they're not - my runs of your (slightly modified) code occasionally give me a count spread of 300K or more across the threads.
This often improves when there are enough CPU cores for each CPU-bound thread (you've got 12 threads in your sample code), but it's far from ideal in many cases, especially in the face of thread contention.
What can you do?
1. Build your own fairness logic - don't rely on the JVM thread scheduler to be fair, because it won't be.
A simple idea for your case would be to keep both queues, but use a single pool to process both - use either round-robin or Math.random() (i.e.: if (rand < 0.8) { queueA.poll();}) to determine which queue to poll from. Note - use poll so you can easily handle the case when a queue is empty without blocking.
2. Experiment with the number of CPU-bound threads running on your hardware. With my suggestion for (1) above, you could even have just one worker thread fairly processing both queues. Remember, too many threads contending over the same resources will slow down your processing.
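To make the "single pool, weighted choice between two queues" idea concrete, here is a minimal sketch. The class name WeightedPoller, its method names and the 0.8 weight are assumptions made for illustration, not part of the original code. It uses the non-blocking poll() and falls back to the other queue when the chosen one is empty, so neither functionality is starved.

```java
import java.util.Queue;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: picks the next job from two queues with a weighted preference. */
class WeightedPoller<T> {
    private final Queue<T> preferred;     // e.g. queueA (F1)
    private final Queue<T> other;         // e.g. queueB (F2)
    private final double preferredShare;  // assumed weight, e.g. 0.8

    WeightedPoller(Queue<T> preferred, Queue<T> other, double preferredShare) {
        this.preferred = preferred;
        this.other = other;
        this.preferredShare = preferredShare;
    }

    /**
     * Non-blocking: tries the queue selected by 'roll' (a value in [0, 1)) first,
     * then the other queue. Returns null only when both queues are empty.
     */
    T pollNext(double roll) {
        Queue<T> first = roll < preferredShare ? preferred : other;
        Queue<T> second = (first == preferred) ? other : preferred;
        T job = first.poll();
        return job != null ? job : second.poll();
    }

    /** Convenience overload that draws the random roll itself. */
    T pollNext() {
        return pollNext(ThreadLocalRandom.current().nextDouble());
    }
}
```

A worker thread would call pollNext() in a loop and back off briefly (or block on a shared signal) when it returns null. Taking the roll as a parameter is only there to make the selection logic easy to test.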
Isn't threading fun? :)

Related

Data structure to manage a maximum number of requests per minute for an api

I need to send data to an external API, but this API has a limit of requests per endpoint (e.g., 60 requests per minute).
The data comes from Kafka, and every message goes to Redis (because I can send a request with 200 items). So I use a simple cache to help me, and I can guarantee that if my server goes down, I won't lose any messages.
The problem is that there are moments when Kafka starts to send too many messages; then Redis starts to grow (more than 1 million messages waiting to be sent to the API), and we cannot make requests as fast as the messages come in. Then we have a big delay.
My first code was simple: ExecutorService executor = Executors.newFixedThreadPool(1);
This works very well when there are few messages, and the delay is minimal.
So, the first thing that I did was change the executor to: ExecutorService executor = Executors.newCachedThreadPool();
That way I can get new threads on demand to make the requests to the external API faster, but then I run into the limit of requests per minute.
There are endpoints where I can make 300 requests per minute, others 500, others 30, and so on.
The code that I wrote is not very good, and this is for the company I work for, so I really need to make it better.
So, every time I am going to call the external API, I call the makeRequest method. This method is synchronized; I know that I could use a synchronized list, but I think that a synchronized method is better in this situation.
// This is an inner class
private static class IntegrationType {
final Queue<Long> requests; // This queue is used to store the timestamp of the requests
final int maxRequestsPerMinute; // How many requests I can make per minute
public IntegrationType(final int maxRequestsPerMinute) {
this.maxRequestsPerMinute = maxRequestsPerMinute;
this.requests = new LinkedList<>();
}
synchronized void makeRequest() {
final long current = System.currentTimeMillis();
requests.add(current);
if (requests.size() >= maxRequestsPerMinute) {
long first = requests.poll(); // gets the first request
// The difference between the current request and the first request of the queue
final int differenceInSeconds = (int) (current - first) / 1000;
// if the difference is less than the maximum allowed
if (differenceInSeconds <= 60) {
// seconds to sleep.
final int secondsToSleep = 60 - differenceInSeconds;
sleep(secondsToSleep);
}
}
}
void sleep( int seconds){
try {
Thread.sleep(seconds * 1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
So, is there a data structure that I could use?
What should I take into consideration?
Thanks in advance.

If I understand your problem correctly, you can use a BlockingQueue with a ScheduledExecutorService as follows.
BlockingQueues have the method put which will only add the given element at the queue if there is available space, otherwise the method call will wait (until there is free space). They also have the method take which will only remove an element from the queue if there are any elements at all, otherwise the method call will wait (until there is at least one element to take).
Specifically you can use a LinkedBlockingQueue or an ArrayBlockingQueue which can be given with a fixed size of elements to hold at any given time. This fixed size means that you can submit with put as many requests as you like, but you will only take requests and process them once every second or something (so as to make 60 requests per minute for example).
To instantiate a LinkedBlockingQueue with fixed size, just use the corresponding constructor (which accepts the size as the argument). LinkedBlockingQueue will take elements in FIFO order according to its documentation.
To instantiate an ArrayBlockingQueue with a fixed size, use the constructor which accepts the size and also a boolean flag named fair. Elements are always taken in FIFO order; if the flag is true, producer and consumer threads that are blocked waiting on the queue are also granted access in FIFO order.
Then you can have a ScheduledExecutorService (instead of waiting inside a loop) where you can submit a single Runnable which will take from the queue, make the communication with the external API and then wait for the required delay between communications.
A simple demonstration of the above follows:
import java.util.Objects;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main {
public static class RequestSubmitter implements Runnable {
private final BlockingQueue<Request> q;
public RequestSubmitter(final BlockingQueue<Request> q) {
this.q = Objects.requireNonNull(q);
}
@Override
public void run() {
try {
q.put(new Request()); //Will block until available capacity.
}
catch (final InterruptedException ix) {
System.err.println("Interrupted!"); //Not expected to happen under normal use.
}
}
}
public static class Request {
public void make() {
try {
//Let's simulate the communication with the external API:
TimeUnit.MILLISECONDS.sleep((long) (Math.random() * 100));
}
catch (final InterruptedException ix) {
//Let's say here we failed to communicate with the external API...
}
}
}
public static class RequestImplementor implements Runnable {
private final BlockingQueue<Request> q;
public RequestImplementor(final BlockingQueue<Request> q) {
this.q = Objects.requireNonNull(q);
}
@Override
public void run() {
try {
q.take().make(); //Will block until there is at least one element to take.
System.out.println("Request made.");
}
catch (final InterruptedException ix) {
//Here the 'taking' from the 'q' is interrupted.
}
}
}
public static void main(final String[] args) throws InterruptedException {
/*The following initialization parameters specify that we
can communicate with the external API 60 times per 1 minute.*/
final int maxRequestsPerTime = 60;
final TimeUnit timeUnit = TimeUnit.MINUTES;
final long timeAmount = 1;
final BlockingQueue<Request> q = new ArrayBlockingQueue<>(maxRequestsPerTime, true);
//final BlockingQueue<Request> q = new LinkedBlockingQueue<>(maxRequestsPerTime);
//Submit some RequestSubmitters to the pool...
final ExecutorService pool = Executors.newFixedThreadPool(100);
for (int i = 0; i < 50_000; ++i)
pool.submit(new RequestSubmitter(q));
System.out.println("Serving...");
//Find out the period between communications with the external API:
final long delayMicroseconds = TimeUnit.MICROSECONDS.convert(timeAmount, timeUnit) / maxRequestsPerTime;
//We could do the same with NANOSECONDS for more accuracy, but that would be overkill I think.
//The most important line probably:
Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(new RequestImplementor(q), 0L, delayMicroseconds, TimeUnit.MICROSECONDS);
}
}
Note that I used scheduleWithFixedDelay and not scheduleAtFixedRate. According to their documentation, the first waits for the given delay between the end of one run of the submitted Runnable and the start of the next, while the second aims to start a run every period time units regardless of how long the previous run took. We don't know how long a call to the external API takes, so consider scheduleAtFixedRate with a period of one minute and a request that takes longer than a minute. The documentation states that executions of the same periodic task never run concurrently, so the next run would simply start late, right after the slow one finishes, and the requests would bunch together instead of being spaced out. With scheduleWithFixedDelay the gap between two consecutive requests is always at least the configured delay, which is why I used it. The same no-overlap guarantee applies to the single thread scheduled executor service used here: a second call cannot start before the first one has finished.
Another reason that I used scheduleWithFixedDelay is because of underflow of requests. For example what about the queue being empty? Then the scheduling should also wait and not submit the Runnable again.
On the other hand, if we use scheduleWithFixedDelay with a delay of 1/60 of a minute (one second) between runs, and more than 60 requests are submitted in a single minute, our throughput to the external API will fall below the limit: the delay is counted from the end of each request, so we are guaranteed at most 60 requests per minute, and usually fewer. We would rather reach the limit every single time. If that's not a concern to you, then you can use the above implementation as is.
But let's say you do care about getting as close to the limit as possible every time. In that case, as far as I know, you can do this with a custom scheduler, which would be a less clean solution than the first, but more time accurate.
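One possible shape for such a custom approach is a sliding-window limiter that remembers the timestamps of recent requests, similar in spirit to the timestamp queue from the question. The sketch below is only an illustration under assumptions: the class name SlidingWindowLimiter and the method tryAcquire are hypothetical, the caller supplies the current time, and the caller decides how to wait when a non-zero value comes back.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window limiter: at most maxRequests per windowMillis. */
class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    // Timestamps of the requests that are still inside the current window, oldest first.
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /**
     * Returns 0 and records the request if it may be sent now;
     * otherwise returns how many milliseconds the caller should wait before retrying.
     */
    synchronized long tryAcquire(long nowMillis) {
        // Forget requests that have already left the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(nowMillis);
            return 0L;
        }
        // The window is full: the next slot opens when the oldest request expires.
        return windowMillis - (nowMillis - timestamps.peekFirst());
    }
}
```

A sender would call tryAcquire(System.currentTimeMillis()) before each request and sleep for the returned number of milliseconds when it is non-zero. Keeping one limiter per endpoint allows different limits (30, 300, 500 per minute) without one endpoint blocking another.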
Bottom line: with the above implementation, you need to make sure that the communication with the external API is as fast as possible.
Finally, I should warn you that I am not sure in which order blocked put calls are served by the BlockingQueue implementations I suggested. I mean, what if 2 requests arrive at almost the same time while the queue is full? They will both wait, but will the one that arrived first be put first, or the second one? As far as I can tell from the documentation, only an ArrayBlockingQueue created with fair set to true promises that waiting threads are served in FIFO order; for the other cases I don't know. If you don't care about some requests reaching the external API out of order, then don't worry and use the code up to this point. If you do care, however, and you are able to attach, for example, a serial number to each request, then you can use a PriorityQueue after the BlockingQueue, or even experiment with PriorityBlockingQueue (which is unfortunately unbounded). That would complicate things even more, so I didn't post the relevant code with the PriorityQueue. I did my best and I hope I offered some useful ideas. I am not saying this post is a complete solution to all your problems, but it gives you some considerations to start with.

I implemented something different from what @gthanop suggested.
Something I discovered is that the limits may change, so I might need to grow or shrink the blocking queue. Another reason is that it would not be easy to adapt our current code to this. And another one: we might run more than one instance, so we will need a distributed lock.
So I implemented something simpler, but not as efficient as @gthanop's answer.
Here is my code (adapted, because I cannot show the company code):
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.ScheduledExecutorService;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
public class Teste {
private static enum ExternalApi {
A, B, C;
}
private static class RequestManager {
private long firstRequest; // First request in one minute
// how many request have we made
private int requestsCount = 0;
// A timer thread, it will execute at every minute, it will refresh the request count and the first request time
private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
RequestManager() {
final long initialDelay = 0L;
final long fixedRate = 60;
executor.scheduleAtFixedRate(() -> {
System.out.println("Clearing the current count!");
requestsCount = 0;
firstRequest = System.currentTimeMillis();
}, initialDelay, fixedRate, TimeUnit.SECONDS);
}
void incrementRequest() {
requestsCount++;
}
long getFirstRequest() {
return firstRequest;
}
boolean requestsExceeded(final int requestLimit) {
return requestsCount >= requestLimit;
}
}
public static class RequestHelper {
private static final byte SECONDS_IN_MINUTE = 60;
private static final short MILLISECONDS_IN_SECOND = 1000;
private static final byte ZERO_SECONDS = 0;
// Table to support the time, and count of the requests
private final Map<Integer, RequestManager> requests;
// Table that contains the limits of each type of request
private final Map<Integer, Integer> requestLimits;
/**
* We need an array of semaphores, because, we might lock the requests for ONE, but not for TWO
*/
private final Semaphore[] semaphores;
private RequestHelper(){
// one semaphore for type
semaphores = new Semaphore[ExternalApi.values().length];
requests = new ConcurrentHashMap<>();
requestLimits = new HashMap<>();
for (final ExternalApi type : ExternalApi.values()) {
// Binary semaphore, must be fair, because we are updating things.
semaphores[type.ordinal()] = new Semaphore(1, true);
}
}
/**
* When my token expire, I must update this, because the limits might change.
* @param limits the new api limits
*/
protected void updateLimits(final Map<ExternalApi, Integer> limits) {
limits.forEach((key, value) -> requestLimits.put(key.ordinal(), value));
}
/**
* Increments the counter for the type of the request,
* Using the mutual exclusion lock, we can handle and block other threads that are trying to
* do a request to the api.
* If the incoming requests are going to exceed the maximum, we will make the thread sleep for N seconds ( 60 - time since first request)
* since we are using a Binary Semaphore, it will block incoming requests until the thread that is sleeping, wakeup and release the semaphore lock.
*
* @param type of the integration, Supp, List, PET etc ...
*/
protected final void addRequest(final ExternalApi type) {
// the index of this request
final int requestIndex = type.ordinal();
// we get the permit for the semaphore of the type
final Semaphore semaphore = semaphores[requestIndex];
// Try to acquire a permit, if no permit is available, it will block until one is available.
semaphore.acquireUninterruptibly();
///gets the requestManager for the type
final RequestManager requestManager = getRequest(requestIndex);
// increments the number of requests
requestManager.incrementRequest();
if (requestManager.requestsExceeded(requestLimits.get(type.ordinal()))) {
// the difference in seconds between a minute - the time that we needed to reach the maximum of requests
final int secondsToSleep = SECONDS_IN_MINUTE - (int) (System.currentTimeMillis() - requestManager.getFirstRequest()) / MILLISECONDS_IN_SECOND;
// We reached the maximum in less than a minute
if (secondsToSleep > ZERO_SECONDS) {
System.out.printf("We reached the maximum of: %d per minute by: %s. We must wait for: %d before make a new request!\n", requestLimits.get(type.ordinal()), type.name(), secondsToSleep);
sleep(secondsToSleep * MILLISECONDS_IN_SECOND);
}
}
// releases the semaphore
semaphore.release();
}
private final void sleep(final long time) {
try {
Thread.sleep(time);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
/**
* Gets the first Request Manager, if it is the first request, it will create the
* RequestManager object
* @param index
* @return a RequestManager instance
*/
private RequestManager getRequest(final int index) {
RequestManager request = requests.get(index);
if(request == null) {
request = new RequestManager();
requests.put(index, request);
}
return request;
}
}
public static void main(String[] args) {
final RequestHelper requestHelper = new RequestHelper();
final Map<ExternalApi, Integer> apiLimits = Map.of(ExternalApi.A, 30, ExternalApi.B, 60, ExternalApi.C, 90);
// update the limits
requestHelper.updateLimits(apiLimits);
final ScheduledExecutorService executor = Executors.newScheduledThreadPool(3);
executor.scheduleWithFixedDelay(() -> {
System.out.println("A new request is going to happen");
requestHelper.addRequest(ExternalApi.A);
sleep(65);
}, 0, 100, TimeUnit.MILLISECONDS);
executor.scheduleWithFixedDelay(() -> {
System.out.println("B new request is going to happen");
requestHelper.addRequest(ExternalApi.B);
sleep(50);
}, 0, 200, TimeUnit.MILLISECONDS);
executor.scheduleWithFixedDelay(() -> {
System.out.println("C new request is going to happen");
requestHelper.addRequest(ExternalApi.C);
sleep(30);
}, 0, 300, TimeUnit.MILLISECONDS);
}
private static final void sleep(final long time) {
try {
Thread.sleep(time);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

Is it safe to store threads in a ConcurrentMap?

I am building a backend service in which a REST call to my service creates a new thread. The thread waits for another REST call; if it does not receive anything within, say, 5 minutes, the thread dies.
To keep track of all the currently running threads I have a collection, so that when the REST call finally comes in (such as a user accepting or declining an action) I can identify the thread by the userID. If it's declined, we just remove that thread from the collection; if it's accepted, the thread can carry on with the next action. I have implemented this using a ConcurrentMap to avoid concurrency issues.
Since this is my first time working with threads, I want to make sure that I am not overlooking any issues that may arise. Please have a look at my code and tell me if I could do it better or if there are any flaws.
public class UserAction extends Thread {
int userID;
boolean isAccepted = false;
boolean isDeclined = false;
long timeNow = System.currentTimeMillis();
long timeElapsed = timeNow + 50000;
public UserAction(int userID) {
this.userID = userID;
}
public void declineJob() {
this.isDeclined = true;
}
public void acceptJob() {
this.isAccepted = true;
}
public boolean waitForApproval(){
while (System.currentTimeMillis() < timeElapsed){
System.out.println("waiting for approval");
if (isAccepted) {
return true;
} else if (isDeclined) {
return false;
}
}
return isAccepted;
}
@Override
public void run() {
if (!waitForApproval()) {
// must have timed out or the user declined, so remove from the collection and end the thread immediately
Controller.tCollection.remove(userID);
// end the thread here
return;
}
// mustve been accepted so continue working
}
}
public class Controller {
public static ConcurrentHashMap<Integer, UserAction> tCollection = new ConcurrentHashMap<>();
public static void main(String[] args) throws InterruptedException {
int barberID1 = 1;
int barberID2 = 2;
tCollection.put(barberID1, new UserAction(barberID1));
tCollection.put(barberID2, new UserAction(barberID2));
tCollection.get(barberID1).start();
tCollection.get(barberID2).start();
Thread.sleep(1000);
// simulate REST call accepting/declining job after 1 second. Usually this would be in a spring mvc RESTcontroller in a different class.
tCollection.get(barberID1).acceptJob();
tCollection.get(barberID2).declineJob();
}
}

You don't need (explicit) threads for this. Just a shared pool of task objects that are created on the first rest call.
When the second rest call comes, you already have a thread to use (the one that's handling the rest call). You just need to retrieve the task object according to the user id. You also need to get rid of expired tasks, which can be done with for example a DelayQueue.
Pseudocode:
public void rest1(User u) {
UserTask ut = new UserTask(u);
pool.put(u.getId(), ut);
delayPool.put(ut); // Assuming UserTask implements Delayed with a 5 minute delay
}
public void rest2(User u, Action a) {
UserTask ut = pool.get(u.getId());
if(!a.isAccepted() || ut == null)
pool.remove(u.getId());
else
process(ut);
// Clean up the pool from any expired tasks, can also be done in the beginning
// of the method, if you want to make sure that expired actions aren't performed
UserTask expired;
while((expired = delayPool.poll()) != null)
pool.remove(expired.getId());
}
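The pseudocode above assumes a UserTask that implements Delayed. A runnable sketch of that idea could look like the following. PendingApprovals, register, resolve and evictExpired are hypothetical names, and the time-to-live is passed in by the caller rather than fixed at 5 minutes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical registry of pending user tasks that expire after a time-to-live. */
class PendingApprovals {
    static final class UserTask implements Delayed {
        final int userId;
        private final long expiresAtNanos;

        UserTask(int userId, long ttlMillis) {
            this.userId = userId;
            this.expiresAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining time until expiry; zero or negative means "expired".
            return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed o) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), o.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final ConcurrentMap<Integer, UserTask> pool = new ConcurrentHashMap<>();
    private final DelayQueue<UserTask> expiry = new DelayQueue<>();

    /** First REST call: remember the task and schedule its expiry. */
    void register(int userId, long ttlMillis) {
        UserTask task = new UserTask(userId, ttlMillis);
        pool.put(userId, task);
        expiry.put(task);
    }

    /** Second REST call: returns true if the user still had a live pending task. */
    boolean resolve(int userId) {
        evictExpired();
        return pool.remove(userId) != null;
    }

    /** Drops every task whose delay has elapsed; poll() returns null when none has expired. */
    void evictExpired() {
        UserTask expired;
        while ((expired = expiry.poll()) != null) {
            // Two-argument remove: only drop the entry if it is still this exact task.
            pool.remove(expired.userId, expired);
        }
    }
}
```

No thread sits waiting per user here: the thread that handles the second REST call does the lookup and the cleanup. Already resolved tasks simply stay in the DelayQueue until they expire, at which point the two-argument remove leaves any newer task for the same user untouched.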

There's a synchronization issue: you should make your flags isAccepted and isDeclined instances of AtomicBoolean.
A critical concept is that you need to take steps to make sure changes to memory in one thread are communicated to other threads that need that data. They're called memory fences and they often occur implicitly between synchronization calls.
The idea of a (simple) Von Neumann architecture with a 'central memory' is false for most modern machines and you need to know data is being shared between caches/threads correctly.
Also as others suggest, creating a thread for each task is a poor model. It scales badly and leaves your application vulnerable to keeling over if too many tasks are submitted. There is some limit to memory so you can only have so many pending tasks at a time but the ceiling for threads will be much lower.
That will be made all the worse because you're spin waiting. Spin waiting puts a thread into a loop that repeatedly checks a condition. A better model would wait on a condition variable (in Java, for example, java.util.concurrent.locks.Condition or Object.wait/notify) so threads not doing anything (other than waiting) could be suspended by the operating system until notified that the thing they're waiting for is (or may be) ready.
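As an illustration of waiting without spinning, the sketch below blocks on a CountDownLatch with a timeout instead of looping over flags. It swaps in a latch rather than an explicit condition variable because it needs less code; ApprovalGate and its method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical one-shot gate: a waiter blocks until accept(), decline() or a timeout. */
class ApprovalGate {
    private final CountDownLatch decided = new CountDownLatch(1);
    // null = undecided, TRUE = accepted, FALSE = declined
    private final AtomicReference<Boolean> accepted = new AtomicReference<>();

    void accept()  { decide(true); }
    void decline() { decide(false); }

    private void decide(boolean value) {
        // Only the first decision counts; later calls are ignored.
        if (accepted.compareAndSet(null, value)) {
            decided.countDown();
        }
    }

    /** Blocks without busy-waiting. Returns true only if accepted within the timeout. */
    boolean awaitApproval(long timeout, TimeUnit unit) {
        try {
            if (!decided.await(timeout, unit)) {
                return false; // timed out
            }
            return accepted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

The waiting thread is parked by the JVM until the latch is released, so it consumes no CPU while waiting, and the latch also provides the memory visibility that the plain boolean flags lacked.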
There are often significant overheads in time and resources to creating and destroying threads. Given that most platforms can be simultaneously only executing a relatively small number of threads creating lots of 'expensive' threads to have them spend most of their time swapped out (suspended) doing nothing is very inefficient.
The right model launches a pool of a fixed number of threads (or relatively fixed number) and places tasks in a shared queue that the threads 'take' work from and process.
That model is known generically as a "Thread Pool".
The entry level implementation you should look at is ThreadPoolExecutor:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html

How to check if the threads have completed its task or not?

OK, I created a couple of threads to do some complex tasks. Now how can I check whether each thread has completed successfully or not?
class BrokenTasks extends Thread {
public BrokenTasks(){
super();
}
public void run(){
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
task1.start();
.....
task4.start();
So how can I check whether all these tasks completed successfully from
i) the main program (main thread)
ii) each of the other threads, for example checking from within task2 whether task1 has ended?

A good way to use threads is not to use them directly. Instead, make a thread pool. Then, in your POJO task encapsulation, have a field that is only set at the end of the computation.
There might be a 3-4 millisecond delay before another thread can see the status, but it will become visible provided the field is volatile (or otherwise safely published) and other threads do not overwrite it. You can protect against that by making sure each task has its own unique instance of work and status, and that other threads only poll it every 1-5 seconds or register a listener that the worker calls after completion.
A library I have used is my own
https://github.com/tgkprog/ddt/tree/master/DdtUtils/src/main/java/org/s2n/ddt/util/threads
To use : in server start or static block :
package org.s2n.ddt.util;
import org.apache.log4j.Logger;
import org.junit.Test;
import org.s2n.ddt.util.threads.PoolOptions;
import org.s2n.ddt.util.threads.DdtPools;
public class PoolTest {
private static final Logger logger = Logger.getLogger(PoolTest.class);
@Test
public void test() {
PoolOptions options = new PoolOptions();
options.setCoreThreads(2);
options.setMaxThreads(33);
DdtPools.initPool("a", options);
Do1 p = null;
for (int i = 0; i < 10; i++) {
p = new Do1();
DdtPools.offer("a", p);
}
LangUtils.sleep(3 + (int) (Math.random() * 3));
org.junit.Assert.assertNotNull(p);
org.junit.Assert.assertEquals(Do1.getLs(), 10);
}
}
class Do1 implements Runnable {
volatile static long l = 0;
public Do1() {
l++;
}
public void run() {
// LangUtils.sleep(1 + (int) (Math.random() * 3));
System.out.println("hi " + l);
}
public static long getLs() {
return l;
}
}
Things you should not do:
* Don't do things every 10-15 milliseconds
* Unless it's for academic purposes, do not create your own threads
* Don't make it more complex than it needs to be for 97% of cases

You can use Callable and ForkJoinPool for this task.
class BrokenTasks implements Callable<Object> {
public BrokenTasks(){
super();
}
public Object call() throws Exception {
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
return null; //placeholder: return the fetched data here
}
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
ForkJoinPool pool = new ForkJoinPool(4);
Future<Object> result1 = pool.submit(task1);
Future<Object> result2 = pool.submit(task2);
Future<Object> result3 = pool.submit(task3);
Future<Object> result4 = pool.submit(task4);
Object value4 = result4.get();//blocking call
Object value3 = result3.get();//blocking call
Object value2 = result2.get();//blocking call
Object value1 = result1.get();//blocking call
And don't forget to shut down the pool after that.
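To also tell apart tasks that finished normally from tasks that failed, one option is to inspect each Future: get() throws ExecutionException when the task itself threw. The following is a minimal sketch using a plain fixed thread pool instead of ForkJoinPool; TaskOutcomes and runAll are hypothetical names and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaskOutcomes {
    /** Runs all tasks and reports, per task and in order, whether it finished without throwing. */
    static List<Boolean> runAll(List<Callable<String>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Boolean> succeeded = new ArrayList<>();
        try {
            // invokeAll blocks until every task has completed, normally or with an exception.
            for (Future<String> future : pool.invokeAll(tasks)) {
                try {
                    future.get();          // returns immediately, the task is already done
                    succeeded.add(true);
                } catch (ExecutionException e) {
                    succeeded.add(false);  // the task threw; e.getCause() holds the reason
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return succeeded;
    }
}
```

For a single task that is still running, Future.isDone() answers "has it finished yet?" without blocking.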

Classically you simply join on the threads you want to finish. Your thread does not proceed until join completes. For example:
// await all threads
task1.join();
task2.join();
task3.join();
task4.join();
// continue with main thread logic
(I probably would have put the tasks in a list for cleaner handling)
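With the tasks in a list, the join approach might look like the sketch below. JoinAll and startAndJoin are hypothetical names. Note that join only tells you that a thread has terminated, not whether its work succeeded.

```java
import java.util.List;

class JoinAll {
    /** Starts every thread, then blocks until each one has terminated. */
    static boolean startAndJoin(List<Thread> tasks) {
        for (Thread t : tasks) {
            t.start();
        }
        try {
            for (Thread t : tasks) {
                t.join(); // returns once t has finished running
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // we were interrupted while waiting
        }
        return true;
    }
}
```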

If a thread has not completed its task, then it is still alive. So to test whether the thread has completed its task you can use the isAlive() method.

There are two different questions here.
One is whether the thread is still working.
The other is whether the task is still unfinished.
A thread is a very expensive way to solve a problem: when we start a thread in Java, the VM has to store context information and deal with synchronization problems (such as locks). So we usually use a thread pool instead of using threads directly. The benefit of a thread pool is that we can use a few threads to handle many different tasks. That means a few threads stay alive while many tasks are finished.
Don’t derive the task status from a thread.
A thread is a worker, and tasks are jobs.
A thread may work on many different jobs one by one.
I don’t think we should ask a worker whether he has finished a job. I’d rather ask the job whether it is finished.
When I want to check whether a job is finished, I use signals.
Use signals (synchronization aids)
Since JDK 1.5 there have been many synchronization aids that work like a signal.
CountDownLatch
This object provides a counter (it can be set only once and counted down many times). The counter allows one or more threads to wait until a set of operations being performed in other threads completes.
CyclicBarrier
This is another useful signal that allows a set of threads to all wait for each other to reach a common barrier point.
more tools
More tools can be found in the JDK's java.util.concurrent package.
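As a rough illustration of asking the jobs rather than the workers, here is a sketch that uses a CountDownLatch as the "finished" signal. The class name LatchExample, the pool size of 2 and the 5-second timeout are assumptions made for the example:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LatchExample {
    // Runs `jobs` tasks on a small pool and waits until each one has signalled
    // completion. Returns true if all of them finished within the timeout.
    static boolean runJobs(int jobs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(jobs);
        try {
            for (int i = 0; i < jobs; i++) {
                pool.execute(() -> {
                    try {
                        // placeholder for the real work of one job
                    } finally {
                        done.countDown(); // signal: this job is finished
                    }
                });
            }
            // The waiting thread never inspects the worker threads themselves.
            return done.await(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all jobs finished: " + runJobs(10));
    }
}
```

Two threads handle ten jobs here, which is why checking whether a thread is alive would say nothing about any individual job.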

You can use the Thread.isAlive method; see the API: "A thread is alive if it has been started and has not yet died". That is, in task2's run() you test task1.isAlive().
To see task1 from task2, you need to pass it as an argument to task2's constructor, or make the tasks fields instead of local variables.

You can use the following..
task1.join();
task2.join();
task3.join();
task4.join();
// and then check every thread by using the isAlive() method
e.g.: task1.isAlive();
If it returns false, the thread has completed its task;
otherwise it returns true.

I'm not sure of your exact needs, but some Java application frameworks have handy abstractions for dealing with individual units of work or "jobs". The Eclipse Rich Client Platform comes to mind with its Jobs API. Although it may be overkill.
For plain old Java, look at Future, Callable and Executor.
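A small sketch of how Future, Callable and an executor fit together in plain Java. The class name FutureExample and the squaring jobs are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureExample {
    // Computes the squares of 1..n, one Callable per number.
    static List<Integer> squares(int n) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                jobs.add(() -> x * x);
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every job is done and returns the
            // futures in the same order as the submitted jobs.
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(squares(4)); // prints [1, 4, 9, 16]
    }
}
```

Future.isDone() is also available when a non-blocking check is preferred over get().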

Java: How Executors relate to Queues

So in Java concurrency, there is the concept of a task, which is really any class implementing Runnable or Callable (and, more specifically, the overridden run() or call() method of that interface).
I'm having a tough time understanding the relationship between:
A task (Runnable/Callable); and
An ExecutorService the task is submitted to; and
An underlying, concurrent work queue or list structure used by the ExecutorService
I believe the relationship is something of the following:
You, the developer, must select which ExecutorService and work structure best suits the task at hand
You initialize the ExecutorService (say, as a ScheduledThreadPool) with the underlying structure to use (say, an ArrayBlockingQueue) (if so, how?!?!)
You submit your task to the ExecutorService which then uses its threading/pooling strategy to populate the given structure (ABQ or otherwise) with copies of the task
Each spawned/pooled thread now pulls copies of the task off of the work structure and executes it
First off, please correct/clarify any of the above assumptions if I am off-base on any of them!
Second, if the task is simply copied/replicated over and over again inside the underlying work structure (e.g., identical copies in each index of a list), then how do you ever decompose a big problem down into smaller (concurrent) ones? In other words, if the task simply does steps A - Z, and you have an ABQ with 1,000 of those tasks, then won't each thread just do A - Z as well? How do you say "some threads should work on A - G, while other threads should work on H, and yet other threads should work on I - Z", etc.?
For this second one I might need a code example to visualize how it all comes together. Thanks in advance.

Your last assumption is not quite right. The ExecutorService does not pull copies of the task. The program must supply all tasks individually to be performed by the ExecutorService. When a task has finished, the next task in the queue is executed.
An ExecutorService is an interface for working with a thread pool. You generally have multiple tasks to be executed on the pool, and each operates on a different part of the problem. As the developer, you must specify which parts of the problem each task should work on when creating it, before sending it to the ExecutorService. The results of each task (assuming they are working on a common problem) should be added to a BlockingQueue or other concurrent collection, where another thread may use the results or wait for all tasks to finish.
Here is an article you may want to read about how to use an ExecutorService: http://www.vogella.com/articles/JavaConcurrency/article.html#threadpools
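On the "if so, how?" part of the question: one way to hand a specific queue to a pool is the ThreadPoolExecutor constructor. The sketch below is only an assumption about what the asker wants; the class name RangeSum, the pool size of 2 and the queue capacity of 100 are illustrative. Note that the queue holds the submitted task objects themselves (no copies), and that each task is created with its own part of the problem:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RangeSum {
    // Sums the numbers 0..max-1 by splitting the range across `workers` tasks.
    static long sum(int max, int workers) throws InterruptedException, ExecutionException {
        // The work queue is passed in explicitly; it holds submitted tasks
        // that are waiting for a free thread. With a bounded queue, submitting
        // more tasks than threads + capacity would be rejected.
        ExecutorService pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(100));
        try {
            int chunk = max / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int start = w * chunk;
                // The last task also takes the remainder of the division.
                final int stop = (w == workers - 1) ? max : start + chunk;
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = start; i < stop; i++) {
                        s += i;
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that part is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sum(10000, 5)); // prints 49995000
    }
}
```

The convenience factories in Executors choose a queue for you, so the explicit constructor is only needed when the queue type or capacity matters.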
Update: A common use of the ExecutorService is to implement the producer/consumer pattern. Here is an example I quickly threw together to get you started--it is intended for demonstration purposes only, as some details and concerns have been omitted for simplicity. The thread pool contains multiple producer threads and one consumer thread. The job being performed is to sum the numbers from 0...N. Each producer thread sums a smaller interval of numbers, and publishes the result to the BlockingQueue. The consumer thread processes each result added to the BlockingQueue.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class NumberCounter {
private final ExecutorService pool = Executors.newFixedThreadPool(2);
private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100);
public void startCounter(int max, int workers) {
// Create multiple tasks to add numbers. Each task submits the result
// to the queue.
int increment = max / workers;
for (int worker = 0; worker < workers; worker++) {
Runnable task = createProducer(worker * increment, (worker + 1) * increment);
pool.execute(task);
}
// Create one more task that will consume the numbers, adding them up
// and printing the results.
pool.execute(new Runnable() {
@Override
public void run() {
int sum = 0;
while (true) {
try {
Integer result = queue.take();
sum += result;
System.out.println("New sum is " + sum);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
}
private Runnable createProducer(final int start, final int stop) {
return new Runnable() {
@Override
public void run() {
System.out.println("Worker started counting from " + start + " to " + stop);
int count = 0;
for (int i = start; i < stop; i++) {
count += i;
}
queue.add(count);
}
};
}
public static void main(String[] args) throws InterruptedException {
NumberCounter counter = new NumberCounter();
counter.startCounter(10000, 5);
}
}

ExecutorService's surprising performance break-even point --- rules of thumb?

I'm trying to figure out how to correctly use Java's Executors. I realize submitting tasks to an ExecutorService has its own overhead. However, I'm surprised to see it is as high as it is.
My program needs to process a huge amount of data (stock market data) with as low latency as possible. Most of the calculations are fairly simple arithmetic operations.
I tried to test something very simple: "Math.random() * Math.random()"
The simplest test runs this computation in a simple loop. The second test does the same computation inside an anonymous Runnable (this is supposed to measure the cost of creating new objects). The third test passes the Runnable to an ExecutorService (this measures the cost of introducing executors).
I ran the tests on my dinky laptop (2 cpus, 1.5 gig ram):
(in milliseconds)
simpleCompuation:47
computationWithObjCreation:62
computationWithObjCreationAndExecutors:422
(about once out of four runs, the first two numbers end up being equal)
Notice that executors take far, far more time than executing on a single thread. The numbers were about the same for thread pool sizes between 1 and 8.
Question: Am I missing something obvious or are these results expected? These results tell me that any task I pass in to an executor must do some non-trivial computation. If I am processing millions of messages, and I need to perform very simple (and cheap) transformations on each message, I still may not be able to use executors...trying to spread computations across multiple CPUs might end up being costlier than just doing them in a single thread. The design decision becomes much more complex than I had originally thought. Any thoughts?
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ExecServicePerformance {
private static int count = 100000;
public static void main(String[] args) throws InterruptedException {
//warmup
simpleCompuation();
computationWithObjCreation();
computationWithObjCreationAndExecutors();
long start = System.currentTimeMillis();
simpleCompuation();
long stop = System.currentTimeMillis();
System.out.println("simpleCompuation:"+(stop-start));
start = System.currentTimeMillis();
computationWithObjCreation();
stop = System.currentTimeMillis();
System.out.println("computationWithObjCreation:"+(stop-start));
start = System.currentTimeMillis();
computationWithObjCreationAndExecutors();
stop = System.currentTimeMillis();
System.out.println("computationWithObjCreationAndExecutors:"+(stop-start));
}
private static void computationWithObjCreation() {
for(int i=0;i<count;i++){
new Runnable(){
@Override
public void run() {
double x = Math.random()*Math.random();
}
}.run();
}
}
private static void simpleCompuation() {
for(int i=0;i<count;i++){
double x = Math.random()*Math.random();
}
}
private static void computationWithObjCreationAndExecutors()
throws InterruptedException {
ExecutorService es = Executors.newFixedThreadPool(1);
for(int i=0;i<count;i++){
es.submit(new Runnable() {
@Override
public void run() {
double x = Math.random()*Math.random();
}
});
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
}
}

Using executors is about utilizing CPUs and/or CPU cores, so to create a thread pool that makes the best use of the available CPUs, you need as many threads as CPUs/cores.
You are right, creating new objects costs too much. So one way to reduce the expense is to use batches. If you know the kind and number of computations to do, you create batches: think of a thousand or more computations done in one executed task. You create batches for each thread. As soon as a computation is done (java.util.concurrent.Future), you create the next batch. Even the creation of new batches can be done in parallel (4 CPUs -> 3 threads for computation, 1 thread for batch provisioning). In the end, you may end up with more throughput, but with higher memory demands (batches, provisioning).
Edit: I changed your example and I let it run on my little dual-core x200 laptop.
provisioned 2 batches to be executed
simpleCompuation:14
computationWithObjCreation:17
computationWithObjCreationAndExecutors:9
As you see in the source code, I took the batch provisioning and executor lifecycle out of the measurement, too. That's more fair compared to the other two methods.
See the results by yourself...
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ExecServicePerformance {
private static int count = 100000;
public static void main( String[] args ) throws InterruptedException {
final int cpus = Runtime.getRuntime().availableProcessors();
final ExecutorService es = Executors.newFixedThreadPool( cpus );
final Vector< Batch > batches = new Vector< Batch >( cpus );
final int batchComputations = count / cpus;
for ( int i = 0; i < cpus; i++ ) {
batches.add( new Batch( batchComputations ) );
}
System.out.println( "provisioned " + cpus + " batches to be executed" );
// warmup
simpleCompuation();
computationWithObjCreation();
computationWithObjCreationAndExecutors( es, batches );
long start = System.currentTimeMillis();
simpleCompuation();
long stop = System.currentTimeMillis();
System.out.println( "simpleCompuation:" + ( stop - start ) );
start = System.currentTimeMillis();
computationWithObjCreation();
stop = System.currentTimeMillis();
System.out.println( "computationWithObjCreation:" + ( stop - start ) );
// Executor
start = System.currentTimeMillis();
computationWithObjCreationAndExecutors( es, batches );
es.shutdown();
es.awaitTermination( 10, TimeUnit.SECONDS );
// Note: Executor#shutdown() and Executor#awaitTermination() requires
// some extra time. But the result should still be clear.
stop = System.currentTimeMillis();
System.out.println( "computationWithObjCreationAndExecutors:"
+ ( stop - start ) );
}
private static void computationWithObjCreation() {
for ( int i = 0; i < count; i++ ) {
new Runnable() {
@Override
public void run() {
double x = Math.random() * Math.random();
}
}.run();
}
}
private static void simpleCompuation() {
for ( int i = 0; i < count; i++ ) {
double x = Math.random() * Math.random();
}
}
private static void computationWithObjCreationAndExecutors(
ExecutorService es, List< Batch > batches )
throws InterruptedException {
for ( Batch batch : batches ) {
es.submit( batch );
}
}
private static class Batch implements Runnable {
private final int computations;
public Batch( final int computations ) {
this.computations = computations;
}
@Override
public void run() {
int countdown = computations;
while ( countdown-- > 0 ) {
double x = Math.random() * Math.random();
}
}
}
}

This is not a fair test for the thread pool, for the following reasons:
You are not taking advantage of the pooling at all because you only have 1 thread.
The job is so simple that the pooling overhead can't be justified. A multiplication on a CPU with an FPU takes only a few cycles.
Consider the following extra steps the thread pool has to do besides creating the object and running the job:
Put the job in the queue
Remove the job from queue
Get the thread from the pool and execute the job
Return the thread to the pool
When you have a real job and multiple threads, the benefit of the thread pool will be apparent.

The 'overhead' you mention has nothing to do with ExecutorService; it is caused by multiple threads synchronizing on Math.random, creating lock contention.
So yes, you are missing something (and the 'correct' answer below is not actually correct).
Here is some Java 8 code to demonstrate 8 threads running a simple function in which there is no lock contention:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleFunction;
import com.google.common.base.Stopwatch;
public class ExecServicePerformance {
private static final int repetitions = 120;
private static int totalOperations = 250000;
private static final int cpus = 8;
private static final List<Batch> batches = batches(cpus);
private static DoubleFunction<Double> performanceFunc = (double i) -> {return Math.sin(i * 100000 / Math.PI); };
public static void main( String[] args ) throws InterruptedException {
printExecutionTime("Synchronous", ExecServicePerformance::synchronous);
printExecutionTime("Synchronous batches", ExecServicePerformance::synchronousBatches);
printExecutionTime("Thread per batch", ExecServicePerformance::asynchronousBatches);
printExecutionTime("Executor pool", ExecServicePerformance::executorPool);
}
private static void printExecutionTime(String msg, Runnable f) throws InterruptedException {
long time = 0;
for (int i = 0; i < repetitions; i++) {
Stopwatch stopwatch = Stopwatch.createStarted();
f.run(); //remember, this is a single-threaded synchronous execution since there is no explicit new thread
time += stopwatch.elapsed(TimeUnit.MILLISECONDS);
}
System.out.println(msg + " exec time: " + time);
}
private static void synchronous() {
for ( int i = 0; i < totalOperations; i++ ) {
performanceFunc.apply(i);
}
}
private static void synchronousBatches() {
for ( Batch batch : batches) {
batch.synchronously();
}
}
private static void asynchronousBatches() {
CountDownLatch cb = new CountDownLatch(cpus);
for ( Batch batch : batches) {
Runnable r = () -> { batch.synchronously(); cb.countDown(); };
Thread t = new Thread(r);
t.start();
}
try {
cb.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static void executorPool() {
final ExecutorService es = Executors.newFixedThreadPool(cpus);
for ( Batch batch : batches ) {
Runnable r = () -> { batch.synchronously(); };
es.submit(r);
}
es.shutdown();
try {
es.awaitTermination( 10, TimeUnit.SECONDS );
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static List<Batch> batches(final int cpus) {
List<Batch> list = new ArrayList<Batch>();
for ( int i = 0; i < cpus; i++ ) {
list.add( new Batch( totalOperations / cpus ) );
}
System.out.println("Batches: " + list.size());
return list;
}
private static class Batch {
private final int operationsInBatch;
public Batch( final int ops ) {
this.operationsInBatch = ops;
}
public void synchronously() {
for ( int i = 0; i < operationsInBatch; i++ ) {
performanceFunc.apply(i);
}
}
}
}
Result timings for 120 tests of 250k operations (ms):
Synchronous exec time: 9956
Synchronous batches exec time: 9900
Thread per batch exec time: 2176
Executor pool exec time: 1922
Winner: Executor Service.

I don't think this is at all realistic since you're creating a new executor service every time you make the method call. Unless you have very strange requirements that seems unrealistic - typically you'd create the service when your app starts up, and then submit jobs to it.
If you try the benchmarking again but initialise the service as a field, once, outside the timing loop; then you'll see the actual overhead of submitting Runnables to the service vs. running them yourself.
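A rough sketch of that setup, with the service created once and reused for every measurement. The class name ReusedExecutorTiming and the latch used to wait for completion are assumptions, and the numbers it prints are only as trustworthy as any hand-rolled microbenchmark:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReusedExecutorTiming implements AutoCloseable {
    // Created once and reused, so its start-up and shutdown cost stays
    // outside the timed region.
    private final ExecutorService es = Executors.newFixedThreadPool(1);

    // Returns the nanoseconds needed to submit and finish `count` trivial tasks.
    long timeTasks(int count) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(count);
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            es.execute(done::countDown);
        }
        done.await(); // wait until the last task has actually run
        return System.nanoTime() - start;
    }

    @Override
    public void close() {
        es.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        try (ReusedExecutorTiming timing = new ReusedExecutorTiming()) {
            timing.timeTasks(100_000); // warm-up run, result discarded
            System.out.println("nanos for 100k tasks: " + timing.timeTasks(100_000));
        }
    }
}
```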
But I don't think you've grasped the point fully - Executors aren't meant to be there for efficiency, they're there to make co-ordinating and handing off work to a thread pool simpler. They will always be less efficient than just invoking Runnable.run() yourself (since at the end of the day the executor service still needs to do this, after doing some extra housekeeping beforehand). It's when you are using them from multiple threads needing asynchronous processing, that they really shine.
Also consider that you're looking at the relative time difference of a basically fixed cost (Executor overhead is the same whether your tasks take 1ms or 1hr to run) compared to a very small variable amount (your trivial runnable). If the executor service takes 5ms extra to run a 1ms task, that's not a very favourable figure. If it takes 5ms extra to run a 5 second task (e.g. a non-trivial SQL query), that's completely negligible and entirely worth it.
So to some extent it depends on your situation - if you have an extremely time-critical section, running lots of small tasks, that don't need to be executed in parallel or asynchronously then you'll get nothing from an Executor. If you're processing heavier tasks in parallel and want to respond asynchronously (e.g. a webapp) then Executors are great.
Whether they are the best choice for you depends on your situation, but really you need to try the tests with realistic representative data. I don't think it would be appropriate to draw any conclusions from the tests you've done unless your tasks really are that trivial (and you don't want to reuse the executor instance...).

Math.random() actually synchronizes on a single Random number generator. Calling Math.random() results in significant contention for the number generator. In fact the more threads you have, the slower it's going to be.
From the Math.random() javadoc:
This method is properly synchronized to allow correct use by more than
one thread. However, if many threads need to generate pseudorandom
numbers at a great rate, it may reduce contention for each thread to
have its own pseudorandom-number generator.
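One way to follow the javadoc's advice is java.util.concurrent.ThreadLocalRandom (available since Java 7), which gives each thread its own generator. The sketch below assumes that is acceptable for the benchmark; the class name PerThreadRandom and the counting logic are illustrative only:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class PerThreadRandom {
    // Runs `perThread` multiplications on each of `threads` pool threads and
    // returns the total number of operations performed.
    static long run(int threads, int perThread) throws InterruptedException {
        ExecutorService es = Executors.newFixedThreadPool(threads);
        AtomicLong operations = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            es.execute(() -> {
                // current() must be called on the thread that uses the generator.
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                double sink = 0;
                for (int i = 0; i < perThread; i++) {
                    sink += rnd.nextDouble() * rnd.nextDouble();
                }
                // Using sink keeps the loop from being trivially dead code.
                if (sink >= 0) {
                    operations.addAndGet(perThread);
                }
            });
        }
        es.shutdown();
        es.awaitTermination(10, TimeUnit.SECONDS);
        return operations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("operations: " + run(4, 100_000));
    }
}
```

No state is shared between the threads inside the loop, so the shared-generator contention described above should not appear here.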

Firstly, there are a few issues with the microbenchmark. You do a warm up, which is good. However, it is better to run the test multiple times, which should give a feel as to whether it has really warmed up and the variance of the results. It also tends to be better to do the test of each algorithm in separate runs, otherwise you might cause deoptimisation when an algorithm changes.
The task is very small, although I'm not entirely sure how small. So the number of times faster is pretty meaningless. In multithreaded situations, the threads will touch the same volatile locations, which can cause really bad performance (use a Random instance per thread). Also, a run of 47 milliseconds is a bit short.
Certainly going to another thread for a tiny operation is not going to be fast. Split tasks up into bigger sizes if possible. JDK7 looks as if it will have a fork-join framework, which attempts to support fine tasks from divide and conquer algorithms by preferring to execute tasks on the same thread in order, with larger tasks pulled out by idle threads.

Here are results on my machine (OpenJDK 8 on 64-bit Ubuntu 14.0, Thinkpad W530)
simpleCompuation:6
computationWithObjCreation:5
computationWithObjCreationAndExecutors:33
There's certainly overhead. But remember what these numbers are: milliseconds for 100k iterations. In your case, the overhead was about 4 microseconds per iteration. For me, the overhead was about a quarter of a microsecond.
The overhead is synchronization, internal data structures, and possibly a lack of JIT optimization due to complex code paths (certainly more complex than your for loop).
The tasks that you'd actually want to parallelize would be worth it, despite the quarter microsecond overhead.
FYI, this would be a very bad computation to parallelize. I upped the thread to 8 (the number of cores):
simpleCompuation:5
computationWithObjCreation:6
computationWithObjCreationAndExecutors:38
It didn't make it any faster. This is because Math.random() is synchronized.

The fixed thread pool's ultimate purpose is to reuse already created threads. So the performance gain comes from not having to create a new thread every time a task is submitted. Hence the stop time must be taken inside the submitted task, as the last statement of the run method.

You need to somehow group execution, in order to submit larger portions of computation to each thread (e.g. build groups based on stock symbol).
I got the best results in similar scenarios by using the Disruptor. It has a very low per-job overhead. It's still important to group jobs; naive round robin usually creates many cache misses.
see http://java-is-the-new-c.blogspot.de/2014/01/comparision-of-different-concurrency.html

In case it is useful to others, here are test results with a realistic scenario - use ExecutorService repeatedly until the end of all tasks - on a Samsung Android device.
Simple computation (MS): 102
Use threads (MS): 31049
Use ExecutorService (MS): 257
Code:
ExecutorService executorService = Executors.newFixedThreadPool(1);
int count = 100000;
//Simple computation
Instant instant = Instant.now();
for (int i = 0; i < count; i++) {
double x = Math.random() * Math.random();
}
Duration duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Simple computation (MS): " + duration.toMillis());
//Use threads
instant = Instant.now();
for (int i = 0; i < count; i++) {
new Thread(() -> {
double x = Math.random() * Math.random();
}
).start();
}
duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Use threads (MS): " + duration.toMillis());
//Use ExecutorService
instant = Instant.now();
for (int i = 0; i < count; i++) {
executorService.execute(() -> {
double x = Math.random() * Math.random();
}
);
}
duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Use ExecutorService (MS): " + duration.toMillis());

I've faced a similar problem, but Math.random() was not the issue.
The problem is having many small tasks that take just a few milliseconds each to complete. That is not much, but a lot of small tasks in series add up to a lot of time, and I needed to parallelize.
So, the solution I found, which might work for those of you facing the same problem: do not use any of the executor services. Instead, create your own long-lived threads and feed them tasks.
Here is an example, just as an idea. Don't try to copy-paste it, because it probably won't work as is: I am using Kotlin and translating to Java in my head. The concept is what's important:
First, the Thread, a Thread that can execute a task and then continue there waiting for the next one:
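import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
public class Worker extends Thread {
// volatile: these fields are written by the caller and read by the worker thread
private volatile Callable<?> task;
private final Semaphore semaphore;
private volatile CountDownLatch latch;
public Worker(Semaphore semaphore) {
this.semaphore = semaphore;
}
@Override
public void run() {
while (true) {
try {
semaphore.acquire(); // this will block, the while(true) won't go crazy
} catch (InterruptedException e) {
return; // interrupted while waiting: stop the worker
}
if (task == null) continue;
try {
task.call(); // Callable has call(), not run(); the result is ignored here
} catch (Exception e) {
e.printStackTrace();
}
if (latch != null) latch.countDown();
task = null;
}
}
public void setTask(Callable<?> task) {
this.task = task;
}
public void setCountDownLatch(CountDownLatch latch) {
this.latch = latch;
}
}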
There are two things here that need explanation:
the Semaphore: gives you control over how many tasks and when they are executed by this thread
the CountDownLatch: is the way to notify someone else that a task was completed
So this is how you would use this Worker, first just a simple example:
Semaphore semaphore = new Semaphore(0); // initially the semaphore is closed
Worker worker = new Worker(semaphore);
worker.start();
worker.setTask( .. your callable task .. );
semaphore.release(); // this will allow one task to be processed by the worker
Now a more complicated example, with two Threads and waiting for both to complete using the CountDownLatch:
Semaphore semaphore1 = new Semaphore(0);
Worker worker1 = new Worker(semaphore1);
worker1.start();
Semaphore semaphore2 = new Semaphore(0);
Worker worker2 = new Worker(semaphore2);
worker2.start();
// same countdown latch for both workers, with a counter of 2
CountDownLatch countDownLatch = new CountDownLatch(2);
worker1.setCountDownLatch(countDownLatch);
worker2.setCountDownLatch(countDownLatch);
worker1.setTask( .. your callable task .. );
worker2.setTask( .. your callable task .. );
semaphore1.release();
semaphore2.release();
countDownLatch.await(); // this will block until 2 tasks have been completed
And after that code runs you could just add more tasks to the same threads and reuse them. That's the whole point of this, reusing the threads instead of creating new ones.
It is unpolished as f*** but hopefully this gives you an idea. For me this was an improvement compared to no multi threading. And it was much much better than any executor service with any number of threads in the pool by far.
