I am building a backend service in which a REST call to my service creates a new thread. The thread waits for another REST call; if it does not receive anything within, say, 5 minutes, the thread dies.
To keep track of all the threads, I have a collection of the currently running threads, so that when the REST call finally comes in (such as a user accepting or declining an action) I can identify the thread using the userID. If it's declined, we just remove that thread from the collection; if it's accepted, the thread can carry on with the next action. I have implemented this using a ConcurrentMap to avoid concurrency issues.
Since this is my first time working with threads, I want to make sure that I am not overlooking any issues that may arise. Please have a look at my code and tell me if I could do it better or if there are any flaws.
public class UserAction extends Thread {
int userID;
boolean isAccepted = false;
boolean isDeclined = false;
long timeNow = System.currentTimeMillis();
long timeElapsed = timeNow + 50000;
public UserAction(int userID) {
this.userID = userID;
}
public void declineJob() {
this.isDeclined = true;
}
public void acceptJob() {
this.isAccepted = true;
}
public boolean waitForApproval(){
while (System.currentTimeMillis() < timeElapsed){
System.out.println("waiting for approval");
if (isAccepted) {
return true;
} else if (isDeclined) {
return false;
}
}
return isAccepted;
}
@Override
public void run() {
if (!waitForApproval()) {
// must've timed out or the user declined, so remove from the collection and return immediately
Controller.tCollection.remove(userID);
// end the thread here
return;
}
// must've been accepted, so continue working
}
}
public class Controller {
public static ConcurrentHashMap<Integer, UserAction> tCollection = new ConcurrentHashMap<>();
public static void main(String[] args) throws InterruptedException {
int barberID1 = 1;
int barberID2 = 2;
tCollection.put(barberID1, new UserAction(barberID1));
tCollection.put(barberID2, new UserAction(barberID2));
tCollection.get(barberID1).start();
tCollection.get(barberID2).start();
Thread.sleep(1000);
// simulate REST call accepting/declining job after 1 second. Usually this would be in a spring mvc RESTcontroller in a different class.
tCollection.get(barberID1).acceptJob();
tCollection.get(barberID2).declineJob();
}
}
You don't need (explicit) threads for this. Just a shared pool of task objects that are created on the first REST call.
When the second REST call comes, you already have a thread to use (the one that's handling the REST call). You just need to retrieve the task object according to the user id. You also need to get rid of expired tasks, which can be done with, for example, a DelayQueue.
Pseudocode:
public void rest1(User u) {
UserTask ut = new UserTask(u);
pool.put(u.getId(), ut);
delayPool.put(ut); // Assuming UserTask implements Delayed with a 5 minute delay
}
public void rest2(User u, Action a) {
UserTask ut = pool.get(u.getId());
if(!a.isAccepted() || ut == null)
pool.remove(u.getId());
else
process(ut);
// Clean up the pool from any expired tasks, can also be done in the beginning
// of the method, if you want to make sure that expired actions aren't performed
UserTask expired;
while((expired = delayPool.poll()) != null)
pool.remove(expired.getId());
}
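The pseudocode above can be fleshed out roughly as follows. This is only a sketch under assumptions: the names UserTask, PendingTasks, register and respond are made up for illustration, and a timeout is simply treated the same as a decline.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** A pending user action that expires after a fixed timeout (hypothetical class). */
class UserTask implements Delayed {
    private final int userId;
    private final long expiresAtNanos;

    UserTask(int userId, long timeout, TimeUnit unit) {
        this.userId = userId;
        this.expiresAtNanos = System.nanoTime() + unit.toNanos(timeout);
    }

    int getId() { return userId; }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

/** Registry of pending tasks; no thread is parked while a task waits. */
class PendingTasks {
    private final ConcurrentMap<Integer, UserTask> pool = new ConcurrentHashMap<>();
    private final DelayQueue<UserTask> delayPool = new DelayQueue<>();

    /** First REST call: register the task. */
    void register(int userId, long timeout, TimeUnit unit) {
        UserTask task = new UserTask(userId, timeout, unit);
        pool.put(userId, task);
        delayPool.put(task);
    }

    /** Second REST call: returns true only if a live task existed and was accepted. */
    boolean respond(int userId, boolean accepted) {
        evictExpired();                       // expired tasks must not be acted on
        UserTask task = pool.remove(userId);  // atomic: only one caller can claim the task
        if (task != null) delayPool.remove(task);
        return task != null && accepted;
    }

    /** DelayQueue.poll() returns only elements whose delay has elapsed, or null. */
    private void evictExpired() {
        UserTask expired;
        while ((expired = delayPool.poll()) != null) {
            pool.remove(expired.getId(), expired); // leave a newer task for the same user alone
        }
    }
}

class PendingTasksDemo {
    public static void main(String[] args) throws InterruptedException {
        PendingTasks tasks = new PendingTasks();
        tasks.register(1, 5, TimeUnit.MINUTES);
        tasks.register(2, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(100);                            // let task 2 expire
        System.out.println(tasks.respond(1, true));   // true: accepted in time
        System.out.println(tasks.respond(2, true));   // false: arrived too late
    }
}
```

Evicting at the start of respond (rather than at the end) is a deliberate choice here, so that an acceptance arriving after the timeout is never processed.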
There's a synchronization issue: the flags isAccepted and isDeclined are written by one thread and read by another, so you should make them AtomicBoolean (or at least volatile).
A critical concept is that you need to take steps to make sure changes to memory in one thread are communicated to other threads that need that data. They're called memory fences and they often occur implicitly between synchronization calls.
The idea of a (simple) Von Neumann architecture with a 'central memory' is false for most modern machines and you need to know data is being shared between caches/threads correctly.
Also as others suggest, creating a thread for each task is a poor model. It scales badly and leaves your application vulnerable to keeling over if too many tasks are submitted. There is some limit to memory so you can only have so many pending tasks at a time but the ceiling for threads will be much lower.
That is made all the worse because you're spin waiting: the thread sits in a loop, burning CPU, until a condition becomes true. A better model is to wait on a condition variable (in Java, Object.wait()/notify() or java.util.concurrent.locks.Condition), so that threads doing nothing other than waiting can be suspended by the operating system until they are notified that the thing they're waiting for is (or may be) ready.
There are often significant overheads in time and resources in creating and destroying threads. Given that most platforms can only execute a relatively small number of threads simultaneously, creating lots of 'expensive' threads only to have them spend most of their time swapped out (suspended) doing nothing is very inefficient.
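As an illustration of waiting without spinning, here is a minimal sketch using java.util.concurrent.locks.Condition with a timeout. The class name Approval and its methods are assumptions made up for this example, and a timeout is treated as a decline.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Holds a user's decision; waiters sleep until it arrives or the timeout passes. */
class Approval {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition decided = lock.newCondition();
    private Boolean accepted; // null until a decision arrives; guarded by lock

    void decide(boolean accept) {
        lock.lock();
        try {
            if (accepted == null) {      // first decision wins
                accepted = accept;
                decided.signalAll();     // wake any thread blocked in await()
            }
        } finally {
            lock.unlock();
        }
    }

    /** Returns true only if accepted before the timeout; a timeout counts as declined. */
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        long nanosLeft = unit.toNanos(timeout);
        lock.lock();
        try {
            while (accepted == null) {   // loop guards against spurious wakeups
                if (nanosLeft <= 0L) return false;
                nanosLeft = decided.awaitNanos(nanosLeft); // thread is suspended, not spinning
            }
            return accepted;
        } finally {
            lock.unlock();
        }
    }
}

class ApprovalDemo {
    public static void main(String[] args) throws InterruptedException {
        Approval approval = new Approval();
        // This thread stands in for the second REST call.
        Thread responder = new Thread(() -> approval.decide(true));
        responder.start();
        System.out.println(approval.await(5, TimeUnit.SECONDS));             // true
        System.out.println(new Approval().await(50, TimeUnit.MILLISECONDS)); // false: timed out
    }
}
```

Because the flag is only read and written while holding the lock, the visibility problem described above does not arise here.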
The right model launches a pool of a fixed number of threads (or relatively fixed number) and places tasks in a shared queue that the threads 'take' work from and process.
That model is known generically as a "Thread Pool".
The entry level implementation you should look at is ThreadPoolExecutor:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
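For a first look, the Executors factory methods are usually enough, since they return a preconfigured ThreadPoolExecutor. The sketch below assumes a made-up helper, runAll, that submits trivial tasks to a fixed pool and collects their results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PoolDemo {
    /** Runs taskCount small tasks on a pool of poolSize threads and sums their results. */
    static int runAll(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 1; i <= taskCount; i++) {
                final int n = i;
                results.add(pool.submit(() -> n * n)); // queued until a worker thread is free
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get();                        // blocks until that task has finished
            }
            return sum;
        } finally {
            pool.shutdown();                           // stop accepting work, let workers exit
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(4, 100)); // sum of squares 1..100 = 338350
    }
}
```

However many tasks are submitted, only four threads ever exist; the remaining tasks simply wait in the pool's queue.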
I'm working on a problem that was supposed to be VERY simple to solve; however, I am not getting it done so easily.
The problem is quite simple: I have a Java program running on Linux/x86 that can perform two basic functionalities, F1 and F2. I would like F1 to have a higher priority, but F2 MUST still execute from time to time, i.e. having F1 requests in the queue must not keep F2 requests waiting forever.
My first thought was to have separate queues with a thread pool for each functionality. I set the F1 pool to have 8 threads, while the F2 pool got only 2 threads.
My expectation was that Linux would give a fair time share to each thread, so F1 would get 8 quanta while F2 would get just 2. If there were no F1 requests, the F2 pool could get every quantum to itself, and the same should be true for F1 if F2 has no requests.
However, the program is not behaving that way: if I get a burst of F2 requests and just a couple of F1 requests, the latter take a long time to get their turn.
Does that make sense given Oracle HotSpot/Linux scheduling? Or should it not be happening, which would point to an implementation error on my part?
PS: I've read about Linux scheduling, and it seems that SCHED_OTHER (TS) gives a time share to each task; however, every time a ready task is not executed it gets a bigger quantum, and if that is happening to the F2 pool, it might explain the behavior described above.
Thanks and Regards.
Below is some sample source code.
package test;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
/**
* Created by giscardff on 08/07/18.
*/
public class TestThread {
// Test Program
public static void main(String args[]) throws Exception {
// queues containing jobs to be done
ArrayBlockingQueue<MyDTO> queueA = new ArrayBlockingQueue<>(100);
ArrayBlockingQueue<MyDTO> queueB = new ArrayBlockingQueue<>(100);
// create pool for functionality A
for(int i = 1; i <= 8; i++){
MyThread thread = new MyThread("ThreadA" + i, queueA);
thread.start();
}
// create pool for functionality B
for(int i = 1; i <= 2; i++){
MyThread thread = new MyThread("ThreadB" + i, queueB);
thread.start();
}
// create producer for A
// it will take 100ms between requests
Producer producerA = new Producer(queueA, 100);
producerA.start();
// create producer for B
// it will take 0ms between requests
Producer producerB = new Producer(queueB, 0);
producerB.start();
}
}
/**
* Just put a request into a queue
*/
class Producer extends Thread {
private ArrayBlockingQueue<MyDTO> queue;
private long sleep;
public Producer(ArrayBlockingQueue<MyDTO> queue, long sleep){
this.queue = queue;
this.sleep = sleep;
}
@Override
public void run() {
try {
while (true) {
if(sleep > 0)Thread.sleep(sleep);
queue.put(new MyDTO());
}
}catch(Exception ex){}
}
}
/**
* Retrieve a request from a queue, calculate how long request took to
* be received for each 1M requests
*/
class MyThread extends Thread {
private ArrayBlockingQueue<MyDTO> queue;
private long delay = 0;
private int count = 0;
public MyThread(String name, ArrayBlockingQueue<MyDTO> queue){
super(name);
this.queue = queue;
}
@Override
public void run() {
try {
while (true) {
MyDTO input = queue.take();
delay += System.currentTimeMillis() - Long.parseLong(input.getTime());
if(++count % 1000 == 0){
System.out.printf("%s: %d\n", getName(), delay / 10);
count = 0;
}
}
}catch(Exception ex){ex.printStackTrace();}
}
}
/**
* Just a DTO representing a request
* NOTE: The time was set as String to force CPU to do something more than just math operations
*/
class MyDTO {
private String time;
public MyDTO(){
this.time = "" + System.currentTimeMillis();
}
public String getTime() {
return time;
}
}
It looks like you've got a few issues. I'll try to summarize them and provide a starting point for a path forward:
Thread Contention
Using the BlockingQueue comes with a cost - every write operation (put & take) is lock contended between the producers or consumers. Your "A pool" has 9 threads contending over the write lock for queueA (1 producer, 8 consumers), while your "B pool" has 3 threads contending over the lock for queueB (1 producer, 2 consumers).
This related answer provides a bit more detail about contention. The simplest ways around this are to "use less threads" or use "lock-free" mechanisms to eliminate the contention.
Thread Scheduling
As mentioned in the comments, you're at the mercy of how the JVM (and the operating system scheduler underneath it) schedules your threads.
If java thread scheduling used perfectly fair time shares on the CPU, you'd probably see consumption counts on each thread in the same pool extremely close to each other. You've probably noticed they're not - my runs of your (slightly modified) code occasionally give me a count spread of 300K or more across the threads.
You can often get this better when there are enough CPU cores for each CPU-bound thread (you've got 12 in your sample code), but it's far from ideal in many cases, especially in the face of thread contention.
What can you do?
1. Build your own fairness logic - don't rely on the JVM thread scheduler to be fair, because it won't be.
A simple idea for your case would be to keep both queues, but use a single pool to process both - use either round-robin or Math.random() (ie: if (rand < 0.8) { queueA.poll();}) to determine which queue to poll from. Note - Use poll so you can easily handle the case when a queue is empty without blocking.
2. Experiment with number of CPU-bound threads running on your hardware. With my suggestion for (1) above, you could even have just one worker thread fairly processing both queues. Remember, too many threads contending over the same resources will slow down your processing.
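A minimal sketch of the weighted-polling idea, assuming a made-up helper class WeightedPicker that any worker thread in the single pool could call. The 0.8 share and the short timed wait are illustrative choices, not tuned values.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Picks work from two queues with a configurable bias, never starving either one. */
class WeightedPicker<T> {
    private final BlockingQueue<T> high;
    private final BlockingQueue<T> low;
    private final double highShare; // e.g. 0.8 = try the high queue first 80% of the time

    WeightedPicker(BlockingQueue<T> high, BlockingQueue<T> low, double highShare) {
        this.high = high;
        this.low = low;
        this.highShare = highShare;
    }

    /** Returns the next item, or null if both queues stayed empty for the timeout. */
    T next(long timeout, TimeUnit unit) throws InterruptedException {
        boolean preferHigh = ThreadLocalRandom.current().nextDouble() < highShare;
        BlockingQueue<T> first = preferHigh ? high : low;
        BlockingQueue<T> second = preferHigh ? low : high;
        T item = first.poll();                              // non-blocking
        if (item == null) item = second.poll();             // fall back so idle capacity is used
        if (item == null) item = first.poll(timeout, unit); // short timed wait avoids a hot spin
        return item;
    }
}

class WeightedPickerDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> f1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> f2 = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1000; i++) { f1.add("F1"); f2.add("F2"); }
        WeightedPicker<String> picker = new WeightedPicker<>(f1, f2, 0.8);
        int f1Count = 0;
        for (int i = 0; i < 1000; i++) {
            if ("F1".equals(picker.next(10, TimeUnit.MILLISECONDS))) f1Count++;
        }
        // Roughly 80%, varying from run to run because the choice is random.
        System.out.println("F1 share of the first 1000 picks: " + f1Count / 10.0 + "%");
    }
}
```

The ratio is enforced by the application itself rather than by the number of threads, so it holds regardless of how the scheduler distributes CPU time.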
Isn't threading fun? :)
I have the following method:
void store(SomeObject o) {
}
The idea of this method is to store o to permanent storage, but the function should not block, i.e. I cannot/must not do the actual storage in the same thread that called store.
I also cannot start a thread and store the object from that other thread, because store might be called a "huge" number of times and I don't want to start spawning threads.
So I have two options, but I don't see how they can work well:
1) Use a thread pool (Executor family)
2) In store, put the object in an array list and return. When the array list reaches e.g. 1000 entries (arbitrary number), start another thread to "flush" the array list to storage. But I would still possibly have the problem of too many threads (thread pool?)
In both cases, the only requirement I have is that the objects are persisted in exactly the same order in which they were passed to store. And using multiple threads mixes things up.
How can this be solved?
How can I ensure:
1) Non blocking store
2) Accurate insertion order
3) I don't care about any storage guarantees. If e.g. something crashes I don't care about losing data e.g. cached in the array list before storing them.
I would use a SingleThreadExecutor and a BlockingQueue.
SingleThreadExecutor, as the name says, has one single thread. Use it to poll from the queue and persist objects, blocking if the queue is empty.
In your store method, you can then add to the queue without blocking.
EDIT
Actually, you do not even need that extra queue - the JavaDoc of newSingleThreadExecutor says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So I think it's exactly what you need.
private final ExecutorService persistor = Executors.newSingleThreadExecutor();
public void store( final SomeObject o ){
persistor.submit( new Runnable(){
@Override public void run(){
// your persist-code here.
}
} );
}
The advantage of using a Runnable that has a quasi-endless-loop and using an extra queue would be the possibility to code some "Burst"-functionality. For example you could make it wait to persist only when 10 elements are in queue or the oldest element has been added at least 1 minute ago ...
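The "burst" idea could look roughly like the sketch below. BatchCollector and nextBatch are hypothetical names, and the time limit is measured from the moment the first item of a batch is taken, which only approximates "the oldest element was added at least 1 minute ago".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchCollector<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

    /** Called by producers; does not wait for space because the queue is unbounded. */
    void store(T item) {
        queue.add(item);
    }

    /**
     * Blocks for the first item, then keeps collecting until the batch is full
     * or maxWait has passed since the first item was taken. Order is preserved.
     */
    List<T> nextBatch(int maxSize, long maxWait, TimeUnit unit) throws InterruptedException {
        List<T> batch = new ArrayList<>(maxSize);
        batch.add(queue.take());
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        while (batch.size() < maxSize) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) break;
            T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) break;   // timed out with a partial batch
            batch.add(next);
        }
        return batch;
    }
}

class BatchCollectorDemo {
    public static void main(String[] args) throws InterruptedException {
        BatchCollector<Integer> collector = new BatchCollector<>();
        for (int i = 0; i < 25; i++) collector.store(i);
        for (int b = 0; b < 3; b++) {
            List<Integer> batch = collector.nextBatch(10, 100, TimeUnit.MILLISECONDS);
            // Prints 10 items (first = 0), 10 items (first = 10), 5 items (first = 20).
            System.out.println(batch.size() + " items, first = " + batch.get(0));
        }
    }
}
```

In a real service the loop calling nextBatch would run on the single writer thread, which keeps the insertion order intact.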
I suggest using a Chronicle-Queue which is a library I designed.
It allows you to write in the current thread without blocking. It was originally designed for low latency trading systems. For small messages it takes around 300 ns to write a message.
You don't need to use a background thread or an on-heap queue, and it doesn't wait for the data to be written to disk by default. It also ensures consistent order for all readers. If the program dies at any point after you call finish(), the message is not lost (unless the OS crashes or loses power). It also supports replication to avoid data loss.
Have one separate thread that gets items from the end of a queue (blocking on an empty queue), and writes them to disk. Your main thread's store() function just adds items to the beginning of the queue.
Here's a rough idea (though I assume there will be cleaner or faster ways for doing this in production code, depending on how fast you need things to be):
import java.util.*;
import java.io.*;
import java.util.concurrent.*;
class ObjectWriter implements Runnable {
private final Object END = new Object();
BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
public void store(Object o) throws InterruptedException {
queue.put(o);
}
public ObjectWriter() {
new Thread(this).start();
}
public void close() throws InterruptedException {
queue.put(END);
}
public void run() {
while (true) {
try {
Object o = queue.take();
if (o == END) {
// close output file.
return;
}
System.out.println(o.toString()); // serialize as appropriate
} catch (InterruptedException e) {
}
}
}
}
public class Test {
public static void main(String[] args) throws Exception {
ObjectWriter w = new ObjectWriter();
w.store("hello");
w.store("world");
w.close();
}
}
The comments in your question make it sound like you are unfamiliar with multi-threading, but it's really not that difficult.
You simply need another thread, responsible for writing to the storage, which picks items off a queue. Your store function just adds the objects to the in-memory queue and continues on its way.
Some pseudo-ish code:
final List<SomeObject> queue = new LinkedList<SomeObject>();
void store(SomeObject o) {
// add it to the queue - note that modifying o after this will also alter the
// instance in the queue
synchronized(queue) {
queue.add(o);
queue.notify(); // tell the storage thread there's something in the queue
}
}
void storageThread() throws InterruptedException {
SomeObject item;
while (notfinished) {
synchronized(queue) {
if (!queue.isEmpty()) {
item = queue.remove(0); // take from the start to preserve insertion order
} else {
// wait for something
queue.wait();
continue;
}
}
writeToStorage(item);
}
}
I have an ArrayBlockingQueue on which a single-threaded, fixed-rate scheduled task works.
A task may fail. I want to re-run it, or re-insert it into the queue at high priority (at the head of the queue).
Some thoughts here -
Why are you using ArrayBlockingQueue and not PriorityBlockingQueue? That sounds like what you need. At first, give all your elements equal priority.
If you receive an exception, re-insert the task into the queue with a higher priority.
Simplest thing might be a priority queue. Attach a retry number to the task. It starts as zero. After an unsuccessful run, throw away all the ones and increment the zeroes and put them back in the queue at a high priority. With this method, you can easily decide to run everything three times, or more, if you want to later. The down side is you have to modify the task class.
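A rough sketch of the priority-queue approach with a retry counter follows. RetryableTask, MAX_RETRIES and the simulated failure are assumptions for illustration. Note that PriorityBlockingQueue does not keep insertion order among equal priorities by itself, so a sequence number is used as a tie-breaker.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** A task wrapper ordered by retry count (retries first), then by submission order. */
class RetryableTask implements Comparable<RetryableTask> {
    private static final AtomicLong SEQ = new AtomicLong();
    final String name;
    final int retries;
    private final long seq = SEQ.getAndIncrement(); // tie-breaker for equal retry counts

    RetryableTask(String name, int retries) {
        this.name = name;
        this.retries = retries;
    }

    /** Returns a copy with the retry count increased by one. */
    RetryableTask retry() {
        return new RetryableTask(name, retries + 1);
    }

    @Override
    public int compareTo(RetryableTask o) {
        int byRetries = Integer.compare(o.retries, retries); // more retries sorts earlier
        return byRetries != 0 ? byRetries : Long.compare(seq, o.seq);
    }
}

class RetryDemo {
    static final int MAX_RETRIES = 2; // assumed limit

    public static void main(String[] args) throws InterruptedException {
        PriorityBlockingQueue<RetryableTask> queue = new PriorityBlockingQueue<>();
        queue.add(new RetryableTask("a", 0));
        queue.add(new RetryableTask("b", 0));
        queue.add(new RetryableTask("c", 0));

        boolean failedOnce = false;
        while (!queue.isEmpty()) {
            RetryableTask t = queue.take();
            boolean ok = !(t.name.equals("a") && !failedOnce); // simulate: "a" fails the first time
            if (!ok) {
                failedOnce = true;
                if (t.retries < MAX_RETRIES) queue.add(t.retry()); // jumps ahead of "b" and "c"
            }
            System.out.println(t.name + " attempt " + (t.retries + 1) + (ok ? " ok" : " failed"));
        }
        // Prints: a attempt 1 failed, a attempt 2 ok, b attempt 1 ok, c attempt 1 ok
    }
}
```

Unlike ArrayBlockingQueue, PriorityBlockingQueue is unbounded, so producers are never blocked by a full queue.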
The other idea would be to set up another, non-blocking, thread-safe, high-priority queue. When looking for a new task, you check the non-blocking queue first and run what's there. Otherwise, go to the blocking queue. This might work for you as is, and so far it's the simplest solution. The problem is the high priority queue might fill up while the scheduler is blocked on the blocking queue.
To get around this, you'd have to do your own blocking. Both queues should be non-blocking. (Suggestion: java.util.concurrent.ConcurrentLinkedQueue.) After polling both queues with no results, wait() on a monitor. When anything puts something in a queue, it should call notifyAll() and the scheduler can start up again. Great care is needed lest the notification occur after the scheduler has checked both queues but before it calls wait().
Addition:
Prototype code for third solution with manual blocking. Some threading is suggested, but the reader will know his/her own situation best. Which bits of code are apt to block waiting for a lock, which are apt to tie up their thread (and core) for minutes while doing extensive work, and which cannot afford to sit around waiting for the other code to finish all needs to be considered. For instance, if a failed run can immediately be rerun on the same thread with no time-consuming cleanup, most of this code can be junked.
private final ConcurrentLinkedQueue<Runnable> mainQueue = new ConcurrentLinkedQueue<Runnable>();
private final ConcurrentLinkedQueue<Runnable> prioQueue = new ConcurrentLinkedQueue<Runnable>();
private final Object entryWatch = new Object();
/** Adds a new job to the queue. */
public void addjob( Runnable runjob ) {
mainQueue.add( runjob );
synchronized (entryWatch) { entryWatch.notifyAll(); }
}
/** The endless loop that does the work. */
public void schedule() {
for (;;) {
Runnable run = getOne(); // Avoids lock if successful.
if (run == null) {
// Both queues are empty.
synchronized (entryWatch) {
// Need to check again. Someone might have added a job and called notifyAll
// since the last check. From this point until wait(), we can be sure
// entryWatch is not notified.
run = getOne();
if (run == null) {
// Both queues are REALLY empty.
try { entryWatch.wait(); }
catch (InterruptedException ie) {}
}
}
}
if (run != null) runit( run ); // still null after a wakeup: loop around and poll the queues again
}
}
/** Helper method for the endless loop. */
private Runnable getOne() {
Runnable run = (Runnable) prioQueue.poll();
if (run != null) return run;
return (Runnable) mainQueue.poll();
}
/** Runs a new job. */
public void runit( final Runnable runjob ) {
// Do everything in another thread. (Optional)
new Thread() {
@Override public void run() {
// Run run. (Possibly in own thread?)
// (Perhaps best in thread from a thread pool.)
runjob.run();
// Handle failure (runit only, NOT in runitLast).
// Defining "failure" left as exercise for reader.
if (failure) {
// Put code here to handle failure.
// Put back in queue.
prioQueue.add( runjob );
synchronized (entryWatch) { entryWatch.notifyAll(); }
}
}
}.start();
}
/** Reruns a job. */
public void runitLast( final Runnable runjob ) {
// Same code as "runit", but don't put "runjob" in "prioQueue" on failure.
}
I have a queue that contains work items and I want to have multiple threads work in parallel on those items. When a work item is processed it may result in new work items. The problem I have is that I can't find a solution on how to determine if I'm done. The worker looks like that:
public class Worker implements Runnable {
public void run() {
while (true) {
WorkItem item = queue.nextItem();
if (item != null) {
processItem(item);
}
else {
// the queue is empty, but there may still be other workers
// processing items which may result in new work items
// how to determine if the work is completely done?
}
}
}
}
This seems like a pretty simple problem actually but I'm at a loss. What would be the best way to implement that?
thanks
clarification:
The worker threads have to terminate once none of them is processing an item, but as long as at least one of them is still working they have to wait because it may result in new work items.
What about using an ExecutorService which will allow you to wait for all tasks to finish: ExecutorService, how to wait for all tasks to finish
I'd suggest wait/notify calls. In the else case, your worker threads would wait on an object until notified by the queue that there is more work to do. When a worker creates a new item, it adds it to the queue, and the queue calls notify on the object the workers are waiting on. One of them will wake up to consume the new item.
The methods wait, notify, and notifyAll of class Object support an efficient transfer of control from one thread to another. Rather than simply "spinning" (repeatedly locking and unlocking an object to see whether some internal state has changed), which consumes computational effort, a thread can suspend itself using wait until such time as another thread awakens it using notify. This is especially appropriate in situations where threads have a producer-consumer relationship (actively cooperating on a common goal) rather than a mutual exclusion relationship (trying to avoid conflicts while sharing a common resource).
Source: Threads and Locks
I'd look at something higher level than wait/notify. It's very difficult to get right and avoid deadlocks. Have you looked at java.util.concurrent.CompletionService<V>? You could have a simpler manager thread that polls the service and take()s the results, which may or may not contain a new work item.
Using a BlockingQueue containing items to process along with a synchronized set that keeps track of all elements being processed currently:
BlockingQueue<WorkItem> bQueue;
Set<WorkItem> beingProcessed = Collections.synchronizedSet(new HashSet<WorkItem>());
bQueue.put(workItem);
...
// the following runs over many threads in parallel
while (!(bQueue.isEmpty() && beingProcessed.isEmpty())) {
WorkItem currentItem = bQueue.poll(50L, TimeUnit.MILLISECONDS); // null for empty queue
if (currentItem != null) {
beingProcessed.add(currentItem);
processItem(currentItem); // possibly bQueue.add(newItem) is called from processItem
beingProcessed.remove(currentItem);
}
}
EDIT: as @Hovercraft Full Of Eels suggested, an ExecutorService is probably what you should really use. You can add new tasks as you go along. You can semi-busy wait for termination of all tasks at regular intervals with executorService.awaitTermination(time, timeUnits) and kill all your threads after that.
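One way to detect completion with an ExecutorService, when tasks may submit further tasks, is to count the tasks that have been submitted but not yet finished. The sketch below is an assumption-laden illustration: SelfFeedingRunner is a made-up name, and the "work" is a toy in which an item of depth d produces two items of depth d - 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Runs tasks that may spawn more tasks and detects when all of them are finished. */
class SelfFeedingRunner {
    private final ExecutorService pool;
    private final AtomicInteger pending = new AtomicInteger(); // submitted but not yet finished
    private final CountDownLatch done = new CountDownLatch(1);
    final AtomicInteger processed = new AtomicInteger();

    SelfFeedingRunner(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Hypothetical work item: an item of depth d produces two items of depth d - 1. */
    void submit(int depth) {
        pending.incrementAndGet();            // count BEFORE the task can possibly finish
        pool.execute(() -> {
            try {
                processed.incrementAndGet();
                if (depth > 0) {
                    submit(depth - 1);
                    submit(depth - 1);
                }
            } finally {
                // Children were counted above, so zero means nothing is queued or running.
                if (pending.decrementAndGet() == 0) done.countDown();
            }
        });
    }

    /** Blocks until every item, including spawned ones, has been processed. */
    void awaitCompletion() throws InterruptedException {
        done.await();
        pool.shutdown();
    }
}

class SelfFeedingDemo {
    public static void main(String[] args) throws InterruptedException {
        SelfFeedingRunner runner = new SelfFeedingRunner(4);
        runner.submit(3);                     // seed with a single root item
        runner.awaitCompletion();
        System.out.println(runner.processed.get()); // 1 + 2 + 4 + 8 = 15
    }
}
```

The sketch assumes a single root item. With several independent seeds, the counter could reach zero between two seed submissions, so the seeds would need to be counted up front.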
Here's the beginnings of a queue to solve your problem. Basically, you need to track new work and in-process work.
public class WorkQueue<T> {
private final List<T> _newWork = new LinkedList<T>();
private int _inProcessWork;
public synchronized void addWork(T work) {
_newWork.add(work);
notifyAll();
}
public synchronized T startWork() throws InterruptedException {
// wait only while there is nothing to take but someone may still add work
while(_newWork.isEmpty() && (_inProcessWork > 0)) {
wait();
}
if(!_newWork.isEmpty()) {
_inProcessWork++;
return _newWork.remove(0);
}
// everything is done
return null;
}
public synchronized void finishWork() {
_inProcessWork--;
if((_inProcessWork == 0) && _newWork.isEmpty()) {
notifyAll();
}
}
}
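Your workers will look roughly like:
public class Worker<T> implements Runnable {
private final WorkQueue<T> _queue;
public Worker(WorkQueue<T> queue) {
_queue = queue;
}
public void run() {
try {
T work = null;
while((work = _queue.startWork()) != null) {
try {
// do work here...
} finally {
_queue.finishWork();
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the interrupt flag and exit
}
}
}
The one trick is that you need to add the first work item before you start any workers (otherwise they will all immediately exit).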
Right now I'm torn up in deciding the best way of handling request objects that I send up to a server. In other words, I have tracking request objects for things such as impression and click tracking within my app. Simple requests with very low payloads. There are places in my app where said objects that need to be tracked appear concurrently next to each other (at most three concurrent objects that I have to track), so every time said objects are visible for example, I have to create a tracking request object for each of them.
Now I already know that I can easily create a singleton queue thread which adds those objects into a vector and my thread either processes them in the main loop or calls wait on the queue until we have objects to process. While this sounds like a clear cut solution, the queue can accumulate into the dozens, which can be cumbersome at times, since it's making one connection for each request, thus it won't run concurrently.
What I had in mind was to create a thread pool which would allow me to create up to two concurrent connections via a semaphore and process thread objects that would contain my tracking event requests. In other words, I wanted to create a function that would create a new thread object and add it to a Vector, and the thread pool would iterate through the set of threads and process them two at a time. I know I can create a function that would add objects like so:
public boolean addThread(Runnable r){
synchronized(_queue){
while(!dead){
_queue.addElement(r);
//TODO: How would I notify my thread pool object to iterate through the list to process the queue? Do I call notify on the queue object, but that would only work on a thread right??
return true;
}
return false;
}
}
What I am wondering is how will the threads themselves get executed. How can I write a function that would execute the thread pool after adding a thread to the list? Also, since the semaphore will block after the second connection, will that lock up my app until there is an open slot, or will it just lock up in the thread pool object while looping through the list?
As always, since I am targeting a J2ME/Blackberry environment, only pre-1.5 answers will be accepted, so no Generics or any class from the Concurrent package.
EDIT: So I take it that this is what it should look like more or less:
class MyThreadPool extends Thread{
private final Vector _queue = new Vector();
private CappedSemaphore _sem;
public MyThreadPool(){
_sem = new CappedSemaphore(2);
this.start();
}
public void run(){
while(!dead){
Runnable r = null;
synchronized(_queue){
if(_queue.isEmpty()){
try { _queue.wait(); } catch (InterruptedException e) { return; }
} else {
r = (Runnable) _queue.elementAt(0);
_queue.removeElementAt(0);
}
}
if(r != null){
_sem.take();
r.run();
_sem.release();
}
}
}
public boolean addThread(Runnable r){
synchronized(_queue){
if(!dead){
_queue.addElement(r);
_queue.notifyAll();
return true;
}
return false;
}
}
}
What you want, on the thread side, is to have each thread wait on the queue. For example:
class MyWaitingThread extends Thread{
private final Queue _queue;
public MyWaitingThread (Queue _queue){
this._queue = _queue;
}
public void run(){
while(true){
Runnable r = null;
synchronized(_queue){
if(_queue.isEmpty()) {
try { _queue.wait(); } catch (InterruptedException e) { return; }
} else {
r = (Runnable) _queue.pop();
}
}
if(r != null) r.run();
}
}
}
And in your other logic it would look like:
public void addThread(Runnable r){
if(!dead){
synchronized(_queue){
_queue.addElement(r);
_queue.notifyAll();
}
}
}
That _queue.notifyAll() will wake up all threads waiting on the _queue instance. Also, notice I moved the while(!dead) outside of the synchronized block and changed it to if(!dead). I imagine that keeping it the way you originally had it wouldn't have worked exactly as you hoped.
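Putting the pieces together, a small pool with a fixed number of worker threads might look like the sketch below. It deliberately avoids generics and java.util.concurrent, using only Vector and wait/notify. SimplePool and its methods are made-up names, and whether every call used here is available on a given J2ME profile is an assumption that should be checked.

```java
import java.util.Vector;

/** Minimal worker pool using only Vector and wait/notify. */
class SimplePool {
    private final Vector queue = new Vector();
    private final Thread[] workers;
    private boolean dead = false; // guarded by the queue's monitor

    SimplePool(int workerCount) {
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() { workLoop(); }
            });
            workers[i].start();
        }
    }

    /** Queues a job; returns false if the pool has been shut down. */
    public boolean add(Runnable r) {
        synchronized (queue) {
            if (dead) return false;
            queue.addElement(r);
            queue.notify(); // one new job, so waking one worker is enough
            return true;
        }
    }

    /** Lets queued jobs finish, then stops the workers and waits for them. */
    public void shutdown() throws InterruptedException {
        synchronized (queue) {
            dead = true;
            queue.notifyAll();
        }
        for (int i = 0; i < workers.length; i++) workers[i].join();
    }

    private void workLoop() {
        while (true) {
            Runnable r;
            synchronized (queue) {
                while (queue.isEmpty()) {
                    if (dead) return;
                    try { queue.wait(); } catch (InterruptedException e) { return; }
                }
                r = (Runnable) queue.elementAt(0);
                queue.removeElementAt(0);
            }
            r.run(); // run outside the lock so other workers can take jobs meanwhile
        }
    }
}

class SimplePoolDemo {
    public static void main(String[] args) throws InterruptedException {
        final int[] done = new int[1];
        SimplePool pool = new SimplePool(2); // two workers = at most two jobs at a time
        for (int i = 0; i < 10; i++) {
            pool.add(new Runnable() {
                public void run() { synchronized (done) { done[0]++; } }
            });
        }
        pool.shutdown();
        System.out.println(done[0]); // 10: every queued job ran
    }
}
```

Because the number of worker threads already caps concurrency at two, no separate semaphore is needed in this variant, and the caller of add never blocks waiting for a free slot.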