Need algo for executing tasks submitted by a particular customer sequentially - Java

Need a high-performance algorithm:
Scenario: Tasks are submitted to a thread pool for different customers at very frequent intervals
Requirement: Tasks submitted by a particular customer need to be handled sequentially, but tasks submitted by different customers can be executed in parallel.

Here is an example of how you can write your worker class. The worker class keeps a concurrent map of all the customers currently being processed. Whenever the worker gets the next work item off the queue, it checks whether that customer is currently being processed. If so, it re-enqueues the task at the end of the queue and moves on to the next item.
Let me know if you have any questions.
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

public class MyWorker extends Thread {
    private static int instance = 0;
    private final Queue<Task> queue;
    // This is used to hold the customers that are in process at this time.
    private final ConcurrentHashMap<String, Boolean> inProcessCustomers;

    public MyWorker(Queue<Task> queue, ConcurrentHashMap<String, Boolean> inProcessCustomers) {
        this.queue = queue;
        this.inProcessCustomers = inProcessCustomers;
        setName("MyWorker:" + (instance++));
    }

    @Override
    public void run() {
        while (true) {
            try {
                Task task;
                synchronized (queue) {
                    while (queue.isEmpty())
                        queue.wait();
                    // Get the next work item off of the queue.
                    task = queue.remove();
                    // If the customer is in process, add the task back to the end of
                    // the queue and go look for the next item.
                    if (inProcessCustomers.containsKey(task.getCustomerId())) {
                        queue.add(task);
                        continue;
                    }
                    inProcessCustomers.put(task.getCustomerId(), true);
                }
                // Process the work item outside the lock, then release the customer.
                task.run();
                inProcessCustomers.remove(task.getCustomerId());
            } catch (InterruptedException ie) {
                break; // Terminate
            }
        }
    }
}

It sounds like you need a way to map your customer to a queue of tasks.
Assumption: Each customer has a way of being uniquely identified.
I would suggest implementing the hashCode method on whatever object represents your customer.
As the tasks are submitted you create a mapping (using HashMap) where the key is your customer and the value is a queue - I suggest ConcurrentLinkedQueue - then add either the task or the thread to the queue. As you process tasks remove them (or their thread depending on design choice) from the queue.
EDIT:
For the purposes of continued discussion I'm going to assume the tasks will be the objects stored in the queue.
Above when I wrote "As you process tasks remove them..." I meant that the task would remain in the queue until completed. You can do this using the peek method of the queue.
Regarding how to process tasks once they are added to the queue the task can be given a reference to the queue object so that once the task is completed it can trigger the next task. The basic algorithm for this piece would go something like this: the controller thread responsible for adding tasks to the queue would check to see if the queue is empty or not. If the queue is not empty it will only add the next task to the queue because the next task is triggered when the current task finishes. If the queue is empty the controller triggers the next task - which it should already have a reference to. When the current task finishes it will call its queue's poll method to remove itself from the head of the queue and then calls peek to obtain the next task. The next task is then executed.
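Here is a rough sketch of that approach; the class and method names are illustrative, not part of your code. There is one queue per customer, the head of each queue is the task currently running, and a finishing task triggers the next one:
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerCustomerDispatcher {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    // One queue per customer; the head of each queue is the task currently running.
    private final Map<String, Queue<Runnable>> queues = new ConcurrentHashMap<>();

    public void submit(String customerId, Runnable task) {
        Queue<Runnable> q = queues.computeIfAbsent(customerId, k -> new ConcurrentLinkedQueue<>());
        synchronized (q) {
            q.add(task);
            // Only trigger if nothing is running for this customer yet;
            // otherwise the currently running task will trigger the next one.
            if (q.size() == 1) {
                schedule(q);
            }
        }
    }

    private void schedule(Queue<Runnable> q) {
        Runnable next = q.peek(); // the task stays in the queue until it completes
        pool.execute(() -> {
            try {
                next.run();
            } finally {
                synchronized (q) {
                    q.poll(); // remove the finished task
                    if (!q.isEmpty()) {
                        schedule(q); // trigger the next task for this customer
                    }
                }
            }
        });
    }
}
Note that this sketch leaves empty queues in the map; if you have a very large number of one-off customers you would want to remove them once they drain.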

Related

java - what is the best collection for this use case?

I have a list of intensive updates so I am grouping them together and executing them as a batch job in a single thread. Other threads can send their updates at any time.
class ItemUpdateJob {
    int itemId;
    int number;
}
When scheduling a job to be queued for updating, I want a collection where I can modify a job if it already exists (assuming itemId as the key). In this example:
existingItemJobInQueue.number += requestedItemJob.number;
so the queue doesn't end up holding thousands of jobs for the same item. When the jobs begin execution I will need to somehow loop through the queue, but while a job is being executed it should not be modified (should each item have its own lock?).
for (ItemUpdateJob job : jobQueue) {
    updateItem(job);
}
Once a job has been executed (the item updated), it should immediately be removed from the queue. What is the best way to do this? Currently I am thinking of using a HashMap with the item id as the key, where each item has a lock which prevents an existing job from being modified while the item is being updated. However, this will cause a halt as it waits for the update to complete (the lock to be released).
It looks to me as if you need a combination of more than one collection. Perhaps something like this?
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

public class JobHandler {
    // jobs still in the queue, kept in a map for a quick lookup
    private final Map<Integer, ItemUpdateJob> waitingJobs;
    // jobs still waiting to be run
    private final Queue<ItemUpdateJob> jobQueue;

    public JobHandler(Collection<ItemUpdateJob> jobs) {
        this.waitingJobs = new HashMap<>();
        this.jobQueue = new LinkedList<>();
        this.init(jobs);
    }

    private void init(Collection<ItemUpdateJob> jobs) {
        for (ItemUpdateJob job : jobs) {
            this.waitingJobs.put(job.itemId, job);
            this.jobQueue.add(job);
        }
    }

    public ItemUpdateJob getNextJobToRun() {
        ItemUpdateJob nextJob = this.jobQueue.poll();
        if (nextJob != null) {
            this.waitingJobs.remove(nextJob.itemId);
        }
        return nextJob;
    }

    public void addJob(ItemUpdateJob job) {
        this.waitingJobs.put(job.itemId, job);
        this.jobQueue.add(job);
    }

    public boolean updateJob(ItemUpdateJob updateJob) {
        if (this.waitingJobs.containsKey(updateJob.itemId)) {
            // job is currently waiting for execution, so update it
            this.waitingJobs.get(updateJob.itemId).number += updateJob.number;
            return true;
        } else {
            // job is currently being run, or no such job exists at all,
            // so add it at the end of the queue to wait for its turn
            this.addJob(updateJob);
            return false;
        }
    }
}
java.util.Queue looks like a good match - FIFO order of execution for jobs, plus a Map for quick lookups when updating a currently waiting job. Keep in mind that some Queue implementations have capacity restrictions, and obviously this needs synchronization.
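As a sketch of that last point, the simplest way to add the synchronization is to make every public method synchronized so the map and the queue are always updated together under one lock. This is just a thread-safe variant of the class above, reusing the question's ItemUpdateJob, not production code:
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

public class SynchronizedJobHandler {
    private final Map<Integer, ItemUpdateJob> waitingJobs = new HashMap<>();
    private final Queue<ItemUpdateJob> jobQueue = new LinkedList<>();

    public synchronized void addJob(ItemUpdateJob job) {
        waitingJobs.put(job.itemId, job);
        jobQueue.add(job);
    }

    public synchronized ItemUpdateJob getNextJobToRun() {
        ItemUpdateJob nextJob = jobQueue.poll();
        if (nextJob != null) {
            waitingJobs.remove(nextJob.itemId);
        }
        return nextJob;
    }

    public synchronized boolean updateJob(ItemUpdateJob updateJob) {
        ItemUpdateJob existing = waitingJobs.get(updateJob.itemId);
        if (existing != null) {
            existing.number += updateJob.number; // job still waiting, merge the update
            return true;
        }
        addJob(updateJob); // running or unknown, queue it again
        return false;
    }
}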

Non blocking function that preserves order

I have the following method:
void store(SomeObject o) {
}
The idea of this method is to store o to permanent storage, but the function should not block; i.e. I cannot/must not do the actual storage in the same thread that called store.
I also cannot just start a thread and store the object from that other thread, because store might be called a "huge" number of times and I don't want to start spawning threads.
So I have these options, neither of which I see working well:
1) Use a thread pool (Executor family)
2) In store, add the object to an array list and return. When the array list reaches e.g. 1000 entries (random number), start another thread to "flush" the array list to storage. But I would still possibly have the problem of too many threads (thread pool?)
So in both cases the only requirement I have is that I persistently store the objects in exactly the same order in which they were passed to store. And using multiple threads mixes things up.
How can this be solved?
How can I ensure:
1) Non blocking store
2) Accurate insertion order
3) I don't care about any storage guarantees. If e.g. something crashes I don't care about losing data e.g. cached in the array list before storing them.
I would use a SingleThreadExecutor and a BlockingQueue.
A SingleThreadExecutor, as the name says, has one single thread. Use it to poll from the queue and persist objects, blocking if the queue is empty.
In your store method you can add to the queue without blocking.
EDIT
Actually, you do not even need that extra Queue - the JavaDoc of newSingleThreadExecutor says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So I think it's exactly what you need.
private final ExecutorService persistor = Executors.newSingleThreadExecutor();

public void store(final SomeObject o) {
    persistor.submit(new Runnable() {
        @Override public void run() {
            // your persist-code here.
        }
    });
}
The advantage of using a Runnable that has a quasi-endless-loop and using an extra queue would be the possibility to code some "Burst"-functionality. For example you could make it wait to persist only when 10 elements are in queue or the oldest element has been added at least 1 minute ago ...
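For illustration, here is a rough sketch of that burst idea; the flush method, the 10-element threshold and the 1-minute age are placeholders. A single consumer thread drains its own queue and only persists when the batch is big enough or old enough, which preserves insertion order:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BurstingPersistor<T> implements Runnable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

    // Called by producers; never blocks on an unbounded LinkedBlockingQueue.
    public void store(T o) {
        queue.add(o);
    }

    @Override
    public void run() {
        List<T> batch = new ArrayList<>();
        long oldestArrival = 0;
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Wake up at least once a second so the age check below still runs.
                T o = queue.poll(1, TimeUnit.SECONDS);
                if (o != null) {
                    if (batch.isEmpty()) {
                        oldestArrival = System.currentTimeMillis();
                    }
                    batch.add(o);
                }
                boolean full = batch.size() >= 10;
                boolean stale = !batch.isEmpty()
                        && System.currentTimeMillis() - oldestArrival >= 60_000;
                if (full || stale) {
                    flush(batch); // persist the whole burst in one go, in insertion order
                    batch.clear();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Placeholder for the real persistence code.
    private void flush(List<T> batch) { }
}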
I suggest using Chronicle-Queue, which is a library I designed.
It allows you to write in the current thread without blocking. It was originally designed for low-latency trading systems. For small messages it takes around 300 ns to write a message.
You don't need to use a background thread or an on-heap queue, and it doesn't wait for the data to be written to disk by default. It also ensures a consistent order for all readers. If the program dies at any point after you call finish(), the message is not lost (unless the OS crashes/loses power). It also supports replication to avoid data loss.
Have one separate thread that gets items from the end of a queue (blocking on an empty queue), and writes them to disk. Your main thread's store() function just adds items to the beginning of the queue.
Here's a rough idea (though I assume there will be cleaner or faster ways for doing this in production code, depending on how fast you need things to be):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ObjectWriter implements Runnable {
    private final Object END = new Object();
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    public void store(Object o) throws InterruptedException {
        queue.put(o);
    }

    public ObjectWriter() {
        new Thread(this).start();
    }

    public void close() throws InterruptedException {
        queue.put(END);
    }

    public void run() {
        while (true) {
            try {
                Object o = queue.take();
                if (o == END) {
                    // close output file.
                    return;
                }
                System.out.println(o.toString()); // serialize as appropriate
            } catch (InterruptedException e) {
                // interrupted while waiting; keep draining the queue
            }
        }
    }
}

public class Test {
    public static void main(String[] args) throws Exception {
        ObjectWriter w = new ObjectWriter();
        w.store("hello");
        w.store("world");
        w.close();
    }
}
The comments in your question make it sound like you are unfamiliar with multi-threading, but it's really not that difficult.
You simply need another thread, responsible for writing to the storage, which picks items off a queue - your store function just adds the objects to the in-memory queue and continues on its way.
Some pseudo-ish code:
final List<SomeObject> queue = new LinkedList<SomeObject>();

void store(SomeObject o) {
    // add it to the queue - note that modifying o after this will also alter the
    // instance in the queue
    synchronized (queue) {
        queue.add(o);
        queue.notify(); // tell the storage thread there's something in the queue
    }
}

void storageThread() {
    SomeObject item;
    while (notfinished) {
        synchronized (queue) {
            if (!queue.isEmpty()) {
                item = queue.remove(0); // take from the start to preserve insertion order
            } else {
                // wait for something
                queue.wait();
                continue;
            }
        }
        writeToStorage(item);
    }
}

Rejection handler in Executors.newScheduledThreadPool

I have an ArrayBlockingQueue upon which a single-threaded, fixed-rate scheduled executor works.
A task may fail. I want to re-run it, or re-insert it into the queue at a higher priority or at the top.
Some thoughts here -
Why are you using ArrayBlockingQueue and not PriorityBlockingQueue? That sounds like what you need to me. At first, set all your elements to equal priority.
In case you receive an exception, re-insert the task into the queue with a higher priority.
Simplest thing might be a priority queue. Attach a retry number to the task. It starts as zero. After an unsuccessful run, throw away all the ones and increment the zeroes and put them back in the queue at a high priority. With this method, you can easily decide to run everything three times, or more, if you want to later. The down side is you have to modify the task class.
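Something along these lines, as a sketch; RetryableTask, MAX_RETRIES and the failure test are made up for illustration. Each task carries its retry number, the priority queue orders retried tasks first, and a failed task goes back in with the count incremented:
import java.util.concurrent.PriorityBlockingQueue;

class RetryableTask implements Comparable<RetryableTask> {
    final Runnable work;
    final int retryCount; // 0 on first submission

    RetryableTask(Runnable work, int retryCount) {
        this.work = work;
        this.retryCount = retryCount;
    }

    // Higher retry count = higher priority, so retried tasks jump the queue.
    public int compareTo(RetryableTask other) {
        return Integer.compare(other.retryCount, this.retryCount);
    }
}

class RetryingWorker implements Runnable {
    static final int MAX_RETRIES = 1; // "throw away all the ones"
    final PriorityBlockingQueue<RetryableTask> queue;

    RetryingWorker(PriorityBlockingQueue<RetryableTask> queue) {
        this.queue = queue;
    }

    public void run() {
        try {
            while (true) {
                RetryableTask task = queue.take();
                try {
                    task.work.run();
                } catch (RuntimeException failed) {
                    // Discard tasks that already used their retry; re-queue the rest
                    // with an incremented retry count (and therefore higher priority).
                    if (task.retryCount < MAX_RETRIES) {
                        queue.put(new RetryableTask(task.work, task.retryCount + 1));
                    }
                }
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}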
The other idea would be to set up another, non-blocking, thread-safe, high-priority queue. When looking for a new task, you check the non-blocking queue first and run what's there. Otherwise, go to the blocking queue. This might work for you as is, and so far it's the simplest solution. The problem is the high priority queue might fill up while the scheduler is blocked on the blocking queue.
To get around this, you'd have to do your own blocking. Both queues should be non-blocking. (Suggestion: java.util.concurrent.ConcurrentLinkedQueue.) After polling both queues with no results, wait() on a monitor. When anything puts something in a queue, it should call notifyAll() and the scheduler can start up again. Great care is needed lest the notification occur after the scheduler has checked both queues but before it calls wait().
Addition:
Prototype code for third solution with manual blocking. Some threading is suggested, but the reader will know his/her own situation best. Which bits of code are apt to block waiting for a lock, which are apt to tie up their thread (and core) for minutes while doing extensive work, and which cannot afford to sit around waiting for the other code to finish all needs to be considered. For instance, if a failed run can immediately be rerun on the same thread with no time-consuming cleanup, most of this code can be junked.
private final ConcurrentLinkedQueue<Runnable> mainQueue = new ConcurrentLinkedQueue<>();
private final ConcurrentLinkedQueue<Runnable> prioQueue = new ConcurrentLinkedQueue<>();
private final Object entryWatch = new Object();

/** Adds a new job to the queue. */
public void addjob(Runnable runjob) {
    mainQueue.add(runjob);
    synchronized (entryWatch) { entryWatch.notifyAll(); }
}

/** The endless loop that does the work. */
public void schedule() {
    for (;;) {
        Runnable run = getOne(); // Avoids lock if successful.
        if (run == null) {
            // Both queues are empty.
            synchronized (entryWatch) {
                // Need to check again. Someone might have added and notifiedAll
                // since the last check. From this point until the wait, we can be
                // sure entryWatch is not notified.
                run = getOne();
                if (run == null) {
                    // Both queues are REALLY empty.
                    try { entryWatch.wait(); }
                    catch (InterruptedException ie) {}
                    continue; // re-check the queues after being woken up
                }
            }
        }
        runit(run);
    }
}

/** Helper method for the endless loop. */
private Runnable getOne() {
    Runnable run = prioQueue.poll();
    if (run != null) return run;
    return mainQueue.poll();
}

/** Runs a new job. */
public void runit(final Runnable runjob) {
    // Do everything in another thread. (Optional)
    new Thread() {
        @Override public void run() {
            // Run the job. (Perhaps best in a thread from a thread pool.)
            runjob.run();
            // Handle failure (runit only, NOT in runitLast).
            // Defining "failure" is left as an exercise for the reader.
            if (failure) {
                // Put code here to handle the failure.
                // Put the job back in the queue at high priority.
                prioQueue.add(runjob);
                synchronized (entryWatch) { entryWatch.notifyAll(); }
            }
        }
    }.start();
}

/** Reruns a job. */
public void runitLast(final Runnable runjob) {
    // Same code as "runit", but don't put "runjob" in "prioQueue" on failure.
}

How to implement a queue that can be processed by multiple threads?

I think I'm doing it wrong. I am creating threads that are supposed to crunch some data from a shared queue. My problem is that the program is slow and a memory hog, and I suspect that the queue may not be as shared as I hoped it would be. I suspect this because I added a line that displays the size of the queue, and if I launch 2 threads I get two outputs with completely different numbers that seem to increment on their own (I thought it could be the same number jumping around, say from 100 to 2, but after watching it shows 105 and 5 advancing at different rates; if I launch 4 threads I see 4 different numbers).
Here's snippet of the relevant parts. I create a static class with the data I want in the queue at the top of the program
static class queue_class {
int number;
int[] data;
Context(int number, int[] data) {
this.number = number;
this.data = data;
}
}
Then I create the queue after sending some jobs to the callable..
static class process_threaded implements Callable<Void> {
// queue with contexts to process
private Queue<queue_class> queue;
process_threaded(queue_class request) {
queue = new ArrayDeque<queue_class>();
queue.add(request);
}
public Void call() {
while(!queue.isEmpty()) {
System.out.println("in contexts queue with a size of " + queue.size());
Context current = contexts.poll();
//get work and process it, if it work great then the solution goes elsewhere
//otherwise, depending on the data, its either discarded or parts of it is added back to queue
queue.add(new queue_class(k, data_list));
As you can see, there are 3 options for the data: it gets sent off if it is good, discarded if it is totally horrible, or sent back to the queue. I think the queues are growing when data gets sent back, but I suspect that is because each thread is working on its own queue and not a shared one.
Is this guess correct and am I doing this wrong?
You are correct in your assessment that each thread is (probably) working with its own queue, since you are creating a queue in the constructor of your Callable. (It's actually very weird to have a Callable<Void> -- isn't that just a Runnable?)
There are other problems there, for example, the fact that you're working with a queue that isn't thread-safe, or the fact that your code won't compile as it is written.
The important question, though, is do you really need to explicitly create a queue in the first place? Why not have an ExecutorService to which you submit your Callables (or Runnables if you decide to make that switch): Pass a reference to the executor into your Callables, and they can add new Callables to the executor's queue of tasks to run. No need to reinvent the wheel.
For example:
static class process_threaded implements Runnable {
    // Reference to an executor
    private final ExecutorService exec;
    // Reference to the job counter
    private final AtomicInteger jobCounter;
    // Request to process
    private queue_class request;

    process_threaded(ExecutorService exec, AtomicInteger counter, queue_class request) {
        this.exec = exec;
        this.jobCounter = counter;
        this.jobCounter.incrementAndGet(); // Assuming that you will always
                                           // submit the process_threaded to
                                           // the executor if you create it.
        this.request = request;
    }

    public void run() {
        // get work and process **request**; if it works, great, the solution goes elsewhere
        // otherwise, depending on the data, it is either discarded or parts of it are added back to the executor
        exec.submit(new process_threaded(exec, jobCounter, new queue_class(k, data_list)));
        // Can do some more work
        // Always run before returning: counter update and notify the launcher
        synchronized (jobCounter) {
            jobCounter.decrementAndGet();
            jobCounter.notifyAll();
        }
    }
}
Edit:
To solve your problem of when to shut down the executor, I think the simplest solution is to have a job counter, and shutdown when it reaches 0. For thread-safety an AtomicInteger is probably the best choice. I added some code above to incorporate the change. Then your launching code would look something like this:
void theLauncher() throws InterruptedException {
    AtomicInteger jobCounter = new AtomicInteger(0);
    ExecutorService exec = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    exec.submit(new process_threaded(exec, jobCounter, someProcessRequest));
    // Can submit some other things here of course...
    // Wait for jobs to complete:
    while (jobCounter.get() > 0) {
        synchronized (jobCounter) { // (I'm not sure if you have to have the synchronized block, but I think this is safer.)
            if (jobCounter.get() > 0)
                jobCounter.wait();
        }
    }
    // Now you can shutdown:
    exec.shutdown();
}
Don't reinvent the wheel! How about using ConcurrentLinkedQueue? From the javadocs:
An unbounded thread-safe queue based on linked nodes. This queue orders elements FIFO (first-in-first-out). The head of the queue is that element that has been on the queue the longest time. The tail of the queue is that element that has been on the queue the shortest time. New elements are inserted at the tail of the queue, and the queue retrieval operations obtain elements at the head of the queue. A ConcurrentLinkedQueue is an appropriate choice when many threads will share access to a common collection.
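For example, something like this, where the queue is created once and the same instance is handed to every worker; the int[] payload and the pool size are just for illustration:
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedQueueExample {
    public static void main(String[] args) {
        // One queue, created once, shared by every worker.
        Queue<int[]> shared = new ConcurrentLinkedQueue<>();
        shared.add(new int[] {1, 2, 3});
        shared.add(new int[] {4, 5, 6});

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                int[] data;
                // poll() is thread-safe; each element is handed to exactly one worker
                while ((data = shared.poll()) != null) {
                    // ... process data, possibly shared.add(...) new items
                }
            });
        }
        pool.shutdown();
    }
}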

how to deal with multiple worker threads that may create new work items

I have a queue that contains work items and I want to have multiple threads work in parallel on those items. When a work item is processed it may result in new work items. The problem I have is that I can't find a solution on how to determine if I'm done. The worker looks like that:
public class Worker implements Runnable {
public void run() {
while (true) {
WorkItem item = queue.nextItem();
if (item != null) {
processItem(item);
}
else {
// the queue is empty, but there may still be other workers
// processing items which may result in new work items
// how to determine if the work is completely done?
}
}
}
}
This seems like a pretty simple problem actually but I'm at a loss. What would be the best way to implement that?
thanks
clarification:
The worker threads have to terminate once none of them is processing an item, but as long as at least one of them is still working they have to wait because it may result in new work items.
What about using an ExecutorService which will allow you to wait for all tasks to finish: ExecutorService, how to wait for all tasks to finish
I'd suggest wait/notify calls. In the else case, your worker threads would wait on an object until notified by the queue that there is more work to do. When a worker creates a new item, it adds it to the queue, and the queue calls notify on the object the workers are waiting on. One of them will wake up to consume the new item.
The methods wait, notify, and notifyAll of class Object support an efficient transfer of control from one thread to another. Rather than simply "spinning" (repeatedly locking and unlocking an object to see whether some internal state has changed), which consumes computational effort, a thread can suspend itself using wait until such time as another thread awakens it using notify. This is especially appropriate in situations where threads have a producer-consumer relationship (actively cooperating on a common goal) rather than a mutual exclusion relationship (trying to avoid conflicts while sharing a common resource).
Source: Threads and Locks
I'd look at something higher level than wait/notify. It's very difficult to get right and avoid deadlocks. Have you looked at java.util.concurrent.CompletionService<V>? You could have a simpler manager thread that polls the service and take()s the results, which may or may not contain a new work item.
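A rough sketch of that manager idea; WorkItem and processItem are placeholders, not your types. Each task returns whatever new work items it produced, and the manager, being the only thread that submits and counts tasks, knows exactly when everything is done:
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionServiceManager {
    // Placeholder standing in for the real work item type.
    static class WorkItem { }

    public static void process(List<WorkItem> initialWork)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<List<WorkItem>> service = new ExecutorCompletionService<>(pool);

        int pending = 0;
        for (WorkItem item : initialWork) {
            service.submit(() -> processItem(item));
            pending++;
        }

        // Only the manager submits and counts tasks, so when the pending count
        // reaches zero, all work (including spawned work) is done.
        while (pending > 0) {
            List<WorkItem> newItems = service.take().get(); // blocks for the next finished task
            pending--;
            for (WorkItem item : newItems) {
                service.submit(() -> processItem(item));
                pending++;
            }
        }
        pool.shutdown();
    }

    // Placeholder: do the real work and return any new items it produced.
    static List<WorkItem> processItem(WorkItem item) {
        return Collections.emptyList();
    }
}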
Using a BlockingQueue containing items to process along with a synchronized set that keeps track of all elements being processed currently:
BlockingQueue<WorkItem> bQueue;
Set<WorkItem> beingProcessed = Collections.synchronizedSet(new HashSet<WorkItem>());
bQueue.put(workItem);
...
// the following runs over many threads in parallel
while (!(bQueue.isEmpty() && beingProcessed.isEmpty())) {
    WorkItem currentItem = bQueue.poll(50L, TimeUnit.MILLISECONDS); // null for empty queue
    if (currentItem != null) {
        beingProcessed.add(currentItem);
        processItem(currentItem); // possibly bQueue.add(newItem) is called from processItem
        beingProcessed.remove(currentItem);
    }
}
EDIT: as @Hovercraft Full Of Eels suggested, an ExecutorService is probably what you should really use. You can add new tasks as you go along. You can semi-busy-wait for termination of all tasks at regular intervals with executorService.awaitTermination(time, timeUnits) and kill all your threads after that.
Here's the beginnings of a queue to solve your problem. Basically, you need to track new work and in-process work.
import java.util.LinkedList;
import java.util.List;

public class WorkQueue<T> {
    private final List<T> _newWork = new LinkedList<T>();
    private int _inProcessWork;

    public synchronized void addWork(T work) {
        _newWork.add(work);
        notifyAll();
    }

    public synchronized T startWork() throws InterruptedException {
        // wait while there is nothing to hand out but other workers are still busy,
        // since they may add new work items
        while (_newWork.isEmpty() && (_inProcessWork > 0)) {
            wait();
        }
        if (!_newWork.isEmpty()) {
            _inProcessWork++;
            return _newWork.remove(0);
        }
        // everything is done
        return null;
    }

    public synchronized void finishWork() {
        _inProcessWork--;
        if ((_inProcessWork == 0) && _newWork.isEmpty()) {
            notifyAll();
        }
    }
}
your workers will look roughly like:
public class Worker<T> implements Runnable {
    private final WorkQueue<T> _queue;

    public Worker(WorkQueue<T> queue) {
        _queue = queue;
    }

    public void run() {
        try {
            T work;
            while ((work = _queue.startWork()) != null) {
                try {
                    // do work here...
                } finally {
                    _queue.finishWork();
                }
            }
        } catch (InterruptedException ie) {
            // interrupted while waiting for work - just exit
        }
    }
}
The one trick is that you need to add the first work item before you start any workers (otherwise they will all immediately exit).
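For example, a hypothetical usage of the two classes above; the seed item and the number of worker threads are made up:
public class WorkQueueDemo {
    public static void main(String[] args) {
        WorkQueue<String> queue = new WorkQueue<>();

        // Seed the queue BEFORE starting any workers; otherwise startWork()
        // sees an empty queue with nothing in process and immediately returns null.
        queue.addWork("first-item");

        for (int i = 0; i < 4; i++) {
            new Thread(new Worker<>(queue)).start();
        }
    }
}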
