I have a list of intensive updates so I am grouping them together and executing them as a batch job in a single thread. Other threads can send their updates at any time.
class ItemUpdateJob {
    int itemId;
    int number;
}
When scheduling a job to be queued for updating, I want a collection where I can modify a job if one already exists for the same item (using itemId as the key). In this example:
existingItemJobInQueue.number += requestedItemJob.number;
so the queue doesn't end up with thousands of jobs for the same item. When the jobs begin execution I will need to somehow loop through the queue, but while a job is being executed it should not be modified (should each item have its own lock?).
for (ItemUpdateJob job : jobQueue) {
    updateItem(job);
}
Once a job has been executed, it should immediately be removed from the queue. What is the best way to do this? Currently I am thinking of using a HashMap with the item id as the key, where each item has a lock which prevents an existing job from being modified while the item is being updated. However, this will cause a stall as callers wait for the update to complete (the lock to be released).
It looks to me as if you need a combination of more than one collection. Perhaps something like this?
public class JobHandler {

    // jobs still in the queue, mapped by itemId for quick lookup
    private final Map<Integer, ItemUpdateJob> waitingJobs;
    // jobs still waiting to be run, in FIFO order
    private final Queue<ItemUpdateJob> jobQueue;

    public JobHandler(Collection<ItemUpdateJob> jobs) {
        this.waitingJobs = new HashMap<>();
        this.jobQueue = new LinkedList<>();
        this.init(jobs);
    }

    private void init(Collection<ItemUpdateJob> jobs) {
        for (ItemUpdateJob job : jobs) {
            this.waitingJobs.put(job.itemId, job);
            this.jobQueue.add(job);
        }
    }

    public ItemUpdateJob getNextJobToRun() {
        ItemUpdateJob nextJob = this.jobQueue.poll();
        if (nextJob != null) {
            this.waitingJobs.remove(nextJob.itemId);
        }
        return nextJob;
    }

    public void addJob(ItemUpdateJob job) {
        this.waitingJobs.put(job.itemId, job);
        this.jobQueue.add(job);
    }

    public boolean updateJob(ItemUpdateJob updateJob) {
        if (this.waitingJobs.containsKey(updateJob.itemId)) {
            // job is currently waiting for execution, so merge into it
            this.waitingJobs.get(updateJob.itemId).number += updateJob.number;
            return true;
        } else {
            // job is currently being run, or no such job at all,
            // so add it at the end of the queue to wait for its turn
            this.addJob(updateJob);
            return false;
        }
    }
}
java.util.Queue looks like a good match - FIFO order of execution for jobs - and a Map for quick lookups when updating a currently waiting job. Keep in mind some Queue implementations have capacity restrictions, and obviously this needs synchronization.
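For instance, a minimal sketch of how the executing thread might drain the handler, assuming the three public methods of the JobHandler above are declared synchronized (the synchronization mentioned in the previous sentence) and updateItem is your existing method:

ItemUpdateJob job;
while ((job = jobHandler.getNextJobToRun()) != null) {
    // the job is already out of both collections, so a concurrent updateJob()
    // for the same itemId simply enqueues a fresh job instead of blocking
    updateItem(job);
}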
Need a high-performance algorithm:
Scenario: Tasks are submitted to a thread pool for different customers at very frequent intervals.
Requirement: Tasks submitted by a particular customer need to be handled sequentially, but tasks submitted by different customers can be executed in parallel.
Here is an example of how you can write your worker class. The worker class keeps a concurrent map of all the customers currently being processed. Whenever the worker gets the next work item off the queue, it checks whether that customer is currently being processed. If so, it re-enqueues the task at the end of the queue.
Let me know if you have any questions.
public class MyWorker extends Thread {

    private static int instance = 0;

    private final Queue<Task> queue;
    // This is used to hold the customers that are in process at this time.
    private final ConcurrentHashMap<String, Boolean> inProcessCustomers;

    public MyWorker(Queue<Task> queue, ConcurrentHashMap<String, Boolean> inProcessCustomers) {
        this.queue = queue;
        this.inProcessCustomers = inProcessCustomers;
        setName("MyWorker:" + (instance++));
    }

    @Override
    public void run() {
        while (true) {
            try {
                Task task;
                synchronized (queue) {
                    while (queue.isEmpty())
                        queue.wait();
                    // Get the next work item off of the queue
                    task = queue.remove();
                    // If the customer is in process, add the task back to the end
                    // of the queue and go around again.
                    if (inProcessCustomers.containsKey(task.getCustomerId())) {
                        queue.add(task);
                        continue;
                    }
                    inProcessCustomers.put(task.getCustomerId(), true);
                }
                // Process the work item
                try {
                    task.run();
                } finally {
                    inProcessCustomers.remove(task.getCustomerId());
                }
            } catch (InterruptedException ie) {
                break; // Terminate
            }
        }
    }
}
It sounds like you need a way to map your customer to a queue of tasks.
Assumption: Each customer has a way of being uniquely identified.
I would suggest implementing the hashCode method on whatever object represents your customer.
As the tasks are submitted, you create a mapping (using a HashMap) where the key is your customer and the value is a queue - I suggest ConcurrentLinkedQueue - then add either the task or its thread to the queue. As you process tasks, remove them (or their thread, depending on design choice) from the queue.
EDIT:
For the purposes of continued discussion I'm going to assume the tasks will be the objects stored in the queue.
Above when I wrote "As you process tasks remove them..." I meant that the task would remain in the queue until completed. You can do this using the peek method of the queue.
Regarding how to process tasks once they are added to the queue: the task can be given a reference to its queue so that once it completes it can trigger the next task. The basic algorithm for this piece goes something like this: the controller thread responsible for adding tasks to the queue checks whether the queue is empty. If the queue is not empty it only adds the next task to the queue, because the next task will be triggered when the current task finishes. If the queue is empty the controller triggers the next task - which it should already have a reference to. When the current task finishes it calls its queue's poll method to remove itself from the head of the queue and then calls peek to obtain the next task. The next task is then executed.
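A rough sketch of that idea under some assumptions of mine: the SerialDispatcher class, submit and runHead names are made up, Customer implements hashCode/equals as discussed, tasks are plain Runnables, and the java.util.concurrent imports are assumed. A task stays in its customer's queue (via peek) until it finishes, and the finishing task triggers the next one:

class SerialDispatcher {
    private final ConcurrentHashMap<Customer, ConcurrentLinkedQueue<Runnable>> queues = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    void submit(Customer customer, Runnable task) {
        ConcurrentLinkedQueue<Runnable> q = queues.computeIfAbsent(customer, c -> new ConcurrentLinkedQueue<>());
        synchronized (q) {
            q.add(task);
            if (q.size() == 1) {               // queue was empty: nothing running for this customer
                pool.execute(() -> runHead(q));
            }
        }
    }

    private void runHead(ConcurrentLinkedQueue<Runnable> q) {
        Runnable task = q.peek();              // the task stays in the queue until it completes
        try {
            task.run();
        } finally {
            synchronized (q) {
                q.poll();                      // remove the finished task
                if (!q.isEmpty()) {            // the finished task triggers the next one
                    pool.execute(() -> runHead(q));
                }
            }
        }
    }
}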
I have the following method:
void store(SomeObject o) {
}
The idea of this method is to store o to permanent storage, but the function should not block. I.e., I can not/must not do the actual storage in the same thread that called store.
I also can not start a new thread and store the object from that thread, because store might be called a "huge" number of times and I don't want to start spawning threads.
So I have two options, neither of which I see working well:
1) Use a thread pool (Executor family)
2) In store, add the object to an array list and return. When the array list reaches e.g. 1000 entries (an arbitrary number), start another thread to "flush" the array list to storage. But I would still possibly have the problem of too many threads (thread pool?)
In both cases the only requirement I have is that the objects are stored persistently in exactly the same order in which they were passed to store. And using multiple threads mixes things up.
How can this be solved?
How can I ensure:
1) A non-blocking store
2) Accurate insertion order
3) I don't care about any storage guarantees. If e.g. something crashes, I don't care about losing data that was e.g. cached in the array list before being stored.
I would use a SingleThreadExecutor and a BlockingQueue.
SingleThreadExecutor, as the name says, has one single thread. Use it to poll from the queue and persist objects, blocking if empty.
You can add to the queue without blocking in your store method.
EDIT
Actually, you do not even need that extra Queue - the JavaDoc of newSingleThreadExecutor says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So I think it's exactly what you need.
private final ExecutorService persistor = Executors.newSingleThreadExecutor();

public void store(final SomeObject o) {
    persistor.submit(new Runnable() {
        @Override
        public void run() {
            // your persist-code here.
        }
    });
}
The advantage of using a Runnable with a quasi-endless loop and an extra queue would be the possibility to code some "burst" functionality. For example, you could make it wait to persist until 10 elements are in the queue or the oldest element was added at least 1 minute ago...
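A rough sketch of that burst idea, assuming an unbounded LinkedBlockingQueue, a single writer thread running writerLoop, and a hypothetical persistBatch method; the 10-element and 1-minute thresholds are just the example numbers from above:

private final BlockingQueue<SomeObject> buffer = new LinkedBlockingQueue<>();

public void store(SomeObject o) {
    buffer.add(o);                                    // never blocks the caller (unbounded queue)
}

private void writerLoop() throws InterruptedException {
    List<SomeObject> batch = new ArrayList<>();
    while (true) {
        batch.add(buffer.take());                     // block until at least one element arrives
        long deadline = System.currentTimeMillis() + 60_000;
        while (batch.size() < 10) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) break;                // oldest element is now a minute old
            SomeObject next = buffer.poll(remaining, TimeUnit.MILLISECONDS);
            if (next == null) break;
            batch.add(next);
        }
        persistBatch(batch);                          // single writer thread keeps insertion order
        batch.clear();
    }
}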
I suggest using a Chronicle-Queue which is a library I designed.
It allows you to write in the current thread without blocking. It was originally designed for low latency trading systems. For small messages it takes around 300 ns to write a message.
You don't need to use a background thread or an on-heap queue, and it doesn't wait for the data to be written to disk by default. It also ensures a consistent order for all readers. If the program dies at any point after you call finish(), the message is not lost (unless the OS crashes or loses power). It also supports replication to avoid data loss.
Have one separate thread that takes items from the head of a queue (blocking when the queue is empty) and writes them to disk. Your main thread's store() function just adds items to the tail of the queue.
Here's a rough idea (though I assume there will be cleaner or faster ways for doing this in production code, depending on how fast you need things to be):
import java.util.concurrent.*;

class ObjectWriter implements Runnable {
    private final Object END = new Object();
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    public ObjectWriter() {
        new Thread(this).start();
    }

    public void store(Object o) throws InterruptedException {
        queue.put(o);
    }

    public void close() throws InterruptedException {
        queue.put(END);
    }

    public void run() {
        while (true) {
            try {
                Object o = queue.take();
                if (o == END) {
                    // close output file.
                    return;
                }
                System.out.println(o.toString()); // serialize as appropriate
            } catch (InterruptedException e) {
            }
        }
    }
}

public class Test {
    public static void main(String[] args) throws Exception {
        ObjectWriter w = new ObjectWriter();
        w.store("hello");
        w.store("world");
        w.close();
    }
}
The comments in your question make it sound like you are unfamiliar with multi-threading, but it's really not that difficult.
You simply need another thread responsible for writing to the storage, which picks items off a queue - your store function just adds the objects to the in-memory queue and continues on its way.
Some pseudo-ish code:
final List<SomeObject> queue = new LinkedList<>();

void store(SomeObject o) {
    // add it to the queue - note that modifying o after this will also alter the
    // instance in the queue
    synchronized (queue) {
        queue.add(o);
        queue.notify(); // tell the storage thread there's something in the queue
    }
}

void storageThread() throws InterruptedException {
    SomeObject item;
    while (notFinished) {
        synchronized (queue) {
            if (!queue.isEmpty()) {
                item = queue.remove(0); // take from the head to preserve insertion order
            } else {
                // wait for something to arrive
                queue.wait();
                continue;
            }
        }
        writeToStorage(item);
    }
}
In a few words: I want to process a large graph with circular references in a parallel way. I also don't have access to the full graph; I have to crawl through it. And I want to organize an effective queue to do that. Are there any best practices for this?
I'm trying to organize an infinite data processing flow with this strategy: each thread takes a node to process from the queue and processes it; after processing, some new nodes for processing might appear, so the thread has to put them into the queue. But I must not process any node more than once. Nodes are immutable entities.
As I understand it, I have to use thread-safe implementations of a queue and a set (for already visited instances).
I'm trying to avoid synchronized methods. So, my implementation of this flow:
1) When a thread adds nodes to the queue, it checks each node: if the visited-nodes set contains the node, the thread doesn't add it to the queue. But that's not all.
2) When a thread takes a node from the queue, it checks whether the visited-nodes set contains the node. If it does, the thread takes another node from the queue, until it gets a node which hasn't been processed yet. After finding an unprocessed node, the thread also adds it to the visited-nodes set.
I've tried to use LinkedBlockingQueue and ConcurrentHashMap (as a set). I've used ConcurrentHashMap because it has the method putIfAbsent(key, value), which, as I understand, atomically checks whether the map contains the key and, if it doesn't, adds it.
Here is an implementation of the described algorithm:
public class ParallelDataQueue {

    private LinkedBlockingQueue<String> dataToProcess = new LinkedBlockingQueue<String>();
    // using map as a set
    private ConcurrentHashMap<String, Object> processedData = new ConcurrentHashMap<String, Object>( 1000000 );
    private final Object value = new Object();

    public String getNextDataInstance() {
        while ( true ) {
            try {
                String data = this.dataToProcess.take();
                Boolean dataIsAlreadyProcessed = ( this.processedData.putIfAbsent( data, this.value ) != null );
                if ( dataIsAlreadyProcessed ) {
                    continue;
                } else {
                    return data;
                }
            } catch ( InterruptedException e ) {
                e.printStackTrace();
            }
        }
    }

    public void addData( Collection<String> data ) {
        for ( String d : data ) {
            if ( !this.processedData.containsKey( d ) ) {
                try {
                    this.dataToProcess.put( d );
                } catch ( InterruptedException e ) {
                    e.printStackTrace();
                }
            }
        }
    }
}
So my question: does the current implementation avoid processing nodes more than once? And maybe there is a more elegant solution?
Thanks
P.S.
I understand that this implementation doesn't avoid the appearance of duplicate nodes in the queue. But for me that is not critical - all I need is to avoid processing any node more than once.
Your current implementation does not avoid repeated data instances. Assume that "Thread A" calls addData, checks whether the element exists in the concurrent map (containsKey) and finds that it does not. But just before it puts the element into the queue, "Thread A" is suspended. At that time another thread, "Thread B", is scheduled, performs the same check for the same element, also finds it absent, and adds it to the queue. When "Thread A" is rescheduled it continues and adds the same element to the queue again.
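One possible way to close that gap, sketched against the ParallelDataQueue above (my variant, not the original code): move the putIfAbsent into addData, so the map acts as an "already enqueued" set and the check-then-act race disappears; getNextDataInstance can then simply take():

public void addData( Collection<String> data ) {
    for ( String d : data ) {
        // atomically claim the node; only the first thread to claim it enqueues it
        if ( this.processedData.putIfAbsent( d, this.value ) == null ) {
            try {
                this.dataToProcess.put( d );
            } catch ( InterruptedException e ) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

public String getNextDataInstance() throws InterruptedException {
    // every queued node is unique and not yet processed
    return this.dataToProcess.take();
}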
Yes. Use ConcurrentLinkedQueue ( http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html )
also
When a thread adds data to the queue, it checks each instance of the data: if the set contains this instance, the thread doesn't add it to the queue. But that's not all
is not a thread-safe approach, unless the underlying Collection is thread-safe (which means it's synchronized internally). But then it's pointless to do the check, because it's already thread-safe...
If you need to process data in a multithreaded manner, you maybe don't need collections at all. Have you thought about using the Executors framework?
public static void main(String[] args) throws InterruptedException {
    ExecutorService exec = Executors.newFixedThreadPool(100);
    while (true) { // provide data infinitely
        for (int i = 0; i < 1000; i++)
            exec.execute(new DataProcessor(UUID.randomUUID(), exec));
        Thread.sleep(10000); // wait a bit, then continue
    }
}

static class DataProcessor implements Runnable {
    Object data;
    ExecutorService exec;

    public DataProcessor(Object data, ExecutorService exec) {
        this.data = data;
        this.exec = exec;
    }

    @Override
    public void run() {
        System.out.println(data); // process data
        if (new Random().nextInt(100) < 50) // add a new data piece for execution if needed
            exec.execute(new DataProcessor(UUID.randomUUID(), exec));
    }
}
I have an ArrayBlockingQueue upon which a single-threaded, fixed-rate scheduled executor works.
A task may fail. I want to re-run it, or re-insert it into the queue at high priority (at the top).
Some thoughts here -
Why are you using ArrayBlockingQueue and not PriorityBlockingQueue? That sounds like what you need. At first, give all your elements equal priority.
In case you receive an exception, re-insert the task into the queue with a higher priority.
The simplest thing might be a priority queue. Attach a retry number to each task; it starts at zero. After an unsuccessful run, throw away all the tasks whose retry number is one, increment the zeroes, and put them back in the queue at high priority. With this method you can easily decide later to run everything three times, or more, if you want. The downside is that you have to modify the task class.
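A rough sketch of that retry-number idea combined with the PriorityBlockingQueue suggestion above (the wrapper class, field names and the three-attempt limit are mine, not from either answer): lower priority values come off the queue first, and a failed task goes back in at priority 0 with its retry count bumped.

class RetryableTask implements Comparable<RetryableTask> {
    final Runnable task;
    final int priority;   // 0 = high (retries), 1 = normal
    final int retries;

    RetryableTask(Runnable task, int priority, int retries) {
        this.task = task;
        this.priority = priority;
        this.retries = retries;
    }

    @Override
    public int compareTo(RetryableTask other) {
        return Integer.compare(this.priority, other.priority);
    }
}

BlockingQueue<RetryableTask> queue = new PriorityBlockingQueue<>();

void workerLoop() throws InterruptedException {
    while (true) {
        RetryableTask t = queue.take();
        try {
            t.task.run();
        } catch (RuntimeException e) {
            if (t.retries < 3) {   // e.g. give up after three attempts
                queue.put(new RetryableTask(t.task, 0, t.retries + 1));
            }
        }
    }
}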
The other idea would be to set up another, non-blocking, thread-safe, high-priority queue. When looking for a new task, you check the non-blocking queue first and run what's there. Otherwise, go to the blocking queue. This might work for you as is, and so far it's the simplest solution. The problem is the high priority queue might fill up while the scheduler is blocked on the blocking queue.
To get around this, you'd have to do your own blocking. Both queues should be non-blocking. (Suggestion: java.util.concurrent.ConcurrentLinkedQueue.) After polling both queues with no results, wait() on a monitor. When anything puts something in a queue, it should call notifyAll() and the scheduler can start up again. Great care is needed lest the notification occur after the scheduler has checked both queues but before it calls wait().
Addition:
Prototype code for the third solution with manual blocking. Some threading is suggested, but the reader will know his/her own situation best. Which bits of code are apt to block waiting for a lock, which are apt to tie up their thread (and core) for minutes while doing extensive work, and which cannot afford to sit around waiting for other code to finish, all need to be considered. For instance, if a failed run can immediately be rerun on the same thread with no time-consuming cleanup, most of this code can be junked.
private final ConcurrentLinkedQueue<Runnable> mainQueue = new ConcurrentLinkedQueue<>();
private final ConcurrentLinkedQueue<Runnable> prioQueue = new ConcurrentLinkedQueue<>();
private final Object entryWatch = new Object();

/** Adds a new job to the queue. */
public void addjob( Runnable runjob ) {
    mainQueue.add( runjob );
    synchronized (entryWatch) { entryWatch.notifyAll(); }
}

/** The endless loop that does the work. */
public void schedule() {
    for (;;) {
        Runnable run = getOne(); // Avoids lock if successful.
        if (run == null) {
            // Both queues are empty.
            synchronized (entryWatch) {
                // Need to check again. Someone might have added a job and called
                // notifyAll since the last check. From this point until we wait,
                // we can be sure entryWatch is not notified.
                run = getOne();
                if (run == null) {
                    // Both queues are REALLY empty.
                    try { entryWatch.wait(); }
                    catch (InterruptedException ie) {}
                }
            }
        }
        if (run != null) runit( run );
    }
}

/** Helper method for the endless loop. */
private Runnable getOne() {
    Runnable run = prioQueue.poll();
    if (run != null) return run;
    return mainQueue.poll();
}

/** Runs a new job. */
public void runit( final Runnable runjob ) {
    // Do everything in another thread. (Optional)
    new Thread() {
        @Override public void run() {
            // Run run. (Possibly in own thread?)
            // (Perhaps best in a thread from a thread pool.)
            runjob.run();
            // Handle failure (runit only, NOT in runitLast).
            // Defining "failure" left as an exercise for the reader.
            if (failure) {
                // Put code here to handle failure.
                // Put back in queue.
                prioQueue.add( runjob );
                synchronized (entryWatch) { entryWatch.notifyAll(); }
            }
        }
    }.start();
}

/** Reruns a job. */
public void runitLast( final Runnable runjob ) {
    // Same code as "runit", but don't put "runjob" in "prioQueue" on failure.
}
I have a queue that contains work items and I want multiple threads to work in parallel on those items. When a work item is processed it may result in new work items. The problem I have is that I can't find a solution for how to determine when I'm done. The worker looks like this:
public class Worker implements Runnable {
    public void run() {
        while (true) {
            WorkItem item = queue.nextItem();
            if (item != null) {
                processItem(item);
            } else {
                // the queue is empty, but there may still be other workers
                // processing items which may result in new work items
                // how to determine if the work is completely done?
            }
        }
    }
}
This seems like a pretty simple problem actually but I'm at a loss. What would be the best way to implement that?
thanks
clarification:
The worker threads have to terminate once none of them is processing an item, but as long as at least one of them is still working they have to wait because it may result in new work items.
What about using an ExecutorService which will allow you to wait for all tasks to finish: ExecutorService, how to wait for all tasks to finish
I'd suggest wait/notify calls. In the else case, your worker threads would wait on an object until notified by the queue that there is more work to do. When a worker creates a new item, it adds it to the queue, and the queue calls notify on the object the workers are waiting on. One of them will wake up to consume the new item.
The methods wait, notify, and notifyAll of class Object support an efficient transfer of control from one thread to another. Rather than simply "spinning" (repeatedly locking and unlocking an object to see whether some internal state has changed), which consumes computational effort, a thread can suspend itself using wait until such time as another thread awakens it using notify. This is especially appropriate in situations where threads have a producer-consumer relationship (actively cooperating on a common goal) rather than a mutual exclusion relationship (trying to avoid conflicts while sharing a common resource).
Source: Threads and Locks
I'd look at something higher level than wait/notify. It's very difficult to get right and avoid deadlocks. Have you looked at java.util.concurrent.CompletionService<V>? You could have a simpler manager thread that polls the service and take()s the results, which may or may not contain a new work item.
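A rough sketch of how that manager loop could look, under my own assumptions (not from the answer): processItem is changed to return the list of new WorkItems it produced, initialItems is the seed work, and the manager counts outstanding tasks so it knows it is done when the count reaches zero; exception handling is omitted.

ExecutorService pool = Executors.newFixedThreadPool(4);
CompletionService<List<WorkItem>> completion = new ExecutorCompletionService<>(pool);

int outstanding = 0;
for (WorkItem item : initialItems) {
    completion.submit(() -> processItem(item));          // processItem returns the new work items it created
    outstanding++;
}
while (outstanding > 0) {
    List<WorkItem> newItems = completion.take().get();   // blocks until some task finishes
    outstanding--;
    for (WorkItem item : newItems) {
        completion.submit(() -> processItem(item));
        outstanding++;
    }
}
pool.shutdown();   // no task is running and no new work exists, so we are done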
Using a BlockingQueue containing items to process along with a synchronized set that keeps track of all elements being processed currently:
BlockingQueue<WorkItem> bQueue;
Set<WorkItem> beingProcessed = Collections.synchronizedSet(new HashSet<WorkItem>());

bQueue.put(workItem);
...

// the following runs over many threads in parallel
while (!(bQueue.isEmpty() && beingProcessed.isEmpty())) {
    WorkItem currentItem = bQueue.poll(50L, TimeUnit.MILLISECONDS); // null for an empty queue
    if (currentItem != null) {
        beingProcessed.add(currentItem);
        processItem(currentItem); // possibly bQueue.add(newItem) is called from processItem
        beingProcessed.remove(currentItem);
    }
}
EDIT: as @Hovercraft Full Of Eels suggested, an ExecutorService is probably what you should really use. You can add new tasks as you go along. You can semi-busy wait for termination of all tasks at regular intervals with executorService.awaitTermination(time, timeUnits) and kill all your threads after that.
Here's the beginnings of a queue to solve your problem. Basically, you need to track new work and in-process work.
public class WorkQueue<T> {

    private final List<T> _newWork = new LinkedList<T>();
    private int _inProcessWork;

    public synchronized void addWork(T work) {
        _newWork.add(work);
        notifyAll();
    }

    public synchronized T startWork() throws InterruptedException {
        // wait while there is nothing to hand out but other workers are still
        // busy (they may produce new work)
        while (_newWork.isEmpty() && (_inProcessWork > 0)) {
            wait();
        }
        if (!_newWork.isEmpty()) {
            _inProcessWork++;
            return _newWork.remove(0);
        }
        // everything is done
        return null;
    }

    public synchronized void finishWork() {
        _inProcessWork--;
        if ((_inProcessWork == 0) && _newWork.isEmpty()) {
            notifyAll();
        }
    }
}
your workers will look roughly like:
public class Worker<T> implements Runnable {

    private final WorkQueue<T> _queue;

    public Worker(WorkQueue<T> queue) {
        _queue = queue;
    }

    public void run() {
        try {
            T work;
            while ((work = _queue.startWork()) != null) {
                try {
                    // do work here...
                } finally {
                    _queue.finishWork();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // terminate if interrupted
        }
    }
}
The one trick is that you need to add the first work item before you start any workers (otherwise they will all immediately exit).
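A minimal usage sketch of the classes above, assuming a WorkItem type, a firstItem seed and the Worker constructor shown: seed the queue first, then start the workers and let each one drain it until startWork() returns null.

WorkQueue<WorkItem> queue = new WorkQueue<>();
queue.addWork(firstItem);                       // must happen before any worker starts

for (int i = 0; i < 4; i++) {
    new Thread(new Worker<>(queue)).start();    // workers exit once all work is finished
}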