I'm looking for a utility class or a best-practice pattern to handle lots of incoming stateful events in my application.
Imagine a producer that produces many events that are then consumed by an application that acts upon these events. Now in some situations the producer produces more events than the consumer can actually handle, but because all events are stateful, it doesn't matter if some events are missed, since the latest event contains all the information the previous events conveyed.
I have now written the following Java code to handle these situations, but I'm unsure whether this is the correct way of doing it, and whether there is an easier, nicer, safer way of doing this.
private static ScheduledThreadPoolExecutor executorService = new ScheduledThreadPoolExecutor(1);
private final static Object lock = new Object();
private static List<EventData> lastEventData = null;

static {
    executorService.scheduleWithFixedDelay(new Runnable() {
        @Override
        public void run() {
            synchronized (lock) {
                while (lastEventData == null && !executorService.isShutdown()) {
                    try {
                        lock.wait();
                    } catch (InterruptedException ex) { ... }
                }
                try {
                    actUponEvent(lastEventData);
                } catch (Throwable ex) { ... }
                lastEventData = null;
            }
        }
    }, 250, 250, TimeUnit.MILLISECONDS);
}

public synchronized void update(final List<EventData> data) {
    synchronized (lock) {
        lastEventData = data;
        lock.notifyAll();
    }
}

public void dispose() {
    executorService.shutdown();
}
In other words, I'd like to get event notifications as soon as they arrive, but rate-limit them to one event every 250ms, and I'm only interested in the last incoming event.
I looked through java.util.concurrent for some hints / pre-existing solutions but couldn't find anything that fits my problem. The BlockingQueue seems very nice at first because it blocks when empty, but on the other hand the queue itself is not important to me, since I'm only interested in the latest event anyway, and blocking on insert when full is not what I'm looking for either.
The following model can support very high update rates (into the tens of millions per second), but you only need to keep the latest event per key in memory.
If you are taking a snapshot every N ms, you can use this approach.
final AtomicReference<ConcurrentHashMap<Key, Event>> mapRef =
        new AtomicReference<>(new ConcurrentHashMap<>());
When you have an update, add it to a ConcurrentMap. The keys are chosen so that an event which should replace a previous one has the same key.
Key key = keyFor(event);
mapRef.get().put(key, event);
This way the map holds the latest update for each key at any given moment.
Have a task which runs every N ms. When it runs, this task can swap the map for another one (or for a previous, empty one, to avoid creating new maps):
ConcurrentHashMap<Key, Event> prev = mapRef.getAndSet(prevEmptyMap);
for (Event e : prev.values())
    process(e);
prev.clear();
this.prevEmptyMap = prev;
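Putting the fragments together, a minimal sketch of the whole approach might look like this; Key, Event, keyFor and process are the assumed domain types and helpers from the snippets above:
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicReference;

public class LatestEventSnapshotter {
    // Key, Event, keyFor(Event) and process(Event) are assumed from the fragments above.
    private final AtomicReference<ConcurrentHashMap<Key, Event>> mapRef =
            new AtomicReference<>(new ConcurrentHashMap<>());
    private ConcurrentHashMap<Key, Event> prevEmptyMap = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(long periodMs) {
        scheduler.scheduleWithFixedDelay(this::drain, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // Producer side: a later event with the same key overwrites the earlier one.
    public void update(Event event) {
        mapRef.get().put(keyFor(event), event);
    }

    // Consumer side: swap in the spare empty map, then process the latest events.
    private void drain() {
        ConcurrentHashMap<Key, Event> prev = mapRef.getAndSet(prevEmptyMap);
        for (Event e : prev.values())
            process(e);
        prev.clear();
        prevEmptyMap = prev; // reuse as the next spare
    }
}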
Related
I have to manage scheduled file replications in a system. The file replications are scheduled by users and I need to restrict the amount of system resources used during replication. The amount of time that each replication may take is not defined (i.e. a replication may be scheduled to run every 15 minutes and the previous run may still be running when the next run is due) and a replication should not be queued if it's already queued or running.
I have a scheduler that periodically checks for due file replications and, for each one, (1) adds it to a blocking queue if it is neither queued nor running, or (2) drops it otherwise.
private final Object scheduledReplicationsLock = new Object();
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
private final Set<Long> queuedReplicationIds = new HashSet<>();
private final Set<Long> runningReplicationIds = new HashSet<>();
public boolean add(Replication replication) {
    synchronized (scheduledReplicationsLock) {
        // If the replication job is either still executing or is already queued, do not add it.
        if (queuedReplicationIds.contains(replication.id) || runningReplicationIds.contains(replication.id)) {
            return false;
        }
        replicationQueue.add(replication);
        queuedReplicationIds.add(replication.id);
        return true;
    }
}
I also have a pool of threads that waits until there is a replication in the queue and executes it. Below is the main method of each thread in the thread pool:
public void run() {
    while (true) {
        Replication replication;
        synchronized (scheduledReplicationsLock) {
            try {
                // This will block until a replication job is ready to be run
                // or the current thread is interrupted.
                replication = replicationQueue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Move the ID value out of the queued set and into the active set.
            Long replicationId = replication.getId();
            queuedReplicationIds.remove(replicationId);
            runningReplicationIds.add(replicationId);
        }
        executeReplication(replication);
    }
}
This code gets into a deadlock because the first thread in the thread pool acquires scheduledReplicationsLock and prevents the scheduler from adding replications to the queue. Moving replicationQueue.take() out of the synchronized block would eliminate the deadlock, but then it's possible that an element is removed from the queue without the hash sets being atomically updated with it, which could cause a replication to be incorrectly dropped.
Should I use BlockingQueue.poll() and release the lock + sleep if the queue is empty instead of using BlockingQueue.take() ?
Fixes to the current solution or other solutions that meet the requirements are welcome.
wait / notify
Keeping your same control flow, instead of blocking on the BlockingQueue instance while holding the mutex lock, you can wait for notifications on a shared monitor (the replicationQueue itself below), forcing the worker thread to release the lock and return to the waiting pool.
Here is a reduced sample of your producer:
private final Queue<Replication> replicationQueue = new LinkedList<>();
private final Set<Long> runningReplicationIds = new HashSet<>();

public boolean add(Replication replication) {
    synchronized (replicationQueue) {
        // If the replication job is either still executing or is already queued, do not add it.
        if (replicationQueue.contains(replication) || runningReplicationIds.contains(replication.id)) {
            return false;
        }
        replicationQueue.add(replication);
        replicationQueue.notifyAll();
        return true;
    }
}
The worker Runnable would then be updated as follows:
public void run() {
    while (true) {
        Replication replication;
        synchronized (replicationQueue) {
            // Loop around wait() to guard against spurious wake-ups.
            while (replicationQueue.isEmpty()) {
                try {
                    replicationQueue.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            replication = replicationQueue.poll();
            runningReplicationIds.add(replication.getId());
        }
        // Execute outside the synchronized block so producers are not blocked meanwhile.
        executeReplication(replication);
    }
}
BlockingQueue
Generally you are better off using a BlockingQueue to coordinate your producer and your replicating worker pool.
The BlockingQueue is, as the name implies, blocking by nature and will cause the calling thread to block only if items cannot be pulled from / pushed to the queue.
Meanwhile, note that you will have to rethink your running / enqueued state management, as you will only be synchronizing on the BlockingQueue items, dropping the other constraints. Whether that is acceptable will depend on the context.
This way, you would drop all the other mutexes and rely on the BlockingQueue as your synchronization state:
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
public boolean add(Replication replication) throws InterruptedException {
    // Not sure if this is the proper invariant to check, as at some point the replication
    // would be neither queued nor running while still having been processed.
    if (replicationQueue.contains(replication)) {
        return false;
    }
    // Use `put` instead of `add`, as it will block waiting for free space if the queue is bounded.
    replicationQueue.put(replication);
    return true;
}
The workers will then take indefinitely from the BlockingQueue:
public void run() {
    while (true) {
        try {
            Replication replication = replicationQueue.take();
            executeReplication(replication);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
    }
}
You don't need any additional synchronization block if you use a BlockingQueue.
Quote from docs (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html)
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.
just use something like this
public void run() {
    try {
        while (true) {
            // The thread will wait here for the next element in the queue.
            Replication replication = replicationQueue.take();
            Long replicationId = replication.getId();
            queuedReplicationIds.remove(replicationId);
            runningReplicationIds.add(replicationId);
            executeReplication(replication);
        }
    } catch (InterruptedException ex) {
        // Interrupted while waiting for the next element.
    }
}
Look in the javadoc: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/LinkedBlockingQueue.html#take()
Or you can use BlockingQueue.poll() with a timeout.
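For illustration, a hedged sketch of the poll variant; the `running` flag is an assumed volatile shutdown switch, not part of the original code:
public void run() {
    try {
        while (running) { // assumed volatile shutdown flag
            Replication replication = replicationQueue.poll(1, TimeUnit.SECONDS);
            if (replication == null) {
                continue; // timed out with an empty queue; re-check the flag
            }
            executeReplication(replication);
        }
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
}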
UPD: After the discussion, I extended LinkedBlockingQueue with two ConcurrentHashSets and added an afterTake() method to remove processed replicas. You do not need any additional synchronization outside the queue. Just put a replica in the first thread, take it in another, and call afterTake() when the replication has finished. You need to override the other methods if you want to use them.
package ru.everytag;

import io.vertx.core.impl.ConcurrentHashSet;
import java.util.concurrent.LinkedBlockingQueue;

public class TwoPhaseBlockingQueue<E> extends LinkedBlockingQueue<E> {
    private ConcurrentHashSet<E> items = new ConcurrentHashSet<>();
    private ConcurrentHashSet<E> taken = new ConcurrentHashSet<>();

    @Override
    public void put(E e) throws InterruptedException {
        if (!items.contains(e)) {
            items.add(e);
            super.put(e);
        }
    }

    @Override
    public E take() throws InterruptedException {
        E item = super.take();
        taken.add(item);
        items.remove(item);
        return item;
    }

    public void afterTake(E e) {
        if (taken.contains(e)) {
            taken.remove(e);
        } else if (items.contains(e)) {
            throw new IllegalArgumentException("Element still in the queue");
        }
    }
}
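A possible usage sketch (put and take throw InterruptedException, to be handled by the caller; executeReplication is the method from the question):
TwoPhaseBlockingQueue<Replication> queue = new TwoPhaseBlockingQueue<>();

// Scheduler thread: duplicates are silently ignored by put().
queue.put(replication);

// Worker thread:
Replication r = queue.take();   // r is now tracked in the `taken` set
try {
    executeReplication(r);
} finally {
    queue.afterTake(r);         // mark the replication as finished
}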
I've got the following code:
while (!currentBoard.boardIsValid()) {
    for (QueueLocation location : QueueLocation.values()) {
        while (!inbox.isEmpty(location)) {
            Cell c = inbox.dequeue(location);
            notifyNeighbours(c.x, c.y, c.getCurrentState(), previousBoard);
        }
    }
}
I've got a consumer with a few queues (all of their methods are synchronised), one queue for each producer. The consumer loops over all the queues and checks whether they've got a task for him to consume.
If the queue he's checking has a task in it, he consumes it. Otherwise, he checks the next queue, until he finishes iterating over all the queues.
As of now, if he iterates over all the queues and they're all empty, he keeps on looping rather than waiting for one of them to contain something (as seen in the outer while).
How can I make the consumer wait until one of the queues has something in it?
I'm having an issue with the following scenario: let's say there are only two queues. The consumer checked the first one and it was empty. Just as he's checking the second one (which is also empty), the producer puts something in the first queue. As far as the consumer is concerned, both queues are empty, so he waits (even though one of them isn't empty anymore and he should continue looping).
Edit:
One last thing. This is an exercise for me. I'm trying to implement the synchronisation myself. So if any of the java libraries have a solution that implements this I'm not interested in it. I'm trying to understand how I can implement this.
@Abe was close. I would use wait and notify - use the Object class built-ins, as they are the lightest weight.
Object sync = new Object(); // Can use an existing object if there's an appropriate one

// On submit to queue
synchronized (sync) {
    queue.add(...); // Must be inside to avoid a race condition
    sync.notifyAll();
}

// On check for work in queue (wait() throws InterruptedException, to be handled by the caller)
synchronized (sync) {
    item = null;
    while (item == null) {
        // Need to check all of the queues - if there will be a large number, this will be slow,
        // and slow critical sections (synchronized blocks) are very bad for performance
        item = getNextQueueItem();
        if (item == null) {
            sync.wait();
        }
    }
}
Note that sync.wait releases the lock on sync until the notify - and the lock on sync is required to successfully call the wait method (it's a reminder to the programmer that some type of critical section is really needed for this to work reliably).
By the way, I would recommend a queue dedicated to the consumer (or group of consumers) rather than a queue dedicated to the producer, if feasible. It will simplify the solution.
If you want to block across multiple queues, then one option is to use java's Lock and Condition objects and then use the signal method.
So whenever the producer has data, it should invoke signalAll.
Lock fileLock = new ReentrantLock();
Condition condition = fileLock.newCondition();
...
// Producer has to signal (the lock must be held to call signalAll).
fileLock.lock();
try { condition.signalAll(); } finally { fileLock.unlock(); }
...
// Consumer has to await (the lock must be held; await() releases it while waiting).
fileLock.lock();
try { condition.await(); } finally { fileLock.unlock(); }
This way only when the signal is provided will the consumer go and check the queues.
I solved a similar situation along the lines of what @Abe suggests, but settled on using a Semaphore in combination with an AtomicBoolean, and called it a BinarySemaphore. It does require the producers to be modified so that they signal when there is something to do.
Below the code for the BinarySemaphore and a general idea of what the consumer work-loop should look like:
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
public class MultipleProdOneConsumer {
BinarySemaphore workAvailable = new BinarySemaphore();
class Consumer {
volatile boolean stop;
void loop() {
while (!stop) {
doWork();
if (!workAvailable.tryAcquire()) {
// waiting for work
try {
workAvailable.acquire();
} catch (InterruptedException e) {
if (!stop) {
// log error
}
}
}
}
}
void doWork() {}
void stopWork() {
stop = true;
workAvailable.release();
}
}
class Producer {
/* Must be called after work is added to the queue/made available. */
void signalSomethingToDo() {
workAvailable.release();
}
}
class BinarySemaphore {
private final AtomicBoolean havePermit = new AtomicBoolean();
private final Semaphore sync;
public BinarySemaphore() {
this(false);
}
public BinarySemaphore(boolean fair) {
sync = new Semaphore(0, fair);
}
public boolean release() {
boolean released = havePermit.compareAndSet(false, true);
if (released) {
sync.release();
}
return released;
}
public boolean tryAcquire() {
boolean acquired = sync.tryAcquire();
if (acquired) {
havePermit.set(false);
}
return acquired;
}
public boolean tryAcquire(long timeout, TimeUnit tunit) throws InterruptedException {
boolean acquired = sync.tryAcquire(timeout, tunit);
if (acquired) {
havePermit.set(false);
}
return acquired;
}
public void acquire() throws InterruptedException {
sync.acquire();
havePermit.set(false);
}
public void acquireUninterruptibly() {
sync.acquireUninterruptibly();
havePermit.set(false);
}
}
}
We need to lock a method responsible for loading database data into a HashMap-based cache.
A possible situation is that a second thread tries to access the method while the first method is still loading cache.
We consider the second thread's effort in this case to be superfluous. We would therefore like to have that second thread wait until the first thread is finished, and then return (without loading the cache again).
What I have works, but it seems quite inelegant. Are there better solutions?
private static final ReentrantLock cacheLock = new ReentrantLock();
private void loadCachemap() {
if (cacheLock.tryLock()) {
try {
this.cachemap = retrieveParamCacheMap();
} finally {
cacheLock.unlock();
}
} else {
try {
cacheLock.lock(); // wait until thread doing the load is finished
} finally {
try {
cacheLock.unlock();
} catch (IllegalMonitorStateException e) {
logger.error("loadCachemap() finally {}",e);
}
}
}
}
I prefer a more resilient approach using read locks AND write locks. Something like:
private static final ReadWriteLock cacheLock = new ReentrantReadWriteLock();
private static final Lock cacheReadLock = cacheLock.readLock();
private static final Lock cacheWriteLock = cacheLock.writeLock();
private void loadCache() throws Exception {
// Expiry.
while (storeCache.expired(CachePill)) {
/**
* Allow only one in - all others will wait for 5 seconds before checking again.
*
* Eventually the one that got in will finish loading, refresh the Cache pill and let all the waiting ones out.
*
* Also waits until all read locks have been released - not sure if that might cause problems under busy conditions.
*/
if (cacheWriteLock.tryLock(5, TimeUnit.SECONDS)) {
try {
// Got a lock! Start the rebuild if still out of date.
if (storeCache.expired(CachePill)) {
rebuildCache();
}
} finally {
cacheWriteLock.unlock();
}
}
}
}
Note that storeCache.expired(CachePill) detects a stale cache, which may be more than you are wanting, but the concept here is the same: establish a write lock before updating the cache, which will deny all read attempts until the rebuild is done. Also, manage multiple write attempts in a loop of some sort, or just drop out and let the read lock wait for access.
A read from the cache now looks like this:
public Object load(String id) throws Exception {
    Store store = null;
    // Make sure cache is fresh.
    loadCache();
    try {
        // Establish a read lock so we do not attempt a read while the cache is being updated.
        cacheReadLock.lock();
        store = storeCache.get(id);
    } finally {
        // Make sure the lock is cleared.
        cacheReadLock.unlock();
    }
    return store;
}
The primary benefit of this form is that read access does not block other read access but everything stops cleanly during a rebuild - even other rebuilds.
You didn't say how complicated your structure is or how much concurrency / contention you need to handle. There are many ways to address your need.
If your data is simple, use a ConcurrentHashMap or similar to hold your data. Then just read and write from your threads without further locking.
Another alternative is to use the actor model and put reads and writes on the same queue, as in the sketch below.
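As a rough illustration of that actor-style idea (all names here are hypothetical), a single-threaded executor can serialize every read and write, so the map itself needs no locking:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

public class CacheActor {
    // All reads and writes go through this single thread, so the map needs no locks.
    private final ExecutorService actor = Executors.newSingleThreadExecutor();
    private final Map<String, Object> cache = new HashMap<>();

    public Future<Object> get(String key) {
        return actor.submit(() -> cache.get(key));
    }

    public void put(String key, Object value) {
        actor.execute(() -> cache.put(key, value));
    }
}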
If all you need is to fill a read-only map which is initialized from database once requested, you could use any form of double-check locking which may be implemented in a number of ways. The easiest variant would be the following:
private volatile Map<T, V> cacheMap;
public void loadCacheMap() {
if (cacheMap == null) {
synchronized (this) {
if (cacheMap == null) {
cacheMap = retrieveParamCacheMap();
}
}
}
}
But I would personally prefer to avoid any form of synchronization here and just make sure that the initialization is done before any other thread can access the map (for example in an init method in a DI container). In this case you would even avoid the overhead of volatile.
EDIT: The answer above works only when an initial load is expected. In case of multiple updates, you could try to replace the tryLock with some other form of test and test-and-set, for example using something like this:
private final AtomicReference<CountDownLatch> sync =
        new AtomicReference<>(new CountDownLatch(0));

private void loadCacheMap() throws InterruptedException {
    CountDownLatch oldSync = sync.get();
    if (oldSync.getCount() == 0) { // if nobody is updating right now
        CountDownLatch newSync = new CountDownLatch(1);
        if (sync.compareAndSet(oldSync, newSync)) {
            cacheMap = retrieveParamCacheMap();
            newSync.countDown();
            return;
        }
    }
    sync.get().await(); // somebody else is loading; wait until they finish
}
I have 90 IDs that I need to process, something like in the image below. I want the last ID to be popped first, and if new IDs are added to the stack I want to push them onto the end of it. Last In Last Out. Does something like this already exist? I know I could use other collection implementations, but I wonder if there is a stack like this already made.
Queue is an interface with multiple implementations (including such things as blocking queues suitable for multi-threaded solutions)
You probably want to have a FIFO (first-in-first-out) queue.
First have a look at Javadoc from java.util.Queue.
There exist several implementations:
java.util.LinkedList
java.util.concurrent.LinkedBlockingQueue
java.util.concurrent.ArrayBlockingQueue
You could use a Queue<E>.
That looks like a normal queue implementation, with the elements added to the queue in reverse order to start off with.
You may use one of the Queue<E> implementations provided by Java (see queue implementations).
Another possible option would be to use a LinkedList<E> (see: http://download.oracle.com/javase/1.4.2/docs/api/java/util/LinkedList.html).
It offers all the methods you need, especially since your description sounds as if you are not totally sure about the behaviour you want.
A Queue<E> should be preferred over a LinkedList<E>, at least for large collections without the need for random access.
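For instance, a minimal sketch with ArrayDeque, one of the standard single-threaded Queue implementations (use a BlockingQueue instead if multiple threads are involved):
import java.util.ArrayDeque;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        Queue<Long> ids = new ArrayDeque<>();
        ids.add(1L);                    // new IDs are appended at the tail
        ids.add(2L);
        System.out.println(ids.poll()); // removes from the head (FIFO): prints 1
    }
}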
Here's some code to get you started:
private static BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
public static void main(String args[]) throws InterruptedException {
// Start a thread that puts stuff on the queue
new Thread(new Runnable() {
public void run() {
while (true) {
try {
queue.put("Some message");
Thread.sleep(100);
}
catch (InterruptedException e) {
// Handle interruption
}
}
}
}).start();
// Start a thread that takes stuff from the queue (LILO)
new Thread(new Runnable() {
public void run() {
while (true) {
try {
String message = queue.take(); // Waits if necessary for something to arrive
// Do something with message
Thread.sleep(100);
}
catch (InterruptedException e) {
// Handle interruption
}
}
}
}).start();
// Keep the main thread alive; joining the current thread never returns.
Thread.currentThread().join();
}
I have a BlockingQueue<Runnable> (taken from a ScheduledThreadPoolExecutor) in a producer-consumer environment. There is one thread adding tasks to the queue, and a thread pool executing them.
I need notifications on two events:
First item added to empty queue
Last item removed from queue
Notification = writing a message to database.
Is there any sensible way to implement that?
A simple and naïve approach would be to decorate your BlockingQueue with an implementation that simply checks the underlying queue and then posts a task to do the notification.
class NotifyingQueue<T> extends ForwardingBlockingQueue<T> implements BlockingQueue<T> {
    private final Notifier notifier; // injected not null
    …
    @Override public void put(T element) throws InterruptedException {
        if (delegate().isEmpty()) {
            notifier.notEmptyAnymore();
        }
        super.put(element);
    }

    @Override public T poll() {
        final T result = super.poll();
        if ((result != null) && delegate().isEmpty())
            notifier.nowEmpty();
        return result;
    }
    … etc
}
This approach though has a couple of problems. While the empty -> notEmpty transition is pretty straightforward, particularly for a single producer, it would be easy for two consumers to run concurrently and both see the queue go from non-empty -> empty.
If, though, all you want is to be notified that the queue became empty at some time, then this will be enough, as long as your notifier is your state machine, tracking emptiness and non-emptiness and notifying when it changes from one to the other:
class AtomicStateNotifier implements Notifier {
    private final AtomicBoolean empty = new AtomicBoolean(true); // assume it starts empty
    private final Notifier delegate; // injected not null

    public void notEmptyAnymore() {
        if (empty.get() && empty.compareAndSet(true, false))
            delegate.notEmptyAnymore();
    }

    public void nowEmpty() {
        if (!empty.get() && empty.compareAndSet(false, true))
            delegate.nowEmpty();
    }
}
This is now a thread-safe guard around an actual Notifier implementation that perhaps posts tasks to an Executor to asynchronously write the events to the database.
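A hedged sketch of how the pieces might be wired together; the NotifyingQueue constructor shape and DatabaseNotifier are assumptions rather than part of the answer above, and put() throws InterruptedException, to be handled by the caller:
BlockingQueue<Runnable> underlying = new LinkedBlockingQueue<>();
// AtomicStateNotifier guards a hypothetical DatabaseNotifier that performs the actual writes.
Notifier notifier = new AtomicStateNotifier(new DatabaseNotifier());
BlockingQueue<Runnable> queue = new NotifyingQueue<>(underlying, notifier);

// Producer thread: triggers notEmptyAnymore() on the empty -> non-empty transition.
queue.put(() -> System.out.println("task"));

// Consumer thread: removing the last element triggers nowEmpty().
Runnable task = queue.poll();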
The design is most likely flawed, but you can do it relatively simply:
You have a single thread adding, so you can check before adding, i.e. pool.getQueue().isEmpty() - with one producer, this is safe.
"Last item removed" cannot be guaranteed, but you can override beforeExecute and check the queue again, possibly with a small timeout after isEmpty() returns true. The code below would probably be better off executed in afterExecute instead.
@Override
protected void beforeExecute(Thread t, Runnable r) {
    if (getQueue().isEmpty()) {
        try {
            Runnable next = getQueue().poll(200, TimeUnit.MILLISECONDS);
            if (next != null) {
                execute(next);
            } else {
                // last message - or handle it in afterExecute by setting a ThreadLocal and checking it there;
                // alternatively you may need to do so ONLY in afterExecute, depending on your needs
            }
        } catch (InterruptedException _ie) {
            Thread.currentThread().interrupt();
        }
    }
}
Something like that.
I can explain why doing notifications with the queue itself won't work well: imagine you add a task to be executed by the pool; the task is scheduled immediately, the queue is empty again, and you would need another notification.