I have an issue and a vague idea for how to fix it, but I'll try to overshare on the context to avoid an XY problem.
I have an asynchronous method that immediately returns a Guava ListenableFuture, and I need to call it hundreds of thousands or millions of times (the future itself can take a little while to complete). I can't really change the internals of that method. There's some serious resource contention inside it, so I'd like to limit the number of calls to that method that are in flight at once. So I tried using a Semaphore:
public class ConcurrentCallsLimiter<In, Out> {
private final Function<In, ListenableFuture<Out>> fn;
private final Semaphore semaphore;
private final Executor releasingExecutor;
public ConcurrentCallsLimiter(
int limit, Executor releasingExecutor,
Function<In, ListenableFuture<Out>> fn) {
this.semaphore = new Semaphore(limit);
this.fn = fn;
this.releasingExecutor = releasingExecutor;
}
public ListenableFuture<Out> apply(In in) throws InterruptedException {
semaphore.acquire();
ListenableFuture<Out> result = fn.apply(in);
result.addListener(() -> semaphore.release(), releasingExecutor);
return result;
}
}
So then I can just wrap my call in this class and call that instead:
ConcurrentCallsLimiter<Foo, Bar> cl =
new ConcurrentCallsLimiter<>(10, executor, someService::turnFooIntoBar);
for (Foo foo : foos) {
ListenableFuture<Bar> bar = cl.apply(foo);
// handle bar (no pun intended)
}
This sort of works. The issue is that the tail latency is really bad. Some calls get "unlucky" and end up taking a long time trying to acquire resources within that method call. This is exacerbated by some internal exponential-backoff logic such that the unlucky call gets less and less of a chance to acquire the needed resources, compared to new calls that are more eager and wait a much shorter time before trying again.
What would be ideal to fix it is if there was something similar to that semaphore but that had a notion of order. For instance if the limit is 10, then the 11th call currently must wait for any of the first 10 to complete. What I'd like is for the 11th call to have to wait for the very first call to complete. That way "unlucky" calls aren't continuing to be starved by new calls constantly coming in.
It seems like I could assign an integer sequence number to the calls and somehow keep track of the lowest one yet to finish, but I couldn't quite see how to make that work, especially since there aren't really any useful "waiting" methods on AtomicInteger or the like.
You can create the semaphore with a fairness parameter:
Semaphore(int permits, boolean fair)
//fair - true if this semaphore will guarantee first-in first-out granting of permits under contention, else false
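As a minimal sketch of what that would look like in the limiter from the question (the class name here is mine; everything else follows the original code), the only change is the two-argument Semaphore constructor:
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;
import java.util.function.Function;
import com.google.common.util.concurrent.ListenableFuture;

// Same limiter as in the question; the only change is fair = true, so
// callers blocked in acquire() obtain permits in roughly FIFO order
// instead of being overtaken by newly arriving calls.
public class FairConcurrentCallsLimiter<In, Out> {
    private final Function<In, ListenableFuture<Out>> fn;
    private final Semaphore semaphore;
    private final Executor releasingExecutor;

    public FairConcurrentCallsLimiter(
            int limit, Executor releasingExecutor, Function<In, ListenableFuture<Out>> fn) {
        this.semaphore = new Semaphore(limit, true); // fair = true
        this.fn = fn;
        this.releasingExecutor = releasingExecutor;
    }

    public ListenableFuture<Out> apply(In in) throws InterruptedException {
        semaphore.acquire();
        ListenableFuture<Out> result = fn.apply(in);
        result.addListener(semaphore::release, releasingExecutor);
        return result;
    }
}
Note that fairness only orders the callers waiting to start a call; it does not make the 11th call wait specifically for the 1st to complete.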
One less-than-ideal solution I thought of was using a Queue to store all of the futures:
public class ConcurrentCallsLimiter<In, Out> {
private final Function<In, ListenableFuture<Out>> fn;
private final int limit;
private final Queue<ListenableFuture<Out>> queue = new ArrayDeque<>();
public ConcurrentCallsLimiter(int limit, Function<In, ListenableFuture<Out>> fn) {
this.limit = limit;
this.fn = fn;
}
public ListenableFuture<Out> apply(In in) throws InterruptedException, ExecutionException {
if (queue.size() == limit) {
queue.remove().get(); // block until the oldest outstanding call completes
}
ListenableFuture<Out> result = fn.apply(in);
queue.add(result);
return result;
}
}
However, this seems like a big waste of memory. The objects involved could be a bit big and the limit could be set pretty high. So I'm open to better answers that don't have O(n) memory usage. It seems like it should be doable in O(1).
"if there was something similar to that semaphore but that had a notion of order."
There exists such a beast. It is called a BlockingQueue. Create such a queue, put 10 items in it, and use take() instead of acquire() and put() instead of release().
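A rough sketch of that idea applied to the limiter from the question (class and field names are mine, not from the answer): the queue is pre-filled with permit tokens, take() blocks in FIFO order when all of them are out, and a listener returns the token when the future completes.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.function.Function;
import com.google.common.util.concurrent.ListenableFuture;

public class QueueBackedCallsLimiter<In, Out> {
    private final Function<In, ListenableFuture<Out>> fn;
    private final BlockingQueue<Object> permits;
    private final Executor releasingExecutor;

    public QueueBackedCallsLimiter(
            int limit, Executor releasingExecutor, Function<In, ListenableFuture<Out>> fn) {
        this.permits = new ArrayBlockingQueue<>(limit, true); // fair = FIFO among blocked takers
        for (int i = 0; i < limit; i++) {
            permits.add(new Object()); // pre-fill with "permits"
        }
        this.fn = fn;
        this.releasingExecutor = releasingExecutor;
    }

    public ListenableFuture<Out> apply(In in) throws InterruptedException {
        Object permit = permits.take();              // blocks once the limit is reached
        ListenableFuture<Out> result = fn.apply(in);
        result.addListener(() -> permits.offer(permit), releasingExecutor); // give the permit back
        return result;
    }
}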
Working on something where I'm trying to count the number of times something is happening. Instead of spamming the database with millions of calls, I'm summing the updates in memory and then dumping the result into the database once per second (so, for example, turning ten +1s into a single +10).
I've noticed some strange inconsistency with the counts (like there should be exactly 1 million transactions but instead there are 1,000,016 or something).
I'm looking into other possible causes but I wanted to check that this is the correct way of doing things. The use case is that it needs to be eventually correct, so it's okay as long as the counts aren't double counted or dropped.
Here is my sample implementation.
public class Aggregator {
private Map<String, LongAdder> transactionsPerUser = new ConcurrentHashMap<>();
private StatisticsDAO statisticsDAO;
public Aggregator(StatisticsDAO statisticsDAO) {
this.statisticsDAO = statisticsDAO;
}
public void incrementCount(String userId) {
transactionsPerUser.computeIfAbsent(userId, k -> new LongAdder()).increment();
}
@Scheduled(every = "1s")
public void sendAggregatedStatisticsToDatabase() {
for (String userId : transactionsPerUser.keySet()) {
long count = transactionsPerUser.remove(userId).sum();
statisticsDAO.updateCount(userId, count);
}
}
}
You will have updates dropped in the following scenario:
Thread A calls incrementCount, and finds an already existing LongAdder instance for the given userId, this instance is returned from computeIfAbsent.
Thread B is at the same time handling a sendAggregatedStatisticsToDatabase call, which removes that LongAdder instance from the map.
Thread B calls sum() on the LongAdder instance.
Thread A, still executing that same incrementCount invocation, now calls increment() on the LongAdder instance.
This update is now dropped. It will not be seen by the next invocation of sendAggregatedStatisticsToDatabase, because the increment() call happened on an instance that was removed from the map in between the calls to computeIfAbsent() and increment() in the incrementCount method.
You might be better off reusing the LongAdder instances by doing something like this in sendAggregatedStatisticsToDatabase:
LongAdder longAdder = transactionsPerUser.get(userId);
long count = longAdder.sum();
longAdder.add(-count);
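A sketch of how the whole scheduled method might look with that change, assuming it is acceptable to keep the per-user entries in the map indefinitely (this would drop in place of the original sendAggregatedStatisticsToDatabase):
@Scheduled(every = "1s")
public void sendAggregatedStatisticsToDatabase() {
    transactionsPerUser.forEach((userId, longAdder) -> {
        long count = longAdder.sum();
        if (count > 0) {
            longAdder.add(-count); // subtract only what we are about to persist
            statisticsDAO.updateCount(userId, count);
        }
    });
}
Because the adder stays in the map, increments that race with the flush simply remain in the adder and are picked up on the next run instead of being dropped.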
I agree with @NorthernSky's answer. Mine should be seen as an alternative solution to the problem, specifically addressing the comments on the accepted answer saying that a correct and performant solution would be more complex.
I would propose to use a producer/consumer pattern here, using an unbounded blocking queue. The producers call incrementCount() which just adds a userId to the queue.
The consumer is scheduled to run every second and reads the queue into a HashMap, and then pushes the map's data to the DAO.
public class Aggregator {
private final Queue<String> queue = new LinkedBlockingQueue<>();
private final StatisticsDao statisticsDAO;
public Aggregator(StatisticsDao statisticsDAO) {
this.statisticsDAO = statisticsDAO;
}
public void incrementCount(String userId) {
queue.add(userId);
}
@Scheduled(every = "1s")
public void sendAggregatedStatisticsToDatabase() {
int size = queue.size();
HashMap<String, LongAdder> counts = new HashMap<>();
for (int i = 0; i < size; i++) {
counts.computeIfAbsent(queue.remove(), k -> new LongAdder()).increment();
}
counts.forEach((userId, adder) -> statisticsDAO.updateCount(userId, adder.sum()));
}
}
Even better would be to not have a scheduled consumer, but one that keeps reading from the queue into a local HashMap until a timeout happens or a size threshold is reached, or even until the queue is empty.
Then it would process the current map, push it entirely into the DAO, clear the map, and start reading the queue again until the next time there's enough data to process.
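A sketch of such a consumer, with illustrative names and thresholds (BATCH_SIZE, IDLE_TIMEOUT_MS and the QueueConsumer class are not from the original answer; it also assumes the queue is exposed as a BlockingQueue so poll with a timeout is available): it drains userIds into a local map and flushes when the batch grows large enough or nothing has arrived for a while.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class QueueConsumer implements Runnable {
    private static final int BATCH_SIZE = 10_000;
    private static final long IDLE_TIMEOUT_MS = 1_000;

    private final BlockingQueue<String> queue;
    private final StatisticsDao statisticsDAO;

    QueueConsumer(BlockingQueue<String> queue, StatisticsDao statisticsDAO) {
        this.queue = queue;
        this.statisticsDAO = statisticsDAO;
    }

    @Override
    public void run() {
        Map<String, LongAdder> counts = new HashMap<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String userId = queue.poll(IDLE_TIMEOUT_MS, TimeUnit.MILLISECONDS);
                if (userId != null) {
                    counts.computeIfAbsent(userId, k -> new LongAdder()).increment();
                }
                // flush when the batch is large enough, or when nothing arrived for a while
                if (!counts.isEmpty() && (userId == null || counts.size() >= BATCH_SIZE)) {
                    counts.forEach((id, adder) -> statisticsDAO.updateCount(id, adder.sum()));
                    counts.clear();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit cleanly on shutdown
        }
    }
}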
Goal: To know, as I fork off a thread, which processor it's going to land on. Is that possible? Regardless of whether the underlying approach is valid, is there a good answer to that narrow question? Thanks.
(Right now I need to make a copy of one of our classes for each thread, write to it in that thread and merge them all later. Using a synchronized approach is not possible because my Java expert boss thinks it's a bad idea, and after a lot of discussion I agree. If I knew which processor each thread would land on, I would only need to make as many copies of that class as there are processors.)
We use Apache Spark to get our jobs spread across a cluster, but in our application it makes sense to run one big executor and then do some multi-threading of our own on each machine in the cluster.
I could save a lot of deep copying if I knew which processor a thread was being sent to. Is that possible? I've included our code, but it's probably more of a conceptual question:
When I get down to the "do task" part of compute(), can I know which processor it's running on?
public class TholdExecutor extends RecursiveTask<TholdDropEvaluation> {
final static Logger logger = LoggerFactory.getLogger(TholdExecutor.class);
private List<TholdDropResult> partitionOfN = new ArrayList<>();
private int coreCount;
private int desiredPartitionSize; // will be updated by whatever is passed into the constructor per-chromosome
private TholdDropEvaluation localDropEvaluation; // this DropEvaluation
private TholdDropResult mSubI_DR;
public TholdExecutor(List<TholdDropResult> subsetOfN, int cores, int partSize, TholdDropEvaluation passedDropEvaluation, TholdDropResult mDrCopy) {
partitionOfN = subsetOfN;
coreCount = cores;
desiredPartitionSize = partSize;
// the TholdDropEvaluation needs to be a copy for each thread? It can't be the same one passed to threads ... so ...
localDropEvaluation = makeDECopy(passedDropEvaluation); // THIS NEEDS TO BE A DEEP COPY OF THE DROP EVAL!!! NOT THE ORIGINAL!!
// we never modify the TholdDropResult that is passed in, we just need to read it all on the same JVM/worker, so
mSubI_DR = mDrCopy; // this is purely a reference and can point to the passed in value (by reference, right?)
}
// this makes a deep copy of the TholdDropEvaluation for each thread, we copy the SharingRun's startIndex and endIndex only,
// as LEG events will be calculated during the subsequent dropComparison. The constructor for TholdDropEvaluation must set
// LEG events to zero.
private TholdDropEvaluation makeDECopy(TholdDropEvaluation passedDropEvaluation) {
TholdDropEvaluation tholdDropEvaluation = new TholdDropEvaluation();
// iterate through the SharingRuns in the SharingRunList from the TholdDropEval that was passed in
for (SharingRun sr : passedDropEvaluation.getSharingRunList()) {
SharingRun ourSharingRun = new SharingRun();
ourSharingRun.startIndex = sr.startIndex;
ourSharingRun.endIndex = sr.endIndex;
tholdDropEvaluation.addSharingRun(ourSharingRun);
}
return tholdDropEvaluation;
}
@Override
protected TholdDropEvaluation compute() {
int simsToDo = partitionOfN.size();
UUID tag = UUID.randomUUID();
long computeStartTime = System.nanoTime();
if (simsToDo <= desiredPartitionSize) {
logger.debug("IN MULTI-THREAD compute() --- UUID {}:Evaluating partitionOfN sublist length", tag, simsToDo);
// job within size limit, do the task and return the completed TholdDropEvaluation
// iterate through each TholdDropResult in the sub-partition and do the dropComparison to the reference mSubI_DR,
// writing to the copy of the DropEval in localDropEvaluation
for (TholdDropResult currentResult : partitionOfN) {
mSubI_DR.dropComparison(currentResult, localDropEvaluation);
}
} else {
// job too large, subdivide and call this recursively
int half = simsToDo / 2;
logger.info("Splitting UUID = {}, half is {} and simsToDo is {}", tag, half, simsToDo );
TholdExecutor nextExec = new TholdExecutor(partitionOfN.subList(0, half), coreCount, desiredPartitionSize, localDropEvaluation, mSubI_DR);
TholdExecutor futureExec = new TholdExecutor(partitionOfN.subList(half, simsToDo), coreCount, desiredPartitionSize, localDropEvaluation, mSubI_DR);
nextExec.fork();
TholdDropEvaluation futureEval = futureExec.compute();
TholdDropEvaluation nextEval = nextExec.join();
localDropEvaluation.merge(futureEval);
localDropEvaluation.merge(nextEval);
}
logger.info("{} Compute time is {} ns",tag, System.nanoTime() - computeStartTime);
// NOTE: this was inside the else block in Rob's example, but don't we want it outside the block so it's returned
// whether or not the job was subdivided?
return localDropEvaluation;
}
}
Even if you could figure out where a thread would run initially, there's no reason to assume it would live on that processor/core for the rest of its life. In all probability, for any task big enough to be worth the cost of spawning a thread, it won't, so you'd need to control completely where it runs to offer that level of assurance.
As far as I know there's no standard mechanism for controlling mappings from threads to processor cores inside Java. Typically that's known as "thread affinity" or "processor affinity". On Windows and Linux for example you can control that using:
Windows: SetThreadAffinityMask
Linux: sched_setaffinity or pthread_setaffinity_np
so in theory you could write some C and JNI code that allowed you to abstract this enough on the Java hosts you cared about to make it work.
That feels like the wrong solution to the real problem you seem to be facing, because it withdraws options from the OS scheduler and potentially prevents it from making the smartest scheduling decisions, which can increase total runtime. Unless you're pushing an unusual workload and modelling/querying processor information and topology down to the level of NUMA and shared caches, the scheduler ought to do a better job of figuring out where to run threads for most workloads than you could. Your JVM also typically runs a large number of additional threads besides the ones you explicitly create after main() is called. Additionally, I wouldn't promise anything about what the JVM you run today (or tomorrow) might decide to do on its own about thread affinity.
Having said that it seems like the underlying problem is that you want to have one instance of an object per thread. Typically that's much easier than predicting where a thread will run and then manually figuring out a mapping between N processors and M threads at any point in time. Usually you'd use "thread local storage" (TLS) to solve this problem.
Most languages provide this concept in one form or another. In Java it is provided via the ThreadLocal class. The linked documentation gives an example:
public class ThreadId {
// Atomic integer containing the next thread ID to be assigned
private static final AtomicInteger nextId = new AtomicInteger(0);
// Thread local variable containing each thread's ID
private static final ThreadLocal<Integer> threadId =
new ThreadLocal<Integer>() {
@Override protected Integer initialValue() {
return nextId.getAndIncrement();
}
};
// Returns the current thread's unique ID, assigning it if necessary
public static int get() {
return threadId.get();
}
}
Essentially there are two things you care about:
When you call get() it returns the value (Object) belonging to the current thread
If you call get() in a thread that doesn't yet have a value, it will call the initialValue() method you implement, which allows you to construct or obtain a new object.
So in your scenario you'd probably want to deep copy the initial version of some local state from a read-only global version.
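For example, using the question's TholdDropEvaluation and SharingRun classes and the same copy logic as makeDECopy, a per-thread copy might be set up roughly like this (loadReadOnlyTemplate() is a placeholder for however the shared, read-only instance is obtained; the class name is illustrative):
// Each worker thread lazily gets its own deep copy of a shared, read-only
// TholdDropEvaluation template and writes only to that copy.
public class PerThreadDropEvaluation {
    private static final TholdDropEvaluation TEMPLATE = loadReadOnlyTemplate();

    private static final ThreadLocal<TholdDropEvaluation> LOCAL_COPY =
            ThreadLocal.withInitial(() -> deepCopy(TEMPLATE));

    public static TholdDropEvaluation get() {
        return LOCAL_COPY.get(); // the first call on a given thread triggers the deep copy
    }

    private static TholdDropEvaluation loadReadOnlyTemplate() {
        return new TholdDropEvaluation(); // placeholder: however the shared instance is built
    }

    private static TholdDropEvaluation deepCopy(TholdDropEvaluation source) {
        TholdDropEvaluation copy = new TholdDropEvaluation();
        for (SharingRun sr : source.getSharingRunList()) {
            SharingRun run = new SharingRun();
            run.startIndex = sr.startIndex;
            run.endIndex = sr.endIndex;
            copy.addSharingRun(run);
        }
        return copy;
    }
}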
One final point of note: if your goal is to divide and conquer (do some work on lots of threads and then merge all their results into one answer), the merging part is often known as a reduction. In that case you might be looking for MapReduce, which is probably the most well-known form of parallelism using reductions.
Working with the classic multiple producer/consumer problem, and I have an issue that is driving me around the bend regarding how to avoid race conditions when inserting into/removing from a circular buffer. Appreciate any help in advance!
Sample code for circular buffer for example purposes. Similar to my implementation (Note: I cannot use collection types, only arrays for this):
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
public class BoundedBuffer {
private final String[] buffer;
private final int capacity;
private int front;
private int rear;
private int count;
private final Lock lock = new ReentrantLock();
private final Condition notFull = lock.newCondition();
private final Condition notEmpty = lock.newCondition();
public BoundedBuffer(int capacity) {
super();
this.capacity = capacity;
buffer = new String[capacity];
}
public void deposit(String data) throws InterruptedException {
lock.lock();
try {
while (count == capacity) {
notFull.await();
}
buffer[rear] = data;
rear = (rear + 1) % capacity;
count++;
notEmpty.signal();
} finally {
lock.unlock();
}
}
public String fetch() throws InterruptedException {
lock.lock();
try {
while (count == 0) {
notEmpty.await();
}
String result = buffer[front];
front = (front + 1) % capacity;
count--;
notFull.signal();
return result;
} finally {
lock.unlock();
}
}
}
What I need to know is: how can I implement a method for checking whether the buffer is full/empty? This method needs to be included in BoundedBuffer and must be called from another class (the producer/consumer) before giving the go-ahead for, or calling, the insert/remove methods.
Pseudocode for method in Producer class.
if (!bufferFull) {
buffer.addelement;
}
else {
thread.sleep(5)
threadHasSleptFor++;
}
I am using threads, and there are multiple producers/consumers (in this case 2 producers and 2 consumers, but I may require more). I need it so that if the buffer is full, the thread has to wait until it becomes available for insertion, and the time it waits needs to be stored for output purposes (not for debugging; it's part of the core features). The issue I am having is this:
Thread 1 (producer) checks the bufferFull condition; it's false.
The scheduler switches to Thread 2 midway.
Thread 2 also checks the bufferFull condition; it's false.
Thread 2 proceeds to insert.
The scheduler switches back to Thread 1.
Thread 1 now goes to the insert line, since it has already checked, but Thread 2 beat it to it.
Boom.
I'm somewhat new to Java, though as I understand it, this is the "time-of-check/time-of-use" race condition issue.
Can someone please advise how this can be implemented safely, and how I would loop the code so the threadHasSleptFor variable keeps incrementing on every failure (providing the methods would be great)? I want only the thread that performed the check to be able to insert the item; the second producer must wait for the lock.
Thanks.
This is by definition impossible to do without higher level locking.
You have to guarantee that the check of whether the buffer is full and the following insert are atomic from the thread's perspective, which means you have to acquire some common lock to do both. This general problem is indeed called time-of-check to time-of-use and leads to many interesting race conditions down the line.
The solution to these problems is not to check whether you can do an operation and then do it, but to just try the operation and handle the error case. So if you don't want your operation to block when the buffer is full, just implement a tryDeposit method that either throws an exception when it can't store the value or returns a boolean success value.
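For example, a tryDeposit added to the question's BoundedBuffer might look like this (a sketch of the boolean-returning variant; the full check and the insert happen under the same lock, so there is no time-of-check/time-of-use gap):
public boolean tryDeposit(String data) {
    lock.lock();
    try {
        if (count == capacity) {
            return false;            // buffer full, caller decides what to do
        }
        buffer[rear] = data;
        rear = (rear + 1) % capacity;
        count++;
        notEmpty.signal();
        return true;
    } finally {
        lock.unlock();
    }
}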
Although in your case, if you have to record the time it took before you could push the value into the buffer, I don't see why a simple:
long start = System.nanoTime();
queue.deposit(data);
long end = System.nanoTime();
wouldn't do the trick as well.
If I understand you correctly, you are asking how to make a thread wait until it's OK to call deposit() or wait until it's OK to call fetch(). But, there's no need for that. Your deposit() method will block the calling thread until there is room in the queue, and your fetch() method will block the caller until there is something to fetch. That's what the notFull.await() and notEmpty.await() calls do.
await() unlocks the lock, sleeps until the condition is signalled by another thread, and then it re-locks the lock. The condition may or may not still be true when the caller finally gets the lock again, but that's why you have the await() calls in loops, so that the thread keeps trying until finally it has the lock and the condition is true. Then it does its work (add an item or remove an item), unlocks the lock, and returns.
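If the time spent waiting needs to be reported (the threadHasSleptFor requirement from the question), one option is a variant of deposit() that measures and returns how long the call blocked. This is a sketch of an extra method on the question's BoundedBuffer, not part of the original class:
public long depositTimed(String data) throws InterruptedException {
    long start = System.nanoTime();
    lock.lock();
    try {
        while (count == capacity) {
            notFull.await(); // releases the lock while waiting, re-acquires before returning
        }
        buffer[rear] = data;
        rear = (rear + 1) % capacity;
        count++;
        notEmpty.signal();
        return System.nanoTime() - start; // time spent acquiring the lock plus waiting for space
    } finally {
        lock.unlock();
    }
}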
I have a Java method that performs two computations over an input set: an estimated and an accurate answer. The estimate can always be computed cheaply and in reliable time. The accurate answer can sometimes be computed in acceptable time and sometimes not (not known a priori ... have to try and see).
What I want to set up is some framework where if the accurate answer takes too long (a fixed timeout), the pre-computed estimate is used instead. I figured I'd use a thread for this. The main complication is that the code for computing the accurate answer relies on an external library, and hence I cannot "inject" Interrupt support.
A standalone test case demonstrating my problem is here:
package test;
import java.util.Random;
public class InterruptableProcess {
public static final int TIMEOUT = 1000;
public static void main(String[] args){
for(int i=0; i<10; i++){
getAnswer();
}
}
public static double getAnswer(){
long b4 = System.currentTimeMillis();
// have an estimate pre-computed
double estimate = Math.random();
//try to get accurate answer
//can take a long time
//if longer than TIMEOUT, use estimate instead
AccurateAnswerThread t = new AccurateAnswerThread();
t.start();
try{
t.join(TIMEOUT);
} catch(InterruptedException ie){
;
}
if(!t.isFinished()){
System.err.println("Returning estimate: "+estimate+" in "+(System.currentTimeMillis()-b4)+" ms");
return estimate;
} else{
System.err.println("Returning accurate answer: "+t.getAccurateAnswer()+" in "+(System.currentTimeMillis()-b4)+" ms");
return t.getAccurateAnswer();
}
}
public static class AccurateAnswerThread extends Thread{
private volatile boolean finished = false; // volatile so the main thread sees the update
private volatile double answer = -1;
public void run(){
//call to external, non-modifiable code
answer = accurateAnswer();
finished = true;
}
public boolean isFinished(){
return finished;
}
public double getAccurateAnswer(){
return answer;
}
// not modifiable, emulate an expensive call
// in practice, from an external library
private double accurateAnswer(){
Random r = new Random();
long b4 = System.currentTimeMillis();
long wait = r.nextInt(TIMEOUT*2);
//don't want to use .wait() since
//external code doesn't support interruption
while(b4+wait>System.currentTimeMillis()){
;
}
return Math.random();
}
}
}
This works fine, outputting:
Returning estimate: 0.21007465651836377 in 1002 ms
Returning estimate: 0.5303547292361411 in 1001 ms
Returning accurate answer: 0.008838428149438915 in 355 ms
Returning estimate: 0.7981717302567681 in 1001 ms
Returning estimate: 0.9207406241557682 in 1000 ms
Returning accurate answer: 0.0893839926072787 in 175 ms
Returning estimate: 0.7310211480220586 in 1000 ms
Returning accurate answer: 0.7296754467596422 in 530 ms
Returning estimate: 0.5880164300851529 in 1000 ms
Returning estimate: 0.38605296260291233 in 1000 ms
However, I have a very large input set (on the order of billions of items) to run my analysis over, and I'm uncertain how to clean up the threads that do not finish (I do not want them running in the background).
I know that various methods to destroy threads are deprecated with good reason. I also know that the typical way to stop a thread is to use interrupts. However, in this case, I don't see that I can use an interrupt since the run() method passes a single call to an external library.
How can I kill/clean-up threads in this case?
If you know enough about the external library, such as:
never acquires any locks;
never opens any files/network connections;
never involves any I/O whatsoever, not even logging;
then it may be safe to use Thread#stop on it. You could try it and do extensive stress testing. Any resource leaks should manifest themselves soon enough.
I'd try it and see if it responds to a Thread.interrupt(). Reduce your data, of course, so it doesn't run forever, but if it responds to an interrupt() then you're home free. If the library locks anything, or performs a wait() or sleep(), the code has to handle the InterruptedException, and it's possible the author did the right thing. They may swallow it and continue, but it's possible they didn't.
While technically you can call Thread.stop(), you'd need to know everything about that code to be sure it's safe and that you won't leak resources. However, doing that research would also clue you into how you could easily modify the code to check for interrupt(). You'd pretty much need the source code to audit it anyway, which means you could just as easily do the right thing and add the interrupt checks there, with less research than it takes to know whether Thread.stop() is safe.
The other option is to cause a RuntimeException in the thread. Try nulling a reference it might use, closing some I/O it depends on (a socket, file handle, etc.), or modifying the array of data it's walking over by changing its size or nulling out the data. There's usually something you can do that causes an unhandled exception to be thrown, and the thread will shut down.
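To try the interrupt() suggestion above against the question's code, a sketch might look like the following (it reuses the question's AccurateAnswerThread, TIMEOUT and estimate; whether the worker actually stops depends entirely on the external library hitting an interruptible point):
public static double getAnswerWithInterrupt() throws InterruptedException {
    double estimate = Math.random(); // pre-computed estimate, as in the question
    AccurateAnswerThread t = new AccurateAnswerThread();
    t.start();
    t.join(TIMEOUT);
    if (t.isFinished()) {
        return t.getAccurateAnswer();
    }
    t.interrupt();                   // only helps if the library checks the interrupt flag
    t.join(100);                     // brief grace period to let it wind down
    if (t.isAlive()) {
        System.err.println("Worker ignored the interrupt and is still running");
    }
    return estimate;
}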
Extending chubbsondubs' answer: if the third-party library uses some well-defined API (such as java.util.List or some library-specific API) to access the input data set, you could wrap the input data set you pass to the third-party code in a wrapper class that throws exceptions, e.g. in the List.get method, once a cancel flag is set.
For instance, if you pass a List to your third-party library, then it might be possible to do something along the lines of:
class CancelList<T> implements List<T> {
private final List<T> wrappedList;
private volatile boolean canceled = false;
public CancelList(List<T> wrapped) { this.wrappedList = wrapped; }
public void cancel() { this.canceled = true; }
public T get(int index) {
if (canceled) { throw new RuntimeException("Canceled!"); }
return wrappedList.get(index);
}
// Other List method implementations here...
}
public double getAnswer(List<MyType> inputList) {
CancelList<MyType> cancelList = new CancelList<MyType>(inputList);
AccurateAnswerThread t = new AccurateAnswerThread(cancelList);
t.start();
try{
t.join(TIMEOUT);
} catch(InterruptedException ie){
cancelList.cancel();
}
// Get the result of your calculation here...
}
Of course, this approach depends on a few things:
You must know the third-party code well enough to know what methods it calls that you can control through input parameters.
The third-party code would need to make frequent calls to these methods throughout the computation process (i.e. it won't work if it copies all the data at once into an internal structure and does its computation there).
Obviously this won't work if the library catches and handles runtime exceptions and continues processing.
I have a multithreaded application, where a shared list has write-often, read-occasionally behaviour.
Specifically, many threads will dump data into the list, and then - later - another worker will grab a snapshot to persist to a datastore.
This is similar to the discussion over on this question.
There, the following solution is provided:
class CopyOnReadList<T> {
private final List<T> items = new ArrayList<T>();
public void add(T item) {
synchronized (items) {
// Add item while holding the lock.
items.add(item);
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>();
synchronized (items) {
// Make a copy while holding the lock.
for (T t : items) copy.add(t);
}
return copy;
}
}
However, in this scenario (and, as I've learned from my question here), only one thread can write to the backing list at any given time.
Is there a way to allow high-concurrency writes to the backing list, which are locked only during the makeSnapshot() call?
synchronized (~20 ns) is pretty fast and even though other operations can allow concurrency, they can be slower.
private final Lock lock = new ReentrantLock();
private List<T> items = new ArrayList<T>();
public void add(T item) {
lock.lock();
// trivial lock time.
try {
// Add item while holding the lock.
items.add(item);
} finally {
lock.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(), ret;
lock.lock();
// trivial lock time.
try {
ret = items;
items = copy;
} finally {
lock.unlock();
}
return ret;
}
public static void main(String... args) {
long start = System.nanoTime();
Main<Integer> ints = new Main<>();
for (int j = 0; j < 100 * 1000; j++) {
for (int i = 0; i < 1000; i++)
ints.add(i);
ints.makeSnapshot();
}
long time = System.nanoTime() - start;
System.out.printf("The average time to add was %,d ns%n", time / 100 / 1000 / 1000);
}
prints
The average time to add was 28 ns
This means that if you are creating 30 million entries per second, you will have one thread accessing the list on average. If you are creating 60 million per second, you will have concurrency issues; however, you are likely to be having many more resourcing issues at that point.
Using Lock.lock() and Lock.unlock() can be faster when there is a high contention ratio. However, I suspect your threads will be spending most of the time building the objects to be created rather than waiting to add the objects.
You could use a ConcurrentDoublyLinkedList. There is an excellent implementation of ConcurrentDoublyLinkedList available.
So long as you iterate forward through the list when you make your snapshot, all should be well. This implementation preserves the forward chain at all times; the backward chain is sometimes inaccurate.
First of all, you should investigate if this really is too slow. Adds to ArrayLists are O(1) in the happy case, so if the list has an appropriate initial size, CopyOnReadList.add is basically just a bounds check and an assignment to an array slot, which is pretty fast. (And please, do remember that CopyOnReadList was written to be understandable, not performant.)
If you need a non-locking operation, you can have something like this:
class ConcurrentStack<T> {
private final AtomicReference<Node<T>> stack = new AtomicReference<>();
public void add(T value){
Node<T> tail, head;
do {
tail = stack.get();
head = new Node<>(value, tail);
} while (!stack.compareAndSet(tail, head));
}
public Node<T> drain(){
// Get all elements from the stack and reset it
return stack.getAndSet(null);
}
}
class Node<T> {
// getters, setters, constructors omitted
private final T value;
private final Node<T> tail;
}
Note that while adds to this structure should deal pretty well with high contention, it comes with several drawbacks. The output from drain is quite slow to iterate over, it uses quite a lot of memory (like all linked lists), and you also get things in the opposite insertion order. (Also, it's not really tested or verified, and may actually suck in your application. But that's always the risk with using code from some random dude on the intertubes.)
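One consequence of the reversed order: if insertion order matters to the snapshot consumer, the drained chain can be reversed while walking it. A small helper sketch, assuming the omitted getters on Node are named getValue() and getTail():
import java.util.ArrayDeque;
import java.util.Deque;

final class DrainOrder {
    // Rebuilds insertion order from the newest-first chain returned by drain().
    static <T> Deque<T> toInsertionOrder(Node<T> head) {
        Deque<T> ordered = new ArrayDeque<>();
        for (Node<T> n = head; n != null; n = n.getTail()) {
            ordered.addFirst(n.getValue()); // newest-first chain becomes oldest-first deque
        }
        return ordered;
    }
}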
Yes, there is a way. It is similar to the way ConcurrentHashMap is built, if you know it.
You should build your own data structure not from one list shared by all writing threads, but from several independent lists. Each of these lists should be guarded by its own lock. The .add() method should choose the list to append the current item to based on Thread.currentThread().getId() (for example, just id % listsCount). This gives you good concurrency properties for .add(): at best, listsCount threads will be able to write without contention.
On makeSnapshot() you just iterate over all the lists, and for each list you grab its lock and copy the contents.
This is just an idea; there are many places to improve it (see the sketch below).
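A rough sketch of that idea (all names are illustrative, and note that the snapshot is not an atomic view across stripes):
import java.util.ArrayList;
import java.util.List;

class StripedList<T> {
    private final List<T>[] stripes;
    private final Object[] locks;

    @SuppressWarnings("unchecked")
    StripedList(int stripeCount) {
        stripes = new List[stripeCount];
        locks = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ArrayList<>();
            locks[i] = new Object();
        }
    }

    void add(T item) {
        int i = (int) (Thread.currentThread().getId() % stripes.length);
        synchronized (locks[i]) {
            stripes[i].add(item); // contention only with threads mapped to the same stripe
        }
    }

    List<T> makeSnapshot() {
        List<T> copy = new ArrayList<>();
        for (int i = 0; i < stripes.length; i++) {
            synchronized (locks[i]) { // lock and copy one stripe at a time
                copy.addAll(stripes[i]);
            }
        }
        return copy;
    }
}
Picking the stripe from the thread id keeps a given thread on the same stripe, so writers only contend with the other threads that hash to that stripe.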
You can use a ReadWriteLock to allow multiple threads to perform add operations on the backing list in parallel, but only one thread to make the snapshot. While the snapshot is being prepared all other add and snapshot request are put on hold.
A ReadWriteLock maintains a pair of associated locks, one for
read-only operations and one for writing. The read lock may be held
simultaneously by multiple reader threads, so long as there are no
writers. The write lock is exclusive.
class CopyOnReadList<T> {
// free to use any concurrent data structure, ConcurrentLinkedQueue used as an example
private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<T>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock shared = rwLock.readLock();
private final Lock exclusive = rwLock.writeLock();
public void add(T item) {
shared.lock(); // multiple threads can attain the read lock
// try-finally is overkill if items.add() never throws exceptions
try {
// Add item while holding the lock.
items.add(item);
} finally {
shared.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(); // probably better idea to use a LinkedList or the ArrayList constructor with initial size
exclusive.lock(); // only one thread can attain write lock, all read locks are also blocked
// try-finally is overkill if for loop never throws exceptions
try {
// Make a copy while holding the lock.
for (T t : items) {
copy.add(t);
}
} finally {
exclusive.unlock();
}
return copy;
}
}
Edit:
The read-write lock is so named because it is based on the readers-writers problem, not on how it is used here. Using the read-write lock we can have multiple threads acquire the read lock, but only one thread acquire the write lock exclusively. In this case the problem is reversed: we want multiple threads to write (add) and only one thread to read (make the snapshot). So we want multiple threads to use the read lock even though they are actually mutating. Only one thread exclusively makes the snapshot using the write lock, even though the snapshot only reads. Exclusive means that while the snapshot is being made, no other add or snapshot requests can be serviced by other threads.
As @PeterLawrey pointed out, the concurrent queue will serialize the writes, although the locks will be held for as short a duration as possible. We are free to use any other concurrent data structure, e.g. ConcurrentDoublyLinkedList; the queue is used only as an example. The main idea is the use of read-write locks.