Java concurrency - improving a copy-on-read collection

Java concurrency - improving a copy-on-read collection - java

I have a multithreaded application, where a shared list has write-often, read-occasionally behaviour.
Specifically, many threads will dump data into the list, and then - later - another worker will grab a snapshot to persist to a datastore.
This is similar to the discussion over on this question.
There, the following solution is provided:
class CopyOnReadList<T> {
private final List<T> items = new ArrayList<T>();
public void add(T item) {
synchronized (items) {
// Add item while holding the lock.
items.add(item);
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>();
synchronized (items) {
// Make a copy while holding the lock.
for (T t : items) copy.add(t);
}
return copy;
}
}
However, in this scenario, (and, as I've learned from my question here), only one thread can write to the backing list at any given time.
Is there a way to allow high-concurrency writes to the backing list, which are locked only during the makeSnapshot() call?

synchronized (~20 ns) is pretty fast and even though other operations can allow concurrency, they can be slower.
private final Lock lock = new ReentrantLock();
private List<T> items = new ArrayList<T>();
public void add(T item) {
lock.lock();
// trivial lock time.
try {
// Add item while holding the lock.
items.add(item);
} finally {
lock.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(), ret;
lock.lock();
// trivial lock time.
try {
ret = items;
items = copy;
} finally {
lock.unlock();
}
return ret;
}
public static void main(String... args) {
long start = System.nanoTime();
Main<Integer> ints = new Main<>();
for (int j = 0; j < 100 * 1000; j++) {
for (int i = 0; i < 1000; i++)
ints.add(i);
ints.makeSnapshot();
}
long time = System.nanoTime() - start;
System.out.printf("The average time to add was %,d ns%n", time / 100 / 1000 / 1000);
}
prints
The average time to add was 28 ns
This means if you are creating 30 million entries per second, you will have one thread accessing the list on average. If you are creating 60 million per second, you will have concurrency issues, however you are likely to be having many more resourcing issue at this point.
Using Lock.lock() and Lock.unlock() can be faster when there is a high contention ratio. However, I suspect your threads will be spending most of the time building the objects to be created rather than waiting to add the objects.

You could use a ConcurrentDoublyLinkedList. There is an excellent implementation here ConcurrentDoublyLinkedList.
So long as you iterate forward through the list when you make your snapshot all should be well. This implementation preserves the forward chain at all times. The backward chain is sometimes inaccurate.

First of all, you should investigate if this really is too slow. Adds to ArrayLists are O(1) in the happy case, so if the list has an appropriate initial size, CopyOnReadList.add is basically just a bounds check and an assignment to an array slot, which is pretty fast. (And please, do remember that CopyOnReadList was written to be understandable, not performant.)
If you need a non-locking operation, you can have something like this:
class ConcurrentStack<T> {
private final AtomicReference<Node<T>> stack = new AtomicReference<>();
public void add(T value){
Node<T> tail, head;
do {
tail = stack.get();
head = new Node<>(value, tail);
} while (!stack.compareAndSet(tail, head));
}
public Node<T> drain(){
// Get all elements from the stack and reset it
return stack.getAndSet(null);
}
}
class Node<T> {
// getters, setters, constructors omitted
private final T value;
private final Node<T> tail;
}
Note that while adds to this structure should deal pretty well with high contention, it comes with several drawbacks. The output from drain is quite slow to iterate over, it uses quite a lot of memory (like all linked lists), and you also get things in the opposite insertion order. (Also, it's not really tested or verified, and may actually suck in your application. But that's always the risk with using code from some random dude on the intertubes.)

Yes, there is a way. It is similar to the way ConcurrentHashMap made, if you know.
You should make your own data structure not from one list for all writing threads, but use several independent lists. Each of such lists should be guarded by it's own lock. .add() method should choose list for append current item based on Thread.currentThread.id (for example, just id % listsCount). This will gives you good concurrency properties for .add() -- at best, listsCount threads will be able to write without contention.
On makeSnapshot() you should just iterate over all lists, and for each list you grab it's lock and copy content.
This is just an idea -- there are many places to improve it.

You can use a ReadWriteLock to allow multiple threads to perform add operations on the backing list in parallel, but only one thread to make the snapshot. While the snapshot is being prepared all other add and snapshot request are put on hold.
A ReadWriteLock maintains a pair of associated locks, one for
read-only operations and one for writing. The read lock may be held
simultaneously by multiple reader threads, so long as there are no
writers. The write lock is exclusive.
class CopyOnReadList<T> {
// free to use any concurrent data structure, ConcurrentLinkedQueue used as an example
private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<T>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock shared = rwLock.readLock();
private final Lock exclusive = rwLock.writeLock();
public void add(T item) {
shared.lock(); // multiple threads can attain the read lock
// try-finally is overkill if items.add() never throws exceptions
try {
// Add item while holding the lock.
items.add(item);
} finally {
shared.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(); // probably better idea to use a LinkedList or the ArrayList constructor with initial size
exclusive.lock(); // only one thread can attain write lock, all read locks are also blocked
// try-finally is overkill if for loop never throws exceptions
try {
// Make a copy while holding the lock.
for (T t : items) {
copy.add(t);
}
} finally {
exclusive.unlock();
}
return copy;
}
}
Edit:
The read-write lock is so named because it is based on the readers-writers problem not on how it is used. Using the read-write lock we can have multiple threads achieve read locks but only one thread achieve the write lock exclusively. In this case the problem is reversed - we want multiple threads to write (add) and only thread to read (make the snapshot). So, we want multiple threads to use the read lock even though they are actually mutating. Only thread is exclusively making the snapshot using the write lock even though snapshot only reads. Exclusive means that during making the snapshot no other add or snapshot requests can be serviced by other threads at the same time.
As #PeterLawrey pointed out, the Concurrent queue will serialize the writes aqlthough the locks will be used for as minimal a duration as possible. We are free to use any other concurrent data structure, e.g. ConcurrentDoublyLinkedList. The queue is used only as an example. The main idea is the use of read-write locks.

Related

Using Lock to lock a position in an array instead of using synchronized for concurrency (Fine Grained Locking)

I have been asked to implement fine grained locking on a hashlist. I have done this using synchronized but the questions tells me to use Lock instead.
I have created a hashlist of objects in the constructor
private LinkedList<E> data[];;
private Lock lock[];
private Lock lockR = new ReentrantLock();
// The constructors ensure that both the data and the dataLock are the same size
#SuppressWarnings("unchecked")
public ConcurrentHashList(int n){
if(n > 1000) {
data = (LinkedList<E>[])(new LinkedList[n/10]);
lock = new Lock [n/10];
}
else {
data = (LinkedList<E>[])(new LinkedList[100]);
lock = new Lock [100]; ;
}
for(int j = 0; j < data.length;j++) {
data[j] = new LinkedList<E>();
lock[j] = new ReentrantLock();// Adding a lock to each bucket index
}
}
The original method
public void add(E x){
if(x != null){
lock.lock();
try{
int index = hashC(x);
if(!data[index].contains(x))
data[index].add(x);
}finally{lock.unlock();}
}
}
Using synchronization to grab a handle on the object hashlist to allow mutable Threads to work on mutable indexes concurrently.
public void add(E x){
if(x != null){
int index = hashC(x);
synchronized (dataLock[index]) { // Getting handle before adding
if(!data[index].contains(x))
data[index].add(x);
}
}
}
I do not know how to implement it using Lock though I can not lock a single element in a array only the whole method which means it is not coarse grained.
Using an array of ReentrantLock
public void add(E x){
if(x != null){
int index = hashC(x);
dataLock[index].lock();
try {
// Getting handle before adding
if(!data[index].contains(x))
data[index].add(x);
}finally {dataLock[index].unlock();}
}
}
The hash function
private int hashC(E x){
int k = x.hashCode();
int h = Math.abs(k % data.length);
return(h);
}

Presumably, hashC() is a function that is highly likely to produce unique numbers. As in, you have no guarantee that the hashes are unique, but the incidence of non-unique hashes is extremely low. For a data structure with a few million entries, you have a literal handful of collisions, and any given collision always consists of only a pair or maybe 3 conflicts (2 to 3 objects in your data structure have the same hash, but not 'thousands').
Also, assumption: the hash for a given object is constant. hashC(x) will produce the same value no matter how many times you call it, assuming you provide the same x.
Then, you get some fun conclusions:
The 'bucket' (The LinkedList instance found at array slot hashC(x) in data) that your object should go into, is always the same - you know which one it should be based solely on the result of hashC.
Calculating hashC does not require a lock of any sort. It has no side effects whatsoever.
Thus, knowing which bucket you need for a given operation on a single value (Be it add, remove, or check-if-in-collection) can be done without locking anything.
Now, once you know which bucket you need to look at / mutate, okay, now locking is involved.
So, just have 1 lock for each bucket. Not a List<Object> locks[];, that's a whole list worth of locks per bucket. Just Object[] locks is all you need, or ReentrantLock[] locks if you prefer to use lock/unlock instead of synchronized (lock[bucketIdx]) { ... }.
This is effectively fine-grained: After all, the odds that one operation needs to twiddle its thumbs because another thread is doing something, even though that other thread is operating on a different object, is very low; it would require the two different objects to have a colliding hash, which is possible, but extremely rare - as per assumption #1.
NB: Note that therefore lock can go away entirely, you don't need it, unless you want to build into your code that the code may completely re-design its bucket structure. For example, 1000 buckets feels a bit meh if you end up with a billion objects. I don't think 'rebucket everything' is part of the task here, though.

Missing updates with locks and ConcurrentHashMap

I have a scenario where I have to maintain a Map which can be populated by multiple threads, each modifying their respective List (unique identifier/key being the thread name), and when the list size for a thread exceeds a fixed batch size, we have to persist the records to the database.
Aggregator class
private volatile ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<String, List<T>>();
private ReentrantLock lock ;
public void addAll(List<T> entityList, String threadName) {
try {
lock.lock();
List<T> instrumentList = instrumentMap.get(threadName);
if(instrumentList == null) {
instrumentList = new ArrayList<T>(batchSize);
instrumentMap.put(threadName, instrumentList);
}
if(instrumentList.size() >= batchSize -1){
instrumentList.addAll(entityList);
recordSaver.persist(instrumentList);
instrumentList.clear();
} else {
instrumentList.addAll(entityList);
}
} finally {
lock.unlock();
}
}
There is one more separate thread running after every 2 minutes (using the same lock) to persist all the records in Map (to make sure we have something persisted after every 2 minutes and the map size does not gets too big)
if(//Some condition) {
Thread.sleep(//2 minutes);
aggregator.getLock().lock();
List<T> instrumentList = instrumentMap.values().stream().flatMap(x->x.stream()).collect(Collectors.toList());
if(instrumentList.size() > 0) {
saver.persist(instrumentList);
instrumentMap .values().parallelStream().forEach(x -> x.clear());
aggregator.getLock().unlock();
}
}
This solution is working fine in almost for every scenario that we tested, except sometimes we see some of the records went missing, i.e. they are not persisted at all, although they were added fine to the Map.
My questions are:
What is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does the List that is used with the ConcurrentHashMap have an issue?
Should I use the compute method of ConcurrentHashMap here (no need I think, as ReentrantLock is already doing the same job)?

The answer provided by #Slaw in the comments did the trick. We were letting the instrumentList instance escape in non-synchronized way i.e. access/operations are happening over list without any synchonization. Fixing the same by passing the copy to further methods did the trick.
Following line of code is the one where this issue was happening
recordSaver.persist(instrumentList);
instrumentList.clear();
Here we are allowing the instrumentList instance to escape in non-synchronized way i.e. it is passed to another class (recordSaver.persist) where it was to be actioned on but we are also clearing the list in very next line(in Aggregator class) and all of this is happening in non-synchronized way. List state can't be predicted in record saver... a really stupid mistake.
We fixed the issue by passing a cloned copy of instrumentList to recordSaver.persist(...) method. In this way instrumentList.clear() has no affect on list available in recordSaver for further operations.

I see, that you are using ConcurrentHashMap's parallelStream within a lock. I am not knowledgeable about Java 8+ stream support, but quick searching shows, that
ConcurrentHashMap is a complex data structure, that used to have concurrency bugs in past
Parallel streams must abide to complex and poorly documented usage restrictions
You are modifying your data within a parallel stream
Based on that information (and my gut-driven concurrency bugs detector™), I wager a guess, that removing the call to parallelStream might improve robustness of your code. In addition, as mentioned by #Slaw, you should use ordinary HashMap in place of ConcurrentHashMap if all instrumentMap usage is already guarded by lock.
Of course, since you don't post the code of recordSaver, it is possible, that it too has bugs (and not necessarily concurrency-related ones). In particular, you should make sure, that the code that reads records from persistent storage — the one, that you are using to detect loss of records — is safe, correct, and properly synchronized with rest of your system (preferably by using a robust, industry-standard SQL database).

It looks like this was an attempt at optimization where it was not needed. In that case, less is more and simpler is better. In the code below, only two concepts for concurrency are used: synchronized to ensure a shared list is properly updated and final to ensure all threads see the same value.
import java.util.ArrayList;
import java.util.List;
public class Aggregator<T> implements Runnable {
private final List<T> instruments = new ArrayList<>();
private final RecordSaver recordSaver;
private final int batchSize;
public Aggregator(RecordSaver recordSaver, int batchSize) {
super();
this.recordSaver = recordSaver;
this.batchSize = batchSize;
}
public synchronized void addAll(List<T> moreInstruments) {
instruments.addAll(moreInstruments);
if (instruments.size() >= batchSize) {
storeInstruments();
}
}
public synchronized void storeInstruments() {
if (instruments.size() > 0) {
// in case recordSaver works async
// recordSaver.persist(new ArrayList<T>(instruments));
// else just:
recordSaver.persist(instruments);
instruments.clear();
}
}
#Override
public void run() {
while (true) {
try { Thread.sleep(1L); } catch (Exception ignored) {
break;
}
storeInstruments();
}
}
class RecordSaver {
void persist(List<?> l) {}
}
}

Concurrent ArrayList

I need an ArrayList-like structure allowing just the following operations
get(int index)
add(E element)
set(int index, E element)
iterator()
Because of the iterator being used in many places, using Collections#synchronizedList would be too error-prone. The list can grow to a few thousand elements and gets used a lot, so I'm pretty sure, that CopyOnWriteArrayList will be too slow. I'll start with it to avoid premature optimizations, but I'd bet it won't work well.
Most accesses will be single-threaded reads. So I'm asking what's the proper data structure for this.
I though that wrapping the synchronizedList in something providing a synchronized iterator would do, but it won't because of the ConcurrentModificationException. Concenrning concurrent behavior, I obviously need that all changes will be visible by subsequent reads and iterators.
The iterator doesn't have to show a consistent snapshot, it may or may not see the updates via set(int index, E element) as this operation gets used only to replace an item with its updated version (containing some added information, which is irrelevant for the user of the iterator). The items are fully immutable.
I clearly stated why CopyOnWriteArrayList would not do. ConcurrentLinkedQueue is out of question as it lacks an indexed access. I need just a couple of operations rather than a fully fledged ArrayList. So unless any java concurrent list-related question is a duplicate of this question, this one is not.

In your case you can use a ReadWriteLock to access a backed List, this allows multiple Threads to read your list. Only if one Thread needs write access all reader-Thread must wait for the operation to complete. The JavaDoc make's it clear:
A ReadWriteLock maintains a pair of associated locks, one for
read-only operations and one for writing. The read lock may be held
simultaneously by multiple reader threads, so long as there are no
writers. The write lock is exclusive.
Here is a sample:
public class ConcurrentArrayList<T> {
/** use this to lock for write operations like add/remove */
private final Lock readLock;
/** use this to lock for read operations like get/iterator/contains.. */
private final Lock writeLock;
/** the underlying list*/
private final List<T> list = new ArrayList();
{
ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
readLock = rwLock.readLock();
writeLock = rwLock.writeLock();
}
public void add(T e){
writeLock.lock();
try{
list.add(e);
}finally{
writeLock.unlock();
}
}
public void get(int index){
readLock.lock();
try{
list.get(index);
}finally{
readLock.unlock();
}
}
public Iterator<T> iterator(){
readLock.lock();
try {
return new ArrayList<T>( list ).iterator();
//^ we iterate over an snapshot of our list
} finally{
readLock.unlock();
}
}

Synchronizing searches and modifications

What's a good way of allowing searches from multiple threads on a list (or other data structure), but preventing searches on the list and edits to the list on different threads from interleaving? I tried using synchronized blocks in the searching and editing methods, but that can cause unnecessary blocking when trying to run searches in multiple threads.
EDIT: The ReadWriteLock is exactly what I was looking for! Thanks.

Usually, yes ReadWriteLock is good enough.
But, if you're using Java 8 you can get a performance boost with the new StampedLock that lets you avoid the read lock. This applies when you have much more frequent reads(searches) compared with writes(edits).
private StampedLock sl = new StampedLock();
public void edit() { // write method
long stamp = sl.writeLock();
try {
doEdit();
} finally {
sl.unlockWrite(stamp);
}
}
public Object search() { // read method
long stamp = sl.tryOptimisticRead();
Object result = doSearch(); //first try without lock, search ideally should be fast
if (!sl.validate(stamp)) { //if something has modified
stamp = sl.readLock(); //acquire read lock and search again
try {
result = doSearch();
} finally {
sl.unlockRead(stamp);
}
}
return result;
}

Thread safety in multithreaded access to LinkedList

My application needs to keep an access log of requests to a certain resource and multiple threads will be recording log entries. The only pertinent piece of information is the timestamp of the request and the stats being retrieved will be how many requests occurred in the last X seconds. The method that returns the stats for a given number of seconds also needs to support multiple threads.
I was thinking of approaching the concurrency handling using the Locks framework, with which I am not the most familiar, hence this question. Here is my code:
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantLock;
public class ConcurrentRecordStats
{
private LinkedList<Long> recLog;
private final ReentrantLock lock = new ReentrantLock();
public LinkedConcurrentStats()
{
this.recLog = new LinkedList<Long>();
}
//this method will be utilized by multiple clients concurrently
public void addRecord(int wrkrID)
{
long crntTS = System.currentTimeMillis();
this.lock.lock();
this.recLog.addFirst(crntTS);
this.lock.unlock();
}
//this method will be utilized by multiple clients concurrently
public int getTrailingStats(int lastSecs)
{
long endTS = System.currentTimeMillis();
long bgnTS = endTS - (lastSecs * 1000);
int rslt = 0;
//acquire the lock only until we have read
//the first (latest) element in the list
this.lock.lock();
for(long crntRec : this.recLog)
{
//release the lock upon fetching the first element in the list
if(this.lock.isLocked())
{
this.lock.unlock();
}
if(crntRec > bgnTS)
{
rslt++;
}
else
{
break;
}
}
return rslt;
}
}
My questions are:
Will this use of ReentrantLock insure thread safety?
Is it needed to use a lock in getTrailingStats?
Can I do all this using synchronized blocks? The reason I went with locks is because I wanted to have the same lock in both R and W sections so that both writes and reading of the first element in the list (most recently added entry) is done a single thread at a time and I couldn't do that with just synchronized.
Should I use the ReentrantReadWriteLock instead?

The locks can present a major performance bottleneck. An alternative is to use a ConcurrentLinkedDeque: use offerFirst to add a new element, and use the (weakly consistent) iterator (that won't throw a ConcurrentModificationException) in place of your for-each loop. The advantage is that this will perform much better than your implementation or than the synchronizedList implementation, but the disadvantage is that the iterator is weakly consistent - thread1 might add elements to the list while thread2 is iterating through it, which means that thread2 won't count those new elements. However, this is functionally equivalent to having thread2 lock the list so that thread1 can't add to it - either way thread2 isn't counting the new elements.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.