CopyOnWriteArrayList is too slow


I have the following case:
public class Test {
private static final int MAX_NUMBER = 10_00_00;
public static void main(String[] args) {
List<Integer> list = new CopyOnWriteArrayList<>();
long start = System.nanoTime();
for(int i = 0; i < MAX_NUMBER; i++) {
list.add(i * 2);
}
long end = System.nanoTime();
System.out.println(((end - start) / Math.pow(10, 9)));
}
}
OUTPUT
6.861539857
It adds elements quite slowly compared to ArrayList, which took approximately 0.004690843 seconds. I found the reason in the documentation:
A thread-safe variant of ArrayList in which all mutative operations
(add, set, and so on) are implemented by making a fresh copy of the
underlying array.
So, my understanding is that whenever I add a new element to this list, it creates a fresh array and puts the element at the last index of that array. I found a lock in the add method, and apart from that the method really does create a new array every time.
When I increased MAX_NUMBER to 10_00_000, my program kept running and never finished (it would eventually, but I can't wait that long).
I think Collections.synchronizedList is the better choice when you want thread safety with speed. I used it and it took about 0.007673728 seconds.
My questions:
Why does it create a new array internally? Is thread safety related to this?
Why did it take so much time with MAX_NUMBER = 10_00_000 (given that it took about 6 seconds with MAX_NUMBER = 10_00_00)? Is this because every mutative operation creates a new array?
Does this mean CopyOnWriteArrayList has a performance drawback when you have a huge number of elements, and that it is better to choose something else (e.g. Collections.synchronizedList)?
Is this the reason we usually don't see CopyOnWriteArrayList in public APIs? Are there any drawbacks other than this?

CopyOnWriteArrayList is the preferred option ONLY WHEN there are very few writes and a huge number of reads (and multiple threads are accessing the list).
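To make the cost concrete, here is a minimal sketch of what a copy-on-write append does. It is not the JDK source; the class name CowCost and its methods are made up for illustration. Each append copies the whole current array, so adding n elements one at a time copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements in total. That is why ten times as many elements means roughly a hundred times as much copying.

```java
import java.util.Arrays;

// Hypothetical illustration of copy-on-write appends; not the JDK implementation.
class CowCost {
    // Appends one element the copy-on-write way: allocate a new array one slot longer.
    static Object[] append(Object[] current, Object element) {
        Object[] next = Arrays.copyOf(current, current.length + 1);
        next[current.length] = element;
        return next;
    }

    // Counts how many existing elements get copied while appending n elements one by one.
    static long copiedElements(int n) {
        Object[] array = new Object[0];
        long copied = 0;
        for (int i = 0; i < n; i++) {
            copied += array.length;          // every existing element is copied again
            array = append(array, i * 2);
        }
        return copied;
    }

    public static void main(String[] args) {
        // n(n - 1)/2 copies for n = 1000
        System.out.println(copiedElements(1_000));
    }
}
```

If the elements are known up front, collecting them in an ArrayList and passing that to the CopyOnWriteArrayList constructor (or calling addAll once) should copy the array once instead of once per element.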

Related

Updating a Shared Resource in Multithreaded Program

Can someone explain the output of the following program:
public class DataRace extends Thread {
static ArrayList<Integer> arr = new ArrayList<>();
public void run() {
Random random = new Random();
int local = random.nextInt(10) + 1;
arr.add(local);
}
public static void main(String[] args) {
DataRace t1 = new DataRace();
DataRace t2 = new DataRace();
DataRace t3 = new DataRace();
DataRace t4 = new DataRace();
t1.start();
t2.start();
t3.start();
t4.start();
try {
t1.join();
t2.join();
t3.join();
t4.join();
} catch (InterruptedException e) {
System.out.println("interrupted");
}
System.out.println(DataRace.arr);
}
}
Output:
[8, 5]
[9, 2, 2, 8]
[2]
I am having trouble understanding why the number of values in my output varies. I would expect the main thread either to wait until all threads have finished (since I join them in the try-catch block) and then print four values, one from each thread, or to print "interrupted" in case of an interruption. Neither is happening here.
If this is due to a data race, how does it come into play here?

The main problem is that multiple threads are adding to the same shared ArrayList concurrently. ArrayList is not thread-safe. From source one can read:
Note that this implementation is not synchronized.
If multiple threads
access an ArrayList instance concurrently, and at least one of the
threads modifies the list structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more elements, or explicitly resizes the backing array;
merely setting the value of an element is not a structural
modification.) This is typically accomplished by synchronizing on some
object that naturally encapsulates the list. If no such object exists,
the list should be "wrapped" using the Collections.synchronizedList
method. This is best done at creation time, to prevent accidental
unsynchronized access to the list:
In your code every time you call
arr.add(local);
inside the add method implementation, among others, a variable that keeps track of the size of the array will be updated. Below is shown the relevant part of the add method of the ArrayList:
private void add(E e, Object[] elementData, int s) {
if (s == elementData.length)
elementData = grow();
elementData[s] = e;
size = s + 1; // <--
}
where the variable field size is:
/**
* The size of the ArrayList (the number of elements it contains).
*
* @serial
*/
private int size;
Notice that the add method is not synchronized and the size field is not marked volatile. Hence, it is susceptible to race conditions.
Therefore, because you did not ensure mutual exclusion on the accesses to that ArrayList (e.g., by surrounding the calls to the ArrayList with a synchronized block), and because ArrayList does not ensure that the size field is updated atomically, each thread might or might not see the latest value of that field. Hence, threads might see stale values of size and write elements into positions that other threads have already filled. In the extreme, all threads might end up adding an element at the same position (as in your output [2]).
This race condition leads to undefined behavior, which is why:
System.out.println(DataRace.arr);
prints a different number of elements in different executions of your code.
To make the ArrayList thread-safe, or for alternatives, have a look at the following SO thread: How do I make my ArrayList Thread-Safe?, which showcases the use of Collections.synchronizedList() and CopyOnWriteArrayList, among others.
An example of ensuring mutual exclusion of the accesses to the arr structure:
public void run() {
Random random = new Random();
int local = random.nextInt(10) + 1;
synchronized (arr) {
arr.add(local);
}
}
or :
static final List<Integer> arr = Collections.synchronizedList(new ArrayList<Integer>());
public void run() {
Random random = new Random();
int local = random.nextInt(10) + 1;
arr.add(local);
}

TL;DR
ArrayList is not thread-safe. Therefore its behaviour under a race condition is undefined. Use synchronized or CopyOnWriteArrayList instead.
Longer answer
ArrayList.add ultimately calls this private method:
private void add(E e, Object[] elementData, int s) {
if (s == elementData.length)
elementData = grow();
elementData[s] = e;
size = s + 1;
}
When two threads reach this point at the "same" time, they have the same size (s); both try to add an element at the same position and update the size to s + 1, so most likely only the second write survives.
If the capacity of the ArrayList is reached and it has to grow(), a new, bigger array is created and the contents are copied, likely causing any other changes made concurrently to be lost (it is also possible that multiple threads try to grow at once).
The alternatives here are to use monitors (a.k.a. synchronized) or to use thread-safe collections such as CopyOnWriteArrayList.
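As a rough sketch of the second alternative, the four-thread experiment from the question can be rerun with a CopyOnWriteArrayList. The class name SafeAdd and the method runOnce are made up for this example, and ThreadLocalRandom is swapped in for the question's new Random().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: the same four-thread experiment, but with a thread-safe list.
class SafeAdd {
    static List<Integer> runOnce() {
        List<Integer> arr = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            // Each thread adds one random value between 1 and 10.
            threads[i] = new Thread(
                    () -> arr.add(ThreadLocalRandom.current().nextInt(10) + 1));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();   // wait for the worker; join also makes its writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return arr;
    }

    public static void main(String[] args) {
        // The list should contain four values on every run.
        System.out.println(runOnce());
    }
}
```

With only four writes the copying cost of CopyOnWriteArrayList is negligible; for write-heavy workloads Collections.synchronizedList is usually the cheaper choice.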

I think there is a lot of similar or closely related questions. For example see this.
Basically, the reason for this "unexpected" behaviour is that ArrayList is not thread-safe. You can try List<Integer> arr = new CopyOnWriteArrayList<>() and it will work as expected. This data structure is recommended when reads are frequent and writes are relatively rare. For a good explanation see What is CopyOnWriteArrayList in Java - Example Tutorial.
Another option is to use List<Integer> arr = Collections.synchronizedList(new ArrayList<>()).
You can also use Vector but it is not recommended (see here).
This article also will be useful - Vector vs ArrayList in Java.

What is a good parallel program [with Java Thread]?

I am learning threading in Java in order to write programs that run in parallel. Designing programs for parallelism is something I never had a chance to learn in my school programming classes. I know how to create threads and make them run, but I have no idea how to use them efficiently. After all, I know it is not the use of threads itself that makes a program fast, but a good parallel design. So I did an experiment to test my knowledge. However, my parallel version actually runs slower than the sequential one, and I am starting to doubt whether I really get the idea. If you could be so kind, would you mind having a look at the following program?
I made a program to fill an array in a divide-and-conquer fashion (I know Java has an Arrays.fill utility, but I just want to test my knowledge of multithreading):
public class ParalledFill
{
private static void fill(final double [] array,
final double value,
final int start,
final int size)
{
if (size > 1000)
{ // Each thread handles at most 1000 elements
Runnable task = new Runnable() { // Fork the task
public void run() {
fill(array, value, start, 1000); // Fill the first 1000 elements
}};
// Create the thread
Thread fork = new Thread(task);
fork.start();
// Fill the rest of the array
fill(array, value, start+1000, size-1000);
// Join the task
try {
fork.join();
}
catch (InterruptedException except)
{
System.err.println(except);
}
}
else
{ // The array is small enough, fill it via a normal loop
for (int i = start; i < start + size; ++i) // size is a length, so the upper bound is start + size
array[i] = value;
}
} // fill
public static void main(String [] args)
{
double [] bigArray = new double[1000*1000];
double value = 3;
fill(bigArray, value, 0, bigArray.length);
}
}
I tested this program, but it turns out to be even slower than just doing something like:
for (int i = 0; i < bigArray.length; ++i)
bigArray[i] = value;
I had my guess, it could be that java does some optimisation for filling an array using a loop which makes it much faster than my threaded version. But other than that, I feel more strongly that my way to handle threads/parallelism could be wrong. I have never designed anything using threads (always relied on compiler optimisation or OpenMP in C). Could anyone help me explain why my paralleled version isn’t faster? Was the program just too bad in terms of designing paralleled program?
Thanks,
Xing.

Unless you have multiple CPUs, or long running tasks like I/O, I'm guessing that all you're doing is time slicing between threads. If there's a single CPU that has so much work to do, adding threads doesn't decrease the work that has to be done. All you end up doing is adding overhead due to context switching.
You ought to read "Java Concurrency In Practice". Better to learn how to do things with the modern concurrency package rather than raw threads.
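As a hedged illustration of the "modern concurrency package" approach, the sketch below splits the array into one chunk per available core and submits the chunks to a fixed thread pool, instead of starting a new thread for every 1000 elements. The class name PooledFill is made up for this example. It is not guaranteed to beat a plain loop: filling an array is cheap and limited by memory bandwidth, so the pool overhead may still dominate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: fill an array with one task per core instead of one thread per 1000 elements.
class PooledFill {
    static void fill(double[] array, double value) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (array.length + workers - 1) / workers;   // ceiling division
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int from = 0; from < array.length; from += chunk) {
                final int start = from;
                final int end = Math.min(array.length, from + chunk);
                tasks.add(() -> {
                    for (int i = start; i < end; i++) {
                        array[i] = value;   // chunks are disjoint, so no locking is needed
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has finished.
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get();                 // rethrows any failure from a task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] bigArray = new double[1000 * 1000];
        fill(bigArray, 3);
        System.out.println(bigArray[0] + " " + bigArray[bigArray.length - 1]);
    }
}
```

In a real program the pool would normally be created once and reused, since creating and shutting down threads on every call is part of the overhead being measured.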

Reduced performance when using multithreading in Java

I am new to multi-threading and I have to write a program using multiple threads to increase its efficiency. My first attempt produced just the opposite result. Here is what I have written:
class ThreadImpl implements Callable<ArrayList<Integer>> {
//Bloom filter instance for one of the table
BloomFilter<Integer> bloomFilterInstance = null;
// Data member for complete data access.
ArrayList< ArrayList<UserBean> > data = null;
// Store the result of the testing
ArrayList<Integer> result = null;
int tableNo;
public ThreadImpl(BloomFilter<Integer> bloomFilterInstance,
ArrayList< ArrayList<UserBean> > data, int tableNo) {
this.bloomFilterInstance = bloomFilterInstance;
this.data = data;
result = new ArrayList<Integer>(this.data.size());
this.tableNo = tableNo;
}
public ArrayList<Integer> call() {
int[] tempResult = new int[this.data.size()];
for(int i=0; i<data.size() ;++i) {
tempResult[i] = 0;
}
ArrayList<UserBean> chkDataSet = null;
for(int i=0; i<this.data.size(); ++i) {
if(i==tableNo) {
//do nothing;
} else {
chkDataSet = new ArrayList<UserBean> (data.get(i));
for(UserBean toChk: chkDataSet) {
if(bloomFilterInstance.contains(toChk.getUserId())) {
++tempResult[i];
}
}
}
this.result.add(new Integer(tempResult[i]));
}
return result;
}
}
In the above class there are two data members, data and bloomFilterInstance, and they (the references) are passed in from the main program. So there is actually only one instance of data and bloomFilterInstance, and all the threads access it simultaneously.
The class that launches the threads is as follows (a few irrelevant details have been left out, so you can assume all variables etc. are declared):
class MultithreadedVrsion {
public static void main(String[] args) {
if(args.length > 1) {
ExecutorService es = Executors.newFixedThreadPool(noOfTables);
List<Callable<ArrayList<Integer>>> threadedBloom = new ArrayList<Callable<ArrayList<Integer>>>(noOfTables);
for (int i=0; i<noOfTables; ++i) {
threadedBloom.add(new ThreadImpl(eval.bloomFilter.get(i),
eval.data, i));
}
try {
List<Future<ArrayList<Integer>>> answers = es.invokeAll(threadedBloom);
long endTime = System.currentTimeMillis();
System.out.println("using more than one thread for bloom filters: " + (endTime - startTime) + " milliseconds");
System.out.println("**Printing the results**");
for(Future<ArrayList<Integer>> element: answers) {
ArrayList<Integer> arrInt = element.get();
for(Integer i: arrInt) {
System.out.print(i.intValue());
System.out.print("\t");
}
System.out.println("");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
I did the profiling with JProfiler, and here (http://tinypic.com/r/wh1v8p/6) is a snapshot of the CPU threads, where red shows blocked, green runnable and yellow waiting. My problem is that the threads are running one at a time, and I do not know why.
Note: I know that this is not thread-safe, but I will only be doing read operations for now and just want to analyse the raw performance gain that can be achieved; later I will implement a better version.

Can anyone please tell me what I have missed?
One possibility is that the cost of creating threads is swamping any possible performance gains from doing the computations in parallel. We can't really tell if this is a real possibility because you haven't included the relevant code in the question.
Another possibility is that you only have one processor / core available. Threads only run when there is a processor to run them. So your expectation of a speedup that is linear in the number of threads can only be achieved (in theory) if there is a free processor for each thread.
Finally, there could be memory contention due to the threads all attempting to access a shared array. If you had proper synchronization, that would potentially add further contention. (Note: I haven't tried to understand the algorithm to figure out if contention is likely in your example.)
My initial advice would be to profile your code, and see if that offers any insights.
And take a look at the way you are measuring performance to make sure that you aren't just seeing some benchmarking artefact; e.g. JVM warmup effects.

That process looks CPU bound. (no I/O, database calls, network calls, etc.) I can think of two explanations:
How many CPUs does your machine have? How many is Java allowed to use? - if the threads are competing for the same CPU, you've added coordination work and placed more demand on the same resource.
How long does the whole method take to run? For very short times, the additional work in context switching threads could overpower the actual work. The way to deal with this is to make a longer job. Also, run it a lot of times in a loop not counting the first few iterations (like a warm up, they aren't representative.)
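The warm-up advice can be sketched as a small timing helper. The class name WarmupTimer and the method averageNanos are made up for this example, and the iteration counts are arbitrary. For serious measurements a dedicated harness such as JMH is likely a better fit.

```java
// Sketch of a timing loop that discards warm-up iterations.
class WarmupTimer {
    // Runs the task (warmup + measured) times; returns the mean time of the measured runs in ns.
    static long averageNanos(Runnable task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            task.run();                 // not timed: class loading and JIT compilation happen here
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / measured;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        long avg = averageNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            sink[0] = sum;              // keep the result so the loop is not optimised away
        }, 20, 100);
        System.out.println("average ns per run: " + avg);
    }
}
```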

Several possibilities come to mind:
There is some synchronization going on inside bloomFilterInstance's implementation (which is not given).
There is a lot of memory allocation going on, e.g., what appears to be an unnecessary copy of an ArrayList when chkDataSet is created, use of new Integer instead of Integer.valueOf. You may be running into overhead costs for memory allocation.
You may be CPU-bound (if bloomFilterInstance#contains is expensive) and threads are simply blocking for CPU instead of executing.
A profiler may help reveal the actual problem.

Java concurrency - improving a copy-on-read collection

I have a multithreaded application, where a shared list has write-often, read-occasionally behaviour.
Specifically, many threads will dump data into the list, and then - later - another worker will grab a snapshot to persist to a datastore.
This is similar to the discussion over on this question.
There, the following solution is provided:
class CopyOnReadList<T> {
private final List<T> items = new ArrayList<T>();
public void add(T item) {
synchronized (items) {
// Add item while holding the lock.
items.add(item);
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>();
synchronized (items) {
// Make a copy while holding the lock.
for (T t : items) copy.add(t);
}
return copy;
}
}
However, in this scenario, (and, as I've learned from my question here), only one thread can write to the backing list at any given time.
Is there a way to allow high-concurrency writes to the backing list, which are locked only during the makeSnapshot() call?

synchronized (~20 ns) is pretty fast and even though other operations can allow concurrency, they can be slower.
private final Lock lock = new ReentrantLock();
private List<T> items = new ArrayList<T>();
public void add(T item) {
lock.lock();
// trivial lock time.
try {
// Add item while holding the lock.
items.add(item);
} finally {
lock.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(), ret;
lock.lock();
// trivial lock time.
try {
ret = items;
items = copy;
} finally {
lock.unlock();
}
return ret;
}
public static void main(String... args) {
long start = System.nanoTime();
Main<Integer> ints = new Main<>();
for (int j = 0; j < 100 * 1000; j++) {
for (int i = 0; i < 1000; i++)
ints.add(i);
ints.makeSnapshot();
}
long time = System.nanoTime() - start;
System.out.printf("The average time to add was %,d ns%n", time / 100 / 1000 / 1000);
}
prints
The average time to add was 28 ns
This means that if you are creating 30 million entries per second, you will have one thread accessing the list on average. If you are creating 60 million per second, you will have concurrency issues; however, you are likely to have far more serious resource problems by that point.
Using Lock.lock() and Lock.unlock() can be faster when there is a high contention ratio. However, I suspect your threads will be spending most of the time building the objects to be created rather than waiting to add the objects.

You could use a ConcurrentDoublyLinkedList. There is an excellent implementation here ConcurrentDoublyLinkedList.
So long as you iterate forward through the list when you make your snapshot all should be well. This implementation preserves the forward chain at all times. The backward chain is sometimes inaccurate.

First of all, you should investigate if this really is too slow. Adds to ArrayLists are O(1) in the happy case, so if the list has an appropriate initial size, CopyOnReadList.add is basically just a bounds check and an assignment to an array slot, which is pretty fast. (And please, do remember that CopyOnReadList was written to be understandable, not performant.)
If you need a non-locking operation, you can have something like this:
class ConcurrentStack<T> {
private final AtomicReference<Node<T>> stack = new AtomicReference<>();
public void add(T value){
Node<T> tail, head;
do {
tail = stack.get();
head = new Node<>(value, tail);
} while (!stack.compareAndSet(tail, head));
}
public Node<T> drain(){
// Get all elements from the stack and reset it
return stack.getAndSet(null);
}
}
class Node<T> {
// getters, setters, constructors omitted
private final T value;
private final Node<T> tail;
}
Note that while adds to this structure should deal pretty well with high contention, it comes with several drawbacks. The output from drain is quite slow to iterate over, it uses quite a lot of memory (like all linked lists), and you also get things in the opposite insertion order. (Also, it's not really tested or verified, and may actually suck in your application. But that's always the risk with using code from some random dude on the intertubes.)

Yes, there is a way. It is similar to the way ConcurrentHashMap is built, if you are familiar with it.
Instead of one list shared by all writing threads, build your own data structure from several independent lists. Each of these lists should be guarded by its own lock. The add() method should choose the list to append the current item to based on Thread.currentThread().getId() (for example, just id % listsCount). This gives good concurrency properties for add(): at best, listsCount threads will be able to write without contention.
In makeSnapshot() you just iterate over all the lists, and for each list you grab its lock and copy its contents.
This is just an idea -- there are many places to improve it.
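A minimal sketch of this striping idea might look as follows. The class name StripedList and the stripe count are assumptions made for the example, and each stripe is locked with synchronized on the stripe itself. Note that the snapshot is taken stripe by stripe, so it is not a single atomic point-in-time view, and insertion order across threads is not preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: several independent lists ("stripes"), each guarded by its own lock.
class StripedList<T> {
    private final List<List<T>> stripes = new ArrayList<>();

    StripedList(int stripeCount) {
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new ArrayList<>());
        }
    }

    public void add(T item) {
        // Pick a stripe from the calling thread's id (thread ids are positive).
        int index = (int) (Thread.currentThread().getId() % stripes.size());
        List<T> stripe = stripes.get(index);
        synchronized (stripe) {         // only threads mapped to this stripe contend here
            stripe.add(item);
        }
    }

    public List<T> makeSnapshot() {
        List<T> copy = new ArrayList<>();
        for (List<T> stripe : stripes) {
            synchronized (stripe) {     // lock one stripe at a time while copying it
                copy.addAll(stripe);
            }
        }
        return copy;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedList<Integer> list = new StripedList<>(4);
        Thread[] writers = new Thread[8];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    list.add(j);
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(list.makeSnapshot().size());
    }
}
```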

You can use a ReadWriteLock to allow multiple threads to perform add operations on the backing list in parallel, but only one thread to make the snapshot. While the snapshot is being prepared, all other add and snapshot requests are put on hold.
A ReadWriteLock maintains a pair of associated locks, one for
read-only operations and one for writing. The read lock may be held
simultaneously by multiple reader threads, so long as there are no
writers. The write lock is exclusive.
class CopyOnReadList<T> {
// free to use any concurrent data structure, ConcurrentLinkedQueue used as an example
private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<T>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock shared = rwLock.readLock();
private final Lock exclusive = rwLock.writeLock();
public void add(T item) {
shared.lock(); // multiple threads can attain the read lock
// try-finally is overkill if items.add() never throws exceptions
try {
// Add item while holding the lock.
items.add(item);
} finally {
shared.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(); // probably better idea to use a LinkedList or the ArrayList constructor with initial size
exclusive.lock(); // only one thread can attain write lock, all read locks are also blocked
// try-finally is overkill if for loop never throws exceptions
try {
// Make a copy while holding the lock.
for (T t : items) {
copy.add(t);
}
} finally {
exclusive.unlock();
}
return copy;
}
}
Edit:
The read-write lock is so named because it is based on the readers-writers problem, not on how it is used. With a read-write lock, multiple threads can hold the read lock at once, but only one thread can hold the write lock, and it holds it exclusively. In this case the problem is reversed: we want multiple threads to write (add) and only one thread to read (make the snapshot). So we let multiple threads use the read lock even though they are actually mutating, while a single thread makes the snapshot under the write lock even though the snapshot only reads. Exclusive means that while the snapshot is being made, no other add or snapshot request can be serviced by other threads.
As @PeterLawrey pointed out, the concurrent queue will serialize the writes, although the locks will be held for as short a time as possible. We are free to use any other concurrent data structure, e.g. ConcurrentDoublyLinkedList. The queue is used only as an example. The main idea is the use of read-write locks.

What is the use of AtomicReferenceArray?

When is it a good idea to use AtomicReferenceArray? Please explain with an example.

It looks functionally equivalent to AtomicReference[], while occupying a little less memory.
So it's useful when you need more than a million atomic references; I can't think of any use case.

If you had a shared array of object references, you would use an AtomicReferenceArray so that each element can be read and updated atomically (with volatile semantics and compare-and-set) by different threads.
An AtomicReference[] (array of AtomicReference) gives the same per-element guarantees. In both cases multiple threads can still update different elements simultaneously, because the atomicity is on the elements, not on the array as a whole.
More info here.

It could be useful if you have a large number of objects that are updated concurrently, for example in a large multiplayer game.
An update of reference i would follow the pattern
boolean success = false;
while (!success)
{
E previous = atomicReferenceArray.get(i);
E next = ... // compute updated object
success = atomicReferenceArray.compareAndSet(i, previous, next);
}
Depending on the circumstances this may be faster and/or easier to use than locking (synchronized).
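A runnable sketch of that retry loop, under some assumptions: the immutable Score class, the slot index and the thread counts are all made up for illustration. compareAndSet compares references, so the loop succeeds only if the slot still holds the exact object that was read.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: several threads update one slot of an AtomicReferenceArray with a CAS retry loop.
class CasUpdateDemo {
    // Hypothetical immutable value; every update creates a new instance.
    static final class Score {
        final int points;
        Score(int points) { this.points = points; }
    }

    static int run(int threadCount, int updatesPerThread) {
        AtomicReferenceArray<Score> scores = new AtomicReferenceArray<>(4);
        final int i = 2;                       // the slot all threads fight over
        scores.set(i, new Score(0));

        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < updatesPerThread; n++) {
                    boolean success = false;
                    while (!success) {
                        Score previous = scores.get(i);
                        Score next = new Score(previous.points + 1);   // compute updated object
                        // Fails (and retries) if another thread replaced the slot in between.
                        success = scores.compareAndSet(i, previous, next);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return scores.get(i).points;
    }

    public static void main(String[] args) {
        // No increment is lost, so this prints threadCount * updatesPerThread.
        System.out.println(run(4, 10_000));
    }
}
```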

One possible use case would have been ConcurrentHashMap, which uses arrays extensively internally. An array reference can be volatile, but its individual elements cannot have volatile semantics. That is one of the reasons atomic arrays came into existence.

Some notes from a C++ programmer below; please don't condemn my Java too much :)
AtomicReferenceArray lets you avoid false sharing, which occurs when multiple logical CPU cores access the same cache line while one of the threads modifies it. Invalidating and re-fetching the cache line is very expensive. Unfortunately there is no sizeof in Java, so we don't know how many bytes each AtomicReference takes, but assuming it's at least 8 bytes (the size of a pointer on 64-bit architectures), you can allocate as follows:
// a lower bound is enough
private final int sizeofAtomicReference = 8;
// good for x86/x64
private final int sizeofCacheLine = 64;
// the number of CPU cores
private final int nLogicalCores = Runtime.getRuntime().availableProcessors();
private final int refsPerCacheLine = (sizeofCacheLine + sizeofAtomicReference - 1) / sizeofAtomicReference;
private AtomicReferenceArray<Task> tasks = new AtomicReferenceArray<Task>(nLogicalCores * refsPerCacheLine);
Now if you assign a task to i-th thread via
tasks.compareAndSet(i*refsPerCacheLine, null, new Task(/*problem definition here*/));
you guarantee that the task references are assigned to different CPU cache lines. Thus there is no expensive false sharing. So the latency of passing tasks from the producer thread to the consumer threads is minimal (for Java, but not for C++/Assembly).
Bonus:
You then poll the tasks array in the worker threads like this:
// consider iWorker is the 0-based index of the logical core this thread is assigned to
final int myIndex = iWorker*refsPerCacheLine;
while(true) {
Task curTask = tasks.get(myIndex);
if(curTask == null) continue;
if(curTask.isTerminator()) {
return; // exit the thread
}
// ... Process the task here ...
// Signal the producer thread that the current worker is free
tasks.set(myIndex, null);
}

import java.util.concurrent.atomic.AtomicReferenceArray;
public class AtomicReferenceArrayExample {
AtomicReferenceArray<String> arr = new AtomicReferenceArray<String>(10);
public static void main(String... args) {
new Thread(new AtomicReferenceArrayExample().new AddThread()).start();
new Thread(new AtomicReferenceArrayExample().new AddThread()).start();
}
class AddThread implements Runnable {
@Override
public void run() {
// Sets the value at index 0
arr.set(0, "A");
// At index 0, if current reference is "A" then it changes as "B".
arr.compareAndSet(0, "A", "B");
// At index 0, if current value is "B", then it is sets as "C".
arr.weakCompareAndSet(0, "B", "C");
System.out.println(arr.get(0));
}
}
}
// Result:
// C
// C
