I am studying java.util.concurrent and trying to understand CopyOnWriteArrayList.
As I understand it, this class behaves like ArrayList but is thread-safe, and it is most useful when you have many reads and few writes.
Here is my example, written just for study purposes.
Can I use the class this way?
package Concurrency;
import java.util.concurrent.*;
class Entry {
private static int count;
private final int index = count++;
public String toString() {
return String.format(
"index:%-3d thread:%-3d",
index,
Thread.currentThread().getId());
}
}
class Reader implements Runnable {
private CopyOnWriteArrayList<Entry> list;
Reader(CopyOnWriteArrayList<Entry> list) { this.list = list; }
public void run() {
try {
while(true) {
if(!list.isEmpty())
System.out.println("-out " + list.remove(0));
TimeUnit.MILLISECONDS.sleep(100);
}
} catch (InterruptedException e) {
return;
}
}
}
class Writer implements Runnable {
private CopyOnWriteArrayList<Entry> list;
Writer(CopyOnWriteArrayList<Entry> list) { this.list = list; }
public void run() {
try {
while(true) {
Entry tmp = new Entry();
System.out.println("+in " + tmp);
list.add(tmp);
TimeUnit.MILLISECONDS.sleep(10);
}
} catch (InterruptedException e) {
return;
}
}
}
public class FourtyOne {
static final int nThreads = 7;
public static void main(String[] args) throws InterruptedException {
CopyOnWriteArrayList<Entry> list = new CopyOnWriteArrayList<>();
ExecutorService exec = Executors.newFixedThreadPool(nThreads);
exec.submit(new Writer(list));
for(int i = 0; i < nThreads; i++)
exec.submit(new Reader(list));
TimeUnit.SECONDS.sleep(1);
exec.shutdownNow();
}
}
Please note that in your example the single writer is writing at 10x the speed of any given reader, causing a lot of copies to be made. Also note that your readers are performing a write operation (remove()) on the list as well.
In this situation you are writing to the list at a very high rate, which causes severe performance problems because the whole backing array is copied every time you update the list.
CopyOnWriteArrayList is only worth using when synchronization overhead is an issue and the ratio of reads to structural modifications is high. The cost of copying the whole array on each write is amortized by the performance gains seen when one or more readers access the list at the same time. This contrasts with a traditional synchronized list, where every access (read or write) is guarded by a mutex so that only one thread can operate on the list at once.
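To make the copy-on-write behaviour concrete, here is a minimal single-threaded sketch (the class and method names are made up for illustration). An iterator keeps reading the array that existed when it was created, so a later add() neither disturbs it nor becomes visible to it:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIterationDemo {
    // Returns how many elements an iterator sees when the list is
    // modified after the iterator has been created.
    static int elementsSeenBySnapshot() {
        List<Integer> list = new CopyOnWriteArrayList<Integer>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = list.iterator(); // pins the current backing array
        list.add(4);                            // writes go to a fresh copy of the array
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(elementsSeenBySnapshot()); // prints 3
    }
}
```

Readers never take a lock here; the price is one full array copy per write, which is why the read/write ratio matters so much.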
If a simple thread-safe list is all you need, consider the synchronized list provided by Collections.synchronizedList().
Please also note:
if(!list.isEmpty()){
System.out.println("-out " + list.remove(0));
}
is not safe, as there is no guarantee the list is still non-empty once the if statement has been evaluated: another reader may remove the last element in between. To get a consistent result, you'd need to either call list.remove(0) directly and handle the IndexOutOfBoundsException it throws on an empty list, or wrap the whole segment in a synchronized block (defeating the purpose of using a thread-safe structure).
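One way to avoid the check-then-act gap is to use a structure whose "take" operation is a single atomic step. The sketch below swaps in a ConcurrentLinkedQueue, which is not what the question uses, and the class and method names are hypothetical. poll() either removes the head or returns null, so there is no window between the emptiness check and the removal:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicTakeDemo {
    // Drains `items` elements using `threads` consumers and returns
    // how many elements were taken in total.
    static int drain(int threads, int items) {
        final Queue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
        for (int i = 0; i < items; i++) {
            queue.add(i);
        }
        final AtomicInteger taken = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // poll() checks and removes in one step; null means "was empty".
                while (queue.poll() != null) {
                    taken.incrementAndGet();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return taken.get();
    }

    public static void main(String[] args) {
        System.out.println(drain(4, 1000)); // prints 1000
    }
}
```

Every element is taken exactly once, no matter how the consumers interleave.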
The remove() call, being a structural modification, should also be replaced by a method like get() so that no structural modifications happen while the data is being read.
In all, I believe CopyOnWriteArrayList should only be used in a very specific way, and only when traditional synchronization becomes unacceptably slow. Whilst your example may work fine on your own computer, scaling up the volume of access will make the GC do far too much work to maintain the heap.
Related
I am currently using a ConcurrentLinkedQueue so that I get natural FIFO ordering in a thread-safe application. I have a requirement to log the size of the queue every minute. Given that this collection does not guarantee an accurate size, and that computing the size costs O(N), is there an alternative bounded, non-blocking concurrent queue where obtaining the size is cheap and the add/remove operations are not expensive either?
If there is no such collection, do I need to use a LinkedList with locks?
If you really (REALLY) need to log a correct, current size of the queue you are dealing with, you need to block. There is simply no other way. You might think that maintaining a separate LongAdder field would help, perhaps by writing your own interface as a wrapper around ConcurrentLinkedQueue, something like:
interface KnownSizeQueue<T> {
T poll();
long size();
}
And an implementation:
static class ConcurrentKnownSizeQueue<T> implements KnownSizeQueue<T> {
private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
private final LongAdder currentSize = new LongAdder();
@Override
public T poll() {
T result = queue.poll();
if(result != null){
currentSize.decrement();
}
return result;
}
@Override
public long size() {
return currentSize.sum();
}
}
I encourage you to add one more method, such as remove, to the interface and try to reason about the code. You will very shortly realize that such an implementation can still give you a wrong result, because the queue and the counter are not updated in one atomic step. So, do not do it.
The only reliable way to get the size, if you really need it, is to block for each operation. This comes at a high price, because ConcurrentLinkedQueue is documented as:
This implementation employs an efficient non-blocking...
You will lose those properties, but if an exact size is a hard requirement and that cost is acceptable, you could write your own:
static class ParallelKnownSizeQueue<T> implements KnownSizeQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    @Override
    public T poll() {
        lock.lock(); // acquire before try, so finally never unlocks a lock we do not hold
        try {
            return queue.poll();
        } finally {
            lock.unlock();
        }
    }
    @Override
    public long size() {
        lock.lock();
        try {
            return queue.size();
        } finally {
            lock.unlock();
        }
    }
}
Or, of course, you can use an already existing structure, like LinkedBlockingDeque or ArrayBlockingQueue, etc - depending on what you need.
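As a rough illustration of the "existing structure" route, here is a sketch using ArrayBlockingQueue (the class name BoundedQueueDemo is made up). offer() and poll() return immediately instead of waiting, and, as far as I know from the OpenJDK sources rather than from a documented guarantee, size() reads a counter kept under the queue's lock rather than walking the elements:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {
    // Exercises a bounded queue of capacity 2 and summarizes what happened.
    static String describe() {
        BlockingQueue<String> queue = new ArrayBlockingQueue<String>(2);
        boolean first = queue.offer("a");
        boolean second = queue.offer("b");
        boolean third = queue.offer("c"); // queue is full: returns false instead of blocking
        int sizeWhenFull = queue.size();
        String head = queue.poll();       // FIFO order: removes "a"
        return first + "," + second + "," + third
                + " size=" + sizeWhenFull
                + " head=" + head
                + " sizeAfterPoll=" + queue.size();
    }

    public static void main(String[] args) {
        // prints: true,true,false size=2 head=a sizeAfterPoll=1
        System.out.println(describe());
    }
}
```

Note that this queue is lock-based, so it gives up the lock-free property of ConcurrentLinkedQueue in exchange for a size you can trust at the moment it is read.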
Sorry if this is a dumb question, but could someone explain to me what could happen in a scenario like this?
List<Integer> scores = new ArrayList<>();
scores = Collections.synchronizedList(scores);
public void add(int element) {
...
scores.add(element);
...
}
public String retrieve(int element) {
...
for (Integer e : scores) ....
....
return something;
}
Let's assume that this class is a singleton and that scores is global. Multiple threads can add and retrieve scores at the same time.
In this scenario, if one thread starts the for loop while another thread is adding an element to (or removing one from) the list, will it throw a ConcurrentModificationException?
Thank you
Bad things will happen, given the way you've written your example.
Your retrieve() method doesn't have its loop in a synchronized block, and both of your methods are accessing scores directly, instead of using the List returned by the Collections.synchronizedList() method.
If you take a look at the API for Collections.synchronizedList(), you'll notice that it says
In order to guarantee serial access, it is critical that all access to the backing list is accomplished through the returned list.
It is imperative that the user manually synchronize on the returned list when iterating over it:
Failure to follow this advice may result in non-deterministic behavior.
So you might get a ConcurrentModificationException, or something else weird might happen.
Edit
Even if all your access is via the synchronized List, you can still end up getting a ConcurrentModificationException thrown at you if you modify the List while iterating over it in another thread. That's why the Collections.synchronizedList() documentation insists that you manually wrap your iteration inside a block that is synchronized on the List it returns.
The API for ConcurrentModificationException says
For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it. In general, the results of the iteration are undefined under these circumstances. Some Iterator implementations (including those of all the general purpose collection implementations provided by the JRE) may choose to throw this exception if this behavior is detected. Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather that risking arbitrary, non-deterministic behavior at an undetermined time in the future.
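The fail-fast behaviour quoted above does not even need a second thread to show up. In this small sketch (names are illustrative only), the list is modified inside its own for-each loop, and the iterator notices on its next step:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {
    // Returns true if modifying the list during iteration was detected.
    static boolean modifyingWhileIteratingFails() {
        List<Integer> scores =
                Collections.synchronizedList(new ArrayList<Integer>(Arrays.asList(1, 2, 3)));
        try {
            for (Integer score : scores) {
                scores.add(score + 100); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the iterator's next() after the add
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyingWhileIteratingFails()); // prints true
    }
}
```

With two threads the same check fires, only at an unpredictable moment, which is why the iteration has to be guarded.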
Your add method won't need to be changed, but your retrieve() method should look something like:
public String retrieve(int element) {
// stuff
synchronized (scores) { // prevent scores from being modified while iterating
for (Integer e : scores) {
// looping stuff
}
}
// more stuff
return something;
}
Sample Program
Here's a small sample program which demonstrates the behavior of safe vs unsafe access:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Scratch {
private List<Integer> scores = Collections.synchronizedList(new ArrayList<Integer>());
public static void main(String[] args) throws Exception {
final Scratch s = new Scratch();
s.scores.add(1);
s.scores.add(2);
s.scores.add(3);
// keep adding things to the list forever
new Thread(new Runnable() {
@Override
public void run() {
try {
int i=100;
while (true) {
Thread.sleep(100);
s.scores.add(i++);
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}).start();
System.out.println("This will run fine");
s.safeLoop();
System.out.println("This will cause a ConcurrentModificationException");
s.unsafeLoop();
}
public void safeLoop() throws InterruptedException {
synchronized (scores) {
for (int i : scores) {
System.out.println("i="+i);
Thread.sleep(100);
}
}
}
public void unsafeLoop() throws InterruptedException {
for (int i : scores) {
System.out.println("i="+i);
Thread.sleep(100);
}
}
}
static final Collection<String> FILES = new ArrayList<String>(1);
for (final String s : list) {
new Thread(new Runnable() {
public void run() {
List<String> file2List = getFileAsList(s);
FILES.addAll(file2List);
}
}).start();
}
This collection gets very big, but the code works perfectly. I thought I would get a ConcurrentModificationException, because the FILES list has to grow, but it has never happened.
Is this code 100% thread-safe?
The code takes about 12 seconds to load, and a few threads are adding elements at the same time.
I also tried creating all the threads first and starting them later, but I got the same results (both time and correctness).
No, the code is not thread-safe. It may or may not throw a ConcurrentModificationException, but you may end up with elements missing or elements being added twice. Changing the list to be a
Collection<String> FILES = Collections.synchronizedList(new ArrayList<String>());
might already be a solution, assuming that the most time-consuming part is the getFileAsList method (and not adding the resulting elements to the FILES list).
BTW, an aside: When getFileAsList is accessing the hard-drive, you should perform detailed performance tests. Multi-threaded hard-drive accesses may be slower than a single-threaded one, because the hard drive head might have to jump around the drive and not be able to read data as a contiguous block.
EDIT: In response to the comment: This program will "very likely" produce ArrayIndexOutOfBoundsExceptions from time to time:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
public class ConcurrentListTest
{
public static void main(String[] args) throws InterruptedException
{
for (int i=0; i<1000; i++)
{
runTest();
}
}
private static void runTest() throws InterruptedException
{
final Collection<String> FILES = new ArrayList<String>(1);
// With this, it will always work:
// final Collection<String> FILES = Collections.synchronizedList(new ArrayList<String>(1));
List<String> list = Arrays.asList("A", "B", "C", "D");
List<Thread> threads = new ArrayList<Thread>();
for (final String s : list)
{
Thread thread = new Thread(new Runnable()
{
@Override
public void run()
{
List<String> file2List = getFileAsList(s);
FILES.addAll(file2List);
}
});
threads.add(thread);
thread.start();
}
for (Thread thread : threads)
{
thread.join();
}
System.out.println(FILES.size());
}
private static List<String> getFileAsList(String s)
{
List<String> list = Collections.nCopies(10000, s);
return list;
}
}
But of course, there is no strict guarantee that it will. If it does not create such an exception for you, you should consider playing the lottery, because you must be remarkably lucky.
It is not thread-safe at all, even if you only add elements. Even when you only ever grow FILES, concurrent access is a problem if the collection is not thread-safe.
When the collection exceeds its capacity, its contents have to be copied to a new, larger array, and that is when concurrent access can go wrong: while one thread is doing the copy, another may be trying to add an element to the same collection. The resizing is done inside the ArrayList implementation, which is not thread-safe at all.
Look at this code and assume that more than one thread executes it when the collection is at full capacity.
private int size; //it is not thread safe value!
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e; //size is not volatile value it might be cached by thread
return true;
}
public void add(int index, E element) {
rangeCheckForAdd(index);
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}
public void ensureCapacity(int minCapacity) {
if (minCapacity > 0)
ensureCapacityInternal(minCapacity);
}
private void ensureCapacityInternal(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
When the collection needs to exceed its capacity and copy the existing elements to a new internal array, you can get real problems with concurrent access. In addition, size is not thread-safe: it is not volatile and size++ is not atomic, so two threads can read the same value and overwrite each other's element. This is why the list might not be thread-safe even if you only use the add operation on a non-synchronized collection.
You should consider using FILES = new CopyOnWriteArrayList(); or FILES = Collections.synchronizedList(new ArrayList()); where the add operation is thread-safe.
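For comparison, here is a minimal sketch of the synchronized variant (class and method names are invented for the example). Several threads append to the same list, and because every add() runs under the wrapper's lock, no element is lost during a resize:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SafeAppendDemo {
    // Starts `threads` writers that each add `perThread` elements
    // and returns the final size of the shared list.
    static int appendConcurrently(int threads, final int perThread) {
        final List<Integer> files = Collections.synchronizedList(new ArrayList<Integer>(1));
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    files.add(i); // each add is performed under the wrapper's lock
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return files.size();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 10000)); // prints 40000
    }
}
```

Replacing the wrapper with a plain ArrayList is what makes the count unreliable or triggers the exceptions described above.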
Yes, you need a thread-safe collection; otherwise you risk a ConcurrentModificationException or lost elements.
Here are some ways to create a thread-safe collection in Java:
Collections.newSetFromMap(new ConcurrentHashMap<>()); // a concurrent Set, not a List: no duplicates, no ordering
Collections.synchronizedList(new ArrayList<Object>());
new CopyOnWriteArrayList<>();
is this code 100% threadsafe ?
This code is 0% threadsafe, even by the weakest standard of interleaved operation. You are mutating shared state under a data race.
You most definitely need some kind of concurrency control; it is not obvious whether a concurrent collection is the right choice, though. A simple synchronizedList might fit the bill even better, because you have a lot of processing followed by a quick transfer to the accumulator list. The lock will not be contended much.
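A different option, not the one proposed above, is to avoid the shared accumulator altogether: each task returns its own list and only the calling thread merges them. The sketch below assumes a fixed pool of 4 threads and uses a stand-in getFileAsList like the one in the test program earlier; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MergeResultsDemo {
    // Hypothetical stand-in for the slow per-file work.
    static List<String> getFileAsList(String s) {
        return Collections.nCopies(10000, s);
    }

    static List<String> loadAll(List<String> names) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> pending = new ArrayList<Future<List<String>>>();
            for (final String name : names) {
                Callable<List<String>> task = () -> getFileAsList(name);
                pending.add(pool.submit(task));
            }
            List<String> merged = new ArrayList<String>();
            for (Future<List<String>> f : pending) {
                merged.addAll(f.get()); // only the calling thread touches `merged`
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAll(Arrays.asList("A", "B", "C", "D")).size()); // prints 40000
    }
}
```

Because Future.get() also waits for each task, the merged list is complete as soon as loadAll returns.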
I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { //1000 of the same type of pages that need to parse
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itemList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
@Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
@Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing, the final list (itemList) contains only 980 elements instead of 1000, yet all 1000 items are printed to the console. That is, it looks as if some threads did not call addItem from handleText.
I already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList, and Vector. I made addItem synchronized and wrapped its call in a synchronized block. All this only changes the number of elements a little; the final thousand is never reached.
I also tried parsing a smaller number of pages (ten). As a result the list is empty, but all 10 items appear in the console.
If I remove multi-threading, everything works fine but, of course, slowly. That's not good.
If I decrease the number of concurrent threads, the number of items in the list gets closer to the desired 1000; if I increase it, the number gets a little further from 1000. So I think there is contention for writing to the list. But then why is synchronization not working?
What's the problem?
After your parse() call returns, all of your 1000 threads have been started, but it is not guaranteed that they have finished. In fact, they haven't, and that is the problem you see. I would strongly recommend not writing this yourself but using the tools the SDK provides for this kind of job.
The Thread Pools documentation and ThreadPoolExecutor are, for example, a good starting point. Again, don't implement this yourself unless you are absolutely sure you have to, because writing such multi-threading code is pure pain.
Your code should look something like this:
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
f.get();
}
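Filled in with a placeholder task, the outline above might look like the following sketch. parsePage is a hypothetical stand-in for the real download-and-parse step, and the class name is made up; the point is only that every Future is waited on before the list is returned:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledParseDemo {
    // Hypothetical stand-in for fetching and parsing page i.
    static String parsePage(int i) {
        return "item-" + i;
    }

    static List<String> parseAll(int pages) {
        final List<String> items = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService executor = Executors.newFixedThreadPool(20); // at most 20 workers
        try {
            List<Future<?>> futures = new ArrayList<Future<?>>(pages);
            for (int i = 1; i <= pages; i++) {
                final int page = i;
                futures.add(executor.submit(new Runnable() {
                    @Override
                    public void run() {
                        items.add(parsePage(page));
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // blocks until that task has finished
            }
            return items;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAll(1000).size()); // prints 1000
    }
}
```

The pool also replaces the hand-rolled threadCount bookkeeping, since it never runs more than 20 tasks at once.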
There is no problem with the synchronization in your code; it works as you have coded it. The problem is with the last batch. All earlier iterations work properly, but during the last batch, from 980 to 1000, the threads are created and the main thread returns the list without waiting for them to complete. Therefore you get some number between 980 and 1000 if you are working with 20 threads at a time.
You can try adding Thread.sleep(50) before returning the list. In that case the main thread will wait for some time, and the other threads may have finished processing by then, though that is not guaranteed.
Or you can use one of Java's synchronization APIs. Instead of sleeping, use a CountDownLatch, which lets you wait for the threads to complete their processing before continuing.
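A minimal sketch of the CountDownLatch approach, with a trivial placeholder in place of the real parsing work and invented names throughout: each worker counts down when it is done, and the caller only reads the list after await() returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    // Starts `workers` threads, waits for all of them, and returns
    // how many results were collected.
    static int runWorkers(int workers) {
        final List<Integer> results = Collections.synchronizedList(new ArrayList<Integer>());
        final CountDownLatch done = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    results.add(id); // placeholder for the real work
                } finally {
                    done.countDown(); // always signal, even if the work failed
                }
            }).start();
        }
        try {
            done.await(); // returns once every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(50)); // prints 50
    }
}
```

Unlike a fixed sleep, the latch waits exactly as long as the slowest worker needs.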
Is there anything wrong with the thread safety of this Java code? Threads 1-10 add numbers via sample.add(), and threads 11-20 call removeAndDouble() and print the results to stdout. I recall someone saying that assigning item the way I do in removeAndDouble(), and then using it outside of the synchronized block, may not be thread-safe, because the compiler may reorder the instructions so that they occur out of sequence. Is that the case here? Is my removeAndDouble() method unsafe?
Is there anything else wrong with this code from a concurrency perspective? I am trying to get a better understanding of concurrency and the memory model in Java (1.6 upwards).
import java.util.*;
import java.util.concurrent.*;
public class Sample {
private final List<Integer> list = new ArrayList<Integer>();
public void add(Integer o) {
synchronized (list) {
list.add(o);
list.notify();
}
}
public void waitUntilEmpty() {
synchronized (list) {
while (!list.isEmpty()) {
try {
list.wait(10000);
} catch (InterruptedException ex) { }
}
}
}
public void waitUntilNotEmpty() {
synchronized (list) {
while (list.isEmpty()) {
try {
list.wait(10000);
} catch (InterruptedException ex) { }
}
}
}
public Integer removeAndDouble() {
// item declared outside synchronized block
Integer item;
synchronized (list) {
waitUntilNotEmpty();
item = list.remove(0);
}
// Would this ever be anything but that from list.remove(0)?
return Integer.valueOf(item.intValue() * 2);
}
public static void main(String[] args) {
final Sample sample = new Sample();
for (int i = 0; i < 10; i++) {
Thread t = new Thread() {
public void run() {
while (true) {
System.out.println(getName()+" Found: " + sample.removeAndDouble());
}
}
};
t.setName("Consumer-"+i);
t.setDaemon(true);
t.start();
}
final ExecutorService producers = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
final int j = i * 10000;
Thread t = new Thread() {
public void run() {
for (int c = 0; c < 1000; c++) {
sample.add(j + c);
}
}
};
t.setName("Producer-"+i);
t.setDaemon(false);
producers.execute(t);
}
producers.shutdown();
try {
producers.awaitTermination(600, TimeUnit.SECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
sample.waitUntilEmpty();
System.out.println("Done.");
}
}
It looks thread safe to me. Here is my reasoning.
Every time you access list, you do so while synchronized. This is great. Even though you pull an element of the list out into item, that item is not accessed by multiple threads.
As long as you only access list while synchronized, you should be good (in your current design.)
Your synchronization is fine, and will not result in any out-of-order execution problems.
However, I do notice a few issues.
First, your waitUntilEmpty method would be much more timely if you add a list.notifyAll() after the list.remove(0) in removeAndDouble. This will eliminate an up-to 10 second delay in your wait(10000).
Second, your list.notify in add(Integer) should be a notifyAll, because notify only wakes one thread, and it may wake a thread that is waiting inside waitUntilEmpty instead of waitUntilNotEmpty.
Third, none of the above is terminal to your application's liveness, because you used bounded waits, but if you make the two above changes, your application will have better threaded performance (waitUntilEmpty) and the bounded waits become unnecessary and can become plain old no-arg waits.
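Put together, the three suggestions might look like the sketch below. The class name NotifyingSample and the demo() helper are made up for illustration; the demo uses two consumers that each take a fixed number of items so that it terminates on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class NotifyingSample {
    private final List<Integer> list = new ArrayList<Integer>();

    public void add(Integer o) {
        synchronized (list) {
            list.add(o);
            list.notifyAll(); // wake every waiter; each re-checks its own condition
        }
    }

    public Integer removeAndDouble() throws InterruptedException {
        Integer item;
        synchronized (list) {
            while (list.isEmpty()) {
                list.wait(); // unbounded wait: add() always notifies
            }
            item = list.remove(0);
            list.notifyAll(); // lets waitUntilEmpty() notice promptly
        }
        return item * 2;
    }

    public void waitUntilEmpty() throws InterruptedException {
        synchronized (list) {
            while (!list.isEmpty()) {
                list.wait();
            }
        }
    }

    // Two consumers take 50 items each while the caller produces 0..99;
    // returns the sum of the doubled values.
    static long demo() {
        final NotifyingSample sample = new NotifyingSample();
        final AtomicLong sum = new AtomicLong();
        Thread[] consumers = new Thread[2];
        for (int t = 0; t < consumers.length; t++) {
            consumers[t] = new Thread(() -> {
                try {
                    for (int n = 0; n < 50; n++) {
                        sum.addAndGet(sample.removeAndDouble());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers[t].start();
        }
        for (int i = 0; i < 100; i++) {
            sample.add(i);
        }
        try {
            sample.waitUntilEmpty();
            for (Thread c : consumers) {
                c.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 9900
    }
}
```

The wait loops re-check their condition after every wake-up, which is what makes notifyAll safe even though it wakes threads that have nothing to do.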
Your code as-is is in fact thread-safe. The reasoning behind this has two parts.
The first is mutual exclusion. Your synchronization correctly ensures that only one thread at a time will modify the collection.
The second has to do with your concern about compiler reordering. You're worried that the compiler could reorder the assignment in a way that would not be thread-safe. You don't have to worry about that in this case. Synchronizing on the list creates a happens-before relationship: all removes from the list happen-before the write to item, so the compiler cannot reorder the write to item in that method. item is also a local variable, so no other thread can observe it.
Your code is thread-safe, but not concurrent (as in parallel). As everything is accessed under a single mutual exclusion lock, you are serialising all access, in effect access to the structure is single-threaded.
If you require the functionality as described in your production code, the java.util.concurrent package already provides a BlockingQueue with (fixed size) array and (growable) linked list based implementations. These are very interesting to study for implementation ideas at the very least.
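As a rough sketch of what that could look like, here is the remove-and-double idea on top of a LinkedBlockingQueue. The class name and the single-consumer demo() are assumptions made for the example; put() and take() do the waiting and notification that the hand-written version implements with wait/notify:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BlockingQueueSample {
    // One consumer doubles 100 values produced by the calling thread;
    // returns the sum of the doubled values.
    static long demo() {
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<Integer>();
        final AtomicLong sum = new AtomicLong();
        Thread consumer = new Thread(() -> {
            try {
                for (int n = 0; n < 100; n++) {
                    sum.addAndGet(queue.take() * 2); // take() blocks until an element arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < 100; i++) {
                queue.put(i); // never blocks here: the queue is unbounded
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 9900
    }
}
```

Passing a capacity to the LinkedBlockingQueue constructor, or using ArrayBlockingQueue, would additionally make producers wait when consumers fall behind.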