Data race in Java ArrayList class - java

I was reading about CopyOnWriteArrayList and was wondering how I can demonstrate a data race in the ArrayList class. Basically, I'm trying to simulate a situation where ArrayList fails, so that it becomes necessary to use CopyOnWriteArrayList. Any suggestions on how to simulate this?

A race is when two (or more) threads try to operate on shared data, and the final output depends on the order in which the data is accessed (and that order is nondeterministic).
From Wikipedia:
A race condition or race hazard is a flaw in an electronic system or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first.
For example:
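import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Test {
    private static List<String> list = new CopyOnWriteArrayList<String>();

    public static void main(String[] args) throws Exception {
        ExecutorService e = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 5; i++) {
            e.execute(new WriterTask());
        }
        // Without shutdown(), awaitTermination() just times out and the pool threads keep the JVM alive.
        e.shutdown();
        e.awaitTermination(20, TimeUnit.SECONDS);
        System.out.println(list.size()); // 125000 if no addition was lost
    }

    static class WriterTask implements Runnable {
        @Override
        public void run() {
            for (int i = 0; i < 25000; i++) {
                list.add("a");
            }
        }
    }
}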
This, however, fails when using ArrayList, typically with an ArrayIndexOutOfBoundsException. That's because before insertion, ensureCapacity(..) is called to make sure the internal array can hold the new element, and that check is not atomic with the write that follows it. Here's one way it can go wrong:
the first thread calls add(..), which in turn calls ensureCapacity(size + 1); the internal array still has one free slot, so it is not grown
before the first thread has actually incremented size, the second thread also calls ensureCapacity(size + 1)
because both threads have read the same value of size, the second thread also concludes that the array is big enough and does not grow it
the first thread stores its element in the last free slot and increments size, so the array is now full
the second thread then assigns its element to elementData[size++] using the updated size, which is one past the end of the array. It fails because the internal array was never expanded for its element, due to the race condition.
This happens because two threads have tried to add items to the same structure at the same time. The same unsynchronized size++ can also let both threads write to the same slot, in which case the addition of one of them overwrites the addition of the other (i.e. the first one is lost).
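The interleaving described above can be replayed deterministically in a single thread. The sketch below uses a hypothetical, simplified list with the same check-then-act shape as ArrayList.add(..); it is not the JDK source. The class name AddRaceTrace, the capacity of 2 and the one free slot are assumptions chosen to hit the failure on the first try:

```java
import java.util.Arrays;

public class AddRaceTrace {

    // Hypothetical, simplified list: NOT the JDK source, only the same check-then-act shape.
    private Object[] elementData = new Object[2]; // capacity 2
    private int size = 1;                          // one free slot left

    // Grows the array only when the requested capacity exceeds the current length.
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > elementData.length) {
            elementData = Arrays.copyOf(elementData, elementData.length * 2);
        }
    }

    // Replays, step by step, an interleaving that two real threads could produce.
    public static String replay() {
        AddRaceTrace list = new AddRaceTrace();
        list.ensureCapacity(list.size + 1);  // thread A: needs 2 slots, has 2, no growth
        list.ensureCapacity(list.size + 1);  // thread B: reads the same size, no growth either
        list.elementData[list.size++] = "A"; // thread A: fills slot 1, size becomes 2
        try {
            list.elementData[list.size++] = "B"; // thread B: slot 2 does not exist
            return "no failure";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(replay()); // ArrayIndexOutOfBoundsException
    }
}
```

With real threads this ordering happens only occasionally, which is why the demonstration above needs many additions from several threads to trigger it.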
Another benefit of CopyOnWriteArrayList shows up when:
multiple threads write to the ArrayList
a thread iterates the ArrayList. It will very likely get a ConcurrentModificationException
Here's how to demonstrate it:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Test {
    private static List<String> list = new ArrayList<String>();

    public static void main(String[] args) throws Exception {
        ExecutorService e = Executors.newFixedThreadPool(2);
        e.execute(new WriterTask());
        e.execute(new ReaderTask());
    }

    static class ReaderTask implements Runnable {
        @Override
        public void run() {
            while (true) {
                for (String s : list) {
                    System.out.println(s);
                }
            }
        }
    }

    static class WriterTask implements Runnable {
        @Override
        public void run() {
            while (true) {
                list.add("a");
            }
        }
    }
}
If you run this program multiple times, you will often get a ConcurrentModificationException before you get an OutOfMemoryError.
If you replace it with CopyOnWriteArrayList, you don't get the exception (but the program is very slow)
Note that this is just a demonstration - the benefit of CopyOnWriteArrayList is when the number of reads vastly outnumbers the number of writes.

Example:
for (int i = 0; i < array.size(); ++i) {
Element elm = array.get(i);
doSomethingWith(elm);
}
If another thread calls array.clear() after this thread has compared i with array.size() but before it calls array.get(i), you get an IndexOutOfBoundsException (for example an ArrayIndexOutOfBoundsException).
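One way to make such a loop safe, assuming the list is created with Collections.synchronizedList, is to hold the list's own lock for the whole loop; the Collections.synchronizedList Javadoc asks callers to synchronize on the returned list when traversing it for the same reason. In this sketch (GuardedLoop and its method names are hypothetical), the clearing thread runs either entirely before or entirely after the loop, never in between:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GuardedLoop {

    // The whole check-then-get loop runs while holding the list's lock, so clear()
    // from another thread cannot slip in between size() and get(i).
    static int sumUnderLock(List<Integer> list) {
        int sum = 0;
        synchronized (list) { // a synchronizedList wrapper uses itself as its lock
            for (int i = 0; i < list.size(); ++i) {
                sum += list.get(i);
            }
        }
        return sum;
    }

    static int run() throws InterruptedException {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 1; i <= 100; i++) {
            list.add(i);
        }
        Thread clearer = new Thread(list::clear); // clear() takes the same lock
        clearer.start();
        // Either 5050 (loop ran before clear) or 0 (loop ran after), never an exception.
        int sum = sumUnderLock(list);
        clearer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```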

Two threads, one adding elements to the ArrayList and one removing them: a data race can happen here.

Related

How to show unsynchronicity of arraylist java?

We know that ArrayList is not thread-safe and Vector is. I wanted to write a program to show that operations are performed synchronously on a Vector and not on an ArrayList. The only problem I am facing is: how? What kind of operation should I use?
For example, if we add a value to either list, the program simply adds the value.
I tried to write one, but realized that the synchronicity of my program depends on the variable j, not on ArrayList or Vector.
import java.util.ArrayList;
import java.util.Random;

public class ArrayDemo implements Runnable {
    private static ArrayList<Integer> al = new ArrayList<Integer>();
    Random random = new Random();
    int j = 0; // shared by both threads, because both run the same ArrayDemo instance

    public void run() {
        while (j < 10) {
            int i = random.nextInt(10);
            al.add(i);
            System.out.println(i + " " + Thread.currentThread().getName());
            j++;
            //System.out.println(al.remove(0));
        }
    }

    public static void main(String[] args) {
        ArrayDemo ad = new ArrayDemo();
        Thread t = new Thread(ad);
        Thread t1 = new Thread(ad);
        t.start();
        t1.start();
    }
}
Small test program:
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class Test extends Thread {
    public static void main(String[] args) throws Exception {
        test(new Vector<>());
        test(new ArrayList<>());
        test(Collections.synchronizedList(new ArrayList<>()));
        test(new CopyOnWriteArrayList<>());
    }

    // Takes a Collection rather than a List, so the same harness also accepts Deque implementations.
    private static void test(final Collection<Integer> list) throws Exception {
        System.gc();
        long start = System.currentTimeMillis();
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Test(list);
        for (Thread thread : threads)
            thread.start();
        for (Thread thread : threads)
            thread.join();
        long end = System.currentTimeMillis();
        System.out.println(list.size() + " in " + (end - start) + "ms using " + list.getClass().getSimpleName());
    }

    private final Collection<Integer> list;

    Test(Collection<Integer> list) {
        this.list = list;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < 10000; i++)
                this.list.add(i);
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}
Sample Output
100000 in 16ms using Vector
java.lang.ArrayIndexOutOfBoundsException: 466
at java.util.ArrayList.add(ArrayList.java:459)
at Test.run(Test.java:36)
java.lang.ArrayIndexOutOfBoundsException: 465
at java.util.ArrayList.add(ArrayList.java:459)
at Test.run(Test.java:36)
java.lang.ArrayIndexOutOfBoundsException: 10
at java.util.ArrayList.add(ArrayList.java:459)
at Test.run(Test.java:36)
32507 in 15ms using ArrayList
100000 in 16ms using SynchronizedRandomAccessList
100000 in 3073ms using CopyOnWriteArrayList
As you can see, with Vector it completes normally and returns 100000, which is the expected size after adding 10000 values in 10 parallel threads.
With ArrayList you see two different failures:
Three of the threads die with ArrayIndexOutOfBoundsException in the call to add().
Even if the three failing threads died immediately, before adding anything, the other 7 threads should still have added 10000 values each, for a total of 70000 values, but the list only contains 32507 values, so many of the added values got lost.
The third test, using Collections.synchronizedList(), works like Vector.
The fourth test, using the concurrent CopyOnWriteArrayList, also generates the right result, but much slower, due to excessive copying. It will however be faster than synchronized access if the list is smaller and changes rarely, but is read often.
It is especially good if you need to iterate the list, because even Vector and synchronizedList() will fail with ConcurrentModificationException if the list is modified while iterating, while CopyOnWriteArrayList will iterate a snapshot of the list.
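The snapshot behaviour can be checked without any extra threads, because the fail-fast check only looks at whether the list was structurally modified during iteration, not at which thread modified it. A minimal sketch (the class SnapshotIteration is made up for illustration) that appends an element on every iteration step:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIteration {

    // Iterates the list and appends one element per step.
    // Returns how many elements were visited, or -1 on ConcurrentModificationException.
    static int iterateWhileAdding(List<String> list) {
        int visited = 0;
        try {
            for (String s : list) {
                list.add(s + "!");
                visited++;
            }
        } catch (ConcurrentModificationException e) {
            return -1;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileAdding(new Vector<>(List.of("a", "b", "c")))); // -1
        System.out.println(iterateWhileAdding(
                Collections.synchronizedList(new ArrayList<>(List.of("a", "b", "c"))))); // -1
        System.out.println(iterateWhileAdding(new CopyOnWriteArrayList<>(List.of("a", "b", "c")))); // 3
    }
}
```

With Vector and the synchronizedList() wrapper, the second call to next() throws; the CopyOnWriteArrayList iterator visits only the three elements that existed when iteration started, even though the list ends up with six.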
Out of curiosity, I checked some Deque implementations too:
test(new ArrayDeque<>());
test(new ConcurrentLinkedDeque<>());
test(new LinkedBlockingDeque<>());
Sample Output
34295 in 0ms using ArrayDeque
100000 in 15ms using ConcurrentLinkedDeque
100000 in 16ms using LinkedBlockingDeque
As you can see, the unsynchronized ArrayDeque shows the "lost value" symptom, though it doesn't fail with an exception.
The two concurrent implementations, ConcurrentLinkedDeque and LinkedBlockingDeque, work correctly and fast.
Even with your simple program, you could show that ArrayList is not thread-safe by running more loop iterations (10 might not be enough) and reducing other code that slows down operations on the ArrayList, especially IO code such as System.out.
I modified your original code by removing Random and System.out calls. I added just a single System.out.println at the end of the loop to show possible successful termination.
However, this code does not run to completion. Instead, it throws an exception.
Exception in thread "Thread-1" java.lang.ArrayIndexOutOfBoundsException: ...
What is important to learn from this is that even similar code might not run into thread-safety issues if the timings are not just right. This shows why thread related bugs are hard to find and can lurk in code for very long before they actually crash the program.
Here is the modified code:
import java.util.*;

public class ArrayDemo implements Runnable {
    private static ArrayList<Integer> al = new ArrayList<Integer>();
    int j = 0;

    public void run() {
        while (j < 10000) {
            al.add(1); // autoboxed; new Integer(..) is deprecated
            j++;
        }
        System.out.println("Array size: " + al.size());
    }

    public static void main(String[] args) {
        ArrayDemo ad = new ArrayDemo();
        Thread t = new Thread(ad);
        Thread t1 = new Thread(ad);
        t.start();
        t1.start();
    }
}
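To make the comparison depend only on the list, each thread can keep its own loop counter (a local variable instead of the shared field j), and main() can join() both threads before reading the size. A sketch under those assumptions (ListAddRace is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class ListAddRace {

    // Two threads each add 10,000 elements. Each thread has its own loop counter,
    // so the list is the only shared state.
    static int addConcurrently(List<Integer> list) throws InterruptedException {
        Runnable task = () -> {
            for (int j = 0; j < 10_000; j++) {
                list.add(1);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join(); // wait for both threads before reading the size
        t2.join();
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Vector: " + addConcurrently(new Vector<>()));
        System.out.println("ArrayList: " + addConcurrently(new ArrayList<>()));
    }
}
```

The Vector run should always report 20000. The ArrayList run may report less, print an ArrayIndexOutOfBoundsException from one of the threads, or occasionally still report 20000, which is exactly why such bugs are hard to catch.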

Why Unsynchronized ArrayList Object add method is not behaving properly

Here is the sample code:
1) A single ArrayList object is passed to every thread of the thread pool.
2) At the end of execution, the list size should be 50, but as the sample outputs show, it may not be: sometimes it is 41 or 47. Why does it behave like that?
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Test {
    ArrayList<Integer> list = new ArrayList<>();

    public static void main(String[] args) {
        ExecutorService executorService3 = Executors.newScheduledThreadPool(10);
        Test test = new Test();
        for (int i = 0; i < 5; i++) {
            Mythread t1 = new Mythread(test.list);
            executorService3.execute(t1);
        }
        executorService3.shutdown();
        while (executorService3.isShutdown()) {
            //---This is not giving the expected output of 50.--
            System.out.println("List size=" + test.list.size());
            break;
        }
    }
}

class Mythread implements Runnable {
    List<Integer> list = null;

    Mythread(List<Integer> list) {
        this.list = list;
    }

    @Override
    public void run() {
        for (int i = 0; i < 10; i++) {
            this.list.add(i);
        }
    }
}
Your code isn't waiting for the threads to finish execution. By the time your code calls the following line
System.out.println("List size="+test.list.size());
there's no guarantee that they have finished, and so no guarantee that the list contains the expected 50 items. Use the awaitTermination method (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#awaitTermination(long,%20java.util.concurrent.TimeUnit)), e.g.:
executorService3.shutdown();
executorService3.awaitTermination(1, TimeUnit.SECONDS);
System.out.println("List size="+test.list.size());
(Exception handling omitted for brevity)
As it says in the Javadoc for ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
So it's "not behaving properly" because you're not using it as described in the documentation.
As is suggested in the Javadoc, you can wrap your list in a synchronized list:
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
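Putting both suggestions together, a corrected version of the program might look like the sketch below. It keeps newScheduledThreadPool(10) from the question, replaces Mythread with a lambda, and assumes a 10-second timeout is generous enough; FiftyItems is a made-up class name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FiftyItems {

    static int run() throws InterruptedException {
        // Fix 1: a synchronized wrapper, so concurrent add() calls cannot corrupt the list.
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newScheduledThreadPool(10);
        for (int t = 0; t < 5; t++) {
            executor.execute(() -> {
                for (int i = 0; i < 10; i++) {
                    list.add(i);
                }
            });
        }
        executor.shutdown(); // accept no new tasks; already submitted ones still run
        // Fix 2: wait for the tasks to finish before reading the size.
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("List size=" + run()); // List size=50
    }
}
```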
It's a concurrency problem.
The way I see it, you have:
5 threads that execute their run method on the same list. Since ArrayList is not synchronized, multiple threads can insert an element at the same position in the list, so one insertion overwrites another.
Could you print the contents of your list?

Multi-thread MergeSort

I would like to implement a merge sort with multithreading.
So here is my code:
public class MergeSort<E extends Comparable<E>> implements Runnable {
    public void run() {
        mergeSort(array);
    }

    public synchronized void mergeSort(List<E> array) {
        int size = array.size();
        if (size > 1) {
            int mid = size / 2;
            List<E> l = array.subList(0, mid);
            List<E> r = array.subList(mid, array.size());
            Thread t = new Thread(new MergeSort<E>(left));
            Thread t2 = new Thread(new MergeSort<E>(right));
            t.start();
            t2.start();
            merge(l, r, array);
        }
    }
I would like my MergeSort to run, create 2 new threads, and then call merge and finish its job.
I tried it without threads, just by calling mergeSort(left)... It worked, so my algorithm is correct, but when I try with threads, the list is not sorted.
So, how do I synchronize the threads? I know there will be too many threads, but I just want to know how to synchronize them to sort the list.
I can't tell exactly because some of the code is missing, but it does look like you're calling mergesort with "left" twice.
A couple of things to keep in mind:
Don't assume that a thread starts running immediately just because you created it.
You are passing left as the parameter to both of your threads instead of l and r.
If you want it to work at all, each pair of threads has to finish its task (for example, wait for them with join()) before you merge their results and proceed with the next two halves.
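A minimal sketch of that idea, assuming the input list supports set(..) (an ArrayList does): each half is copied into its own list, sorted in its own thread, and the parent join()s both threads before merging. The THRESHOLD below which the sort stays in the current thread is an arbitrary assumption to keep the thread count bounded, and ThreadedMergeSort is a hypothetical name, not the question's class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ThreadedMergeSort {

    // Assumed cut-off: smaller lists are sorted in the current thread.
    private static final int THRESHOLD = 10_000;

    public static <E extends Comparable<E>> void mergeSort(List<E> list) throws InterruptedException {
        int size = list.size();
        if (size < 2) {
            return;
        }
        int mid = size / 2;
        // Copy the halves so each thread works on its own list (l and r, not left twice).
        List<E> l = new ArrayList<>(list.subList(0, mid));
        List<E> r = new ArrayList<>(list.subList(mid, size));
        if (size < THRESHOLD) {
            mergeSort(l);
            mergeSort(r);
        } else {
            Thread t1 = new Thread(() -> sortQuietly(l));
            Thread t2 = new Thread(() -> sortQuietly(r));
            t1.start();
            t2.start();
            // join() waits for both halves and makes their sorted contents visible here.
            t1.join();
            t2.join();
        }
        merge(l, r, list);
    }

    private static <E extends Comparable<E>> void sortQuietly(List<E> list) {
        try {
            mergeSort(list);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Writes the merged contents of l and r back into out.
    private static <E extends Comparable<E>> void merge(List<E> l, List<E> r, List<E> out) {
        int i = 0, j = 0, k = 0;
        while (i < l.size() && j < r.size()) {
            if (l.get(i).compareTo(r.get(j)) <= 0) {
                out.set(k++, l.get(i++));
            } else {
                out.set(k++, r.get(j++));
            }
        }
        while (i < l.size()) out.set(k++, l.get(i++));
        while (j < r.size()) out.set(k++, r.get(j++));
    }

    public static void main(String[] args) throws InterruptedException {
        Random random = new Random(42);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            data.add(random.nextInt(1_000_000));
        }
        List<Integer> expected = new ArrayList<>(data);
        Collections.sort(expected);
        mergeSort(data);
        System.out.println(data.equals(expected)); // true
    }
}
```

Copying the halves avoids having two threads and the merge step work on subList views of the same backing list at the same time.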

Do I need a concurrent collection for adding elements to a list by many threads?

static final Collection<String> FILES = new ArrayList<String>(1);
for (final String s : list) {
new Thread(new Runnable() {
public void run() {
List<String> file2List = getFileAsList(s);
FILES.addAll(file2List);
}
}).start();
}
This collection gets very big, but the code works perfectly. I thought I would get a ConcurrentModificationException, because the FILES list has to extend its size, but it has never happened.
Is this code 100% thread-safe?
The code takes 12 seconds to load, and a few threads are adding elements at the same time.
I tried to first create the threads and run them later, but I got the same results (both in time and correctness).
No, the code is not thread-safe. It may or may not throw a ConcurrentModificationException, but you may end up with elements missing or elements being added twice. Changing the list to be a
Collection<String> FILES = Collections.synchronizedList(new ArrayList<String>());
might already be a solution, assuming that the most time-consuming part is the getFileAsList method (and not adding the resulting elements to the FILES list).
BTW, an aside: when getFileAsList is accessing the hard drive, you should perform detailed performance tests. Multi-threaded hard-drive accesses may be slower than single-threaded ones, because the hard drive head might have to jump around the drive and not be able to read data as a contiguous block.
EDIT: In response to the comment: This program will "very likely" produce ArrayIndexOutOfBoundsExceptions from time to time:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
public class ConcurrentListTest
{
public static void main(String[] args) throws InterruptedException
{
for (int i=0; i<1000; i++)
{
runTest();
}
}
private static void runTest() throws InterruptedException
{
final Collection<String> FILES = new ArrayList<String>(1);
// With this, it will always work:
// final Collection<String> FILES = Collections.synchronizedList(new ArrayList<String>(1));
List<String> list = Arrays.asList("A", "B", "C", "D");
List<Thread> threads = new ArrayList<Thread>();
for (final String s : list)
{
Thread thread = new Thread(new Runnable()
{
@Override
public void run()
{
List<String> file2List = getFileAsList(s);
FILES.addAll(file2List);
}
});
threads.add(thread);
thread.start();
}
for (Thread thread : threads)
{
thread.join();
}
System.out.println(FILES.size());
}
private static List<String> getFileAsList(String s)
{
List<String> list = Collections.nCopies(10000, s);
return list;
}
}
But of course, there is no strict guarantee that it will. If it does not create such an exception for you, you should consider playing the lottery, because you must be remarkably lucky.
It is not thread-safe at all, even if you only add elements. Even when you only grow FILES, there are multi-access problems if your collection is not thread-safe.
When the collection exceeds its capacity, it has to be copied to a new array, and at that moment concurrent access can cause problems: while one thread is doing the copy, another can be trying to add an element to the same collection. Resizing is done by the internal ArrayList implementation, and it is not thread-safe at all.
Look at this code and assume that more than one thread executes it when the collection's capacity is full:
private int size; //it is not thread safe value!
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e; //size is not volatile value it might be cached by thread
return true;
}
public void add(int index, E element) {
rangeCheckForAdd(index);
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}
public void ensureCapacity(int minCapacity) {
if (minCapacity > 0)
ensureCapacityInternal(minCapacity);
}
private void ensureCapacityInternal(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
When the collection needs to exceed its capacity and copy the existing elements to a new internal array, concurrent access can cause real problems. Also, size is not volatile and size++ is not atomic, so a thread may work with a stale value and overwrite another thread's element. This is why a non-synchronized collection is not thread-safe even if you only use the add operation.
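To make the overwrite concrete, the sketch below replays by hand, in a single thread, one interleaving of two threads executing elementData[size++] = e on the same array. Java evaluates the index (reading size and writing size + 1) before storing the element, and the trace follows that order. The class name LostUpdateTrace and the chosen interleaving are illustrative assumptions; real threads hit it only occasionally.

```java
public class LostUpdateTrace {

    public static String replay() {
        Object[] elementData = new Object[10]; // plenty of capacity, no resize involved
        int size = 0;

        int indexA = size;         // thread A reads size (0)
        int indexB = size;         // thread B reads size (0) before A writes it back
        size = indexA + 1;         // A writes back size = 1
        size = indexB + 1;         // B also writes back size = 1: one increment is lost
        elementData[indexA] = "A"; // A stores into slot 0
        elementData[indexB] = "B"; // B stores into slot 0 too, overwriting "A"

        return "size=" + size + ", slot0=" + elementData[0];
    }

    public static void main(String[] args) {
        System.out.println(replay()); // size=1, slot0=B
    }
}
```

Two add() calls completed, but the list reports one element and A's element is gone, without any exception being thrown.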
You should consider using FILES = new CopyOnWriteArrayList<>(); or FILES = Collections.synchronizedList(new ArrayList<>());, where the add operation is thread-safe.
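Yes, you need a thread-safe collection; with a plain ArrayList, concurrent additions can be lost or fail even if no ConcurrentModificationException is ever thrown.
Here are some ways to initialize a thread-safe collection in Java (note that the first one is a Set, so it drops duplicates and does not keep insertion order):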
Collections.newSetFromMap(new ConcurrentHashMap<>());
Collections.synchronizedList(new ArrayList<Object>());
new CopyOnWriteArrayList<>();
Is this code 100% thread-safe?
This code is 0% threadsafe, even by the weakest standard of interleaved operation. You are mutating shared state under a data race.
You most definitely need some kind of concurrency control; it is not obvious whether a concurrent collection is the right choice, though. A simple synchronizedList might fit the bill even better, because you have a lot of processing followed by a quick transfer to the accumulator list, so the lock will not be contended much.
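Another option, sketched below, avoids the shared list altogether: each task returns its own list through a Future, and only the calling thread adds to the result. This swaps the question's raw threads for an ExecutorService; getFileAsList is a hypothetical stand-in that returns 10000 copies of its argument, as in the ConcurrentListTest program above, and CollectWithFutures is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CollectWithFutures {

    // Hypothetical stand-in for the question's getFileAsList(..).
    static List<String> getFileAsList(String s) {
        return Collections.nCopies(10_000, s);
    }

    static List<String> loadAll(List<String> names) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(names.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String s : names) {
                futures.add(executor.submit(() -> getFileAsList(s))); // runs in parallel
            }
            List<String> files = new ArrayList<>(); // touched only by the calling thread
            for (Future<List<String>> f : futures) {
                // get() waits for the task and safely publishes its result to this thread.
                files.addAll(f.get());
            }
            return files;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadAll(Arrays.asList("A", "B", "C", "D")).size()); // 40000
    }
}
```

Since the loading still happens in parallel and only the cheap addAll step is sequential, this keeps the speed-up the question is after.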

Java multithreaded parser

I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { //1000 of the same type of pages that need to parse
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itemList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
@Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
@Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing, the final list itemList contains only 980 elements instead of 1000. But in the console all 1000 elements (items) appear. That is, for some reason some threads did not call the addItem method from handleText.
I already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList and Vector, making the addItem method synchronized, and wrapping its call in a synchronized block. All this only changes the number of elements a little, but I can never get the full thousand.
I also tried to parse a smaller number of pages (ten). As a result, the list is empty, but all 10 appear in the console.
If I remove multi-threading, then everything works fine, but, of course, slowly. That's not good.
If I decrease the number of concurrent threads, the number of items in the list gets close to the desired 1000; if I increase it, the number moves a little further away from 1000. So I think there is contention for writing to the list. But then why doesn't synchronization work?
What's the problem?
After your parse() call returns, all of your 1000 threads have been started, but it is not guaranteed that they have finished. In fact, some of them haven't, and that's the problem you see. I would strongly recommend not writing this yourself but using the tools the JDK provides for this kind of job.
The Thread Pools documentation and ThreadPoolExecutor are a good starting point. Again, don't implement this yourself unless you are absolutely sure you have to, because writing such multithreading code is pure pain.
Your code should look something like this:
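ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
    futures.add(executor.submit(new Runnable() {...}));
}
executor.shutdown(); // accept no new tasks; lets the pool threads exit once the queue is drained
for (Future<?> f : futures) {
    f.get(); // blocks until that task has finished (a task's exception surfaces as ExecutionException)
}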
There is no problem with the list itself; the code is working as you have coded it. The problem is the last batch: the threads for the last pages (roughly 980 to 1000) are created, but the main thread does not wait for them to complete before it returns the list. Therefore you get some number between 980 and 1000 when you run 20 threads at a time.
You could try adding Thread.sleep(50) before returning the list; the main thread then waits for a while, and by that time the other threads might have finished processing, but nothing guarantees it.
Or you can use one of Java's synchronization APIs: instead of sleeping, use a CountDownLatch, which lets you wait for the threads to complete their processing before you return the list (or before you create new threads).
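A sketch of the CountDownLatch approach, assuming a hypothetical parsePage(i) that stands in for downloading and parsing one page and returns a string instead of an Item (LatchedParser is also a made-up name). It uses a fixed thread pool of 20 instead of the threadCount busy-wait, since that unsynchronized static counter is itself a data race:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatchedParser {

    // Hypothetical stand-in for downloading and parsing page i.
    static String parsePage(int i) {
        return "item-" + i;
    }

    static List<String> parseAll(int pages, int threads) throws InterruptedException {
        List<String> items = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(pages);
        // The pool limits concurrency, replacing the threadCount busy-wait.
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 1; i <= pages; i++) {
            int page = i;
            executor.execute(() -> {
                try {
                    items.add(parsePage(page));
                } finally {
                    done.countDown(); // count down even if parsing throws
                }
            });
        }
        done.await(); // blocks until every page has counted down
        executor.shutdown();
        return items;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parseAll(1000, 20).size()); // 1000
    }
}
```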
