We know that ArrayList is not thread safe and Vector is. I wanted to write a program that shows that operations on a Vector are performed synchronously while operations on an ArrayList are not. The only problem I am facing is how? What kind of operation?
For example: if we add a value to either list, the program simply adds it.
I tried to write one, but realized that the synchronicity of my program depends on the variable j, not on the ArrayList or Vector.
import java.util.ArrayList;
import java.util.Random;

public class ArrayDemo implements Runnable {
    private static ArrayList<Integer> al = new ArrayList<Integer>();
    Random random = new Random();
    int j = 0;

    public void run() {
        while (j < 10) {
            int i = random.nextInt(10);
            al.add(i);
            System.out.println(i + " " + Thread.currentThread().getName());
            j++;
            //System.out.println(al.remove(0));
        }
    }

    public static void main(String[] args) {
        ArrayDemo ad = new ArrayDemo();
        Thread t = new Thread(ad);
        Thread t1 = new Thread(ad);
        t.start();
        t1.start();
    }
}
Small test program:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

public class Test extends Thread {
    public static void main(String[] args) throws Exception {
        test(new Vector<>());
        test(new ArrayList<>());
        test(Collections.synchronizedList(new ArrayList<>()));
        test(new CopyOnWriteArrayList<>());
    }

    private static void test(final List<Integer> list) throws Exception {
        System.gc();
        long start = System.currentTimeMillis();
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Test(list);
        for (Thread thread : threads)
            thread.start();
        for (Thread thread : threads)
            thread.join();
        long end = System.currentTimeMillis();
        System.out.println(list.size() + " in " + (end - start) + "ms using "
                + list.getClass().getSimpleName());
    }

    private final List<Integer> list;

    Test(List<Integer> list) {
        this.list = list;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < 10000; i++)
                this.list.add(i);
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
    }
}
Sample Output
100000 in 16ms using Vector
java.lang.ArrayIndexOutOfBoundsException: 466
at java.util.ArrayList.add(ArrayList.java:459)
at Test.run(Test.java:36)
java.lang.ArrayIndexOutOfBoundsException: 465
at java.util.ArrayList.add(ArrayList.java:459)
at Test.run(Test.java:36)
java.lang.ArrayIndexOutOfBoundsException: 10
at java.util.ArrayList.add(ArrayList.java:459)
at Test.run(Test.java:36)
32507 in 15ms using ArrayList
100000 in 16ms using SynchronizedRandomAccessList
100000 in 3073ms using CopyOnWriteArrayList
As you can see, with Vector it completes normally and returns 100000, which is the expected size after adding 10000 values in 10 parallel threads.
With ArrayList you see two different failures:
Three of the threads die with ArrayIndexOutOfBoundsException in the call to add().
Even if the three failing threads died immediately, before adding anything, the other 7 threads should still have added 10000 values each, for a total of 70000 values, but the list only contains 32507 values, so many of the added values got lost.
The third test, using Collections.synchronizedList(), works like Vector.
The fourth test, using the concurrent CopyOnWriteArrayList, also produces the right result, but much more slowly, due to excessive copying. It will, however, be faster than synchronized access if the list is small, changes rarely, and is read often.
It is especially good if you need to iterate the list, because even Vector and synchronizedList() will fail with ConcurrentModificationException if the list is modified while iterating, while CopyOnWriteArrayList will iterate a snapshot of the list.
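Here is a minimal sketch (not part of the answer above; the names are illustrative) that makes that iteration difference visible. Swap in a CopyOnWriteArrayList to see the snapshot behaviour:
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IterationDemo {
    public static void main(String[] args) throws InterruptedException {
        // Swap in new CopyOnWriteArrayList<Integer>() to see the snapshot behaviour.
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 1000; i++) list.add(i);

        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) list.add(i);
        });
        writer.start();

        try {
            long sum = 0;
            for (int value : list) {   // iterates while the writer is still adding
                sum += value;
            }
            System.out.println("Iteration finished, sum = " + sum);
        } catch (ConcurrentModificationException e) {
            System.out.println("Iteration failed: " + e);
        }
        writer.join();
    }
}
With the synchronized list the for-each loop usually dies with the ConcurrentModificationException; with CopyOnWriteArrayList it quietly iterates the snapshot taken when the iterator was created.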
Out of curiosity, I checked some Deque implementations too:
test(new ArrayDeque<>());
test(new ConcurrentLinkedDeque<>());
test(new LinkedBlockingDeque<>());
Sample Output
34295 in 0ms using ArrayDeque
100000 in 15ms using ConcurrentLinkedDeque
100000 in 16ms using LinkedBlockingDeque
As you can see, the unsynchronized ArrayDeque shows the "lost value" symptom, though it doesn't fail with an exception.
The two concurrent implementations, ConcurrentLinkedDeque and LinkedBlockingDeque, work correctly and fast.
Even with your simple program you could show that ArrayList is not thread safe by running more loop iterations (10 might not be enough) and removing code that slows down the operations on the ArrayList, especially I/O code such as System.out.
I modified your original code by removing Random and System.out calls. I added just a single System.out.println at the end of the loop to show possible successful termination.
However, this code does not run to completion. Instead it throws an exception.
Exception in thread "Thread-1" java.lang.ArrayIndexOutOfBoundsException: ...
What is important to learn from this is that even similar code might not run into thread-safety issues if the timing is not just right. This shows why thread-related bugs are hard to find and can lurk in code for a very long time before they actually crash the program.
Here is the modified code:
import java.util.*;

public class ArrayDemo implements Runnable {
    private static ArrayList<Integer> al = new ArrayList<Integer>();
    int j = 0;

    public void run() {
        while (j < 10000) {
            al.add(new Integer(1));
            j++;
        }
        System.out.println("Array size: " + al.size());
    }

    public static void main(String[] args) {
        ArrayDemo ad = new ArrayDemo();
        Thread t = new Thread(ad);
        Thread t1 = new Thread(ad);
        t.start();
        t1.start();
    }
}
Related
I have this piece of code:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
@Override
public void run(){
while(!intervals.isEmpty()){
//remove one interval
//do calculations
//add some intervals
}
}
This code is being executed by a specific number of threads at the same time. As you can see, the loop should go on until there are no intervals left in the collection, but there is a problem: at the beginning of each iteration an interval gets removed from the collection, and at the end some number of intervals might get added back into the same collection.
The problem is that while one thread is inside the loop, the collection might become empty, so other threads trying to enter the loop won't be able to and will finish their work prematurely, even though the collection might be filled with values again after the first thread finishes its iteration. I want the thread count to remain constant (or not exceed some number n) until all the work is really finished.
That means that no threads are currently working in the loop and there are no elements left in the collection. What are possible ways of accomplishing that? Any ideas are welcome.
One way to solve this problem in my specific case is to give every thread a different piece of the original collection. But after a thread finished its work it wouldn't be used by the program anymore, even though it could help other threads with their calculations, so I don't like this solution, because it's important to utilize all cores of the machine in my problem.
This is the simplest minimal working example I could come up with. It might be too lengthy.
public class Test{
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private int threadNumber;
private Thread[] threads;
private double result;
public Test(int threadNumber){
intervals.add(new Interval(0, 1));
this.threadNumber = threadNumber;
threads = new Thread[threadNumber];
}
public double find(){
for(int i = 0; i < threadNumber; i++){
threads[i] = new Thread(new Finder());
threads[i].start();
}
try{
for(int i = 0; i < threadNumber; i++){
threads[i].join();
}
}
catch(InterruptedException e){
System.err.println(e);
}
return result;
}
private class Finder implements Runnable{
@Override
public void run(){
while(!intervals.isEmpty()){
Interval interval = intervals.poll();
if(interval.high - interval.low > 1e-6){
double middle = (interval.high + interval.low) / 2;
boolean something = true;
if(something){
intervals.add(new Interval(interval.low + 0.1, middle - 0.1));
intervals.add(new Interval(middle + 0.1, interval.high - 0.1));
}
else{
intervals.add(new Interval(interval.low + 0.1, interval.high - 0.1));
}
}
}
}
}
private class Interval{
double low;
double high;
public Interval(double low, double high){
this.low = low;
this.high = high;
}
}
}
What you might need to know about the program: after every iteration an interval should either disappear (because it's too small), become smaller, or split into two smaller intervals. The work is finished when no intervals are left. Also, I should be able to limit the number of threads doing this work to some number n. The actual program looks for the maximum value of some function by dividing the intervals and throwing away the parts of those intervals that can't contain the maximum value using some rules, but this shouldn't really be relevant to my problem.
The CompletableFuture class is also an interesting solution for this kind of task.
It automatically distributes workload over a number of worker threads.
static CompletableFuture<Integer> fibonacci(int n) {
if(n < 2) return CompletableFuture.completedFuture(n);
else {
return CompletableFuture.supplyAsync(() -> {
System.out.println(Thread.currentThread());
CompletableFuture<Integer> f1 = fibonacci(n - 1);
CompletableFuture<Integer> f2 = fibonacci(n - 2);
return f1.thenCombineAsync(f2, (a, b) -> a + b);
}).thenComposeAsync(f -> f);
}
}
public static void main(String[] args) throws Exception {
int fib = fibonacci(10).get();
System.out.println(fib);
}
You can use an atomic flag, e.g.:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private AtomicBoolean inUse = new AtomicBoolean();
@Override
public void run() {
while (!intervals.isEmpty() && inUse.compareAndSet(false, true)) {
// work
inUse.set(false);
}
}
UPD
The question has been updated, so I'll give you a better solution. It is a more "classic" solution using a blocking queue:
private BlockingQueue<Interval> intervals = new LinkedBlockingQueue<>(); // ArrayBlockingQueue would also work, but requires a fixed capacity
private volatile boolean finished = false;

@Override
public void run() {
    try {
        while (!finished) {
            Interval next = intervals.take();
            // put work there
            // after you decide work is finished just set finished = true
            intervals.put(next); // anyway, return the interval to the queue
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
UPD2
Now it seems better to rewrite the solution and divide the range into sub-ranges, one for each thread.
Your problem looks like a recursive one: processing one task (an interval) might produce some sub-tasks (sub-intervals).
For that purpose I would use a ForkJoinPool with RecursiveAction:
class Interval {
...
}
class IntervalAction extends RecursiveAction {
private Interval interval;
private IntervalAction(Interval interval) {
this.interval = interval;
}
@Override
protected void compute() {
if (...) {
// we need two sub-tasks
IntervalAction sub1 = new IntervalAction(new Interval(...));
IntervalAction sub2 = new IntervalAction(new Interval(...));
sub1.fork();
sub2.fork();
sub1.join();
sub2.join();
} else if (...) {
// we need just one sub-task
IntervalAction sub3 = new IntervalAction(new Interval(...));
sub3.fork();
sub3.join();
} else {
// current task doesn't need any sub-tasks, just return
}
}
}
public static void compute(Interval initial) {
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new IntervalAction(initial));
// invoke will return when all the processing is completed
}
I had the same problem, and I tested the following solution.
In my test example I have a queue (the equivalent of your intervals) filled with integers. For the test, at each iteration one number is taken from the queue, incremented and placed back in the queue if the new value is below 7 (arbitrary). This has the same impact as your interval generation on the mechanism.
Here is a working example (note that I develop in Java 1.8 and use the Executor framework to handle my thread pool):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
public class Test {
final int numberOfThreads;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
final BlockingQueue<Integer> sleepingThreadsTokens;
final ThreadPoolExecutor executor;
public static void main(String[] args) {
final Test test = new Test(2); // arbitrary number of thread => 2
test.launch();
}
private Test(int numberOfThreads){
this.numberOfThreads = numberOfThreads;
this.queue = new PriorityBlockingQueue<Integer>();
this.availableThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.sleepingThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
}
public void launch() {
// put some elements in queue at the beginning
queue.add(1);
queue.add(2);
queue.add(3);
for(int i = 0; i < numberOfThreads; i++){
availableThreadsTokens.add(1);
}
System.out.println("Start");
boolean algorithmIsFinished = false;
while(!algorithmIsFinished){
if(sleepingThreadsTokens.size() != numberOfThreads){
try {
availableThreadsTokens.take();
} catch (final InterruptedException e) {
e.printStackTrace();
// some treatment should be put there in case of failure
break;
}
if(!queue.isEmpty()){ // Continuation condition
sleepingThreadsTokens.drainTo(availableThreadsTokens);
executor.submit(new Loop(queue.poll(), queue, availableThreadsTokens));
}
else{
sleepingThreadsTokens.add(1);
}
}
else{
algorithmIsFinished = true;
}
}
executor.shutdown();
System.out.println("Finished");
}
public static class Loop implements Runnable{
int element;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
public Loop(Integer element, BlockingQueue<Integer> queue, BlockingQueue<Integer> availableThreadsTokens){
this.element = element;
this.queue = queue;
this.availableThreadsTokens = availableThreadsTokens;
}
@Override
public void run(){
System.out.println("taking element "+element);
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
if(element < 7){
this.queue.add(element+1);
System.out.println("Inserted element"+(element + 1));
}
else{
System.out.println("no insertion");
}
this.availableThreadsTokens.offer(1);
}
}
}
I ran this code to check, and it seems to work properly. However, there are certainly some improvements that can be made:
sleepingThreadsTokens does not have to be a BlockingQueue, since only the main thread accesses it. I used this interface because it allows a nice sleepingThreadsTokens.drainTo(availableThreadsTokens);
I'm not sure whether queue has to be blocking or not, since only the main thread takes from it and does not wait for elements (it waits only for tokens).
...
The idea is that the main thread checks for termination, and for this it has to know how many threads are currently working (so that it does not prematurely stop the algorithm just because the queue is empty). To do so, two specific queues are created: availableThreadsTokens and sleepingThreadsTokens. Each element in availableThreadsTokens symbolizes a thread that has finished an iteration and waits to be given another one. Each element in sleepingThreadsTokens symbolizes a thread that was available to take a new iteration, but the queue was empty, so it had no job and went to "sleep". So at each moment availableThreadsTokens.size() + sleepingThreadsTokens.size() = numberOfThreads - threadsExecutingIteration.
Note that the elements in availableThreadsTokens and sleepingThreadsTokens only symbolize thread activity; they are not threads, nor do they designate a specific thread.
Case of termination: let's suppose we have N threads (an arbitrary, fixed number). The N threads are waiting for work (N tokens in availableThreadsTokens), there is only 1 remaining element in the queue, and the treatment of this element won't generate any other element. Main takes the first token, finds that the queue is not empty, polls the element and sends the thread to work. The next N-1 tokens are consumed one by one, and since the queue is empty the tokens are moved into sleepingThreadsTokens one by one. Main knows that there is 1 thread working in the loop, since there is no token in availableThreadsTokens and only N-1 in sleepingThreadsTokens, so it waits (.take()). When the thread finishes and releases the token, Main consumes it, discovers that the queue is now empty and puts the last token in sleepingThreadsTokens. Since all tokens are now in sleepingThreadsTokens, Main knows that 1) all threads are inactive and 2) the queue is empty (otherwise the last token wouldn't have been transferred to sleepingThreadsTokens, since the thread would have taken the job).
Note that if the working thread finishes its treatment before all the availableThreadsTokens are moved to sleepingThreadsTokens, it makes no difference.
Now if we suppose that the treatment of the last element would have generated M new elements in the queue, then Main would have put all the tokens from sleepingThreadsTokens back into availableThreadsTokens and started to assign them treatments again. We put all the tokens back even if M < N because we don't know how many elements will be inserted in the future, so we have to keep all the threads available.
I would suggest a master/worker approach then.
The master thread goes through the intervals and assigns the calculation of each interval to a different worker thread. It also removes/adds intervals as necessary. This way, all the cores are utilized, and the work is done only when all intervals are finished. This is also known as dynamic work allocation.
A possible example:
public void run() {
    while (!intervals.isEmpty()) {
        //remove one interval
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                //do calculations
            }
        });
        t.start(); // start(), not run(), so the calculation happens on the new thread
        //add some intervals
    }
}
The possible solution you provided is known as static allocation, and you're correct, it will only finish as fast as the slowest worker, but the dynamic approach will utilize all the cores.
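A hedged sketch of dynamic allocation with a bounded pool (the names and the splitting rule are illustrative, not from the question): every task may submit new sub-tasks to the same pool, and a pending-task counter tells the program when all work is really finished:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicAllocationDemo {
    static final ExecutorService pool = Executors.newFixedThreadPool(4);
    static final AtomicInteger pending = new AtomicInteger();   // submitted but not yet finished

    static void submit(double low, double high) {
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                if (high - low > 1e-3) {            // placeholder splitting rule
                    double mid = (low + high) / 2;
                    submit(low, mid);               // dynamically creates more work
                    submit(mid, high);
                }
            } finally {
                if (pending.decrementAndGet() == 0) {
                    pool.shutdown();                // nothing queued or running any more
                }
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        submit(0, 1);
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("all work finished");
    }
}
Because each task increments the counter for its children before decrementing itself, the counter can only reach zero when no task is queued or running, which is exactly the termination condition the question asks for.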
I've run into this problem as well. The way I solved it was to use an AtomicInteger to know what is in the queue. Before each offer() increment the integer. After each poll() decrement the integer. The CLQ has no real isEmpty() since it must look at head/tail nodes and this can change atomically (CAS).
This doesn't give a 100% guarantee, since some thread may increment right after another thread decrements, so you need to check again before ending the thread. It is still better than relying on while(!...isEmpty()).
Other than that, you may need to synchronize.
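As an illustration only (not a drop-in replacement for the Finder class above), here is a sketch where the counter covers intervals that are queued or still being processed, which also handles the "check again before ending the thread" caveat; the caller must increment the counter once for each initial interval before offering it:
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class CountingWorker implements Runnable {
    private final ConcurrentLinkedQueue<double[]> intervals;  // each element is {low, high}
    private final AtomicInteger pending;                      // queued + in-progress intervals

    CountingWorker(ConcurrentLinkedQueue<double[]> intervals, AtomicInteger pending) {
        this.intervals = intervals;
        this.pending = pending;
    }

    @Override
    public void run() {
        while (pending.get() > 0) {                  // work is still queued or running somewhere
            double[] interval = intervals.poll();
            if (interval == null) {
                Thread.yield();                      // another thread may add work shortly
                continue;
            }
            if (interval[1] - interval[0] > 1e-6) {  // placeholder for the real calculation
                double mid = (interval[0] + interval[1]) / 2;
                pending.incrementAndGet();           // increment before each offer()
                intervals.offer(new double[] {interval[0], mid});
                pending.incrementAndGet();
                intervals.offer(new double[] {mid, interval[1]});
            }
            pending.decrementAndGet();               // this interval is fully handled
        }
    }
}
A worker only exits when the counter is zero, i.e. when no interval is queued and no other thread is still processing one that could spawn new work.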
static final Collection<String> FILES = new ArrayList<String>(1);
for (final String s : list) {
new Thread(new Runnable() {
public void run() {
List<String> file2List = getFileAsList(s);
FILES.addAll(file2List);
}
}).start();
}
This collection gets very big, but the code works perfectly. I thought I would get a ConcurrentModificationException, because the FILES list has to extend its size, but that has never happened.
Is this code 100% thread-safe?
The code takes about 12 seconds to load and a few threads add elements at the same time.
I tried first creating the threads and starting them later, but I got the same results (both time and correctness).
No, the code is not thread-safe. It may or may not throw a ConcurrentModificationException, but you may end up with elements missing or elements being added twice. Changing the list to be a
Collection<String> FILES = Collections.synchronizedList(new ArrayList<String>());
might already be a solution, assuming that the most time-consuming part is the getFileAsList method (and not adding the resulting elements to the FILES list).
As an aside: when getFileAsList is accessing the hard drive, you should perform detailed performance tests. Multi-threaded hard-drive access may be slower than single-threaded access, because the drive head might have to jump around the disk instead of reading the data as a contiguous block.
EDIT: In response to the comment: This program will "very likely" produce ArrayIndexOutOfBoundsExceptions from time to time:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
public class ConcurrentListTest
{
public static void main(String[] args) throws InterruptedException
{
for (int i=0; i<1000; i++)
{
runTest();
}
}
private static void runTest() throws InterruptedException
{
final Collection<String> FILES = new ArrayList<String>(1);
// With this, it will always work:
// final Collection<String> FILES = Collections.synchronizedList(new ArrayList<String>(1));
List<String> list = Arrays.asList("A", "B", "C", "D");
List<Thread> threads = new ArrayList<Thread>();
for (final String s : list)
{
Thread thread = new Thread(new Runnable()
{
@Override
public void run()
{
List<String> file2List = getFileAsList(s);
FILES.addAll(file2List);
}
});
threads.add(thread);
thread.start();
}
for (Thread thread : threads)
{
thread.join();
}
System.out.println(FILES.size());
}
private static List<String> getFileAsList(String s)
{
List<String> list = Collections.nCopies(10000, s);
return list;
}
}
But of course, there is no strict guarantee that it will. If it does not create such an exception for you, you should consider playing the lottery, because you must be remarkably lucky.
It is not thread-safe at all, even if you only add elements. Even in the case where you only grow your FILES collection there is a multi-access problem if the collection is not thread-safe.
When the collection exceeds its capacity, its contents have to be copied to a new, larger internal array, and at that moment you can have problems with concurrent access: while one thread is doing the copying, another can be trying to add an element to the collection at the same time. The resizing is done by the internal ArrayList implementation, and it is not thread-safe at all.
Look at this code from ArrayList and let's assume that more than one thread executes it when the collection is at full capacity.
private int size; // this value is not thread-safe!
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e; // size is not volatile; it might be cached by a thread
return true;
}
public void add(int index, E element) {
rangeCheckForAdd(index);
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}
public void ensureCapacity(int minCapacity) {
if (minCapacity > 0)
ensureCapacityInternal(minCapacity);
}
private void ensureCapacityInternal(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
At the moment the collection needs to exceed its capacity and copy the existing elements to a new internal array, you can have real problems with multi-access. Also, size is not thread-safe because it is not volatile; it can be cached by a thread and you can get overwrites. This is why the collection might not be thread-safe even if you only use the add operation on a non-synchronized collection.
You should consider using FILES = new CopyOnWriteArrayList<String>(); or FILES = Collections.synchronizedList(new ArrayList<String>());, where the add operation is thread-safe.
Yes, you need a concurrent list to prevent a ConcurrentModificationException.
Here are some ways to initialize a thread-safe collection in Java (note that the first creates a concurrent Set rather than a List):
Collections.newSetFromMap(new ConcurrentHashMap<>());
Collections.synchronizedList(new ArrayList<Object>());
new CopyOnWriteArrayList<>();
is this code 100% threadsafe ?
This code is 0% threadsafe, even by the weakest standard of interleaved operation. You are mutating shared state under a data race.
You most definitely need some kind of concurrent control; it is not obvious whether a concurrent collection is the right choice, though. A simple synchronizedList might fit the bill even better because you have a lot of processing and then a quick transfer to the accumulator list. The lock will not be contended much.
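To illustrate that accumulator pattern, here is a minimal sketch (the nCopies lists merely stand in for getFileAsList, which is not shown):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AccumulatorDemo {
    static final List<String> FILES = Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (String name : Arrays.asList("A", "B", "C", "D")) {
            Thread t = new Thread(() -> {
                // slow, thread-local work stands in for getFileAsList(name)
                List<String> local = new ArrayList<>(Collections.nCopies(10_000, name));
                FILES.addAll(local);            // short, synchronized transfer to the accumulator
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        System.out.println(FILES.size());       // always 40000
    }
}
Each worker holds the list's lock only for the brief addAll call, so the synchronization cost is negligible compared to the per-thread work.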
I just want to see the difference between StringBuilder and StringBuffer visually, so below is the code. But it always fails. Can someone please help me with this? I have seen questions on SO too, but none of them show the difference programmatically.
public class BBDifferencetest {
protected static int testnum = 0;
public static void testStringBuilder() {
final StringBuilder sb = new StringBuilder();
Thread t1 = new Thread() {
@Override
public void run() {
for (int x = 0; x < 100; x++) {
testnum++;
sb.append(testnum);
sb.append(" ");
}
}
};
Thread t2 = new Thread() {
public void run() {
for (int x = 0; x < 100; x++) {
testnum++;
sb.append(testnum);
sb.append(" ");
}
}
};
t1.start();
t2.start();
try {
t1.join();
t2.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("Result is: " + sb.toString());
}
public static void main(String args[]) {
testStringBuilder();
}
}
When I execute this, the output sometimes comes out in a random order, which is what my test expects. But even when I replace StringBuilder with StringBuffer and test, it still gives me unexpected output (rather than the sequential 1 to 200). So can someone help me see the difference visually?
P.S.: If anyone has code which shows the difference, I would be very glad to accept it as an answer, because I am not sure I can show the difference with my code even if it is modified.
(rather than the sequential 1 to 200)
Each thread is performing a read, modify, write operation on testnum. That in itself is not thread-safe.
Then each thread is fetching the value of testnum again in order to append it. The other thread may well have interrupted by then and incremented the value again.
If you change your code to:
AtomicInteger counter = new AtomicInteger();
...
sb.append(counter.getAndIncrement());
then you're more likely to see what you expect.
To make it clearer, change your loops to only call append once, like this:
for (int x = 0; x < 100; x++) {
sb.append(counter.incrementAndGet() + " ");
}
When I do that, for StringBuffer I always get "perfect" output. For StringBuilder I sometimes get output like this:
97 98 100 102 104
Here the two threads have both been appending at the same time, and the contents have been screwed up.
EDIT: Here's a somewhat shorter complete example:
import java.util.concurrent.atomic.AtomicInteger;
public class Test {
public static void main(String[] args) throws InterruptedException {
final AtomicInteger counter = new AtomicInteger();
// Change to StringBuffer to see "working" output
final StringBuilder sb = new StringBuilder();
Runnable runnable = new Runnable() {
@Override
public void run() {
for (int x = 0; x < 100; x++) {
sb.append(counter.incrementAndGet() + " ");
}
}
};
Thread t1 = new Thread(runnable);
Thread t2 = new Thread(runnable);
t1.start();
t2.start();
t1.join();
t2.join();
System.out.println(sb);
}
}
StringBuffer is synchronized at the method level. That means that no one can enter one of its methods if a thread is already inside one of them. But it does not guarantee that one thread is blocked from using the StringBuffer for as long as the other thread is using it, so the two threads still compete for access to the methods, and you may randomly get a non-ordered result.
The only way to really lock access to the StringBuffer is to put the code that accesses it in a synchronized block:
public void run() {
synchronized(sb) {
for (int x = 0; x < 100; x++) {
testnum++;
sb.append(testnum);
sb.append(" ");
}
}
}
If you don't do that, then Thread 1 can go into sb.append(testnum) while Thread 2 waits at its entry, and when Thread 1 goes out, Thread 2 can potentially go inside and start writing before Thread 1 enters sb.append(" "). So you could see:
12 13 1415 16 ....
The thing is, locking like this makes things work for StringBuilder as well. That's why one could say that the synchronization mechanism in StringBuffer is quite useless, and that is why it is rarely used anymore (the same goes for Vector).
So doing it this way cannot show you the difference between StringBuilder and StringBuffer. The suggestion in Jon Skeet's answer is better.
+1 for what Cyrille said. I imagine that it is only the nature of arrays of inherently atomic types (primitives <= 32 bits) that saves you from getting a ConcurrentModificationException with the StringBuilder, as you would get when, say, appending to a List<Integer>.
Basically, you have two threads, each performing 100 individual iterations. The two compete for the lock on the object before each append, and release it afterwards, 100 times each. The thread that wins each iteration is randomized by the (extremely) small amount of time taken to increment the loop counter and testnum.
What better illustrates the difference in your example is not necessarily the ordering, but ensuring that all insertions are actually accounted for when using a StringBuilder. It has no internal synchronization, so it's entirely possible that some appends get munged or overwritten in the process. The StringBuffer handles this with internal synchronization, guaranteeing that all inserts make it in properly, but you'll need external synchronization, such as Cyrille's example above, to hold a lock for the entire iteration sequence of each thread in order to safely use a StringBuilder.
I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { // 1000 pages of the same type that need to be parsed
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itemList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
@Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
@Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing, the final list itemList contains 980 elements instead of 1000. But the console shows all 1000 elements (items). That is, for some reason some threads did not call the addItem method from the handleText method.
I have already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList and Vector, making the addItem method synchronized, and wrapping its call in a synchronized block. All this only changes the number of elements a little, but the full thousand cannot be reached.
I also tried parsing a smaller number of pages (ten). As a result the list is empty, but the console shows all 10.
If I remove multi-threading, then everything works fine, but, of course, slowly. That's not good.
If I decrease the number of concurrent threads, the number of items in the list gets close to the desired 1000; if I increase it, it drifts a little further from 1000. That is, I think, there is contention for writing to the list. But then why isn't the synchronization working?
What's the problem?
After your parse() call returns, all of your 1000 threads have been started, but it is not guaranteed that they are finished. In fact, they aren't, and that's the problem you see. I would strongly recommend not writing this yourself but using the tools the JDK provides for this kind of job.
The documentation on Thread Pools and ThreadPoolExecutor is a good starting point. Again, don't implement this yourself unless you are absolutely sure you have to, because writing such multi-threading code is pure pain.
Your code should look something like this:
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
f.get();
}
There is no problem with the code; it is working as you coded it. The problem is with the last iteration. All other iterations work properly, but during the last iteration, which covers elements 980 to 1000, the threads are created but the main thread does not wait for them to complete before returning the list. Therefore you will get some number between 980 and 1000 if you are working with 20 threads at a time.
Now you can try adding Thread.sleep(50) before returning the list; in that case your main thread will wait a little, and maybe by then the other threads will have finished processing.
Or you can use a synchronization API from Java: instead of sleeping, use a CountDownLatch. This will help you wait for the threads to complete their processing, and then you can create new threads.
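A minimal sketch of the CountDownLatch suggestion (a simplified stand-in for the parser; adding the page number replaces the real parsing work):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    private static final List<Integer> items =
            Collections.synchronizedList(new ArrayList<>());

    public static List<Integer> parseAll(int pages) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(pages);
        for (int i = 0; i < pages; i++) {
            final int page = i;
            new Thread(() -> {
                try {
                    items.add(page);        // stands in for the real parsing work
                } finally {
                    done.countDown();       // always signal, even if parsing fails
                }
            }).start();
        }
        done.await();                       // block until every worker has counted down
        return items;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parseAll(1000).size()); // prints 1000
    }
}
Because the main thread blocks on await() until the count reaches zero, the list is complete before it is returned.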
I was reading about CopyOnWriteArrayList and was wondering how can I demonstrate data race in ArrayList class. Basically I'm trying to simulate a situation where ArrayList fails so that it becomes necessary to use CopyOnWriteArrayList. Any suggestions on how to simulate this.
A race is when two (or more) threads try to operate on shared data, and the final output depends on the order in which the data is accessed (and that order is nondeterministic).
From Wikipedia:
A race condition or race hazard is a flaw in an electronic system or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first.
For example:
public class Test {
private static List<String> list = new CopyOnWriteArrayList<String>();
public static void main(String[] args) throws Exception {
ExecutorService e = Executors.newFixedThreadPool(5);
e.execute(new WriterTask());
e.execute(new WriterTask());
e.execute(new WriterTask());
e.execute(new WriterTask());
e.execute(new WriterTask());
e.shutdown(); // let the submitted tasks finish and allow the pool threads to exit
e.awaitTermination(20, TimeUnit.SECONDS);
}
static class WriterTask implements Runnable {
@Override
public void run() {
for (int i = 0; i < 25000; i ++) {
list.add("a");
}
}
}
}
This, however, fails when using an ArrayList, with an ArrayIndexOutOfBoundsException. That's because ensureCapacity(..) should be called before each insertion to make sure the internal array can hold the new data. And here's what happens:
the first thread calls add(..), which in turn calls ensureCapacity(currentSize + 1)
before the first thread has actually incremented the size, the second thread also calls ensureCapacity(currentSize + 1)
because both have read the initial value of currentSize, the new size of the internal array is currentSize + 1
the two threads perform the expensive operation of copying the old array into the extended one, with the new size (which cannot hold both additions)
then each of them tries to assign the new element to array[size++]. The first one succeeds, the second one fails, because the internal array has not been expanded properly, due to the race condition.
This happens because two threads have tried to add items to the same structure at the same time, and the addition of one of them has overwritten the addition of the other (i.e. the first one was lost).
Another benefit of CopyOnWriteArrayList shows up when multiple threads write to the ArrayList while another thread iterates it: the iterating thread will surely get a ConcurrentModificationException.
Here's how to demonstrate it:
public class Test {
private static List<String> list = new ArrayList<String>();
public static void main(String[] args) throws Exception {
ExecutorService e = Executors.newFixedThreadPool(2);
e.execute(new WriterTask());
e.execute(new ReaderTask());
}
static class ReaderTask implements Runnable {
@Override
public void run() {
while (true) {
for (String s : list) {
System.out.println(s);
}
}
}
}
static class WriterTask implements Runnable {
@Override
public void run() {
while(true) {
list.add("a");
}
}
}
}
If you run this program multiple times, you will often get a ConcurrentModificationException before you get an OutOfMemoryError.
If you replace it with CopyOnWriteArrayList, you don't get the exception (but the program is very slow)
Note that this is just a demonstration - the benefit of CopyOnWriteArrayList is when the number of reads vastly outnumbers the number of writes.
Example:
for (int i = 0; i < array.size(); ++i) {
Element elm = array.get(i);
doSomethingWith(elm);
}
If another thread calls array.clear() after this thread has compared i with array.size(), but before it calls array.get(i), you get an IndexOutOfBoundsException.
Another simple setup: two threads, one adding to the ArrayList and one removing from it. A data race can happen here too.
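A quick sketch of that scenario (hypothetical, not from the question): one thread keeps adding while another keeps removing from a plain ArrayList. Without synchronization this usually ends with an IndexOutOfBoundsException in the removing thread or an inconsistent final size:
import java.util.ArrayList;
import java.util.List;

public class AddRemoveRace {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = new ArrayList<>();

        Thread adder = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) list.add(i);
        });
        Thread remover = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                if (!list.isEmpty()) list.remove(list.size() - 1); // racy check-then-act
            }
        });

        adder.start();
        remover.start();
        adder.join();
        remover.join();
        System.out.println("final size: " + list.size());
    }
}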