Are there any concurrency problems with one thread reading from one index of an array, while another thread writes to another index of the array, as long as the indices are different?
e.g. (this example not necessarily recommended for real use, only to illustrate my point)
class Test1 {
    static final private int N = 4096;
    final private int[] x = new int[N];
    final private AtomicInteger nwritten = new AtomicInteger(0);
    // invariant:
    //   all values x[i] where 0 <= i < nwritten.get() are immutable

    // read() is not synchronized since we want it to be fast
    int read(int index) {
        if (index >= nwritten.get())
            throw new IllegalArgumentException();
        return x[index];
    }

    // write() is synchronized to handle multiple writers
    // (using compare-and-set techniques to avoid blocking algorithms
    // is nontrivial)
    synchronized void write(int x_i) {
        int index = nwritten.get();
        if (index >= N)
            throw new SomeExceptionThatIndicatesArrayIsFull();
        x[index] = x_i;
        // from this point forward, x[index] is fixed in stone
        nwritten.set(index + 1);
    }
}
edit: critiquing this example is not my question; I literally just want to know whether access to one array index, concurrent with access to another index, poses concurrency problems. I couldn't think of a simpler example.
While you will not get an invalid state by changing arrays as you mention, you will have the same problem that happens when two threads are viewing a non-volatile integer without synchronization (see the section in the Java Tutorial on Memory Consistency Errors). Basically, the problem is that Thread 1 may write a value at index i, but there is no guarantee when (or if) Thread 2 will see the change.
The class java.util.concurrent.atomic.AtomicIntegerArray does what you want to do.
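For example, here is a minimal sketch of the question's append-and-read pattern on top of AtomicIntegerArray (the class name and exception messages are illustrative, not from the question):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class AtomicBuffer {
    private final AtomicIntegerArray x;
    private final AtomicInteger nwritten = new AtomicInteger(0);

    AtomicBuffer(int capacity) {
        x = new AtomicIntegerArray(capacity);
    }

    // Per-element volatile-read semantics: a reader that passes the
    // nwritten check is guaranteed to see the value stored by set().
    int read(int index) {
        if (index >= nwritten.get())
            throw new IllegalArgumentException("not yet written: " + index);
        return x.get(index);
    }

    synchronized void write(int value) {
        int index = nwritten.get();
        if (index >= x.length())
            throw new IllegalStateException("buffer full");
        x.set(index, value);      // volatile write of the element
        nwritten.set(index + 1);  // publish the new bound afterwards
    }
}
```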
The example has a lot of stuff that differs from the prose question.
The answer to that question is that distinct elements of an array are accessed independently, so you don't need synchronization if two threads change different elements.
However, the Java memory model makes no guarantees (that I'm aware of) that a value written by one thread will be visible to another thread, unless you synchronize access.
Depending on what you're really trying to accomplish, it's likely that java.util.concurrent already has a class that will do it for you. And if it doesn't, I still recommend taking a look at the source code for ConcurrentHashMap, since your code appears to be doing the same thing that it does to manage the hash table.
I am not really sure whether synchronizing only the write method, while leaving the read method unsynchronized, would work. I'm not sure of all the consequences, but at a minimum it might lead to read() returning a value that has just been overwritten by write().
Yes, as bad cache interleaving can still happen in a multi-CPU/core environment. There are several options to avoid it:
Use the Sun-private Unsafe library to atomically set an element in an array (or the corresponding jsr166y feature added in Java 7)
Use an AtomicXYZ[] array
Use a custom object with one volatile field and have an array of that object
Use the ParallelArray of the jsr166y addendum instead in your algorithm
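The third option above (a custom object with one volatile field, held in an array) might look like the following sketch; the class names are made up:

```java
class VolatileCell {
    volatile int value; // each element gets its own volatile slot
}

class VolatileIntArray {
    private final VolatileCell[] cells;

    VolatileIntArray(int length) {
        cells = new VolatileCell[length];
        for (int i = 0; i < length; i++)
            cells[i] = new VolatileCell();
    }

    int get(int i)         { return cells[i].value; } // volatile read
    void set(int i, int v) { cells[i].value = v; }    // volatile write
}
```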
Since read() is not synchronized you could have the following scenario:
Thread A enters write()
Thread A stores the new value in x[0]; nwritten is still 0
Thread B calls read(0), sees nwritten == 0, and rejects the index
Thread A increments nwritten to 1
Thread A exits write()
Since you want to guarantee that your variable addresses never conflict, what about something like (discounting array index issues):

int i = -1;

synchronized int curr() { return i; }
synchronized int next() { return ++i; }

int read() {
    return values[curr()];
}

void write(int x) {
    values[next()] = x;
}
I found the following Java code.
for (int type = 0; type < typeCount; type++) {
    synchronized (result) {
        result[type] += parts[type];
    }
}
where result and parts are double[].
I know basic operations on primitive types are thread-safe, but I am not sure about +=. If the above synchronized is necessary, is there maybe a better class to handle such operation?
No. The += operation is not thread-safe. It requires locking and/or a proper chain of "happens-before" relationships for any expression involving assignment to a shared field or array element to be thread-safe.
(With a field declared as volatile, the "happens-before" relationships exist ... but only on read and write operations. The += operation consists of a read and a write. These are individually atomic, but the sequence isn't. And most assignment expressions using = involve both one or more reads (on the right hand side) and a write. That sequence is not atomic either.)
For the complete story, read JLS 17.4 ... or the relevant chapter of "Java Concurrency in Action" by Brian Goetz et al.
As I know basic operations on primitive types are thread-safe ...
Actually, that is an incorrect premise:
consider the case of arrays
consider that expressions are typically composed of a sequence of operations, and that a sequence of atomic operations is not guaranteed to be atomic.
There is an additional issue for the double type. The JLS (17.7) says this:
"For the purposes of the Java programming language memory model, a single write to a non-volatile long or double value is treated as two separate writes: one to each 32-bit half. This can result in a situation where a thread sees the first 32 bits of a 64-bit value from one write, and the second 32 bits from another write."
"Writes and reads of volatile long and double values are always atomic."
In a comment, you asked:
So what type I should use to avoid global synchronization, which stops all threads inside this loop?
In this case (where you are updating a double[]), there is no alternative to synchronization with locks or primitive mutexes.
If you had an int[] or a long[] you could replace them with AtomicIntegerArray or AtomicLongArray and make use of those classes' lock-free update. However there is no AtomicDoubleArray class, or even an AtomicDouble class.
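For instance, with a long[] replaced by AtomicLongArray, the per-element update needs no lock at all. This is a sketch with made-up type counts and values:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class AtomicTotals {
    // Several threads add into the same AtomicLongArray without locks;
    // addAndGet is an atomic read-modify-write on a single element.
    static long[] accumulate(int types, int threads) throws InterruptedException {
        AtomicLongArray totals = new AtomicLongArray(types);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int type = 0; type < types; type++)
                    totals.addAndGet(type, type + 1); // lock-free update
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long[] out = new long[types];
        for (int i = 0; i < types; i++) out[i] = totals.get(i);
        return out;
    }
}
```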
(UPDATE - someone pointed out that Guava provides an AtomicDoubleArray class, so that would be an option. A good one actually.)
One way of avoiding a "global lock" and massive contention problems might be to divide the array into notional regions, each with its own lock. That way, one thread only needs to block another thread if they are using the same region of the array. (Single writer / multiple reader locks could help too ... if the vast majority of accesses are reads.)
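A sketch of that region-locking idea follows; the stripe count is an arbitrary choice, and the modulo mapping is just one way to assign indices to regions:

```java
class StripedDoubleArray {
    private final double[] data;
    private final Object[] locks;

    StripedDoubleArray(int length, int stripes) {
        data = new double[length];
        locks = new Object[stripes];
        for (int i = 0; i < stripes; i++)
            locks[i] = new Object();
    }

    // Threads contend only when touching indices in the same stripe.
    void add(int index, double delta) {
        synchronized (locks[index % locks.length]) {
            data[index] += delta;
        }
    }

    double get(int index) {
        synchronized (locks[index % locks.length]) {
            return data[index];
        }
    }
}
```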
Despite of the fact that there is no AtomicDouble or AtomicDoubleArray in java, you can easily create your own based on AtomicLongArray.
static class AtomicDoubleArray {
    private final AtomicLongArray inner;

    public AtomicDoubleArray(int length) {
        inner = new AtomicLongArray(length);
    }

    public int length() {
        return inner.length();
    }

    public double get(int i) {
        return Double.longBitsToDouble(inner.get(i));
    }

    public void set(int i, double newValue) {
        inner.set(i, Double.doubleToLongBits(newValue));
    }

    public void add(int i, double delta) {
        long prevLong, nextLong;
        do {
            prevLong = inner.get(i);
            nextLong = Double.doubleToLongBits(Double.longBitsToDouble(prevLong) + delta);
        } while (!inner.compareAndSet(i, prevLong, nextLong));
    }
}
As you can see, I use Double.doubleToLongBits and Double.longBitsToDouble to store doubles as longs in the AtomicLongArray. Both types have the same size in bits, so no precision is lost (except that non-canonical NaN bit patterns are collapsed, but I don't think that matters).
In Java 8 the implementation of add can be even simpler, since you can use the accumulateAndGet method that was added to AtomicLongArray in Java 8.
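For example, with Java 8's accumulateAndGet the add method could be written roughly as follows (the array size here is arbitrary):

```java
import java.util.concurrent.atomic.AtomicLongArray;

class AtomicDoubleAdd {
    private final AtomicLongArray inner = new AtomicLongArray(16);

    public void add(int i, double delta) {
        // The accumulator decodes the current bits, adds, and re-encodes;
        // accumulateAndGet runs the CAS retry loop internally.
        inner.accumulateAndGet(i, Double.doubleToLongBits(delta),
            (prev, d) -> Double.doubleToLongBits(
                Double.longBitsToDouble(prev) + Double.longBitsToDouble(d)));
    }

    public double get(int i) {
        return Double.longBitsToDouble(inner.get(i));
    }
}
```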
Update: It appears that I have virtually re-implemented Guava's AtomicDoubleArray.
Even the plain double data type is not thread-safe on 32-bit JVMs, because a write to it is not atomic: a double takes eight bytes in Java, which a 32-bit JVM may update as two separate 32-bit operations.
As already explained, this code is not thread-safe. One possible way to avoid synchronization in Java 8 is to use the new DoubleAdder class, which can maintain the sum of double values in a thread-safe manner.
Create array of DoubleAdder objects before parallelizing:
DoubleAdder[] adders = Stream.generate(DoubleAdder::new)
.limit(typeCount).toArray(DoubleAdder[]::new);
Then accumulate the sum in parallel threads like this:
for (int type = 0; type < typeCount; type++) {
    adders[type].add(parts[type]);
}
Finally get the result after parallel subtasks finished:
double[] result = Arrays.stream(adders).mapToDouble(DoubleAdder::sum).toArray();
I was just looking for the answer to the question of why ArrayList is faster than Vector, and I found that ArrayList is faster because it is not synchronized.
So my doubts are:
If ArrayList is not synchronized, why would we use it in a multithreaded environment and compare it with Vector?
If we are in a single-threaded environment, how does the performance of Vector decrease, given that no actual contention occurs when we are dealing with a single thread?
Why should we compare the performance considering the above points?
Please guide me :)
a) Methods using ArrayList in a multithreaded program may be synchronized.

class X {
    List l = new ArrayList();

    synchronized void add(Object e) {
        l.add(e);
    }
    ...
}

b) We can use ArrayList without exposing it to other threads; this is the case when the ArrayList is referenced only from local variables:

void x() {
    List l = new ArrayList(); // no other thread except the current one can access l
    ...
}

Even in a single-threaded environment, entering a synchronized method takes a lock; this is where we lose performance:

public synchronized boolean add(E e) { // current thread will take a lock here
    modCount++;
    ...
}
You can use ArrayList in a multithreaded environment if the list is not shared between threads.
If the list is shared between threads, you can synchronize the access to that list.
Alternatively, you can use Collections.synchronizedList() to get a List that can be used in a thread-safe way.
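For example, here is a sketch of Collections.synchronizedList in use; note that, as the javadoc warns, iteration is still a compound action and must be guarded manually:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListDemo {
    // Individual calls on the wrapper are thread-safe, but iterating
    // is a sequence of calls, so it needs an explicit synchronized block.
    static int sum(List<Integer> source) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>(source));
        int sum = 0;
        synchronized (list) { // required while iterating, per the javadoc
            for (int n : list) sum += n;
        }
        return sum;
    }
}
```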
Vector is an old implementation of a synchronized List that is no longer used, mainly because its internal implementation simply synchronizes every method. Generally you want to synchronize a sequence of operations; otherwise you can still get a ConcurrentModificationException when another thread modifies the list while you are iterating over it. Besides, synchronizing every method is not good from a performance point of view.
Also, even in a single-threaded environment, invoking a synchronized method has some cost, so Vector is not a good choice for single-threaded applications either.
Just because a component is single-threaded doesn't mean that it cannot be used in a thread-safe context. Your application may have its own locking, in which case additional locking is redundant work.
Conversely, just because a component is thread-safe, it doesn't mean that you cannot use it in an unsafe manner. Typically thread safety extends to a single operation. E.g. if you take an Iterator from a collection and call next(), that is two operations, and they are no longer thread-safe when used in combination. You still have to use locking with Vector. Another simple example is
private Vector<Integer> vec = new Vector<>();

vec.add(1);
int n = vec.remove(vec.size() - 1);
assert n == 1;
This is at least three operations; however, the number of things that can go wrong is much greater than you might suppose. This is why you end up doing your own locking, and why the locking inside Vector may be redundant, even unwanted.
For you own interest;
vec can change at any point to another Vector or null
vec.add(2) can happen between any operation, changing the size and the last element.
vec.remove() can happen between any operation.
vec.add(null) can happen between any operation resulting in a possible NullPointerException
The vec can /* change */ in these places.
private Vector<Integer> vec = new Vector<>();

vec.add(1); /* change */
int n = vec.remove(vec.size() /* change */ - 1);
assert n == 1;
In short, assuming that just because you used a thread safe collection your code is now thread safe is a big assumption.
A common pattern which breaks is
for(int n : vec) {
// do something.
}
Looks harmless enough, except that it is really

for (Iterator<Integer> iter = vec.iterator(); /* change */ iter.hasNext(); ) {
    /* change */ int n = iter.next();
}

I have marked with /* change */ the places where another thread could modify the collection, meaning this loop can throw a ConcurrentModificationException (but might not).
there is no Synchronization
The JVM doesn't know there is no need for synchronization and so it still has to do something. It has an optimisation to reduce the cost of uncontended locks, but it still has to do work.
You need to understand the basic concept to answer your questions.
When we say ArrayList is not synchronized and Vector is, we mean that the methods in those classes (like add(), get(), remove(), etc.) are synchronized in the Vector class and not in the ArrayList class. These methods act upon the data being stored.
So the data stored in a Vector cannot be read or written in parallel, because the add, get, and remove methods are synchronized, whereas the same operations on an ArrayList can run in parallel, since its methods are not synchronized.
This parallel activity makes ArrayList fast and Vector slow. This behavior remains the same whether you use them in a multithreaded or a single-threaded environment.
Hope this answers your question.
Java's present memory model guarantees that if the only reference to an object "George" is stored into a final field of some other object "Joe", and neither George nor Joe has ever been seen by any other thread, then all operations on George which were performed before the store will be seen by all threads as having been performed before the store. This works out very nicely in cases where it makes sense to store into a final field a reference to an object which will never be mutated after that.
Is there any efficient way of achieving such semantics in cases where an object of mutable type is supposed to be lazily created (sometime after the owning object's constructor has finished execution)? Consider the fairly simple class ArrayThing which encapsulates an immutable array, but it offers a method (three versions with the same nominal purpose) to return the sum of all elements prior to a specified one. For purposes of this example, assume that many instances will be constructed without ever using that method, but on instances where that method is used, it will be used a lot; consequently, it's not worthwhile to precompute the sums when every instance of ArrayThing is constructed, but it is worthwhile to cache them.
class ArrayThing {
    final int[] mainArray;

    ArrayThing(int[] initialContents) {
        mainArray = (int[]) initialContents.clone();
    }

    public int getElementAt(int index) {
        return mainArray[index];
    }

    int[] makeNewSumsArray() {
        int[] temp = new int[mainArray.length + 1];
        int sum = 0;
        for (int i = 0; i < mainArray.length; i++) {
            temp[i] = sum;
            sum += mainArray[i];
        }
        temp[mainArray.length] = sum;
        return temp;
    }
    // Unsafe version (a thread could be seen as setting sumOfPrevElements1
    // before it's seen as populating the array).
    int[] sumOfPrevElements1;

    public int getSumOfElementsBefore_v1(int index) {
        int[] localElements = sumOfPrevElements1;
        if (localElements == null) {
            localElements = makeNewSumsArray();
            sumOfPrevElements1 = localElements;
        }
        return localElements[index];
    }

    static class Holder {
        public final int[] it;
        public Holder(int[] dat) { it = dat; }
    }

    // Safe version, but slower to read (adds another level of indirection,
    // but no thread can possibly see a write to sumOfPrevElements2
    // before the final field and the underlying array have been written).
    Holder sumOfPrevElements2;

    public int getSumOfElementsBefore_v2(int index) {
        Holder localElements = sumOfPrevElements2;
        if (localElements == null) {
            localElements = new Holder(makeNewSumsArray());
            sumOfPrevElements2 = localElements;
        }
        return localElements.it[index];
    }

    // Safe version, I think, with no penalty on reading speed.
    // Before storing the reference to the new array, however, it
    // creates a temporary object which is almost immediately
    // discarded; that seems rather hokey.
    int[] sumOfPrevElements3;

    public int getSumOfElementsBefore_v3(int index) {
        int[] localElements = sumOfPrevElements3;
        if (localElements == null) {
            localElements = (new Holder(makeNewSumsArray())).it;
            sumOfPrevElements3 = localElements;
        }
        return localElements[index];
    }
}
As with the String#hashCode() method, it is possible that two or more threads might see that a computation hasn't been performed, decide to perform it, and store the result. Since all threads would end up producing identical results, that wouldn't be an issue. With getSumOfElementsBefore_v1(), however, there is a different problem: Java could re-order program execution so the array reference gets written to sumOfPrevElements1 before all the elements of the array have been written. Another thread which called getSumOfElementsBefore() at that moment could see that the array wasn't null, and then proceed to read an array element which hadn't yet been written. Oops.
From what I understand, getSumOfElementsBefore_v2() fixes that problem, since storing a reference to the array in the final field Holder#it establishes a happens-before relationship with regard to the array element writes. Unfortunately, that version of the code would need to create and maintain an extra heap object, and would require that every attempt to access the sum-of-elements array go through an extra level of indirection.
I think getSumOfElementsBefore_v3() would be cheaper but still safe. The JVM guarantees that all actions which were done to a new object before a reference is stored into a final field will be visible to all threads by the time any thread can see that reference. Thus, even if other threads don't use Holder#it directly, the fact that they are using a reference which was copied from that field would establish that they can't see the reference until after everything that was done before the store has actually happened.
Even though the latter method limits the overhead (versus the unsafe method) to the times when the new array is created (rather than adding overhead to every read), it still seems rather ugly to create a new object purely for the purpose of writing and reading back a final field. Making the array field volatile would achieve legitimate semantics, but would add memory-system overhead every time the field is read (a volatile qualifier would require that the code notice if the field has been written in another thread, but that's overkill for this application; what's necessary is merely that any thread which does see that the field has been written also see all writes which occurred to the array identified thereby before the reference was stored). Is there any way to achieve similar semantics without having to either create and abandon a superfluous temporary object, or add additional overhead every time the field is read?
Your third version does not work. The guarantees made for a properly constructed object stored in a final instance field apply to reads of that final field only. Since the other threads don't read that final variable, there is no guarantee made.
Most notably, the fact that the initialization of the array has to be completed before the array reference is stored in the final Holder.it variable does not say anything about when the sumOfPrevElements3 variable will be written (as seen by other threads). In practice, a JVM might optimize away the entire Holder instance creation as it has no side-effects, thus the resulting code behaves like an ordinary unsafe publication of an int[] array.
For using the final field publication guarantee, you have to publish the Holder instance containing the final field; there is no way around it.
But if that additional instance annoys you, you should really consider using a simple volatile variable. After all, you are only making assumptions about the cost of that volatile variable; in other words, you are engaging in premature optimization.
After all, detecting a change made by another thread doesn’t have to be expensive, e.g. on x86 it doesn’t even need an access to the main memory as it has cache coherence. It’s also possible that an optimizer detects that you never write to the variable again once it became non-null, then enabling almost all optimizations possible for ordinary fields once a non-null reference has been read.
So the conclusion is as always: measure, don’t guess. And start optimizing only once you found a real bottleneck.
I think your second and third examples do work (sort of; as you say, the reference itself might not be noticed by another thread, which might then re-assign the array. That's a lot of extra work!).
But those examples are based on a faulty premise: it is not true that a volatile field requires the reader to "notice" the change. In fact, volatile and final fields perform exactly the same operation. The read operation of a volatile or a final has no overhead on most CPU architectures. I believe on a write volatile has a tiny amount of extra overhead.
So I would just use volatile here, and not worry about your supposed "optimizations". The difference in speed, if any, is going to be extremely slight, and I'm talking like an extra 4 bytes written with a bus-lock, if that. And your "optimized" code is pretty god-awful to read.
As a minor pedantic point, it is not true that final fields require you to have the sole reference to an object to make it immutable and thread-safe. The spec only requires you to prevent changes to the object. Having the sole reference to an object is one way to prevent changes, sure. But objects that are already immutable (like java.lang.String, for example) can be shared without problems.
In summary: premature optimization is the root of all evil. Lose the tricky nonsense and just write a simple array update with assignment to a volatile:
volatile int[] sumOfPrevElements;

public int getSumOfElementsBefore(int index) {
    if (sumOfPrevElements != null) return sumOfPrevElements[index];
    sumOfPrevElements = makeNewSumsArray();
    return sumOfPrevElements[index];
}
I'm considering implementations of multi-threaded sorting with use of one volatile array. Let's say I have an array of length N, and M threads that will sort sub-ranges of the array. These sub-ranges are disjoint. Then, in the main thread I will merge partially sorted array.
Example code:
final int N = ....
volatile MyClass[] array = new MyClass[N];
//... fill array with values

void sort() throws InterruptedException {
    MyThread[] workers = new MyThread[M];
    int len = N / M; // length of each sub-range
    for (int i = 0; i < M; ++i) {
        workers[i] = new MyThread(i * len, (i + 1) * len);
        workers[i].start();
    }
    for (int i = 0; i < M; ++i)
        workers[i].join();
    // now synchronization in memory using "happens before"
    // will it work?
    array = array;
    //...merge sorted sub-ranges into one sorted array
}

private class MyThread extends Thread {
    final int from;
    final int to;

    public MyThread(int from, int to) { ..... }

    public void run() {
        //...something like: quicksort(array, from, to);
        //...without synchronization, ranges <from, to> are exclusive
    }
}
I don't need synchronization in memory while running threads because the array sub-ranges are disjoint. I want to do the synchronization once after finished threads. Will the updated version of the array (seen in the main thread) contain all the changes made in the working threads?
If this solution is valid, is it effective for large tables?
Thank you in advance for your help.
EDIT:
I ran the tests. I received correct results regardless of the use of the volatile keyword, but the execution time is a few times (about M times) longer with a volatile array.
Not an answer, just some thoughts:
There is no such thing as a volatile array. Only fields can be volatile. You have declared a volatile field named "array", and initialized it with a reference to an array object.
It looks like you are expecting the statement, array = array to act as a full memory barrier. I don't know if it will or if it won't, or if the answer depends on what compiler, what JVM and, what operating system you use. Maybe somebody more expert than I can answer.
I don't like it for two reasons though: One is, it looks like a no-op. It's an invitation for some other programmer who doesn't understand what you're trying to do to come along and "clean up" the code by deleting it. A tricky statement like that should be wrapped in a function with a name that explains the trick.
Two is, the function of that statement has nothing to do with the array that the field references. It would be better to use a volatile int field or a volatile somethingelse field that obviously has no connection to the array, thereby calling attention to the fact that what matters is something other than the value of the field.
Update: According to Brian Goetz, that one statement won't do what you want. What you need is for each worker thread to update the volatile field after finishing its work, and then you need the master thread to read the volatile field before it tries to look at the worker's results.
On the other hand... Do you need the barrier at all? Isn't it enough that the worker threads all terminated and the master join()ed them? Again, maybe somebody more expert than myself can answer.
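Here is a sketch of the volatile handshake described above; as the last paragraph notes, join() alone already establishes the happens-before edge, so the volatile flag is shown only to illustrate the publication pattern (the class and method names are made up):

```java
class VolatileHandshake {
    private final int[] data = new int[8];
    private volatile boolean done; // written by the worker, read by the master

    int lastSquare() {
        Thread worker = new Thread(() -> {
            for (int i = 0; i < data.length; i++)
                data[i] = i * i;   // plain writes...
            done = true;           // ...published by this volatile write
        });
        worker.start();
        try {
            worker.join();         // join() by itself already happens-before
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return done ? data[7] : -1; // volatile read before reading the array
    }
}
```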
What you're doing looks very messy and as suggested, probably won't work as expected.
If you use Java8 then perhaps the parallel sort is for you. Otherwise --
Sorting a single array in place, in parallel is a horror show. Sorting in parallel is rather simple if you create a new array of sorted elements.
Create objects for the sub-arrays (you'll need to do this eventually). Pass each object to a thread. Let the threads sort their objects in parallel. When all sorts are done, merge the sorted objects into a new array.
That means more memory is required, but it's rather easy and you don't need to worry about volatile or synchronization.
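A sketch of that copy-sort-merge approach with two threads (the two-way split and class name are illustrative; a real version would split M ways):

```java
import java.util.Arrays;

public class ParallelCopySort {
    // Sort two halves in separate threads, then merge into a fresh array.
    static int[] sort(int[] input) {
        int mid = input.length / 2;
        int[] left = Arrays.copyOfRange(input, 0, mid);
        int[] right = Arrays.copyOfRange(input, mid, input.length);

        Thread t1 = new Thread(() -> Arrays.sort(left));
        Thread t2 = new Thread(() -> Arrays.sort(right));
        t1.start(); t2.start();
        try {
            t1.join(); t2.join(); // join(): sorted halves are now visible
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        int[] out = new int[input.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = left[i] <= right[j] ? left[i++] : right[j++];
        while (i < left.length)  out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```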
I am new to multi-threading in Java and don't quite understand what's going on.
From online tutorials and lecture notes, I know that the synchronized block, which must be applied to a non-null object, ensures that only one thread can execute that block of code. Since an array is an object in Java, synchronize can be applied to it. Further, if the array stores objects, I should be able to synchronize each element of the array too.
My program has several threads updating an array of numbers, hence I created an array of Long objects:
synchronized (grid[arrayIndex]) {
    grid[arrayIndex] += a.getNumber();
}
This code sits inside the run() method of the thread class which I have extended. The array, grid, is shared by all of my threads. However, this does not return the correct results while running the same program on one thread does.
This will not work. It is important to realize that grid[arrayIndex] += ... is actually replacing the element in the grid with a new object. This means that you are synchronizing on an object in the array and then immediately replacing the object with another in the array. This will cause other threads to lock on a different object so they won't block. You must lock on a constant object.
You can instead lock on the entire array object, if it is never replaced with another array object:
synchronized (grid) {
    // this changes the element to another Long, so the element itself
    // can't be used as the lock
    grid[arrayIndex] += a.getNumber();
}
This is one of the reasons why it is a good pattern to lock on a final object. See this answer with more details:
Why is it not a good practice to synchronize on Boolean?
Another option would be to use an array of AtomicLong objects, and use their addAndGet() or getAndAdd() method. You wouldn't need synchronization to increment your objects, and multiple objects could be incremented concurrently.
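A sketch of that AtomicLong alternative (the class name and array size are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

class Grid {
    private final AtomicLong[] grid = new AtomicLong[10];

    Grid() {
        for (int i = 0; i < grid.length; i++)
            grid[i] = new AtomicLong();
    }

    // addAndGet is an atomic read-modify-write on one element; no
    // synchronized block is needed, and different indices can be
    // updated concurrently.
    void accumulate(int index, long amount) {
        grid[index].addAndGet(amount);
    }

    long value(int index) {
        return grid[index].get();
    }
}
```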
The Java class Long is immutable; you cannot change its value. So when you perform an action:
grid[arrayIndex] += a.getNumber();
it is not changing the value of grid[arrayIndex], which you are locking on, but is actually creating a new Long object and setting its value to the old value plus a.getNumber(). So you will end up with different threads synchronizing on different objects, which leads to the results you are seeing.
The synchronized block you have here is no good. When you synchronize on the array element, which is presumably a number, you're synchronizing only on that object. When you reassign the element of the array to a different object than the one you started with, the synchronization is no longer on the correct object and other threads will be able to access that index.
One of these two options would be more correct:
private final int[] grid = new int[10];

synchronized (grid) {
    grid[arrayIndex] += a.getNumber();
}
If grid can't be final:
private final Object MUTEX = new Object();

synchronized (MUTEX) {
    grid[arrayIndex] += a.getNumber();
}
If you use the second option and grid is not final, any assignment to grid should also be synchronized.
synchronized (MUTEX) {
    grid = new int[20];
}
Always synchronize on something final, always synchronize on both access and modification, and once you have that down, you can start looking into other locking mechanisms, such as Lock, ReadWriteLock, and Semaphore. These can provide more complex locking mechanisms than synchronization that is better for scenarios where Java's default synchronization alone isn't enough, such as locking data in a high-throughput system (read/write locking) or locking in resource pools (counting semaphores).
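As an illustration of the read/write locking mentioned above (a sketch, not a drop-in replacement for the grid code; names are made up):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadMostlyArray {
    private final int[] data = new int[10];
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time...
    int get(int i) {
        lock.readLock().lock();
        try {
            return data[i];
        } finally {
            lock.readLock().unlock();
        }
    }

    // ...but a writer excludes both readers and other writers.
    void add(int i, int delta) {
        lock.writeLock().lock();
        try {
            data[i] += delta;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```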