volatile array and multithreaded sorting

volatile array and multithreaded sorting - java

I'm considering implementations of multi-threaded sorting with use of one volatile array. Let's say I have an array of length N, and M threads that will sort sub-ranges of the array. These sub-ranges are disjoint. Then, in the main thread I will merge partially sorted array.
Example code:
final int N = ....
volatile MyClass[] array = new MyClass[N];
//... fill array with values
void sort(){
MyThread[] workers = new MyThread[M];
int len = N/M; //length of the sub-range
for(int i=0;i<M;++i){
workers[i] = new MyThread(i*len, (i+1)*len);
workers[i].start();
}
for(int i=0;i<M;++i)workers.join();
//now synchronization in memory using "happens before"
//will it work?
array = array;
//...merge sorted sub-ranges into one sorted array
}
private class MyThread extends Thread{
final int from;
final int to;
public MyThread(int from, int to){ ..... }
public void run(){
//...something like: quicksort(array, from, to);
//...without synchronization, ranges <from, to> are exclusive
}
I don't need synchronization in memory while running threads because the array sub-ranges are disjoint. I want to do the synchronization once after finished threads. Will the updated version of the array (seen in the main thread) contain all the changes made in the working threads?
If this solution is valid, is it effective for large tables?
Thank you in advance for your help.
EDIT:
I ran the tests. I received correct results regardless of the use of volatile keyword. But the time of execution is a few times (about M-times) longer for a volatile array.

Not an answer, just some thoughts:
There is no such thing as a volatile array. Only fields can be volatile. You have declared a volatile field named "array", and initialized it with a reference to an array object.
It looks like you are expecting the statement, array = array to act as a full memory barrier. I don't know if it will or if it won't, or if the answer depends on what compiler, what JVM and, what operating system you use. Maybe somebody more expert than I can answer.
I don't like it for two reasons though: One is, it looks like a no-op. It's an invitation for some other programmer who doesn't understand what you're trying to do to come along and "clean up" the code by deleting it. A tricky statement like that should be wrapped in a function with a name that explains the trick.
Two is, the function of that statement has nothing to do with the array that the field references. It would be better to use a volatile int field or a volatile somethingelse field that obviously has no connection to the array, thereby calling attention to the fact that what matters is something other than the value of the field.
Update: According to Brian Goetz, that one statement won't do what you want. What you need is for each worker thread to update the volatile field after finishing its work, and then you need the master thread to read the volatile field before it tries to look at the worker's results.
On the other hand... Do you need the barrier at all? Isn't it enough that the worker threads all terminated and the master join()ed them? Again, maybe somebody more expert than myself can answer.

What you're doing looks very messy and as suggested, probably won't work as expected.
If you use Java8 then perhaps the parallel sort is for you. Otherwise --
Sorting a single array in place, in parallel is a horror show. Sorting in parallel is rather simple if you create a new array of sorted elements.
Create objects of the the sub-array (you'll need to do this eventually). Pass each object to a thread. Let the threads sort their objects in parallel. When all sorts are done, merge the sorted objects into a new array.
That means there is more memory required, but its rather easy and you don't need to worry about volatile or synchronization.

Related

How to create thread safe object array in Java?

I've searched for this question and I only found answer for primitive type arrays.
Let's say I have a class called MyClass and I want to have an array of its objects in my another class.
class AnotherClass {
[modifiers(?)] MyClass myObjects;
void initFunction( ... ) {
// some code
myObjects = new MyClass[] { ... };
}
MyClass accessFunction(int index) {
return myObjects[index];
}
}
I read somewhere that declaring an array volatile does not give volatile access to its fields, but giving a new value of the array is safe.
So, if I understand it well, if I give my array a volatile modifier in my example code, it would be (kinda?) safe. In case of I never change its values by the [] operator.
Or am I wrong? And what should I do if I want to change one of its value? Should I create a new instance of the array an replace the old value with the new in the initial assignment?
AtomicXYZArray is not an option because it is only good for a primitive type arrays. AtomicIntegerArray uses native code for get() and set(), so it didn't help me.
Edit 1:
Collections.synchronizedList(...) can be a good alternative I think, but now I'm looking for arrays.
Edit 2: initFunction() is called from a different class.
AtomicReferenceArray seems to be a good answer. I didn't know about it, up to now. (I'm still interested in that my example code would work with volatile modifier (before the array) with only this two function called from somewhere else.)
This is my first question. I hope I managed to reach the formal requirements. Thanks.

Yes you are correct when you say that the volatile word will not fulfill your case, as it will protect the reference to the array and not its elements.
If you want both, Collections.synchronizedList(...) or synchronized collections is the easiest way to go.
Using modifiers like you are inclining to do is not the way to do this, as you will not affect the elements.
If you really, must, use and array like this one: new MyClass[]{ ... };
Then AnotherClass is the one that needs to take responsibility for its safety, you are probably looking for lower level synchronization here: synchronized key word and locks.
The synchonized key word is the easier and yuo may create blocks and method that lock in a object, or in the class instance by default.
In higher levels you can use Streams to perform a job for you. But in the end, I would suggest you use a synchronized version of an arraylist if you are already using arrays. and a volatile reference to it, if necessary. If you do not update the reference to your array after your class is created, you don't need volatile and you better make it final, if possible.

For your data to be thread-safe you want to ensure that there are no simultaneous:
write/write operations
read/write operations
by threads to the same object. This is known as the readers/writers problem. Note that it is perfectly fine for two threads to simultaneously read data at the same time from the same object.
You can enforce the above properties to a satisfiable level in normal circumstances by using the synchronized modifier (which acts as a lock on objects) and atomic constructs (which performs operations "instantaneously") in methods and for members. This essentially ensures that no two threads can access the same resource at the same time in a way that would lead to bad interleaving.
if I give my array a volatile modifier in my example code, it would be (kinda?) safe.
The volatile keyword will place the array reference in main memory and ensure that no thread can cache a local copy of it within their private memory, which helps with thread visibility although it won't guarantee thread safety by itself. Also the use of volatile should be used sparsely unless by experienced programmers as it may cause unintended effects on the program.
And what should I do if I want to change one of its value? Should I create a new instance of the array an replace the old value with the new in the initial assignment?
Create synchronized mutator methods for the mutable members of your class if they need to be changed or use the methods provided by atomic objects within your classes. This would be the simplest approach to changing your data without causing any unintended side-effects (for example, removing the object from the array whilst a thread is accessing the data in the object being removed).

Volatile does actually work in this case with one caveat: all the operations on MyClass may only read values.
Compared to all what you might read about what volatile does, it has one purpose in the JMM: creating a happens-before relationship. It only affects two kinds of operations:
volatile read (eg. accessing the field)
volatile write (eg. assignment to the field)
That's it. A happens-before relationship, straight from the JLS §17.4.5:
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
These relationships are transitive. Taken all together this implies some important points: All actions taken on a single thread happened-before that thread's volatile write to that field (third point above). A volatile write of a field happens-before a read of that field (point two). So any other thread that reads the volatile field would see all the updates, including all referred to objects like array elements in this case, as visible (first point). Importantly, they are only guaranteed to see the updates visible when the field was written. This means that if you fully construct an object, and then assign it to a volatile field and then never mutate it or any of the objects it refers to, it will be never be in an inconsistent state. This is safe taken with the caveat above:
class AnotherClass {
private volatile MyClass[] myObjects = null;
void initFunction( ... ) {
// Using a volatile write with a fully constructed object.
myObjects = new MyClass[] { ... };
}
MyClass accessFunction(int index) {
// volatile read
MyClass[] local = myObjects;
if (local == null) {
return null; // or something else
}
else {
// should probably check length too
return local[index];
}
}
}
I'm assuming you're only calling initFunction once. Even if you did call it more than once you would just clobber the values there, it wouldn't ever be in an inconsistent state.
You're also correct that updating this structure is not quite straightforward because you aren't allowed to mutate the array. Copy and replace, as you stated is common. Assuming that only one thread will be updating the values you can simply grab a reference to the current array, copy the values into a new array, and then re-assign the newly constructed value back to the volatile reference. Example:
private void add(MyClass newClass) {
// volatile read
MyClass[] local = myObjects;
if (local == null) {
// volatile write
myObjects = new MyClass[] { newClass };
}
else {
MyClass[] withUpdates = new MyClass[local.length + 1];
// System.arrayCopy
withUpdates[local.length] = newClass;
// volatile write
myObjects = withUpdates;
}
}
If you're going to have more than one thread updating then you're going to run into issues where you lose additions to the array as two threads could copy and old array, create a new array with their new element and then the last write would win. In that case you need to either use more synchronization or AtomicReferenceFieldUpdater

ArrayList vs Vector performance in single-threaded application

I was just looking for the answer for the question why ArrayList is faster than Vector and i found ArrayList is faster as it is not synchronized.
so my doubt is:
If ArrayList is not synchronized why would we use it in multithreaded environment and compare it with Vector.
If we are in a single threaded environment then how the performance of the Vector decreases as there is no Synchronization going on as we are dealing with a single thread.
Why should we compare the performance considering the above points ?
Please guide me :)

a) Methods using ArrayList in a multithreaded program may be synchronized.
class X {
List l = new ArrayList();
synchronized void add(Object e) {
l.add(e);
}
...
b) We can use ArrayList without exposing it to other threads, this is when ArrayList is referenced only from local variables
void x() {
List l = new ArrayList(); // no other thread except current can access l
...
Even in a single threaded environment entering a synchronized method takes a lock, this is where we lose performance
public synchronized boolean add(E e) { // current thread will take a lock here
modCount++;
...

You can use ArrayList in a multithread environment if the list is not shared between threads.
If the list is shared between threads you can synchronize the access to that list.
Otherwise you can use Collections.synchronizedList() to get a List that can be used thread safely.
Vector is an old implementation of a synchronized List that is no longer used because the internal implementation basically synchronize every method. Generally you want to synchronize a sequence of operations. Otherwyse you can throw a ConcurrentModificationException when iterating the list another thread modify it. In addition synchronize every method is not good from a performance point of view.
In addition also in a single thread environment accessing a synchronized method needs to perform some operations, so also in a single thread application Vector is not a good solution.

Just because a component is single threaded doesn't mean that it cannot be used in a thread safe context. Your application may have it's own locking in which case additional locking is redundant work.
Conversely, just because a component is thread safe, it doesn't mean that you cannot use it in an unsafe manner. Typically thread safety extends to a single operation. E.g. if you take an Iterator and call next() on a collection this is two operations and they are no longer thread safe when used in combination. You still have to use locking for Vector. Another simple example is
private Vector<Integer> vec =
vec.add(1);
int n = vec.remove(vec.size());
assert n == 1;
This is atleast three operations however the number of things which can go wrong are much more than you might suppose. This is why you end up doing your own locking and why the locking inside Vector might be redundant, even unwanted.
For you own interest;
vec can change at any point t another Vector or null
vec.add(2) can happen between any operation, changing the size and the last element.
vec.remove() can happen between any operation.
vec.add(null) can happen between any operation resulting in a possible NullPointerException
The vec can /* change */ in these places.
private Vector<Integer> vec =
vec.add(1); /* change*/
int n = vec.remove(vec.size() /* change*/);
assert n == 1;
In short, assuming that just because you used a thread safe collection your code is now thread safe is a big assumption.
A common pattern which breaks is
for(int n : vec) {
// do something.
}
Look harmless enough except
for(Iterator iter = vec.iterator(); /* change */ vec.hasNext(); ) {
/* change */ int n = vec.next();
I have marked with /* change */ where another thread could change the collection meaning this loop can get a ConcurrentModificationException (but might not)
there is no Synchronization
The JVM doesn't know there is no need for synchronization and so it still has to do something. It has an optimisation to reduce the cost of uncontended locks, but it still has to do work.

You need to understand the basic concept to know answer for your above questions...
When you say array list is not syncronized and vector is, we mean that the methods in those classes (like add(), get(), remove() etc...) are synchronized in vector class and not in array list class. These methods will act upon tha data being stored .
So, the data saved in vector class cannot be edited / read parallely as add, get, remove metods are synchornized and the same in array list can be done parallely as these methods in array list are not synchronized...
This parallel activity makes array list fast and vector slow... This behavior remains same though you use them in either multithreaded (or) single threaded enviornment...
Hope this answers your question...

Synchronizing elements in an array

I am new to multi-threading in Java and don't quite understand what's going on.
From online tutorials and lecture notes, I know that the synchronized block, which must be applied to a non-null object, ensures that only one thread can execute that block of code. Since an array is an object in Java, synchronize can be applied to it. Further, if the array stores objects, I should be able to synchronize each element of the array too.
My program has several threads updated an array of numbers, hence I created an array of Long objects:
synchronized (grid[arrayIndex]){
grid[arrayIndex] += a.getNumber();
}
This code sits inside the run() method of the thread class which I have extended. The array, grid, is shared by all of my threads. However, this does not return the correct results while running the same program on one thread does.

This will not work. It is important to realize that grid[arrayIndex] += ... is actually replacing the element in the grid with a new object. This means that you are synchronizing on an object in the array and then immediately replacing the object with another in the array. This will cause other threads to lock on a different object so they won't block. You must lock on a constant object.
You can instead lock on the entire array object, if it is never replaced with another array object:
synchronized (grid) {
// this changes the object to another Long so can't be used to lock
grid[arrayIndex] += a.getNumber();
}
This is one of the reasons why it is a good pattern to lock on a final object. See this answer with more details:
Why is it not a good practice to synchronize on Boolean?

Another option would be to use an array of AtomicLong objects, and use their addAndGet() or getAndAdd() method. You wouldn't need synchronization to increment your objects, and multiple objects could be incremented concurrently.

The java class Long is immutable, you cannot change its value. So when you perform an action:
grid[arrayIndex] += a.getNumber();
it is not changing the value of grid[arrayIndex], which you are locking on, but is actually creating a new Long object and setting its value to the old value plus a.getNumber. So you will end up with different threads synchronizing on different objects, which leads to the results you are seeing

The synchronized block you have here is no good. When you synchronize on the array element, which is presumably a number, you're synchronizing only on that object. When you reassign the element of the array to a different object than the one you started with, the synchronization is no longer on the correct object and other threads will be able to access that index.
One of these two options would be more correct:
private final int[] grid = new int[10];
synchronized (grid) {
grid[arrayIndex] += a.getNumber();
}
If grid can't be final:
private final Object MUTEX = new Object();
synchronized (MUTEX) {
grid[arrayIndex] += a.getNumber();
}
If you use the second option and grid is not final, any assignment to grid should also be synchronized.
synchronized (MUTEX) {
grid = new int[20];
}
Always synchronize on something final, always synchronize on both access and modification, and once you have that down, you can start looking into other locking mechanisms, such as Lock, ReadWriteLock, and Semaphore. These can provide more complex locking mechanisms than synchronization that is better for scenarios where Java's default synchronization alone isn't enough, such as locking data in a high-throughput system (read/write locking) or locking in resource pools (counting semaphores).

Are arrays thread-safe in Java?

Are there any concurrency problems with one thread reading from one index of an array, while another thread writes to another index of the array, as long as the indices are different?
e.g. (this example not necessarily recommended for real use, only to illustrate my point)
class Test1
{
static final private int N = 4096;
final private int[] x = new int[N];
final private AtomicInteger nwritten = new AtomicInteger(0);
// invariant:
// all values x[i] where 0 <= i < nwritten.get() are immutable
// read() is not synchronized since we want it to be fast
int read(int index) {
if (index >= nwritten.get())
throw new IllegalArgumentException();
return x[index];
}
// write() is synchronized to handle multiple writers
// (using compare-and-set techniques to avoid blocking algorithms
// is nontrivial)
synchronized void write(int x_i) {
int index = nwriting.get();
if (index >= N)
throw SomeExceptionThatIndicatesArrayIsFull();
x[index] = x_i;
// from this point forward, x[index] is fixed in stone
nwriting.set(index+1);
}
}
edit: critiquing this example is not my question, I literally just want to know if array access to one index, concurrently to access of another index, poses concurrency problems, couldn't think of a simple example.

While you will not get an invalid state by changing arrays as you mention, you will have the same problem that happens when two threads are viewing a non volatile integer without synchronization (see the section in the Java Tutorial on Memory Consistency Errors). Basically, the problem is that Thread 1 may write a value in space i, but there is no guarantee when (or if) Thread 2 will see the change.
The class java.util.concurrent.atomic.AtomicIntegerArray does what you want to do.

The example has a lot of stuff that differs from the prose question.
The answer to that question is that distinct elements of an array are accessed independently, so you don't need synchronization if two threads change different elements.
However, the Java memory model makes no guarantees (that I'm aware of) that a value written by one thread will be visible to another thread, unless you synchronize access.
Depending on what you're really trying to accomplish, it's likely that java.util.concurrent already has a class that will do it for you. And if it doesn't, I still recommend taking a look at the source code for ConcurrentHashMap, since your code appears to be doing the same thing that it does to manage the hash table.

I am not really sure if synchronizing only the write method, while leaving the read method unsychronized would work. Not really what are all the consequences, but at least it might lead to read method returning some values that has just been overriden by write.

Yes, as bad cache interleaving can still happen in a multi-cpu/core environment. There are several options to avoid it:
Use the Unsafe Sun-private library to atomically set an element in an array (or the jsr166y added feature in Java7
Use AtomicXYZ[] array
Use custom object with one volatile field and have an array of that object.
Use the ParallelArray of jsr166y addendum instead in your algorithm

Since read() is not synchronized you could have the following scenario:
Thread A enters write() method
Thread A writes to nwriting = 0;
Thread B reads from nwriting =0;
Thread A increments nwriting. nwriting=1
Thread A exits write();
Since you want to guarantee that your variable addresses never conflict, what about something like (discounting array index issues):
int i;
synchronized int curr(){ return i; }
synchronized int next(){ return ++i;}
int read( ) {
return values[curr()];
}
void write(int x){
values[next()]=x;
}

Caching of instances

I am just curious about the java memory model a little.
Here is what i though.
If i have the following class
public class Test {
int[] numbers = new int[Integer.MAX_VALUE]; // kids dont try this at home
void increment(int ind){
numbers[ind]++;
}
int get(int ind){
return numbers[ind];
}
}
There are multiple readers get() and one writer increment() thread accessing this class.
The question is here , is there actually any synchronization at all that i have to do in order to leave the class at a consistent state after each method call?
Why i am asking this, i am curious if the elements in the array are cached in some way by the JVM or is this only applied to class members? If the members inside the array could be cached, is there a way to define them as volatile ?
Thanks
Roman

As an alternative to synchronizing those methods, you could also consider replacing the int[] with an array of AtomicIntegers. This would have the benefit/downside (depending on your application) of allowing concurrent access to different elements in your list.

You will definitely have to use some sort of synchronization (either on your class or the underlying data structure) in order to ensure the data is left in a consistent state after method calls. Consider the following situations, with two Threads A and B, with the integer array initially containing all zero values.
Thread A calls increment(0). The post-increment operation is not atomic; you can actually consider it to be broken down into at least three steps:
Read the current value; Add one to the current value; Store the value.
Thread B also calls increment(0). If this happens soon after Thread A has done the same, they will both read the same initial value for the element at index 0 of the array.
At this point, both Thread A and B have read a value of '0' for the element they want to increment. Both will increment the value to '1' and store it back in the first element of the array.
Thus, only the work of the Thread that last writes to the array is seen.
The situation is similar if you had a decrement() method. If both increment() and decrement() were called at near-simultaneous times by two separate Threads, there is no telling what the outcome would be. The value would either be incremented by one or decremented by one, and the operations would not "cancel" each other out.
EDIT: Update to reflect Roman's (OP) comment below
Sorry, I mis-read the post. I think I understand your question, which is along the lines of:
"If I declare an array as volatile,
does that mean access to its elements
are treated as volatile as well?"
The quick answer is No: Please see this article for more information; the information in the previous answers here is also correct.

Yes, the VM is allowed to cache inside the thread any field that is not synchronized or voltile. To prevent this, you could mark the fields as volatile, but they still wouldn't be thread safe, since ++ is not an atomic operation. Add the synchronized keyword to the methods, and you're safe.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.