I am curious about the Java memory model.
Here is what I thought.
Suppose I have the following class:
public class Test {
int[] numbers = new int[Integer.MAX_VALUE]; // kids dont try this at home
void increment(int ind){
numbers[ind]++;
}
int get(int ind){
return numbers[ind];
}
}
There are multiple reader threads calling get() and one writer thread calling increment() on this class.
The question is: is there any synchronization I have to do in order to leave the class in a consistent state after each method call?
I am asking because I am curious whether the elements of the array can be cached in some way by the JVM, or whether this only applies to class members. If the elements of the array can be cached, is there a way to declare them as volatile?
Thanks
Roman
As an alternative to synchronizing those methods, you could also consider replacing the int[] with an array of AtomicIntegers. This would have the benefit/downside (depending on your application) of allowing concurrent access to different elements of your array.
You will definitely have to use some sort of synchronization (either on your class or the underlying data structure) in order to ensure the data is left in a consistent state after method calls. Consider the following situations, with two Threads A and B, with the integer array initially containing all zero values.
Thread A calls increment(0). The post-increment operation is not atomic; you can actually consider it to be broken down into at least three steps:
Read the current value; Add one to the current value; Store the value.
Thread B also calls increment(0). If this happens soon after Thread A has done the same, they will both read the same initial value for the element at index 0 of the array.
At this point, both Thread A and B have read a value of '0' for the element they want to increment. Both will increment the value to '1' and store it back in the first element of the array.
Thus, only the work of the Thread that last writes to the array is seen.
The situation is similar if you had a decrement() method. If both increment() and decrement() were called at near-simultaneous times by two separate Threads, there is no telling what the outcome would be. The value would either be incremented by one or decremented by one, and the operations would not "cancel" each other out.
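The lost-update scenario above, and the per-element atomic alternative, can be sketched as follows. This is a minimal illustration, assuming the JDK's AtomicIntegerArray is an acceptable stand-in for an array of AtomicInteger objects; the class name AtomicCounters, the hammer helper, and the thread counts are made up for the example.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical variant of the Test class from the question.
class AtomicCounters {
    private final AtomicIntegerArray numbers;

    AtomicCounters(int size) {
        numbers = new AtomicIntegerArray(size);
    }

    void increment(int ind) {
        numbers.incrementAndGet(ind); // atomic read-modify-write on one element
    }

    int get(int ind) {
        return numbers.get(ind); // read with volatile semantics
    }

    // Starts `threads` writers that each increment index 0 `perThread` times.
    static int hammer(int threads, int perThread) {
        AtomicCounters counters = new AtomicCounters(4);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.increment(0);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.get(0);
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000)); // no increment is lost
    }
}
```

With a plain int[] and numbers[ind]++ the same run may lose increments; with the atomic array the total always equals threads * perThread.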
EDIT: Update to reflect Roman's (OP) comment below
Sorry, I mis-read the post. I think I understand your question, which is along the lines of:
"If I declare an array as volatile,
does that mean access to its elements
are treated as volatile as well?"
The quick answer is No: Please see this article for more information; the information in the previous answers here is also correct.
Yes, the VM is allowed to cache, per thread, any field that is not accessed under synchronization and is not volatile. To prevent this, you could mark the fields as volatile, but they still wouldn't be thread-safe, since ++ is not an atomic operation. Add the synchronized keyword to the methods, and you're safe.
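As a rough sketch of that last suggestion, here is the class from the question with both methods synchronized. The class name SynchronizedCounters, the constructor, and the hammer helper are assumptions added for illustration only.

```java
// Hypothetical synchronized variant of the Test class from the question.
class SynchronizedCounters {
    private final int[] numbers;

    SynchronizedCounters(int size) {
        numbers = new int[size];
    }

    // Both methods lock on `this`, so an increment can never interleave
    // with another increment or with a read.
    synchronized void increment(int ind) {
        numbers[ind]++;
    }

    synchronized int get(int ind) {
        return numbers[ind];
    }

    // Starts `threads` writers that each increment index 0 `perThread` times.
    static int hammer(int threads, int perThread) {
        SynchronizedCounters counters = new SynchronizedCounters(4);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.increment(0);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.get(0);
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000));
    }
}
```

The trade-off compared with per-element atomics is that all indices share one lock, so writers to different elements also wait for each other.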
Related
I've searched for this question and I only found answer for primitive type arrays.
Let's say I have a class called MyClass and I want to have an array of its objects in my another class.
class AnotherClass {
/* modifiers (?) */ MyClass[] myObjects;
void initFunction( ... ) {
// some code
myObjects = new MyClass[] { ... };
}
MyClass accessFunction(int index) {
return myObjects[index];
}
}
I read somewhere that declaring an array volatile does not give volatile access to its elements, but that assigning a new array to the field is safe.
So, if I understand correctly, giving my array a volatile modifier in my example code would be (kinda?) safe, provided I never change its elements with the [] operator.
Or am I wrong? And what should I do if I want to change one of its elements? Should I create a new instance of the array and replace the old one with it, as in the initial assignment?
AtomicXYZArray is not an option because it is only good for primitive-type arrays. AtomicIntegerArray uses native code for get() and set(), so it didn't help me.
Edit 1:
Collections.synchronizedList(...) can be a good alternative I think, but now I'm looking for arrays.
Edit 2: initFunction() is called from a different class.
AtomicReferenceArray seems to be a good answer. I didn't know about it until now. (I'm still interested in whether my example code would work with the volatile modifier on the array field, with only these two functions called from somewhere else.)
This is my first question. I hope I managed to reach the formal requirements. Thanks.
Yes, you are correct that the volatile keyword will not cover your case: it protects the reference to the array, not its elements.
If you want both, Collections.synchronizedList(...) or another synchronized collection is the easiest way to go.
Modifiers on the field are not the way to do this, as they do not affect the elements.
If you really must use an array like this one: new MyClass[]{ ... };
then AnotherClass is the one that needs to take responsibility for its safety. You are probably looking for lower-level synchronization here: the synchronized keyword and locks.
The synchronized keyword is the easier of the two. You can create synchronized blocks that lock on an object of your choice, or synchronized methods, which lock on the instance (or on the class object for static methods).
At a higher level you can use Streams to perform a job for you. But in the end, I would suggest you use a synchronized version of an ArrayList instead of the array, with a volatile reference to it if necessary. If you do not reassign the reference after your object is created, you don't need volatile and you should make it final, if possible.
For your data to be thread-safe you want to ensure that there are no simultaneous:
write/write operations
read/write operations
by threads to the same object. This is known as the readers/writers problem. Note that it is perfectly fine for two threads to read data from the same object at the same time.
You can enforce the above properties to a satisfactory level in normal circumstances by using the synchronized modifier (which acts as a lock on objects) and atomic constructs (which perform operations "instantaneously") in methods and for members. This essentially ensures that no two threads can access the same resource at the same time in a way that would lead to bad interleaving.
if I give my array a volatile modifier in my example code, it would be (kinda?) safe.
The volatile keyword ensures that every read of the array reference sees the most recent write to it, so no thread keeps working with a stale copy of the reference. This helps with visibility but does not guarantee thread safety by itself, and it says nothing about the elements of the array. Also, volatile should be used sparingly unless you are an experienced programmer, as it may have unintended effects on the program.
And what should I do if I want to change one of its value? Should I create a new instance of the array an replace the old value with the new in the initial assignment?
Create synchronized mutator methods for the mutable members of your class if they need to be changed or use the methods provided by atomic objects within your classes. This would be the simplest approach to changing your data without causing any unintended side-effects (for example, removing the object from the array whilst a thread is accessing the data in the object being removed).
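For the "atomic objects" route, one possible sketch uses AtomicReferenceArray, which the question's edit also mentions and which works for object elements, not only primitives. String stands in for MyClass here, and the class and method names (SharedObjects, access, replace, replaceIfUnchanged) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical holder; String stands in for the question's MyClass.
class SharedObjects {
    private final AtomicReferenceArray<String> myObjects;

    SharedObjects(int length) {
        myObjects = new AtomicReferenceArray<>(length);
    }

    String access(int index) {
        return myObjects.get(index); // read with volatile semantics
    }

    void replace(int index, String value) {
        myObjects.set(index, value); // write with volatile semantics
    }

    // Swaps in `value` only if the slot still holds the exact reference `expected`.
    boolean replaceIfUnchanged(int index, String expected, String value) {
        return myObjects.compareAndSet(index, expected, value);
    }

    public static void main(String[] args) {
        SharedObjects shared = new SharedObjects(2);
        shared.replace(0, "first");
        String seen = shared.access(0);
        System.out.println(shared.replaceIfUnchanged(0, seen, "second"));
        System.out.println(shared.access(0));
    }
}
```

Note that compareAndSet compares references with ==, not equals(), so the expected value should be one previously read from the array.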
Volatile does actually work in this case with one caveat: all the operations on MyClass may only read values.
Despite everything you might read about what volatile does, it has one purpose in the JMM: creating a happens-before relationship. It only affects two kinds of operations:
volatile read (eg. accessing the field)
volatile write (eg. assignment to the field)
That's it. A happens-before relationship, straight from the JLS §17.4.5:
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
These relationships are transitive. Taken together, this implies some important points. All actions a thread took before its volatile write to the field happen-before that write (third point above). A volatile write of a field happens-before every subsequent read of that field (second point). So any other thread that reads the volatile field sees all of those earlier updates, including updates to objects reachable from it, such as the array elements in this case (first point). Importantly, readers are only guaranteed to see the updates that were made before the field was written. This means that if you fully construct an object, assign it to a volatile field, and then never mutate it or any of the objects it refers to, it will never be observed in an inconsistent state. With the caveat above, this is safe:
class AnotherClass {
private volatile MyClass[] myObjects = null;
void initFunction( ... ) {
// Using a volatile write with a fully constructed object.
myObjects = new MyClass[] { ... };
}
MyClass accessFunction(int index) {
// volatile read
MyClass[] local = myObjects;
if (local == null) {
return null; // or something else
}
else {
// should probably check length too
return local[index];
}
}
}
I'm assuming you're only calling initFunction once. Even if you called it more than once, you would just clobber the values there; the array would never be in an inconsistent state.
You're also correct that updating this structure is not quite straightforward, because you aren't allowed to mutate the array. Copy and replace, as you stated, is common. Assuming that only one thread will be updating the values, you can simply grab a reference to the current array, copy the values into a new array, and then assign the newly constructed array back to the volatile reference. Example:
private void add(MyClass newClass) {
// volatile read
MyClass[] local = myObjects;
if (local == null) {
// volatile write
myObjects = new MyClass[] { newClass };
}
else {
MyClass[] withUpdates = new MyClass[local.length + 1];
// copy the existing elements into the new array
System.arraycopy(local, 0, withUpdates, 0, local.length);
withUpdates[local.length] = newClass;
// volatile write
myObjects = withUpdates;
}
}
If you're going to have more than one thread updating, then you're going to run into issues where you lose additions to the array: two threads could copy the same old array, each create a new array with its own new element, and then the last write would win. In that case you need to either use more synchronization or an AtomicReferenceFieldUpdater.
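A minimal sketch of the multi-writer case, assuming an AtomicReferenceFieldUpdater over the volatile array field and a compare-and-set retry loop. String stands in for MyClass, and the names CopyOnWriteHolder, UPDATER, size, and hammer are invented for the example.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical holder; String stands in for the question's MyClass.
class CopyOnWriteHolder {
    private static final AtomicReferenceFieldUpdater<CopyOnWriteHolder, String[]> UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(CopyOnWriteHolder.class, String[].class, "myObjects");

    private volatile String[] myObjects = new String[0];

    void add(String item) {
        String[] current;
        String[] withUpdates;
        do {
            current = myObjects; // volatile read
            withUpdates = Arrays.copyOf(current, current.length + 1);
            withUpdates[current.length] = item;
            // Publish only if nobody replaced the array in the meantime; otherwise retry.
        } while (!UPDATER.compareAndSet(this, current, withUpdates));
    }

    int size() {
        return myObjects.length;
    }

    // Starts `threads` writers that each add `perThread` items.
    static int hammer(int threads, int perThread) {
        CopyOnWriteHolder holder = new CopyOnWriteHolder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    holder.add("item");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return holder.size();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 250)); // every addition survives
    }
}
```

A thread that loses the race simply re-reads the current array and tries again, so no addition is overwritten. The JDK's CopyOnWriteArrayList offers the same copy-on-write behavior ready-made.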
Since the Integer class is immutable, and we know that immutable classes are thread-safe, what is the need for AtomicInteger?
I am confused.
Is the reason that reads and writes of immutable objects need not be atomic, whereas reads and writes of an AtomicInteger are atomic?
Does that mean atomic classes are also thread-safe?
AtomicInteger is used in multithreaded environments when several threads update the same int value and no update may be lost. The advantage is that no external synchronization is required, since the operations which modify its value are executed in a thread-safe way.
Consider the following code:
private int count;
public int updateCounter() {
return ++count;
}
If multiple threads call the updateCounter method, it's possible that some of them receive the same value. The reason is that ++count is not atomic: it is not one operation but three (read count, add 1 to its value, and write it back). Between one thread's read and its write, other threads can still see the old value.
The above code should be replaced with this:
private AtomicInteger count = new AtomicInteger(0);
public int updateCounter() {
return count.incrementAndGet();
}
The incrementAndGet method is guaranteed to atomically increment the stored value and return the new value without using any external synchronization.
If your value never changes, you don't have to use AtomicInteger; a plain int is enough.
AtomicInteger is thread safe (in fact, all classes from java.util.concurrent.atomic package are thread safe), while normal integers are NOT threadsafe.
You would need the 'synchronized' and 'volatile' keywords when using an 'Integer' variable in a multi-threaded environment (to make it thread-safe), whereas with atomic integers you don't need them, because atomic integers take care of thread safety themselves.
Also, I would recommend the below helpful tutorial on the same subject:
http://tutorials.jenkov.com/java-concurrency/compare-and-swap.html
Please refer below oracle doc for more information on 'atomic' package:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html
While immutable objects are thread-safe by definition, mutable objects can be thread safe too.
That is precisely the purpose of the Atomic... classes (AtomicInteger, AtomicBoolean, and so on).
The various ...get... and ...set... methods allow thread-safe access and mutation of the object.
Not surprisingly, these classes are declared in the java.util.concurrent.atomic package.
You only have to browse the API for the java.util.concurrent.atomic package:
A small toolkit of classes that support lock-free thread-safe programming on single variables.
Consider a variable
int myInt = 3;
AtomicInteger relates to myInt.
Integer relates to 3.
In other words, your variable is mutable and can change its value, while the value 3 is an integer literal, a constant, an immutable expression.
Integers are object representations of such values and are therefore immutable; you can basically only read them.
AtomicIntegers are containers for those values. You can read and set them, the same as assigning a value to a variable. But unlike changing the value of an int variable, operations on an AtomicInteger are atomic.
For example this is not atomic
if(myInt == 3) {
myInt++;
}
This is atomic
AtomicInteger myInt = new AtomicInteger(3);
//atomic
myInt.compareAndSet(3, 4);
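To make the difference concrete, here is a small sketch showing that compareAndSet reports whether it actually performed the update. The class name CompareAndSetDemo and the helper bumpIfThree are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CompareAndSetDemo {
    // Changes 3 to 4 in one atomic step; returns true only for the call that did it.
    static boolean bumpIfThree(AtomicInteger myInt) {
        return myInt.compareAndSet(3, 4);
    }

    public static void main(String[] args) {
        AtomicInteger myInt = new AtomicInteger(3);
        System.out.println(bumpIfThree(myInt)); // true: value was 3, now 4
        System.out.println(bumpIfThree(myInt)); // false: value is no longer 3
        System.out.println(myInt.get());        // 4
    }
}
```

However many threads call bumpIfThree at once, at most one of them gets true, which the plain if/++ version cannot guarantee.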
I think the main difference between AtomicInteger and a normal immutable Integer becomes clear once we understand why even a variable holding an immutable Integer is not thread-safe.
Let's see with an example.
Suppose we have a value of int count = 5, which is shared by two different threads named T1 and T2, both reading and writing at the same time.
We know that when a new value is assigned to a variable holding an immutable object, the old object stays as it was and the variable simply refers to a new one.
Now, when T1 and T2 update the count variable, Java might keep the value in some cache and perform the writes there. We don't know when the JVM will write the updated value to main memory, so one of the threads may end up computing its update from a totally stale value.
This brings us to the volatile keyword.
Volatile - This keyword ensures that all reads and writes of the variable go to main memory, so that all threads work with the most recent value.
If one thread is writing and all other threads are only reading, volatile will solve our problem. But if several threads are reading and writing the same variable at the same time, we need synchronization to ensure thread safety.
The volatile keyword alone does not ensure thread safety.
Now, coming to why AtomicInteger exists. Even if we use the synchronized keyword to ensure thread safety, the actual update of the count variable is a three-step process:
get updated value of count variable
increment the value by 1
set the value to count variable
This is why, once thread safety is taken into consideration, updating a normal integer takes slightly longer: every update has to acquire and release a lock around these three steps.
AtomicIntegers provide the same thread safety with faster updates by using an optimized lock-free technique called Compare-And-Swap (CAS).
They perform each update operation atomically, as a single step.
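The CAS idea can be sketched as a hand-written retry loop. This illustrates the technique only and is not a claim about how the JDK implements incrementAndGet internally; CasLoopDemo, incrementWithCas, and hammer are made-up names.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasLoopDemo {
    // Read the value, compute the next one, and publish it only if nobody
    // changed the value in between; otherwise start over.
    static int incrementWithCas(AtomicInteger count) {
        while (true) {
            int current = count.get();
            int next = current + 1;
            if (count.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Starts `threads` writers that each increment `perThread` times.
    static int hammer(int threads, int perThread) {
        AtomicInteger count = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementWithCas(count);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 5_000));
    }
}
```

No thread ever blocks: a thread that loses the race just repeats its three steps with the fresh value.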
I came across the example below of a Java class which was claimed to be thread-safe. Could anyone please explain how it could be thread-safe? I can clearly see that the last method in the class is not being guarded against concurrent access of any reader thread. Or, am I missing something here?
public class Account {
private Lock lock = new ReentrantLock();
private int value = 0;
public void increment() {
lock.lock();
value++;
lock.unlock();
}
public void decrement() {
lock.lock();
value--;
lock.unlock();
}
public int getValue() {
return value;
}
}
The code is not thread-safe.
Suppose that one thread calls decrement and then a second thread calls getValue. What happens?
The problem is that there is no "happens before" relationship between the decrement and the getValue. That means that there is no guarantee, that the getValue call will see the results of the decrement. Indeed, the getValue could "miss" the results of an indefinite sequence of increment and decrement calls.
Actually, unless we see the code that uses the Account class, the question of thread-safety is ill-defined. The conventional notion of thread-safety (note 1) of a program is about whether the code behaves correctly irrespective of thread-related non-determinacy. In this case, we don't have a specification of what "correct" behaviour is, or indeed an executable program to test or examine.
But my reading of the code (note 2) is that there is an implied API requirement / correctness criterion that getValue returns the current value of the account. That cannot be guaranteed if there are multiple threads, therefore the class is not thread-safe.
Related links:
http://blogs.msdn.com/b/ericlippert/archive/2009/10/19/what-is-this-thing-you-call-thread-safe.aspx
1 - The Concurrency in Practice quote in #CKing's answer is also appealing to a notion of "correctness" by mentioning "invalid state" in the definition. However, the JLS sections on the memory model don't specify thread-safety. Instead, they talk about "well-formed executions".
2 - This reading is supported by the OP's comment below. However, if you don't accept that this requirement is real (e.g. because it is not stated explicitly), then the flip-side is that behaviour of the "account" abstraction depends on how code outside of the Account class ... which makes this a "leaky abstraction".
This is not thread-safe purely because there are no guarantees about how the compiler may reorder the code. Since value is not volatile, here is your classic example:
while(account.getValue() != 0){
}
This can be hoisted, with the read of the value moved out of the loop, to look like
if(account.getValue() != 0){
while(true){
}
}
I can imagine there are other permutations of compiler fun which can cause this to subtly fail. But accessing this getValue via multiple threads can result in failure.
There are several distinct issues here:
Q: If multiple threads make overlapped calls to increment() and decrement(), and then they stop, and then enough time passes with no threads calling increment() or decrement(), will getValue() return the correct number?
A: Yes. The locking in the increment and decrement methods ensures that each increment and decrement operation will happen atomically. They cannot interfere with one another.
Q: How long is enough time?
A: That's hard to say. The Java language specification does not guarantee that a thread calling getValue() will ever see the latest value written by some other thread because getValue() accesses the value without any synchronization at all.
If you change getValue() to lock and unlock the same lock object, or if you declare value to be volatile, then zero amount of time would be enough.
Q: Can a call to getValue() return an invalid value?
A: No. It can only ever return the initial value, or the result of a complete increment() or decrement() call.
But, the reason for this has nothing to do with the lock. The lock does not prevent any thread from calling getValue() while some other thread is in the middle of incrementing or decrementing the value.
The thing that prevents getValue() from returning a completely invalid value is that value is an int, and the JLS guarantees that updates and reads of int variables are always atomic.
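A minimal sketch of the first fix mentioned above, where getValue() takes the same lock as the writers. The class name SafeAccount is made up, and the try/finally blocks are an addition not present in the original class: they release the lock even if the guarded code throws.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reworking of the Account class from the question.
class SafeAccount {
    private final Lock lock = new ReentrantLock();
    private int value = 0;

    public void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock();
        }
    }

    public void decrement() {
        lock.lock();
        try {
            value--;
        } finally {
            lock.unlock();
        }
    }

    // Taking the same lock gives the reader a happens-before edge with the
    // most recent writer, so it sees the latest value.
    public int getValue() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        SafeAccount account = new SafeAccount();
        account.increment();
        account.increment();
        account.decrement();
        System.out.println(account.getValue());
    }
}
```

Declaring value as volatile and leaving getValue() unlocked would also give readers visibility; the writers would still need the lock because ++ and -- are not atomic.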
The short answer:
By definition, Account is a thread-safe class even though the getValue method is not guarded.
The long answer
From Java Concurrency in practice a class is said to be thread safe when :
No set of operations performed sequentially or concurrently on
instances of a thread-safe class can cause an instance to be in an
invalid state.
Since the getValue method will not result in the Account class being in an invalid state at any given time, your class is said to be thread-safe.
The documentation for Collections#synchronizedCollection echoes this sentiment:
Returns a synchronized (thread-safe) collection backed by the
specified collection. In order to guarantee serial access, it is
critical that all access to the backing collection is accomplished
through the returned collection. It is imperative that the user
manually synchronize on the returned collection when iterating over
it:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized (c) {
Iterator i = c.iterator(); // Must be in the synchronized block
while (i.hasNext())
foo(i.next());
}
Notice how the documentation says that the collection (which is an object of an inner class named SynchronizedCollection in the Collections class) is thread-safe and yet asks the client code to guard the collection while iterating over it. In fact, the iterator method in SynchronizedCollection is not synchronized. This is very similar to your example, where Account is thread-safe but client code still needs to ensure atomicity when calling getValue.
It's completely thread safe.
Nobody can simultaneously increment and decrement value so you won't lose or gain a count in error.
The fact that getValue() will return different values through time is something that will happen anyway: simultaneity is not relevant.
You do not have to protect getValue. Accessing it from multiple threads at the same time does not lead to any negative effects. The object state cannot become invalid no matter when or from how many threads you call this method (because getValue does not change the state).
Having said that - you can write a non-thread-safe code that uses this class.
For example something like
if (acc.getValue()>0) acc.decrement();
is potentially dangerous because it can lead to race conditions. Why?
Let's say you have a business rule "never decrement below 0", your current value is 1, and there are two threads executing this code. There's a chance that they'll do it in the following order:
Thread 1 checks that acc.getValue is >0. Yes!
Thread 2 checks that acc.getValue is >0. Yes!
Thread 1 calls decrement. value is 0
Thread 2 calls decrement. value is now -1
What happened? Each thread made sure it was not going below zero, but together they managed to do exactly that. This is called a race condition.
To avoid this, it is not enough to protect the elementary operations; you must protect any piece of code that has to be executed without interruption.
So, this class is thread-safe but only for very limited use.
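One way to protect the whole check-then-act sequence is to move it into a single synchronized method. The following is only a sketch; GuardedAccount, decrementIfPositive, and the race helper are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

class GuardedAccount {
    private int value;

    GuardedAccount(int initial) {
        value = initial;
    }

    synchronized int getValue() {
        return value;
    }

    // The check and the decrement run under one lock, so no other thread
    // can slip in between them.
    synchronized boolean decrementIfPositive() {
        if (value > 0) {
            value--;
            return true;
        }
        return false;
    }

    // Lets `threads` threads compete; returns {final value, successful decrements}.
    static int[] race(int initial, int threads) {
        GuardedAccount acc = new GuardedAccount(initial);
        AtomicInteger successes = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                if (acc.decrementIfPositive()) {
                    successes.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { acc.getValue(), successes.get() };
    }

    public static void main(String[] args) {
        int[] result = race(1, 8);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Starting from 1, exactly one of the competing threads succeeds and the value never drops below 0, which is the business rule from the example.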
I'm considering an implementation of multi-threaded sorting that uses one volatile array. Let's say I have an array of length N, and M threads that will sort sub-ranges of the array. These sub-ranges are disjoint. Then, in the main thread, I will merge the partially sorted array.
Example code:
final int N = ....
volatile MyClass[] array = new MyClass[N];
//... fill array with values
void sort(){
MyThread[] workers = new MyThread[M];
int len = N/M; //length of the sub-range
for(int i=0;i<M;++i){
workers[i] = new MyThread(i*len, (i+1)*len);
workers[i].start();
}
for(int i=0;i<M;++i)workers[i].join();
//now synchronization in memory using "happens before"
//will it work?
array = array;
//...merge sorted sub-ranges into one sorted array
}
private class MyThread extends Thread{
final int from;
final int to;
public MyThread(int from, int to){ ..... }
public void run(){
//...something like: quicksort(array, from, to);
//...without synchronization, ranges <from, to> are exclusive
}
}
I don't need memory synchronization while the threads are running because the array sub-ranges are disjoint. I want to synchronize once, after the threads have finished. Will the version of the array seen in the main thread contain all the changes made in the worker threads?
If this solution is valid, is it efficient for large arrays?
Thank you in advance for your help.
EDIT:
I ran the tests. I received correct results regardless of the use of volatile keyword. But the time of execution is a few times (about M-times) longer for a volatile array.
Not an answer, just some thoughts:
There is no such thing as a volatile array. Only fields can be volatile. You have declared a volatile field named "array", and initialized it with a reference to an array object.
It looks like you are expecting the statement, array = array to act as a full memory barrier. I don't know if it will or if it won't, or if the answer depends on what compiler, what JVM and, what operating system you use. Maybe somebody more expert than I can answer.
I don't like it for two reasons though: One is, it looks like a no-op. It's an invitation for some other programmer who doesn't understand what you're trying to do to come along and "clean up" the code by deleting it. A tricky statement like that should be wrapped in a function with a name that explains the trick.
Two is, the function of that statement has nothing to do with the array that the field references. It would be better to use a volatile int field or a volatile somethingelse field that obviously has no connection to the array, thereby calling attention to the fact that what matters is something other than the value of the field.
Update: According to Brian Goetz, that one statement won't do what you want. What you need is for each worker thread to update the volatile field after finishing its work, and then you need the master thread to read the volatile field before it tries to look at the worker's results.
On the other hand... Do you need the barrier at all? Isn't it enough that the worker threads all terminated and the master join()ed them? Again, maybe somebody more expert than myself can answer.
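On the join() question: JLS §17.4.5 states that all actions in a thread happen-before any other thread successfully returns from a join() on that thread, so the workers' writes are visible to the main thread after join() without any volatile field. A minimal sketch under that rule follows, using a plain int[] instead of MyClass[]; the names RangeSortDemo, sortInRanges, and merge are invented, and the pairwise merge is just one simple way to combine the ranges.

```java
import java.util.Arrays;

class RangeSortDemo {
    // Sorts disjoint sub-ranges in worker threads, then merges them in the caller.
    static int[] sortInRanges(int[] input, int workerCount) {
        int[] array = input.clone(); // plain array, no volatile anywhere
        int n = array.length;
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            final int from = i * n / workerCount;
            final int to = (i + 1) * n / workerCount;
            workers[i] = new Thread(() -> Arrays.sort(array, from, to));
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // each worker's writes happen-before this call returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int[] merged = new int[0];
        for (int i = 0; i < workerCount; i++) {
            int[] range = Arrays.copyOfRange(array, i * n / workerCount, (i + 1) * n / workerCount);
            merged = merge(merged, range);
        }
        return merged;
    }

    // Standard two-way merge of two sorted arrays.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortInRanges(new int[] {5, 3, 9, 1, 7, 2, 8}, 3)));
    }
}
```

Thread.start() gives the matching guarantee in the other direction: everything the main thread wrote to the array before start() is visible to the worker.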
What you're doing looks very messy and as suggested, probably won't work as expected.
If you use Java8 then perhaps the parallel sort is for you. Otherwise --
Sorting a single array in place, in parallel is a horror show. Sorting in parallel is rather simple if you create a new array of sorted elements.
Create objects for the sub-arrays (you'll need to do this eventually). Pass each object to a thread. Let the threads sort their objects in parallel. When all sorts are done, merge the sorted objects into a new array.
That means more memory is required, but it's rather easy and you don't need to worry about volatile or synchronization.
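For the Java 8 route mentioned at the start of this answer, a short sketch using Arrays.parallelSort on a copy of the input. The wrapper class ParallelSortDemo and the method sortedCopy are made up, and int[] is used instead of MyClass[] to keep the example small.

```java
import java.util.Arrays;

class ParallelSortDemo {
    // Returns a sorted copy; the library decides how to split the work.
    static int[] sortedCopy(int[] input) {
        int[] copy = input.clone();
        Arrays.parallelSort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[] {4, 1, 3, 2})));
    }
}
```

For an object array there are overloads that take a Comparator, so the element type does not have to be a primitive.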
I am new to multi-threading in Java and don't quite understand what's going on.
From online tutorials and lecture notes, I know that a synchronized block, which must be applied to a non-null object, ensures that only one thread at a time can execute that block of code. Since an array is an object in Java, synchronized can be applied to it. Further, if the array stores objects, I should be able to synchronize on each element of the array too.
My program has several threads updating an array of numbers, hence I created an array of Long objects:
synchronized (grid[arrayIndex]){
grid[arrayIndex] += a.getNumber();
}
This code sits inside the run() method of the thread class which I have extended. The array, grid, is shared by all of my threads. However, this does not return the correct results while running the same program on one thread does.
This will not work. It is important to realize that grid[arrayIndex] += ... is actually replacing the element in the grid with a new object. This means that you are synchronizing on an object in the array and then immediately replacing the object with another in the array. This will cause other threads to lock on a different object so they won't block. You must lock on a constant object.
You can instead lock on the entire array object, if it is never replaced with another array object:
synchronized (grid) {
// this changes the object to another Long so can't be used to lock
grid[arrayIndex] += a.getNumber();
}
This is one of the reasons why it is a good pattern to lock on a final object. See this answer with more details:
Why is it not a good practice to synchronize on Boolean?
Another option would be to use an array of AtomicLong objects, and use their addAndGet() or getAndAdd() method. You wouldn't need synchronization to increment your objects, and multiple objects could be incremented concurrently.
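A possible sketch of that option, with one AtomicLong per cell. The class name AtomicGrid, its methods, and the hammer helper are assumptions added for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicGrid {
    private final AtomicLong[] grid;

    AtomicGrid(int size) {
        grid = new AtomicLong[size];
        for (int i = 0; i < size; i++) {
            grid[i] = new AtomicLong(); // each cell starts at 0 and is never replaced
        }
    }

    void add(int index, long amount) {
        grid[index].addAndGet(amount); // atomic update of a single cell
    }

    long get(int index) {
        return grid[index].get();
    }

    // Starts `threads` writers that each add `amount` to cell 0 `perThread` times.
    static long hammer(int threads, int perThread, long amount) {
        AtomicGrid g = new AtomicGrid(2);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    g.add(0, amount);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return g.get(0);
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1_000, 3L));
    }
}
```

Because the AtomicLong objects themselves are never swapped out of the array, the problem of locking on an object that gets replaced does not arise.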
The java class Long is immutable, you cannot change its value. So when you perform an action:
grid[arrayIndex] += a.getNumber();
it is not changing the value of the Long object in grid[arrayIndex], which you are locking on, but is actually creating a new Long object whose value is the old value plus a.getNumber(), and storing that in the array. So you will end up with different threads synchronizing on different objects, which leads to the results you are seeing.
The synchronized block you have here is no good. When you synchronize on the array element, which is presumably a number, you're synchronizing only on that object. When you reassign the element of the array to a different object than the one you started with, the synchronization is no longer on the correct object and other threads will be able to access that index.
One of these two options would be more correct:
private final int[] grid = new int[10];
synchronized (grid) {
grid[arrayIndex] += a.getNumber();
}
If grid can't be final:
private final Object MUTEX = new Object();
synchronized (MUTEX) {
grid[arrayIndex] += a.getNumber();
}
If you use the second option and grid is not final, any assignment to grid should also be synchronized.
synchronized (MUTEX) {
grid = new int[20];
}
Always synchronize on something final, always synchronize on both access and modification, and once you have that down, you can start looking into other locking mechanisms, such as Lock, ReadWriteLock, and Semaphore. These can provide more complex locking mechanisms than synchronization and are better suited to scenarios where Java's default synchronization alone isn't enough, such as locking data in a high-throughput system (read/write locking) or locking in resource pools (counting semaphores).