I have a question regarding synchronization of code that is executed by several threads:
As far as I know each thread has its own stack, hence, non-static variables exist in different locations in the memory for each thread (for X threads there are X stacks that include all non-static variables).
So why is there a need to synchronize anything?
I mean, if the code that the threads execute includes some class variable v1, then each thread has its own "instance" of v1 (different memory address), and no other thread can "touch" it... isn't it so?
non-static variables exist in different locations in the memory for each thread
This is not true, so the answer to
if the code that the threads execute includes some class variable v1, then each thread has its own "instance" of v1 (different memory address), and no other thread can "touch" it... isn't it so
is no. Threads can touch object instances allocated and modified by other threads and the burden is on the programmer to ensure this does not affect program correctness.
Class member variables exist in a single place in memory per-class instance, not per thread. It is true that between memory barriers (think the start { and end } of synchronized), that a thread may have a cache of the state of an object, but that is not the same as the language mandating per-thread storage. The "memory for each thread" is its stack which does not contain object members* -- only references to objects.
The best way to think of it is that there is one location on the heap for each object, but that there might be multiple reads and/or writes involving that memory location happening at the same time.
I can see how you would come to the conclusions you did if you heard that threads allocate objects in different parts of the heap. Some JVMs have an optimization whereby they do thread-local allocation but that does not prevent other threads from accessing those objects.
Thread-local allocation
If the allocator were truly implemented as shown in Listing 1, the shared heapStart field would quickly become a significant concurrency bottleneck, as every allocation would involve acquiring the lock that guards this field. To avoid this problem, most JVMs use thread-local allocation blocks, where each thread allocates a larger chunk of memory from the heap and services small allocation requests sequentially out of that thread-local block. As a result, the number of times a thread has to acquire the shared heap lock is greatly reduced, improving concurrency.
* - it's possible that JVM optimizations allow some objects to be allocated on the stack.
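To make the shared-heap point concrete, here is a minimal sketch (the class name is invented): two threads call methods on the same instance, and both sets of updates land in the single copy of the field on the heap. Only the *reference* lives on each thread's stack.

```java
// Two threads mutating the SAME heap object; `count` exists once per instance,
// not once per thread. Increments are synchronized so the result is deterministic.
class SharedCounter {
    private int count = 0;

    synchronized void increment() { count++; }   // guarded read-modify-write
    synchronized int get()        { return count; }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter c = new SharedCounter();   // one object on the heap
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both threads mutated the same instance: 2 * 100000.
        System.out.println(c.get()); // prints 200000
    }
}
```

If `increment()` were not synchronized, the two threads would race on the same field and the total would usually come out lower than 200000 — which is exactly why the shared heap forces you to synchronize.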
The stack is thread-safe whereas the heap is not thread-safe unless you synchronized the code. The stack contains local variables and method parameters (primitive and reference) whereas the heap contains objects.
Stack yes (think of a call stack, local variables), but class variables live in the heap and you have to synchronize access to them:)
Only primitive types, such as int are guaranteed to be allocated on the stack. Objects and arrays are all typically stored in the heap unless Escape Analysis determines the scope of the object is 'restricted to the scope of the procedure'.
On the same object instance, if your method is not synchronized, there is no guarantee that the same code is not executed simultaneously by different threads --> havoc! Which is the correct value?
At the minimum, you want to declare methods accessing a variable as synchronized. If you want more fine-grained control, you can use, for instance, a ReentrantReadWriteLock.
Declaring a method synchronized synchronizes on the object instance, so this is safe.
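As a sketch of the fine-grained option mentioned above (class and method names are made up), a counter guarded by a ReentrantReadWriteLock lets many readers proceed concurrently while writers get exclusive access:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Fine-grained alternative to declaring whole methods synchronized.
class RwCounter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value = 0;

    void increment() {
        lock.writeLock().lock();        // exclusive: blocks readers and writers
        try {
            value++;
        } finally {
            lock.writeLock().unlock();  // always release in finally
        }
    }

    int get() {
        lock.readLock().lock();         // shared: many readers may hold it at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This pays off when reads heavily outnumber writes; for simple cases, `synchronized` is usually sufficient.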
Local variables, primitives and references are implicitly thread-local. However, objects referenced can be shared and when a thread can modify a shared object it is highly likely you will need synchronised, a Lock or some other strategy to ensure thread safety.
Some key points which can help clarify your doubts:
Objects are always allocated on the heap.
Class-level variables are shared across threads (threads of the same object).
Local variables are always thread safe (if not exposed to the outside world in a non-thread-safe manner).
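The points above can be sketched in a few lines (class name invented): the instance field is one copy shared by every thread started on the same object, while each thread gets its own copy of the local variable.

```java
// Instance field `shared` lives once on the heap; `local` lives on each
// thread's own stack and is invisible to other threads.
class Example implements Runnable {
    int shared = 0;                       // one copy, visible to every thread

    public void run() {
        int local = 0;                    // private to THIS thread
        local++;                          // never needs synchronization
        synchronized (this) { shared++; } // shared: needs synchronization
    }

    public static void main(String[] args) throws InterruptedException {
        Example e = new Example();
        Thread t1 = new Thread(e), t2 = new Thread(e);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(e.shared);     // prints 2: both threads touched it
    }
}
```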
"non-static variables exist in different locations" could not possibly be correct. In Java, you never directly get to know anything about "the stack". All of your class variables, static or instance, come from the heap. As a Java developer, however, you don't really care about that.
The only time you don't care about thread-safety is when your classes are immutable (don't change after construction) OR you aren't ever doing anything in threads. If your classes don't fall into these two categories, you need to think about making them thread-safe.
The more immutability you can get into your designs, the easier the Threading issues are to reason about and overcome.
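A minimal immutable class in the spirit of that advice (the class is a made-up example): all fields final, no setters, and "mutation" returns a new instance, so objects can be shared between threads with no synchronization at all.

```java
// Immutable value type: safe to share freely across threads after construction.
final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }

    // Instead of changing this instance, produce a new one.
    Point translate(int dx, int dy) { return new Point(x + dx, y + dy); }
}
```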
Nrj has got the right idea.
Related
What is preferable for performance? Assume no to little contention:
a mutable class with volatile fields, updating them one by one
an immutable class with final fields, where an update cycle avoids the multi-field update and instead re-creates the object once
Volatiles require memory barriers on every write, I suppose the same is true for final fields? Meaning a single memory barrier upon object construction?
Update for clarification:
I feel the question is valuable on its own and answerable generically, taking into account the java memory model and current gen hardware. If you must assume specifics:
the object is of course accessed from multiple threads, otherwise this exercise would be pointless
a single object is long-lived, as in multiple hours
there are hundreds to thousands of those objects, with hundreds to thousands of update events per second
final is a hint to the compiler that the field value cannot change. Any write attempts are caught at compile time. Reading a final value does not use a memory barrier. You cannot write to a final variable, so a memory barrier is meaningless.
Using the hint, the compiler (or the JIT) may replace a memory reference to a final value with a constant. So in terms of performance, final does not introduce any additional overhead.
If the garbage collector flushes every thread's cache between the last time an old object is accessed and the space being made available for a new object, and if no cache line contains data from multiple objects, then it would on most platforms be naturally impossible for a newly-constructed object to get loaded into any thread's cache before a reference to that object is stored in a location which is accessible to that thread, even in the absence of any read barriers (beyond the aforementioned once-per-GC-cycle system-wide barrier). Further, if a compiler can tell that writes to multiple fields of an object will occur without any intervening writes to any other object whose reference may have been exposed, it could omit write barriers for all but the last.
The only time using final fields would be more expensive than volatile would be if it necessitated the creation of more objects to handle changes which could have been done "in place" with volatile fields. Since many factors can affect object-creation cost, the only reliable way to judge which approach is more efficient under a particular set of circumstances on a particular system is often to benchmark both.
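A sketch of the two designs being compared (all class and field names here are invented for illustration): (a) a mutable class whose volatile fields are written one by one, and (b) an immutable snapshot that is re-created per update and swapped in through a single volatile reference.

```java
// (a) Mutable: each field write is its own volatile store, and a reader may
// observe a half-updated pair (new bid, old ask).
class MutableQuote {
    volatile double bid;
    volatile double ask;
}

// (b) Immutable snapshot: fields frozen at construction.
final class QuoteSnapshot {
    final double bid, ask;
    QuoteSnapshot(double bid, double ask) { this.bid = bid; this.ask = ask; }
}

// One volatile write publishes both fields together.
class QuoteHolder {
    volatile QuoteSnapshot current = new QuoteSnapshot(0, 0);

    void update(double bid, double ask) {
        current = new QuoteSnapshot(bid, ask); // single publication point
    }
}
```

Beyond raw cost, the immutable variant has a correctness edge: readers always see a consistent pair, which the field-by-field volatile version cannot guarantee.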
I want to know how the JVM guarantees the visibility of member-variable modifications in the referenced object when using synchronized.
I know synchronized and volatile will provide visibility for variable modifications.
class Test {
    public int a = 0;
    public void modify() {
        a += 1;
    }
}

// Example:
// Thread A:
volatile Test test = new Test();
synchronized (locker) {
    test.modify();
}
// then thread B:
synchronized (locker) {
    test.modify();
}
// Now, I think test.a==2 is true. Is it ok? How JVM implements it?
// I know the memory barrier, does it flush all cache to main storage?
Thread A calls modify in a synchronized block first, and then passes the object to thread B (writing the reference to a volatile variable).
Then thread B calls modify again (in a synchronized block).
Is there any guarantee that a==2? And how does the JVM implement it?
Visibility between threads is enforced with memory barriers/fences. In the case of a synchronized block, the JVM will insert a memory barrier after the execution of the block completes.
The JVM implements memory barriers with CPU instructions; e.g., on x86 a store barrier is done with the sfence instruction and a load barrier with lfence. There is also mfence, and possibly other instructions specific to the CPU architecture.
For your (still incomplete!) example, if we can assume the following:
The code in thread A initializing test is guaranteed to run before thread B uses it.
The locker variable contains a reference to the same object for threads A & B.
then we can prove that a == 2 will be true at the point you indicate. If precondition 1 is not guaranteed, then thread B may get an NPE. If precondition 2 is not guaranteed (i.e. threads A and B may synchronize on different objects) then there is not a proper happens-before relationship to ensure that thread B sees the result of thread A's actions on a.
(@NathanHughes commented that the volatile is unnecessary. I wouldn't necessarily agree with that. It depends on details of your example that you still haven't shown us.)
How JVM implements it?
The actual implementation is Java platform and (in theory) version specific. The JVM spec Memory Model places constraints on how a program that obeys "the rules" will behave. It is entirely implementation specific how that actually happens.
I know the memory barrier, does it flush all cache to main storage?
That is implementation specific too. There are different kinds of memory barrier that work in different ways. The JIT compiler will emit native code that uses the appropriate instructions to meet the guarantees required by the JLS. If there is a way to do this without doing a full cache flush then the implementation may do that.
(There is a JVM command line option to tell the JIT compiler to output the native code. If you really want to know what is happening under the hood, that is a good place to start looking.)
But if you are trying to understand / analyze your application's thread-safety, you should be doing it in terms of the Java Memory Model. Also, use higher level concurrency abstractions that allow you to avoid the lower level pitfalls.
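Following that last piece of advice, here is a sketch of the question's counter rewritten with a higher-level abstraction (the class name is invented): AtomicInteger provides both atomicity and visibility without explicit locks or barriers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The question's modify() expressed with java.util.concurrent instead of
// synchronized blocks: no lock object to get wrong, visibility guaranteed.
class AtomicCounter {
    final AtomicInteger a = new AtomicInteger(0);

    void modify() { a.incrementAndGet(); } // atomic read-modify-write

    public static void main(String[] args) throws InterruptedException {
        AtomicCounter test = new AtomicCounter();
        Thread ta = new Thread(test::modify);
        Thread tb = new Thread(test::modify);
        ta.start(); ta.join();   // thread A completes first,
        tb.start(); tb.join();   // then thread B, mirroring the question
        System.out.println(test.a.get()); // prints 2
    }
}
```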
I used to believe that any variable that is shared between two threads, can be cached thread-locally and should be declared as volatile. But that belief has been challenged recently by a teammate. We are trying to figure out whether volatile is required in the following case or not.
class Class1
{
    void Method1()
    {
        Worker worker = new Worker();
        worker.start();
        ...
        System.out.println(worker.value); // want to poll value at this instant
        ...
    }

    class Worker extends Thread
    {
        int value = 0; // Should this be declared as a volatile?

        public void run()
        {
            ...
            value = 1; // this is the only piece of code that updates value
            ...
        }
    }
}
Now my contention is that it is possible that the Worker (child) thread could have cached the variable "value" of the Worker object within the thread and updated just its copy while setting the value to 1. In such a case, the main thread may not see the updated value.
But my teammate believes that since the access to "value" is happening through an object (worker), therefore for both the threads to see different values, it could only be possible if both the threads were maintaining separate copies of "worker" object itself (which would further mean that creation of a thread involves creating a deep copy of all the shared objects).
Now I know that that can't be true, for it would be highly inefficient for each thread to maintain wholly different copies of all shared objects. Hence, I am in serious doubt. Does "worker.value" in the main thread reference a different memory location than "this.value" in the child thread? Will the child (Worker) thread cache "value"?
Now my contention is that it is possible that the Worker (child) thread could have cached the variable "value" of the Worker object thread-locally and updated just its copy while setting the value to 1. In such a case, the main thread may not see the updated value.
You are correct. Even though you are both dealing with the same Worker instance, there is no guarantee that the cached memory version of the Worker's fields have been synchronized between the various different thread memory caches.
The value field must be marked as volatile to guarantee that other threads will see the value = 1; update to the value field.
But my teammate believes that since the access to "value" is happening through an object (worker), therefore for both the threads to see different values, it could only be possible if both the threads were maintaining separate copies of "worker" object itself...
No, this is not correct. The tricky part about thread memory revolves around processor memory caches. Without the memory barrier imposed by volatile, a processor is completely free to cache memory. So even though both threads would be working with the same instance of the Worker, they may have a locally cached copy of the memory associated with it.
Threaded architectures get much of their speed from working with separate high-speed processor-local memory rather than always referencing central storage.
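The fix discussed above, sketched as a simplified, standalone version of the question's Worker: marking `value` volatile guarantees the main thread sees the worker's write once it has been made.

```java
// `volatile` guarantees the main thread observes the worker's write; without
// it, a poll from the main thread could legally see a stale 0 forever.
class Worker extends Thread {
    volatile int value = 0;

    public void run() { value = 1; } // the only write

    public static void main(String[] args) throws InterruptedException {
        Worker w = new Worker();
        w.start();
        w.join();
        System.out.println(w.value); // prints 1
    }
}
```

Note that in this particular sketch `join()` alone already establishes a happens-before edge; `volatile` is what matters in the question's scenario, where the main thread polls `worker.value` while the worker is still running.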
But my teammate believes that since the access to "value" is happening through an object (worker), therefore for both the threads to see different values, it could only be possible if both the threads were maintaining separate copies of "worker" object itself (which would further mean that creation of a thread involves creating a deep copy of all the shared objects).
What your coworker does not realize is that values of instance variables (any variables for that matter) can be cached temporarily in machine registers, or in the processor's first or second-level memory caches. The Java Language Specification explicitly says that two threads won't necessarily see the same values for the same variable unless they have taken the appropriate steps.
There is a whole section of the JLS that deals with this issue: JLS 17.4. I recommend that both you and your co-worker read this and 17.5 and 17.6 as well if you are going to debate how Java behaves in this area. Or you could read the last chapter of "Java Concurrency in Practice" by Brian Goetz et al which is rather more easy to read than the JLS.
I'd recommend that you and your co-worker don't rely on your intuition about how threading ought to work. Read the specs. Some aspects of thread behavior are not intuitive ... though there are good reasons why they are the way they are.
Is it enough to use only local variables and no instance variables, thus only using memory on the stack (per thread)?
But what happens when you create a new MyObject that is local to the method? Doesn't the new object get created on the heap? Is it thread safe because the reference to it is local (thread safe)?
It is thread safe because if it is only referenced by variables in that particular method (it is, as you said, a local variable), then no other threads can possibly have a reference to the object, and therefore cannot change it.
Imagine you and I are pirates (threads). You go and bury your booty (the object) on an island (the heap), keeping a map to it (the reference). I happen to use the same island for burying my booty, but unless you give me your map, or I go digging all over the island (which isn't allowed on the island of Java), I can't mess with your stash.
Your new MyObject is thread-safe because each call to the method will create its own local instance on the heap. None of the calls refer to a common instance; if there are N calls, that means N instances of MyObject on the heap. When the method exits, each instance is eligible for GC as long as you don't return it to the caller.
Well, let me ask you a question: does limiting your method to local variables mean your method can't share a resource with another thread? If not, then obviously this isn't sufficient for thread safety in general.
If you're worried about whether another thread can modify an object you created in another thread, then the only thing you need to worry about is never leaking a reference to that object out of the thread. If you achieve that, your object will be in the heap, but no other thread will be able to reference it so it doesn't matter.
Edit
Regarding my first statement, here's a method with no instance variables:
public void methodA() {
File f = new File("/tmp/file");
//...
}
This doesn't mean there can't be a shared resource between two threads :-).
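The caveat above can be sketched like this (class, method, and field names are all invented): the object starts out as a local variable, but the method leaks the reference into a shared static collection, so another thread can now reach and mutate it.

```java
import java.util.ArrayList;
import java.util.List;

// A local variable does not keep an object thread-confined if the reference
// escapes to shared state.
class Leaky {
    static final List<List<String>> SHARED = new ArrayList<>(); // shared resource

    void methodA() {
        List<String> local = new ArrayList<>(); // local variable, heap object
        local.add("mine");
        SHARED.add(local); // reference escapes: no longer confined to this thread
    }
}
```

After `methodA()` returns, any thread reading `SHARED` can mutate the "local" list, so confinement is about *reachability*, not about where the variable was declared.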
There's no way for other threads to access such an object reference. But if that object is not thread-safe, then the overall thread safety is compromised.
Consider for example that MyObject is a HashMap.
The argument that "if it's in the heap, it's not thread-safe" is not valid. The heap is not accessible via pointer arithmetic, so where the object is actually stored doesn't matter (ThreadLocals aside).
I've been mulling this over & reading but can't find an absolutely authoritative answer.
I have several deep data structures made up of objects containing ArrayLists, Strings & primitive values. I can guarantee that the data in these structures will not change (no thread will ever make structural changes to lists, change references, change primitives).
I'm wondering if reading data in these structures is thread safe; i.e. is it safe to recursively read variables from the objects, iterate the ArrayLists etc. to extract information from the structures in multiple threads without synchronization?
The only reason why it wouldn't be safe is if one thread were writing to a field while another thread was simultaneously reading from it. No race condition exists if the data is not changing. Making objects immutable is one way of guaranteeing that they are thread safe. Start by reading this article from IBM.
The members of an ArrayList aren't protected by any memory barriers, so there is no guarantee that changes to them are visible between threads. This applies even when the only "change" that is ever made to the list is its construction.
Any data that is shared between thread needs a "memory barrier" to ensure its visibility. There are several ways to accomplish this.
First, any member that is declared final and initialized in a constructor is visible to any thread after the constructor completes.
Changes to any member that is declared volatile are visible to all threads. In effect, the write is "flushed" from any cache to main memory, where it can be seen by any thread that accesses main memory.
Now it gets a bit trickier. Any writes made by a thread before that thread writes to a volatile variable are also flushed. Likewise, when a thread reads a volatile variable, its cache is cleared, and subsequent reads may repopulate it from main memory.
Finally, a synchronized block is like a volatile read and write, with the added quality of atomicity. When the monitor is acquired, the thread's read cache is cleared. When the monitor is released, all writes are flushed to main memory.
One way to make this work is to have the thread that is populating your shared data structure assign the result to a volatile variable (or an AtomicReference, or other suitable java.util.concurrent object). When other threads access that variable, not only are they guaranteed to get the most recent value for that variable, but also any changes made to the data structure by the thread before it assigned the value to the variable.
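That publication pattern can be sketched as follows (the class and field names are made up): the structure is built by one thread and then assigned to a volatile field, and any thread that later reads the reference is also guaranteed to see every write made before the assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Safe publication: the volatile write "publishes" both the reference and
// every write that happened-before it.
class Publisher {
    private volatile List<String> data; // the publication point

    void build() {
        List<String> tmp = new ArrayList<>();
        tmp.add("alpha");  // writes made BEFORE the volatile store...
        tmp.add("beta");
        data = tmp;        // ...become visible with this single volatile write
    }

    List<String> read() {
        return data;       // volatile read: sees the fully built list, or null
    }
}
```

An AtomicReference<List<String>> would work the same way; the key is that readers never touch the list until they obtain it through the publication point.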
If the data is never modified after it's created, then you should be fine and reads will be thread safe.
To be on the safe side, you could make all of the data members "final" and make all of the accessing functions reentrant where possible; this ensures thread safety and can help keep your code thread safe if you change it in the future.
In general, making as many members "final" as possible helps reduce the introduction of bugs, so many people advocate this as a Java best practice.
Just as an addendum to everyone else's answers: if you're sure you need to synchronize your array lists, you can call Collections.synchronizedList(myList), which will return a thread-safe implementation.
I cannot see how reading from ArrayLists, Strings and primitive values using multiple threads should be any problem.
As long as you are only reading, no synchronization should be necessary. For Strings and primitives it is certainly safe as they are immutable. For ArrayLists it should be safe, but I do not have it on authority.
Do NOT use java.util.Vector. Use the java.util.Collections.unmodifiableXXX() wrapper if they truly are unmodifiable; this will guarantee they won't change and will enforce that contract. If they are going to be modified, then use java.util.Collections.synchronizedXXX(). But that only guarantees internal thread safety. Making the variables final will also help the compiler/JIT with optimizations.
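A sketch of that suggestion (class name invented): take a defensive copy, wrap it in an unmodifiable view, and expose only the view, so the "this list never changes" contract is enforced at runtime.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Enforce immutability of the exposed list: any attempted write throws
// UnsupportedOperationException.
class Holder {
    private final List<String> items;

    Holder(List<String> source) {
        // defensive copy first, then an unmodifiable wrapper over the copy
        this.items = Collections.unmodifiableList(new ArrayList<>(source));
    }

    List<String> items() { return items; } // callers can read, never write
}
```

The defensive copy matters: wrapping `source` directly would still let the original caller mutate the list behind the wrapper.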