I've been mulling this over & reading, but I can't find an absolutely authoritative answer.
I have several deep data structures made up of objects containing ArrayLists, Strings & primitive values. I can guarantee that the data in these structures will not change (no thread will ever make structural changes to lists, change references, change primitives).
I'm wondering if reading data in these structures is thread safe; i.e. is it safe to recursively read variables from the objects, iterate the ArrayLists etc. to extract information from the structures in multiple threads without synchronization?
The only reason why it wouldn't be safe is if one thread were writing to a field while another thread was simultaneously reading from it. No race condition exists if the data is not changing. Making objects immutable is one way of guaranteeing that they are thread safe. Start by reading this article from IBM.
The members of an ArrayList aren't protected by any memory barriers, so there is no guarantee that changes to them are visible between threads. This applies even when the only "change" that is ever made to the list is its construction.
Any data that is shared between threads needs a "memory barrier" to ensure its visibility. There are several ways to accomplish this.
First, any member that is declared final and initialized in a constructor is visible to any thread after the constructor completes.
Changes to any member that is declared volatile are visible to all threads. In effect, the write is "flushed" from any cache to main memory, where it can be seen by any thread that accesses main memory.
Now it gets a bit trickier. Any writes made by a thread before that thread writes to a volatile variable are also flushed. Likewise, when a thread reads a volatile variable, its cache is cleared, and subsequent reads may repopulate it from main memory.
Finally, a synchronized block is like a volatile read and write, with the added quality of atomicity. When the monitor is acquired, the thread's read cache is cleared. When the monitor is released, all writes are flushed to main memory.
One way to make this work is to have the thread that is populating your shared data structure assign the result to a volatile variable (or an AtomicReference, or other suitable java.util.concurrent object). When other threads access that variable, not only are they guaranteed to get the most recent value for that variable, but also any changes made to the data structure by the thread before it assigned the value to the variable.
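As a rough sketch of that pattern (the class and field names here are invented for illustration, not taken from the question), a builder thread fills the structure and then publishes it through a volatile field; any reader that loads that field afterwards also sees everything the builder wrote before the publication:

import java.util.ArrayList;
import java.util.List;

class Publisher {
    // The volatile write in build() publishes the fully constructed list; the
    // volatile read in read() guarantees visibility of everything written before it.
    private volatile List<String> published;

    void build() {
        List<String> data = new ArrayList<>();
        data.add("alpha");
        data.add("beta");
        published = data;                    // volatile write: safe publication
    }

    void read() {
        List<String> snapshot = published;   // volatile read
        if (snapshot != null) {
            for (String s : snapshot) {      // safe to iterate without locks,
                System.out.println(s);       // as long as nobody mutates the list
            }
        }
    }
}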
If the data is never modified after it's created, then you should be fine and reads will be thread safe.
To be on the safe side, you could make all of the data members "final" and make all of the accessing functions reentrant where possible; this ensures thread safety and can help keep your code thread safe if you change it in the future.
In general, making as many members "final" as possible helps reduce the introduction of bugs, so many people advocate this as a Java best practice.
Just as an addendum to everyone else's answers: if you're sure you need to synchronize your array lists, you can call Collections.synchronizedList(myList) which will return you a thread safe implementation.
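For example (the element type and variable names are just placeholders), keeping in mind that iteration over the wrapper still needs a manual synchronized block, as the Collections.synchronizedList Javadoc requires:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListExample {
    public static void main(String[] args) {
        List<String> myList = new ArrayList<>();
        // Each individual method call on the wrapper is synchronized internally.
        List<String> safeList = Collections.synchronizedList(myList);
        safeList.add("entry");
        // Iteration is a compound operation, so it must be guarded explicitly
        // by locking the wrapper itself.
        synchronized (safeList) {
            for (String s : safeList) {
                System.out.println(s);
            }
        }
    }
}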
I cannot see how reading from ArrayLists, Strings and primitive values using multiple threads should be any problem.
As long as you are only reading, no synchronization should be necessary. For Strings and primitives it is certainly safe as they are immutable. For ArrayLists it should be safe, but I do not have it on authority.
Do NOT use java.util.Vector; use a java.util.Collections.unmodifiableXXX() wrapper if the structures truly are unmodifiable. This guarantees they won't change and enforces that contract. If they are going to be modified, use java.util.Collections.synchronizedXXX(), but that only guarantees internal thread safety. Making the variables final will also help the compiler/JIT with optimizations.
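A minimal sketch of that wrapper approach (the class and field names are made up for illustration):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadOnlyHolder {
    private final List<String> names;

    ReadOnlyHolder(List<String> source) {
        // Defensive copy plus unmodifiable wrapper: callers can neither modify
        // the returned list nor reach the private backing copy.
        this.names = Collections.unmodifiableList(new ArrayList<>(source));
    }

    List<String> getNames() {
        // add()/remove() on the returned list throws UnsupportedOperationException
        return names;
    }
}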
Related
I'm using Java to write a multithreading application. One question I have is I have a list that is accessed by multiple threads, and I have one thread trying to update it. However, each time, the update thread will create a new List and then make the public shared list point to the new List, like this:
public List<DataObject> publicDataObject = XXXX; // <- this will be accessed by multiple threads
Then I have one thread updating this List:
List<DataObject> newDataObjectList = CreateNewDataObject();
publicDataObject = newDataObjectList;
When I update the pointer of publicDataObject, do I need a lock to make it thread-safe?
Before going into the answer, let's first check if I understand the situation and the question.
Assumption as stated in problem description: Only a single thread creates new versions of the list, based on the previous value of publicDataObject, and stores that new list in publicDataObject.
Derived assumption: Other threads access DataObject elements from that list, but do not add, remove or change the order of elements.
If this assumption holds, the answer is below.
Otherwise, please make sure your question includes this in its description. This makes the answer much more complex, though, and I advise you to study the topic of concurrency more, for example by reading a book about Java concurrency.
Additional assumption: The DataObject objects themselves are thread-safe.
If this assumption does not hold, this would make the scope of the question too broad and I would suggest to study the topic of concurrency more, for example by reading a book about Java concurrency.
Answer
Given that the above assumptions are true, you do not need a lock. However, you cannot just access publicDataObject from multiple threads using its definition in your code example. The reason is the Java Memory Model: it makes no guarantees whatsoever about threads seeing changes made by other threads, unless you use special language constructs like atomic variables, locks or synchronization.
Simplified, these constructs ensure that a read in one thread that happens after a write in another can see that written value, as long as you are using the same construct: the same atomic variable, the same lock, or synchronization on the same object. Locks and intrinsic locks (used by synchronization) can also ensure exclusive access of a single thread to a block of code.
Given, again, that the above assumptions are true, you can get by with an AtomicReference, whose get and set methods have the desired happens-before relationship:
// Definition
public AtomicReference<List<DataObject>> publicDataObject;
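Sketched usage under the same assumptions (DataObject and CreateNewDataObject() are taken from the question; everything else is illustrative):

// In the single updater thread:
List<DataObject> newDataObjectList = CreateNewDataObject();
publicDataObject.set(newDataObjectList);              // publishes the fully built list

// In any reader thread:
List<DataObject> current = publicDataObject.get();    // sees the latest published list
if (current != null) {
    for (DataObject item : current) {
        // read-only use of item; the list itself is never modified after publication
    }
}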
The reasons that a simple construct can be used that "only" guarantees visibility are:
The list that publicDataObject refers to is always a new one (the first assumption), so other threads can never see a stale version of the list itself; making sure that threads see the correct value of publicDataObject is therefore enough.
Other threads don't change the list (the derived assumption).
If, in addition, only one thread sets publicDataObject, there can be no race conditions in setting it, for example losing updates that are overwritten by more recent updates before ever being read.
If I call an object synchronized, can I access objects inside that object as if they were synchronized? Or can I only access the data types?
Even though your goal is to protect data, synchronization provides exclusivity around a block of code, not a piece of data. Code outside the synchronized blocks (or in blocks that synchronize on different objects) may alter the data you are trying to protect, even if that isn't what you want.
Any correct locking strategy must ensure that blocks of code that could interfere with each other hold the same lock. That includes code which could interfere with another copy of itself run in a second thread.
synchronized (myObject) {
// sensitive code
}
Locking at the method level is just a shorthand for locking the this pointer for the body of the method. (Or the class object for a static method).
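As a tiny illustration (Counter is a made-up class), these two methods acquire exactly the same lock:

class Counter {
    private int count;

    // Method-level synchronization...
    public synchronized void increment() {
        count++;
    }

    // ...is shorthand for wrapping the body in synchronized (this).
    public void incrementExplicit() {
        synchronized (this) {
            count++;
        }
    }
}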
Possibly, but only with care. You can do this if you always lock the same object.
Most likely you have to lock each object.
The ability to synchronize on every object is a commonly cited annoyance in Java because it's confusing.
Basically, all it means is that every object can be a lock. That's it. Hence, there is no special effect on an object's members when you lock on the parent object, and it doesn't matter which particular object you use as a lock. If all your threads lock on the same object, only one of them will be running/accessing whatever code is in the synchronized block. If some of them don't, there is no such guarantee.
If you want to make sure only one thread is accessing a member at any given time, make sure all threads that access that member lock (or "synchronize") on the same object before accessing it. As long as you do that, it doesn't matter which object you use for the lock.
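A minimal sketch of that rule (the class and field names are invented): every method that touches the shared member synchronizes on the same private lock object, so at most one thread accesses it at a time:

import java.util.ArrayList;
import java.util.List;

class Guarded {
    private final Object lock = new Object();
    private final List<String> sharedList = new ArrayList<>();

    void add(String value) {
        synchronized (lock) {        // same lock object for every access
            sharedList.add(value);
        }
    }

    int size() {
        synchronized (lock) {
            return sharedList.size();
        }
    }
}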
When to use volatile keyword vs synchronization in multithreading?
Use volatile to guarantee that each read access to a variable will see the latest value written to that variable. Use synchronized whenever you need values to be stable for multiple instructions. (Note that this does not necessarily mean multiple statements; the single statement:
var++; // NOT thread safe!
is not thread-safe even if var is declared volatile. You need to do this:
synchronized(LOCK_OBJECT){var++;}
See here for a nice summary of this issue.
volatile only ensures that a read always sees the latest value written to the variable, across threads. However, it provides no mutual exclusion: two threads can still interleave their updates to the volatile variable in any order, and it does not make compound operations on the variable atomic.
A synchronized block, however, ensures both visibility of the latest state and exclusive access, so access to and update of the variable are atomic inside the block.
The above only holds, of course, if all accesses and updates to the variable in question use the same lock object, so that no two threads can touch the variable at the same time.
That's a pretty broad question. The best answer I can give is to use synchronized when performing multiple actions that must be seen by other threads as occurring atomically—either all or none of the steps have occurred.
For a single action, volatile may be sufficient; it acts as a memory barrier to ensure visibility of the change to other threads.
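A small sketch of the distinction (the class and field names are invented): a flag that is only assigned and read can be volatile, while a counter that is read, modified and written back needs a lock:

class Flags {
    // Single writes and single reads: volatile is sufficient for visibility.
    private volatile boolean shutdownRequested;

    // Compound read-modify-write: must be guarded by a lock to stay atomic.
    private final Object lock = new Object();
    private int counter;

    void requestShutdown() {
        shutdownRequested = true;        // one write, visible to all readers
    }

    boolean isShutdownRequested() {
        return shutdownRequested;        // one read, sees the latest write
    }

    void incrementCounter() {
        synchronized (lock) {
            counter++;                   // read + add + write happen atomically
        }
    }

    int getCounter() {
        synchronized (lock) {            // same lock, so the read sees the latest value
            return counter;
        }
    }
}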
Okay, suppose I have a bunch of variables, one of them declared volatile:
int a;
int b;
int c;
volatile int v;
If one thread writes to all four variables (writing to v last), and another thread reads from all four variables (reading from v first), does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
Since there seems to be some confusion: I'm not deliberately trying to do something unsafe. I just want to understand the Java memory model and the semantics of the volatile keyword. Pure curiosity.
I'm going to speak to what I think you may really be probing about—piggybacking synchronization.
The technique that it looks like you're trying to use involves using one volatile variable as a synchronization guard in concert with one or more other non-volatile variables. This technique is applicable when the following conditions hold true:
Only one thread will write to the set of values meant to be guarded.
The threads reading the set of values will read them only if the volatile guard value meets some criteria.
You don't mention the second condition holding true for your example, but we can examine it anyway. The model for the writer is as follows:
Write to all the non-volatile variables, assuming that no other thread will try to read them.
Once complete, write a value to the volatile guard variable that indicates that the readers' criterion is met.
The readers operate as follows:
Read the volatile guard variable at any time, and if its value meets the criteria, then
Read the other non-volatile variables.
The readers must not read the other non-volatile variables if the volatile guard variable does not yet indicate a proper value.
The guard variable is acting as a gate. It's closed until the writer sets it to a particular value, or set of values that all meet the criteria of indicating that the gate is now open. The non-volatile variables are guarded behind the gate. The reader is not permitted to read them until the gate opens. Once the gate is open, the reader will see a consistent view of the set of non-volatile variables.
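A minimal sketch of that gate, assuming the write-once protocol above (the fields mirror the question's a, b and c; the boolean guard is an invented stand-in for v):

class GatedValues {
    private int a;
    private int b;
    private int c;
    private volatile boolean ready;   // the gate: false until the writer is done

    // Called once, by the single writer thread.
    void write() {
        a = 1;
        b = 2;
        c = 3;
        ready = true;                 // volatile write: opens the gate, publishing a, b and c
    }

    // Called by any reader thread.
    void read() {
        if (ready) {                             // volatile read: the gate is open
            System.out.println(a + b + c);       // guaranteed to see 1, 2 and 3
        }
        // If ready is still false, a, b and c must not be touched.
    }
}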
Note that it is not safe to run this protocol repeatedly. The writer can't keep changing the non-volatile variables once it's opened the gate. At that point, multiple reader threads may be reading those other variables, and they can—though are not guaranteed—see updates to those variables. Seeing some but not all of those updates would yield inconsistent views of the set.
Backing up, the trick here is to control access to a set of variables without either
creating a structure to hold them all, to which an atomic reference could be swapped, um, atomically, or
using a lock to make writing to and reading from the entire set of variables mutually exclusive activities.
Piggybacking on top of the volatile guard variable is a clever stunt—not one to be done casually. Subsequent updates to the program can break the aforementioned fragile conditions, removing the consistency guarantees afforded by the Java memory model. Should you choose to use this technique, document its invariants and requirements in the code clearly.
Yes. volatile, locks, etc., set up the happens-before relationship, and it affects all variables (in the new Java Memory Model (JMM) from Java SE 5 / JDK 1.5). Kind of makes it useful for non-primitive volatiles...
does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
You can get stale reads, because you can't ensure that the values of a, b and c you read are still the ones that were in place when v was written. Using a state machine (but you need CAS to change the state) is a way to tackle similar issues, but it's beyond the scope of the discussion.
Perhaps this part is unclear: after the write to v, reading first from v, you'd get the right results (non-stale reads). The main issue is that if you do
if (v == STATE1) { ...proceed... }, there is no guarantee that some other thread is not modifying the state of a/b/c in the meantime. In that case, there will be stale reads.
If you modify a/b/c and v only once, you'd get the correct result.
Mastering concurrency and lock-free structures is really hard. Doug Lea has a good book on the subject, and most talks/articles by Dr. Cliff Click are a wonderful wealth of information, if you need something to start digging into.
Yes, a volatile write "happens-before" the next volatile read of the same variable.
While #seh is right about the consistency problems with multiple variables, there are use cases where less consistency is required.
For example, a writer thread updates some state variables and a reader thread displays them promptly. There's not much relation among the variables; we only care about reading the new values promptly. We could make every state variable volatile, or we could use only one volatile variable as a visibility guard.
However, the saving is only on paper; performance-wise there's hardly any difference. In either version, every state variable must be "flushed" by the writer and "loaded" by the reader. No free lunch.
Let's say I have this class:
class Zoo
{
    protected String bearName;
    protected Double trainerSalary;
    protected Integer monkeyCount;
}
Can one thread write to these fields, and another one read them, without requiring synchronized access to the Zoo object?
Note: these values can be treated separate from one another, so it doesn't matter that the trainerSalary is changed while the monkeyCount is read.
EDIT:
Just to clarify, the fields are mutable; only their referenced objects are immutable.
Technically you need to make them final, volatile, or read and write them using synchronized blocks to guarantee that a reader will see the most up-to-date value. As you have it right now, if one thread writes in a value, there's no guarantee that another thread will read the same value. This is because the reading thread may see a cached value. This is more likely with multi-core CPUs and various levels of cache.
A great book on this is Java Concurrency in Practice.
Accesses and updates to the memory cells corresponding to fields of any type except long or double are guaranteed to be atomic (see Concurrent Programming In Java). That's why one might expect that you don't need to synchronize read access to your fields. However, the Java memory model allows threads to cache previously read values in case you access them repeatedly so you should mark the fields as volatile to ensure that each thread sees the most recent values.
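A minimal sketch of marking the fields volatile, reusing the Zoo class from the question:

class Zoo
{
    // volatile guarantees a reader sees the value most recently written by
    // another thread; the three fields are still independent of one another.
    protected volatile String bearName;
    protected volatile Double trainerSalary;
    protected volatile Integer monkeyCount;
}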
If you are sure that nobody will change the values of the fields, make them final. In that case, no volatile field is necessary.
Things are different if the values of the fields depend on each other. In that case, I'd recommend to use synchronized setters that ensure that the invariant of your class is not violated.
As you've declared the class, it's possible for another class in the same package to change these values. This class isn't immutable.
Now if you did something like
class Zoo
{
    protected final String bearName;
    protected final Double trainerSalary;
    protected final Integer monkeyCount;
    // final fields must be assigned in the constructor
    public Zoo(String bearName, Double trainerSalary, Integer monkeyCount) {
        this.bearName = bearName;
        this.trainerSalary = trainerSalary;
        this.monkeyCount = monkeyCount;
    }
}
Then the class would be immutable. If the logic of your program treats this class as immutable, then why not make it actually immutable?
Also, if multiple threads were checking and updating the same value, then you could have issues. Say multiple threads were checking and updating monkeyCount; there is a good chance monkeyCount would end up incorrect, because there is nothing forcing these checks and updates to occur atomically.
My 2 cents, from "The Java Programming Language", 4th ed., section 14.10.2:
"There is a common misconception that shared access to immutable objects does not require any synchronization because the state of the object never changes. This is a misconception in general because it relies on the assumption that a thread will be guaranteed to see the
initialized state of the immutable object, and that need not be the case. The problem is that, while the shared object is immutable, the reference used to access the shared object is itself shared and often mutable - consequently, a correctly synchronized program must synchronize access to that shared reference, but often programs do not do this, because programmers do not recognize the need to do it. For example, suppose one thread creates a String object and stores a reference to it in a static field. A second thread then uses that
reference to access the string. There is no guarantee, based on what we've discussed so far, that the values written by the first thread when constructing the string will be seen by the second thread when it accesses the string."
If those variables are indeed independent, then no, you do not need synchronization. However, as you note, if you had
protected Integer monkeysLeft;
protected Integer monkeysEatenByBears;
where the two variables are logically connected, you would want synchronized access to the pair of them.
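For instance (MonkeyTracker is a made-up wrapper around the two counters), updating and reading the pair under one lock keeps them consistent:

class MonkeyTracker {
    private int monkeysLeft = 10;
    private int monkeysEatenByBears = 0;

    // Both counters change together, under the object's intrinsic lock.
    public synchronized void bearEatsMonkey() {
        monkeysLeft--;
        monkeysEatenByBears++;
    }

    // Read both values under the same lock to get a consistent pair.
    public synchronized int[] counts() {
        return new int[] { monkeysLeft, monkeysEatenByBears };
    }
}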