Multiple thread accessing same data but getting latest data? - java

I wrote this program:
package com.example.threads;
import java.util.concurrent.ConcurrentHashMap;
public class ConcurrentHashMapBehaviour {
private static ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<>();
private static Object _lock = new Object();
public static void main(String[] args) {
Thread t = new Thread(new MyThread());
t.start();
int counter = 0;
while (true) {
String val = "FirstVal" + counter;
counter++;
String currentVal = null;
synchronized (_lock) {
chm.put("first", val);
currentVal = chm.get("first");
}
System.out.println("In Main thread, current value is : " + currentVal);
}
}
static class MyThread implements Runnable {
@Override
public void run() {
String val = null;
while (true) {
synchronized (_lock) {
val = chm.get("first");
}
System.out.println("Value seen in MyThread is " + val);
}
}
}
}
I am sharing common data between these threads, namely chm (a ConcurrentHashMap). I ran this in debug mode so that I could make the main thread run more iterations than MyThread; both are controlled by _lock.
For instance, I let the main thread run twice, so the value of the "first" key was "FirstVal1". Then I halted the main thread and let MyThread proceed; it was able to get the latest value, even though the main thread had run multiple times.
How is this possible? I was under the impression that this variable needs to be volatile in order for MyThread to get the latest value.
I don't understand this behaviour. Can anyone explain what I am missing?

First, you're using a ConcurrentHashMap, which is safe to use in a multi-threaded environment, so if a thread puts a value into it, other threads will be able to see that value.
Second, you are synchronizing access to the map. That ensures only one thread at a time reads from or writes to the map.
Each such explicit synchronization also acts as a memory barrier: any results waiting in a cache are written to main memory, making it possible for other threads to see them. A volatile variable access works the same way: accesses to volatile values carry memory visibility guarantees.
If you want to see data races in your program, remove all synchronization primitives and try again. That does not guarantee that you'll observe a race all the time, but you should be able to see unexpected values every now and then.

There are three misconceptions here:
Writing to a volatile variable guarantees that all changes made by the writing thread are published, i.e. can be seen by other threads that subsequently read that variable. See Chapter 17 ("Threads and Locks") of The Java Language Specification for all the details. This does not mean that the absence of the volatile modifier forbids publication. JVM implementations may be (and actually are) much more forgiving in practice. This is one of the reasons concurrency problems are so hard to trace.
"A hash table supporting full concurrency of retrievals and high expected concurrency for updates." is the first sentence of the API Documentation on the ConcurrentHashMap class. And that pretty much sums it up. The concurrent hashmap guarantees that when calling get any thread gets the latest value. That's exactly the purpose of this class. If you look at its source code you can by the way see that they use volatile fields internally.
You're additionally using synchronized blocks to access your data. These do not only guarantee exclusive access, they also guarantee that all changes made before leaving such a block are visible to all threads that synchronize on the same lock object.
To summarize it: By using the concurrent hashmap implementation and using synchronization blocks you publish the changes and make the latest changes visible to other threads. One of the two would have already been sufficient.
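To illustrate the last point, here is a minimal sketch that drops the synchronized blocks and relies on the map alone. The class name MapOnlyVisibility and the method awaitLastValue are made up for this illustration; the key and value strings follow the question.

```java
import java.util.concurrent.ConcurrentHashMap;

class MapOnlyVisibility {
    private static final ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<>();

    // Starts a writer thread and polls the map from the calling thread until
    // the last written value shows up. No synchronized block and no volatile
    // field of our own is involved. writes must be at least 1.
    static String awaitLastValue(int writes) {
        chm.clear();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < writes; i++) {
                chm.put("first", "FirstVal" + i);
            }
        });
        writer.start();
        String last = "FirstVal" + (writes - 1);
        // Per the ConcurrentHashMap documentation, an update for a key
        // happens-before any retrieval for that key that reports the updated
        // value, so the reader is not stuck with a stale copy.
        String seen = chm.get("first");
        while (!last.equals(seen)) {
            Thread.yield();
            seen = chm.get("first");
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("Reader saw: " + awaitLastValue(1000));
    }
}
```

The reader may skip intermediate values, since it only ever observes whatever value is current when it calls get.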

Related

Sharing variable without synchronization

I read it from Java Concurrency in Practice, that it is bad to share variables in threads without synchronisation. However, for some examples as following which only have one read thread and one write thread, I can't find errors in it. From my perspective, the result for the following program will definitely terminate and print 42 because ReaderThread can go through only when ready becomes true, and that means number is 42. Could somebody give me some explanation why I am wrong?
public class NoVisibility {
private static boolean ready;
private static int number;
private static class ReaderThread extends Thread {
public void run() {
while (!ready)
Thread.yield();
System.out.println(number);
}
}
public static void main(String[] args) {
new ReaderThread().start();
number = 42;
ready = true;
}
}
Since ready isn't volatile, there's no guarantee that ReaderThread will see that your main thread has changed it. When you mark ready as volatile, all writes from one thread will be seen in other threads reading it.
You always need some sort of synchronization / visibility control when communicating between threads. Whether it's volatile, explicitly using synchronized or using the java.util.concurrent.* classes.
You don't need synchronization (e.g., synchronized) in your example (though you do need volatile, more below) because reads and writes of boolean and int variables are always atomic. Which is to say, a thread can't be part-way through writing to a boolean (or int) variable when another thread comes along and reads it, getting garbage. The value being written by one thread is always fully written before another thread can read it. (This is not true of non-volatile double or long variables; it would be entirely possible for a thread to read garbage if it happened to read in the middle of another thread's write to a long or double if they aren't marked volatile.)
But you do need volatile, because each thread can have its own copy of the variables, and potentially can keep using its own copy for a long period of time. So it's entirely possible for your reader thread to wait forever, because it keeps re-reading its own copy of ready which stays false even though your main thread writes true to its copy of ready. It's also possible for your reader thread to see ready become true but keep reading its own copy of number, and so print 0 instead of 42.
You would need to use synchronized if you were modifying the state of an object that doesn't guarantee thread-safe access. For instance, if you were adding to a Map or List. That's because there are multiple operations involved, and it's essential to prevent one thread from reading a half-complete change another thread is making.
Other packages, such as java.util.concurrent, offer classes with thread-safe access semantics.
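As a sketch of the fix discussed in these answers, the following variant of NoVisibility marks ready as volatile. The class name VolatileVisibility and the helper runOnce are invented here so that the result can be returned and checked; the fields match the question.

```java
class VolatileVisibility {
    private static volatile boolean ready; // volatile: the write below must become visible
    private static int number;             // plain field, published by the volatile write

    // Runs the reader/writer pair once and returns what the reader saw.
    static int runOnce() {
        ready = false;
        number = 0;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {      // volatile read on every iteration
                Thread.yield();
            }
            seen[0] = number;     // ordered after the volatile read of ready
        });
        reader.start();
        number = 42;   // ordinary write, ordered before the volatile write
        ready = true;  // volatile write: publishes number as well
        try {
            reader.join(); // join makes seen[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Only ready needs the modifier here: the write to number comes before the volatile write in program order, and the read of number comes after the volatile read, so the reader cannot see ready as true and still read 0.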

How does Thread.yield prevent a print statement from executing in a while loop given that the main thread changes the 'while' condition [duplicate]

I'm looking at a code sample from "Java Concurrency in Practice" by Brian Goetz. He says that it is possible that this code will stay in an infinite loop because "the value of 'ready' might never become visible to the reader thread". I don't understand how this can happen...
public class NoVisibility {
private static boolean ready;
private static int number;
private static class ReaderThread extends Thread {
public void run() {
while (!ready)
Thread.yield();
System.out.println(number);
}
}
public static void main(String[] args) {
new ReaderThread().start();
number = 42;
ready = true;
}
}
Because ready isn't marked as volatile, its value may be cached at the start of the while loop, since it isn't changed within the loop. This is one of the ways the JIT compiler optimizes the code.
So it's possible that the thread starts before ready = true is executed, reads ready = false, caches that value thread-locally, and never reads it again.
Check out the volatile keyword.
The reason is explained in the section following the one with the code sample.
3.1.1 Stale data
NoVisibility demonstrated one of the ways that insufficiently synchronized programs can cause surprising results: stale data. When the reader thread examines ready, it may see an out-of-date value. Unless synchronization is used every time a variable is accessed, it is possible to see a stale value for that variable.
The Java Memory Model allows the JVM to optimize field accesses as if the application were single-threaded, unless the field is marked as volatile or the accesses happen while a lock is held (the story actually gets a bit complicated with locks).
In the example you provided, the JVM could infer that the ready field is not modified within the current thread, so it could replace !ready with true, causing an infinite loop. Marking the field as volatile would cause the JVM to check the field value every time (or at least ensure that changes to ready propagate to the running thread).
The problem is rooted in the hardware -- each CPU has different behavior with respect to cache coherence, memory visibility, and reordering of operations. Java is in better shape here than C++ because it defines a cross-platform memory model that all programmers can count on. When Java runs on a system whose memory model is weaker than that required by the Java Memory Model, the JVM has to make up the difference.
Languages like C historically "inherited" the memory model of the underlying hardware. C++ has since gained a formal memory model of its own (as of C++11), so that C++ programs can mean the same thing on different platforms.
private static boolean ready;
private static int number;
The way the memory model can work is that each thread could be reading and writing to its own copy of these variables (the problem affects non-static member variables too). This is a consequence of the way the underlying architecture can work.
Jeremy Manson and Brian Goetz:
In multiprocessor systems, processors generally have one or more layers of memory cache, which improves performance both by speeding access to data (because the data is closer to the processor) and reducing traffic on the shared memory bus (because many memory operations can be satisfied by local caches.) Memory caches can improve performance tremendously, but they present a host of new challenges. What, for example, happens when two processors examine the same memory location at the same time? Under what conditions will they see the same value?
So, in your example, the two threads might run on different processors, each with a copy of ready in their own, separate caches. The Java language provides the volatile and synchronized mechanisms for ensuring that the values seen by the threads are in sync.
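The synchronized route mentioned above can be sketched as follows. The names SynchronizedVisibility, LOCK, publish, poll and runOnce are hypothetical; the point is only that both the write and the read go through the same lock.

```java
class SynchronizedVisibility {
    private static final Object LOCK = new Object();
    private static boolean ready; // guarded by LOCK
    private static int number;    // guarded by LOCK

    static void publish(int value) {
        synchronized (LOCK) {
            number = value;
            ready = true;
        }
    }

    // Returns the published number, or null while nothing has been published.
    static Integer poll() {
        synchronized (LOCK) {
            return ready ? Integer.valueOf(number) : null;
        }
    }

    // Runs one reader against one publish call and returns what the reader saw.
    static int runOnce() {
        synchronized (LOCK) {
            ready = false;
            number = 0;
        }
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Integer value;
            // Each poll acquires LOCK, so it sees everything written by the
            // thread that previously released LOCK.
            while ((value = poll()) == null) {
                Thread.yield();
            }
            seen[0] = value;
        });
        reader.start();
        publish(42);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Because ready and number are read together under the lock, the reader can never observe ready as true alongside a stale number.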
public class NoVisibility {
private static boolean ready = false;
private static int number;
private static class ReaderThread extends Thread {
@Override
public void run() {
while (!ready) {
Thread.yield();
}
System.out.println(number);
}
}
public static void main(String[] args) throws InterruptedException {
new ReaderThread().start();
number = 42;
Thread.sleep(20000);
ready = true;
}
}
Add a Thread.sleep() call of 20 seconds as shown. The JIT compiler is likely to kick in during those 20 seconds and optimize the check, either caching the value or removing the condition altogether, and so the code will fail on visibility.
To stop that from happening, you must make ready volatile (or use another form of synchronization).

When to use volatile and synchronized

I know there are many questions about this, but I still don't quite understand. I know what both of these keywords do, but I can't determine which to use in certain scenarios. Here are a couple of examples that I'm trying to determine which is the best to use.
Example 1:
import java.net.ServerSocket;
public class Something extends Thread {
private ServerSocket serverSocket;
public void run() {
while (true) {
if (serverSocket.isClosed()) {
...
} else { //Should this block use synchronized (serverSocket)?
//Do stuff with serverSocket
}
}
}
public ServerSocket getServerSocket() {
return serverSocket;
}
}
public class SomethingElse {
Something something = new Something();
public void doSomething() {
something.getServerSocket().close();
}
}
Example 2:
public class Server {
private int port;//Should it be volatile or the threads accessing it use synchronized (server)?
//getPort() and setPort(int) are accessed from multiple threads
public int getPort() {
return port;
}
public void setPort(int port) {
this.port = port;
}
}
Any help is greatly appreciated.
A simple answer is as follows:
synchronized can always be used to give you a thread-safe / correct solution,
volatile will probably be faster, but can only be used to give you a thread-safe / correct solution in limited situations.
If in doubt, use synchronized. Correctness is more important than performance.
Characterizing the situations under which volatile can be used safely involves determining whether each update operation can be performed as a single atomic update to a single volatile variable. If the operation involves accessing other (non-final) state or updating more than one shared variable, it cannot be done safely with just volatile. You also need to remember that:
updates to a non-volatile long or double may not be atomic, and
Java operators like ++ and += are not atomic.
Terminology: an operation is "atomic" if the operation either happens entirely, or it does not happen at all. The term "indivisible" is a synonym.
When we talk about atomicity, we usually mean atomicity from the perspective of an outside observer; e.g. a different thread to the one that is performing the operation. For instance, ++ is not atomic from the perspective of another thread, because that thread may be able to observe the state of the field being incremented in the middle of the operation. Indeed, if the field is a long or a double, it may even be possible to observe a state that is neither the initial state nor the final state!
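Applied to Example 2 from the question, the rule above suggests that volatile is enough, since every operation on port is a single read or a single write of one int. A minimal sketch, assuming no compound operation on port is ever added (the class name VolatilePortServer is made up for this illustration):

```java
class VolatilePortServer {
    // Each access is one atomic read or one atomic write of a single int,
    // so volatile covers both atomicity and visibility here.
    private volatile int port;

    int getPort() {
        return port;
    }

    void setPort(int port) {
        this.port = port;
    }

    // A method like the following would NOT be safe with volatile alone,
    // because ++ is a read followed by a write:
    // void nextPort() { port++; }

    public static void main(String[] args) throws InterruptedException {
        VolatilePortServer server = new VolatilePortServer();
        Thread configurer = new Thread(() -> server.setPort(8080));
        configurer.start();
        // With a volatile field this loop cannot be reduced to re-reading a
        // cached copy forever.
        while (server.getPort() == 0) {
            Thread.yield();
        }
        System.out.println(server.getPort());
        configurer.join();
    }
}
```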
The synchronized keyword
synchronized indicates that a variable will be shared among several threads. It's used to ensure consistency by "locking" access to the variable, so that one thread can't modify it while another is using it.
Classic Example: updating a global variable that indicates the current time
The incrementSeconds() function must be able to complete uninterrupted because, as it runs, it creates temporary inconsistencies in the value of the global variable time. Without synchronization, another function might see a time of "12:60:00" or, at the comment marked with >>>, it would see "11:00:00" when the time is really "12:00:00" because the hours haven't incremented yet.
void incrementSeconds() {
    if (++time.seconds > 59) { // time might be 1:00:60
        time.seconds = 0; // time is invalid here: minutes are wrong
        if (++time.minutes > 59) { // time might be 1:60:00
            time.minutes = 0; // >>> time is invalid here: hours are wrong
            if (++time.hours > 23) { // time might be 24:00:00
                time.hours = 0;
            }
        }
    }
}
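In Java, one way to keep other threads from seeing those intermediate states is to make both the update and the read synchronized on the same object. The sketch below assumes a small Clock class invented for this purpose; it is not part of the original example.

```java
import java.util.Locale;

class Clock {
    private int hours;   // all three fields are guarded by "this"
    private int minutes;
    private int seconds;

    Clock(int hours, int minutes, int seconds) {
        this.hours = hours;
        this.minutes = minutes;
        this.seconds = seconds;
    }

    // The whole carry chain runs while holding the lock, so no other
    // synchronized method can run in the middle of it.
    synchronized void incrementSeconds() {
        if (++seconds > 59) {
            seconds = 0;
            if (++minutes > 59) {
                minutes = 0;
                if (++hours > 23) {
                    hours = 0;
                }
            }
        }
    }

    // Readers take the same lock, so they only ever see a consistent time.
    synchronized String read() {
        return String.format(Locale.ROOT, "%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        Clock clock = new Clock(11, 59, 59);
        clock.incrementSeconds();
        System.out.println(clock.read());
    }
}
```

Synchronizing only the writer would not be enough: an unsynchronized reader could still run between two of the field updates.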
The volatile keyword
volatile simply tells the compiler not to make assumptions about the constant-ness of a variable, because it may change when the compiler wouldn't normally expect it. For example, the software in a digital thermostat might have a variable that indicates the temperature, and whose value is updated directly by the hardware. It may change in places that a normal variable wouldn't.
If degreesCelsius is not declared to be volatile, the compiler is free to optimize this:
void controlHeater() {
while ((degreesCelsius * 9.0/5.0 + 32) < COMFY_TEMP_IN_FAHRENHEIT) {
setHeater(ON);
sleep(10);
}
}
into this:
void controlHeater() {
float tempInFahrenheit = degreesCelsius * 9.0/5.0 + 32;
while (tempInFahrenheit < COMFY_TEMP_IN_FAHRENHEIT) {
setHeater(ON);
sleep(10);
}
}
By declaring degreesCelsius to be volatile, you're telling the compiler that it has to check its value each time it runs through the loop.
Summary
In short, synchronized lets you control access to a variable, so you can guarantee that updates are atomic (that is, a set of changes will be applied as a unit; no other thread can access the variable when it's half-updated). You can use it to ensure consistency of your data. On the other hand, volatile is an admission that the contents of a variable are beyond your control, so the code must assume it can change at any time.
There is insufficient information in your post to determine what is going on, which is why all the advice you are getting is general information about volatile and synchronized.
So, here's my general advice:
During the cycle of writing-compiling-running a program, there are two optimization points:
at compile time, when the compiler might try to reorder instructions or optimize data caching.
at runtime, when the CPU has its own optimizations, like caching and out-of-order execution.
All this means that instructions will most likely not execute in the order that you wrote them, regardless of whether this order must be maintained to ensure program correctness in a multithreaded environment. A classic example you will often find in the literature is this:
class ThreadTask implements Runnable {
private boolean stop = false;
private boolean work;
public void run() {
while(!stop) {
work = !work; // simulate some work
}
}
public void stopWork() {
stop = true; // signal thread to stop
}
public static void main(String[] args) throws InterruptedException {
ThreadTask task = new ThreadTask();
Thread t = new Thread(task);
t.start();
Thread.sleep(1000);
task.stopWork();
t.join();
}
}
Depending on compiler optimizations and CPU architecture, the above code may never terminate on a multi-processor system. This is because the value of stop may be cached in a register of the CPU running thread t, such that the thread will never again read the value from main memory, even though the main thread has updated it in the meantime.
To combat this kind of situation, memory fences were introduced. These are special instructions that do not allow regular instructions before the fence to be reordered with instructions after the fence. One such mechanism is the volatile keyword. Variables marked volatile are not optimized by the compiler/CPU and will always be written/read directly to/from main memory. In short, volatile ensures visibility of a variable's value across CPU cores.
Visibility is important, but should not be confused with atomicity. Two threads incrementing the same shared variable may produce inconsistent results even though the variable is declared volatile. This is because an increment is actually a sequence of separate read, add and write steps that can be interleaved with another thread's steps at any point. For such cases, critical sections such as synchronized blocks need to be used. This means that only a single thread at a time can execute the code enclosed in the synchronized block. Another common use of critical sections is atomic updates to a shared collection, since iterating over a collection while another thread is adding/removing items will usually cause an exception to be thrown.
Finally two interesting points:
synchronized and a few other constructs such as Thread.join will introduce memory fences implicitly. Hence, incrementing a variable inside a synchronized block does not require the variable to also be volatile, assuming that's the only place it's being read/written.
For simple updates such as value swap, increment and decrement, you can use non-blocking atomic methods like the ones found in AtomicInteger, AtomicLong, etc. These are often faster than synchronized because they take no lock: a thread that loses a race retries instead of blocking, so no context switch is triggered. They also introduce memory fences when used.
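A short sketch of the atomic-class approach: several threads increment one AtomicInteger without any lock. The class name AtomicCounterDemo and the method countWith are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Starts the given number of threads, lets each one increment a shared
    // counter incrementsPerThread times, and returns the final count.
    static int countWith(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write, no lock
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000));
    }
}
```

With a plain or volatile int and counter++ in place of incrementAndGet, some increments could be lost, and the total could come out lower than expected.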
Note: In your first example, the field serverSocket is actually never initialized in the code you show.
Regarding synchronization, it depends on whether or not the ServerSocket class is thread safe. (I assume it is, but I have never used it.) If it is, you don't need to synchronize around it.
In the second example, int variables can be atomically updated so volatile may suffice.
volatile solves the "visibility" problem across CPU cores: a value held in local registers is flushed and synced with RAM. However, if we need a consistent value and atomic operations, we need a mechanism to protect the critical data. That can be achieved with either a synchronized block or an explicit lock.

Do I need to use volatile if the writer and reader threads will never be alive at the same time

By referring to http://www.javamex.com/tutorials/synchronization_volatile.shtml, I am not sure whether I need to use volatile keyword in the following case, due to additional rule 3.
A primitive static variable will be written by Thread A.
The same primitive static variable will be read by Thread B.
Thread B will only run after Thread A is "dead". ("Dead" means that the last statement of Thread A's void run has finished.)
Will the new value written by Thread A always be committed to main memory after it is "dead"? If yes, does that mean I don't need the volatile keyword when the above 3 conditions are met?
I doubt that volatile is required in this case. If it were required, ArrayList would be broken: one thread may perform an insert and update the size member variable, and later another thread (not concurrently) may read the ArrayList's size. If you look at the ArrayList source code, size is not declared volatile.
The JavaDoc of ArrayList only mentions that it is not safe for multiple threads to access an ArrayList instance concurrently; it says nothing about multiple threads accessing an ArrayList instance at different times.
Let me use the following code to illustrate this problem:
public static void main(String[] args) throws InterruptedException {
    // Create and start the writer thread
    final ArrayList<String> list = new ArrayList<String>();
    Thread writeThread = new Thread(new Runnable() {
        public void run() {
            list.add("hello");
        }
    });
    writeThread.start(); // without start(), the thread never runs
    writeThread.join();
    Thread readThread = new Thread(new Runnable() {
        public void run() {
            // Does it guarantee that list.size will always return 1, as this list
            // is manipulated by different thread?
            // Take note that, within implementation of ArrayList, member
            // variable size is not marked as volatile.
            assert(1 == list.size());
        }
    });
    readThread.start();
    readThread.join();
}
Yes, you still need to use volatile (or some other form of synchronization).
The reason why is that the two threads could run on different processors and even if one thread has long finished before the other starts there is no guarantee that the second thread will get the freshest value when it makes the read. If the field is not marked as volatile and no other synchronization is used, then the second thread could get a value that was cached locally on the processor it is running on. That cached value could in theory be out-of-date for a long period of time, including after the first thread completed.
If you use volatile the value will always be written to and read from main memory, bypassing the processor's cached value.
No, you may not need it. Although Mark Byers' answer is fairly accurate, it is limited. synchronized and volatile are not the only ways to correctly pass data between threads. There are other, less talked about "synchronization points". Specifically, thread start and thread end are synchronization points. However, the thread which starts Thread B must have recognized that Thread A has finished (e.g. by joining the thread or checking the thread's state). If this is the case, the variable does not need to be volatile.
Possibly yes, unless you manually create a memory barrier. If A sets the variable and B decides to take it from some register, you have a problem. So you need a memory barrier, either implicit (lock, volatile) or explicit.
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.4
The final action in a thread T1 synchronizes-with any action in another thread T2 that detects that T1 has terminated. T2 may accomplish this by calling T1.isAlive() or T1.join().
So it is possible to achieve your goal without using volatile.
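The quoted rule can be sketched with the ArrayList scenario from the question. The class name JoinHandoff and its helper methods are hypothetical; the list is an ordinary ArrayList and nothing is volatile.

```java
import java.util.ArrayList;
import java.util.List;

class JoinHandoff {
    // The writer thread finishes before the reader thread is started,
    // and the starting thread detects that through join().
    static int sizeSeenByReader() {
        final List<String> list = new ArrayList<>(); // not thread-safe, no volatile
        final int[] seen = new int[1];

        Thread writeThread = new Thread(() -> list.add("hello"));
        writeThread.start();
        join(writeThread); // returning from join() synchronizes-with the writer's final action

        Thread readThread = new Thread(() -> seen[0] = list.size());
        readThread.start(); // start() happens-before every action of the new thread
        join(readThread);   // makes seen[0] visible to the calling thread
        return seen[0];
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenByReader());
    }
}
```

The chain matters: the writer's add is ordered before the join, the join before the start of the reader, and the start before the reader's call to size. Without the join, nothing would order the two threads.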
In many cases where there are apparent time dependencies, synchronization is already being done by someone under the hood, and the application doesn't need extra synchronization. Unfortunately this is not a rule; programmers must analyze each case carefully.
One example is a Swing worker thread. People do some calculation in a worker thread, save the result to a variable, then raise an event. The event thread then reads the result of the calculation from the variable. No explicit synchronization is needed in application code, because "raising an event" already performs synchronization, so writes from the worker thread are visible to the event thread.
On one hand, this is a blessing. On the other hand, many people don't understand this; they omit the synchronization simply because they never thought about the issue. Their programs happen to be correct... this time.
If Thread A definitely dies before Thread B starts reading then it would be possible to avoid using volatile
eg.
public class MyClass {
static volatile int x = 0; // static, so that main and MyClass.x below can refer to it
public static void main(String[] args) {
final int i = x;
new Thread() {
int j = i;
public void run() {
j = 10;
final int k = j;
new Thread() {
public void run() {
MyClass.x = k;
}
}.start();
}
}.start();
}
}
However, the problem is that whichever thread starts Thread B will need to know that the value Thread A is writing to has changed, and must not use its own cached version. The easiest way to do this is to get Thread A to spawn Thread B. But if Thread A has nothing else to do when it spawns Thread B then this seems a little pointless (why not just use the same thread?).
The other alternative is that, if no other thread depends on this variable, Thread A could initialise a local variable from the volatile variable, do what it needs to do, and then finally write the contents of its local variable back to the volatile variable. Then when Thread B starts it initialises its local variable from the volatile variable and reads only from its local variable thereafter. This should massively reduce the amount of time spent keeping the volatile variable in sync. If this solution seems unacceptable (because of other threads writing to the volatile variable or whatever) then you definitely need to declare the variable volatile.

Concurrency, object visibility

I'm trying to figure out if the code below suffers from any potential concurrency issues. Specifically, the issue of visibility related to volatile variables. Volatile is defined as: The value of this variable will never be cached thread-locally: all reads and writes will go straight to "main memory"
public static void main(String [] args)
{
Test test = new Test();
// This will always be single threaded
ExecutorService ex = Executors.newSingleThreadExecutor();
for (int i=0; i<10; ++i)
ex.execute(test);
}
private static class Test implements Runnable {
// non volatile variable in question
private int state = 0;
@Override
public void run() {
// will we always see updated state value? Will updating state value
// guarantee future run's see the value?
if (this.state != -1)
this.state++;
}
}
For the above single threaded executor:
Is it okay to make test.state non-volatile? In other words, will every successive Test.run() (which will occur sequentially and not concurrently, because again the executor is single threaded) always see the updated test.state value? If not, doesn't exiting Test.run() ensure that any changes made thread-locally get written back to main memory? Otherwise, when do changes made thread-locally get written back to main memory, if not upon exit of the thread?
As long as it's only a single thread there is no need to make it volatile. If you're going to use multiple threads, you should not only use volatile but synchronize too. Incrementing a number is not an atomic operation - that's a common misconception.
public void run() {
synchronized (this) {
if (this.state != -1)
this.state++;
}
}
Instead of using synchronization, you could also use AtomicInteger#getAndIncrement() (if you won't need an if before).
private AtomicInteger state = new AtomicInteger();
public void run() {
state.getAndIncrement();
}
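If the if check is needed after all, one option (assuming Java 8 or later) is AtomicInteger.updateAndGet, which applies the check and the increment as a single atomic update. The class name GuardedCounter and its constructor are made up for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

class GuardedCounter implements Runnable {
    private final AtomicInteger state;

    GuardedCounter(int initial) {
        this.state = new AtomicInteger(initial);
    }

    @Override
    public void run() {
        // Check and increment happen as one atomic update. The function may
        // be re-applied under contention, so it must be free of side effects.
        state.updateAndGet(s -> s != -1 ? s + 1 : s);
    }

    int get() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCounter counter = new GuardedCounter(0);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    counter.run();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(counter.get());
    }
}
```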
Originally, I was thinking this way:
If the task were always executed by the same thread, there would be no problem. But the Executor produced by newSingleThreadExecutor() may create new threads to replace those that are killed for any reason. There is no guarantee about when the replacement thread will be created or which thread will create it.
If a thread performs some writes, then calls start() on a new thread, those writes will be visible to the new thread. But there is no guarantee that that rule applies in this case.
But irreputable is right: creating a correct ExecutorService without sufficient barriers to ensure visibility is practically impossible. I was forgetting that detecting the death of another thread is a synchronizes-with relationship. The blocking mechanism used to idle worker threads would also require a barrier.
Yes it is safe, even if the executor replaced its thread in the middle. Thread start/terminate are also synchronization points.
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.4
A simple example:
static int state;
static public void main(String... args) {
state = 0; // (1)
Thread t = new Thread() {
public void run() {
state = state + 1; // (2)
}
};
t.start();
t.join();
System.out.println(state); // (3)
}
It is guaranteed that (1), (2), (3) are well ordered and behave as expected.
As for the single-thread executor ("Tasks are guaranteed to execute sequentially"): it must somehow detect that one task has finished before starting the next one, which necessarily synchronizes the different run() calls properly.
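A sketch of the executor case, using a non-volatile field as in the question. The outer class SingleThreadState and the method runTasks are invented for this example. Reading the result in the submitting thread relies on the memory consistency guarantee documented for ExecutorService: actions taken by a task happen-before its result is retrieved via Future.get().

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SingleThreadState {
    static class Test implements Runnable {
        private int state = 0; // deliberately not volatile

        @Override
        public void run() {
            if (state != -1) {
                state++;
            }
        }

        int getState() {
            return state;
        }
    }

    // Submits the same Test instance n times and returns the final state.
    static int runTasks(int n) {
        Test test = new Test();
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<?> last = null;
            for (int i = 0; i < n; i++) {
                last = ex.submit(test); // tasks run one after another
            }
            if (last != null) {
                last.get(); // wait for the last task before reading the field
            }
            return test.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            ex.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10));
    }
}
```

Reading test.getState() right after submitting, without waiting on the Future, would carry no such guarantee.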
Your code, specifically this bit
if (this.state != -1)
this.state++;
would require the atomic test of the state value and then an increment to the state in a concurrent context. So even if your variable was volatile and more than one thread was involved, you would have concurrency issues.
But your design is based on asserting that there will always only be one instance of Test, and, that single instance is only granted to a single (same) thread. (But note that the single instance is in fact a shared state between the main thread and the executor thread.)
I think you need to make these assumptions more explicit (in the code, for example, by using ThreadLocal and ThreadLocal.get()). This guards against future bugs (when some other developer may carelessly violate the design assumptions), and it guards against relying on the internal implementation of the Executor method you are using, which in some implementations may simply provide a sequential executor that does not necessarily use the same thread for each invocation of execute(runnable).
It is perfectly fine for state to be non-volatile in this specific code, because there is only one thread, and only that thread accesses the field. Disabling caching the value of this field within the only thread you have will just give a performance hit.
However if you wish to use the value of state in the main thread which is running the loop, you have to make the field volatile:
for (int i=0; i<10; ++i) {
ex.execute(test);
System.out.println(test.getState());
}
However, even this might not work correctly with volatile, because there is no synchronization between the threads.
Since the field is private, there is only an issue if the main thread executes a method that can access this field.
If your ExecutorService is single threaded then there is no shared state, so I don't see how there could be any issues around that.
However wouldn't it make more sense to pass a new instance of your Test class to each call to execute()? i.e.
for (int i=0; i<10; ++i)
ex.execute(new Test());
This way there will not be any shared state.
