Strange NullPointerException in multithreading program [duplicate] - java

When writing multithreaded applications, one of the most common problems experienced is race conditions.
My questions to the community are:
What is a race condition?
How do you detect them?
How do you handle them?
Finally, how do you prevent them from occurring?

A race condition occurs when two or more threads can access shared data and they try to change it at the same time. Because the thread scheduling algorithm can swap between threads at any time, you don't know the order in which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the thread scheduling algorithm, i.e. both threads are "racing" to access/change the data.
Problems often occur when one thread does a "check-then-act" (e.g. "check" if the value is X, then "act" to do something that depends on the value being X) and another thread does something to the value in between the "check" and the "act". E.g:
if (x == 5) // The "Check"
{
    y = x * 2; // The "Act"

    // If another thread changed x in between "if (x == 5)" and "y = x * 2" above,
    // y will not be equal to 10.
}
The point being, y could be 10, or it could be anything, depending on whether another thread changed x in between the check and act. You have no real way of knowing.
In order to prevent race conditions from occurring, you would typically put a lock around the shared data to ensure only one thread can access the data at a time. This would mean something like this:
// Obtain lock for x
if (x == 5)
{
    y = x * 2; // Now, nothing can change x until the lock is released.
               // Therefore y = 10
}
// Release lock for x
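In Java, for example, the lock could be a dedicated monitor object used with synchronized blocks. A minimal sketch, assuming shared fields x and y and a lock object invented for illustration:

class SharedState {
    private final Object lock = new Object();
    private int x = 5;
    private int y;

    void checkThenAct() {
        synchronized (lock) { // obtain lock for x
            if (x == 5) {     // the "check"
                y = x * 2;    // the "act": x cannot change while we hold the lock,
            }                 // so y is guaranteed to be 10
        }                     // lock released on exit
    }

    // Every writer of x must synchronize on the same lock, or the guarantee is void.
    void setX(int newX) {
        synchronized (lock) {
            x = newX;
        }
    }
}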

A "race condition" exists when multithreaded (or otherwise parallel) code that would access a shared resource could do so in such a way as to cause unexpected results.
Take this example:
for (int i = 0; i < 10000000; i++)
{
    x = x + 1;
}
If you had 5 threads executing this code at once, the value of x WOULD NOT end up being 50,000,000. It would in fact vary with each run.
This is because, in order for each thread to increment the value of x, they have to do the following: (simplified, obviously)
Retrieve the value of x
Add 1 to this value
Store this value to x
Any thread can be at any step in this process at any time, and they can step on each other when a shared resource is involved. The state of x can be changed by another thread during the time between x being read and being written back.
Let's say a thread retrieves the value of x, but hasn't stored it yet. Another thread can also retrieve the same value of x (because no thread has changed it yet) and then they would both be storing the same value (x+1) back in x!
Example:
Thread 1: reads x, value is 7
Thread 1: adds 1 to x, value is now 8
Thread 2: reads x, value is 7
Thread 1: stores 8 in x
Thread 2: adds 1 to x, value is now 8
Thread 2: stores 8 in x
Race conditions can be avoided by employing some sort of locking mechanism before the code that accesses the shared resource:
for (int i = 0; i < 10000000; i++)
{
    // lock x
    x = x + 1;
    // unlock x
}
Here, the answer comes out as 50,000,000 every time.
For more on locking, search for: mutex, semaphore, critical section, shared resource.
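As a concrete Java illustration, here is a minimal self-contained sketch (invented names, not code from the answer above): five threads increment a shared counter, and with the lock the answer is always 50,000,000, while without it updates get lost:

public class LostUpdateDemo {
    static int x = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[5];
        for (int t = 0; t < 5; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10000000; i++) {
                    synchronized (lock) { // remove this block to observe the race
                        x = x + 1;
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join(); // wait for all threads to finish
        }
        System.out.println(x); // with the lock: always 50000000
    }
}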

What is a Race Condition?
You are planning to go to a movie at 5 pm. You inquire about the availability of the tickets at 4 pm. The representative says that they are available. You relax and reach the ticket window 5 minutes before the show. I'm sure you can guess what happens: it's a full house. The problem here was in the duration between the check and the action. You inquired at 4 and acted at 5. In the meantime, someone else grabbed the tickets. That's a race condition - specifically a "check-then-act" scenario of race conditions.
How do you detect them?
Religious code review, multi-threaded unit tests. There is no shortcut. There are a few Eclipse plugins emerging for this, but nothing stable yet.
How do you handle and prevent them?
The best thing would be to create side-effect-free and stateless functions, and to use immutables as much as possible. But that is not always possible. So using java.util.concurrent.atomic, concurrent data structures, proper synchronization, and actor-based concurrency will help.
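As a small illustration of the java.util.concurrent.atomic approach, here is a hedged sketch (the TicketCounter class is invented, echoing the movie example above) that turns a racy check-then-act into a single atomic step:

import java.util.concurrent.atomic.AtomicInteger;

class TicketCounter {
    private final AtomicInteger tickets = new AtomicInteger(1);

    // Atomic check-then-act: the check and the act succeed or fail together.
    boolean bookOne() {
        for (;;) {
            int available = tickets.get();
            if (available == 0) {
                return false; // sold out
            }
            if (tickets.compareAndSet(available, available - 1)) {
                return true;  // we won the race for the last ticket
            }
            // Another thread changed the count between get() and the CAS; retry.
        }
    }
}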
The best resource for concurrency is JCIP. You can also get some more details on the above explanation here.

There is an important technical difference between race conditions and data races. Most answers seem to make the assumption that these terms are equivalent, but they are not.
A data race occurs when two instructions access the same memory location, at least one of these accesses is a write, and there is no happens-before ordering among these accesses. Now, what constitutes a happens-before ordering is subject to a lot of debate, but in general unlock-lock pairs on the same lock variable and wait-signal pairs on the same condition variable induce a happens-before order.
A race condition is a semantic error. It is a flaw that occurs in the timing or the ordering of events that leads to erroneous program behavior.
Many race conditions can be (and in fact are) caused by data races, but this is not necessary. As a matter of fact, data races and race conditions are neither necessary nor sufficient conditions for one another. This blog post also explains the difference very well, with a simple bank transaction example. Here is another simple example that explains the difference.
Now that we have nailed down the terminology, let us try to answer the original question.
Given that race conditions are semantic bugs, there is no general way of detecting them. This is because there is no way of having an automated oracle that can distinguish correct vs. incorrect program behavior in the general case. Race detection is an undecidable problem.
On the other hand, data races have a precise definition that does not necessarily relate to correctness, and therefore one can detect them. There are many flavors of data race detectors (static/dynamic data race detection, lockset-based data race detection, happens-before based data race detection, hybrid data race detection). A state-of-the-art dynamic data race detector is ThreadSanitizer, which works very well in practice.
Handling data races in general requires some programming discipline to induce happens-before edges between accesses to shared data (either during development, or once they are detected using the above-mentioned tools). This can be done through locks, condition variables, semaphores, etc. However, one can also employ different programming paradigms like message passing (instead of shared memory) that avoid data races by construction.
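To make the distinction concrete, here is a hedged Java sketch (all names invented for illustration) of a race condition without a data race: every individual map access is synchronized, yet the compound transfer is still broken:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class Accounts {
    // Each individual get/put is synchronized internally, so there is no data race...
    private final Map<String, Integer> balances =
            Collections.synchronizedMap(new HashMap<>());

    Accounts() {
        balances.put("alice", 100);
        balances.put("bob", 0);
    }

    // ...but this compound action is still a race condition: another transfer
    // can interleave between the reads and the writes, losing an update.
    void transfer(String from, String to, int amount) {
        int fromBalance = balances.get(from);     // synchronized read
        int toBalance = balances.get(to);         // synchronized read
        balances.put(from, fromBalance - amount); // synchronized write
        balances.put(to, toBalance + amount);     // synchronized write
    }
}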

A sort-of-canonical definition is "when two threads access the same location in memory at the same time, and at least one of the accesses is a write." In this situation the "reader" thread may get the old value or the new value, depending on which thread "wins the race." This is not always a bug: in fact, some really hairy low-level algorithms do this on purpose, but it should generally be avoided. @Steve Gury gives a good example of when it might be a problem.

A race condition is a situation in concurrent programming where two concurrent threads or processes compete for a resource, and the resulting final state depends on who gets the resource first.

A race condition is a kind of bug that happens only under certain temporal conditions.
Example:
Imagine you have two threads, A and B.
In Thread A:
if (object.a != 0)
    object.avg = total / object.a;
In Thread B:
object.a = 0
If thread A is preempted just after having checked that object.a is not zero, B will do a = 0, and when thread A gains the processor again, it will do a "divide by zero".
This bug only happens when thread A is preempted just after the if statement; it's very rare, but it can happen.
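A minimal sketch of a fix, assuming object is the instance shared by both threads: make the check-and-divide and the write mutually exclusive by having both threads lock the same monitor.

// Thread A: check and act under one lock
synchronized (object) {
    if (object.a != 0) {
        object.avg = total / object.a; // a cannot become 0 in here
    }
}

// Thread B: the write takes the same lock, so it can no longer
// interleave between A's check and A's division
synchronized (object) {
    object.a = 0;
}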

Many answers in this discussion explain what a race condition is. I'll try to provide an explanation of why this term is called a race condition in the software industry.
Why is it called race condition?
Race conditions are not only related to software; they are related to hardware too. Actually, the term was initially coined by the hardware industry.
According to wikipedia:
The term originates with the idea of two signals racing each other to influence the output first.
[Figure: race condition in a logic circuit]
The software industry took this term over without modification, which makes it a little bit difficult to understand.
You need to do some replacement to map it to the software world:
"two signals" ==> "two threads"/"two processes"
"influence the output" ==> "influence some shared state"
So a race condition in the software industry means "two threads"/"two processes" racing each other to "influence some shared state", where the final result of the shared state depends on some subtle timing difference, which could be caused by a specific thread/process launching order, thread/process scheduling, etc.

Race conditions occur in multi-threaded applications or multi-process systems. A race condition, at its most basic, is anything that makes the assumption that two things not in the same thread or process will happen in a particular order, without taking steps to ensure that they do. This happens commonly when two threads are passing messages by setting and checking member variables of a class both can access. There's almost always a race condition when one thread calls sleep to give another thread time to finish a task (unless that sleep is in a loop, with some checking mechanism).
Tools for preventing race conditions depend on the language and OS, but some common ones are mutexes, critical sections, and signals. Mutexes are good when you want to make sure you're the only one doing something. Signals are good when you want to make sure someone else has finished doing something. Minimizing shared resources can also help prevent unexpected behaviors.
Detecting race conditions can be difficult, but there are a couple signs. Code which relies heavily on sleeps is prone to race conditions, so first check for calls to sleep in the affected code. Adding particularly long sleeps can also be used for debugging to try and force a particular order of events. This can be useful for reproducing the behavior, seeing if you can make it disappear by changing the timing of things, and for testing solutions put in place. The sleeps should be removed after debugging.
The telltale sign of a race condition, though, is an issue that only occurs intermittently on some machines. Common bugs would be crashes and deadlocks. With logging, you should be able to find the affected area and work back from there.

Microsoft has actually published a really detailed article on the matter of race conditions and deadlocks. The most condensed summary from it would be the title paragraph:
A race condition occurs when two threads access a shared variable at the same time. The first thread reads the variable, and the second thread reads the same value from the variable. Then the first thread and second thread perform their operations on the value, and they race to see which thread can write the value last to the shared variable. The value of the thread that writes its value last is preserved, because the thread is writing over the value that the previous thread wrote.

What is a race condition?
The situation in which a process is critically dependent on the sequence or timing of other events.
For example: processor A and processor B both need the same resource for their execution.
How do you detect them?
There are tools to detect race condition automatically:
Lockset-Based Race Checker
Happens-Before Race Detection
Hybrid Race Detection
How do you handle them?
Race conditions can be handled with mutexes or semaphores. They act as a lock that allows a process to acquire a resource based on certain requirements, thereby preventing the race condition.
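In Java, for instance, a binary java.util.concurrent.Semaphore can play the role of such a lock. A minimal sketch (the GuardedResource class and its field are invented for illustration):

import java.util.concurrent.Semaphore;

class GuardedResource {
    private final Semaphore permit = new Semaphore(1); // binary semaphore = mutex
    private int sharedState = 0;

    void update() throws InterruptedException {
        permit.acquire();     // blocks until the single permit is free
        try {
            sharedState++;    // critical section: at most one thread in here
        } finally {
            permit.release(); // always give the permit back
        }
    }
}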
How do you prevent them from occurring?
There are various ways to prevent race conditions; the classical one is the correct use of critical sections, which must satisfy four conditions:
No two processes are simultaneously inside their critical regions. (Mutual exclusion)
No assumptions are made about speeds or the number of CPUs.
No process running outside its critical region blocks other processes.
No process has to wait forever to enter its critical region. (This rules out circular waits: A waits for B's resources, B waits for C's resources, C waits for A's resources.)

You can prevent race conditions by using the "Atomic" classes. The reason is that the thread does not separate the get and set operations; an example is below:
AtomicInteger ai = new AtomicInteger(2);
ai.getAndAdd(5);
As a result, you will have 7 in ai.
Although two actions were involved (get and add), they are performed as one atomic operation, and no other thread can interfere with it. That means no race conditions!
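To see this under real contention, here is a small self-contained sketch (invented for illustration, not part of the answer) in which ten threads share one AtomicInteger and the total always comes out right:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger ai = new AtomicInteger(2);
        Thread[] threads = new Thread[10];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> ai.getAndAdd(5)); // one indivisible step
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(ai.get()); // always 52 (2 + 10 * 5): no lost updates
    }
}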

I made a video that explains this.
Essentially, it is when you have state which is shared across multiple threads, and before the first execution on a given state is completed, another execution starts, so the new thread's initial state for the given operation is wrong because the previous execution has not completed.
Because the initial state of the second execution is wrong, the resulting computation is also wrong, and eventually the second execution will update the final state with the wrong result.
You can view it here.
https://youtu.be/RWRicNoWKOY

Here is the classical bank account balance example, which will help newbies understand threads in Java easily with respect to race conditions:
public class BankAccount {

    int accountNumber;
    double accountBalance;

    public synchronized boolean Deposit(double amount) {
        double newAccountBalance = 0;
        if (amount <= 0) {
            return false;
        } else {
            newAccountBalance = accountBalance + amount;
            accountBalance = newAccountBalance;
            return true;
        }
    }

    public synchronized boolean Withdraw(double amount) {
        double newAccountBalance = 0;
        if (amount > accountBalance) {
            return false;
        } else {
            newAccountBalance = accountBalance - amount;
            accountBalance = newAccountBalance;
            return true;
        }
    }

    public static void main(String[] args) {
        BankAccount b = new BankAccount();
        b.accountBalance = 2000;
        System.out.println(b.Withdraw(3000));
    }
}

Try this basic example for a better understanding of race conditions:
public class ThreadRaceCondition {

    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        Account myAccount = new Account(22222222);

        // Expected deposit: 250
        for (int i = 0; i < 50; i++) {
            Transaction t = new Transaction(myAccount,
                    Transaction.TransactionType.DEPOSIT, 5.00);
            t.start();
        }

        // Expected withdrawal: 50
        for (int i = 0; i < 50; i++) {
            Transaction t = new Transaction(myAccount,
                    Transaction.TransactionType.WITHDRAW, 1.00);
            t.start();
        }

        // Temporary sleep to ensure all threads are completed. Don't use in
        // the real world :-)
        Thread.sleep(1000);

        // Expected account balance is 200
        System.out.println("Final Account Balance: "
                + myAccount.getAccountBalance());
    }
}
class Transaction extends Thread {
    public static enum TransactionType {
        DEPOSIT(1), WITHDRAW(2);

        private int value;

        private TransactionType(int value) {
            this.value = value;
        }

        public int getValue() {
            return value;
        }
    };

    private TransactionType transactionType;
    private Account account;
    private double amount;

    /*
     * If transactionType == 1, deposit; else if transactionType == 2, withdraw
     */
    public Transaction(Account account, TransactionType transactionType,
            double amount) {
        this.transactionType = transactionType;
        this.account = account;
        this.amount = amount;
    }

    public void run() {
        switch (this.transactionType) {
        case DEPOSIT:
            deposit();
            printBalance();
            break;
        case WITHDRAW:
            withdraw();
            printBalance();
            break;
        default:
            System.out.println("NOT A VALID TRANSACTION");
        }
    }

    public void deposit() {
        this.account.deposit(this.amount);
    }

    public void withdraw() {
        this.account.withdraw(amount);
    }

    public void printBalance() {
        System.out.println(Thread.currentThread().getName()
                + " : TransactionType: " + this.transactionType + ", Amount: "
                + this.amount);
        System.out.println("Account Balance: "
                + this.account.getAccountBalance());
    }
}
class Account {
    private int accountNumber;
    private double accountBalance;

    public int getAccountNumber() {
        return accountNumber;
    }

    public double getAccountBalance() {
        return accountBalance;
    }

    public Account(int accountNumber) {
        this.accountNumber = accountNumber;
    }

    // If this method is not synchronized, you will see a race condition.
    // Remove the synchronized keyword to see it happen.
    public synchronized boolean deposit(double amount) {
        if (amount < 0) {
            return false;
        } else {
            accountBalance = accountBalance + amount;
            return true;
        }
    }

    // If this method is not synchronized, you will see a race condition.
    // Remove the synchronized keyword to see it happen.
    public synchronized boolean withdraw(double amount) {
        if (amount > accountBalance) {
            return false;
        } else {
            accountBalance = accountBalance - amount;
            return true;
        }
    }
}

You don't always want to get rid of a race condition. If you have a flag which can be read and written by multiple threads, and this flag is set to 'done' by one thread so that the other threads stop processing when the flag is set to 'done', then you don't want that "race condition" to be eliminated. In fact, this one can be referred to as a benign race condition.
However, a race-condition detection tool will spot it as a harmful race condition.
More details on race conditions are here: http://msdn.microsoft.com/en-us/magazine/cc546569.aspx.

Consider an operation which has to display the count as soon as the count gets incremented, i.e., as soon as CounterThread increments the value, DisplayThread needs to display the recently updated value.
int i = 0;
Output
CounterThread -> i = 1
DisplayThread -> i = 1
CounterThread -> i = 2
CounterThread -> i = 3
CounterThread -> i = 4
DisplayThread -> i = 4
Here, CounterThread gets the lock frequently and updates the value before DisplayThread displays it. A race condition exists here. It can be solved by using synchronization, as in the sketch below.
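One hedged way to enforce that alternation is a shared monitor with synchronized, wait() and notifyAll(); the following sketch uses invented names and is not code from the answer:

class Counter {
    private int i = 0;
    private boolean displayed = true; // start by allowing an increment

    synchronized void increment() throws InterruptedException {
        while (!displayed) {
            wait();          // wait until the last value has been displayed
        }
        i++;
        System.out.println("CounterThread -> i = " + i);
        displayed = false;
        notifyAll();         // wake the display thread
    }

    synchronized void display() throws InterruptedException {
        while (displayed) {
            wait();          // wait for a fresh value
        }
        System.out.println("DisplayThread -> i = " + i);
        displayed = true;
        notifyAll();         // let the counter continue
    }
}

With this, DisplayThread prints every value exactly once, because increment() blocks until the previous value has been displayed.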

A race condition is an undesirable situation that occurs when two or more processes can access and change shared data at the same time. It occurs because of conflicting accesses to a resource. The critical-section problem may cause a race condition. To solve it, we must ensure that only one process at a time executes the critical section.

Related

Thread safety in java multithreading

I found code about thread safety, but it doesn't have any explanation from the person who gave the example. I would like to understand why, if I don't use the synchronized block around count, the count value will be non-atomic (always getting 200 is the desired result). Thanks.
public class Example {
    private static int count = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        Thread.sleep(10);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    for (int i = 0; i < 100; i++) {
                        // add synchronized
                        synchronized (Example.class) {
                            count++;
                        }
                    }
                }
            }).start();
        }
        try {
            Thread.sleep(2000);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(count);
    }
}
++ is not atomic
The count++ operation is not atomic. That means it is not a single solitary operation. The ++ is actually three operations: load, increment, store.
First the value stored in the variable is loaded (copied) into a register in the CPU core.
Second, that value in the core’s register is incremented.
Third and last, the new incremented value is written (copied) from the core’s register back to the variable’s content in memory. The core’s register is then free to be assigned other values for other work.
It is entirely possible for two or more threads to read the same value for the variable, say 42. Each of those threads would then proceed to increment the value to the same new value 43. They would then each write back 43 to that same variable, unwittingly storing 43 again and again repeatedly.
Adding synchronized eliminates this race condition. When the first thread gets the lock, the second and third threads must wait. So the first thread is guaranteed to be able to read, increment, and write the new value alone, going from 42 to 43. Once completed, the synchronized block exits, thereby releasing the lock. The second thread vying for the lock gets the go-ahead, acquiring the lock, and is able to read, increment, and write the new value 44 without interference. And so on, thread-safe.
Another problem: Visibility
However, this code is still broken.
This code has a visibility problem, with various threads possibly reading stale values kept in caches. But that is another topic. Search to learn more about the volatile keyword, the AtomicInteger class, and the Java Memory Model.
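For illustration, here is a hedged sketch of the same counter rewritten with AtomicInteger and join() (the join-based structure is my assumption, not the original code); it addresses both the atomicity and the visibility problems:

import java.util.concurrent.atomic.AtomicInteger;

public class Example {
    private static final AtomicInteger count = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[2];
        for (int t = 0; t < 2; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    count.incrementAndGet(); // atomic load-increment-store
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also guarantees visibility of the final value
        }
        System.out.println(count.get()); // always 200
    }
}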
I would like to understand why, if I don't use the synchronized block around count, the count value will be non-atomic.
The short answer: Because the JLS says so!
If you don't use synchronized (or volatile or something similar) then the Java Language Specification (JLS) does not guarantee that the main thread will see the values written to count by the child thread.
This is specified in great detail in the Java Memory Model section of the JLS. But the specification is very technical.
The simplified version is that a read of a variable is not guaranteed to see the value written by a preceding write if there is no happens-before (HB) relationship connecting the write and the read. Then there are a bunch of rules that say when an HB relationship exists. One of the rules is that there is an HB relationship between one thread releasing a mutex and a different thread subsequently acquiring it.
An alternative intuitive (but incomplete and technically inaccurate) explanation is that the latest value of count may be cached in a register or in a chipset's memory caches. The synchronized construct flushes values to memory.
The reason that this is an inaccurate explanation is that the JLS doesn't say anything about registers, caches and so on. Rather, the memory visibility guarantees that the JLS specifies are typically implemented by a Java compiler inserting instructions to write registers to memory, flush caches, or whatever is required by the hardware platform.
The other thing to note is that this is not really about count++ being atomic or not [1]. It is about whether the result of a change to count is visible to a different thread.
[1] It isn't atomic, but you would get the same effect for an atomic operation like a simple assignment!
Let's get back to the basics with a Wall Street example.
Let's say you (let's call you T1) and your friend (let's call him T2) decided to meet at a coffee house on Wall Street. You both started at the same time, say from the southern end of Wall Street (though you are not walking together). You are walking on one side of the footpath and your friend is walking on the other side, and you are both heading north (the direction is the same).
Now, let's say you come upon a coffee house and think this is the one where you and your friend decided to meet, so you step inside, order a cold coffee and start sipping it while you wait.
But on the other side of the road a similar thing happened: your friend came across a coffee shop, ordered a hot chocolate and was waiting for you.
After a while, you both decided the other one was not going to come and dropped the plan to meet.
You both missed your destination and wasted your time. Why did this happen? Needless to say: because you did not decide on the exact venue.
The code
synchronized (Example.class) {
    counter++;
}
solves the problem that you and your friend just encountered.
In technical terms, the operation counter++ is actually conducted in three steps:
Step 1: Read the value of counter (let's say 1).
Step 2: Add 1 to the value of the counter variable.
Step 3: Write the value of the counter variable back to memory.
If two threads work simultaneously on the counter variable, the final value of the counter will be uncertain. For example, thread1 could read the value of counter as 1, and at the same time thread2 could read the value as 1. Both threads then end up incrementing the value of counter to 2. This is called a race condition.
To avoid this issue, the operation counter++ has to be atomic. To make it atomic, you need to synchronize the execution of the threads. Each thread should modify the counter in an organized manner.
I suggest you read the book Java Concurrency in Practice; every developer should read this book.

Does reading a volatile variable affect the values of other non-volatile variables in the thread cache?

Why can the following code stop thread 1? Does reading a volatile variable affect the values of other non-volatile variables in a thread's cache? When thread 1 reads s, will it also reload the new value of run?
public class Vol {
    boolean run = true;
    volatile int s = 1;

    public static void main(String[] args) throws InterruptedException {
        Vol v = new Vol();

        // thread 1
        new Thread(() -> {
            while (v.run) {
                // with this, thread 1 can be shut down
                int a = v.s;
            }
        }).start();

        // thread 2
        new Thread(() -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            v.run = false;
            System.out.println("set run false");
        }).start();
    }
}
The Java Memory Model (JMM) defines a few guarantees. Beyond these guarantees, VMs are free to do whatever they want. So, usually, the answer to a question of the form: "Is it possible that caches are flushed here?" is yes, because you should be asking more absolutist questions instead: "Is it guaranteed that caches are flushed here?" is a much more interesting question.
The JMM is based around the notion of 'Happens-Before/Happens-After'. The JMM defines specific scenarios where the JMM will guarantee that code will execute such that what you observe matches the idea that a certain line happened before some other line. For all lines of java code where no HB/HA relationship is established, the JVM may act such that you observe that one ran before the other, or that one ran after another, or even a bizarre mix where parts are observable and other parts are not.
In other words, consider it like an evil coin: Without HB/HA, the JVM flips a coin for every interaction (a change made by one thread, can this be observed by another? Without HB/HA, the evil coin is flipped). It's evil in that the odds aren't 50/50, it's just to mess with you: During dev and that week-long test it worked out every time, and just when sales is giving that demo to the big customer, it fails.
There is no easy way to test that coin flips are even happening. Nevertheless, your task is to ensure the evil coin is never flipped. This is difficult, so you should be extremely careful when writing code where multiple threads are reading and writing to the same field.
volatile access does establish HB/HA relationships. However, it's hard to know in which direction.
Given this code:
// shared memory:
volatile int x = 0;
/* not volatile */ int y = 0;
// thread A:
y = 10;
x = 20;
// thread B:
if (x == 20 && y != 10) System.out.println("Augh!");
Then you are guaranteed: Augh! can never print. That's because if you read x as 20, the HB/HA relationship guarantees that the line in thread B runs after the second line of A, and everything that A did on that second line or before it is thus guaranteed to be observed. However, you have absolutely no guarantee that x is 20 here. Just, if it is, then the change made to y by thread A is also guaranteed observed, even though y isn't volatile.
However, in this code:
// shared state:
/* both non-volatile */ int x = 0, y = 0;
// thread A:
x = 10;
y = 20;
// thread B:
if (y == 20 && x != 10) System.out.println("AUGH!");
Then the JVM is entirely free and evil coin flips are happening! A JVM is free to print AUGH! every time, or never, or sometimes, or only if the moon is full. There is no HB/HA and therefore the VM makes no guarantees about observability and it doesn't have to make sense. It is acceptable for a JVM implementation to allow thread B to observe the change to y but not the change to x even though a simplified but mistaken notion that threads actually run code sequentially would suggest that this would be impossible. Basically, without HB/HA, all bets are off.
In your snippet, thread A reads the volatile variable s, but that is the only interaction with s there is; thread B never touches s.
Therefore, this code is broken and evil coin flips are occurring! The JVM is free to work as follows: thread 2 sets run to false today; thread 1 nevertheless runs for 3 more days, and then, seemingly at a completely arbitrary time prompted by nothing you're aware of, it all of a sudden finally observes that run is now false.
Or, thread 1 stops virtually immediately after thread 2 sets run to false.
It depends on your OS, the CPU, which song was playing in winamp, the phase of the moon, and whether the butterfly flapped its wings. There is no way to know and no (easy) way to test that you messed up here.
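For completeness, a hedged sketch of the fix: declaring run itself volatile establishes the HB/HA edge directly, so the read of s is no longer needed (this restructures the snippet above; the extra println in thread 1 is added for illustration):

public class Vol {
    volatile boolean run = true; // volatile write/read pairs induce happens-before

    public static void main(String[] args) throws InterruptedException {
        Vol v = new Vol();
        // thread 1
        new Thread(() -> {
            while (v.run) {
                // busy-wait; guaranteed to eventually observe run == false
            }
            System.out.println("thread 1 stopped");
        }).start();
        // thread 2
        new Thread(() -> {
            v.run = false; // this volatile write is visible to the reader above
            System.out.println("set run false");
        }).start();
    }
}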

is it guaranteed that threads will resume in the order they were suspended/blocked by wait() in Java?

Suppose my program uses three threads. The first two threads are blocked using wait(), then the third thread comes and resumes both of them. When the third thread frees the two threads, it creates a race condition between those two threads (please correct me if I am wrong). Here is a program I wrote to illustrate this:
class Callee {
    static boolean doBlock = true;

    void callMe(int index) throws InterruptedException {
        // suspend the first two threads
        if (index < 3 && doBlock) {
            wait();
        }
        System.out.println("The index is: " + index);
        // let the third thread resume both of them
        if (index == 3) {
            doBlock = false;
            notifyAll();
        }
    }
}

class Caller implements Runnable {
    final int threadIndex;
    final Thread thread;
    final Callee callee;

    Caller(int index, Callee c) {
        threadIndex = index;
        callee = c;
        thread = new Thread(this);
        thread.start();
    }

    @Override
    public void run() {
        synchronized (callee) {
            try {
                callee.callMe(threadIndex);
            } catch (InterruptedException ie) {
                ie.printStackTrace();
            }
        }
    }
}

public class App {
    public static void main(String[] args) throws InterruptedException {
        Callee c = new Callee();
        Caller caller1 = new Caller(1, c);
        Caller caller2 = new Caller(2, c);
        Caller caller3 = new Caller(3, c);
        caller1.thread.join();
        caller2.thread.join();
        caller3.thread.join();
    }
}
Each time I run the above program on my Windows machine, I get consistent output:
The index is: 3
The index is: 1
The index is: 2
Note that the first thread was freed before the second thread. Also note that I did not set priorities for any of those threads. I ran it at least ten times, but the results do not change. I'm curious: is it my OS, or does Java always resume the thread that was blocked first?
Short answer
No, there is no guarantee that they will resume in order, so you shouldn't build any logic based on that, even if the behavior has been validated multiple times by multiple people. But you had a great idea in asking instead of just assuming.
Long answer
What could be happening is that this is how threads behave on Windows specifically; if the threading is fully handled by the OS itself, it might even be a behavior specific to your particular version of Windows. In that case, because Java works on multiple OSes and each OS could behave differently, Java cannot guarantee a behavior across all environments and therefore makes no promises about it.
It could also just be a behavior of that specific version of the JVM, and Sun/Oracle never wanted to commit to a specific behavior. This means that even if this behavior were constant in the current JVM version, because it was never part of the formal "contract", they reserve the right to change it at any moment without prior notification.
In either case, what could happen if you decide to build logic on top of it is that the code simply wouldn't work properly on another OS or, even better, would stop working properly after an OS or JVM update (even a minor one).
An example of that happened at a company I worked for a few years back. It used to be that Oracle (the RDBMS) would automatically sort your results by your GROUP BY criteria if you didn't specify an ORDER BY (it was never part of the SQL standard, nor ever specified in any Oracle document; everyone just noticed it worked like that)... in their infinite wisdom, many people simply started skipping the ORDER BY clause when they used GROUP BY. Then came a new Oracle version (it might have been 9i or 10g) and they just stopped auto-sorting the results, which resulted in millions of $$$ being wasted going over ALL the applications to inspect the code and then re-doing tests (the tests were not automated, of course).
No, it is not guaranteed, unless you use a fair lock with a condition rather than wait/notify.
The lock you obtain by synchronized is not fair.
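For illustration, here is a hedged sketch of what "a fair lock with a condition" could look like; the FairGate class and its names are invented, not the asker's code:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class FairGate {
    // true = fair: waiting threads acquire the lock in FIFO order
    private final ReentrantLock lock = new ReentrantLock(true);
    private final Condition opened = lock.newCondition();
    private boolean open = false;

    void awaitOpen() throws InterruptedException {
        lock.lock();
        try {
            while (!open) {
                opened.await(); // releases the lock while waiting
            }
        } finally {
            lock.unlock();
        }
    }

    void open() {
        lock.lock();
        try {
            open = true;
            opened.signalAll(); // waiters re-acquire the fair lock in FIFO order
        } finally {
            lock.unlock();
        }
    }
}

With new ReentrantLock(true), lock handoff favors the longest-waiting thread, which is exactly the ordering property that wait()/notify() on an intrinsic lock does not promise.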

Why threads do not cache object locally?

I have a String and ThreadPoolExecutor that changes the value of this String. Just check out my sample:
String str_example = "";

// Assumed declaration (the original snippet omits it): the queue backing the pool.
BlockingQueue<Runnable> runnables = new LinkedBlockingQueue<>();
ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(10, 30, (long) 10,
        TimeUnit.SECONDS, runnables);

for (int i = 0; i < 80; i++) {
    poolExecutor.submit(new Runnable() {
        @Override
        public void run() {
            try {
                Thread.sleep((long) (Math.random() * 1000));
                String temp = str_example + "1";
                str_example = temp;
                System.out.println(str_example);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
}
So after executing this, I get something like this:
1
11
111
1111
11111
.......
So the question is: I expected a result like this only if my String object had the volatile modifier. But I get the same result with this modifier and without it.
There are several reasons why you see "correct" execution.
First, CPU designers do as much as they can so that our programs run correctly even in the presence of data races. Cache coherence deals with cache lines and tries to minimize possible conflicts. For example, only one CPU can write to a cache line at a given point in time. After the write is done, other CPUs must request that cache line to be able to write to it. Not to mention that the x86 architecture (the one you most probably use) is quite strict compared to others.
Second, your program is slow and the threads sleep for random periods of time, so they do almost all of their work at different points in time.
How could you achieve inconsistent behavior? Try something with a for loop without any sleep. In that case the field value will most probably be cached in CPU registers and some updates will not be visible.
P.S. Updates of the field str_example are not atomic, so your program could produce the same string values even in the presence of the volatile keyword.
When you talk about concepts like thread caching, you're talking about the properties of a hypothetical machine that Java might be implemented on. The logic is something like "Java permits an implementation to cache things, so it requires you to tell it when such things would break your program". That does not mean that any actual machine does anything of the sort. In reality, most machines you are likely to use have completely different kinds of optimizations that don't involve the kind of caches that you're thinking of.
Java requires you to use volatile precisely so that you don't have to worry about what kinds of absurdly complex optimizations the actual machine you're working on might or might not have. And that's a really good thing.
Your code is unlikely to exhibit concurrency bugs because it executes with very low concurrency. You have 10 threads, each of which sleep on average 500 ms before doing a string concatenation. As a rough guess, String concatenation takes about 1ns per character, and because your string is only 80 characters long, this would mean that each thread spends about 80 out of 500000000 ns executing. The chance of two or more threads running at the same time is therefore vanishingly small.
If we change your program so that several threads are running concurrently all the time, we see quite different results:
static String s = "";

public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(5);
    for (int i = 0; i < 10_000; i++) {
        executor.submit(() -> {
            s += "1";
        });
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println(s.length());
}
In the absence of data races, this should print 10000. On my computer, this prints about 4200, meaning over half the updates have been lost in the data race.
What if we declare s volatile? Interestingly, we still get about 4200 as a result, so data races were not prevented. That makes sense, because volatile ensures that writes are visible to other threads, but does not prevent intermediary updates, i.e. what happens is something like:
Thread 1 reads s and starts making a new String
Thread 2 reads s and starts making a new String
Thread 1 stores its result in s
Thread 2 stores its result in s, overwriting the previous result
To prevent this, you can use a plain old synchronized block:
executor.submit(() -> {
    synchronized (Test.class) {
        s += "1";
    }
});
And indeed, this returns 10000, as expected.
It is working because you are using Thread.sleep((long) (Math.random() * 1000)), so every thread has a different sleep time and they may execute one by one, while all other threads are sleeping or have completed execution. But even though your code works, it is not thread-safe. Even using volatile will not make your code thread-safe: volatile only ensures visibility, i.e. when one thread makes some changes, other threads are able to see them.
In your case the operation is a multi-step process: reading the variable, updating it, then writing it back to memory. So you need a locking mechanism to make it thread-safe.

Writing a thread safe modular counter in Java

Full disclaimer: this is not really homework, but I tagged it as such because it is mostly a self-learning exercise rather than actually "for work".
Let's say I want to write a simple thread safe modular counter in Java. That is, if the modulo M is 3, then the counter should cycle through 0, 1, 2, 0, 1, 2, … ad infinitum.
Here's one attempt:
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicModularCounter {
    private final AtomicInteger tick = new AtomicInteger();
    private final int M;

    public AtomicModularCounter(int M) {
        this.M = M;
    }

    public int next() {
        return modulo(tick.getAndIncrement(), M);
    }

    private final static int modulo(int v, int M) {
        return ((v % M) + M) % M;
    }
}
My analysis (which may be faulty) of this code is that since it uses AtomicInteger, it's quite thread safe even without any explicit synchronized method/block.
Unfortunately the "algorithm" itself doesn't quite "work", because when tick wraps around Integer.MAX_VALUE, next() may return the wrong value depending on the modulo M. That is:
System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
System.out.println(modulo(Integer.MAX_VALUE, 3)); // 1
System.out.println(modulo(Integer.MIN_VALUE, 3)); // 1
That is, two calls to next() will return 1, 1 when the modulo is 3 and tick wraps around.
There may also be an issue with next() getting out-of-order values, e.g.:
Thread1 calls next()
Thread2 calls next()
Thread2 completes tick.getAndIncrement(), returns x
Thread1 completes tick.getAndIncrement(), returns y = x+1 (mod M)
Here, barring the forementioned wrapping problem, x and y are indeed the two correct values to return for these two next() calls, but depending on how the counter behavior is specified, it can be argued that they're out of order. That is, we now have (Thread1, y) and (Thread2, x), but maybe it should really be specified that (Thread1, x) and (Thread2, y) is the "proper" behavior.
So by some definition of the words, AtomicModularCounter is thread-safe, but not actually atomic.
So the questions are:
Is my analysis correct? If not, then please point out any errors.
Is my last statement above using the correct terminology? If not, what is the correct statement?
If the problems mentioned above are real, then how would you fix it?
Can you fix it without using synchronized, by harnessing the atomicity of AtomicInteger?
How would you write it such that tick itself is range-controlled by the modulo and never even gets a chance to wrap over Integer.MAX_VALUE?
We can assume M is at least an order of magnitude smaller than Integer.MAX_VALUE if necessary.
Appendix
Here's a List analogy of the out-of-order "problem".
Thread1 calls add(first)
Thread2 calls add(second)
Now, if the list is updated successfully with the two elements added, but second comes before first, which is at the end, is that "thread safe"?
If that is "thread safe", then what is it not? That is, if we specify that in the above scenario, first should always come before second, what is that concurrency property called? (I called it "atomicity" but I'm not sure if this is the correct terminology).
For what it's worth, what is the Collections.synchronizedList behavior with regards to this out-of-order aspect?
As far as I can see, you just need a variation of the getAndIncrement() method:

public final int getAndIncrement(int modulo) {
    for (;;) {
        int current = atomicInteger.get();
        int next = (current + 1) % modulo;
        if (atomicInteger.compareAndSet(current, next))
            return current;
    }
}
I would say that aside from the wrapping, it's fine. When two method calls are effectively simultaneous, you can't guarantee which will happen first.
The code is still atomic, because whichever actually happens first, they can't interfere with each other at all.
Basically if you have code which tries to rely on the order of simultaneous calls, you already have a race condition. Even if in the calling code one thread gets to the start of the next() call before the other, you can imagine it coming to the end of its time-slice before it gets into the next() call - allowing the second thread to get in there.
If the next() call had any other side effect - e.g. it printed out "Starting with thread (thread id)" and then returned the next value, then it wouldn't be atomic; you'd have an observable difference in behaviour. As it is, I think you're fine.
One thing to think about regarding wrapping: you can make the counter last an awful lot longer before wrapping if you use an AtomicLong :)
EDIT: I've just thought of a neat way of avoiding the wrapping problem in all realistic scenarios:
Define some large number M * 100000 (or whatever). This should be chosen to be large enough to not be hit too often (as it will reduce performance) but small enough that you can expect the "fixing" loop below to be effective before too many threads have added to the tick to cause it to wrap.
When you fetch the value with getAndIncrement(), check whether it's greater than this number. If it is, go into a "reduction loop" which would look something like this:
long tmp;
while ((tmp = tick.get()) > SAFETY_VALUE)
{
    long newValue = tmp - SAFETY_VALUE;
    tick.compareAndSet(tmp, newValue);
}
Basically this says, "We need to get the value back into a safe range, by decrementing some multiple of the modulus" (so that it doesn't change the value mod M). It does this in a tight loop, basically working out what the new value should be, but only making a change if nothing else has changed the value in between.
It could cause a problem in pathological conditions where you had an infinite number of threads trying to increment the value, but I think it would realistically be okay.
Concerning the atomicity problem: I don't believe that it's possible for the Counter itself to provide behaviour to guarantee the semantics you're implying.
I think we have a thread doing some work
A - get some stuff (for example receive a message)
B - prepare to call Counter
C - Enter Counter <=== counter code is now in control
D - Increment
E - return from Counter <==== just about to leave counter's control
F - application continues
The mediation you're looking for concerns the "payload" identity ordering established at A.
For example, two threads each read a message: one reads X, one reads Y. You want to ensure that X gets the first counter increment and Y gets the second, even though the two threads are running simultaneously and may be scheduled arbitrarily across one or more CPUs.
Hence any ordering must be imposed across all the steps A-F, and enforced by some concurrency control outside of the Counter. For example:
pre-A - Get a lock on Counter (or other lock)
A - get some stuff (for example receive a message)
B - prepare to call Counter
C - Enter Counter <=== counter code is now in control
D - Increment
E - return from Counter <==== just about to leave counter's control
F - application continues
post- F - release lock
Now we have a guarantee at the expense of some parallelism; the threads are waiting for each other. When strict ordering is a requirement this does tend to limit concurrency; it's a common problem in messaging systems.
Concerning the List question: thread-safety should be seen in terms of interface guarantees. There is an absolute minimum requirement: the List must be resilient in the face of simultaneous access from several threads. For example, we could imagine an unsafe list that could deadlock or be left mis-linked so that any iteration would loop forever. The next requirement is that we should specify behaviour when two threads access it at the same time. There are lots of cases; here are a few:
a) Two threads attempt to add.
b) One thread adds an item with key "X", another attempts to delete the item with key "X".
c) One thread is iterating while a second thread is adding.
Providing that the implementation has clearly defined behaviour in each case it's thread-safe. The interesting question is what behaviours are convenient.
We can simply synchronise on the list, and hence easily give well-understood behaviour for (a) and (b). However, that comes at a cost in terms of parallelism. And I'm arguing that it has no value to do this, as you would still need to synchronise at some higher level to get useful semantics. So I would have an interface spec saying "adds happen in any order".
As for iteration - that's a hard problem, have a look at what the Java collections promise: not a lot!
This article, which discusses Java collections, may be interesting.
Atomic (as I understand it) refers to the fact that an intermediate state is not observable from outside. atomicInteger.incrementAndGet() is atomic, while return this.intField++; is not, in the sense that in the former you cannot observe a state in which the integer has been incremented but has not yet been returned.
As for thread-safety, the authors of Java Concurrency in Practice provide one definition in their book:
A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.
(My personal opinion follows)
Now, if we have the list updated successfully with two elements added, but second comes before first, which is at the end, is that "thread safe"?
If thread1 entered the entry set of the mutex object (in the case of Collections.synchronizedList(), the list itself) before thread2, then first ends up ahead of second in the list only if the lock is fair, and the intrinsic lock used by the synchronized keyword is not guaranteed to be fair. Fair locks, where whoever sits at the head of the queue gets to go first, can be quite expensive; if you want one in Java, you can get it through the java.util.concurrent utilities (for example, new ReentrantLock(true)). With the default unfair locks, there is no such guarantee.
However, the Java platform is not a real-time computing platform, so you can't predict how long a piece of code will take to run. This means that if you want first ahead of second, you need to ensure it explicitly in Java; it is impossible to ensure it by "controlling the timing" of the calls.
Now, what is thread-safe or unsafe here? I think this simply depends on what needs to be done. If you just need to avoid the list being corrupted, and it doesn't matter for the correctness of the application whether first or second comes first in the list, then just avoiding the corruption is enough to establish thread-safety. If it does matter, then just avoiding corruption is not enough.
So, I think thread-safety can not be defined in the absence of the particular functionality we are trying to achieve.
The famous String.hashCode() doesn't use any particular "synchronization mechanism" provided in Java, but it is still thread-safe, because one can safely use it in their own app without worrying about synchronization.
Famous String.hashCode() trick:

int hash = 0;

int hashCode() {
    int hash = this.hash; // racy read, but at worst two threads recompute the same value
    if (hash == 0) {
        hash = this.hash = calcHash();
    }
    return hash;
}
