Trying to understand shared variables in java threads


I have the following code:
class thread_creation extends Thread{
    int t;
    thread_creation(int x){
        t=x;
    }
    public void run() {
        increment();
    }
    public void increment() {
        for(int i =0 ; i<10 ; i++) {
            t++;
            System.out.println(t);
        }
    }
}
public class test {
    public static void main(String[] args) {
        int i =0;
        thread_creation t1 = new thread_creation(i);
        thread_creation t2 = new thread_creation(i);
        t1.start();
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        t2.start();
    }
}
When I run it, I get:
1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
Why am I getting this output? My understanding is that the variable i is shared between the two threads. So, according to the code, the first thread will run and increment i 10 times, leaving i equal to 10. Because of the sleep statement, the second thread starts after the first one, and since i is shared, the second thread should start with i = 10 and increment it 10 times to reach i = 20. That is not what the output shows, so why?

You seem to think that int t; in thread_creation is a shared variable. I'm afraid you are mistaken. Each thread_creation instance has its own t field, so the two threads are updating distinct counters.
The output you are seeing reflects that.
This is the nub of your question:
How do I pass a shared variable then?
Actually, you can't[1]. Strictly speaking, a shared variable is a variable belonging to a shared object. You cannot pass a variable per se; Java does not allow passing of variables. This is what "Java does not support call-by-reference" really means. You can't pass or return a variable, or the address of a variable, in any method call (or in any other way).
In Java you pass and return values: either primitives, or references to objects. The value may be read from a variable by the call's argument expression, or assigned to a variable after the call returns. But you are not passing the variable. A variable and its value / contents are different things.
So the only way to implement a shared counter is to implement it as a shared counter object.
Note that "variable" and "object" mean different things, both in Java and in other programming languages. You should NOT use the two terms interchangeably. For example, when I declare this in Java:
String s = "Hello";
the s variable is not a String object. It is a variable that contains a reference to the String object. Other variables may contain references to the same String object as well. The distinction is even more stark when the objects are mutable. (String is not mutable ... in Java.)
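A minimal sketch of the variable/object distinction. It swaps in a mutable StringBuilder for the String above (my choice, since a String cannot be mutated), and the class and method names are made up for illustration:

```java
final class VariableVsObject {
    // Returns what the variables see after mutating through one and reassigning the other.
    static String[] demo() {
        StringBuilder a = new StringBuilder("Hello");
        StringBuilder b = a;          // b holds a copy of the reference, not a copy of the object
        b.append(", world");          // mutates the single shared object, so a sees it too
        String afterMutation = a.toString();
        b = new StringBuilder("Bye"); // reassigns only the variable b; a still refers to the original
        return new String[] { afterMutation, a.toString(), b.toString() };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // Hello, world
        System.out.println(r[1]); // Hello, world
        System.out.println(r[2]); // Bye
    }
}
```

Two variables, one object: changes made through the object are shared, while reassigning a variable only affects that variable.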
Here are the two (IMO) best ways to implement a shared counter object.
You could create a custom Java Counter class with a count variable, a get method, and methods for incrementing and decrementing the counter. These methods need to be thread-safe and atomic; e.g. by using synchronized methods or blocks[2].
You could just use an AtomicInteger instance. That takes care of atomicity and thread-safety ... to the extent that it is possible with this kind of API.
The latter approach is simpler and likely more efficient ... unless you need to do something special each time the counter changes.
(It is conceivable that you could implement a shared counter other ways, but that is too much detail for this answer.)
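A minimal sketch of the first option, applied to the question's program: the counter is an object, and both threads are handed a reference to the same instance. Counter, Worker and SharedCounterDemo are hypothetical names, and join() is used here instead of the question's sleep so that the main thread waits for both threads and is guaranteed to see their updates:

```java
final class SharedCounterDemo {
    // Hypothetical counter: every access goes through synchronized methods,
    // so each operation is atomic and visible to the next thread that takes the lock.
    static final class Counter {
        private int count;

        synchronized void increment() { count++; }
        synchronized void decrement() { count--; }
        synchronized int get() { return count; }
    }

    // Like the question's thread_creation, but it holds a reference to a shared
    // Counter instead of its own int field.
    static final class Worker extends Thread {
        private final Counter counter;

        Worker(Counter counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            for (int i = 0; i < 10; i++) {
                counter.increment();
            }
        }
    }

    static int run() throws InterruptedException {
        Counter shared = new Counter();
        Worker t1 = new Worker(shared); // both workers get a copy of the SAME reference
        Worker t2 = new Worker(shared);
        t1.start();
        t2.start();
        t1.join(); // wait for both threads; join() also makes their updates visible here
        t2.join();
        return shared.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 20
    }
}
```

What is shared here is not a variable but one object, reached through two copies of the same reference.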
1 - I realize that I just said the same thing more than 3 times. But as the Bellman says in "The Hunting of the Snark": "What I tell you three times is true."
2 - If the counter is not implemented using synchronized or an equivalent mutual exclusion mechanism with the appropriate happens before semantics, you are liable to see Heisenbugs; e.g. race conditions and memory visibility problems.

Two crucial things you're missing. Each one individually explains this behaviour: you can 'fix' either one and you'll still see this; you'd have to fix both to see 1-20:
Java is pass-by-value
When you pass i, you pass a copy of it. In fact, in java, all parameters to methods are always copies. Hence, when the thread does t++, it has absolutely no effect whatsoever on your i. You can trivially test this, and you don't need to mess with threads to see it:
public static void main(String[] args) {
    int i = 0;
    add5(i);
    System.out.println(i); // prints 0!!
}
static void add5(int i) {
    i = i + 5;
}
Note that all non-primitives are references. That means a copy of the reference is passed. It's like passing the address of a house and not the house itself. If I have an address book, and I hand you a scanned copy of a page that contains the address of my summer home, you can still drive over there and toss a brick through the window, and I'll 'see' that when I follow my copy of the address. So, when you pass e.g. a list and the method you passed the list to runs list.add("foo"), you DO see that. You may think: AHA! That means Java does not pass a copy, it passed the real list! Not so. Java passed a copy of a street address (a reference). The method I handed that copy to decided to drive over there and act, and that you can see.
In other words, =, ++, that sort of thing? That is done to the copy. The dot operator (.) is Java for 'drive to the address and enter the house'. Anything you 'do' with . is visible to the caller; = and ++ and such are not.
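A small sketch of that difference, using a List as in the paragraph above (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class PassByValueDemo {
    // '=' on a parameter changes only this method's local copy of the reference.
    static void reassign(List<String> list) {
        list = new ArrayList<>();
        list.add("lost"); // goes into the new list, which the caller never sees
    }

    // '.' follows the copied reference to the caller's object.
    static void mutate(List<String> list) {
        list.add("foo");
    }

    static List<String> demo() {
        List<String> mine = new ArrayList<>();
        reassign(mine); // caller's list is unaffected
        mutate(mine);   // caller sees "foo"
        return mine;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [foo]
    }
}
```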
Fixing the code to avoid the pass-by-value problem
Change your code to:
class thread_creation extends Thread {
    static int t; // now it's static: one field shared by all instances (effectively global)
    public void run() {
        increment();
    }
    public void increment() {
        for(int i =0 ; i<10 ; i++) {
            t++;
            // System.out.println(t);
        }
    }
}
public class test {
    public static void main(String[] args) throws Exception {
        thread_creation t1 = new thread_creation();
        thread_creation t2 = new thread_creation();
        t1.start();
        Thread.sleep(500);
        t2.start();
        Thread.sleep(500);
        System.out.println(thread_creation.t);
    }
}
Note that I remarked out the print line. I did that intentionally; see below. If you run the above code, you'd expect to see 20, but depending on your hardware, the OS, the song playing on your MP3 app, which websites you have open, and the phase of the moon, it may be less than 20. So what's going on there? Enter the...
The evil coin.
The relevant spec here is the JMM (The Java Memory Model). This spec explains precisely what a JVM must do, and therefore, what a JVM is free not to do, especially when it comes to how memory is actually managed.
The crucial aspect is the following:
Any effects (updates to fields, such as that t field) may or may not be observable by other threads; that is the JVM's choice. There's no guarantee that anything you do is visible to anything else... unless there exists a Happens-Before/Happens-After relationship: for any 2 statements with such a relationship, the JVM guarantees that the HA line cannot observe the state from before the update done by the HB line.
HB/HA can be established in various ways:
The 'natural' way: anything that is 'before' something else and runs in the same thread has an HB/HA relationship. In other words, if you do x++; System.out.println(x); in one thread, then you can't observe that the x++ hasn't happened yet. It's stated like this so that if you're not observing, you get no guarantees, which gives the JVM the freedom to optimize. For example, given x++; y++; and that's all you do, the JVM is free to re-order that and increment y before x. Or not. There are no guarantees; a JVM can do whatever it wants, as long as that thread itself can't tell the difference.
synchronized. The moment of 'exiting' a synchronized (x) {} block has HB to the HA of another thread 'entering' the top of any synchronized block on the same object, if it enters later.
volatile. A write to a volatile field is HB relative to every subsequent read of that same field by another thread. Note that with volatile it's basically impossible to tell in advance which access comes first, but one of them does.
thread starting. thread.start() is HB relative to the first line of the run() of that thread.
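thread joining. The last line of a thread is HB relative to another thread returning from thread.join() on that thread. (thread.yield(), by contrast, establishes no HB/HA at all.)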
There are a few more exotic ways to establish HB/HA but that's pretty much it.
Crucially, in your code there is no HB/HA between any of the statements that modify or print t!
In other words, the JVM is free to run it all in such a way that the effects of various t++ statements run by one thread aren't observed by another thread.
What the.. WHY????
Because of efficiency. Your memory banks are, relative to how fast CPUs are, oceans away from the CPU core. Fetching from or writing to main memory takes an incredibly long time: your CPU is twiddling its thumbs for a very long time while it waits for the memory controller to get the job done. It could be running hundreds of instructions in that time.
So, CPU cores avoid going to main memory whenever they can. Instead they work with caches: the interaction with your main memory banks (which are shared by all CPU cores) is 'load an entire cache line' and 'write an entire cache line'. The line in the on-core cache is then effectively a 'local copy' that the core can work with very, very quickly (it IS very close to the core, unlike the main memory banks), and it is written back to main memory later. (This is a simplified picture: real CPUs keep their caches coherent, and in practice stale values more often come from registers, store buffers and JIT optimizations, but the consequence for Java code is the same.)
The JVM needs to be free to use this. Had the JVM actually worked like you want (that anything any thread does is instantly observable by all others), then anything that any line does must first wait 500 cycles to load the relevant page, then wait another 500 cycles to write it back. All java apps would literally be 1000x slower than they could be.
This, in passing, also explains why actual synchronization is comparatively slow. Java can't do anything about that; it is a fundamental cost of modern multi-core CPUs.
So, evil coin?
Note that the JVM does not guarantee that the CPU must necessarily work with this cache stuff, nor does it make any promises about when cache contents are flushed. It merely limits the guarantees so that JVMs can be efficiently written on CPUs that work like that.
That means that any read or write to any field any java code ever does can best be thought of as follows:
The JVM first flips a coin. On heads, it uses a local cached copy. On tails, it copies over the value from some other thread's cached copy instead.
The coin is evil: It is not reliably a 50/50 arrangement. It is entirely plausible that throughout developing a feature and testing it, the coin lands tails every time it is flipped. It remains flipping tails 100% of the time for the first week that you deployed it. And then just when that big potential customer comes in and you're demoing your app, the coin, being an evil, evil coin, starts flipping heads a few times and breaking your app.
The correct conclusion is that the coin will mess with you and that you cannot unit test against it. The only way to win the game is to ensure that the coin is never flipped.
You do this by never touching a field from multiple threads unless it is constant (final, or simply never changes), or if all access to it (both reads and writes) has clearly established HB/HA between all threads.
This is hard to do. That's why the vast majority of apps don't do it at all. Instead, they:
Talk between threads using a database, which has vastly more advanced synchronization primitives: Transactions.
Talk using a message bus such as RabbitMQ or similar.
Use stuff from the java.util.concurrent package such as a CountDownLatch, ForkJoinPool, ConcurrentMap, or AtomicInteger. These are easier to use (specifically: it is a lot harder to write code with these abstractions that is buggy in a way that cannot be observed or tested for on the developer's machine and only blows up much later in production. Not impossible, of course).
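As a sketch of what leaning on java.util.concurrent can look like, here is a hypothetical variant of the counting program that waits on a CountDownLatch instead of sleeping and counts with an AtomicInteger. LatchDemo and its parameters are illustrative names, not from the answer:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchDemo {
    // Starts `threads` workers that each increment a shared counter, then waits for all of them.
    static int run(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(threads);
        for (int w = 0; w < threads; w++) {
            new Thread(() -> {
                try {
                    for (int i = 0; i < incrementsPerThread; i++) {
                        counter.incrementAndGet();
                    }
                } finally {
                    done.countDown(); // signal "this worker is finished", even on failure
                }
            }).start();
        }
        // Blocks until every worker has counted down. Per the java.util.concurrent
        // memory-consistency rules, actions before countDown() happen-before await() returns.
        done.await();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(2, 10)); // 20
    }
}
```

No sleeps and no hand-written synchronized blocks: the latch supplies the waiting and the HB edge, the AtomicInteger supplies atomic increments.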
Let's fix it!
volatile doesn't 'fix' ++. x++; is 'read x, increment by 1, write result to x' and volatile doesn't make that atomic, so we cannot use this. We can either replace t++ with:
synchronized(thread_creation.class) {
    t++;
}
Which works fine but is comparatively slow (and you shouldn't lock on publicly visible objects if you can help it, so make a private object to lock on, but hopefully you get the gist). Better: dig into the j.u.c package for something useful. And so there is! AtomicInteger!
import java.util.concurrent.atomic.AtomicInteger;

class thread_creation extends Thread {
    static final AtomicInteger t = new AtomicInteger();
    public void run() {
        increment();
    }
    public void increment() {
        for(int i =0 ; i<10 ; i++) {
            t.incrementAndGet();
        }
    }
}
public class test {
    public static void main(String[] args) throws Exception {
        thread_creation t1 = new thread_creation();
        thread_creation t2 = new thread_creation();
        t1.start();
        Thread.sleep(500);
        t2.start();
        Thread.sleep(500);
        System.out.println(thread_creation.t.get());
    }
}
That code will print 20. Every time (unless those threads take longer than 500 msec, which technically could happen but is rather unlikely; replacing the final sleep with t1.join(); t2.join(); removes even that caveat).
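For comparison, a minimal sketch of the synchronized alternative mentioned earlier, locking on a private object rather than on thread_creation.class. PrivateLockDemo and LOCK are hypothetical names, and join() replaces the sleeps so the final read reliably happens after both threads finish:

```java
final class PrivateLockDemo {
    // Private lock object: no code outside this class can synchronize on it.
    private static final Object LOCK = new Object();
    private static int t; // guarded by LOCK

    static void increment() {
        synchronized (LOCK) {
            t++; // read-increment-write happens as one unit under the lock
        }
    }

    static int get() {
        synchronized (LOCK) { // reads take the lock too, for visibility
            return t;
        }
    }

    static int run() throws InterruptedException {
        synchronized (LOCK) {
            t = 0; // reset so repeated runs start from zero
        }
        Runnable work = () -> {
            for (int i = 0; i < 10; i++) {
                increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 20
    }
}
```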
Why did you remark out the print statement?
That HB/HA stuff can sneak up on you: when you call code you did not write, such as System.out.println, who knows what kind of HB/HA relationships are in that code? The Javadoc isn't that specific; it won't tell you. It turns out that on most OSes and JVM implementations, writing to standard out, e.g. with System.out.println, involves synchronization; either the JVM does it, or the OS does. Thus, introducing print statements 'to test stuff' doesn't work: they can make the race conditions your code does have impossible to observe. Similarly, attaching a debugger is a great way to make that coin really go evil on you and flip juuust so that you can't tell your code is buggy.
That is why I remarked it out: with it in, I bet that on almost all hardware you'd end up seeing 20, even though the JVM doesn't guarantee it and that first version is broken. That holds even if, on your particular machine, on this day, with this phase of the moon, it seems to reliably print 20 every single time you run it.

Related

When not using volatile, a thread can still see the changes made by another thread

public class VisibleDemo {
    private boolean flag;
    public VisibleDemo setFlag(boolean flag) {
        this.flag = flag;
        return this;
    }
    public static void main(String[] args) throws InterruptedException {
        VisibleDemo t = new VisibleDemo();
        new Thread(()->{
            long l = System.currentTimeMillis();
            while (true) {
                if (System.currentTimeMillis() - l > 600) {
                    break;
                }
            }
            t.setFlag(true);
        }).start();
        new Thread(()->{
            long l = System.currentTimeMillis();
            while (true) {
                if (System.currentTimeMillis() - l > 500) {
                    break;
                }
            }
            while (!t.flag) {
                // if (System.currentTimeMillis() - l > 598) {
                //
                // }
            }
            System.out.println("end");
        }).start();
    }
}
Without the following code, it never prints "end":
if (System.currentTimeMillis() - l > 598) {
}
With this code, it will probably print "end", but sometimes it does not.
When the number is less than 598 (e.g. 550), or when the code is absent, it does not print "end".
When it is 598, it will probably print "end".
When it is greater than 598, it prints "end" every time.
Notes:
598 is the value on my computer; on yours it may be a different number.
flag is not volatile, so why can the thread see the newest value?
First: I want to know why.
Second: I need help.
I want to know the scenarios: when the worker cache of jvm thread will refresh to/from main memory.
OS: windows 10
java: jdk8u231

Your code is suffering from a data-race and that is why it is behaving unreliably.
The JMM is defined in terms of the happens-before relation. So if you have 2 actions A and B, and A happens-before B, then B should see A and everything before A. It is very important to understand that happens-before doesn't imply happening-before (so ordering based on physical time) and vice versa.
The 'flag' field is accessed concurrently; one thread is reading it while another thread is writing it. In JMM terms this is called conflicting access.
Conflicting accesses are fine as long as it is done using some form of synchronization because the synchronization will induce happens-before edges. But since the 'flag' accesses are plain loads/stores, there is no synchronization, and as a consequence, there will not be a happens-before edge to order the load and the store. A conflicting access, that isn't ordered by a happens-before edge, is called a data-race and that is the problem you are suffering from.
When there is a data race, funny things can happen, but it will not lead to undefined behavior as is possible in C++ (where undefined behavior can effectively lead to any possible outcome, including crashes and super weird behavior). So a load still needs to see a value that was actually written, and can't see a value coming out of thin air.
If we look at your code:
while (!t.flag) {
    ...
}
Because the flag field isn't updated within the loop and is just a plain load, the compiler is allowed to optimize this code to:
if(!t.flag){
    while(true){...}
}
This particular optimization is called loop hoisting (or loop invariant code motion).
So this explains why the loop doesn't need to complete.
Why does it complete when you access the System.currentTimeMillis? Because you got lucky; apparently this prevents the JIT from applying the above optimization. But keep in mind that System.currentTimeMillis doesn't have any formal synchronization semantics and therefore doesn't induce happens-before edges.
How to fix your code?
The simplest way to fix your code would be to make 'flag' volatile, or to do both the read and the write inside synchronized blocks on the same object. If you want to go really hardcore: use VarHandle getOpaque/setOpaque (Java 9+, so not available on your JDK 8). Officially it is still a data race because opaque access doesn't induce happens-before edges, but it will prevent the compiler from optimizing out the load/store. It is a benign data race. The primary advantage is slightly better performance, because it doesn't prevent the reordering of surrounding loads/stores.
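A sketch of the simplest fix, making the field volatile (this works on JDK 8, unlike VarHandle). The class and method names are hypothetical, and the 100 ms and 5 s timings are arbitrary choices, not taken from the question:

```java
final class VolatileFlagDemo {
    // volatile: a write to flag happens-before every subsequent read of flag,
    // and the JIT may not hoist the read out of the loop below.
    private volatile boolean flag;

    // Returns true if the spinning reader noticed the flag and exited.
    static boolean run() throws InterruptedException {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread reader = new Thread(() -> {
            while (!demo.flag) {
                // busy-wait: every iteration performs a fresh volatile read
            }
        });
        reader.setDaemon(true); // don't keep the JVM alive if something goes wrong
        reader.start();
        Thread.sleep(100);      // let the reader spin (and possibly get JIT-compiled) first
        demo.flag = true;       // volatile write
        reader.join(5_000);     // wait up to 5 seconds for the reader to exit
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run() ? "end" : "reader still spinning");
    }
}
```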
I want to know the scenarios: when the worker cache of jvm thread will refresh to/from main memory.
This is a fallacy. Caches on modern CPUs are always coherent; this is taken care of by the cache coherence protocol like MESI. Writing to main memory for every volatile read/write would be extremely slow. For more information see the following excellent post. If you want to know more about cache coherence and memory ordering, please check this excellent book which you can download for free.

I want to know the scenarios: when the worker cache of jvm thread will refresh to/from main memory.
When Taylor Swift is playing on your music player, it'll be 598, unless it's Tuesday, then it'll be 599.
No, really. It's that arbitrary. The JVM spec gives the JVM the right to come up with any old number for any reason if your code isn't properly guarded.
The problem is JVM diversity. There is a crazy combinatorial explosion:
There are about 8 OSes give or take.
There are like 20 different 'chip lines', with different pipelining behaviour.
These chips can be in various mitigating modes to mitigate against attacks like Spectre. Let's call it 3.
There are about 8 different major JVM vendors.
These come in ~10 or so different versions (java 8, java 9, java 10, java 11, etc).
That gives us about 8 × 20 × 3 × 8 × 10 = 38400 different combinations.
The point of the JMM (Java Memory Model) is to remove the handcuffs from a JVM implementation. A JVM implementation is looking for this optimal case:
It wants the freedom to use the various tricks that CPUs use to run code as fast as possible. For example, it wants the freedom to 're-order' (given a(); b(), to run b() first and a() later, which is okay if a and b are utterly independent and don't look at each other's modifications). The reason it wants to do this is that CPUs are pipelines: even processing a single instruction is in fact a chain of many separate steps, and the 'parse the instruction' step can get cracking on parsing another instruction the very moment it is done, even if that instruction is still being processed by the rest of the pipe. In fact, the CPU could have 4 separate 'instruction parser units' parsing 4 instructions in parallel. This is NOT the kind of parallelism that multiple cores provide: this is a single core parsing 4 consecutive instructions in parallel because, say, parsing instructions is slightly slower than running them. But that might be true only of Intel chips of the Z-whatever line. That's the point. If the memory model of the Java specification said that a JVM simply can't use this stuff, then JVMs on that particular Intel chip would run slow as molasses. We don't want that.
Nevertheless, the memory model rules can't be so preferential to giving the JVM the right to re-order and do all sorts of crazy things that it becomes impossible to write reliable code for JVMs. Imagine the Java language spec said that the JVM can re-order any 2 instructions in one method at any time, even if these 2 instructions touch the same field. That'd be great for JVM engineers; they could go nuts optimizing code on the fly to re-order it optimally. But it would be impossible to write correct Java code.
So, a balance has been struck. This balance takes the following form:
The JMM gives you specific rules - these rules take the form of: "If you do X, then the JVM guarantees Y".
But that is all. In particular, there is nothing written about what happens if you do not do X. All you know is, that then Y is not guaranteed. But 'not guaranteed' does not mean: Will definitely NOT happen.
Here is an example:
class Data {
    static int a = 0;
    static int b = 0;
}
class Thread1 extends Thread {
    public void run() {
        Data.a = 5;
        Data.b = 10;
    }
}
class Thread2 extends Thread {
    public void run() {
        int a = Data.a;
        int b = Data.b;
        System.out.println(a);
        System.out.println(b);
    }
}
class Main {
    public static void main(String[] args) {
        new Thread1().start();
        new Thread2().start();
    }
}
This code:
Makes 2 fields, which start out at 0 and 0.
Runs one thread that first sets a to 5 and then sets b to 10.
Starts a second thread that reads these 2 fields into local vars and then prints these.
The JVM spec says that it is valid for a JVM to:
Print 0/0
Print 5/0
Print 0/10
Print 5/10
But it would not be legal for a JVM to e.g. print '20/20', or '10/5'.
Let's zoom in on the 0/10 case because that is utterly bizarre - how could a JVM possibly do that? Well, reordering!
WILL a JVM print 0/10? On some combinations of JVM vendor and version + architecture + OS + phase of the moon, YES IT WILL. On most, no it won't. Ever. Still, imagine you wrote this code, you rely on 0/10 NEVER occurring, and you test the heck out of your code, and you verify that indeed, even running the test a million times, it never happens. You ship it to the production server and it runs fine for a week, and then, just as you are giving the demo to the really important potential customer, all heck breaks loose: your app is broken, as from time to time the 0/10 case does occur.
You file a bug with your JVM vendor. And they close it as 'intended behaviour - wontfix'. That will really happen, because that really is the intended behaviour. If you write code that relies on a thing being true that is NOT guaranteed by the JMM, then YOU wrote a bug, even if on your particular hardware on this particular day it is completely impossible for you to make this bug occur right now.
This means one simple and very nasty conclusion is the only correct one: You cannot test this stuff.
So, if you adhere to the rule that if there are no tests then you can't know if your code works, guess what? You cannot ever know if your code is fine. Ever.
That then leads to the conclusion that you don't want to write any such code.
This sounds crazy (how can you simply not ever, ever write anything multicore?) but it's not as nuts as you think. This only comes up if 2 threads are dependent on ordering relative to each other for some in-process action. For example, if two threads are both accessing the same field of the same instance. Simply... don't do that.
It's easier than you think: If all 'communication' between threads goes via the database and you know how to use transactions in databases, voila. Or you use a message bus service like RabbitMQ.
If for some job you really must write multithread code where the threads interact with each other, don't shoot the messenger: It is NOT POSSIBLE to test that you did it right. So write it very carefully.
A second conclusion is that the JMM doesn't explain how things work or what happens. It merely says: IF you follow these rules, I guarantee you that THIS will happen. If you don't follow these rules, anything can happen. A JVM is free to do all sorts of crazy shenanigans, and neither this documentation nor any other documentation will ever enumerate all the crazy things that could happen. After all, there are at least 38400 different combinations, and it's crazy to attempt to document all 38400!
So, what are the core rules?
The core rules are so-called happens-before relationships. The basic rule is simply this:
There are various ways to establish H-B relationships. Such a relationship is always between 2 lines of code. 2 lines of code might be unrelated, H-B wise. Or, the rules state that line A 'happens-before' line B.
If and only if the rules state this, then it will be impossible to observe a state of the universe (the values of all fields of all instances in the entire JVM) at line B as it was before line A ran.
That's it. For example, if line A 'happens before' line B, but line B does not attempt to witness any field change A made, then the JVM is still free to reorder and have B run before A. The point is that this shouldn't matter - you're not observing, so why does it matter?
We can 'fix' our weird 0/0, 5/0, 0/10 issue by setting up H-B: if the 'grab the static field values and save them to local a/b vars' code happens-after thread1's setting of them, then we can be sure that the code will always print 5/10, and by the JMM's guarantees, a JVM that doesn't print that is broken.
H-B are also transitive (if HB(A, B) is true, and HB(B, C) is true, then HB(A, C) is also true).
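One way to set up that H-B, sketched below on the assumption that giving up the concurrency is acceptable for illustration: the main thread join()s the writer before starting the reader (a thread's actions happen-before another thread returns from join() on it). The chain writer → join → start → reader is an application of transitivity. The class and method names are hypothetical:

```java
final class HappensBeforeDemo {
    static int a = 0;
    static int b = 0;

    // Returns the {a, b} pair the reader thread observed.
    static int[] run() throws InterruptedException {
        a = 0;
        b = 0;
        int[] seen = new int[2];
        Thread writer = new Thread(() -> {
            a = 5;
            b = 10;
        });
        Thread reader = new Thread(() -> {
            seen[0] = a;
            seen[1] = b;
        });
        writer.start();
        writer.join();  // HB: everything the writer did -> join() returning in main
        reader.start(); // HB: main's actions before start() -> reader's first line
        reader.join();  // HB: reader's writes to seen -> main reading seen
        return seen;    // by transitivity, always {5, 10}
    }

    public static void main(String[] args) throws InterruptedException {
        int[] seen = run();
        System.out.println(seen[0] + "/" + seen[1]); // 5/10
    }
}
```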
How do you set up HB?
If line B would run after line A as per the usual understanding of how things run, and both are being run by the same thread, HB(A, B). This is obvious: If you just write x(); y();, then y cannot observe state as it was before x ran.
HB(thread.start(), X) where X is the very first line in the started thread.
HB(EndS, StartS), where EndS is the exiting of a synchronized block on object ref Z, and StartS is another thread entering a synchronized block (on ref Z as well) later.
HB(W, R) where W is a write to volatile variable Z and R is any subsequent read of Z, but with volatiles it is hard to know in advance which access will come first.
There are a few more exotic ways. There's also a separate HB-like guarantee for constructors and the final fields they initialize, and this one is really easy to understand: once a constructor returns, whatever final fields it initialized are definitely set and cannot be observed to not be set by a thread that obtains a reference to the object afterwards, even if otherwise no actual HB relationship has been established (provided the constructor doesn't leak this). This applies only to final fields.
This explains why you observe weird values. This also explains why your question of 'I want to know when a JVM thread will refresh to/from main memory' is not answerable: Because the java memory model spec and the java virtual machine spec intentionally and specifically make no promises on how that works. One JVM can work one way, another JVM can do it completely differently.
The reason I started off making a seeming joke about playing Taylor Swift is: A CPU has cores, and the cores are limited. A modern computer, especially a desktop, is doing thousands of things at once, and will therefore be rotating apps through cores all the time. Whether a field update is 'flushed out' to main memory (NOTE: THAT IS DANGEROUS THINKING - THE DOCS DO NOT ACTUALLY ENFORCE THAT JVMS CAN BE UNDERSTOOD IN THOSE TERMS!) might depend on whether it gets rotated out of a core or not. And that in turn might depend on your music player dealing with a particular compressed music file that takes a few more cores to decompress the next block so that it can be queued up in the audio buffer.
Hence, and this is no joke, the song you are playing on your music player can in fact change the number you get. Hence, why you have to give up: You CANNOT enumerate 'if my computer is in this state, then this code will always produce Y number'. There are billions of states you'd have to enumerate. Impossible.

Thread safety in java multithreading

I found code about thread safety but it doesn't have any explanation from the person who gave the example. I would like to understand why if I don't set the "synchronized" variable before "count" that the count value will be non-atomic (the desired result is always 200). Thanks
public class Example {
    private static int count = 0;
    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        Thread.sleep(10);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    for (int i = 0; i < 100; i++) {
                        //add synchronized
                        synchronized (Example.class){
                            count++;
                        }
                    }
                }
            }).start();
        }
        try{
            Thread.sleep(2000);
        }catch (Exception e){
            e.printStackTrace();
        }
        System.out.println(count);
    }
}

++ is not atomic
The count++ operation is not atomic. That means it is not a single solitary operation. The ++ is actually three operations: load, increment, store.
First the value stored in the variable is loaded (copied) into a register in the CPU core.
Second, that value in the core’s register is incremented.
Third and last, the new incremented value is written (copied) from the core’s register back to the variable’s content in memory. The core’s register is then free to be assigned other values for other work.
It is entirely possible for two or more threads to read the same value for the variable, say 42. Each of those threads would then proceed to increment the value to the same new value 43. They would then each write back 43 to that same variable, unwittingly storing 43 again and again repeatedly.
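The interleaving described above can be replayed deterministically on a single thread, with local variables standing in for each core's register. This is a simulation for illustration only (the names are hypothetical), not real concurrent code:

```java
final class LostUpdateSimulation {
    // Replays one bad interleaving of two count++ operations,
    // each split into its load / increment / store steps.
    static int replay(int count) {
        int registerA = count;     // thread A: load (e.g. 42)
        int registerB = count;     // thread B: load (42), before A has stored
        registerA = registerA + 1; // thread A: increment (43)
        registerB = registerB + 1; // thread B: increment (43)
        count = registerA;         // thread A: store 43
        count = registerB;         // thread B: store 43, overwriting A's write
        return count;              // 43, not 44: one increment was lost
    }

    public static void main(String[] args) {
        System.out.println(replay(42)); // 43
    }
}
```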
Adding synchronized eliminates this race condition. When the first thread gets the lock, the second and third threads must wait. So the first thread is guaranteed to be able to read, increment, and write the new value alone, going from 42 to 43. Once it completes, the synchronized block exits, thereby releasing the lock. The second thread vying for the lock then acquires it and is able to read, increment, and write the new value 44 without interference. And so on, thread-safe.
Another problem: Visibility
However, this code is still broken.
This code has a visibility problem, with various threads possibly reading stale values kept in caches. But that is another topic. Search to learn more about volatile keyword, the AtomicInteger class, and the Java Memory Model.
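A sketch of one way to fix both problems in that example, assuming an AtomicInteger in place of the synchronized block and join() in place of the final sleep (join() waits for the threads and establishes the happens-before edge the final read needs). FixedExample is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class FixedExample {
    static int run() throws InterruptedException {
        AtomicInteger count = new AtomicInteger(); // atomic increments with volatile-like visibility
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 100; j++) {
                    count.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for completion instead of sleeping; also establishes happens-before
        }
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 200
    }
}
```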

I would like to understand why if I don't set the "synchronized" variable before "count" that the count value will be non-atomic.
The short answer: Because the JLS says so!
If you don't use synchronized (or volatile or something similar), then the Java Language Specification (JLS) does not guarantee that the main thread will see the values written to count by the child threads.
This is specified in great detail in the Java Memory Model section of the JLS. But the specification is very technical.
The simplified version is that a read of a variable is not guaranteed to see the value written by a preceding write if there is no happens before (HB) relationship connecting the write and the read. Then there are a bunch of rules that say when an HB relationship exists. One of the rules is that there is an HB between one thread releasing a mutex and a different thread acquiring it.
An alternative intuitive (but incomplete and technically inaccurate) explanation is that the latest value of count may be cached in a register or a chipset's memory caches. The synchronized construct flushes values to memory.
The reason that is an inaccurate explanation is that JLS doesn't say anything about registers, caches and so on. Rather, the memory visibility guarantees that the JLS specifies are typically implemented by a Java compiler inserting instructions to write registers to memory, flush caches, or whatever is required by the hardware platform.
The other thing to note is that this is not really about count++ being atomic or not.¹ It is about whether the result of a change to count is visible to a different thread.
¹ It isn't atomic, but you would get the same effect for an atomic operation like a simple assignment!

Let's get back to the basics with a Wall Street example.
Let's say you (let's call you T1) and your friend (let's call them T2) decided to meet at a coffee house on Wall Street. You both started at the same time from the southern end of Wall Street, though you were not walking together. You are walking on one side of the street and your friend on the other side, and you are both heading north.
Now, let's say you came to a coffee house and thought it was the one where you and your friend had decided to meet, so you stepped inside, ordered a cold coffee, and started sipping it while waiting.
But on the other side of the road, something similar happened: your friend came across a coffee shop, ordered a hot chocolate, and waited for you.
After a while, each of you decided the other one was not going to come, and dropped the plan to meet.
You both missed the meeting. Why did this happen? Needless to say, it was because you had not agreed on the exact venue.
The code
synchronized(Example.class){
counter++;
}
solves the problem that you and your friend just encountered.
In technical terms, the operation counter++ is actually carried out in three steps:
Step 1: Read the value of counter (let's say 1).
Step 2: Add 1 to that value.
Step 3: Write the new value of counter back to memory.
If two threads are working on the counter variable simultaneously, the final value of counter is unpredictable. For example, Thread1 could read the value of counter as 1 while Thread2 also reads it as 1. Both threads then end up setting counter to 2, so one increment is lost. This is called a race condition.
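The interleaving described above can be replayed deterministically by writing the three steps out by hand. This is only a single-threaded simulation (the class `LostUpdateDemo` and its local variables are hypothetical), not real concurrency, but it shows exactly how two increments can leave the counter at 2.

```java
// Hypothetical single-threaded simulation of two threads running counter++
// with the interleaving described above.
class LostUpdateDemo {
    static int counter;

    static int simulate() {
        counter = 1;
        // Step 1 for both "threads": each reads the current value (1).
        int readByThread1 = counter;
        int readByThread2 = counter;
        // Step 2 for both: each adds 1 to its private copy.
        int newValue1 = readByThread1 + 1;
        int newValue2 = readByThread2 + 1;
        // Step 3 for both: each writes its result back.
        counter = newValue1;
        counter = newValue2;
        // Two increments ran, but counter only went from 1 to 2.
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints 2, not 3
    }
}
```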
To avoid this issue, the operation counter++ has to be atomic. To make it atomic, you need to synchronize the threads' execution so that only one thread at a time modifies the counter.
I suggest reading the book Java Concurrency in Practice; every developer should read it.

How reordering of instructions can cause concurrency issues

I was reading about the JMM (Java Memory Model), and I understood how flushing of cached variables can cause other threads to have dirty reads. It was also mentioned that reordering of instructions can cause concurrency issues. Even though I understand what reordering of instructions means, I don't understand how it can cause concurrency issues.
For example, suppose thread t1 has acquired the lock when starting test1(). Even if the compiler has done some optimization and reordered z = 4; up or down, t2 won't get the lock for test2() until t1 has released it, so how could reordering in test1() (or even in test2()) cause concurrency issues or bugs?
public class Testing {
private int z = 2;
public synchronized void test1(){
//some statement..
z = 4;
//some statement..
}
public synchronized void test2(){
//some statement..
System.out.println(z);
//some statement..
}
}
I understand that with proper synchronization, reordering wouldn't cause a problem, but without synchronization, even if the compiler doesn't optimize and reorder, there are still chances of concurrency issues, right? To be clear, I was referring to this link. I couldn't understand their point about concurrency issues after reordering, because, as I said, without synchronization concurrency issues can arise even without any reordering.
EDIT: Please discard my code snippet because after looking at comments it doesn't hold good now, and my updated question is as above.

You won't see the problem of reordering with a single variable. But take two and ...
int foo = 0;
boolean isFooSet = false;
...
// thread 1
foo = 42;
isFooSet = true;
...
// thread 2
while (!isFooSet) {/*waste some time*/} // we wait until the flag is set in the other thread
System.out.println(42/foo); //we can actually divide by zero here
So while thread 1 sees foo set before isFooSet, thread 2 can see them the other way around, which makes the flag isFooSet useless.
Note that without reordering this code would be perfectly safe (from dividing by zero, that is). You can see this if, for example, isFooSet is declared volatile, which prevents the write to foo from being moved after the write to isFooSet. That also solves the other, non-reordering-related problem of visibility, but that's a different story.
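A runnable sketch of the foo/isFooSet example with the flag declared volatile. The wrapper class `VolatileFlag` and `runDemo` are hypothetical scaffolding around the snippet above. The plain write to foo cannot be moved after the volatile write to isFooSet, and a read that sees isFooSet == true is guaranteed to see foo == 42, so the division cannot fail.

```java
// Hypothetical wrapper around the foo/isFooSet example, with isFooSet volatile.
class VolatileFlag {
    static int foo = 0;
    static volatile boolean isFooSet = false;

    static int runDemo() {
        foo = 0;
        isFooSet = false;
        int[] result = new int[1];
        Thread thread1 = new Thread(() -> {
            foo = 42;        // plain write ...
            isFooSet = true; // ... cannot be reordered after this volatile write
        });
        Thread thread2 = new Thread(() -> {
            while (!isFooSet) {
                Thread.yield(); // wait until the flag is set in the other thread
            }
            // The volatile read that saw true happens-after the volatile write,
            // so foo == 42 is visible here and 42 / foo cannot divide by zero.
            result[0] = 42 / foo;
        });
        thread2.start();
        thread1.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 1
    }
}
```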

When reordering, the JVM takes happens-before relationships into account and does not make any reorderings that would be invalid under those relationships. Reorderings are a concern when you have a data race; see the book Java Concurrency in Practice, section 16.1.3:
A data race occurs when a variable is read by more than one thread, but the reads and writes are not ordered by happens-before. A correctly synchronized program is one with no data races; correctly synchronized programs exhibit sequential consistency, meaning that all actions within the program appear to happen in a fixed, global order.

I am not going to read the link, as it is pages and pages long, so please forgive me. But I think I understand the gist of your question. And I do remember that this link is chapter 2 or 3 of JCIP.
EDIT 1: Answering the second question: "no concurrency, no reordering":
One more thing I would like to add (to the excellent set of answers here) is that you are assigning to an int, so the assignment itself is atomic. Now imagine it is a double (whose non-volatile writes are not guaranteed to be atomic) or an object assignment. Without proper synchronization (and even with no reordering, as you take as a prerequisite), there are issues of "that object not being constructed properly" in test1 and used in test2.
For example:
SomeObject z = new SomeObject(yyy);
public void test1() {
z = new SomeObject(xxx);
}
public void test2() {
System.out.print(z);
}
Therefore my recommendation is to read the first 3 chapters of JCIP to get an idea of the Java Memory Model and these concerns.
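One common way to avoid the "not constructed properly" problem is safe publication. Below is a sketch that assumes a hypothetical SomeObject with a single final field (the original does not show its contents). The shared reference z is declared volatile, so a thread that reads the new reference also sees everything the constructor wrote. Final fields add a further guarantee of their own (JLS 17.5).

```java
// Hypothetical version of the test1/test2 example using safe publication.
class SafePublication {
    // Stand-in for SomeObject; the real class is not shown in the answer.
    static final class SomeObject {
        private final String label; // final field: seen fully initialized by other threads

        SomeObject(String label) {
            this.label = label;
        }

        @Override
        public String toString() {
            return "SomeObject(" + label + ")";
        }
    }

    // volatile: a thread that reads the new reference also sees the constructor's writes
    private volatile SomeObject z = new SomeObject("yyy");

    public void test1() {
        z = new SomeObject("xxx");
    }

    // Returns the description instead of printing it, so the result can be checked.
    public String test2() {
        return String.valueOf(z);
    }

    static String runDemo() {
        SafePublication shared = new SafePublication();
        String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            String s;
            do {
                s = shared.test2(); // either the old or the new object, never a half-built one
                Thread.yield();
            } while (!s.equals("SomeObject(xxx)"));
            seen[0] = s;
        });
        Thread writer = new Thread(shared::test1);
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints SomeObject(xxx)
    }
}
```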

Do I need to add some locks or synchronization if there is only one thread writing and several threads reading?

Say I have a global object:
class Global {
public static int remoteNumber = 0;
}
There is a thread that runs periodically to get a new number from the remote and update it (only writes):
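new Thread() {
@Override
public void run() {
while (true) {
int newNumber = getFromRemote();
Global.remoteNumber = newNumber;
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
return; // stop updating if the thread is interrupted
}
}
}
}.start();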
And there are one or more threads using this global remoteNumber randomly (only read):
int n = Global.remoteNumber;
doSomethingWith(n);
You can see I don't use any locks or synchronization to protect it. Is this correct? Is there any potential issue that might cause problems?
Update:
In my case, it's not really important that the reading threads get the latest value in real time. I mean, if some issue (caused by the lack of locking/synchronization) makes one reading thread miss that value, it doesn't matter, because it will have a chance to run the same code soon (maybe in a loop).
But reading an undetermined value is not allowed. I mean, if the old value is 20 and the new value is 30, could the reading threads read a non-existent value, say 33? I'm not sure if that's possible.

You need synchronization here (with one caveat, which I'll discuss later).
The main problem is that the reader threads may never see any of the updates the writer thread makes. Usually any given write will be seen eventually. But here your update loop is so simple that a write could easily be held in cache and never make it out to main memory. So you really must synchronize here.
EDIT 11/2017: I'm going to update this and say that it's probably not realistic that a value could be held in cache for so long. I think it is an issue, though, that a variable access like this could be optimized by the compiler and held in a register. So synchronization (or volatile) is still needed to tell the optimizer to actually fetch a new value on each loop iteration.
So you either need to use volatile, or you need to use a (static) getter and setter methods, and you need to use the synchronized keyword on both methods. For an occasional write like this, the volatile keyword is much lighter weight.
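A sketch of the getter/setter alternative, assuming the Global class from the question is changed so the field is private. Both accessors are static synchronized, so they lock on Global.class, and each set happens-before any later get. The demo class `GlobalDemo` is hypothetical; its updater publishes 10, 20 and 30 as stand-ins for getFromRemote() results, and the reader can only ever observe one of the values actually written (or the initial 0).

```java
// Global with static synchronized accessors (both methods lock Global.class).
class Global {
    private static int remoteNumber = 0; // guarded by Global.class

    public static synchronized int getRemoteNumber() {
        return remoteNumber;
    }

    public static synchronized void setRemoteNumber(int n) {
        remoteNumber = n;
    }
}

// Hypothetical demo: one writer thread, the main thread polling as the reader.
class GlobalDemo {
    static int runDemo() {
        Thread updater = new Thread(() -> {
            for (int n = 1; n <= 3; n++) {      // a few updates instead of while (true)
                Global.setRemoteNumber(n * 10); // stands in for getFromRemote()
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        updater.start();
        int seen;
        do {
            seen = Global.getRemoteNumber(); // only ever 0, 10, 20 or 30
            Thread.yield();
        } while (seen != 30);
        try {
            updater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 30
    }
}
```

For a single occasional writer like this, declaring the field volatile achieves the same visibility with less overhead, as noted above.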
The caveat is that if you truly don't need to see timely updates from the writer thread, you don't have to synchronize. If an indefinite delay won't affect your program's functionality, you could skip the synchronization. But something like this on a timer doesn't look like a good use case for omitting synchronization.
EDIT: Per Brian Goetz in Java Concurrency in Practice, it is not allowed for Java/a JVM to show you "indeterminate" values -- values that were never written. Those are more technically called "out of thin air" values and they are disallowed by the Java spec. You are guaranteed to see some write that was previously made to your global variable, either the zero it was initialized with, or some subsequent write, but no other values are permitted.

Reader threads can read an old value for an indefinite time, but in practice there is no problem. That's because each thread has its own copy of this variable, and the copies are only synchronized now and then. You can use the volatile keyword to remove this optimisation:
public static volatile int remoteNumber = 0;

Why does marking a Java variable volatile make things less synchronized?

So I just learned about the volatile keyword while writing some examples for a section that I am TAing tomorrow. I wrote a quick program to demonstrate that the ++ and -- operations are not atomic.
public class Q3 {
private static int count = 0;
private static class Worker1 implements Runnable{
public void run(){
for(int i = 0; i < 10000; i++)
count++; // static nested class: accesses the static field Q3.count directly
}
}
private static class Worker2 implements Runnable{
public void run(){
for(int i = 0; i < 10000; i++)
count--; // static nested class: accesses the static field Q3.count directly
}
}
public static void main(String[] args) throws InterruptedException {
while(true){
Thread T1 = new Thread(new Worker1());
Thread T2 = new Thread(new Worker2());
T1.start();
T2.start();
T1.join();
T2.join();
System.out.println(count);
count = 0;
Thread.sleep(500);
}
}
}
As expected the output of this program is generally along the lines of:
-1521
-39
0
0
0
0
0
0
However, when I change:
private static int count = 0;
to
private static volatile int count = 0;
my output changes to:
0
3077
1
-3365
-1
-2
2144
3
0
-1
1
-2
6
1
1
I've read When exactly do you use the volatile keyword in Java? so I feel like I've got a basic understanding of what the keyword does (maintain synchronization across cached copies of a variable in different threads but is not read-update-write safe). I understand that this code is, of course, not thread safe. It is specifically not thread-safe to act as an example to my students. However, I am curious as to why adding the volatile keyword makes the output not as "stable" as when the keyword is not present.

The question "why does the code run worse" with the volatile keyword is not a valid question. It is behaving differently because of the different memory model that is used for volatile fields. The fact that your program's output tended towards 0 without the keyword cannot be relied upon and if you moved to a different architecture with differing CPU threading or number of CPUs, vastly different results would not be uncommon.
Also, it is important to remember that although x++ seems atomic, it is actually a read/modify/write operation. If you run your test program on a number of different architectures, you will find different results because how the JVM implements volatile is very hardware dependent. Accessing volatile fields can also be significantly slower than accessing cached fields -- sometimes by 1 or 2 orders of magnitude which will change the timing of your program.
Use of the volatile keyword does erect a memory barrier for the specific field and (as of Java 5) this memory barrier is extended to all other shared variables. This means that the value of the variables will be copied in/out of central storage when accessed. However, there are subtle differences between volatile and the synchronized keyword in Java. For example, there is no locking happening with volatile so if multiple threads are updating a volatile variable, race conditions will exist around non-atomic operations. That's why we use AtomicInteger and friends which take care of increment functions appropriately without synchronization.
Here's some good reading on the subject:
Java theory and practice: Managing volatility
The volatile keyword in Java
Hope this helps.

An educated guess at what you're seeing - when not marked as volatile the JIT compiler is using the x86 inc/dec operations which can update the variable atomically. Once marked volatile these operations are no longer used and the variable is instead read, incremented/decremented, and then finally written causing more "errors".
The non-volatile setup has no guarantees it'll function well though - on a different architecture it could be worse than when marked volatile. Marking the field volatile does not begin to solve any of the race issues present here.
One solution would be to use the AtomicInteger class, which does allow atomic increments/decrements.
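A sketch of the Q3 test rewritten with AtomicInteger. The class name `AtomicQ3` and the `runOnce` helper are hypothetical. incrementAndGet() and decrementAndGet() perform the whole read-modify-write atomically, so once both workers have finished, the count is always 0 regardless of how the threads interleave.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical AtomicInteger version of the Q3 test program.
class AtomicQ3 {
    private static final AtomicInteger count = new AtomicInteger(0);

    // Runs one increment worker and one decrement worker, then returns the count.
    static int runOnce(int iterations) {
        count.set(0);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                count.incrementAndGet(); // atomic read-modify-write
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                count.decrementAndGet(); // atomic read-modify-write
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count.get();
    }

    public static void main(String[] args) {
        for (int run = 0; run < 5; run++) {
            System.out.println(runOnce(10_000)); // prints 0 every time
        }
    }
}
```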

Volatile variables act as if each interaction is enclosed in a synchronized block. As you mentioned, increment and decrement is not atomic, meaning each increment and decrement contains two synchronized regions (the read and the write). I suspect that the addition of these pseudolocks is increasing the chance that the operations conflict.
In general the two threads would have a random offset from one another, meaning that the likelihood of either one overwriting the other is even. But the synchronization imposed by volatile may be forcing them into inverse lockstep, which, if they mesh together the wrong way, increases the chance of a missed increment or decrement. Further, once they get into this lockstep, the synchronization makes it less likely that they will break out of it, increasing the deviation.

I stumbled upon this question and after playing with the code for a little bit found a very simple answer.
After initial warm up and optimizations (the first 2 numbers before the zeros) when the JVM is working at full speed T1 simply starts and finishes before T2 even starts, so count is going all the way up to 10000 and then to 0.
When I changed the number of iterations in the worker threads from 10000 to 100000000 the output is very unstable and different every time.
The reason for the unstable output when adding volatile is that it makes the code much slower and even with 10000 iterations T2 has enough time to start and interfere with T1.

The reason for all those zeroes is not that the ++'s and --'s are balancing each other out. The reason is that there is nothing here to cause count in the looping threads to affect count in the main thread. You need synchronized blocks or a volatile count (a "memory barrier") to force the JVM to make every thread see the same value. With your particular JVM/hardware, what is most likely happening is that the value is kept in a register at all times and never gets to cache, let alone main memory.
In the second case you are doing what you intended: non-atomic increments and decrements on the same variable, and getting results something like what you expected.
This is an ancient question, but something needed to be said about each thread keeping its own, independent copy of the data.
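The "value kept in a register" effect is easiest to see with a stop flag. Here is a sketch; the class `StopFlag` and `runDemo` are hypothetical. Without volatile, the JIT is allowed to hoist the read of `running` out of the loop, and the worker may spin forever. With volatile, every iteration performs a fresh read, so the worker is guaranteed to observe the main thread's write and stop.

```java
// Hypothetical stop-flag example: the volatile read is repeated on every iteration.
class StopFlag {
    private static volatile boolean running = true;

    // Returns how many loop iterations the worker ran, or -1 if it did not stop.
    static long runDemo() {
        running = true;
        long[] iterations = new long[1];
        Thread worker = new Thread(() -> {
            long n = 0;
            while (running) { // volatile read: cannot be cached in a register
                n++;
            }
            iterations[0] = n;
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(100);
            running = false;    // the worker must eventually observe this write
            worker.join(5_000); // wait at most 5 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.isAlive() ? -1 : iterations[0];
    }

    public static void main(String[] args) {
        System.out.println(runDemo() >= 0 ? "worker stopped" : "worker still running");
    }
}
```

Removing `volatile` from `running` may or may not make the loop hang on a given JVM; the point is that nothing would guarantee termination.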

If you see a value of count that is not a multiple of 10000, it just shows that you have a poor optimiser.

It doesn't 'make things less synchronized'. It makes them more synchronized, in that threads will always 'see' an up to date value for the variable. This requires erection of memory barriers, which have a time cost.
