Java: Allocation of new objects and cache coherency

In Java, suppose you have two threads T1 and T2 running simultaneously on two different processors P1 and P2.
At first, thread T2 works with some object obj that is allocated at (say) starting memory location 0x1000. This causes P2 to internally cache the value at that memory location. T2 then nulls out the (only) reference to the object and it is garbage collected.
Thread T1 then does
Foo fooRef = new Foo();
fooRef.x = 10;
and it just happens that fooRef.x's location is also at 0x1000, because this instance of Foo was allocated re-using memory that was freed by T2 above.
T1 then passes the fooRef reference to thread T2 (via a queue, or some other shared memory mechanism).
Will T2 see the old stale cached value from before, or will it see the new value of 10?
Let's say there is no hardware cache coherency mechanism. Does Java itself ensure the clearing of every processor's cache when it deallocates or allocates memory for an object? (Even with a hardware cache coherency mechanism in place, the coherency propagation is not instantaneous, and T2 might still happen to read the stale value if no other coherency measures are taken by Java itself.)

If you don't properly synchronise, then T2 could in principle see one of three things (not necessarily with equal probability):
(a) an apparently correctly formed object, but containing incorrect data;
(b) an object that isn't properly formed in the first place (i.e. never mind your data, the actual housekeeping metadata belonging to the object is not properly visible, potentially causing "bad things to happen");
(c) accidentally, you "dodge the bullet" as it were and T2 sees the object as T1 left it.
If you properly synchronise (or put another way, properly publish the object) then T2 will see the object as T1 defined it. In this article on the final keyword and further articles linked to at the bottom, I discuss some of the issues and solutions. Some of the answers to this previous question on What is object publishing and why do we need it? may also help.
So, practically[*] all of the time, you need to properly synchronise. It is dangerous to try and guess which of the situations (a), (b) or (c) will occur if you don't properly synchronise.
[*] There are very occasional advanced techniques where synchronisation can be safely avoided if you can genuinely calculate all of the possible "paths" resulting from lack of synchronisation, such as a technique referred to as synchronisation piggybacking where you effectively know that synchronisation will be performed 'in time' somewhere else. I recommend you don't go down this route!
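As a minimal illustration of "properly publishing" (not taken from the linked articles; the class and field names here are made up), publishing the reference through a volatile field is one of the simplest correct options, alongside final fields, synchronized blocks and concurrent queues:

    class Publisher {
        static class Foo {
            int x;
        }

        // Without volatile (or another publication mechanism), a reader thread
        // could observe a stale or partially constructed Foo.
        private volatile Foo shared;

        void writer() {              // runs on T1
            Foo foo = new Foo();
            foo.x = 10;
            shared = foo;            // volatile write: publishes the fully initialised object
        }

        void reader() {              // runs on T2
            Foo foo = shared;        // volatile read
            if (foo != null) {
                System.out.println(foo.x);   // guaranteed to print 10
            }
        }
    }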

You will not see "junk" left over from the first object.
Each primitive in the object will contain either its initial value (0, false, etc) or some value that had been put there at some point -- though reordering may produce weird mixes of values. Additionally, if a primitive is a two-word value (long or double), you may see only one of those words updated: this could produce a value that no thread has ever put there, but it's consistent with the above in that you are seeing the effects of a write to this object -- you're just not seeing all of that write. But you're still not seeing the effects of a write on some totally other, random object.
For reference values, you'll either see the initial value (null) or a correct reference to a constructed object -- though that object's values are subject to the same vague rules as above (they can be either the initial value or any other value some other thread has put in, with reorderings etc allowed).
Now, I can't actually find the exact place in the JLS where this is written. But there are several parts that strongly imply it. For instance, JLS 17.4.5 states in an example:
Since there is no synchronization, each read can see either the write of the initial value or the write by the other thread.
Emphasis mine, but note that it lists the values that the read can see; it doesn't say "each read can see anything, including junk bytes left over from previous objects."
Also, in 17.4.8, another example states:
Since the reads come first in each thread, the very first action in the execution order must be a read. If that read cannot see a write that occurs later, then it cannot see any value other than the initial value for the variable it reads.
(Emphasis mine again). Note that this, though it's in an example and not in the "main" body, explicitly says that junk reads as you describe are not allowed.
And then, JLS 17.7 is all about the non-atomicity of 64 bit primitives (the long and double values I mentioned above). Again, if there were absolutely no guarantees about the bytes you see, then it wouldn't be meaningful to note that you can see one word from one write and another word from another write. In other words, the fact that the JLS says that you can see "broken" values that arise from only one word being updated, is a strong suggestion that you can't see "broken" values that arise from just complete left-over junk.
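To make the JLS 17.7 point concrete, here is a small hypothetical sketch (names are made up) of the word tearing the spec permits for non-volatile long fields; most 64-bit JVMs will never actually show it, and declaring the field volatile rules it out entirely:

    class TearingDemo {
        long value;                  // NOT volatile: 64-bit reads/writes need not be atomic (JLS 17.7)
        // volatile long value;      // would guarantee atomic 64-bit access

        void writer() {
            value = 0xFFFFFFFFFFFFFFFFL;     // may legally be written as two 32-bit halves
        }

        void reader() {
            long v = value;                  // could legally observe e.g. 0x00000000FFFFFFFF
            System.out.println(Long.toHexString(v));
        }
    }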

Java has no access to the underlying hardware cache, so it does not "ensure the clearing of every processor's cache".
Most modern, real CPUs provide for cache coherency. Some real CPUs require a memory barrier under some circumstances. Your hypothetical CPU without a hardware mechanism will likely suffer from a stale cache under the conditions described.

As long as the accesses to fooRef and fooRef.x are properly synchronized, thread T2 will see the latest value of fooRef.x, i.e., 10.
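For example, a hand-off through a java.util.concurrent queue (a sketch with made-up names; any properly synchronized hand-over works the same way) gives T2 that guarantee, because the queue's internal synchronization establishes the happens-before edge:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    class Handoff {
        static class Foo { int x; }

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Foo> queue = new ArrayBlockingQueue<>(1);

            Thread t1 = new Thread(() -> {
                Foo foo = new Foo();
                foo.x = 10;
                queue.add(foo);              // everything T1 did before this is visible to the taker
            });

            Thread t2 = new Thread(() -> {
                try {
                    Foo foo = queue.take();          // take() happens-after the add()
                    System.out.println(foo.x);       // always prints 10, never a stale value
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            t1.start();
            t2.start();
            t1.join();
            t2.join();
        }
    }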

Related

Does 'volatile' guarantee that any thread reads the most recently written value?

From the book Effective Java:
While the volatile modifier performs no mutual exclusion, it guarantees that any thread that reads the field will see the most recently written value
SO and many other sources claim similar things.
Is this true?
I mean really true, not a close-enough model, or true only on x86, or only in Oracle JVMs, or some definition of "most recently written" that's not the standard English interpretation...
Other sources (SO example) have said that volatile in Java is like acquire/release semantics in C++. Which I think do not offer the guarantee from the quote.
I found that in the JLS 17.4.4 it says "A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)." But I don't quite understand.
There are quite a few sources for and against this, so I'm hoping the answer is able to convince me that many of those (on either side) are indeed wrong - for example with a reference or spec citation, or counter-example code.
Is this true?
I mean really true, not a close-enough model, or true only on x86, or only in Oracle JVMs, or some definition of "most recently written" that's not the standard English interpretation...
Yes, at least in the sense that a correct implementation of Java gives you this guarantee.
Unless you are using some exotic, experimental Java compiler/JVM (*), you can essentially take this as true.
From JLS 17.4.5:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
(*) As Stephen C points out, such an exotic implementation that doesn't implement the memory model semantics described in the language spec can't usefully (or even legally) be described as "Java".
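As a concrete illustration of what that happens-before edge buys you (a minimal sketch, not taken from the JLS), the classic message-passing pattern relies on it: once the reader sees the volatile flag set, it is also guaranteed to see the plain write that preceded it:

    class MessagePassing {
        int data;                    // plain field
        volatile boolean ready;      // volatile flag

        void writer() {
            data = 42;
            ready = true;            // volatile write: happens-before any read that observes true
        }

        void reader() {
            if (ready) {             // volatile read
                System.out.println(data);    // guaranteed to print 42, never 0
            }
        }
    }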
The quote per se is correct in terms of what it tries to convey, but it is incorrect on a broader view.
It tries to make a distinction between sequential consistency and release/acquire semantics, at least in my understanding. The difference between these two terms is rather "thin", but very important. I have tried to simplify the difference at the beginning of this answer or here.
The author is trying to say that volatile offers sequential consistency, as implied by:
"... it guarantees that any thread.."
If you look at the JLS, it has this sentence:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
The tricky part there is that word subsequent and its meaning, which has been discussed here. What it really means is "subsequent read that observes that write". So happens-before is guaranteed when the reader observes the value that the writer has written.
This already implies that a write is not necessarily seen on the next read, and this can be the case where speculative execution is allowed. So in this regard, the quote is misleading.
The quote that you found:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)
is complicated to understand without much broader context. In simple words, it establishes a synchronizes-with order (and implicitly happens-before) between two threads, where the volatile variable v is shared between them. Here is an answer with a broader explanation that should make more sense.
It is not true. The JMM is based on sequential consistency, and for sequential consistency real-time ordering isn't guaranteed; for that you need linearizability. In other words, reads and writes can be skewed as long as the program order isn't violated (or, more precisely, as long as it can't be proven that program order was violated).
A read of a volatile variable a needs to see the most recently written value before it in the memory order, but that doesn't imply real-time ordering.
Good read about the topic:
https://concurrency-interest.altair.cs.oswego.narkive.com/G8KjyUtg/relativity-of-guarantees-provided-by-volatile.
I'll make it concrete:
Imagine there are 2 CPUs and a (volatile) variable A with initial value 0. CPU1 does a store A=1 and CPU2 does a load of A, and both CPUs have the cacheline containing A in the SHARED state.
The store is first speculatively executed and written to the store buffer; eventually the store commits and retires, but since the stored value is still in the store buffer, it isn't visible yet to CPU2. Up to this point it wasn't required for the cacheline to be in an EXCLUSIVE/MODIFIED state, so the cacheline on CPU2 still contains the old value and hence CPU2 can still read the old value.
So in the real time order, the write of A is ordered before the read of A=0, but in the synchronization order, the write of A=1 is ordered after the read of A=0.
Only when the store leaves the store buffer and wants to enter the L1 cache is a request for ownership (RFO) sent to all other CPUs, invalidating the cacheline containing A on CPU2 (RFO prefetching I'll leave out of the discussion). If CPU2 now reads A, it is guaranteed to see A=1 (the request will block until CPU1 has completed the store to the L1 cache).
On acknowledgement of the RFO the cacheline is set to MODIFIED on CPU1 and the store is written to the L1 cache.
So there is a period of time between when the store is executed/retired and when it is visible to another CPU. But the only way to determine this would be to add special measuring equipment to the CPUs.
I believe a similar delaying effect can happen on the reading side with invalidation queues.
In practice this will not be an issue because store buffers have a limited capacity and need to be drained eventually (so a write can't be invisible indefinitely). So in day to day usage you could say that a volatile read, reads the most recent write.
A Java volatile write/read provides release/acquire semantics, but keep in mind that volatile is stronger than release/acquire: a volatile write/read is sequentially consistent, and release/acquire isn't.
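A sketch of the store-load (Dekker-style) litmus test that separates the two (illustrative code, not from the answer above): with both fields volatile, the JMM's total synchronization order forbids the outcome r1 == 0 && r2 == 0, whereas plain release/acquire semantics would allow it:

    class StoreLoadLitmus {
        volatile int x, y;
        int r1, r2;

        void thread1() {
            x = 1;       // volatile store
            r1 = y;      // volatile load
        }

        void thread2() {
            y = 1;       // volatile store
            r2 = x;      // volatile load
        }
        // After both threads finish, (r1, r2) may be (0, 1), (1, 0) or (1, 1), but never (0, 0).
    }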

VarHandle get/setOpaque

I keep fighting to understand what VarHandle::setOpaque and VarHandle::getOpaque are really doing. It has not been easy so far - there are some things I think I get (but will not present them in the question itself, not to muddy the waters), but overall this is misleading at best for me.
The documentation:
Returns the value of a variable, accessed in program order...
Well in my understanding if I have:
int xx = x; // read x
int yy = y; // read y
These reads can be re-ordered. On the other hand if I have:
// simplified code, does not compile, but reads happen on the same "this" for example
int xx = VarHandle_X.getOpaque(x);
int yy = VarHandle_Y.getOpaque(y);
This time re-orderings are not possible? And is this what "program order" means? Are we talking about insertion of barriers here for this re-ordering to be prohibited? If so, since these are two loads, would the same be achieved via:
int xx = x;
VarHandle.loadLoadFence()
int yy = y;
But it gets a lot trickier:
... but with no assurance of memory ordering effects with respect to other threads.
I could not come up with an example to even pretend I understand this part.
It seems to me that this documentation is targeted at people who know exactly what they are doing (and I am definitely not one)... So can someone shed some light here?
Well in my understanding if I have:
int xx = x; // read x
int yy = y; // read y
These reads can be re-ordered.
These reads may not only happen to be reordered, they may not happen at all. The thread may use an old, previously read value for x and/or y, or values it previously wrote to these variables even though, in fact, the write may not have been performed yet; so the "reading thread" may use values that no other thread knows of and that are not in the heap memory at that time (and probably never will be).
On the other hand if I have:
// simplified code, does not compile, but reads happen on the same "this" for example
int xx = VarHandle_X.getOpaque(x);
int yy = VarHandle_Y.getOpaque(y);
This time re-orderings are not possible? And is this what "program order" means?
Simply said, the main feature of opaque reads and writes is that they will actually happen. This implies that they cannot be reordered with respect to other memory accesses of at least the same strength, but that has no impact on ordinary reads and writes.
The term program order is defined by the JLS:
… the program order of t is a total order that reflects the order in which these actions would be performed according to the intra-thread semantics of t.
That’s the evaluation order specified for expressions and statements. The order in which we perceive the effects, as long as only a single thread is involved.
Are we talking about insertions of barriers here for this re-ordering to be prohibited?
No, there is no barrier involved, which might be the intention behind the phrase “…but with no assurance of memory ordering effects with respect to other threads”.
Perhaps, we could say that opaque access works a bit like volatile was before Java 5, enforcing read access to see the most recent heap memory value (which makes only sense if the writing end also uses opaque or an even stronger mode), but with no effect on other reads or writes.
So what can you do with it?
A typical use case would be a cancellation or interruption flag that is not supposed to establish a happens-before relationship. Often, the stopped background task has no interest in perceiving actions made by the stopping task prior to signalling, but will just end its own activity. So writing and reading the flag with opaque mode would be sufficient to ensure that the signal is eventually noticed (unlike the normal access mode), but without any additional negative impact on the performance.
Likewise, a background task could write progress updates, like a percentage number, which the reporting (UI) thread is supposed to notice timely, while no happens-before relationship is required before the publication of the final result.
It’s also useful if you just want atomic access for long and double, without any other impact.
Since truly immutable objects using final fields are immune to data races, you can use opaque modes for timely publishing immutable objects, without the broader effect of release/acquire mode publishing.
A special case would be periodically checking a status for an expected value update and, once available, querying the value with a stronger mode (or executing the matching fence instruction explicitly). In principle, a happens-before relationship can only be established between the write and its subsequent read anyway, but since optimizers usually don't have the horizon to identify such an inter-thread use case, performance-critical code can use opaque access to optimize such a scenario.
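A sketch of the cancellation-flag use case described above (the class and field names are made up): opaque mode guarantees the worker eventually notices the flag, unlike plain mode, without paying for a happens-before edge it doesn't need:

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class CancellableTask implements Runnable {
        private static final VarHandle CANCELLED;
        static {
            try {
                CANCELLED = MethodHandles.lookup()
                        .findVarHandle(CancellableTask.class, "cancelled", boolean.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        private boolean cancelled;   // accessed only through the VarHandle

        void cancel() {
            CANCELLED.setOpaque(this, true);     // opaque write: will become visible eventually
        }

        @Override
        public void run() {
            while (!(boolean) CANCELLED.getOpaque(this)) {  // opaque read: can't be hoisted out of the loop
                // do a unit of work
            }
        }
    }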
Opaque means that the thread executing an opaque operation is guaranteed to observe its own actions in program order, but that's it.
Other threads are free to observe the thread's actions in any order. On x86 this is a common case since it has a
write ordered with store-buffer forwarding
memory model, so even if the thread does a store before a load, the store can be cached in the store buffer and some thread executing on another core may observe the thread's actions in the reverse order, load-store instead of store-load. So an opaque operation comes for free on x86 (on x86 we actually also get acquire for free; see this extremely exhaustive answer for details on some other architectures and their memory models: https://stackoverflow.com/a/55741922/8990329)
Why is it useful? Well, I could speculate that if some thread observed a value stored with opaque memory semantics, then a subsequent read will observe "at least this or a later" value (plain memory access does not provide such guarantees, does it?).
Also, since Java 9 VarHandles are somewhat related to acquire/release/consume semantics in C/C++, I think it is worth noting that opaque access is similar to memory_order_relaxed, which is defined in the standard as follows:
For memory_order_relaxed, no operation orders memory.
with some examples provided.
I have been struggling with opaque myself and the documentation is certainly not easy to understand.
From the above link:
Opaque operations are bitwise atomic and coherently ordered.
The bitwise atomic part is obvious. Coherently ordered means that loads/stores to a single address have some total order, each read sees the most recent write to that address before it, and the order is consistent with the program order. For some coherence examples, see the following JCStress test.
Coherence doesn't provide any ordering guarantees between loads/stores to different addresses, so it doesn't need to provide any fences to order loads/stores to different addresses.
With opaque, the compiler will emit the loads/stores as it sees them. But the underlying hardware is still allowed to reorder load/stores to different addresses.
I upgraded your example to the message-passing litmus test:
thread1:
    X.setOpaque(1);
    Y.setOpaque(1);

thread2:
    ry = Y.getOpaque();
    rx = X.getOpaque();
    if (ry == 1 && rx == 0) println("Oh shit");
The above could fail on a platform that allows the 2 stores or the 2 loads to be reordered (again, e.g. ARM or PowerPC). Opaque is not required to provide causality. JCStress has a good example of that as well.
Also, the following IRIW example can fail:
thread1:
    X.setOpaque(1);

thread2:
    Y.setOpaque(1);

thread3:
    rx_thread3 = X.getOpaque();
    [LoadLoad]
    ry_thread3 = Y.getOpaque();

thread4:
    ry_thread4 = Y.getOpaque();
    [LoadLoad]
    rx_thread4 = X.getOpaque();
Can it be that we end up with rx_thread3=1,ry_thread3=0,ry_thread4=1 and rx_thread4 is 0?
With opaque this can happen. Even though the loads are prevented from being reordered, opaque accesses do not require multi-copy-atomicity (stores to different addresses issued by different CPUs can be seen in different orders).
Release/acquire is stronger than opaque; since this outcome is allowed to fail even with release/acquire, it is also allowed to fail with opaque. So opaque is not required to provide consensus.
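For what it's worth, here is a sketch of how the message-passing test from above might look as a JCStress test (class name and outcome descriptions are made up): on x86 the interesting outcome never shows up, while weaker hardware is allowed to produce it under opaque mode:

    import org.openjdk.jcstress.annotations.*;
    import org.openjdk.jcstress.infra.results.II_Result;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    @JCStressTest
    @Outcome(id = {"0, 0", "0, 1", "1, 1"}, expect = Expect.ACCEPTABLE, desc = "Normal interleavings")
    @Outcome(id = "1, 0", expect = Expect.ACCEPTABLE_INTERESTING, desc = "Stores or loads reordered under opaque mode")
    @State
    public class OpaqueMessagePassing {
        static final VarHandle X, Y;
        static {
            try {
                MethodHandles.Lookup l = MethodHandles.lookup();
                X = l.findVarHandle(OpaqueMessagePassing.class, "x", int.class);
                Y = l.findVarHandle(OpaqueMessagePassing.class, "y", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        int x, y;

        @Actor
        void writer() {
            X.setOpaque(this, 1);
            Y.setOpaque(this, 1);
        }

        @Actor
        void reader(II_Result r) {
            r.r1 = (int) Y.getOpaque(this);
            r.r2 = (int) X.getOpaque(this);
        }
    }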

Does ConcurrentMap.remove() provide a happens-before edge before get() returns null?

Are actions in a thread prior to calling ConcurrentMap.remove() guaranteed to happen-before actions subsequent to seeing the removal from another thread?
Documentation says this regarding objects placed into the collection:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
Example code:
{
    final ConcurrentMap map = new ConcurrentHashMap();
    map.put(1, new Object());
    final int[] value = { 0 };
    new Thread(() -> {
        value[0]++;
        value[0]++;
        value[0]++;
        value[0]++;
        value[0]++;
        map.remove(1); // A
    }).start();
    new Thread(() -> {
        if (map.get(1) == null) { // B
            System.out.println(value[0]); // expect 5
        }
    }).start();
}
Is A in a happens-before relationship with B? Therefore, should the program only, if ever, print 5?
You have found an interesting subtle aspect of these concurrency tools that is easy to overlook.
First, it's impossible to provide a general guarantee regarding removal and the retrieval of a null reference, as the latter only proves the absence of a mapping but not a previous removal, i.e. the thread could have read the map's initial state, before the key ever had a mapping, which, of course, can't establish a happens-before relationship with the actions that happened after the map's construction.
Also, if there are multiple threads removing the same key, you can't assume a happens-before relationship when retrieving null, as you don't know which removal has been completed. This issue is similar to the scenario in which two threads insert the same value, but the latter can be fixed on the application side by only performing insertions of distinguishable values, or by following the usual pattern of performing the desired modifications on the value object that is going to be inserted and querying only the retrieved object. For a removal, there is no such fix.
In your special case, there's a happens-before relationship between the map.put(1, new Object()) action and the start of the second thread, so if the second thread encounters null when querying the key 1, it's clear that it witnessed the sole removal in your code; still, the specification didn't bother to provide an explicit guarantee for this special case.
Instead, the specification of Java 8’s ConcurrentHashMap says,
Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
clearly ruling out null retrievals.
I think, with the current (Java 8) ConcurrentHashMap implementation, your code can’t break as it is rather conservative in that it performs all access to its internal backing array with volatile semantics. But that is only the current implementation and, as explained above, your code is a special case and likely to become broken with every change towards a real-life application.
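A sketch of the safer pattern that follows from this (illustrative names): let the reader key off a value that was put, which the ConcurrentHashMap specification does cover, instead of a null that follows a remove:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class SafePattern {
        final ConcurrentMap<Integer, Object> map = new ConcurrentHashMap<>();
        final int[] value = {0};

        void writer() {
            for (int i = 0; i < 5; i++) {
                value[0]++;
            }
            map.put(1, "done");                 // publish after the writes
        }

        void reader() {
            if (map.get(1) != null) {           // non-null retrieval: happens-after the put
                System.out.println(value[0]);   // guaranteed to print 5
            }
        }
    }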
No, you have the order wrong.
There is a happens-before edge from the put() to the subsequent get(). That edge is not symmetric, and doesn't work in the other direction. There is no happens-before edge from a get() to another get() or a remove(), or from a put() to another put().
In this case, you put an object in the map. Then you modify another object. That's a no-no. There's no edge from those writes to the get() in the second thread, so those writes may not be visible to the second thread.
On Intel hardware, I think this will always work. However, it isn't guaranteed by the Java memory model, so you have to be wary if you ever port this code to different hardware.
A does not need to happen before B.
Only the original put happens before both. Thus a null at B means that A happened.
However, nothing is said about the write-back of the thread-local memory cache or the instruction ordering of the ++ operations and the remove. volatile is not used; instead a Map and an array are used in the hope of keeping the thread data in sync. Once the data is written back, the in-order relation should hold again.
To my understanding, the remove at A could happen and be written back, then the last ++ could happen, and something like 4 could be printed at B. I would add volatile to the array. The Map itself will be fine.
I am far from certain, but as I did not see a corresponding answer, I'll stick my neck out. (If only to learn something myself.)
As ConcurrentHashMap is a thread-safe collection, the statement map.remove(1) must have a read barrier and a write barrier if it alters the map, and the expression map.get(1) must have a read barrier; otherwise one or both of those operations would not be thread safe.
In reality, ConcurrentHashMap up to Java 7 uses partitioned locks, so it has a read/write barrier for nearly every operation.
A ConcurrentSkipListMap doesn't have to use locks, but to perform any thread safe write action, a write barrier is required.
This means your test should always act as expected.

Why or when would a Map.get(..) need synchronization?

This is a code snippet from collections's SynchronizedMap. My question is not specific to the code snippet below - but a generic one: Why does a get operation need synchronization?
public V get(Object key) {
    synchronized (mutex) { return m.get(key); }
}
If your threads are only ever getting from the Map, the synchronization is not needed. In this case it might be a good idea to express this fact by using an immutable map, like the one from the Guava libraries, this protects you at compile time from accidentally modifying the map anyway.
The trouble begins when multiple threads are reading and modifying the map, because the internal structure of, e.g. the HashMap implementation from the Java standard libraries is not prepared for that. In this case you can either wrap an external serialization layer around that map, like
using the synchronized keyword,
slightly safer would be to use a SynchronizedMap, because then you can't forget the synchronized keyword everywhere it's needed,
protect the map using a ReadWriteLock, which would allow multiple concurrently reading threads (which is fine)
switch to a ConcurrentHashMap altogether, which is prepared for being accessed by multiple threads.
But coming back to your original question, why is the synchronization needed in the first place: this is a bit hard to tell without looking at the code of the class. Possibly it would break when a put or remove from one thread causes the bucket count to change, which would cause a reading thread to see too many / too few elements because the resize is not finished yet. Maybe it's something completely different; I don't know, and it's not really important, because the exact reason(s) why it is unsafe can change at any time with a new Java release. The important fact is only that it is not supported and your code will likely blow up one way or another at runtime.
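As an illustration of the ReadWriteLock option from the list above (a minimal sketch, not a drop-in replacement for SynchronizedMap), readers can proceed concurrently while writers get exclusive access:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class RwLockedMap<K, V> {
        private final Map<K, V> map = new HashMap<>();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        public V get(K key) {
            lock.readLock().lock();          // many readers may hold this at once
            try {
                return map.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        public V put(K key, V value) {
            lock.writeLock().lock();         // writers are exclusive
            try {
                return map.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }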
If the table gets resized in the middle of the call to get(), it could potentially look in the wrong bucket and return null incorrectly.
Consider the steps that happen in m.get():
A hash is calculated for the key.
The current length of the table (the buckets in the HashMap) is read.
This length is used to calculate the correct bucket to get from the table.
The bucket is retrieved and the entries in the bucket are walked until a match is found or until the end of the bucket is reached.
If another thread changes the map and causes the table to be resized in between 2 & 3, the wrong bucket could be used to look for the entry, potentially giving an incorrect result.
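A simplified, purely illustrative sketch of those four steps (this is not the real HashMap code): because the bucket index is computed from the table length read in step 2, a concurrent resize can send the lookup to the wrong bucket:

    class UnsafeBucketLookup<K, V> {
        static final class Entry<K, V> {
            final int hash;
            final K key;
            V value;
            Entry<K, V> next;
            Entry(int hash, K key, V value, Entry<K, V> next) {
                this.hash = hash; this.key = key; this.value = value; this.next = next;
            }
        }

        Entry<K, V>[] table;   // replaced by a new, larger array on resize

        V unsafeGet(K key) {
            int hash = key.hashCode();            // 1. hash the key
            Entry<K, V>[] t = table;              // 2. read the current table (and its length)
            int bucket = (t.length - 1) & hash;   // 3. compute the bucket from that length
            for (Entry<K, V> e = t[bucket]; e != null; e = e.next) {   // 4. walk the bucket
                if (e.hash == hash && key.equals(e.key)) {
                    return e.value;
                }
            }
            // If another thread resized the map between steps 2 and 3 (or is mid-resize),
            // the entry may live in a different bucket of the new table: spurious null.
            return null;
        }
    }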
The reason why synchronization is needed in a concurrent environment is that Java operations aren't atomic. This means that a single Java operation like counter++ causes the underlying VM to execute more than one machine operation:
Read value
Increment value
Write value
While those three operations are performed, another thread T2 may be scheduled and read the old value of that variable, e.g. 10. T1 increments that value and writes 11 back. But T2 has read the value 10! In case T2 also increments this value, the result stays the same, namely 11 instead of 12.
Synchronisation avoids such concurrency errors.
T1:
Set synchronizer token
Read value
Another thread T2 was invoked and tries to read the value. But since the synchronizer token was already set, T2 has to wait.
Increment value
Write value
Remove synchronizer token
T2:
Set synchronizer token
Read value
Increment value
Write value
Remove synchronizer token
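The two usual fixes for this lost-update problem, sketched below (an illustrative class, not from the answer above): a synchronized increment giving mutual exclusion, or an AtomicInteger using an atomic read-modify-write. Either way, two concurrent increments starting from 10 end at 12, not 11:

    import java.util.concurrent.atomic.AtomicInteger;

    class Counters {
        private int counter;                             // plain field, guarded by 'this'
        private final AtomicInteger atomicCounter = new AtomicInteger();

        synchronized void incrementLocked() {
            counter++;          // read-increment-write happens entirely under the lock
        }

        void incrementAtomic() {
            atomicCounter.incrementAndGet();             // atomic read-modify-write
        }
    }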
By synchronising the get method you are forcing the thread to cross a memory barrier and read the value from main memory. If you didn't synchronise the get method, the JVM would be free to apply underlying optimisations that might result in that thread, blissfully unaware, reading a stale value from registers or caches.

Volatility of objects other than class variables

I used to believe that any variable that is shared between two threads, can be cached thread-locally and should be declared as volatile. But that belief has been challenged recently by a teammate. We are trying to figure out whether volatile is required in the following case or not.
class Class1
{
    void Method1()
    {
        Worker worker = new Worker();
        worker.start();
        ...
        System.out.println(worker.value); // want to poll value at this instant
        ...
    }

    class Worker extends Thread
    {
        int value = 0; // Should this be declared as a volatile?

        public void run()
        {
            ...
            value = 1; // this is the only piece of code that updates value
            ...
        }
    }
}
Now my contention is that it is possible that the Worker (child) thread could have cached the variable "value" of the Worker object within the thread and updated just its copy when setting the value to 1. In such a case, the main thread may not see the updated value.
But my teammate believes that since the access to "value" is happening through an object (worker), therefore for both the threads to see different values, it could only be possible if both the threads were maintaining separate copies of "worker" object itself (which would further mean that creation of a thread involves creating a deep copy of all the shared objects).
Now I know that that can't be true, for it would be highly inefficient for each thread to maintain wholly different copies of all shared objects. Hence, I am in serious doubt. Does doing a "worker.value" in the main thread reference a different memory location than doing a "this.value" in the child thread? Will the child (Worker) thread cache "value"?
Now my contention is that it is possible that the Worker (child) thread could have cached the variable "value" of the Worker object thread-locally and updated just its copy when setting the value to 1. In such a case, the main thread may not see the updated value.
You are correct. Even though you are both dealing with the same Worker instance, there is no guarantee that the cached versions of the Worker's fields have been synchronized between the different threads' memory caches.
The value field must be marked as volatile to guarantee that other threads will see the value = 1; update to the value field.
But my teammate believes that since the access to "value" is happening through an object (worker), therefore for both the threads to see different values, it could only be possible if both the threads were maintaining separate copies of "worker" object itself...
No, this is not correct. The tricky part about thread memory revolves around processor memory caches. Without the memory barrier imposed by volatile, a processor is completely free to cache memory. So even though both threads would be working with the same instance of the Worker, they may have a locally cached copy of the memory associated with it.
Thread architectures get much of their speed because they are working with separate high-speed processor-local memory as opposed to always referencing central storage.
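A sketch of the fix described above; the poll is restructured into a loop here to make the visibility issue concrete (Thread.onSpinWait() needs Java 9+):

    class Class1 {
        void method1() {
            Worker worker = new Worker();
            worker.start();
            while (worker.value == 0) {         // poll; with volatile this loop is guaranteed to terminate
                Thread.onSpinWait();
            }
            System.out.println(worker.value);   // prints 1
        }

        static class Worker extends Thread {
            volatile int value = 0;             // without volatile, the polling loop above may never see the update

            @Override
            public void run() {
                value = 1;
            }
        }
    }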
But my teammate believes that since the access to "value" is happening through an object (worker), therefore for both the threads to see different values, it could only be possible if both the threads were maintaining separate copies of "worker" object itself (which would further mean that creation of a thread involves creating a deep copy of all the shared objects).
What your coworker does not realize is that values of instance variables (any variables for that matter) can be cached temporarily in machine registers, or in the processor's first or second-level memory caches. The Java Language Specification explicitly says that two threads won't necessarily see the same values for the same variable unless they have taken the appropriate steps.
There is a whole section of the JLS that deals with this issue: JLS 17.4. I recommend that both you and your co-worker read this and 17.5 and 17.6 as well if you are going to debate how Java behaves in this area. Or you could read the last chapter of "Java Concurrency in Practice" by Brian Goetz et al which is rather more easy to read than the JLS.
I'd recommend that you and your co-worker don't rely on your intuition about how threading ought to work. Read the specs. Some aspects of thread behaviour are not intuitive ... though there are good reasons why they are the way they are.
