volatile hashmap with intermittent addition of new entries

I have a use case where I will have a HashMap that starts out empty. As the application runs, the cache will get filled. Multiple threads will access the entries from the cache concurrently. The entries accessed by the threads will not be modified. These are read-only copies.
But the requirement is that if any particular thread does not find the copy of the object it is looking for in the cache, it will create the object and add it to the cache. Once that copy is available, it does not have to be created again.
The reason I am thinking of using a volatile HashMap is that volatile enforces happens-before semantics, so if the map gets a new entry, all threads will be able to see it. Since the threads won't ever modify the entries in the cache, I am hesitant to use ConcurrentHashMap. Is my understanding correct?

No, this would not work how you're expecting. I see this particular view of volatile frequently enough that it is worth examining.
A field declared volatile, and the object it refers to, gains no special concurrency properties beyond two actions:
volatile read (e.g. accessing the field)
volatile write (e.g. assignment to the field)
That's it. And these special actions (JLS §17.4.2) only apply to actions on the member field itself, and not to any methods that may be called on the stored object.
For example:
private volatile List<Foo> foos = null;
private void assign() {
foos = new ArrayList<>(); // This is a volatile write
// This mutation is not handled any differently than any other list.
// It is not special simply because the list referenced happens
// to be assigned to a volatile. If another thread accessed the
// field 'foos' it may see an inconsistent state (null, empty list,
// or possibly worse) because this is not thread-safe.
foos.add(new Foo());
}
Think of the above mutation instead like this:
List<Foo> local = foos; // This is a volatile read, and is specially handled
local.add(new Foo()); // There is no special handling of this mutation
So then, what's the point of volatile? Well, as previously stated, the only two actions that are treated differently are assigning to a volatile field and accessing a volatile field. A volatile write creates what's called a 'happens-before' relationship with any subsequent reads (accesses) of that field in other threads. In short, if thread A executes a volatile write to a field and thread B then accesses that field, thread B will see it either in its previous state or in the new state. There are no in-between states that thread B could see the object in, unlike the example above, where another thread could see foos in an inconsistent state. You can use this to your advantage:
private volatile List<Bar> bars = null;
private void assign() {
List<Bar> local = new ArrayList<>(); // local copy
local.add(new Bar());
bars = local; // Volatile write
}
The difference here is that we created an object locally, fully initialized it, and then assigned it to the volatile member. Now any thread that accesses the field 'bars' will see either null (the previous state) or a fully constructed list with one element. This of course only holds true as long as you don't try to mutate the list in place, and, importantly, as long as the list doesn't mutate itself when you call accessor methods.
Also, just use a ConcurrentHashMap.
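As a minimal sketch of that suggestion, the cache from the question could look roughly like the class below. The class name ReadMostlyCache and the factory parameter are hypothetical, introduced only for illustration. It relies on ConcurrentHashMap.computeIfAbsent, which applies the mapping function at most once per key; the factory is assumed to return non-null values and not to touch the cache itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical read-mostly cache: values are created once and never modified.
final class ReadMostlyCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> factory; // assumed to return non-null values

    ReadMostlyCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // Creates the value on the first request for a key; later calls
        // (from any thread) return the already cached instance.
        return entries.computeIfAbsent(key, factory);
    }
}
```

Readers that hit an existing entry do not block each other, so the read-only access pattern described in the question stays cheap.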

Related

How to create thread safe object array in Java?

I've searched for this question and I only found answers for primitive type arrays.
Let's say I have a class called MyClass and I want to have an array of its objects in another class.
class AnotherClass {
[modifiers(?)] MyClass[] myObjects;
void initFunction( ... ) {
// some code
myObjects = new MyClass[] { ... };
}
MyClass accessFunction(int index) {
return myObjects[index];
}
}
I read somewhere that declaring an array volatile does not give volatile access to its elements, but assigning a new value to the array reference is safe.
So, if I understand it well, if I give my array a volatile modifier in my example code, it would be (kinda?) safe, provided I never change its values with the [] operator.
Or am I wrong? And what should I do if I want to change one of its values? Should I create a new instance of the array and replace the old value with the new one in the initial assignment?
AtomicXYZArray is not an option because it is only good for primitive type arrays. AtomicIntegerArray uses native code for get() and set(), so it didn't help me.
Edit 1:
Collections.synchronizedList(...) can be a good alternative I think, but now I'm looking for arrays.
Edit 2: initFunction() is called from a different class.
AtomicReferenceArray seems to be a good answer. I didn't know about it until now. (I'm still interested in whether my example code would work with the volatile modifier on the array, with only these two functions called from somewhere else.)
This is my first question. I hope I managed to reach the formal requirements. Thanks.

Yes, you are correct that the volatile keyword alone will not cover your case, as it protects the reference to the array and not its elements.
If you want both, Collections.synchronizedList(...) or another synchronized collection is the easiest way to go.
Adding modifiers to the field, as you are inclined to do, is not the way to achieve this, because modifiers do not affect the elements.
If you really must use an array like this one: new MyClass[]{ ... };
then AnotherClass is the one that needs to take responsibility for its safety. You are probably looking for lower-level synchronization here: the synchronized keyword and locks.
The synchronized keyword is the easier of the two: you can create blocks and methods that lock on an object, or on the current instance (this) by default for instance methods.
At a higher level you can use Streams to perform the job for you. But in the end, I would suggest you use a synchronized version of an ArrayList if you are currently using arrays, with a volatile reference to it if necessary. If you do not update the reference to your array after your class is created, you don't need volatile, and you had better make it final if possible.

For your data to be thread-safe you want to ensure that there are no simultaneous:
write/write operations
read/write operations
by threads to the same object. This is known as the readers/writers problem. Note that it is perfectly fine for two threads to read data from the same object at the same time.
You can enforce the above properties to a satisfactory level in normal circumstances by using the synchronized modifier (which acts as a lock on objects) and atomic constructs (which perform operations "instantaneously") in methods and for members. This essentially ensures that no two threads can access the same resource at the same time in a way that would lead to bad interleaving.
if I give my array a volatile modifier in my example code, it would be (kinda?) safe.
The volatile keyword guarantees that a write to the array reference is visible to any thread that subsequently reads it (informally, threads cannot keep working with a stale cached copy of the reference). This helps with visibility, although it won't guarantee thread safety by itself. Also, volatile should be used sparingly except by experienced programmers, as it may cause unintended effects on the program.
And what should I do if I want to change one of its values? Should I create a new instance of the array and replace the old value with the new one in the initial assignment?
Create synchronized mutator methods for the mutable members of your class if they need to be changed or use the methods provided by atomic objects within your classes. This would be the simplest approach to changing your data without causing any unintended side-effects (for example, removing the object from the array whilst a thread is accessing the data in the object being removed).
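The synchronized-mutator approach could be sketched as follows. GuardedArray and its method names are hypothetical and not part of the question's code; the point is only that every read and write of the array goes through the same monitor.

```java
import java.util.Arrays;

// Hypothetical holder: every access to the array is guarded by "this".
final class GuardedArray<T> {
    private Object[] items = new Object[0];

    synchronized void replaceAll(T[] newItems) {
        // Defensive copy so the caller cannot mutate our array afterwards.
        items = Arrays.copyOf(newItems, newItems.length, Object[].class);
    }

    synchronized void set(int index, T value) {
        items[index] = value;
    }

    @SuppressWarnings("unchecked")
    synchronized T get(int index) {
        return (T) items[index];
    }

    synchronized int size() {
        return items.length;
    }
}
```

Because readers also take the lock, they never observe a half-finished replaceAll, at the cost of readers contending with each other.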

Volatile does actually work in this case with one caveat: all the operations on MyClass may only read values.
Contrary to much of what you might read about volatile, it has one purpose in the JMM: creating a happens-before relationship. It only affects two kinds of operations:
volatile read (e.g. accessing the field)
volatile write (e.g. assignment to the field)
That's it. A happens-before relationship, straight from the JLS §17.4.5:
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
These relationships are transitive. Taken all together this implies some important points: All actions taken on a single thread happened-before that thread's volatile write to that field (third point above). A volatile write of a field happens-before a read of that field (point two). So any other thread that reads the volatile field would see all the updates, including all referred to objects like array elements in this case, as visible (first point). Importantly, they are only guaranteed to see the updates visible when the field was written. This means that if you fully construct an object, then assign it to a volatile field, and then never mutate it or any of the objects it refers to, it will never be in an inconsistent state. This is safe, taken with the caveat above:
class AnotherClass {
private volatile MyClass[] myObjects = null;
void initFunction( ... ) {
// Using a volatile write with a fully constructed object.
myObjects = new MyClass[] { ... };
}
MyClass accessFunction(int index) {
// volatile read
MyClass[] local = myObjects;
if (local == null) {
return null; // or something else
}
else {
// should probably check length too
return local[index];
}
}
}
I'm assuming you're only calling initFunction once. Even if you did call it more than once you would just clobber the values there, it wouldn't ever be in an inconsistent state.
You're also correct that updating this structure is not quite straightforward because you aren't allowed to mutate the array. Copy and replace, as you stated is common. Assuming that only one thread will be updating the values you can simply grab a reference to the current array, copy the values into a new array, and then re-assign the newly constructed value back to the volatile reference. Example:
private void add(MyClass newClass) {
// volatile read
MyClass[] local = myObjects;
if (local == null) {
// volatile write
myObjects = new MyClass[] { newClass };
}
else {
MyClass[] withUpdates = new MyClass[local.length + 1];
// copy the existing elements into the new, larger array
System.arraycopy(local, 0, withUpdates, 0, local.length);
withUpdates[local.length] = newClass;
// volatile write
myObjects = withUpdates;
}
}
If you're going to have more than one thread updating, then you're going to run into issues where you lose additions to the array, as two threads could each copy an old array and create a new array with their new element, and then the last write would win. In that case you need to either use more synchronization or AtomicReferenceFieldUpdater.
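For the multi-writer case, one possible shape is a compare-and-set retry loop over the volatile field. The sketch below uses AtomicReferenceFieldUpdater with String[] as a stand-in element type; the class CopyOnWriteStrings and its methods are hypothetical and only illustrate the idea.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical multi-writer variant of copy-and-replace.
final class CopyOnWriteStrings {
    private static final AtomicReferenceFieldUpdater<CopyOnWriteStrings, String[]> ITEMS =
            AtomicReferenceFieldUpdater.newUpdater(CopyOnWriteStrings.class, String[].class, "items");

    // The updater requires a volatile field; a published array is never mutated.
    private volatile String[] items = new String[0];

    void add(String value) {
        while (true) {
            String[] current = items; // volatile read
            String[] updated = Arrays.copyOf(current, current.length + 1);
            updated[current.length] = value;
            // Succeeds only if no other writer replaced the array in the meantime.
            if (ITEMS.compareAndSet(this, current, updated)) {
                return;
            }
            // Otherwise another writer won the race: retry against the newer array.
        }
    }

    String[] snapshot() {
        // Copy so that callers cannot mutate the published array.
        return items.clone();
    }
}
```

A failed compareAndSet means the copy was based on a stale array, so the loop starts over instead of overwriting the other thread's addition. For general-purpose use, java.util.concurrent.CopyOnWriteArrayList packages the same copy-on-write idea behind a List interface.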

Safe publication of immutable objects in Java

I want to understand if volatile is needed to publish immutable objects.
For example, assuming we have an immutable object A:
// class A is immutable
class A {
final int field1;
final int field2;
public A(int f1, int f2) {
field1 = f1;
field2 = f2;
}
}
Then we have a class B that is accessed from different threads. It holds a reference to an object of class A:
// class B publishes an object of class A through a field exposed by a getter
class B {
private /* volatile? */ A toShare;
// this getter might be called from different threads
public A getA(){
return toShare;
}
// this might be called from different threads
public void setA(int num1, int num2) {
toShare = new A(num1, num2);
}
}
From my reading it seems immutable objects can be safely published through any means, so does that mean we don't need to declare toShare as volatile to ensure its memory visibility?

No, you are not guaranteed that you'll be seeing all updates to the toShare field of your shared data. This is because your shared data does not use any synchronization constructs that guarantee its visibility or the visibility of references reachable through it across threads. This makes it open game for numerous optimizations on the compiler and hardware level.
You can safely change your toShare field to reference a String (which is also immutable for all your purposes) and you'll probably (and correctly) feel more uneasy about its update visibility.
Here you can see a rudimentary example I've created that can show how updates are lost without any additional measures to publish changes to the reference of an immutable object. I ran it using the -server JVM flag on JDK 8u65 and an Intel® Core™ i5-2557M, disregarding the possibly thrown NullPointerException, and saw the following results:
Without safe being volatile, the second thread doesn't terminate because it doesn't see many of the changes made by the first thread
Console output:
[T1] Shared data visible here is 2147483647
When safe is changed to be volatile, the second thread terminates alongside the first thread
Console output:
[T1] Shared data visible here is 2147483647
[T2] Last read value here is 2147483646
[T2] Shared data visible here is 2147483647
P.S. And a question to you - what happens if sharedData (and not safe) is made volatile? What could happen according to the JMM?

The answer is NO: you need to use volatile or some other mechanism (for example, adding the synchronized keyword to both the getter and the setter) to create a happens-before edge. Final field semantics only guarantee that if someone sees a reference to an instance of the class, all final fields have the values assigned by the constructor once it has finished:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.5
And this says nothing about the visibility of the reference itself. Since your example uses a non-final field
private A toShare;
you have to take care of the visibility of the field with volatile, a synchronized section, a java.util.concurrent.locks.Lock, an AtomicReference, etc., to guarantee cache synchronization. Some useful reading, BTW, about finals and safe publication: http://shipilev.net/blog/2014/safe-public-construction/
http://shipilev.net/blog/2014/all-fields-are-final/
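A minimal sketch of the volatile variant, using the hypothetical names Point and PointHolder in place of the question's A and B: the final fields protect the state of each instance, while volatile on the holder's field takes care of the visibility of the reference itself.

```java
// Immutable value: all fields are final and assigned in the constructor.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical holder shared between threads.
final class PointHolder {
    // volatile: a reader that runs after a set() sees the new reference.
    private volatile Point current;

    Point get() {
        return current; // volatile read
    }

    void set(int x, int y) {
        current = new Point(x, y); // volatile write of a fully constructed object
    }
}
```

Without volatile, a reader could never observe a half-built Point thanks to the final fields, but it would have no guarantee of ever seeing the latest reference.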

It seems like the JMM should take care of the visibility problem for publishing immutable objects, at least that's what is said in Java Concurrency in Practice, 3.5.2 Immutable Objects and Initialization Safety:
Because immutable objects are so important, the Java Memory Model offers a special guarantee of initialization safety
for sharing immutable objects. As we've seen, that an object reference becomes visible to another thread does not
necessarily mean that the state of that object is visible to the consuming thread. In order to guarantee a consistent view
of the object's state, synchronization is needed.
Immutable objects, on the other hand, can be safely accessed even when synchronization is not used to publish the
object reference. For this guarantee of initialization safety to hold, all of the requirements for immutability must be met:
unmodifiable state, all fields are final, and proper construction.
Immutable objects can be used safely by any thread without additional synchronization, even when synchronization is
not used to publish them.
The following section, 3.5.3 Safe Publication Idioms, states that safe publication is required only for non-immutable objects, using one of the following approaches:
Static initializer
Storing reference in volatile/final/AtomicReference
Storing reference that is guarded by the lock
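The first idiom in that list, publication through a static initializer, might look like the sketch below. ServicePorts and its contents are made up for illustration. Class initialization is synchronized by the JVM, so a thread that reads the static field sees the map fully populated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of publishing a mutable-by-type object via a static initializer.
final class ServicePorts {
    static final Map<String, Integer> DEFAULTS;

    static {
        Map<String, Integer> ports = new HashMap<>();
        ports.put("http", 80);
        ports.put("https", 443);
        // Wrapped so nobody can modify the map after publication.
        DEFAULTS = Collections.unmodifiableMap(ports);
    }

    private ServicePorts() {
    }
}
```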

Cheapest way of establishing happens-before with non-final field

Many questions/answers have indicated that if a class object has a final field and no reference to it is exposed to any other thread during construction, then all threads are guaranteed to see the value written to the field once the constructor completes. They have also indicated that storing into a final field a reference to a mutable object which has never been accessed by outside threads will ensure that all mutations which have been made to the object prior to the store will be visible on all threads which access the object via the field. Unfortunately, neither guarantee applies to writes of non-final fields.
A question I do not see answered, however, is this: If the semantics of a class are such that a field cannot be final, but one wishes to ensure the "publication" of the field and the object identified thereby, what is the most efficient way of doing that? As an example, consider
class ShareableDataHolder<T>
{
Object data; // Always identifies either a T or a SharedDataHolder<T>
}
private class SharedDataHolder<T> extends ShareableDataHolder<T>
{
Object data; // Always identifies either a T or a lower-numbered SharedDataHolder<T>
final long seq; // Immutable; necessarily unique
}
The intention would be that data will initially identify a data object directly, but that it could legitimately at any time be changed to identify a SharedDataHolder<T> which directly or indirectly encapsulates an equivalent data object. Assume all code is written to work correctly (though not necessarily optimally-efficiently) if any read of data may arbitrarily return any value that was ever written to data, but may fail if it reads null.
Declaring volatile Object data would be semantically correct, but would likely impose extra costs on every subsequent access to the field. Entering a dummy lock after initially setting the field would work, but would be needlessly slow. Having a dummy final field, which the object sets to identify itself would seem like it should work; although technically I think it might require that all accesses to the other field be done through the other field, I can't see any realistic scenario where that would matter. In any case, having a dummy field whose purpose is only to provide the appropriate synchronization via its existence would seem wasteful.
Is there any clean way to inform the compiler that a particular write to data within the constructor should have a happens-before relationship with regard to any reads of that field which occur after the constructor returns (as would be the case if the field were final), without having to pay the costs associated with volatile, locks, etc.? Alternatively, if a thread were to read data and find it null, could it somehow repeat the read in such a fashion as to establish a "happens after" with regard to the write of data [recognizing that such a request might be slow, but shouldn't need to happen very often]?
PS--If happens-before relationships are non-transitive, would a proper happens-before relationship exist in the following scenario?
Thread 1 writes to a non-final field dat in some object Fred and stores a reference to it into a final field George.
Thread 2 copies the reference from George into a non-final field Larry.
Thread 3 reads Larry.dat.
From what I can tell, a happens-before relationship exists between the write of Fred's field dat and a read of George. Would a happens-before relationship exist between the write of Fred's dat and a read of Larry that returns a reference to Fred that was copied from a final reference to Fred? If not, is there any "safe" way to copy a reference contained in a final field to a non-final field that would be accessible via other threads?
PPS--If an object and its constituents are never accessed outside their creation thread until the main constructor finishes, and the last step of the main constructor is to store within the main object a final reference to itself, is there any "plausible" implementation/scenario where another thread could see a partially-constructed object, whether or not anything actually uses that final reference?

Short answer
No.
Longer answer
JLS 17.4.5 lists all* of the ways of establishing a happens-before relationship, other than the special case of final field semantics:
1. An unlock on a monitor happens-before every subsequent lock on that monitor.
2. A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
3. A call to start() on a thread happens-before any actions in the started thread.
4. All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
5. The default initialization of any object happens-before any other actions (other than default-writes) of a program.
(The original lists them as bullet points; I'm changing them to numbers for convenience here.)
Now, you've ruled out locks (#1) and volatile fields (#2). Rules #3 and #4 relate to the life-cycle of the thread, which you don't mention in your question and which doesn't sound like it would apply. Rule #5 doesn't give you any non-null values, so it doesn't apply either.
So of the five possible methods for establishing happens-before, other than final field semantics, three don't apply and two you've explicitly ruled out.
* The rules listed in 17.4.5 are actually consequences of the synchronization order rules defined in 17.4.4, but those relate pretty directly to the ones mentioned in 17.4.5. I mention that because 17.4.5's list can be interpreted as being illustrative and thus non-exhaustive, but 17.4.4's list is non-illustrative and exhaustive, and you can make the same analysis from that directly, if you don't want to rely on the intermediate analysis that 17.4.5 provides.
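For completeness, here is a small sketch of the two thread life-cycle rules (start() and join()) in action, even though they do not fit the question's scenario. StartJoinDemo is a hypothetical class; both fields are deliberately plain, with no volatile and no locks.

```java
final class StartJoinDemo {
    private int input;  // plain field: no volatile, no lock
    private int output; // plain field: no volatile, no lock

    int doubleOnWorker(int value) {
        input = value; // written before start()
        Thread worker = new Thread(() -> output = input * 2);
        worker.start(); // start() happens-before the worker's actions
        try {
            worker.join(); // the worker's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return output; // guaranteed to see the worker's write
    }
}
```

The guarantee only covers this hand-off pattern: a third thread reading output without joining the worker gets no such ordering.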

You can apply final field semantics without making the fields of your class final but by passing your reference through another final field. For this purpose, you need to define a publisher class:
class Publisher<T> {
private final T value;
private Publisher(T value) { this.value = value; }
public static <S> S publish(S value) { return new Publisher<S>(value).value; }
}
If you are now working with an instance of ShareableDataHolder<T>, you can publish the instance by:
ShareableDataHolder<T> holder = new ShareableDataHolder<T>();
// set field values
holder = Publisher.publish(holder);
// Passing holder to other threads is now safe
This approach is tested and benchmarked and turns out to be the most performant alternative on current VMs. The overhead is minimal as escape analysis typically removes the allocation of the very short-lived Publisher instance.

Characteristics of a volatile hashmap

I am trying to get a firm handle on how a variable declared as
private volatile HashMap<Object, ArrayList<String>> data;
would behave in a multi-threaded environment.
What I understand is that volatile means get from main memory and not from the thread cache. That means that if a variable is being updated I will not see the new values until the update is complete and I will not block, rather what I see is the last updated value. (This is exactly what I want BTW.)
My question is when I retrieve the ArrayList<String> and add or remove strings to it in thread A while thread B is reading, what exactly is affected by the volatile keyword? The HashMap only or is the effect extended to the contents (K and V) of the HashMap as well? That is when thread B gets an ArrayList<String> that is currently being modified in thread A what is actually returned is the last value of ArrayList<String> that existed before the update began.
Just to be clear, let's say the update is adding 2 strings. One string has already been added in thread A when thread B gets the array. Does thread B get the array as it was before the first string was added?

That means that if a variable is being updated I will not see the new values until the update is complete and I will not block, rather what I see is the last updated value
This is your source of confusion. What volatile does is make sure that reads and writes to that field are atomic - so no other threads could ever see a partially written value.
A non-atomic long field (which takes 2 memory addresses on a 32-bit machine) could be read incorrectly if a write operation was preempted after writing to the first address, and before writing to the second address.
Note that the atomicity of reads/writes to a field has nothing to do with updating the inner state of a HashMap. Updating the inner state of a HashMap entails multiple instructions, which are not atomic as a whole. That's why you'd use locks to synchronize access to the HashMap.
Also, since read/write operations on references are always atomic, even if the field is not marked as volatile, there is no difference between a volatile and a non-volatile HashMap, regarding atomicity. In that case, all volatile does is give you acquire-release semantics. This means that, even though the processor and the compiler are still allowed to slightly reorder your instructions, no instructions may ever be moved above a volatile read or below a volatile write.
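The acquire-release behaviour can be illustrated with a volatile flag that guards a plain field. This is a sketch with a hypothetical MessageBox class: the ordinary write to payload cannot be moved below the volatile write to ready, and the read of payload cannot be moved above the volatile read of ready.

```java
final class MessageBox {
    private int payload;            // plain field
    private volatile boolean ready; // volatile flag

    void publish(int value) {
        payload = value; // ordinary write, ordered before the volatile write below
        ready = true;    // volatile write (release)
    }

    int awaitPayload() {
        while (!ready) { // volatile read (acquire)
            Thread.yield();
        }
        return payload;  // sees the value written before ready was set
    }
}
```

This works because the payload is written exactly once before the flag is set. It does not make later in-place changes to shared state safe, which is precisely the situation with a HashMap that keeps being mutated.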

The volatile keyword here applies only to the HashMap, not to the data stored within it, which in this case are ArrayLists.
As stated in HashMap documentation:
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more mappings; merely changing the value associated with a key that an
instance already contains is not a structural modification.) This is
typically accomplished by synchronizing on some object that naturally
encapsulates the map. If no such object exists, the map should be
"wrapped" using the Collections.synchronizedMap method. This is best
done at creation time, to prevent accidental unsynchronized access to
the map:
Map m = Collections.synchronizedMap(new HashMap(...));

The volatile keyword affects neither operations on the HashMap (e.g. put, get) nor operations on the ArrayLists within the HashMap. The volatile keyword only affects reads and writes of this particular reference to the HashMap. Again, there can be further references to the same HashMap, which are not affected.
If you want to synchronise all operations on:
- the reference
- the HashMap
- and the ArrayList,
then use an additional Lock object for synchronisation as in the following code.
private final Object lock = new Object();
private Map<Object, List<String>> map = new HashMap<>();
// access reference
synchronized (lock) {
map = new HashMap<>();
}
// access reference and HashMap
synchronized (lock) {
return map.containsKey(42);
}
// access reference, HashMap and ArrayList
synchronized (lock) {
map.get(42).add("foobar");
}
If the reference is not changed, you can use the HashMap for synchronization (instead of the Lock).
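If readers should always see a list either entirely before or entirely after an update, as the question asks, a different technique from the lock shown above is to keep immutable lists as values and replace them atomically. The sketch below swaps in ConcurrentHashMap.compute for that purpose; SnapshotMultimap and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical map whose list values are replaced, never mutated in place.
final class SnapshotMultimap<K> {
    private final ConcurrentMap<K, List<String>> data = new ConcurrentHashMap<>();

    void addAll(K key, List<String> additions) {
        // compute() runs atomically per key: build a new list, then swap it in.
        data.compute(key, (k, old) -> {
            List<String> copy = (old == null) ? new ArrayList<String>() : new ArrayList<String>(old);
            copy.addAll(additions);
            return Collections.unmodifiableList(copy);
        });
    }

    List<String> get(K key) {
        List<String> list = data.get(key);
        return (list == null) ? Collections.<String>emptyList() : list;
    }
}
```

A reader that fetched the list before an addAll call keeps its old snapshot; a reader that fetches it afterwards sees all added strings at once, never only the first of two.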

Effectively Immutable Object

I want to make sure that I correctly understand the 'Effectively Immutable Objects' behavior according to Java Memory Model.
Let's say we have a mutable class which we want to publish as an effectively immutable:
class Outworld {
// This MAY be accessed by multiple threads
public static volatile MutableLong published;
}
// This class is mutable
class MutableLong {
private long value;
public MutableLong(long value) {
this.value = value;
}
public void increment() {
value++;
}
public long get() {
return value;
}
}
We do the following:
// Create a mutable object and modify it
MutableLong val = new MutableLong(1);
val.increment();
val.increment();
// No more modifications
// UPDATED: Let's say for this example we are completely sure
// that no one will ever call increment() since now
// Publish it safely and consider Effectively Immutable
Outworld.published = val;
The question is:
Does Java Memory Model guarantee that all threads MUST have Outworld.published.get() == 3 ?
According to Java Concurrency In Practice this should be true, but please correct me if I'm wrong.
3.5.3. Safe Publication Idioms
To publish an object safely, both the reference to the object and the
object's state must be made visible to other threads at the same time.
A properly constructed object can be safely published by:
- Initializing an object reference from a static initializer;
- Storing a reference to it into a volatile field or AtomicReference;
- Storing a reference to it into a final field of a properly constructed object; or
- Storing a reference to it into a field that is properly guarded by a lock.
3.5.4. Effectively Immutable Objects
Safely published effectively immutable objects can be used safely by
any thread without additional synchronization.

Yes. The write operations on the MutableLong are followed by a happens-before relationship (on the volatile) before the read.
(It is possible that a thread reads Outworld.published and passes it on to another thread unsafely. In theory, that could see earlier state. In practice, I don't see it happening.)

There are a couple of conditions which must be met for the Java Memory Model to guarantee that Outworld.published.get() == 3:
the snippet of code you posted, which creates and increments the MutableLong and then sets the Outworld.published field, must execute with visibility between the steps. One way to achieve this trivially is to have all that code running in a single thread, guaranteeing "as-if-serial semantics". I assume that's what you intended, but thought it worth pointing out.
reads of Outworld.published must have happens-after semantics from the assignment. An example of this could be having the same thread execute Outworld.published = val; and then launch the other threads which could read the value. This would guarantee "as if serial" semantics, preventing re-ordering of the reads before the assignment.
If you are able to provide those guarantees, then the JMM will guarantee all threads see Outworld.published.get() == 3.
However, if you're interested in general program design advice in this area, read on.
For the guarantee that no other threads ever see a different value for Outworld.published.get(), you (the developer) have to guarantee that your program does not modify the value in any way, either by subsequently executing Outworld.published = differentVal; or Outworld.published.increment();. While that is possible to guarantee, it can be so much easier if you design your code to avoid both the mutable object and the use of a static non-final field as a global point of access for multiple threads:
instead of publishing MutableLong, copy the relevant values into a new instance of a different class, whose state cannot be modified. E.g.: introduce the class ImmutableLong, which assigns value to a final field on construction, and doesn't have an increment() method.
instead of multiple threads accessing a static non-final field, pass the object as a parameter to your Callable/Runnable implementations. This will prevent the possibility of one rogue thread from reassigning the value and interfering with the others, and is easier to reason about than static field reassignment. (Admittedly, if you're dealing with legacy code, this is easier said than done).
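Both suggestions can be combined in a few lines. The following is only a sketch: ImmutableLong is the class name proposed above, while ReadTask and ImmutableLongDemo are hypothetical helpers added to show the value being handed to a Callable instead of being read from a static field.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Immutable replacement for MutableLong: final field, no increment().
final class ImmutableLong {
    private final long value;

    ImmutableLong(long value) {
        this.value = value;
    }

    long get() {
        return value;
    }
}

// The task receives its input as a constructor parameter.
final class ReadTask implements Callable<Long> {
    private final ImmutableLong input;

    ReadTask(ImmutableLong input) {
        this.input = input;
    }

    @Override
    public Long call() {
        return input.get();
    }
}

final class ImmutableLongDemo {
    static long readOnWorker(long raw) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Submitting the task hands the immutable value to the worker thread.
            return pool.submit(new ReadTask(new ImmutableLong(raw))).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Since no thread can reassign or increment the value, there is nothing left for the developer to police by convention.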

The question is: Does Java Memory Model guarantee that all threads
MUST have Outworld.published.get() == 3 ?
The short answer is no, because other threads might access Outworld.published before it has been assigned.
After the moment when Outworld.published = val; has been performed, and provided that no other modifications are made to val, then yes, it will always be 3.
But if any thread calls val.increment(), then its value might be different for other threads.
