This seems wrong.
static ConcurrentHashMap<String, X> k; // multiple threads have access to k
X o = k.get("LL");
o.a = 6;
If multiple threads access k concurrently, each calling k.get("LL") and then updating the returned object (o.a = ...) without calling k.put("LL", o) and without synchronizing on 'o' or on 'k', what happens?
A ConcurrentMap has conditional operations that guarantee atomic insert/removal and replacement of key/value pairs. Additionally, accessing a ConcurrentMap creates a happens-before relationship so you can make certain guarantees about the ordering of your code.
In the code presented, the line:
X o = k.get("LL");
accesses the current X value for the key "LL". The next line assigns the a field directly; since this is Java and a plain field assignment, no method call is involved. If (and only if) the a field is declared volatile, then subsequent code that retrieves the X stored at "LL" will see the a value as 6. If it isn't volatile, there are no guarantees at all. Readers will probably see 6, particularly on an SMP x86 box with few busy threads; in production, on a big NUMA box, they are less likely to. Mutability brings with it all sorts of complications and difficulty.
Generally, you'll find it is easier to reason about the state the map is in if you use immutable keys AND values.
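To make the difference concrete, here is a minimal sketch. The value class X and its field a follow the question; the class name VolatileFieldDemo and the choice to declare the field volatile are illustrative assumptions, not something the question states:

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value class matching the question's usage.
class X {
    volatile int a; // volatile: the write below becomes visible to other threads
}

public class VolatileFieldDemo {
    static final ConcurrentHashMap<String, X> k = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        k.put("LL", new X());

        new Thread(() -> {
            X o = k.get("LL");
            o.a = 6; // a volatile write; no k.put(...) is needed for this field to become visible
        }).start();

        new Thread(() -> {
            X o = k.get("LL");
            // Because 'a' is volatile, a read that occurs after the writer's assignment
            // is guaranteed to observe 6. With a plain (non-volatile) int there is no
            // such guarantee, although on x86 you will usually see 6 anyway.
            System.out.println(o.a);
        }).start();
    }
}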
The ConcurrentHashMap guarantees that getting a value is atomic, but it can't control what you do with the values you get from it. Modifying values in the hashmap is fine from the ConcurrentHashMap's view but may still not result in the behaviour you want. To be sure about thread-safety you need to consider exactly what each thread that has access to it does.
Putting the value back in the ConcurrentHashMap seems redundant and doesn't make the whole operation any safer. You're already modifying the object outside of any synchronization.
Additional synchronization may be necessary, but I can't tell for sure without seeing more context.
Simply speaking:
o.a=6
is an atomic write (for an int field); all the threads will compete, and the last thread to set it "wins", overwriting the value.
More specifically, the ConcurrentHashMap only guarantees that the link between a key and its associated value is maintained with multiple threads taken into account, i.e. put and get are atomic.
This does not prevent any thread from modifying attributes of the value once it gets a reference to it!
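If the intent is to serialize concurrent updates to the same entry, one option (an assumption about the goal, not something the question states) is ConcurrentHashMap.compute, available since Java 8, which runs the update while the map holds its internal lock for that key:

// Sketch, reusing the question's map 'k' and the hypothetical value class X.
k.compute("LL", (key, o) -> {
    if (o != null) {
        o.a = 6;  // concurrent compute() calls for "LL" cannot interleave here
    }
    return o;     // keep the same (now mutated) value mapped to the key
});
// Note: this serializes updaters, but a reader that merely calls k.get("LL") and
// then reads a non-volatile field still needs its own visibility guarantee
// (e.g. declaring the field volatile, or also going through compute()).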
Related
Are actions in a thread prior to calling ConcurrentMap.remove() guaranteed to happen-before actions subsequent to seeing the removal from another thread?
Documentation says this regarding objects placed into the collection:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
Example code:
{
    final ConcurrentMap<Integer, Object> map = new ConcurrentHashMap<>();
    map.put(1, new Object());
    final int[] value = { 0 };

    new Thread(() -> {
        value[0]++;
        value[0]++;
        value[0]++;
        value[0]++;
        value[0]++;
        map.remove(1); // A
    }).start();

    new Thread(() -> {
        if (map.get(1) == null) { // B
            System.out.println(value[0]); // expect 5
        }
    }).start();
}
Is A in a happens-before relationship with B? Therefore, should the program only, if ever, print 5?
You have found an interesting subtle aspect of these concurrency tools that is easy to overlook.
First, it’s impossible to provide a general guarantee regarding removal and the retrieval of a null reference, because retrieving null only proves the absence of a mapping, not that a removal previously took place. The thread could have read the map’s initial state, before the key ever had a mapping, which, of course, can’t establish a happens-before relationship with actions that happened after the map’s construction.
Also, if there are multiple threads removing the same key, you can’t assume a happens-before relationship when retrieving null, as you don’t know which removal has completed. This issue is similar to the scenario in which two threads insert the same value, but that case can be fixed on the application side by only performing insertions of distinguishable values, or by following the usual pattern of performing the desired modifications on the value object that is about to be inserted and querying only the retrieved object. For a removal, there is no such fix.
In your special case, there’s a happens-before relationship between the map.put(1, new Object()) action and the start of the second thread, so if the second thread encounters null when querying the key 1, it’s clear that it witnessed the sole removal in your code; still, the specification didn’t bother to provide an explicit guarantee for this special case.
Instead, the specification of Java 8’s ConcurrentHashMap says,
Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
clearly ruling out null retrievals.
I think that, with the current (Java 8) ConcurrentHashMap implementation, your code can’t break, as the implementation is rather conservative in that it performs all accesses to its internal backing array with volatile semantics. But that is only the current implementation and, as explained above, your code is a special case that is likely to break with every change towards a real-life application.
No, you have the order wrong.
There is a happens-before edge from the put() to a subsequent get(). That edge is not symmetric and doesn't work in the other direction. There is no happens-before edge from a get() to another get() or a remove(), or from a put() to another put().
In this case, you put an object in the map. Then you modify another object. That's a no-no. There's no edge from those writes to the get() in the second thread, so those writes may not be visible to the second thread.
On Intel hardware, I think this will always work. However, it isn't guaranteed by the Java memory model, so you have to be wary if you ever port this code to different hardware.
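For contrast, here is a minimal sketch of the pattern the put-to-get edge does cover. It reuses the map and value array from the question's code; the key 2 and the "done" marker are hypothetical additions made for illustration:

// writer thread
value[0] = 5;                       // do the shared writes first...
map.put(2, "done");                 // ...then publish via the map

// reader thread
if (map.get(2) != null) {           // sees the value put above
    System.out.println(value[0]);   // guaranteed to print 5: the writes above
}                                   // happen-before this retrieval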
A does not need to happen before B.
Only the original put happens before both. Thus a null at B means that A happened.
However, nothing is said about when thread-local caches are written back, or about the ordering of the ++ operations relative to the remove. volatile is not used; instead, a Map and an array are used in the hope of keeping the thread data in sync. Once the data is written back, the in-order relation should hold again.
To my understanding, the removal at A could be written back first, the last ++ could happen after that, and something like 4 could then be printed at B. I would make the counter's writes visible (for example with volatile); the Map itself will be fine.
I am far from certain, but as I did not see a corresponding answer, I am sticking my neck out (in order to learn myself).
As ConcurrentHashMap is a thread-safe collection, the statement map.remove(1) must involve a read barrier and, if it alters the map, a write barrier. The expression map.get(1) must involve at least a read barrier; otherwise one or both of those operations would not be thread-safe.
In reality, ConcurrentHashMap up to Java 7 uses partitioned locks, so it has a read/write barrier for nearly every operation.
A ConcurrentSkipListMap doesn't have to use locks, but to perform any thread safe write action, a write barrier is required.
This means your test should always act as expected.
This is a code snippet from the map returned by Collections.synchronizedMap (SynchronizedMap). My question is not specific to the code snippet below, but a generic one: why does a get operation need synchronization?
public V get(Object key) {
    synchronized (mutex) { return m.get(key); }
}
If your threads are only ever getting from the Map, the synchronization is not needed. In this case it might be a good idea to express this fact by using an immutable map, like the one from the Guava libraries; this also protects you at compile time from accidentally modifying the map.
The trouble begins when multiple threads are reading and modifying the map, because the internal structure of, for example, the HashMap implementation from the Java standard library is not prepared for that. In this case you can either wrap an external serialization layer around that map, for example by
using the synchronized keyword,
using a SynchronizedMap, which is slightly safer because then you can't forget the synchronized keyword somewhere it's needed,
protecting the map with a ReadWriteLock, which would allow multiple concurrently reading threads (which is fine),
or switch to a ConcurrentHashMap altogether, which is prepared for being accessed by multiple threads.
But coming back to your original question, why is the synchronization needed in the first place: this is a bit hard to tell without looking at the code of the class. Possibly it would break when a put or remove from one thread causes the bucket count to change, which would cause a reading thread to see too many or too few elements because the resize is not finished yet. Maybe it's something completely different; I don't know, and it's not really important, because the exact reason(s) why it is unsafe can change at any time with a new Java release. The important fact is only that it is not supported, and your code will likely blow up one way or another at runtime.
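As an illustration of the ReadWriteLock option above, here is a minimal sketch (the wrapper class GuardedMap and its fields are illustrative, not from the original post):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Many readers may hold the read lock at once; writers take the exclusive write lock.
class GuardedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}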
If the table gets resized in the middle of the call to get(), it could potentially look in the wrong bucket and return null incorrectly.
Consider the steps that happen in m.get():
1. A hash is calculated for the key.
2. The current length of the table (the buckets in the HashMap) is read.
3. This length is used to calculate the correct bucket to get from the table.
4. The bucket is retrieved and the entries in the bucket are walked until a match is found or until the end of the bucket is reached.
If another thread changes the map and causes the table to be resized in between 2 & 3, the wrong bucket could be used to look for the entry, potentially giving an incorrect result.
The reason why synchronization is needed in a concurrent environment is that Java operations aren't atomic. This means that a single Java operation like counter++ causes the underlying VM to execute more than one machine operation:
Read value
Increment value
Write value
While those three operations are being performed, another thread T2 may be scheduled and read the old value of that variable, e.g. 10. T1 increments that value and writes 11 back. But T2 has read 10! If T2 then also increments the value, the result stays the same, namely 11 instead of 12.
Synchronisation avoids such concurrency errors.
T1:
Set synchronizer token
Read value
Another thread T2 was invoked and tries to read the value. But since the synchronizer token was already set, T2 has to wait.
Increment value
Write value
Remove synchronizer token
T2:
Set synchronizer token
Read value
Increment value
Write value
Remove synchronizer token
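As a minimal sketch of the two usual ways to make such an increment safe in Java (the class and field names here are illustrative):

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    private int counter;                                    // guarded by 'this'
    private final AtomicInteger atomicCounter = new AtomicInteger();

    // Option 1: the whole read-increment-write happens while holding the lock,
    // which plays the role of the "synchronizer token" described above.
    synchronized void incrementLocked() {
        counter++;
    }

    // Option 2: a lock-free atomic read-modify-write.
    void incrementAtomic() {
        atomicCounter.incrementAndGet();
    }
}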
By synchronising the get method you are forcing the thread to cross the memory barrier and read the value from main memory. If you didn't synchronise the get method, the JVM would be free to apply underlying optimisations that might result in the thread blissfully reading a stale value held in registers or caches.
This is a passage from the Javadoc for ConcurrentHashMap. It says retrieval operations generally do not block, so they may overlap with update operations. Does this mean the get() method is not thread-safe?
"However, even though all operations are thread-safe, retrieval
operations do not entail locking, and there is not any support for
locking the entire table in a way that prevents all access. This class
is fully interoperable with Hashtable in programs that rely on its
thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset."
The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
    return map.put(key, value);
else
    return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
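A minimal sketch of the atomic alternative (the helper class MapHelper and the method name getOrInsert are illustrative, not part of any library):

import java.util.concurrent.ConcurrentMap;

class MapHelper {
    // The contains-check and the insert happen as one atomic step inside putIfAbsent.
    static <K, V> V getOrInsert(ConcurrentMap<K, V> map, K key, V value) {
        V previous = map.putIfAbsent(key, value); // null means this call inserted 'value'
        return (previous != null) ? previous : value;
    }
}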
In answer to a comment asking for clarification on why this is a race condition:
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is thread-safe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is really a disaster, because thread B may call v2.updateSomething() and "think" that the consumers of the map (e.g. other threads) have access to that object and will see that possibly important update (like: "this visitor's IP address is trying to perform a DoS, refuse all requests from now on"). Instead, the object will soon be garbage collected and lost.
It is thread-safe. However, the way in which it is thread-safe may not be what you expect. There are some hints you can see in the Javadoc:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map interface provides some very basic read/update methods. Even if I wrote a thread-safe implementation of Map, there would be lots of cases in which people could not use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
    threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could both see that there is no such key, and both would therefore insert into the Map.
In order to fix the problem, we need to do extra synchronization explicitly. Assuming the thread-safety of my Map is achieved with the synchronized keyword, you would need to do:
synchronized (threadSafeMap) {
    if (!threadSafeMap.containsKey(key)) {
        threadSafeMap.put(key, value);
    }
}
Such extra code requires you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved with "synchronized".
The ConcurrentMap interface takes this one step further. It defines some common "complex" actions that involve multiple accesses to the map. For example, the example above is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most cases) don't need to synchronise actions that access the map multiple times. Hence, the implementation of the Map can employ a more sophisticated synchronization mechanism for better performance. ConcurrentHashMap is a good example: its thread-safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure, cause an update to be lost unexpectedly, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap is not using "synchronized" for its thread-safety. The logic of get itself takes care of the thread-safeness; and If you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention
Not only is retrieval non-blocking, even updates can happen concurrently. However, non-blocking reads and concurrent updates do not mean that the map is thread-UNsafe. They simply mean that it uses mechanisms other than plain "synchronized" for thread-safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exist
if (map.containsKey(key1) && map.containsKey(key2)) {
    map.remove(key1);
    map.remove(key2);
}
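One possible way to make such a compound action safe (an assumption about the fix, not something the ConcurrentMap API provides) is to guard every access to the map, including this check-then-act, with a single external lock; at that point a plain HashMap plus the lock may serve just as well:

// Illustrative sketch: 'map' and 'mapLock' are fields of the enclosing class.
// Every other thread that touches 'map' must synchronize on the same lock,
// otherwise the check-then-act below is still racy.
private final Map<Object, Object> map = new HashMap<>();
private final Object mapLock = new Object();

void removeBothIfPresent(Object key1, Object key2) {
    synchronized (mapLock) {
        if (map.containsKey(key1) && map.containsKey(key2)) {
            map.remove(key1);
            map.remove(key2);
        }
    }
}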
ConcurrentHashmap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException
It will return a result that was true at some (recent) time in the past. This means that two back-to-back calls to get can return different results. Of course, this is true of any other Map as well.
HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact: its synchronization mechanism is based on locking buckets rather than the entire Map. This way, several threads can write to different buckets simultaneously (only one thread can write to a given bucket at a time).
Reading from ConcurrentHashMap uses almost no synchronization. Synchronization is used when, while fetching the value for a key, the reader sees a null value. Since ConcurrentHashMap can't store null values (aside from keys, values can't be null either), fetching null while reading suggests that the reader caught another thread in the middle of initializing a map entry (key-value pair): the key had been assigned, but the value had not, and still held the default null.
In such a case the reading thread needs to wait until the entry has been fully written.
So the result of a get() is based on the current state of the map. If you read the value of a key that is in the middle of being updated, you will likely get the old value, since the write hasn't finished yet.
get() in ConcurrentHashMap is thread-safe because it reads the value, which is volatile. And in cases where the value for a key is null, the get() method waits until it acquires the lock and then reads the updated value.
When the put() method is updating the CHM, it sets the value of that key to null, then creates a new entry and updates the CHM. This null value is used by the get() method as a signal that another thread is updating the CHM with the same key.
It just means that when one thread is updating and one thread is reading, there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have its operation take effect first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is thread-safe.
One thread won't go into an infinite loop, start generating weird NullPointerExceptions, or get "itside" with half of the old status and half of the new.
I'm working with existing code that has an object store in the form of a ConcurrentHashMap. Within the map are stored mutable objects, used by multiple threads. By design, no two threads try to modify an object at once. My concern is the visibility of the modifications between the threads.
Currently the objects' code has synchronization on the "setters" (guarded by the object itself). There is no synchronization on the "getters" nor are the members volatile. This, to me, would mean that visibility is not guaranteed. However, when an object is modified it is re-put back into the map (the put() method is called again, same key). Does this mean that when another thread pulls the object out of the map, it will see the modifications?
I've researched this here on stackoverflow, in JCIP, and in the package description for java.util.concurrent. I've basically confused myself I think... but the final straw that caused me to ask this question was from the package description, it states:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
In relation to my question, do "actions" include the modifications to the objects stored in the map before the re-put()? If all this does result in visibility across threads, is this an efficient approach? I'm relatively new to threads and would appreciate your comments.
Edit:
Thank you all for your responses! This was my first question on StackOverflow and it has been very helpful to me.
I have to go with ptomli's answer because I think it most clearly addressed my confusion. To wit, establishing a "happens-before" relation doesn't necessarily affect modification visibility in this case. My "title question" is poorly constructed compared with my actual question described in the text. ptomli's answer now jibes with what I read in JCIP: "To ensure all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock" (page 37). Re-putting the object back into the map doesn't provide this common lock for the modifications to the inserted object's members.
I appreciate all the tips for change (immutable objects, etc), and I wholeheartedly concur. But for this case, as I mentioned there is no concurrent modification because of careful thread handling. One thread modifies an object, and another thread later reads the object (with the CHM being the object conveyer). I think the CHM is insufficient to ensure that the later executing thread will see the modifications from the first given the situation I provided. However, I think many of you correctly answered the title question.
You call concurrHashMap.put after each write to an object. However, you did not specify that you also call concurrHashMap.get before each read. This is necessary.
This is true of all forms of synchronization: you need to have some "checkpoints" in both threads. Synchronizing only one thread is useless.
I haven't checked the source code of ConcurrentHashMap to make sure that put and get trigger a happens-before, but it is only logical that they should.
There is still an issue with your method however, even if you use both put and get. The problem happens when you modify an object and it is used (in an inconsistent state) by the other thread before it is put. It's a subtle problem because you might think the old value would be read since it hasn't been put yet and it would not cause a problem. The problem is that when you don't synchronize, you are not guaranteed to get a consistent older object, but rather the behavior is undefined. The JVM can update whatever part of the object in the other threads, at any time. It's only when using some explicit synchronization that you are sure you are updating the values in a consistent way across threads.
What you could do:
(1) synchronize all accesses (getters and setters) to your objects everywhere in the code. Be careful with the setters: make sure that you can't leave the object in an inconsistent state. For example, when setting first and last name, having two synchronized setters is not sufficient: you must hold the object lock for both operations together (see the sketch after these two options).
or
(2) when you put an object in the map, put a deep copy instead of the object itself. That way the other threads will never read an object in an inconsistent state.
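Here is a minimal sketch of option (1), using a hypothetical Person class (not from the original question) to show why the two related fields must be updated and read under one lock:

// Hypothetical value object used only for illustration.
class Person {
    private String firstName;
    private String lastName;

    // One synchronized method updates both fields together, so readers can
    // never observe a new first name paired with an old last name.
    synchronized void setName(String first, String last) {
        this.firstName = first;
        this.lastName = last;
    }

    // The reader must synchronize on the same lock, otherwise the writes above
    // are not guaranteed to be visible to the reading thread.
    synchronized String getFullName() {
        return firstName + " " + lastName;
    }
}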
EDIT:
I just noticed
Currently the objects' code has synchronization on the "setters" (guarded by the object itself). There is no synchronization on the "getters" nor are the members volatile.
This is not good. As I said above, synchronizing in only one of the threads is no synchronization at all. You might synchronize all your writer threads, but that doesn't help much, since the readers are not guaranteed to see the right values.
I think this has already been said across a few answers, but to sum it up:
If your code goes
CHM#get
call various setters
CHM#put
then the "happens-before" provided by the put will guarantee that all the mutate calls are executed before the put. This means that any subsequent get will be guaranteed to see those changes.
Your problem is that the actual state of the object will not be deterministic because if the actual flow of events is
thread 1: CHM#get
thread 1: call setter
thread 2: CHM#get
thread 1: call setter
thread 1: call setter
thread 1: CHM#put
then there is no guarantee over what the state of the object will be in thread 2. It might see the object with the value provided by the first setter or it might not.
The immutable copy would be the best approach, as then only completely consistent objects are published. Making the various setters synchronized (or the underlying references volatile) still doesn't let you publish consistent state; it just means that the reader will always see the latest value for each getter on each call.
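A minimal sketch of the immutable-copy approach (the Point and Store classes and their fields are illustrative, not from the question):

import java.util.concurrent.ConcurrentHashMap;

// Immutable value: all fields are final and set in the constructor, so any
// thread that obtains a Point from the map sees a complete, consistent state.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    Point withX(int newX) {
        return new Point(newX, y); // "mutation" produces a fresh object
    }
}

class Store {
    private final ConcurrentHashMap<String, Point> map = new ConcurrentHashMap<>();

    void move(String key, int newX) {
        // computeIfPresent replaces the mapping atomically, so concurrent moves
        // on the same key cannot lose each other's updates.
        map.computeIfPresent(key, (k, current) -> current.withX(newX));
    }
}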
I think your question relates more to the objects you're storing in the map, and how they react to concurrent access, than the concurrent map itself.
If the instances you're storing in the map have synchronized mutators, but not synchronized accessors, then I don't see how they can be thread safe as described.
Take the Map out of the equation and determine if the instances you're storing are thread safe by themselves.
However, when an object is modified it is re-put back into the map (the put() method is called again, same key). Does this mean that when another thread pulls the object out of the map, it will see the modifications?
This exemplifies the confusion. The instance that is re-put into the Map will be retrieved from the Map by another thread. This is the guarantee of the concurrent map. That has nothing to do with visibility of the state of the stored instance itself.
My understanding is that it should work for all gets after the re-put, but this would be a very unsafe method of synchronization.
What happens to gets that occur before the re-put, but while modifications are happening? They may see only some of the changes, and the object would be in an inconsistent state.
If you can, I'd recommend store immutable objects in the map. Then any get will retrieve a version of the object that was current when it did the get.
That's a code snippet from java.util.concurrent.ConcurrentHashMap (OpenJDK 7):
public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    int h = hash(key.hashCode());
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}
UNSAFE.getObjectVolatile() is documented as a getter with volatile load semantics, so the memory barrier is crossed when the reference is read.
Yes, put incurs a volatile write, even if the key-value pair already exists in the map.
Using ConcurrentHashMap to publish objects across threads is pretty efficient. Objects should not be modified further once they are in the map. (They don't have to be strictly immutable, i.e. with final fields.)
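A minimal sketch of that publication pattern (the Config and Publisher classes are illustrative): the object is fully populated before the put and never touched afterwards, so the put's happens-before edge publishes all of its state.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Config {
    int timeoutMillis;   // plain, non-final fields are acceptable here because...
    String endpoint;     // ...the object is never modified after it is put
}

class Publisher {
    static final Map<String, Config> configs = new ConcurrentHashMap<>();

    static void publish(String name, int timeout, String endpoint) {
        Config c = new Config();
        c.timeoutMillis = timeout;   // all writes happen before the put...
        c.endpoint = endpoint;
        configs.put(name, c);        // ...so readers that get() this entry see them
    }
}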
In the book Java Servlet Programming, there's an example servlet on page 54 which searches for primes in a background thread. Each time a client accesses the servlet, the most recently found prime number is returned.
The variable which is used to store the most recently found prime is declared as such:
long lastprime = 0;
Since this variable is being accessed from multiple threads (the background thread that's doing the calculations and any client threads that are reading it), doesn't it need to be declared volatile or have its access synchronized in some way?
Yes, assuming you really want to see the most recently calculated prime on any thread, it should either be volatile or be accessed in a thread-safe way via synchronized blocks/methods. Additionally, as pointed out in the comments, non-volatile long variables may not be updated atomically - so you could see the top 32 bits of an old value and the bottom 32 bits of a new value (or vice versa).
I forgot about the atomicity side of things earlier because it's almost always solved automatically when you make sure you read the most recently published value and make sure you fully publish new values. In practice this is almost always what you want, so atomicity becomes a non-issue if your code is working properly to start with.
It's not a SingleThreadModel servlet, is it? That would obviously make a difference.
Another alternative would have been to use AtomicLong.
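For example, a minimal sketch of the AtomicLong alternative (the PrimeHolder class and its method names are hypothetical, not from the book):

import java.util.concurrent.atomic.AtomicLong;

// The background search thread publishes each new prime here, and the
// servlet's request threads read it. AtomicLong gives both visibility and
// atomic 64-bit reads/writes, so there are no torn values and no stale reads.
class PrimeHolder {
    private final AtomicLong lastPrime = new AtomicLong(0);

    void publish(long prime) {   // called by the background search thread
        lastPrime.set(prime);
    }

    long current() {             // called by servlet request threads
        return lastPrime.get();
    }
}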
Yes. A servlet's instance variables aren't thread-safe.
If there is a clean read/write split between the threads (one thread "publishes" the last prime for others to read), then you can get away with just making it volatile.
If the access pattern involved some read-modify-write sequences or the like, then you'd have to synchronize the access to the field.
Assuming Java 5 or later, declaring it volatile gives well-defined semantics as described here. On the principle of removing doubt from the code maintainer's mind, I would use volatile, saying "yes, I know that multiple threads use this variable".
The interesting question is the effect of not declaring it volatile. Provided that you got a prime, does it matter whether it's the very latest available? Volatile ensures that values are read from memory, not from CPU caches, so you should get a more up-to-date value.
What about the possibility of seeing a partial assignment? Could you get really unlucky and see a long whose low-order bits are part of an old value and whose high-order bits are part of a different value? Well, assignments to longs and doubles are not atomic, so in theory, yes!
Ergo, volatile or synchronized is not just a nice-to-have: you need it.
The semantics of a volatile variable in Java are not strong enough to make the increment operation (lastprime++) atomic, unless you can guarantee that the variable is written from only a single thread, which is not guaranteed in a servlet's case.
On the other hand, using AtomicXXX variables is thread-safe, as long as no compound operations are performed. There will be a window of vulnerability when updating more than one atomic variable, even though each individual call is atomic.