ConcurrentHashMap guarantees - java

I have a ConcurrentHashMap in a service class:
class MyClass implements Flushable {
private volatile ConcurrentHashMap<Integer, Object> hashMap = ...
public void add(int id, Object value) {
hashMap.put(id, value);
}
@Override
public void flush() throws IOException {
hashMap.forEach((k, v) -> ...);
hashMap.clear();
}
}
Do I need to do some additional locking to be sure that:
1. flush will process all map entries (what if add is invoked between forEach and clear?)
2. clear will not remove entries that were inserted or updated after forEach
From the Javadoc, there is a guarantee that an update happens-before a subsequent read. So, as far as I understand, clear will block put invocations; however, to get what I want, I need some additional locking.

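In your case you do need extra locking: forEach followed by clear is not atomic, so both operations must be performed under the same lock. The clear method only locks internally, one segment (or bin) at a time, so it does not stop a put from landing between the two calls.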
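A minimal sketch of that advice, assuming a hypothetical `processor` callback that stands in for the elided body of the original `forEach`. `add` takes the shared (read) side of a `ReentrantReadWriteLock`, so puts still run concurrently with each other, while `flush` takes the exclusive (write) side, so no put can happen between `forEach` and `clear`:

```java
import java.io.Flushable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiConsumer;

class MyClass implements Flushable {
    private final ConcurrentHashMap<Integer, Object> hashMap = new ConcurrentHashMap<>();
    // Read side: shared by add() callers. Write side: held exclusively by flush().
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Hypothetical callback standing in for the elided forEach body.
    private final BiConsumer<Integer, Object> processor;

    MyClass(BiConsumer<Integer, Object> processor) {
        this.processor = processor;
    }

    public void add(int id, Object value) {
        lock.readLock().lock(); // many adds may hold the read lock at once
        try {
            hashMap.put(id, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    @Override
    public void flush() {
        lock.writeLock().lock(); // excludes all adds for the whole forEach + clear
        try {
            hashMap.forEach(processor);
            hashMap.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The trade-off is that adds block for the duration of a flush, so a slow processor delays writers.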


Java List in Multi-thread environment

I would like to convert the following code to work in a multithreaded environment.
List<Observer> list = new ArrayList<>();
public void removeObserver(Observer p) {
for (Observer observer: list) {
if (observer.equals(p)) {
list.remove(observer);
break;
}
}
}
public void addObserver(Observer p) {
list.add(p);
}
public void notifyObserver(Event obj) {
for (Observer observer: list) {
observer.notify(obj);
}
}
Definitely, one of the easiest ways to do so is to add the synchronized keyword, which ensures only one thread can run the logic at a time, thereby ensuring the result is correct.
However, is there a better way to solve the issue? I have done some research and found that I can use Collections.synchronizedList, but I also noticed that the iterator of such a list is not thread-safe, so I should avoid using a for-each loop or the iterator directly unless I synchronize on the list (synchronized (list)).
I would rather not use synchronized, and I wonder if there is another approach. Here is my second attempt:
List<Observer> list = Collections.synchronizedList(new ArrayList<Observer>()); // which is thread safe
public void removeObserver(Observer p) {
// as the list may get modified, I create a copy first
List<Observer> copy = new CopyOnWriteArrayList(list);
for (Observer observer: copy) {
if (observer.equals(p)) {
// but now, no use of iterator
list.remove(observer); // remove it from the original list
break;
}
}
}
public void addObserver(Observer p) {
list.add(p);
}
public void notifyObserver(Event obj) {
List<Observer> copy = new CopyOnWriteArrayList(list);
// iterate the copy, not the synchronized list, whose iterator is not thread-safe
// (a for-each loop uses the iterator)
for (Observer observer: copy) {
observer.notify(obj);
}
}
Is my second attempt thread-safe? Also, is there a better approach than my proposed second method?
Definitely, one of the easiest ways to do so is to add the synchronized keyword, which ensures only one thread can run the logic at a time, thereby ensuring the result is correct.
This is correct.
However, is there a better way to solve the issue?
Possibly. But let's take a look at your second attempt:
List<Observer> list = Collections.synchronizedList(new ArrayList<Observer>());
// which is thread safe
Yes, it is thread-safe, with certain constraints.
public void removeObserver(Observer p) {
// as the list may get modified, I create a copy first
List<Observer> copy = new CopyOnWriteArrayList(list);
...
Three problems here:
You are creating a copy of the list. That is an O(N) operation.
The CopyOnWriteArrayList constructor is going to iterate list ... and iteration of a list created by synchronizedList is not atomic / thread-safe so you have a race condition.
There is no actual benefit of using CopyOnWriteArrayList here over (say) ArrayList. The copy object is local and thread-confined so it doesn't need to be thread-safe.
In summary, this is not thread-safe AND it is more expensive than simply making the original methods synchronized.
A possibly better way:
List<Observer> list = new CopyOnWriteArrayList<>();
public void removeObserver(Observer p) {
list.remove(p);
}
public void addObserver(Observer p) {
list.add(p);
}
public void notifyObserver(Event obj) {
for (Observer observer: list) {
observer.notify(obj);
}
}
This is thread-safe, with the caveat that an Observer added while a notifyObserver call is in progress will not be notified by that call.
The only potential problem is that mutations to a CopyOnWriteArrayList are expensive since they create a copy of the entire list. So if the ratio of mutations to notifies is too high, this may be more expensive than the solution using synchronized methods. On the other hand, if multiple threads call notifyObserver, those calls can proceed in parallel.
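A minimal self-contained sketch of that approach, assuming hypothetical `Event` and `Observer` types (the question does not show them) and a hypothetical `Subject` class holding the list. Because iteration walks a snapshot of the backing array, an observer registered during a notification is seen only by later notifications, and no ConcurrentModificationException can occur:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for the question's types.
interface Event {}

interface Observer {
    void notify(Event e); // overloads Object.notify(), which takes no arguments
}

class Subject {
    // Every mutation copies the backing array; iterators see an immutable snapshot.
    private final List<Observer> list = new CopyOnWriteArrayList<>();

    public void addObserver(Observer p) {
        list.add(p);
    }

    public void removeObserver(Observer p) {
        list.remove(p); // removes the first element equal to p, if any
    }

    public void notifyObserver(Event obj) {
        // No locking needed: concurrent add/remove calls do not affect this loop.
        for (Observer observer : list) {
            observer.notify(obj);
        }
    }
}
```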

Java non-blocking cache implementation

(Please note I cannot use external libraries for the cache.)
(Maybe it can be done using the Stream API?)
I need to implement a cache with one key property:
If the cache is asked for a key that it doesn't contain, it should fetch the data using an externally provided function that reads the data from another source (database or similar).
I've started with some basic skeleton code:
public interface ICache<K,V> {
}
interface IDataSource<K,V> {
void put(K key, V value);
V get(K key);
}
public class Cache<K,V> implements ICache<K,V> {
Map<K,V> cache = new HashMap<>();
IDataSource<K,V> dataSource;
public Cache(IDataSource<K,V> dataSrc) {
dataSource = dataSrc;
}
//could this return a Future? How can it be done?
public V getAsync(K key) {
if (cache.containsKey(key)) {
return cache.get(key);
}
else {
//do some async op
}
}
}
Can you advise?
Do you think it needs more features?
In reality what you are writing is a lazy evaluator. You are providing a Supplier for the value, without computing it the first time. The moment someone asks for your value you compute it and return it, memoizing (caching) it for future use.
Have a look at Vavr's Lazy class; it does precisely this (but for a single value). You can take some ideas from it, as well as some extra utility methods, such as checking whether the value was already computed.
https://github.com/vavr-io/vavr/blob/master/vavr/src/main/java/io/vavr/Lazy.java
Another option is to simply use ConcurrentHashMap. It provides methods to safely (atomically) update values if they are not in the map.
If you want it to be asynchronous you need to introduce some ExecutorService or use CompletableFuture (with your own ExecutorService or the default thread pool that is used by parallel streams etc.). For example:
public class Cache<K,V> implements ICache<K,V> {
Map<K,V> cache = new ConcurrentHashMap<>();
IDataSource<K,V> dataSource;
public Cache(IDataSource<K,V> dataSrc) {
dataSource = dataSrc;
}
// async non-blocking call
public CompletableFuture<V> getAsync(K key) {
return CompletableFuture.supplyAsync(() -> get(key));
}
// blocking call
public V get(K key) {
//computeIfAbsent is atomic and threadsafe, in case multiple CompletableFutures try this in parallel
return cache.computeIfAbsent(key, (k) -> dataSource.get(k));
}
}
If you also wanted to have an async direct cache and datasource update you could do something like:
public CompletableFuture<Void> putAsync(K key, V value) {
return CompletableFuture.runAsync(() -> {
synchronized (cache) {
dataSource.put(key, value);
cache.put(key, value);
}
});
}
Although honestly, I would avoid having two entry points for updating the dataSource (through the cache and directly). It is also difficult to make this completely thread-safe without the synchronized block, which prevents concurrent putAsync calls from running in parallel, even when their keys differ.
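One variation worth considering: the Javadoc of ConcurrentHashMap.computeIfAbsent warns that other updates to the map may block while the mapping function runs, which here means while `dataSource.get` is talking to the database. The sketch below instead caches the `CompletableFuture` itself, so the map update is quick, concurrent callers for the same key share one in-flight load, and `getAsync` never blocks. It assumes a plain `Function<K, V>` as a stand-in for `IDataSource.get`, and `FutureCache` is a hypothetical name:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class FutureCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for IDataSource::get

    FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public CompletableFuture<V> getAsync(K key) {
        // Only creating the future happens inside computeIfAbsent; the load itself
        // runs later on the common ForkJoinPool.
        CompletableFuture<V> future = cache.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
        // Drop failed loads so a later call can retry instead of caching the exception.
        future.whenComplete((v, ex) -> {
            if (ex != null) {
                cache.remove(key, future);
            }
        });
        return future;
    }
}
```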

Synchronizing on cached items

I'm using something like
Cache<Integer, Item> cache;
where the Items are independent of each other and look like
private static class Item {
private final int id;
... some mutable data
synchronized void doSomething() {...}
synchronized void doSomethingElse() {...}
}
The idea is to obtain the item from the cache and call a synchronized method on it. In case of a miss, the item can be recreated, that's fine.
A problem occurs when an item gets evicted from the cache and recreated while a thread runs a synchronized method. A new thread obtains a new item and synchronizes on it... so for a single id, there are two threads inside the synchronized method. FAIL.
Is there an easy way around it? It's Guava Cache, if it helps.
I think the suggestion from Louis, using the keys for locking, is the simplest and most practical one. Here is a code snippet that illustrates the idea without the help of Guava libraries:
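static final Lock[] locks = new Lock[ ... ];
static { /* initialize lock array, e.g. with ReentrantLock instances */ }
int id;
void doSomething() {
final Lock lock = locks[Math.floorMod(id, locks.length)]; // floorMod keeps negative ids in range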
lock.lock();
try {
/* protected code */
} finally {
lock.unlock();
}
}
The size of the lock array limits the maximum parallelism you get. If your code is CPU-bound, you can size it by the number of available processors, and this is the perfect solution. If your code waits for I/O, you might need an arbitrarily large array of locks, or you must limit the number of threads that can run the critical section. In that case another approach might be better.
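A self-contained version of the lock-striping idea, written as a hypothetical `StripedLocks` helper (Guava users may prefer its `Striped` class). Equal ids always map to the same lock, and `Math.floorMod` is used instead of `%` so that negative ids still produce a valid index:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class StripedLocks {
    private final Lock[] locks;

    StripedLocks(int stripes) {
        locks = new Lock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // Different ids may share a stripe; that only costs parallelism, never correctness.
    Lock lockFor(int id) {
        return locks[Math.floorMod(id, locks.length)];
    }

    // Runs the action while holding the lock for the given id.
    void withLock(int id, Runnable action) {
        Lock lock = lockFor(id);
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```

For CPU-bound work you might construct it with `new StripedLocks(Runtime.getRuntime().availableProcessors())`, following the sizing advice above.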
Comments on a more conceptual level:
If you want to prevent the item from being evicted, you need a mechanism called pinning. Most cache implementations use it internally, e.g. to block during I/O operations. Some caches also expose a way for applications to do it.
In a JCache-compatible cache, there is the concept of an EntryProcessor. An EntryProcessor lets you run a piece of code against an entry atomically, which means the cache does all the locking for you. Depending on the scope of the problem, this may be an advantage, since it also works in clustered scenarios, where the locking is cluster-wide.
Another idea which comes to my mind is the vetoable eviction. This is a concept EHCache 3 is implementing. By specifying a vetoable eviction policy you can implement a pinning mechanism on your own.
I'm sure that there are multiple solutions for your issue.
Here is one of them, which uses a unique lock for each itemId:
public class LockManager {
private Map<Integer, Lock> lockMap = new ConcurrentHashMap<>();
public synchronized Lock getOrCreateLockForId(Integer itemId) {
Lock lock;
if (lockMap.containsKey(itemId)) {
System.out.println("Get lock");
lock = lockMap.get(itemId);
} else {
System.out.println("Create lock");
lock = new ReentrantLock();
lockMap.put(itemId, lock);
}
return lock;
}
public synchronized Lock getLockForId(Integer itemId) {
if (lockMap.containsKey(itemId)) {
System.out.println("Get lock");
return lockMap.get(itemId);
} else {
throw new IllegalStateException("First lock, then unlock");
}
}
}
So, instead of using synchronized methods in class Item, use LockManager to get the Lock for an itemId and call lock.lock() on it.
Also note that LockManager should have singleton scope, and the same instance should be shared across all usages.
Here is an example of using LockManager:
Lock lock = lockManager.getOrCreateLockForId(itemId);
lock.lock(); // acquire before try, so finally never unlocks a lock that was not acquired
try {
System.out.println("start doing something " + num);
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the interrupt status
}
System.out.println("completed doing something " + num);
} finally {
lock.unlock();
}
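The same per-id idea can be written without synchronized methods, because ConcurrentHashMap.computeIfAbsent already creates the lock atomically: concurrent callers for the same key receive the same instance. This is a sketch with a hypothetical `KeyedLocks` class; like the LockManager, it never removes locks, so the map grows with the number of distinct ids:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class KeyedLocks<K> {
    private final ConcurrentMap<K, Lock> locks = new ConcurrentHashMap<>();

    // Atomic get-or-create: no outer synchronized needed.
    Lock lockFor(K key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Runs the action while holding the lock for the key and returns its result.
    <T> T withLock(K key, Supplier<T> action) {
        Lock lock = lockFor(key);
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```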

Extending java's ThreadLocal to allow the values to be reset across all threads

After looking at this question, I think I want to wrap ThreadLocal to add a reset behavior.
I want to have something similar to a ThreadLocal, with a method I can call from any thread to set all the values back to the same value. So far I have this:
public class ThreadLocalFlag {
private ThreadLocal<Boolean> flag;
private List<Boolean> allValues = new ArrayList<Boolean>();
public ThreadLocalFlag() {
flag = new ThreadLocal<Boolean>() {
@Override protected Boolean initialValue() {
Boolean value = false;
allValues.add(value);
return value;
}
};
}
public boolean get() {
return flag.get();
}
public void set(Boolean value) {
flag.set(value);
}
public void setAll(Boolean value) {
for (Boolean tlValue : allValues) {
tlValue = value;
}
}
}
I'm worried that the autoboxing of the primitive may mean the copies I've stored in the list will not reference the same variables referenced by the ThreadLocal if I try to set them. I've not yet tested this code, and with something tricky like this I'm looking for some expert advice before I continue down this path.
Someone will ask "Why are you doing this?". I'm working in a framework where there are other threads that callback into my code, and I don't have references to them. Periodically I want to update the value in a ThreadLocal variable they use, so performing that update requires that the thread which uses the variable do the updating. I just need a way to notify all these threads that their ThreadLocal variable is stale.
I'm flattered that there is new criticism recently regarding this three year old question, though I feel the tone of it is a little less than professional. The solution I provided has worked without incident in production during that time. However, there are bound to be better ways to achieve the goal that prompted this question, and I invite the critics to supply an answer that is clearly better. To that end, I will try to be more clear about the problem I was trying to solve.
As I mentioned earlier, I was using a framework where multiple threads are using my code, outside my control. That framework was QuickFIX/J, and I was implementing the Application interface. That interface defines hooks for handling FIX messages, and in my usage the framework was configured to be multithreaded, so that each FIX connection to the application could be handled simultaneously.
However, the QuickFIX/J framework only uses a single instance of my implementation of that interface for all the threads. I'm not in control of how the threads get started, and each is servicing a different connection with different configuration details and other state. It was natural to let some of that state, which is frequently accessed but seldom updated, live in various ThreadLocals that load their initial value once the framework has started the thread.
Elsewhere in the organization, we had library code to allow us to register for callbacks for notification of configuration details that change at runtime. I wanted to register for that callback, and when I received it, I wanted to let all the threads know that it's time to reload the values of those ThreadLocals, as they may have changed. That callback comes from a thread I don't control, just like the QuickFIX/J threads.
My solution below uses ThreadLocalFlag (a wrapped ThreadLocal<AtomicBoolean>) solely to signal the other threads that it may be time to update their values. The callback calls setAll(true), and the QuickFIX/J threads call set(false) when they begin their update. I have downplayed the concurrency issues of the ArrayList because the only time the list is added to is during startup, and my use case was smaller than the default size of the list.
I imagine the same task could be done with other interthread communication techniques, but for what it's doing, this seemed more practical. I welcome other solutions.
Interacting with objects in a ThreadLocal across threads
I'll say up front that this is a bad idea. ThreadLocal is a special class which offers speed and thread-safety benefits if used correctly. Attempting to communicate across threads with a ThreadLocal defeats the purpose of using the class in the first place.
If you need access to an object across multiple threads, there are tools designed for this purpose, notably the thread-safe collections in java.util.concurrent such as ConcurrentHashMap, which you can use to replicate a ThreadLocal by using Thread objects as keys, like so:
ConcurrentHashMap<Thread, AtomicBoolean> map = new ConcurrentHashMap<>();
// pass map to threads, let them do work, using Thread.currentThread() as the key
// Update all known threads' flags
for(AtomicBoolean b : map.values()) {
b.set(true);
}
Clearer, more concise, and avoids using ThreadLocal in a way it's simply not designed for.
Notifying threads that their data is stale
I just need a way to notify all these threads that their ThreadLocal variable is stale.
If your goal is simply to notify other threads that something has changed, you don't need a ThreadLocal at all. Simply use a single AtomicBoolean and share it with all your tasks, just as you would your ThreadLocal<AtomicBoolean>. As the name implies, updates to an AtomicBoolean are atomic and visible across threads. Even better would be to use a real synchronization aid such as CyclicBarrier or Phaser, but for simple use cases there's no harm in just using an AtomicBoolean.
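One caveat with a single shared flag: if every worker must reload, the first worker that resets the flag hides the change from the others. A sketch of a variation, assuming a hypothetical `ConfigGeneration` helper, swaps the boolean for a shared AtomicLong generation counter; each thread remembers the generation it last loaded (a legitimate use of ThreadLocal, since only the owning thread touches it):

```java
import java.util.concurrent.atomic.AtomicLong;

final class ConfigGeneration {
    // Shared by the callback thread and all worker threads.
    private final AtomicLong generation = new AtomicLong();
    // Per-thread: the generation this thread's cached values were loaded at.
    private final ThreadLocal<Long> seen = ThreadLocal.withInitial(() -> -1L);

    // Called from the configuration-change callback thread.
    void markStale() {
        generation.incrementAndGet();
    }

    // Called by a worker thread; true the first time it observes a newer generation.
    boolean needsReload() {
        long current = generation.get();
        if (seen.get() != current) {
            seen.set(current);
            return true;
        }
        return false;
    }
}
```

Each worker calls needsReload() before using its cached values and reloads them when it returns true; no worker can consume the signal on behalf of another.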
Creating an updatable "ThreadLocal"
All of that said, if you really want to implement a globally update-able ThreadLocal your implementation is broken. The fact that you haven't run into issues with it is only a coincidence and future refactoring may well introduce hard-to-diagnose bugs or crashes. That it "has worked without incident" only means your tests are incomplete.
First and foremost, an ArrayList is not thread-safe. You simply cannot use it (without external synchronization) when multiple threads may interact with it, even if they will do so at different times. That you aren't seeing any issues now is just a coincidence.
Storing the objects as a List prevents us from removing stale values. If you call ThreadLocal.set() it will append to your list without removing the previous value, which introduces both a memory leak and the potential for unexpected side-effects if you anticipated these objects becoming unreachable once the thread terminated, as is usually the case with ThreadLocal instances. Your use case avoids this issue by coincidence, but there's still no need to use a List.
Here is an implementation of an IterableThreadLocal which safely stores and updates all existing instances of the ThreadLocal's values, and works for any type you choose to use:
import java.util.Iterator;
import java.util.concurrent.ConcurrentMap;
import com.google.common.collect.MapMaker;
/**
* Class extends ThreadLocal to enable user to iterate over all objects
* held by the ThreadLocal instance. Note that this is inherently not
* thread-safe, and violates both the contract of ThreadLocal and much
* of the benefit of using a ThreadLocal object. This class incurs all
* the overhead of a ConcurrentHashMap, perhaps you would prefer to
* simply use a ConcurrentHashMap directly instead?
*
* If you do really want to use this class, be wary of its iterator.
* While it is as threadsafe as ConcurrentHashMap's iterator, it cannot
* guarantee that all existing objects in the ThreadLocal are available
* to the iterator, and it cannot prevent you from doing dangerous
* things with the returned values. If the returned values are not
* properly thread-safe, you will introduce issues.
*/
public class IterableThreadLocal<T> extends ThreadLocal<T>
implements Iterable<T> {
private final ConcurrentMap<Thread,T> map;
public IterableThreadLocal() {
map = new MapMaker().weakKeys().makeMap();
}
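@Override
public T get() {
T val = super.get();
// ConcurrentMap rejects null values, so only track non-null values
if (val != null) {
map.putIfAbsent(Thread.currentThread(), val);
}
return val;
}
@Override
public void set(T value) {
if (value == null) {
map.remove(Thread.currentThread());
} else {
map.put(Thread.currentThread(), value);
}
super.set(value);
}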
/**
* Note that this method fundamentally violates the contract of
* ThreadLocal, and exposes all objects to the calling thread.
* Use with extreme caution, and preferably only when you know
* no other threads will be modifying / using their ThreadLocal
* references anymore.
*/
@Override
public Iterator<T> iterator() {
return map.values().iterator();
}
}
As you can hopefully see this is little more than a wrapper around a ConcurrentHashMap, and incurs all the same overhead as using one directly, but hidden in the implementation of a ThreadLocal, which users generally expect to be fast and thread-safe. I implemented it for demonstration purposes, but I really cannot recommend using it in any setting.
It isn't a good idea to do that, since the whole point of thread-local storage is, well, thread locality of the value it contains, i.e. that you can be sure that no thread other than your own can touch the value. If other threads could touch your thread-local value, it wouldn't be "thread local" anymore, and that would break the memory model contract of thread-local storage.
Either you have to use something other than ThreadLocal (e.g. a ConcurrentHashMap) to store the value, or you need to find a way to schedule an update on the threads in question.
You could use google guava's map maker to create a static final ConcurrentWeakReferenceIdentityHashmap with the following type: Map<Thread, Map<String, Object>> where the second map is a ConcurrentHashMap. That way you'd be pretty close to ThreadLocal except that you can iterate through the map.
I'm disappointed in the quality of the answers received for this question; I have found my own solution.
I wrote my test case today, and found the only issue with the code in my question is the Boolean. Boolean is not mutable, so my list of references wasn't doing me any good. I had a look at this question, and changed my code to use AtomicBoolean, and now everything works as expected.
public class ThreadLocalFlag {
private ThreadLocal<AtomicBoolean> flag;
private List<AtomicBoolean> allValues = new ArrayList<AtomicBoolean>();
public ThreadLocalFlag() {
flag = new ThreadLocal<AtomicBoolean>() {
@Override protected AtomicBoolean initialValue() {
AtomicBoolean value = new AtomicBoolean();
allValues.add(value);
return value;
}
};
}
public boolean get() {
return flag.get().get();
}
public void set(boolean value) {
flag.get().set(value);
}
public void setAll(boolean value) {
for (AtomicBoolean tlValue : allValues) {
tlValue.set(value);
}
}
}
Test case:
public class ThreadLocalFlagTest {
private static ThreadLocalFlag flag = new ThreadLocalFlag();
private static volatile boolean runThread = true; // volatile so worker threads see the update
@AfterClass
public static void tearDownOnce() throws Exception {
runThread = false;
flag = null;
}
/**
* @throws Exception if there is any issue with the test
*/
@Test
public void testSetAll() throws Exception {
startThread("ThreadLocalFlagTest-1", false);
try {
Thread.sleep(1000L);
} catch (InterruptedException e) {
//ignore
}
startThread("ThreadLocalFlagTest-2", true);
try {
Thread.sleep(1000L);
} catch (InterruptedException e) {
//ignore
}
startThread("ThreadLocalFlagTest-3", false);
try {
Thread.sleep(1000L);
} catch (InterruptedException e) {
//ignore
}
startThread("ThreadLocalFlagTest-4", true);
try {
Thread.sleep(8000L); //watch the alternating values
} catch (InterruptedException e) {
//ignore
}
flag.setAll(true);
try {
Thread.sleep(8000L); //watch the true values
} catch (InterruptedException e) {
//ignore
}
flag.setAll(false);
try {
Thread.sleep(8000L); //watch the false values
} catch (InterruptedException e) {
//ignore
}
}
private void startThread(String name, boolean value) {
Thread t = new Thread(new RunnableCode(value));
t.setName(name);
t.start();
}
class RunnableCode implements Runnable {
private boolean initialValue;
RunnableCode(boolean value) {
initialValue = value;
}
@Override
public void run() {
flag.set(initialValue);
while (runThread) {
System.out.println(Thread.currentThread().getName() + ": " + flag.get());
try {
Thread.sleep(4000L);
} catch (InterruptedException e) {
//ignore
}
}
}
}
}
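If you keep this design, a minimal thread-safe variant (a sketch, not the poster's tested code) replaces the ArrayList with a ConcurrentLinkedQueue, because initialValue() runs on whichever thread first calls get(), so registrations can happen concurrently. As in the original, values belonging to terminated threads are never removed:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class ThreadLocalFlag {
    // Thread-safe registry of every thread's flag.
    private final Queue<AtomicBoolean> allValues = new ConcurrentLinkedQueue<>();
    // Each thread gets its own mutable flag, registered on first access.
    private final ThreadLocal<AtomicBoolean> flag = ThreadLocal.withInitial(() -> {
        AtomicBoolean value = new AtomicBoolean();
        allValues.add(value);
        return value;
    });

    public boolean get() {
        return flag.get().get();
    }

    public void set(boolean value) {
        flag.get().set(value);
    }

    // Safe to call from any thread: AtomicBoolean.set is visible to the owning thread.
    public void setAll(boolean value) {
        for (AtomicBoolean tlValue : allValues) {
            tlValue.set(value);
        }
    }
}
```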

Implementation of "canonical" lock objects

I have a store of data objects and I wish to synchronize modifications that are related to one particular object at a time.
class DataStore {
Map<ID, DataObject> objects = // ...
// other indices and stuff...
public final void doSomethingToObject(ID id) { /* ... */ }
public final void doSomethingElseToObject(ID id) { /* ... */ }
}
That is to say, I do not wish my data store to have a single lock since modifications to different data objects are completely orthogonal. Instead, I want to be able to take a lock that pertains to a single data object only.
Each data object has a unique id. One way is to create a map of ID => Lock and synchronize upon the one lock object associated with the id. Another way is to do something like:
synchronized(dataObject.getId().toString().intern()) {
// ...
}
However, this seems like a memory leak -- the internalized strings may never be collected.
Yet another idea is to synchronize upon the data object itself; however, what if you have an operation where the data object doesn't exist yet? For example, what will a method like addDataObject(DataObject) synchronize upon?
In summary, how can I write a function f(s), where s is a String, such that f(s)==f(t) if s.equals(t) in a memory-safe manner?
Add the lock directly to the DataObject; you could define it like this:
public class DataObject {
private final Lock lock = new ReentrantLock();
public void lock() { this.lock.lock(); }
public void unlock() { this.lock.unlock(); }
public void doWithAction( DataObjectAction action ) {
this.lock();
try {
action.doWithLock( this );
} finally {
this.unlock();
}
}
// other methods here
}
public interface DataObjectAction { void doWithLock( DataObject object ); }
And when using it, you could simply do it like this:
DataObject object = // something here
object.doWithAction( new DataObjectAction() {
public void doWithLock( DataObject object ) {
object.setProperty( "Setting the value inside a locked object" );
}
} );
And there you have a single object locked for changes.
You could even make this a read-write lock if you also have read operations happening while writing.
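A sketch of that read-write variant, assuming a hypothetical `property` field and hypothetical `read`/`write` methods in place of `doWithAction`. Readers share the read lock, while a writer holds the write lock exclusively:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

class DataObject {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private String property; // hypothetical field for illustration

    // Many threads may run read actions at the same time.
    <T> T read(Function<DataObject, T> action) {
        lock.readLock().lock();
        try {
            return action.apply(this);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A write action excludes all readers and other writers.
    void write(Consumer<DataObject> action) {
        lock.writeLock().lock();
        try {
            action.accept(this);
        } finally {
            lock.writeLock().unlock();
        }
    }

    String getProperty() { return property; }

    void setProperty(String value) { property = value; }
}
```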
For such cases, I normally use two levels of locking:
The first level is a reader-writer lock: updates to the map (add/delete) are treated as "writes" and access to entries in the map as "reads", so map updates are properly synchronized. Once you have retrieved the value, you then synchronize on the value itself. Here is a small example:
class DataStore {
Map<ID, DataObject> objMap = // ...
ReadWriteLock objMapLock = new ReentrantReadWriteLock();
// other indices and stuff...
public void addDataObject(DataObject obj) {
objMapLock.writeLock().lock();
try {
// do what you need; you may synchronize on obj too, depending on the situation
objMap.put(obj.getId(), obj);
} finally {
objMapLock.writeLock().unlock();
}
}
public final void doSomethingToObject(ID id) {
objMapLock.readLock().lock();
try {
DataObject dataObj = this.objMap.get(id);
synchronized(dataObj) {
// do what you need
}
} finally {
objMapLock.readLock().unlock();
}
}
}
Everything should then be properly synchronized without sacrificing much concurrency.
Yet another idea is to synchronize upon the data object itself; however, what if you have an operation where the data object doesn't exist yet? For example, what will a method like addDataObject(DataObject) synchronize upon?
Synchronizing on the object is probably viable.
If the object doesn't exist yet, then nothing else can see it. Provided that you can arrange that the object is fully initialized by its constructor, and that it is not published by the constructor before the constructor returns, then you don't need to synchronize it. Another approach is to partially initialize in the constructor, and then use synchronized methods to do the rest of the construction and the publication.
