I have code similar to the following:
public class Cache{
private final Object lock = new Object();
private HashMap<Integer, TreeMap<Long, Integer>> cache =
new HashMap<Integer, TreeMap<Long, Integer>>();
private AtomicLong FREESPACE = new AtomicLong(102400);
private void putInCache(TreeMap<Long, Integer> tempMap, int fileNr){
int length; //holds the length of data in tempMap
synchronized(lock){
if(checkFreeSpace(length)){
cache.get(fileNr).putAll(tempMap);
FREESPACE.getAndAdd(-length);
}
}
}
private boolean checkFreeSpace(int length){
while(FREESPACE.get() < length && thereIsSomethingToDelete()){
// deleteSomething returns the length of deleted data or 0 if
// it could not delete anything
FREESPACE.getAndAdd(deleteSomething(length));
}
// true when there is now enough free space for the new data
return FREESPACE.get() >= length;
}
}
putInCache is called by about 139 threads a second. Can I be sure that these two methods will synchronize on both cache and FREESPACE? Also, is checkFreeSpace() thread-safe, i.e., can I be sure that there will be only one invocation of this method at a time? Can the thread safety of this code be improved?
To have your question answered fully, you would need to show the implementations of the thereIsSomethingToDelete() and deleteSomething() methods.
Given that checkFreeSpace is a public method (does it really need to be?), and is unsynchronized, it is possible it could be called by another thread while the synchronized block in the putInCache() method is running. This by itself might not break anything, since it appears that the checkFreeSpace method can only increase the amount of free space, not reduce it.
What would be more serious (and the code sample doesn't allow us to determine this) is if the thereIsSomethingToDelete() and deleteSomething() methods don't properly synchronize their access to the cache object, using the same Object lock as used by putInCache().
You don't usually synchronize on the fields you want to control access to directly.
The fields that you want to synchronize access to must only be accessed from within synchronized blocks (on the same object) to be considered thread safe. You are already doing this in putInCache().
Therefore, because checkFreeSpace() accesses shared state in an unsynchronized fashion, it is not thread safe.
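To illustrate the point, here is a minimal sketch (not the asker's real code) in which the map and the free-space counter are only ever touched while holding one private lock. The class name GuardedCache, the sizes map and the "evict whole files" policy are assumptions made up for the example; the original thereIsSomethingToDelete() and deleteSomething() were never shown.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: all shared state is guarded by the same private lock.
class GuardedCache {
    private final Object lock = new Object();
    private final Map<Integer, TreeMap<Long, Integer>> cache = new HashMap<>();
    private final Map<Integer, Long> sizes = new HashMap<>(); // bytes stored per file (assumed bookkeeping)
    private long freeSpace; // guarded by lock, so a plain long is enough

    GuardedCache(long capacity) {
        this.freeSpace = capacity;
    }

    // Returns true if the data was stored, false if there was not enough room.
    boolean putInCache(int fileNr, TreeMap<Long, Integer> data, int length) {
        synchronized (lock) {
            if (!ensureFreeSpace(length)) {
                return false;
            }
            cache.computeIfAbsent(fileNr, k -> new TreeMap<>()).putAll(data);
            sizes.merge(fileNr, (long) length, Long::sum);
            freeSpace -= length;
            return true;
        }
    }

    boolean contains(int fileNr) {
        synchronized (lock) {
            return cache.containsKey(fileNr);
        }
    }

    long freeSpace() {
        synchronized (lock) {
            return freeSpace;
        }
    }

    // Private and only called with the lock held. Placeholder eviction policy:
    // drop whole files until enough space is free or nothing is left.
    private boolean ensureFreeSpace(int length) {
        Iterator<Map.Entry<Integer, Long>> it = sizes.entrySet().iterator();
        while (freeSpace < length && it.hasNext()) {
            Map.Entry<Integer, Long> victim = it.next();
            cache.remove(victim.getKey());
            freeSpace += victim.getValue();
            it.remove();
        }
        return freeSpace >= length;
    }
}
```

Because the counter is only read and written under the lock, the AtomicLong is no longer needed, and the helper cannot be reached from outside without the lock being held.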
Related
I need to provide thread-safe implementation of the following container:
public interface ParameterMetaData<ValueType> {
public String getName();
}
public interface Parameters {
public <M> M getValue(ParameterMetaData<M> pmd);
public <M> void put(ParameterMetaData<M> p, M value);
public int size();
}
The thing is that the size method should return the accurate number of parameters currently contained in a Parameters instance. So, my first attempt was to try delegating thread safety as follows:
public final class ConcurrentParameters implements Parameters{
private final ConcurrentMap<ParameterMetaData<?>, Object> parameters =
new ConcurrentHashMap<>();
//Should represent the ACCURATE size of the internal map
private final AtomicInteger size = new AtomicInteger();
@Override
public <M> M getValue(ParameterMetaData<M> pmd) {
@SuppressWarnings("unchecked")
M value = (M) parameters.get(pmd);
return value;
}
@Override
public <M> void put(ParameterMetaData<M> p, M value){
if(value == null)
return;
//The problem is in the code below
M previous = (M) parameters.putIfAbsent(p, value);
if(previous != null)
//the parameter already exists
throw new IllegalStateException("Parameter already exists: " + p.getName());
size.incrementAndGet();
}
@Override
public int size() {
return size.intValue();
}
}
The problem is that I can't just call parameters.size() on the ConcurrentHashMap instance to return the actual size, as that operation performs a traversal without locking and there's no guarantee that it will retrieve the actual size. That isn't acceptable in my case. So, I decided to maintain a field containing the size.
QUESTION: Is it possible somehow to delegate thread safety and preserve the invariants?
The outcome you want to achieve is non-atomic. You want to modify the map and then get a count of elements that is consistent within the scope of a single thread. The only way to achieve that is to make this flow an atomic operation by synchronizing access to the map. This is the only way to ensure that the count will not change due to modifications made in another thread.
Synchronize the modify-and-count access to the map via synchronized or a Semaphore, so that only a single thread can modify the map and count its elements at a time.
Using an additional field as a counter does not guarantee thread safety here: after the map modification and before the counter update, another thread can modify the map, and the counter value will no longer be valid.
This is the reason why the map does not keep its size internally but has to traverse its elements: to give the most accurate result at a given point in time.
EDIT:
To be 100% clear, this is the most convenient way to achieve this:
synchronized(yourMap){
doSomethingWithTheMap();
yourMap.size();
}
So if you change every map operation to such a block, you guarantee that size() returns an accurate count of elements. The only condition is that all data manipulations are done inside such a synchronized block.
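Applied to a container like the one in the question, that could look roughly like the sketch below. It is simplified on purpose: the class name SynchronizedParameters is made up, and plain String keys stand in for ParameterMetaData so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain HashMap guarded by the object's own monitor,
// so every modification and every size() call is mutually exclusive.
class SynchronizedParameters {
    private final Map<String, Object> parameters = new HashMap<>();

    synchronized void put(String name, Object value) {
        if (value == null) {
            return; // same behaviour as the question: null values are ignored
        }
        if (parameters.putIfAbsent(name, value) != null) {
            throw new IllegalStateException("Parameter already exists: " + name);
        }
    }

    synchronized Object getValue(String name) {
        return parameters.get(name);
    }

    // Accurate at the moment of the call: no put() can run concurrently.
    synchronized int size() {
        return parameters.size();
    }
}
```

No separate counter is needed here, because HashMap.size() is exact as long as nobody modifies the map while it is being read, which the lock guarantees.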
I am going to parallelize some code with some global variables.
I am going to use ReentrantReadWriteLock.
Did I understand it right that I need a separate instance of ReentrantReadWriteLock for each variable I want to make thread-safe?
I mean, when I have two lists to which every thread can append an item, and all threads sometimes read items from those lists.
In that case I would implement something like:
private static String[] globalVariables = null;
private static String[] processedItems = null;
private final ReentrantReadWriteLock globalVariablesLock = new ReentrantReadWriteLock();
private final Lock globalVariablesReadLock = globalVariablesLock.readLock();
private final Lock globalVariablesWriteLock = globalVariablesLock.writeLock();
private final ReentrantReadWriteLock processedItemsLock = new ReentrantReadWriteLock();
private final Lock processedItemsLockReadLock = processedItemsLock.readLock();
private final Lock processedItemsLockWriteLock = processedItemsLock.writeLock();
What if I have many more variables, like database connection (pool)s, loggers, further lists, etc.?
Do I need to create a new ReentrantReadWriteLock for each, or am I missing something?
Samples on the internet only handle one variable.
Thanks in advance.
What are you trying to protect?
Don't think of locking variables. The purpose of a lock is to protect an invariant. An invariant is some assertion that you can make about the state of your program that must always be true. An example might be, "the sum of variables A, B, and C will always be zero."
In that case, it doesn't do you any good to have separate locks for A, B, and C. You want one lock that protects that particular invariant. Any thread that wants to change A, B, or C must lock that lock, and any thread that depends on their sum being zero must lock that same lock.
Often it is not possible for a thread to make progress without temporarily breaking some invariant. E.g.,
A += 1; //breaks the invariant
B -= 1; //fixes it again.
Without synchronization, some other thread could examine A, B, and C in-between those two statements, and find the invariant broken.
With synchronization:
private final Object zeroSumLock = new Object();
void bumpA() {
synchronized(zeroSumLock) {
A += 1;
B -= 1;
}
}
boolean verifySum() {
synchronized(zeroSumLock) {
return (A+B+C) == 0;
}
}
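Since the question was about ReentrantReadWriteLock, here is the same invariant sketched with one read-write lock instead of synchronized. The class name ZeroSum is made up for the example; the point is only that a single lock covers all three variables, with writers taking the write lock and checkers taking the read lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one read-write lock protects the invariant a + b + c == 0.
class ZeroSum {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int a, b, c; // all three are guarded by the same lock

    void bumpA() {
        lock.writeLock().lock();
        try {
            a += 1; // breaks the invariant
            b -= 1; // fixes it again before the lock is released
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean verifySum() {
        lock.readLock().lock(); // many readers may check at the same time
        try {
            return a + b + c == 0;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```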
Yes, you should have one Lock per thread-safe variable (arrays in your case). However, consider using
List<String> syncList = Collections.synchronizedList(new ArrayList<String>());
instead of arrays. It is usually way better when you delegate to the library (in this case, not only the synchronization but also the resizing of the arrays). Of course, before doing it check that the library does exactly what you would expect (in this case, as @SashaSalauyou pointed out, you'd lose the ability to read concurrently).
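One thing to check with Collections.synchronizedList in particular: single calls such as add() are synchronized by the wrapper, but iterating over the list is not, so the caller has to hold the list's lock for the whole loop. A small sketch, with a made-up class name ProcessedItems and a made-up totalLength() method:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of delegating synchronization to the library wrapper.
class ProcessedItems {
    private final List<String> items = Collections.synchronizedList(new ArrayList<>());

    void add(String item) {
        items.add(item); // a single call is synchronized by the wrapper itself
    }

    int totalLength() {
        // Iteration is a sequence of calls, so the lock must be held manually
        // for the whole loop; the wrapper locks on itself.
        synchronized (items) {
            int total = 0;
            for (String s : items) {
                total += s.length();
            }
            return total;
        }
    }
}
```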
One solution is to create an immutable Map holding the locks for all the items you need:
final static Map<String, ReadWriteLock> locks = Collections.unmodifiableMap(
new HashMap<String, ReadWriteLock>() {{
put("globalVariables", new ReentrantReadWriteLock());
put("processedItems", new ReentrantReadWriteLock());
// rest items
}}
);
As HashMap is wrapped by Collections.unmodifiableMap() and thus cannot be modified, it becomes thread-safe.
Then, in code:
Lock lo = locks.get("globalVariables").readLock();
lo.lock();
try {
// ...
} catch (Exception e) {
// ...
} finally {
lo.unlock();
}
I need to keep track of multiple values against unique keys i.e. 1(a,b) 2(c,d) etc...
The solution is accessed by multiple threads so effectively I have the following defined;
ConcurrentSkipListMap<key, ConcurrentSkipListSet<values>>
My question is: does the removal of the key when the value set's size is 0 need to be synchronized? I know that the two classes are "concurrent" and I've looked through the OpenJDK source code, but there would appear to be a window between one thread T1 checking that the Set is empty and removing the Map entry in remove(...), and another thread T2 calling add(...). The result is that T1 removes the last Set entry and removes the Map entry, interleaved with T2 just adding a Set entry. Thus the Map entry and T2's Set entry are removed by T1 and data is lost.
Do I just "synchronize" the add() and remove() methods or is there a "better" way?
The Map is modified by multiple threads but only through two methods.
Code snippet as follows;
protected static class EndpointSet extends U4ConcurrentSkipListSet<U4Endpoint> {
private static final long serialVersionUID = 1L;
public EndpointSet() {
super();
}
}
protected static class IDToEndpoint extends U4ConcurrentSkipListMap<String, EndpointSet> {
private static final long serialVersionUID = 1L;
protected Boolean add(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
endpoints = new EndpointSet();
put(id, endpoints);
}
endpoints.add(endpoint);
return true;
}
protected Boolean remove(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
return false;
} else {
endpoints.remove(endpoint);
if (endpoints.size() == 0) {
remove(id);
}
return true;
}
}
}
As it is, your code has race conditions. Examples of what could happen:
A thread could add between if (endpoints.size() == 0) and remove(id); you saw that one.
In add, a thread could read a non-null value in EndpointSet endpoints = get(id); and another thread could then remove data from that set and, because the set is now empty, remove the set from the map. The initial thread would then add a value to the set, which is no longer held in the map, so that data gets lost too as it becomes unreachable.
The easiest way to solve your issue is to make both add and remove synchronized. But you then lose all the performance benefits of using a ConcurrentMap.
Alternatively, you could simply leave the empty sets in the map - unless you have memory constraints. You would still need some form of synchronization but it would be easier to optimise.
If contention (performance) is an issue, you could try a more fine grained locking strategy by synchronizing on the keys or values but it could be quite tricky (and locking on Strings is not such a good idea because of String pooling).
It seems that in all cases, you could use a non concurrent set as you will need to synchronize it externally yourself.
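A minimal sketch of the "synchronize both methods" option, using ordinary non-concurrent collections as suggested. The class name IdToEndpoints and the generic endpoint type are made up for the example; HashMap and HashSet are used for brevity, and TreeMap and TreeSet could be swapped in if the sorted order of the skip-list classes matters.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: add and remove share one lock, so the
// "check empty, then remove key" step can no longer interleave with an add.
class IdToEndpoints<E> {
    private final Map<String, Set<E>> map = new HashMap<>();

    synchronized boolean add(String id, E endpoint) {
        return map.computeIfAbsent(id, k -> new HashSet<>()).add(endpoint);
    }

    synchronized boolean remove(String id, E endpoint) {
        Set<E> endpoints = map.get(id);
        if (endpoints == null) {
            return false;
        }
        boolean removed = endpoints.remove(endpoint);
        if (endpoints.isEmpty()) {
            map.remove(id); // safe: no add() can run between the check and the removal
        }
        return removed;
    }

    synchronized boolean containsId(String id) {
        return map.containsKey(id);
    }
}
```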
Let's say I have a HashMap declared as follows:
@GuardedBy("pendingRequests")
private final Map<UInt32, PendingRequest> pendingRequests = new HashMap<UInt32, PendingRequest>();
Access to the map is multi-threaded, and all access is guarded by synchronizing on this final instance of the map, e.g.:
synchronized (pendingRequests) {
pendingRequests.put(reqId, request);
}
Is this enough? Should the map be created using Collections.synchronizedMap()? Should I be locking on a dedicated lock object instead of the map instance? Or maybe both?
External synchronization (in addition to possibly using Collections.synchronizedMap()) is needed in a couple areas where multiple calls on the map must be atomic.
Synchronizing on the map itself is essentially what the Map returned by Collections.synchronizedMap() would do. For your situation it is a reasonable approach, and there is not much to recommend using a separate lock object other than personal preference (or if you wish to have more fine-grained control and use a ReentrantReadWriteLock to allow concurrent reading of the map).
E.g.
private Map<Integer,Object> myMap;
private ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
public void myReadMethod()
{
rwl.readLock().lock();
try
{
myMap.get(...);
...
} finally
{
rwl.readLock().unlock();
}
}
public void myWriteMethod()
{
// may want / need to call rwl.readLock().unlock() here,
// since if you are holding the readLock here already then
// you cannot get the writeLock (so be careful on how your
// methods lock/unlock and call each other).
rwl.writeLock().lock();
try
{
myMap.put(key1,item1);
myMap.put(key2,item2);
} finally
{
rwl.writeLock().unlock();
}
}
All calls to the map need to be synchronized, and Collections.synchronizedMap() gives you that.
However, there is also an aspect of compound logic. If you need the integrity of the compound logic, synchronization of individual calls is not enough. For example, consider the following code:
Object value = yourMap.get(key); // synchronized
if (value == null) {
// do more action
yourMap.put(key, newValue); // synchronized
}
Although individual calls (get() and put()) are synchronized, your logic will not be safe against concurrent access.
Another interesting case is when you iterate. For an iteration to be safe, you need to synchronize for the entire duration of the iteration, or you may get a ConcurrentModificationException.
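Both cases can be handled by holding the map's lock around the whole compound operation. The sketch below uses a made-up class PendingRequests with Integer keys and String values in place of the question's types; it relies on the documented behaviour that the wrapper returned by Collections.synchronizedMap() locks on itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compound operations hold the wrapper's own lock.
class PendingRequests {
    private final Map<Integer, String> pending =
            Collections.synchronizedMap(new HashMap<>());

    // Check-then-act as one atomic step.
    boolean registerIfAbsent(int id, String request) {
        synchronized (pending) {
            if (pending.get(id) != null) {
                return false;
            }
            pending.put(id, request);
            return true;
        }
    }

    // Iteration holds the lock for its whole duration.
    int countStartingWith(String prefix) {
        synchronized (pending) {
            int count = 0;
            for (String request : pending.values()) {
                if (request.startsWith(prefix)) {
                    count++;
                }
            }
            return count;
        }
    }
}
```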
I have a multithreaded application, where a shared list has write-often, read-occasionally behaviour.
Specifically, many threads will dump data into the list, and then - later - another worker will grab a snapshot to persist to a datastore.
This is similar to the discussion over on this question.
There, the following solution is provided:
class CopyOnReadList<T> {
private final List<T> items = new ArrayList<T>();
public void add(T item) {
synchronized (items) {
// Add item while holding the lock.
items.add(item);
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>();
synchronized (items) {
// Make a copy while holding the lock.
for (T t : items) copy.add(t);
}
return copy;
}
}
However, in this scenario, (and, as I've learned from my question here), only one thread can write to the backing list at any given time.
Is there a way to allow high-concurrency writes to the backing list, which are locked only during the makeSnapshot() call?
synchronized (~20 ns) is pretty fast and even though other operations can allow concurrency, they can be slower.
private final Lock lock = new ReentrantLock();
private List<T> items = new ArrayList<T>();
public void add(T item) {
lock.lock();
// trivial lock time.
try {
// Add item while holding the lock.
items.add(item);
} finally {
lock.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(), ret;
lock.lock();
// trivial lock time.
try {
ret = items;
items = copy;
} finally {
lock.unlock();
}
return ret;
}
public static void main(String... args) {
long start = System.nanoTime();
Main<Integer> ints = new Main<>();
for (int j = 0; j < 100 * 1000; j++) {
for (int i = 0; i < 1000; i++)
ints.add(i);
ints.makeSnapshot();
}
long time = System.nanoTime() - start;
System.out.printf("The average time to add was %,d ns%n", time / 100 / 1000 / 1000);
}
prints
The average time to add was 28 ns
This means if you are creating 30 million entries per second, you will have one thread accessing the list on average. If you are creating 60 million per second, you will have concurrency issues; however, you are likely to have many more resource issues at that point.
Using Lock.lock() and Lock.unlock() can be faster when there is a high contention ratio. However, I suspect your threads will be spending most of the time building the objects to be created rather than waiting to add the objects.
You could use a ConcurrentDoublyLinkedList. There is an excellent implementation here ConcurrentDoublyLinkedList.
So long as you iterate forward through the list when you make your snapshot all should be well. This implementation preserves the forward chain at all times. The backward chain is sometimes inaccurate.
First of all, you should investigate if this really is too slow. Adds to ArrayLists are O(1) in the happy case, so if the list has an appropriate initial size, CopyOnReadList.add is basically just a bounds check and an assignment to an array slot, which is pretty fast. (And please, do remember that CopyOnReadList was written to be understandable, not performant.)
If you need a non-locking operation, you can have something like this:
class ConcurrentStack<T> {
private final AtomicReference<Node<T>> stack = new AtomicReference<>();
public void add(T value){
Node<T> tail, head;
do {
tail = stack.get();
head = new Node<>(value, tail);
} while (!stack.compareAndSet(tail, head));
}
public Node<T> drain(){
// Get all elements from the stack and reset it
return stack.getAndSet(null);
}
}
class Node<T> {
// getters, setters, constructors omitted
private final T value;
private final Node<T> tail;
}
Note that while adds to this structure should deal pretty well with high contention, it comes with several drawbacks. The output from drain is quite slow to iterate over, it uses quite a lot of memory (like all linked lists), and you also get things in the opposite insertion order. (Also, it's not really tested or verified, and may actually suck in your application. But that's always the risk with using code from some random dude on the intertubes.)
Yes, there is a way. It is similar to the way ConcurrentHashMap is built, if you are familiar with it.
You should build your own data structure not from one list for all writing threads, but from several independent lists. Each of these lists should be guarded by its own lock. The add() method should choose the list to append the current item to based on the current thread's ID, Thread.currentThread().getId() (for example, just id % listsCount). This gives you good concurrency properties for add(): at best, listsCount threads will be able to write without contention.
In makeSnapshot() you just iterate over all the lists, and for each list you grab its lock and copy its contents.
This is just an idea -- there are many places to improve it.
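A rough sketch of the idea, assuming a fixed, positive number of stripes chosen at construction time. The class name StripedList is made up, and each inner list simply serves as its own lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: several independent lists, each guarded by its own lock.
class StripedList<T> {
    private final List<List<T>> stripes = new ArrayList<>();

    StripedList(int stripeCount) { // stripeCount is assumed to be > 0
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new ArrayList<>());
        }
    }

    public void add(T item) {
        // Pick a stripe from the calling thread's id.
        int index = (int) (Thread.currentThread().getId() % stripes.size());
        List<T> stripe = stripes.get(index);
        synchronized (stripe) {
            stripe.add(item);
        }
    }

    public List<T> makeSnapshot() {
        List<T> copy = new ArrayList<>();
        for (List<T> stripe : stripes) {
            synchronized (stripe) { // only this stripe is blocked while it is copied
                copy.addAll(stripe);
            }
        }
        return copy;
    }
}
```

Note that the stripes are copied one after another, so the snapshot is not a single point-in-time view across all of them, and items from different threads lose their relative order.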
You can use a ReadWriteLock to allow multiple threads to perform add operations on the backing list in parallel, but only one thread to make the snapshot. While the snapshot is being prepared, all other add and snapshot requests are put on hold.
A ReadWriteLock maintains a pair of associated locks, one for
read-only operations and one for writing. The read lock may be held
simultaneously by multiple reader threads, so long as there are no
writers. The write lock is exclusive.
class CopyOnReadList<T> {
// free to use any concurrent data structure, ConcurrentLinkedQueue used as an example
private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<T>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock shared = rwLock.readLock();
private final Lock exclusive = rwLock.writeLock();
public void add(T item) {
shared.lock(); // multiple threads can attain the read lock
// try-finally is overkill if items.add() never throws exceptions
try {
// Add item while holding the lock.
items.add(item);
} finally {
shared.unlock();
}
}
public List<T> makeSnapshot() {
List<T> copy = new ArrayList<T>(); // probably better idea to use a LinkedList or the ArrayList constructor with initial size
exclusive.lock(); // only one thread can attain write lock, all read locks are also blocked
// try-finally is overkill if for loop never throws exceptions
try {
// Make a copy while holding the lock.
for (T t : items) {
copy.add(t);
}
} finally {
exclusive.unlock();
}
return copy;
}
}
Edit:
The read-write lock is so named because it is based on the readers-writers problem, not because of how it is used. With a read-write lock, multiple threads can acquire the read lock at once, but only one thread can acquire the write lock, exclusively. In this case the problem is reversed: we want multiple threads to write (add) and only one thread to read (make the snapshot). So we want multiple threads to use the read lock even though they are actually mutating. Only one thread makes the snapshot, holding the write lock exclusively, even though the snapshot only reads. Exclusive means that while the snapshot is being made, no other add or snapshot requests can be serviced by other threads.
As @PeterLawrey pointed out, the concurrent queue will serialize the writes, although the locks will be held for as short a time as possible. We are free to use any other concurrent data structure, e.g. ConcurrentDoublyLinkedList. The queue is used only as an example. The main idea is the use of read-write locks.