I'm writing a modified Kademlia P2P system, but the problem I'm describing here is very similar to an implementation of the original.
So, what's the most efficient way of implementing k-buckets? What matters to me is access time, parallelism (read & write) and memory consumption.
I thought about doing it with a ConcurrentLinkedQueue and a ConcurrentHashMap, but that's pretty redundant and nasty, isn't it?
At the moment I'm simply synchronizing on a LinkedList.
Here is my code:
import java.util.LinkedList;

class Bucket {

    private final LinkedList<Neighbour> neighbours;
    private final Object lock;

    Bucket() {
        neighbours = new LinkedList<>();
        lock = new Object();
    }

    // Insert the neighbour if it is new, otherwise move it to the tail (most recently seen).
    void sync(Neighbour n) {
        synchronized (lock) {
            int index = neighbours.indexOf(n);
            if (index == -1) {
                neighbours.add(n);
                n.updateLastSeen();
            } else {
                Neighbour old = neighbours.remove(index);
                neighbours.add(old);
                old.updateLastSeen();
            }
        }
    }

    void remove(Neighbour n) {
        synchronized (lock) {
            neighbours.remove(n);
        }
    }

    // Return the neighbour responsible for n, or rotate the head of the bucket
    // to the tail if n is not present (assumes the bucket is not empty).
    Neighbour resolve(Node n) throws ResolveException {
        Neighbour nextHop;
        synchronized (lock) {
            int index = neighbours.indexOf(n);
            if (index == -1) {
                nextHop = neighbours.poll();
                neighbours.add(nextHop);
                return nextHop;
            } else {
                return neighbours.get(index);
            }
        }
    }
}
Don't be surprised that there is no eviction logic here; I have implemented the neighbour eviction process elsewhere.
So, what's the most efficient way of implementing k-Buckets?
That depends. If you want an implementation with bells and whistles (e.g. bucket splitting, multi-homing), then you need a flexible list or a tree.
In my experience, a copy-on-write array plus binary search works well for the routing table, because you rarely modify the total number of buckets, just the contents of the buckets.
With CoW semantics you need less locking, since you can just fetch the current copy of the array, retrieve the bucket of interest and then lock on that bucket. Or use an atomic array inside each bucket.
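For illustration, here is a minimal sketch of that copy-on-write publication pattern, reusing the Bucket class from the question; the class and method names are placeholders, and the binary search over bucket prefixes is left out:

class CowRoutingTable {

    // The whole bucket array is replaced on the rare structural change,
    // so readers never need a lock just to locate their bucket.
    private volatile Bucket[] buckets = { new Bucket() };

    Bucket bucketAt(int index) {
        Bucket[] snapshot = buckets; // grab the current copy, lock-free
        return snapshot[Math.min(index, snapshot.length - 1)];
    }

    // Structural changes (e.g. a bucket split) copy, modify, then publish a new array.
    synchronized void addBucket() {
        Bucket[] next = java.util.Arrays.copyOf(buckets, buckets.length + 1);
        next[next.length - 1] = new Bucket();
        buckets = next;
    }
}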
But of course such optimizations are only necessary if you expect high throughput. Most DHT nodes see very little traffic, a few packets per second at most, so there's no need to involve multiple threads unless you implement a specialized node with so much throughput that multiple threads are needed to process the data.
CoW works less well for routing-table-like lookup caches or for the ephemeral visited-node/target-node sets built during a lookup, since those get modified rapidly. A ConcurrentSkipListMap can be a better choice if you are expecting high load.
If you want a simplified, approximate implementation, then just use a fixed-size array of 160 elements where the array index is the number of prefix bits shared with your node ID. This performs reasonably well but doesn't allow for some of the optimizations proposed by the full Kademlia paper.
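A rough sketch of that simplified layout, assuming 160-bit node IDs stored as byte[20] and reusing the Bucket class from the question (class and method names are illustrative):

class FlatRoutingTable {

    private static final int ID_BITS = 160;
    private final byte[] ownId;
    private final Bucket[] buckets = new Bucket[ID_BITS];

    FlatRoutingTable(byte[] ownId) {
        this.ownId = ownId;
        for (int i = 0; i < ID_BITS; i++) buckets[i] = new Bucket();
    }

    // The bucket index is the number of leading bits the other ID shares with ours.
    Bucket bucketFor(byte[] otherId) {
        int prefix = sharedPrefixBits(ownId, otherId);
        return buckets[Math.min(prefix, ID_BITS - 1)]; // prefix == 160 only for our own ID
    }

    private static int sharedPrefixBits(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int xor = (a[i] ^ b[i]) & 0xFF;
            if (xor != 0) return i * 8 + Integer.numberOfLeadingZeros(xor) - 24;
        }
        return a.length * 8;
    }
}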
Related
I have been asked to implement fine-grained locking on a hash list. I have done this using synchronized, but the question tells me to use Lock instead.
I have created the hash list of objects in the constructor:
private LinkedList<E> data[];
private Lock dataLock[];                  // one lock per bucket (fine-grained)
private Lock lockR = new ReentrantLock(); // single lock for the whole list (coarse-grained)

// The constructors ensure that both data and dataLock are the same size
@SuppressWarnings("unchecked")
public ConcurrentHashList(int n) {
    if (n > 1000) {
        data = (LinkedList<E>[]) (new LinkedList[n / 10]);
        dataLock = new Lock[n / 10];
    } else {
        data = (LinkedList<E>[]) (new LinkedList[100]);
        dataLock = new Lock[100];
    }
    for (int j = 0; j < data.length; j++) {
        data[j] = new LinkedList<E>();
        dataLock[j] = new ReentrantLock(); // adding a lock to each bucket index
    }
}
The original, coarse-grained method (using the single lock):
public void add(E x) {
    if (x != null) {
        lockR.lock(); // one lock guards every bucket
        try {
            int index = hashC(x);
            if (!data[index].contains(x))
                data[index].add(x);
        } finally {
            lockR.unlock();
        }
    }
}
Using synchronized to grab the per-bucket lock object, so that multiple threads can work on different bucket indexes concurrently:
public void add(E x) {
    if (x != null) {
        int index = hashC(x);
        synchronized (dataLock[index]) { // getting the bucket's lock before adding
            if (!data[index].contains(x))
                data[index].add(x);
        }
    }
}
I do not know how to implement it using Lock, though. I cannot lock just a single element of the array, only the whole method, which means it is not fine-grained.
Using an array of ReentrantLock:
public void add(E x) {
    if (x != null) {
        int index = hashC(x);
        dataLock[index].lock(); // getting the bucket's lock before adding
        try {
            if (!data[index].contains(x))
                data[index].add(x);
        } finally {
            dataLock[index].unlock();
        }
    }
}
The hash function
private int hashC(E x) {
    int k = x.hashCode();
    int h = Math.abs(k % data.length);
    return h;
}
Presumably, hashC() is a function that is highly likely to produce unique numbers. That is, you have no guarantee that the hashes are unique, but the incidence of non-unique hashes is extremely low. For a data structure with a few million entries you'd have a literal handful of collisions, and any given collision involves only a pair or maybe three objects sharing a hash, not thousands.
Also, assumption: the hash for a given object is constant. hashC(x) will produce the same value no matter how many times you call it, assuming you provide the same x.
Then, you get some fun conclusions:
The 'bucket' (The LinkedList instance found at array slot hashC(x) in data) that your object should go into, is always the same - you know which one it should be based solely on the result of hashC.
Calculating hashC does not require a lock of any sort. It has no side effects whatsoever.
Thus, knowing which bucket you need for a given operation on a single value (Be it add, remove, or check-if-in-collection) can be done without locking anything.
Now, once you know which bucket you need to look at / mutate, okay, now locking is involved.
So, just have one lock for each bucket. Not a List<Object> locks[]; that would be a whole list's worth of locks per bucket. Just Object[] locks is all you need, or ReentrantLock[] locks if you prefer to use lock/unlock instead of synchronized (locks[bucketIdx]) { ... }.
This is effectively fine-grained: After all, the odds that one operation needs to twiddle its thumbs because another thread is doing something, even though that other thread is operating on a different object, is very low; it would require the two different objects to have a colliding hash, which is possible, but extremely rare - as per assumption #1.
NB: This means the separate coarse-grained lock (lockR) can go away entirely; you don't need it, unless you want to build into your code the ability to completely redesign its bucket structure. For example, 1000 buckets feels a bit meh if you end up with a billion objects. I don't think 'rebucket everything' is part of the task here, though.
I came across a performance issue when implementing a non-duplicate concurrent list on top of an ArrayList (or ConcurrentLinkedQueue).
public class NonDuplicateList implements Outputable {

    private Map<Term, Integer> map;
    private List<Term> terms;

    public NonDuplicateList() {
        this.map = new HashMap<>();
        this.terms = new ArrayList<>();
    }

    public synchronized int addTerm(Term term) { // bad performance :(
        Integer index = map.get(term);
        if (index == null) {
            index = terms.size();
            terms.add(term);
            map.put(term, index);
        }
        return index;
    }

    @Override
    public void output(DataOutputStream out) throws IOException {
        out.writeInt(terms.size());
        for (Term term : terms) {
            term.output(out);
        }
    }
}
Note that Term and NonDuplicateList both implement the Outputable interface in order to be written out.
In order to keep NonDuplicateList thread-safe, I use synchronized to guard the method addTerm(Term), and the performance is as bad as expected when concurrently invoking addTerm.
It seems that ConcurrentHashMap isn't suitable for this case, since it doesn't keep strong data consistency. Any idea how to improve the performance of addTerm without losing its thread-safety?
EDIT:
The output method, i.e. iteration through the NonDuplicateList, does not need to be thread-safe, since only one thread will access it after the concurrent addTerm calls have finished. However, addTerm must return the index value immediately, as soon as a term is added to the NonDuplicateList.
There is a possibility to reuse ConcurrentHashMap in your implementation if you can sacrifice the addTerm return type. Instead of returning the actual index, you can return a boolean which indicates whether the addition was successful or produced a duplicate. This also allows you to remove the method synchronization and improve performance:
private ConcurrentMap<Term, Boolean> map;
private List<Term> terms; // must itself be a thread-safe list if addTerm is called concurrently

public boolean addTerm(Term term) {
    Boolean previousValue = map.putIfAbsent(term, Boolean.TRUE);
    if (previousValue == null) {
        terms.add(term);
        return true;
    }
    return false;
}
I am afraid you will not get a much faster solution here. The point is to avoid synchronization when you don't need it. If you don't mind weak consistency, using the ConcurrentHashMap iterator can be significantly cheaper than either preventing other threads from adding items while you're iterating or taking a consistent snapshot when the iterator is created.
On the other hand, when you need synchronization and a consistent iterator, you'll need an alternative to ConcurrentHashMap. One that comes to mind is java.util.Collections#synchronizedMap, but it uses object-level synchronization, so every read/write operation needs to acquire the lock, which is a performance overhead.
Take a look at ConcurrentSkipListMap, which guarantees average O(log(n)) performance on a wide variety of operations. It also has a number of operations that ConcurrentHashMap doesn't: ceilingEntry/Key, floorEntry/Key, etc. It also maintains a sort order, which would otherwise have to be calculated (at notable expense) if you were using a ConcurrentHashMap. Maybe it would be possible to get rid of the list + map combination and use a ConcurrentSkipListMap alone; the index of an element could then be computed using the ConcurrentSkipListMap API.
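A hedged sketch of that last idea, assuming Term implements Comparable (which the question does not state); note that the returned index then reflects sort order rather than insertion order, and that size() on a skip-list sub-map view is an O(n) traversal:

private final ConcurrentSkipListMap<Term, Boolean> sortedTerms = new ConcurrentSkipListMap<>();

public int addTerm(Term term) {
    sortedTerms.putIfAbsent(term, Boolean.TRUE);
    // headMap(term) is the view of all strictly smaller terms,
    // so its size is the sort-order index of term.
    return sortedTerms.headMap(term).size();
}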
The requirement is that I need an ArrayList of integers. I need thread-safe access to the individual integers (write, read, increase, decrease), and I also need to allow maximum concurrency.
The operations on each integer are also special:
The most frequent operation is to read.
The second most frequent operation is to decrease by one, but only if the value is greater than zero; or to increase by one (unconditionally).
Adding/removing elements is rare, but still needed.
I thought about AtomicInteger. However, it seems unusable here, because the atomic operation I want is "compare if not zero, then decrease", whereas the atomic operation provided by AtomicInteger is "compare if equal, then set". If you know how to apply AtomicInteger in this case, please point it out.
What I am thinking of is synchronizing the access to each integer, like this:
ArrayList<MutableInt> list;
... ...

// Compare if greater than zero, and decrease
MutableInt n = list.get(index);
boolean success = false;
synchronized (n) {
    if (n.intValue() > 0) { n.decrement(); success = true; }
}

// To add one
MutableInt n = list.get(index);
synchronized (n) {
    n.increment();
}

// To just read, I am thinking no synchronization is needed at all
int n = list.get(index).intValue();
With my solution, are there any side effects? Is it efficient to maintain hundreds or even thousands of synchronized integers?
Update: I am also thinking that allowing concurrent access to every element is not practical and not beneficial, since the actual concurrency is limited by the number of processors. Maybe I should just use several synchronization objects to guard different portions of the list; would that be enough?
Another thing is to implement add/delete so that it is thread-safe but does not impact the concurrency of the other operations much. I am thinking of a ReadWriteLock: add/delete acquires the write lock, while the other operations (changing the value of one integer) acquire the read lock. Is this the right approach?
I think you're right to use a read lock for accessing the list and a write lock for add/remove on the list.
You can still use AtomicInteger for the values:
// Increase value
value.incrementAndGet();

// Decrease value, lower bound is 0
int num;
do {
    num = value.get();
    if (num == 0)
        break;
} while (!value.compareAndSet(num, num - 1)); // try again if concurrently updated
I think, if you can live with a fixed size list, using a single AtomicIntegerArray is a better choice than using multiple AtomicIntegers:
public class AtomicIntList extends AbstractList<Integer> {

    private final AtomicIntegerArray array;

    public AtomicIntList(int size) {
        array = new AtomicIntegerArray(size);
    }

    public int size() {
        return array.length();
    }

    public Integer get(int index) {
        return array.get(index);
    }

    // for code accessing this class directly rather than using the List interface
    public int getAsInt(int index) {
        return array.get(index);
    }

    public Integer set(int index, Integer element) {
        return array.getAndSet(index, element);
    }

    // for code accessing this class directly rather than using the List interface
    public int setAsInt(int index, int element) {
        return array.getAndSet(index, element);
    }

    public boolean decrementIfPositive(int index) {
        for (;;) {
            int old = array.get(index);
            if (old <= 0) return false;
            if (array.compareAndSet(index, old, old - 1)) return true;
        }
    }

    public int incrementAndGet(int index) {
        return array.incrementAndGet(index);
    }
}
Code accessing this class directly rather than via the List<Integer> interface may use the methods getAsInt and setAsInt to avoid boxing conversions. This is a common pattern. Since the methods decrementIfPositive and incrementAndGet are not part of the List interface anyway, they always use int values.
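For example, a caller might use it like this (illustrative only):

AtomicIntList counters = new AtomicIntList(16);

counters.incrementAndGet(3);                            // unconditional increment
boolean decremented = counters.decrementIfPositive(3);  // decrements only if the value was > 0
int current = counters.getAsInt(3);                     // primitive read, no boxing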
As an update to this question: I found that the simplest solution, just synchronizing the entire code block for all possibly conflicting methods, turns out to be the best, even from a performance point of view. Synchronizing the code block solves both issues: accessing each counter, and adding/deleting elements of the counter list.
This is because ReentrantReadWriteLock has a really high overhead, even when only the read lock is applied. Compared to the overhead of the read/write lock, the cost of the operation itself is so tiny that any additional locking is not worth it.
The statement in the API doc of ReentrantReadWriteLock deserves close attention: "ReentrantReadWriteLocks... is typically worthwhile only when ... and entail operations with overhead that outweighs synchronization overhead".
I have read that in Java's ConcurrentHashMap simultaneous insertions are possible because it is divided into segments and a separate lock is taken for each segment.
But if two insertions are going to happen on the same segment, then they cannot run simultaneously.
My question is: what will happen in such a case? Will the second insertion wait until the first one completes, or what?
In general you don't need to be too concerned with how ConcurrentHashMap is implemented. It simply complies with the contract of ConcurrentMap, which ensures that concurrent modifications are possible.
But to answer your question: yes, one insertion may wait for completion of the other one. Internally, it uses locks which ensure that one thread waits until the other one releases the lock. The class Segment used internally actually inherits from ReentrantLock. Here is a shortened version of Segment.put():
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null : scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        // modifications
    } finally {
        unlock();
    }
    return oldValue;
}

private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    // ...
    int retries = -1; // negative while locating node
    while (!tryLock()) {
        if (retries < 0) {
            // ...
        }
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 && (f = entryForHash(this, hash)) != first) {
            e = first = f; // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}
This could give you an idea.
ConcurrentHashMap does not block when performing retrieval operations: reads do not entail locking at all.
The heuristic with most concurrent data structures is that there's a backing data structure that gets modified first, with a front-facing data structure that's visible to outside methods. Then, when the modification is complete, the backing data structure is made the public one and the previously public one is pushed to the back. There's way more to it than that, but that's the typical contract.
If two updates try to happen on the same segment they will contend with each other, and one of them will have to wait. You can optimise this by choosing a concurrencyLevel value which takes into account the number of threads that will be concurrently updating the hash map.
You can find all the details in the javadoc for the class.
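For example, on the pre-Java-8 ConcurrentHashMap the third constructor argument is the concurrency level; on Java 8 and later it is only used as a sizing hint:

// 16 initial capacity, 0.75 load factor, and room for ~32 concurrently updating threads
ConcurrentHashMap<String, Long> map = new ConcurrentHashMap<>(16, 0.75f, 32);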
ConcurrentHashMap contains an array of Segments, each of which in turn holds an array of HashEntrys. Each HashEntry holds a key, a value, and a pointer to its next adjacent entry.
But it acquires the lock at segment level. Hence you are correct: the second insertion waits until the first one completes.
Take a look at the javadoc for ConcurrentMap. It describes the extra methods available to deal with concurrent map mutations.
I have a multithreaded application, where a shared list has write-often, read-occasionally behaviour.
Specifically, many threads will dump data into the list, and then - later - another worker will grab a snapshot to persist to a datastore.
This is similar to the discussion over on this question.
There, the following solution is provided:
class CopyOnReadList<T> {

    private final List<T> items = new ArrayList<T>();

    public void add(T item) {
        synchronized (items) {
            // Add item while holding the lock.
            items.add(item);
        }
    }

    public List<T> makeSnapshot() {
        List<T> copy = new ArrayList<T>();
        synchronized (items) {
            // Make a copy while holding the lock.
            for (T t : items) copy.add(t);
        }
        return copy;
    }
}
However, in this scenario, (and, as I've learned from my question here), only one thread can write to the backing list at any given time.
Is there a way to allow high-concurrency writes to the backing list, which are locked only during the makeSnapshot() call?
synchronized (~20 ns) is pretty fast, and even though other approaches can allow more concurrency, they can also be slower.
private final Lock lock = new ReentrantLock();
private List<T> items = new ArrayList<T>();

public void add(T item) {
    lock.lock();
    // trivial lock time.
    try {
        // Add item while holding the lock.
        items.add(item);
    } finally {
        lock.unlock();
    }
}

public List<T> makeSnapshot() {
    List<T> copy = new ArrayList<T>(), ret;
    lock.lock();
    // trivial lock time.
    try {
        ret = items;
        items = copy;
    } finally {
        lock.unlock();
    }
    return ret;
}

public static void main(String... args) {
    long start = System.nanoTime();
    Main<Integer> ints = new Main<>();
    for (int j = 0; j < 100 * 1000; j++) {
        for (int i = 0; i < 1000; i++)
            ints.add(i);
        ints.makeSnapshot();
    }
    long time = System.nanoTime() - start;
    System.out.printf("The average time to add was %,d ns%n", time / 100 / 1000 / 1000);
}
prints
The average time to add was 28 ns
This means that if you are creating 30 million entries per second, you will on average have only one thread accessing the list at a time. If you are creating 60 million per second you will have concurrency issues, but you are likely to be having many other resourcing issues at that point.
Using Lock.lock() and Lock.unlock() can be faster when there is a high contention ratio. However, I suspect your threads will spend most of their time building the objects to be added rather than waiting to add them.
You could use a ConcurrentDoublyLinkedList; there is an excellent existing implementation under that name.
So long as you iterate forward through the list when you make your snapshot, all should be well. This implementation preserves the forward chain at all times; the backward chain is sometimes inaccurate.
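If you prefer to stay within the JDK, here is a hedged sketch of the same pattern using java.util.concurrent.ConcurrentLinkedDeque as a stand-in for the ConcurrentDoublyLinkedList mentioned above (the class name CopyOnReadDeque is just illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

class CopyOnReadDeque<T> {

    private final ConcurrentLinkedDeque<T> items = new ConcurrentLinkedDeque<>();

    public void add(T item) {
        items.addLast(item); // lock-free append
    }

    public List<T> makeSnapshot() {
        // Forward, weakly consistent iteration; concurrent adds may or may not be included.
        return new ArrayList<>(items);
    }
}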
First of all, you should investigate if this really is too slow. Adds to ArrayLists are O(1) in the happy case, so if the list has an appropriate initial size, CopyOnReadList.add is basically just a bounds check and an assignment to an array slot, which is pretty fast. (And please, do remember that CopyOnReadList was written to be understandable, not performant.)
If you need a non-locking operation, you can have something like this:
class ConcurrentStack<T> {

    private final AtomicReference<Node<T>> stack = new AtomicReference<>();

    public void add(T value) {
        Node<T> tail, head;
        do {
            tail = stack.get();
            head = new Node<>(value, tail);
        } while (!stack.compareAndSet(tail, head));
    }

    public Node<T> drain() {
        // Get all elements from the stack and reset it
        return stack.getAndSet(null);
    }
}

class Node<T> {
    // getters, setters, constructors omitted
    private final T value;
    private final Node<T> tail;
}
Note that while adds to this structure should deal pretty well with high contention, it comes with several drawbacks. The output from drain is quite slow to iterate over, it uses quite a lot of memory (like all linked lists), and you also get things in the opposite insertion order. (Also, it's not really tested or verified, and may actually suck in your application. But that's always the risk with using code from some random dude on the intertubes.)
Yes, there is a way. It is similar to the way ConcurrentHashMap is built, if you know it.
You should build your own data structure not from one list shared by all writing threads, but from several independent lists. Each of these lists should be guarded by its own lock. The .add() method should choose the list to append the current item to based on Thread.currentThread().getId() (for example, just id % listsCount). This gives you good concurrency properties for .add(): at best, listsCount threads will be able to write without contention.
In makeSnapshot() you just iterate over all the lists, and for each list you grab its lock and copy the contents.
This is just an idea; there are many places to improve it.
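A minimal sketch of that striping idea (class and field names are illustrative, and the stripe count is arbitrary):

import java.util.ArrayList;
import java.util.List;

class StripedList<T> {

    private final List<T>[] stripes;
    private final Object[] locks;

    @SuppressWarnings("unchecked")
    StripedList(int stripeCount) {
        stripes = new List[stripeCount];
        locks = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ArrayList<>();
            locks[i] = new Object();
        }
    }

    public void add(T item) {
        // Pick a stripe from the calling thread's id, so different threads rarely contend.
        int i = (int) (Thread.currentThread().getId() % stripes.length);
        synchronized (locks[i]) {
            stripes[i].add(item);
        }
    }

    public List<T> makeSnapshot() {
        List<T> copy = new ArrayList<>();
        for (int i = 0; i < stripes.length; i++) {
            synchronized (locks[i]) {
                copy.addAll(stripes[i]);
            }
        }
        return copy;
    }
}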
You can use a ReadWriteLock to allow multiple threads to perform add operations on the backing list in parallel, but only one thread to make the snapshot. While the snapshot is being prepared all other add and snapshot request are put on hold.
A ReadWriteLock maintains a pair of associated locks, one for read-only operations and one for writing. The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive.
class CopyOnReadList<T> {

    // free to use any concurrent data structure, ConcurrentLinkedQueue used as an example
    private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<T>();

    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock shared = rwLock.readLock();
    private final Lock exclusive = rwLock.writeLock();

    public void add(T item) {
        shared.lock(); // multiple threads can attain the read lock
        // try-finally is overkill if items.add() never throws exceptions
        try {
            // Add item while holding the lock.
            items.add(item);
        } finally {
            shared.unlock();
        }
    }

    public List<T> makeSnapshot() {
        List<T> copy = new ArrayList<T>(); // probably better to use a LinkedList or the ArrayList constructor with an initial size
        exclusive.lock(); // only one thread can attain the write lock; all read locks are also blocked
        // try-finally is overkill if the for loop never throws exceptions
        try {
            // Make a copy while holding the lock.
            for (T t : items) {
                copy.add(t);
            }
        } finally {
            exclusive.unlock();
        }
        return copy;
    }
}
Edit:
The read-write lock is so named because it is based on the readers-writers problem, not on how it happens to be used. Using the read-write lock we can have multiple threads acquire read locks, but only one thread acquire the write lock exclusively. In this case the problem is reversed: we want multiple threads to write (add) and only one thread to read (make the snapshot). So we want multiple threads to use the read lock even though they are actually mutating, and only one thread to make the snapshot exclusively using the write lock even though the snapshot only reads. Exclusive means that while the snapshot is being made, no other add or snapshot requests can be serviced by other threads.
As @PeterLawrey pointed out, the concurrent queue will serialize the writes, although the locks will be held for as short a time as possible. We are free to use any other concurrent data structure, e.g. ConcurrentDoublyLinkedList; the queue is used only as an example. The main idea is the use of read-write locks.