Efficiency of Hashtable and Collections.synchronizedMap(HashMap) - java

I am going to implement some code which needs a synchronized data structure. I came up with Hashtable and Collections.synchronizedMap(HashMap). I wouldn't be needing ConcurrentHashMap for this. I was wondering which one of the two would be better.
PS: I will be calling a lot of getters on this object, and they would not be called at the same time, so there is no concurrency problem either.

ConcurrentHashMap is much more scalable: http://www.javamex.com/tutorials/concurrenthashmap_scalability.shtml
Hashtable and Collections.synchronizedMap(HashMap) provide roughly the same performance, but they are only conditionally thread-safe (i.e. they are not fully thread-safe for compound operations).
If there are a lot of read operations, I would recommend wrapping the map with a read-write lock:
import java.util.HashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MyHashMap<K, V> extends HashMap<K, V> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    @Override
    public V put(K key, V value) {
        final Lock w = lock.writeLock();
        w.lock();
        try {
            return super.put(key, value);
        } finally {
            w.unlock();
        }
    }

    @Override
    public V get(Object key) {
        final Lock r = lock.readLock();
        r.lock();
        try {
            return super.get(key);
        } finally {
            r.unlock();
        }
    }

    // ... the same approach, distinguishing read and write operations,
    // for the remaining Map methods (remove, containsKey, size, ...)
}
UPDATE:
I will be calling a lot of getter of this object and they would not be at the same time
It doesn't guarantee that you don't need synchronization.
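To illustrate why (a minimal, hypothetical sketch, not from the original post): even if reads and writes never overlap in wall-clock time, an unsynchronized HashMap gives no visibility guarantee between threads, because nothing establishes a happens-before edge between the write and the later read.

import java.util.HashMap;
import java.util.Map;

public class VisibilityExample {
    // Shared, unsynchronized map (hypothetical example).
    static final Map<String, Integer> map = new HashMap<>();

    public static void main(String[] args) {
        new Thread(() -> map.put("answer", 42)).start();

        new Thread(() -> {
            // Even if this thread happens to run "after" the writer in wall-clock time,
            // no happens-before edge links the put to this get, so the Java Memory Model
            // does not guarantee the value ever becomes visible here; this loop may spin
            // forever. (With several writers, a concurrent resize could even corrupt the map.)
            while (map.get("answer") == null) {
                // busy-wait, possibly forever
            }
            System.out.println(map.get("answer"));
        }).start();
    }
}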

Unless you need to acquire a lock on the whole map for some reason (unlikely), you should go for ConcurrentHashMap, which gives much better scalability.
Hashtable and the synchronized wrapper (Collections.synchronizedMap(HashMap)) use a single lock, whereas ConcurrentHashMap partitions the map into 16 segments by default, each with its own lock, which allows much better concurrent access.
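For reference, a minimal sketch of the three options (hypothetical field names, just to show the construction):

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    // Every method synchronized on the Hashtable's own monitor:
    Map<String, Integer> legacy = new Hashtable<>();

    // Same idea: one mutex wrapped around every call:
    Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());

    // Internal lock striping (16 segments by default in older JDKs), so readers
    // and writers on different keys rarely block each other:
    Map<String, Integer> scalable = new ConcurrentHashMap<>();
}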

Although Hashtable is thread-safe, that alone doesn't guarantee that your whole code is thread-safe. Hashtable also has performance issues. So you could use HashMap instead, but then you have to manage all the thread safety yourself.

Related

ConcurrentHashMap and ReentrantReadWriteLock

I have one thread that updates data in a Map and several threads that read this data. Now my code looks like this:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Updater {
    private final ConcurrentMap<String, Integer> valuesMap = new ConcurrentHashMap<>();
    private final ReadWriteLock reentrantReadWriteLock = new ReentrantReadWriteLock();

    public void update(Settings settings) {
        reentrantReadWriteLock.writeLock().lock();
        try {
            for (Map.Entry<String, Integer> entry : valuesMap.entrySet()) {
                valuesMap.put(entry.getKey(),
                        entry.getValue() + settings.getUpdateValue());
            }
        } finally {
            reentrantReadWriteLock.writeLock().unlock();
        }
    }

    public Integer getValue(String key) {
        reentrantReadWriteLock.readLock().lock();
        try {
            return valuesMap.get(key);
        } finally {
            reentrantReadWriteLock.readLock().unlock();
        }
    }
}
But I think I overdid it. Can I use only ConcurrentHashMap in this situation?
Can I use only ConcurrentHashMap in this situation?
It depends on whether you want the update operation to be atomic; i.e. whether you want a reader to never see the state of the map when only some of the put operations have been performed.
If update doesn't need to be atomic, then locking is unnecessary. In fact, it is an unwanted concurrency bottleneck.
If update needs to be atomic, then the lock is necessary. But if you are controlling access with locks, then you don't need to use a ConcurrentHashMap. A simple HashMap would be better.
I don't think that ConcurrentHashMap has a way to implement multiple put operations as an atomic action.
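A minimal sketch of the "atomic update" variant under that reasoning: since every access already goes through the ReadWriteLock, a plain HashMap is enough (the Settings type and getUpdateValue() are taken from the question; the class name mirrors it).

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Updater {
    // Plain HashMap: all access is guarded by the lock below, so ConcurrentHashMap is not needed.
    private final Map<String, Integer> valuesMap = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void update(Settings settings) {
        lock.writeLock().lock();
        try {
            // Readers never observe a half-updated map: the whole loop is one atomic step.
            for (Map.Entry<String, Integer> entry : valuesMap.entrySet()) {
                entry.setValue(entry.getValue() + settings.getUpdateValue());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public Integer getValue(String key) {
        lock.readLock().lock();
        try {
            return valuesMap.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}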

Applying lock on HashMap

I have a HashMap which is static and three threads which try to access the HashMap simultaneously from their corresponding classes.
Each thread's task is to get the list value for a specified key, perform some operations on the list (modify it), and put the processed list back into the HashMap.
I want to make other threads trying to access the HashMap wait until the current thread finishes processing and modifying the HashMap.
In some situations, the flow is like this:
Thread A retrieves the HashMap; while Thread A is processing its list, Thread B retrieves the HashMap and starts its own processing.
Actual behaviour has to be like:
Thread A -> retrieves HashMap -> process -> put value in HashMap.
Thread B -> retrieves HashMap -> process -> put value in HashMap.
Thread C -> retrieves HashMap -> process -> put value in HashMap.
logic :
apply lock on HashMap
retrieve.
process.
put into HashMap.
release lock.
Help me convert this logic to code; any suggestions are accepted with a smile.
You can really make use of ReentrantReadWriteLock. Here is the link for that:
Javadoc for ReentrantReadWriteLock
I would implement the feature as something like this:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Test {
    private final Map<Object, Object> map = new HashMap<>();
    private final ReentrantReadWriteLock reentrantReadWriteLock = new ReentrantReadWriteLock();

    public void process() {
        methodThatModifiesMap();
        methodThatJustReadsMap();
    }

    private void methodThatModifiesMap() {
        // If the code modifies the structure of the map (put(), remove(), ...),
        // acquire the write lock.
        reentrantReadWriteLock.writeLock().lock();
        try {
            // Do your thing and put() into or remove() from the map.
        } finally {
            // Don't forget to unlock.
            reentrantReadWriteLock.writeLock().unlock();
        }
    }

    private void methodThatJustReadsMap() {
        // If all you are doing is reading (get()), the read lock is enough.
        // It does not block reads from other threads, as long as no thread
        // holds the write lock at the time.
        reentrantReadWriteLock.readLock().lock();
        try {
            // Read from the map here.
        } finally {
            reentrantReadWriteLock.readLock().unlock();
        }
    }
}
Not only is your map thread-safe, the throughput is better too.
You can use ConcurrentHashMap instead of HashMap. ConcurrentHashMap gives better performance and avoids the overhead of locking the whole map while another thread is accessing it.
You can find more details on this page as well - http://crunchify.com/hashmap-vs-concurrenthashmap-vs-synchronizedmap-how-a-hashmap-can-be-synchronized-in-java/
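For the get-modify-put flow described in the question, ConcurrentHashMap can also make the whole per-key update atomic via compute() (Java 8+). A rough sketch with hypothetical key and value names:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ListUpdater {
    static final ConcurrentMap<String, List<Integer>> map = new ConcurrentHashMap<>();

    static void process(String key, int newValue) {
        // compute() runs the remapping function atomically for this key:
        // other threads updating the same key wait, threads on other keys do not.
        map.compute(key, (k, list) -> {
            if (list == null) {
                list = new ArrayList<>();
            }
            list.add(newValue);   // "process some operations on the list"
            return list;          // the processed list is put back atomically
        });
    }
}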
You can either use ConcurrentHashMap as suggested above or use class-level locks. What I mean by that is using the synchronized keyword on a static method, e.g.:
import java.util.HashMap;
import java.util.Map;

public class SynchronizedExample extends Thread {
    // Shared map; all access goes through the synchronized static method below.
    static Map<String, Object> map = new HashMap<>();

    public static synchronized void execute() {
        // Modify and read the HashMap here; the lock is SynchronizedExample.class,
        // so only one thread can be inside this method at a time.
    }

    public void run() {
        execute();
    }
}
Also, as others mentioned, synchronized methods can become a performance bottleneck, depending on how coarse-grained you make the atomic operations.
Also check class-level locks vs. object-level locks (they work much the same way, but the distinction is worth understanding), as sketched below.
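A minimal sketch of that distinction (hypothetical class, just to contrast the two monitors):

public class LockLevels {
    // Class-level lock: the monitor is LockLevels.class, shared by all instances.
    public static synchronized void classLevel() {
        // only one thread at a time in any static synchronized method of this class
    }

    // Object-level lock: the monitor is 'this', one per instance.
    public synchronized void objectLevel() {
        // one thread per instance; threads on different instances do not block each other
    }
}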

Thread-safe Map in Java

I understand the overall concepts of multi-threading and synchronization but am new to writing thread-safe code. I currently have the following code snippet:
synchronized (compiledStylesheets) {
    if (compiledStylesheets.containsKey(xslt)) {
        exec = compiledStylesheets.get(xslt);
    } else {
        exec = compile(s, imports);
        compiledStylesheets.put(xslt, exec);
    }
}
where compiledStylesheets is a HashMap (private, final). I have a few questions.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative. Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct? This is the only code that hits this object other than initialization/instantiation.
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill. The putIfAbsent() method will not be usable in this instance because it doesn't allow me to skip the compile() method call. I also don't know if it will solve the "modified after containsKey() but before put()" problem, or if that's even really a concern in this case.
Edit: Spelling
For tasks of this nature, I highly recommend Guava caching support.
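For example, a rough sketch of what that could look like with Guava's LoadingCache (the key/value types and the compile() call here are placeholders standing in for the question's xslt key and compiled stylesheet):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class StylesheetCache {
    private final LoadingCache<String, Object> compiledStylesheets =
            CacheBuilder.newBuilder().build(new CacheLoader<String, Object>() {
                @Override
                public Object load(String xslt) {
                    return compile(xslt);   // expensive compile runs at most once per key
                }
            });

    public Object getCompiled(String xslt) {
        // Concurrent callers asking for the same key block until the single
        // load() finishes, then all receive the same cached instance.
        return compiledStylesheets.getUnchecked(xslt);
    }

    private Object compile(String xslt) {
        return new Object(); // placeholder for the real compilation
    }
}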
If you can't use that library, here is a compact implementation of a Multiton. Use of the FutureTask was a tip from assylias, here, via OldCurmudgeon.
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public abstract class Cache<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();

    public final V get(K key) throws InterruptedException, ExecutionException {
        Future<V> ref = cache.get(key);
        if (ref == null) {
            FutureTask<V> task = new FutureTask<>(new Factory(key));
            ref = cache.putIfAbsent(key, task);
            if (ref == null) {
                // This thread won the race: run the task (and the expensive create()) exactly once.
                task.run();
                ref = task;
            }
        }
        return ref.get();
    }

    protected abstract V create(K key) throws Exception;

    private final class Factory implements Callable<V> {
        private final K key;

        Factory(K key) {
            this.key = key;
        }

        @Override
        public V call() throws Exception {
            return create(key);
        }
    }
}
I think you are looking for a Multiton.
There's a very good Java one here that assylias posted some time ago.
You can loosen the lock at the risk of an occasionally doubly-compiled stylesheet in a race condition.
Object y;
// lock here if needed
y = map.get(x);
if (y == null) {
    y = compileNewY();
    // lock here if needed
    map.put(x, y);  // this may happen twice; if put is thread-safe, one will be ignored
    y = map.get(x); // essential, because another thread's y may have been put
}
This requires get and put to be atomic, which is true in the case of ConcurrentHashMap and which you can achieve by wrapping the individual calls to get and put with a lock in your class. (As I tried to explain with the "lock here if needed" comments - the point being you only need to wrap individual calls, not hold one big lock.)
This is a standard thread safe pattern to use even with ConcurrentHashMap (and putIfAbsent) to minimize the cost of compiling twice. It still needs to be acceptable to compile twice sometimes, but it should be okay even if expensive.
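With ConcurrentHashMap the same pattern is usually written with putIfAbsent, which guarantees everyone ends up using the same instance even if two threads compile at the same time. A sketch, reusing compileNewY() as a placeholder for the expensive work:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CompiledCache {
    private final ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

    Object getOrCompile(String x) {
        Object y = map.get(x);
        if (y == null) {
            Object fresh = compileNewY();            // may run in more than one thread (rarely)
            Object prev = map.putIfAbsent(x, fresh); // atomic: only the first put wins
            y = (prev != null) ? prev : fresh;       // everyone uses the instance that won
        }
        return y;
    }

    private Object compileNewY() {
        return new Object(); // placeholder for the expensive compilation
    }
}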
By the way, you can solve that problem. Usually the above pattern isn't used with a heavy function like compileNewY but a lightweight constructor new Y(). e.g. do this:
class PrecompiledY {
    public volatile Y y;
    private final AtomicBoolean compiled = new AtomicBoolean(false);

    public void compile() {
        if (!compiled.getAndSet(true)) {
            y = compileNewY();   // the heavy work happens at most once per object
        }
    }
}
// ...
ConcurrentMap<X, PrecompiledY> myMap; // alternatively use proper locking
PrecompiledY py = myMap.get(x);
if (py == null) {
    py = new PrecompiledY();  // much cheaper than compiling
    myMap.put(x, py);         // this may happen twice; if put is thread-safe, one will be ignored
    py = myMap.get(x);        // essential, because another thread's py may have been put
    py.compile();             // an object that didn't get inserted never gets compiled
}
Also:
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill.
Given that your code is heavily locking, ConcurrentHashMap is almost certainly far faster, so not overkill. (And much more likely to be bug-free. Concurrency bugs are not fun to fix.)
Please see Erickson's comment below: using double-checked locking with HashMaps is not very smart.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative.
You can use double-checked locking, and note that you don't need any lock before get since you never remove anything from the map.
if (compiledStylesheets.containsKey(xslt)) {
    exec = compiledStylesheets.get(xslt);
} else {
    synchronized (compiledStylesheets) {
        if (compiledStylesheets.containsKey(xslt)) {
            // another thread might have created it while
            // this thread was waiting for the lock
            exec = compiledStylesheets.get(xslt);
        } else {
            exec = compile(s, imports);
            compiledStylesheets.put(xslt, exec);
        }
    }
}
Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct?
Correct
This is the only code that hits this object other than initialization/instantiation.
First of all, the code as you posted it is race-condition-free, because the containsKey() result will never change while the compile() method is running.
Collections.synchronizedMap() is useless for your case as stated above, because it wraps all map methods into a synchronized block using either itself as the mutex or another object you provided (for the two-argument version).
IMO using ConcurrentHashMap is also not an option, because it stripes locks based on the key's hashCode() result; its concurrent iterators are also useless here.
If you really want compile() out of the synchronized block, you may pre-calculate it before checking containsKey(). This may hurt overall performance, but may be better than calling it inside the synchronized block. To make a decision, personally I would consider how often a key "miss" happens and therefore which option is preferable: keeping the lock for longer, or always calculating the value.

ThreadLocal HashMap vs ConcurrentHashMap for thread-safe unbound caches

I'm creating a memoization cache with the following characteristics:
a cache miss will result in computing and storing an entry
this computation is very expensive
this computation is idempotent
unbounded (entries never removed) since:
the inputs would result in at most 500 entries
each stored entry is very small
cache is relatively short-lived (typically less than an hour)
overall, memory usage isn't an issue
there will be thousands of reads - over the cache's lifetime, I expect 99.9%+ cache hits
must be thread-safe
What would have superior performance, or under what conditions would one solution be favored over the other?
ThreadLocal HashMap:
class MyCache<K, V> {
    private class LocalMyCache {
        final Map<K, V> map = new HashMap<K, V>();

        V get(K key) {
            V val = map.get(key);
            if (val == null) {
                val = computeVal(key);   // the expensive computation (not shown)
                map.put(key, val);
            }
            return val;
        }
    }

    private final ThreadLocal<LocalMyCache> localCaches = new ThreadLocal<LocalMyCache>() {
        protected LocalMyCache initialValue() {
            return new LocalMyCache();
        }
    };

    public V get(K key) {
        return localCaches.get().get(key);
    }
}
ConcurrentHashMap:
class MyCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<K, V>();

    public V get(K key) {
        V val = map.get(key);
        if (val == null) {
            val = computeVal(key);   // the expensive computation (not shown)
            map.put(key, val);
        }
        return val;
    }
}
I figure the ThreadLocal solution would initially be slower if there are a lot of threads, because of all the cache misses per thread, but over thousands of reads the amortized cost would be lower than the ConcurrentHashMap solution. Is my intuition correct?
Or is there an even better solution?
Using ThreadLocal as a cache is not good practice.
In most containers, threads are reused via thread pools and thus are never garbage collected; this leads to something weird: the per-thread caches live as long as the pooled threads do.
With ConcurrentHashMap, you have to manage the cache yourself in order to prevent a memory leak.
If you insist, I suggest using weak or soft references and evicting entries after reaching a max size.
If you are looking for an in-memory cache solution (do not reinvent the wheel), try Guava cache:
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html
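As a rough sketch of that advice (CacheBuilder names are from the linked Javadoc; the 500-entry bound comes from the question, and Key, Value and computeVal() are placeholders):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<Key, Value> cache = CacheBuilder.newBuilder()
        .maximumSize(500)        // evict once the expected upper bound is reached
        .softValues()            // let the GC reclaim values under memory pressure
        .build(new CacheLoader<Key, Value>() {
            @Override
            public Value load(Key key) {
                return computeVal(key);   // the expensive, idempotent computation
            }
        });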
this computation is very expensive
I assume this is the reason you created the cache and this should be your primary concern.
While the speed of the solutions might differ slightly (well under 100 ns), I suspect it is more important that you be able to share results between threads, i.e. ConcurrentHashMap is likely to be the best for your application, as it is likely to save you more CPU time in the long run.
In short, the cost of the lookup itself is likely to be tiny compared to the cost of computing the same thing multiple times (once per thread).
Note that your ConcurrentHashMap implementation is not thread safe and could lead to one item being computed twice. It is actually quite complicated to get it right if you store the results directly without using explicit locking, which you certainly want to avoid if performance is a concern.
It is worth noting that ConcurrentHashMap is highly scalable and works well under high contention. I don't know if ThreadLocal would perform better.
Apart from using a library, you could take some inspiration from Java Concurrency in Practice Listing 5.19. The idea is to save a Future<V> in your map instead of a V. That helps a lot in making the whole method thread safe while staying efficient (lock-free). I paste the implementation below for reference but the chapter is worth reading to understand that every detail counts.
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public interface Computable<K, V> {
    V compute(K arg) throws InterruptedException;
}

public class Memoizer<K, V> implements Computable<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<K, Future<V>>();
    private final Computable<K, V> c;

    public Memoizer(Computable<K, V> c) {
        this.c = c;
    }

    public V compute(final K arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                Callable<V> eval = new Callable<V>() {
                    public V call() throws InterruptedException {
                        return c.compute(arg);
                    }
                };
                FutureTask<V> ft = new FutureTask<V>(eval);
                f = cache.putIfAbsent(arg, ft);
                if (f == null) {
                    f = ft;
                    ft.run();
                }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg, f);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
    }
}
Given that it's relatively easy to implement both of these, I would suggest you try them both and test at steady state load to see which one performs the best for your application.
My guess is that the ConcurrentHashMap will be a little faster, since it does not have to make native calls to Thread.currentThread() like a ThreadLocal does. However, this may depend on the objects you are storing and how efficient their hash coding is.
It may also be worthwhile trying to tune the concurrent map's concurrencyLevel to the number of threads you need. It defaults to 16.
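For reference, a sketch of that tuning via the three-argument constructor (initial capacity and load factor shown with their usual defaults; the concurrency level of 32 is just an illustrative value):

import java.util.concurrent.ConcurrentHashMap;

// ConcurrentHashMap(initialCapacity, loadFactor, concurrencyLevel)
ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>(16, 0.75f, 32);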
The lookup speed is probably similar in both solutions. If there are no other concerns, I'd prefer ThreadLocal, since the best solution to multi-threading problems is single-threading.
However, your main problem is you don't want concurrent calculations for the same key; so there should be a lock per key; such locks can usually be implemented by ConcurrentHashMap.
So my solution would be
class LazyValue
{
    final K key;
    volatile V value;

    LazyValue(K key) { this.key = key; }

    V getValue()
    {
        V v = value;
        if (v == null) {                 // first check, no lock
            synchronized (this) {        // lock per key: there is one LazyValue per key
                v = value;
                if (v == null) {         // second check, with lock: double-checked locking
                    v = computeVal(key); // the expensive calculation runs at most once
                    value = v;
                }
            }
        }
        return v;
    }
}

static ConcurrentHashMap<K, LazyValue> centralMap = ...;
static
{
    for every key
        centralMap.put( key, new LazyValue(key) );
}

static V lookup(K key)
{
    // localMap is meant to be a per-thread (e.g. ThreadLocal) HashMap acting as a
    // fast, lock-free front for the shared centralMap.
    V value = localMap.get(key);
    if (value == null)
        localMap.put(key, value = centralMap.get(key).getValue());
    return value;
}
The performance question is irrelevant, as the solutions are not equivalent.
The ThreadLocal hash map isn't shared between threads, so the question of thread safety doesn't even arise, but it also doesn't meet your specification, which doesn't say anything about each thread having its own cache.
The requirement for thread safety implies that a single cache is shared among all threads, which rules out ThreadLocal completely.

Correct HashMap Synchronization

Let's say I have a HashMap declared as follows:
@GuardedBy("pendingRequests")
private final Map<UInt32, PendingRequest> pendingRequests = new HashMap<UInt32, PendingRequest>();
Access to the map is multi-threaded, and all access is guarded by synchronizing on this final instance of the map, e.g.:
synchronized (pendingRequests) {
    pendingRequests.put(reqId, request);
}
Is this enough? Should the map be created using Collections.synchronizedMap()? Should I be locking on a dedicated lock object instead of the map instance? Or maybe both?
External synchronization (in addition to possibly using Collections.synchronizedMap()) is needed in a couple areas where multiple calls on the map must be atomic.
Synchronizing on the map itself is essentially what the Map returned by Collections.synchronizedMap() would do. For your situation it is a reasonable approach, and there is not much to recommend using a separate lock object other than personal preference (or if you wish to have more fine-grained control and use a ReentrantReadWriteLock to allow concurrent reading of the map).
E.g.
private Map<Integer, Object> myMap;
private ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

public void myReadMethod()
{
    rwl.readLock().lock();
    try
    {
        myMap.get(...);
        ...
    } finally
    {
        rwl.readLock().unlock();
    }
}

public void myWriteMethod()
{
    // may want / need to call rwl.readLock().unlock() here,
    // since if you are holding the readLock here already then
    // you cannot get the writeLock (so be careful on how your
    // methods lock/unlock and call each other).
    rwl.writeLock().lock();
    try
    {
        myMap.put(key1, item1);
        myMap.put(key2, item2);
    } finally
    {
        rwl.writeLock().unlock();
    }
}
All calls to the map need to be synchronized, and Collections.synchronizedMap() gives you that.
However, there is also an aspect of compound logic. If you need the integrity of the compound logic, synchronization of individual calls is not enough. For example, consider the following code:
Object value = yourMap.get(key); // synchronized
if (value == null) {
    // do more action
    yourMap.put(key, newValue);  // synchronized
}
Although individual calls (get() and put()) are synchronized, your logic will not be safe against concurrent access.
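A minimal sketch of the usual fix: hold the map's own lock across the whole check-then-act (this works because Collections.synchronizedMap() uses the returned map itself as the mutex; yourMap, key and newValue are as above):

synchronized (yourMap) {             // one lock spans the whole compound action
    Object value = yourMap.get(key);
    if (value == null) {
        // do more action
        yourMap.put(key, newValue);  // no other thread can sneak in between get and put
    }
}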
Another interesting case is when you iterate. For an iteration to be safe, you'd need to synchronize for the entire duration of the iteration, or you will get ConcurrentModificationExceptions.
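The Collections.synchronizedMap() documentation shows the same pattern for iteration; a sketch:

Map<String, Object> m = Collections.synchronizedMap(new HashMap<String, Object>());
// ...
synchronized (m) {   // mandatory while iterating, per the synchronizedMap contract
    for (Map.Entry<String, Object> e : m.entrySet()) {
        // read or update e here; no other thread can modify m during the loop,
        // so no ConcurrentModificationException
    }
}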
