I have a HashMap that is loaded from a database during setup. A change to the database triggers an update on the map. I need to clear out the entire map and reload it with new values. I need to block any other threads from being able to access the map's get method until the entire map is reloaded.
I cannot use a ConcurrentHashMap because the docs say:
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries.
I have tried using a synchronized block and a ReadWriteLock for the method that reloads the map but both still allow other threads to read from it. I cannot put a lock on each read because that would hinder the performance of my app.
All read access to the map is provided through a getter. I have considered using a Semaphore but I am not sure it is the best solution. Is there any other solution that would allow optimal performance without a major implementation change?
this.getMap().clear();
this.getMap().putAll(data);
Related
What is a generic approach to achieve thread safety when an object (e.g. a HashMap or an ArrayList or some POJO) is always modified by a single (same) thread but can be accessed by multiple threads?
HashMap is of most interest for me but I need a generic approach.
Is it enough to make it volatile?
Thanks.
Maybe you should take a look at ConcurrentHashMap.
public class ConcurrentHashMap<K,V>
extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
More info here:
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
Programmatically, it behaves exactly like a classical HashMap.
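To illustrate the drop-in nature of the class, here is a minimal sketch (the keys and values are made up for illustration):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) {
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("key", "value");                 // same calls as with a HashMap
        System.out.println(map.get("key"));      // retrievals do not block
        map.putIfAbsent("key", "other");         // atomic check-then-act, a bonus over HashMap
    }
}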
A generic approach will not be easy to implement and will need a lot of effort on your part, and things can still go wrong (it is genuinely difficult). So your best bet is ConcurrentHashMap, as suggested by Gomoku7.
A generic approach will require an implementation based on locks: you will have to acquire a lock before each update and release it afterwards. There are different types of locks in Java, so choose the one that suits your needs (see the sketch after these tips). Some tips:
Final is your friend, do not mutate objects unless it is necessary, make objects final where possible
Avoid creating temporary variables
Use Executors and Fork/Join where possible
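As a rough illustration of the lock-based approach, here is a minimal sketch assuming a ReentrantReadWriteLock and a wrapper class of my own invention (not code from the question):
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public V get(K key) {
        lock.readLock().lock();          // readers share the read lock
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void reload(Map<K, V> data) {
        lock.writeLock().lock();         // the writer blocks all readers until done
        try {
            map.clear();
            map.putAll(data);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
A plain synchronized wrapper works too; the read/write lock is only worth it when reads vastly outnumber writes.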
In a multithreaded context I want to use a map which will be updated. Which Map will perform better: 1. HashMap or 2. ConcurrentHashMap? Also, will it perform slowly if I make it volatile?
It is going to be used in a Java batch for approx. 20 million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?
HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, this only ensures that changes to that particular field will be synchronized between threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.
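A small sketch of that distinction (the class and field names are made up): volatile covers only the reference, while operations on the map itself rely on the map's own thread safety.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MapHolder {
    // volatile makes swaps of this reference visible to other threads...
    private volatile Map<String, String> map = new ConcurrentHashMap<>();

    void update(String key, String value) {
        // ...but this mutates the instance, so it depends on ConcurrentHashMap's
        // internal thread safety, not on the volatile keyword
        map.put(key, value);
    }

    void swap(Map<String, String> fresh) {
        map = fresh; // the only operation that volatile itself protects
    }
}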
First, I'll describe what I want and then I'll elaborate on the possibilities I am considering. I don't know which is the best so I want some help.
I have a hash map on which I do read and write operations from a Servlet. Now, since this Servlet is on Tomcat, I need the hash map to be thread safe. Basically, when it is being written to, nothing else should write to it and nothing should be able to read it as well.
I have seen ConcurrentHashMap but noticed its get method is not thread-safe. Then, I have seen locks and something called synchronized.
I want to know which is the most reliable way to do it.
ConcurrentHashMap.get() is thread safe.
You can make HashMap thread safe by wrapping it with Collections.synchronizedMap().
EDIT: removed false information
In any case, the synchronized keyword is a safe bet. It blocks any other thread that synchronizes on the same object for as long as one thread is inside the synchronized block.
// Another thread may be modifying map right now, so this read is not thread safe
map.get(0);
as opposed to
// No thread that also synchronizes on map can modify it until this block completes
synchronized(map) {
map.get(0);
}
I would like to suggest you go with ConcurrentHashMap for the requirement you have described. I had the same kind of requirement earlier for our application, but we were a little more focused on the performance side.
I ran both ConcurrentHashMap and the map returned by Collections.synchronizedMap() under various types of load, launching multiple threads at a time using JMeter, and I monitored them with JProfiler. After all these tests we came to the conclusion that the map returned by Collections.synchronizedMap() was not as efficient in terms of performance as ConcurrentHashMap.
I have also written a post about my experience with both.
Thanks
Collections.synchronizedMap(new HashMap<K, V>());
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
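The example that the javadoc refers to looks roughly like this (reproduced from memory, so treat it as a sketch):
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class SyncMapIteration {
    public static void main(String[] args) {
        Map<String, String> m = Collections.synchronizedMap(new HashMap<String, String>());
        m.put("a", "1");
        Set<String> keys = m.keySet();             // need not be in a synchronized block
        synchronized (m) {                         // synchronize on m, not on keys!
            Iterator<String> it = keys.iterator(); // must be inside the synchronized block
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
Skipping the synchronization during iteration can lead to non-deterministic behavior.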
This is the point of the ConcurrentHashMap class: it protects your collection when you have more than one thread.
I am currently implementing a cache. I have completed a basic implementation, like the one below. What I want to do is run a thread that will remove entries that satisfy certain conditions.
class Cache {
int timeLimit = 10; //how long each entry needs to be kept after accessed(marked)
int maxEntries = 10; //maximum number of Entries
HashSet<String> set = new HashSet<String>();
public void add(Entry t){
....
}
public Entry access(String key){
//mark Entry that it has been used
//Since it has been marked, background thread should remove this entry after timeLimit seconds.
return set.get(key);
}
....
}
My question is: how should I implement the background thread so that it goes through the entries in set and removes the ones that have been marked and for which (now - last access time) > timeLimit?
edit
The above is just a simplified version of the code; I left out the synchronized statements.
Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. The much more lightweight MapMaker cache from Guava can also automatically remove old entries.
If you really want to implement this yourself, it is not really that simple.
Remember synchronization. You should use ConcurrentHashMap or the synchronized keyword to store entries. This can be really tricky.
You must store the last access time of each entry somehow. Every time you access an entry, you must update that timestamp.
Think about the eviction policy: if there are more than maxEntries in your cache, which ones should be removed first?
Do you really need a background thread?
This is surprising, but EhCache (enterprise ready and proven) does not use a background thread to invalidate old entries. Instead it waits until the map is full and removes entries lazily. This looks like a good trade-off, as threads are expensive.
If you have a background thread, should there be one per cache or one global? Do you start a new thread while creating a new cache or have a global list of all caches? This is harder than you think...
Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.
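A minimal sketch of such a sweep, assuming the entries live in a ConcurrentHashMap and that Entry exposes marked and lastAccess fields; both of those are my assumptions rather than code from the question:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CacheSweeper implements Runnable {
    static class Entry {
        volatile boolean marked;      // set when the entry is accessed
        volatile long lastAccess;     // updated on each access, in millis
    }

    private final Map<String, Entry> entries;     // e.g. a ConcurrentHashMap
    private final long timeLimitMillis;

    CacheSweeper(Map<String, Entry> entries, long timeLimitMillis) {
        this.entries = entries;
        this.timeLimitMillis = timeLimitMillis;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long now = System.currentTimeMillis();
            // ConcurrentHashMap views tolerate concurrent removal
            entries.values().removeIf(e -> e.marked && now - e.lastAccess > timeLimitMillis);
            try {
                Thread.sleep(1000);   // sweep roughly every second
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
It could then be started with something like new Thread(new CacheSweeper(entries, 10_000)).start(), ideally marked as a daemon thread so it does not keep the JVM alive.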
I'd use Guava's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:
new Thread(new Runnable() {
    public void run() {
        // Loop so the sweep actually repeats; stop when the thread is interrupted
        while (!Thread.currentThread().isInterrupted()) {
            cache.cleanUp();
            try { Thread.sleep(MY_SLEEP_DURATION); } catch (InterruptedException e) { break; }
        }
    }
}).start();
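For reference, building such a cache with Guava's CacheBuilder might look like the sketch below; the size and duration mirror the question's maxEntries and timeLimit fields, but treat the whole thing as an assumption rather than a drop-in replacement:
import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaCacheExample {
    private final Cache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(10)                          // roughly the question's maxEntries
            .expireAfterAccess(10, TimeUnit.SECONDS)  // roughly the question's timeLimit
            .build();

    String access(String key) {
        return cache.getIfPresent(key);               // also refreshes the access time
    }

    void add(String key, String value) {
        cache.put(key, value);
    }
}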
I don't imagine you really need a background thread. Instead you can just remove expired entries before or after you perform a lookup. This simplifies the entire implementation and it's very hard to tell the difference.
BTW: If you use a LinkedHashMap, you can use it as an LRU cache by overriding removeEldestEntry (see its javadocs for an example)
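A minimal sketch of that LinkedHashMap idea (the capacity here is an arbitrary assumption):
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 10;   // assumed limit

    LruCache() {
        super(16, 0.75f, true);                  // true = access order, so get() refreshes entries
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES;             // evict the least recently used entry
    }
}
Note that, like HashMap, this is not thread-safe on its own and would still need external synchronization.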
First of all, your presented code is incomplete because there is no get(key) on HashSet (so I assume you mean some kind of Map instead) and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.
When implementing a cache, it is usually assumed that the data-structure will be accessed concurrently by multiple threads. So the first thing you will need to do, is to make use of a backing data-structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, namely in Guava, Javolution and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data-structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache. Or letting the GC remove the entries by using SoftReferences.
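The SoftReference idea mentioned above might look roughly like this sketch (the wrapper class is my own, not from the answer): the GC may clear values under memory pressure, so get() has to cope with cleared references.
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SoftCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref);   // the value was collected; drop the stale entry
        }
        return value;
    }
}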
Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.
However, you should consider whether it is worth it for you, to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.
The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.
EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
As others have mentioned, Google's Guava library has a Cache API, which is quite useful for smallish in-memory caches.
Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.
Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?
First, make access to your collection thread-safe: either synchronize it or use a Set based on ConcurrentHashMap (rather than a ConcurrentHashSet), as indicated in the comments below.
Second, write your new thread and implement it as an endless loop that periodically iterates the collection and removes elements. Write this class so that it is initialized with the correct collection in its constructor, so that you do not have to worry about how to access the proper collection.
I am writing an application which will return a HashMap to the user. The user will get a reference to this map.
On the backend, I will be running some threads which will update the map.
What have I done so far?
I have made all the backend threads share a common channel to update the map, so on the backend I am sure that concurrent write operations will not be an issue.
Issues I am having
If the user tries to update the map while it is simultaneously being updated on the backend --> concurrent write operation problem.
If the user tries to read from the map while it is simultaneously being updated on the backend --> concurrent read and write operation problem.
Until now I have not faced any such issue, but I am afraid that I may in the future. Please give suggestions.
I am using ConcurrentHashMap<String, String>.
You are on the right track using ConcurrentHashMap. For each point:
Check out the methods putIfAbsent and replace; both are thread-safe and combine checking the current state of the map and updating it into one atomic operation.
The get method is not synchronized internally but will return the most recent value for the specified key available to it (check the ConcurrentHashMap class Javadoc for discussion).
The benefit of ConcurrentHashMap over something like Collections.synchronizedMap is the combined methods like putIfAbsent which provide traditional Map get and put logic in an internally synchronized way. Use these methods and do not try to layer your own synchronization over ConcurrentHashMap, as it will not work: the java.util.concurrent collections synchronize internally, and other threads will not respect attempts to synchronize on the object (e.g. synchronized(myConcurrentHashMap){} will not block other threads).
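A small sketch of those atomic methods in use (keys and values are made up for illustration):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicMapOps {
    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

        // insert only if no other thread has inserted the key already
        map.putIfAbsent("config", "initial");

        // replace only if the current value is the one we expect (CAS-style)
        boolean swapped = map.replace("config", "initial", "updated");
        System.out.println(swapped + " -> " + map.get("config"));
    }
}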
Side Note:
You might want to look into the lock-free hash table implementation by Cliff Click; it is part of the Highly Scalable Java library.
(Here's a Google Talk by Cliff Click about this lock free hash.)
ConcurrentHashMap was designed and implemented to avoid any issues with the scenarios you describe. You have nothing to worry about.
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates.
javadoc of ConcurrentHashMap