Since Java 8 we can use the .compute* methods on ConcurrentHashMap to synchronize processing by key: if two threads call a .compute* method on the same key at the same time, the callbacks are still executed one after another, not simultaneously. But ConcurrentHashMap doesn't provide a way to remove data in a timely fashion, as caches usually do.
Guava/Caffeine caches can automatically evict values based on time, but they don't offer that handy key-based processing synchronization you get with ConcurrentHashMap (you can obtain a ConcurrentMap via the asMap method, but its .compute* implementations don't synchronize by key).
My goal is to have both processing synchronization by key as in ConcurrentHashMap AND removals by time as in Guava/Caffeine.
What is the best way to achieve it in Java?
I was wrong about Caffeine - it supports atomic operations with compute.
Guava support was added in version 21.
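For illustration, here is a minimal sketch of getting both behaviours at once with Caffeine (assuming Caffeine is on the classpath; the class name, key/value types and the 2-second expiry are made up for the example): time-based eviction from the builder, plus per-key atomic compute through the asMap() view.

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

public class KeySynchronizedExpiringCache {
    public static void main(String[] args) {
        // Entries expire 2 seconds after the last write.
        Cache<String, Integer> cache = Caffeine.newBuilder()
                .expireAfterWrite(2, TimeUnit.SECONDS)
                .build();

        // asMap() exposes a ConcurrentMap view; Caffeine performs compute*
        // atomically per key, so two threads computing on the same key run
        // their remapping functions one after another.
        ConcurrentMap<String, Integer> map = cache.asMap();
        map.compute("requests", (key, count) -> count == null ? 1 : count + 1);
    }
}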
Related
I use CacheBuilder with expireAfterWrite(2000, TimeUnit.MILLISECONDS). I send 10,000 requests to my program and expect CacheBuilder to call the RemovalListener 10,000 times, 2 seconds after each request. I do not observe this behaviour; instead the RemovalListener is called only 1 or 2 times.
Can someone please explain to me what CacheBuilder is doing, because as I explained above it is doing something totally different from what the Guava documentation describes.
In the same spirit, I use maximumSize(1000), and after sending my program 10,000 requests I expect the RemovalListener to be called 9,000 times. But it's called only 1 or 2 times.
How does this module work in reality?
EDIT
I explicitly call cleanUp() each time I receive a request.
The removal behavior is documented and works as expected (emphasis mine):
When Does Cleanup Happen?
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.
The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.
Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.
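As a rough sketch of that last suggestion (the class name, the 1-second interval and the listener body are illustrative, not prescriptive), you could drive cleanUp() from a small scheduled executor so that expired entries are evicted and RemovalListeners fired even when the cache sees no traffic:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledCleanupExample {
    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterWrite(2000, TimeUnit.MILLISECONDS)
                .removalListener((RemovalListener<String, String>) notification ->
                        System.out.println("Removed " + notification.getKey()
                                + " because " + notification.getCause()))
                .build();

        // Dedicated maintenance thread: without it, expired entries are only
        // evicted (and listeners fired) during other cache operations.
        ScheduledExecutorService maintenance = Executors.newSingleThreadScheduledExecutor();
        maintenance.scheduleAtFixedRate(cache::cleanUp, 1, 1, TimeUnit.SECONDS);
    }
}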
If you want to have more control over the cache and have dedicated executor to take care for calling RemovalListeners, use Caffeine -- a high performance, near optimal caching library based on Java 8 -- which has an API similar to Guava's Cache (same author). Caffeine has more advanced removal handling:
You may specify a removal listener for your cache to perform some operation when an entry is removed, via Caffeine.removalListener(RemovalListener). The RemovalListener gets passed the key, value, and RemovalCause.
Removal listener operations are executed asynchronously using an Executor. The default executor is ForkJoinPool.commonPool() and can be overridden via Caffeine.executor(Executor). When the operation must be performed synchronously with the removal, use CacheWriter instead.
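A hedged sketch of that setup (class name, expiry and executor choice are just placeholders) might look like:

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CaffeineRemovalListenerExample {
    public static void main(String[] args) {
        Cache<String, String> cache = Caffeine.newBuilder()
                .expireAfterWrite(2, TimeUnit.SECONDS)
                // The listener receives key, value and RemovalCause; it runs on
                // the configured executor instead of ForkJoinPool.commonPool().
                .removalListener((String key, String value, RemovalCause cause) ->
                        System.out.println("Removed " + key + " because " + cause))
                .executor(Executors.newSingleThreadExecutor())
                .build();
    }
}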
I'm working on an ultra low latency and high performance application.
The core is single threaded, so don't need to worry about concurrency.
I'm developing a schedule log function which log messages periodically to prevent same messages flush in the log.
So the log class contains a ConcurrentHashMap; one thread updates it (puts new keys or updates existing values), and another thread periodically loops through the map to log all the messages.
My concern is that logging while looping through the Map may take time; would it block the thread trying to update the Map? Any blocking is unacceptable since our application core is single threaded.
And is there any other data structure other than ConcurrentHashMap I can use to reduce the memory footprint?
Is there a thread-safe way to iterate the Map without blocking in the read-only case? Even if the data iterated is stale, that is still acceptable.
The Java API docs say:
[...] even though all operations are thread-safe, retrieval operations do not entail locking [...]
Moreover, the entrySet() method documentation tells us that:
The [returned] set is backed by the map, so changes to the map are reflected in the set, and vice-versa.
That means the map can be modified while it is being iterated, so iteration indeed doesn't block the whole map.
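To make that concrete, here is a small sketch (class and method names are invented for the example) of the pattern described in the question: one thread updates the map, and a separate thread iterates it periodically to log, relying on the weakly consistent iterators.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PeriodicLogSketch {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    // Called from the single-threaded core; merge is atomic per key and
    // never waits for the logger thread.
    void record(String message) {
        counts.merge(message, 1L, Long::sum);
    }

    // Called from a separate logging thread; the iterator is weakly
    // consistent, so it never throws ConcurrentModificationException and
    // never blocks writers, but it may see a slightly stale snapshot.
    void logAll() {
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + " x" + e.getValue());
        }
    }
}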
There may be other structures that would allow you to reduce the memory footprint, reduce latency, and perhaps offer a more uniform & consistent performance profile.
One of these is a worker pattern where your main worker publishes to a nonblocking queue and the logger pulls from that queue. This should decouple the two processes, allow for multiple concurrent publishers and allow you to scale out loggers.
One possible data structure for this is ConcurrentLinkedQueue. I have very little experience with Java, so I'm not sure how its performance profile differs from ConcurrentHashMap's, but this is a very common pattern in distributed systems and in Go.
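A minimal sketch of that queue-based hand-off (names are illustrative), assuming a single logger thread drains whatever the hot path has published:

import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueLoggerSketch {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

    // Producer side: non-blocking, lock-free offer from the hot path.
    void publish(String message) {
        queue.offer(message);
    }

    // Consumer side: a dedicated logger thread drains whatever is available.
    void drain() {
        String message;
        while ((message = queue.poll()) != null) {
            System.out.println(message);
        }
    }
}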
What is a generic approach to achieve thread safety when an object (e.g. a HashMap or an ArrayList or some POJO) is always modified by a single (same) thread but can be accessed by multiple threads?
HashMap is of most interest for me but I need a generic approach.
Is it enough to make it volatile?
Thanks.
Maybe you should take a look at ConcurrentHashMap.
public class ConcurrentHashMap<K,V>
extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
More info here :
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
Programmatically, it behaves exactly like a classical HashMap.
A generic approach will not be easy to implement and will require a lot of effort on your part, and things can still go wrong (it is simply too difficult to get right). So your best bet is ConcurrentHashMap, as suggested by Gomoku7.
A generic approach will require an implementation based on locks. You will have to lock objects before an update and release the lock afterwards. There are different types of locks in Java, so choose the one that suits your need (a minimal lock-based sketch follows the tips below). Some tips:
Final is your friend; do not mutate objects unless it is necessary, and make objects final where possible
Avoid creating temporary variables
Use Executors and Fork/Join where possible
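As a rough illustration of the lock-based route mentioned above (the class name and the choice of a read-write lock are just one possibility), a single-writer/multi-reader wrapper could look like this:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// One writer thread mutates the map; many reader threads query it.
// The read-write lock guarantees visibility and keeps reads concurrent.
public class GuardedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void put(K key, V value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}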
In a multithreaded context I want to use a map which will be updated. Which map will perform better: 1. HashMap or 2. ConcurrentHashMap? Also, will it perform slower if I make it volatile?
It is going to be used in a Java batch job for approx. 20 million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?
HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, this only ensures that changes to that particular field will be synchronized between threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.
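To illustrate that last point, here is a hedged sketch (names invented for the example) of the one situation where volatile alone is enough: the writer never mutates the published map in place, but swaps in a fresh copy on every update.

import java.util.HashMap;
import java.util.Map;

// volatile makes reads of the field see the latest published reference,
// but it does not make mutations of the Map object itself thread-safe.
// The writer therefore publishes a fresh copy on every update.
public class VolatileSnapshotPublisher {
    private volatile Map<String, String> snapshot = new HashMap<>();

    // Single writer thread: copy, mutate the copy, then publish it.
    public void update(String key, String value) {
        Map<String, String> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        snapshot = copy; // volatile write publishes the new map safely
    }

    // Any reader thread: sees a consistent, if possibly slightly stale, map.
    public String lookup(String key) {
        return snapshot.get(key);
    }
}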
I am writing an application which will return a HashMap to the user. The user will get a reference to this map.
On the backend, I will be running some threads which will update the Map.
What have I done so far?
I have made all the backend threads share a common channel to update the map, so on the backend I am sure that concurrent write operations will not be an issue.
Issues I am having
If the user tries to update the map while the map is simultaneously being updated at the backend --> concurrent write operation problem.
If the user tries to read something from the map while it is simultaneously being updated at the backend --> concurrent read and write operation problem.
Until now I have not faced any such issue, but I am afraid that I may in the future. Please give suggestions.
I am using ConcurrentHashMap<String, String>.
You are on the right track using ConcurrentHashMap. For each point:
Check out the methods putIfAbsent and replace; both are thread-safe and combine checking the current state of the map and updating it into one atomic operation.
The get method is not synchronized internally but will return the most recent value for the specified key available to it (check the ConcurrentHashMap class Javadoc for discussion).
The benefit of ConcurrentHashMap over something like Collections.synchronizedMap is the combined methods like putIfAbsent, which provide traditional Map get and put logic in an internally synchronized way. Use these methods and do not try to provide your own custom synchronization over ConcurrentHashMap, as it will not work. The java.util.concurrent collections are internally synchronized, and other threads will not respond to attempts at synchronizing on the object (e.g. synchronized(myConcurrentHashMap){} will not block other threads).
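For example (the key names and values are arbitrary), the two atomic methods mentioned above can be used like this:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicMapOps {
    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

        // Atomic "insert only if missing": returns the existing value, or null
        // if this call actually stored "first".
        String previous = map.putIfAbsent("status", "first");

        // Atomic compare-and-set on the value: only replaces if the current
        // mapping is still "first".
        boolean swapped = map.replace("status", "first", "second");

        System.out.println(previous + " / " + swapped + " / " + map.get("status"));
    }
}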
Side Note:
You might want to look into the lock free hash table implementation by Cliff Click, it's part of the Highly Scalable Java library
(Here's a Google Talk by Cliff Click about this lock free hash.)
ConcurrentHashMap was designed and implemented to avoid any issues with the scenarios you describe. You have nothing to worry about.
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates.
javadoc of ConcurrentHashMap