The service must cache data in memory and persist it in the database. getAmount(id) retrieves the current balance, or zero if addAmount() has never been called for the specified id. addAmount(id, amount) increases the balance, or sets it when called for the first time. The service must be thread-safe. Is my implementation thread-safe? What improvements can be made?
public class AccountServiceImpl implements AccountService {

    private static final Logger LOGGER = LoggerFactory.getLogger(AccountServiceImpl.class);

    private LoadingCache<Integer, Account> cache;
    private AccountDAO accountDAO = new AccountDAOImpl();

    public AccountServiceImpl() {
        cache = CacheBuilder.newBuilder()
                .expireAfterAccess(1, TimeUnit.HOURS)
                .concurrencyLevel(4)
                .maximumSize(10000)
                .recordStats()
                .build(new CacheLoader<Integer, Account>() {
                    @Override
                    public Account load(Integer id) throws Exception {
                        return new Account(id, accountDAO.getAmountById(id));
                    }
                });
    }

    public Long getAmount(Integer id) throws Exception {
        synchronized (cache.get(id)) {
            return cache.get(id).getAmount();
        }
    }

    public void addAmount(Integer id, Long value) throws Exception {
        Account account = cache.get(id);
        synchronized (account) {
            accountDAO.addAmount(id, value);
            account.setAmount(accountDAO.getAmountById(id));
            cache.put(id, account);
        }
    }
}
A race condition could occur if the Account is evicted from the cache and multiple updates to that account are taking place. The eviction results in multiple Account instances, so the synchronization doesn't provide atomicity and a stale value could be inserted into the cache.
The race is more obvious if you change the settings, e.g. maximumSize(0). At the current settings the race should be rare, but eviction can still occur even after a recent access. This is because an entry may have been chosen for eviction but not yet removed, so a subsequent read succeeds even though, from the policy's perspective, that access is ignored.
The proper way to do this in Guava is to Cache.invalidate() the entry. The DAO is transactionally updating the system of record, so it ensures atomicity of the operation. The LoadingCache ensures atomicity of an entry being computed, so reads will be blocked while a fresh value is loaded. This results in an extra database lookup which seems unnecessary, but is negligible in practice. Unfortunately there is a tiny potential race even still, because Guava does not invalidate loading entries.
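Concretely, the write-then-invalidate version of addAmount is just the following (a sketch against the question's fields; the CacheLoader already handles repopulation):

public void addAmount(Integer id, Long value) {
    accountDAO.addAmount(id, value); // the DAO transactionally updates the system of record
    cache.invalidate(id);            // the next getAmount(id) reloads the fresh balance
}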
Guava doesn't support the write-through caching behavior you are trying to implement. Its successor, Caffeine, does by exposing Java 8's compute map methods and soon a CacheWriter abstraction. That said, the loading approach Guava expects is simple, elegant, and less error prone than manual updates.
There are two issues here to take care of:
The update of the amount value must be atomic.
If you have declared:
class Account { long amount; }
Changing the field value is not guaranteed to be atomic: the JLS allows a write to a non-volatile long to be split into two 32-bit halves, which can happen on 32-bit JVMs; on 64-bit JVMs it is atomic in practice. See: Are 64 bit assignments in Java atomic on a 32 bit machine?
So the best way would be to change the declaration to "volatile long amount;". Then the update of the value is always atomic, and, in addition, volatile ensures that other threads/CPUs see the changed value.
That means for updating the single value, you don't need the synchronized block.
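A minimal sketch of that declaration (accessor names follow the question's usage of Account; note that a read-modify-write such as amount += x would still need synchronization or an AtomicLong):

class Account {
    private final int id;
    private volatile long amount; // single reads/writes are atomic and visible to all threads

    Account(int id, long amount) {
        this.id = id;
        this.amount = amount;
    }

    long getAmount() { return amount; }
    void setAmount(long amount) { this.amount = amount; }
}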
Race between inserting and modifying
With your synchronized statements you just solve the first problem. But there are multiple races in your code.
See this code:
synchronized (cache.get(id)) {
    return cache.get(id).getAmount();
}
You obviously assume that cache.get(id) returns the same object instance when called with the same id. A cache gives no such guarantee.
Guava's cache blocks until loading is complete. Other caches may or may not block, meaning that if requests come in in parallel, the loader may run multiple times, producing multiple instances and multiple changes of the stored cache value.
And Guava's cache is still a cache: the item may be evicted at any time, so the next get may return another instance.
Same problem here:
public void addAmount(Integer id, Long value) throws Exception {
    Account account = cache.get(id);
    /* What happens if lots of requests come in and another
       thread evicts the account object from the cache? */
    synchronized (account) {
        . . .
In general: never synchronize on an object whose life cycle you do not control. BTW: other cache implementations may store only the serialized object value and return a different instance on each request.
Since you have a cache.put after the modification, your solution will probably work. However, the synchronized block then only fulfills the purpose of flushing memory; it may or may not actually provide mutual exclusion.
The cache is updated after the value has changed in the database. This means another thread may read a stale value even though the database has already been updated, which may lead to inconsistencies.
Solution 1
Have a static set of lock objects that you choose by the key value, e.g. locks[id % locks.length]. See my answer here: Guava Cache, how to block access while doing removal
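A sketch of the striped-lock idea against the question's fields (the stripe count of 64 is an arbitrary choice):

private static final Object[] LOCKS = new Object[64];
static {
    for (int i = 0; i < LOCKS.length; i++) {
        LOCKS[i] = new Object();
    }
}

public void addAmount(Integer id, Long value) {
    // The same id always maps to the same lock, so updates to one account
    // serialize, independent of which Account instance the cache currently holds.
    synchronized (LOCKS[Math.floorMod(id, LOCKS.length)]) {
        accountDAO.addAmount(id, value);
        cache.invalidate(id); // reload on next access instead of mutating a cached instance
    }
}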
Solution 2
Use database transactions, and update with the pattern:
Transaction.begin();                 // pseudocode transaction boundary
cache.remove(id);                    // invalidate before the database write
accountDAO.addAmount(id, value);     // update the system of record
Transaction.commit();
Do not update the value directly inside the cache. This will lead to update races and needs locking again.
If the transactions are solely handled in the DAO, this means for your software architecture that the caching should be implemented in the DAO and not outside.
Solution 3
Why not just store the amount value in the cache? If the cache is allowed to be momentarily inconsistent with the database content during an update, the simplest solution is:
private final LoadingCache<Integer, Long> cache;

public AccountServiceImpl() {
    cache = CacheBuilder.newBuilder()
            .expireAfterAccess(1, TimeUnit.HOURS)
            .concurrencyLevel(4)
            .maximumSize(10000)
            .recordStats()
            .build(new CacheLoader<Integer, Long>() {
                @Override
                public Long load(Integer id) {
                    return accountDAO.getAmountById(id);
                }
            });
}

Long getAmount(Integer id) {
    return cache.getUnchecked(id);
}

void addAmount(Integer id, Long value) {
    accountDAO.addAmount(id, value);
    cache.invalidate(id);
}
No,
private LoadingCache<Integer, Account> cache;
must be final.
cache.get(id)
must be synchronized — are you using a library for that?
The cache accesses must be synchronized; otherwise, with two threads updating the amount at the same time, you can never be sure of the final result. Check the implementation of the `put` method of the library you use.
Related
I'm looking at some legacy code that's of the form
public class Client {

    private final Cache cache;
    .....
    Client(final Cache cache) {
        this.cache = cache;
    }

    public Value get(Key key) {
        synchronized (cache) {
            return this.cache.get(key);
        }
    }

    public void put(Key k, Value v) {
        synchronized (this.cache) {
            cache.put(k, v);
        }
    }
}
I've never seen a modifiable instance variable used as a lock object; locks are usually final Object instances, or explicit Locks from the Java API.
How does the synchronized keyword have any effect in this case? Isn't a new lock created for each instance of the Client object?
Would the usage of the synchronized keyword force the cache to be updated before a get/put operation is applied?
Why would synchronization be necessary before a get? Is it so the cache is seen with the latest values, assuming another thread applied a put in the interim?
synchronized provides the same guarantees irrespective of whether it is used on a static variable or an instance variable. i.e., memory visibility and atomicity. In your case, it provides thread safety at instance level for the attribute cache.
So, coming to your questions
You are right. Each instance of Client will have its own lock. But this is useful when a single Client instance is shared between multiple threads.
After the execution of the synchronized block, the local CPU caches are flushed to main memory, which ensures memory visibility for other threads. At the start of the synchronized block, local CPU caches are invalidated and reloaded from main memory. So yes, synchronized will cause the instance variable cache to have up-to-date values. Please see Synchronization for more details.
The reason is the same as 2. i.e., to provide memory visibility.
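To make point 1 concrete, here is a self-contained sketch (the Cache stand-in is an assumption, since the question doesn't show those types). The two threads below lock different Cache objects and never contend; only threads sharing one Client instance serialize:

import java.util.HashMap;
import java.util.Map;

public class PerInstanceLockDemo {
    // Minimal stand-in for the question's Cache type (an assumption).
    static class Cache {
        private final Map<String, String> map = new HashMap<>();
        String get(String k) { return map.get(k); }
        void put(String k, String v) { map.put(k, v); }
    }

    static class Client {
        private final Cache cache;
        Client(Cache cache) { this.cache = cache; }
        String get(String k) { synchronized (cache) { return cache.get(k); } }
        void put(String k, String v) { synchronized (cache) { cache.put(k, v); } }
    }

    public static void main(String[] args) {
        Client a = new Client(new Cache());
        Client b = new Client(new Cache());
        new Thread(() -> a.put("k", "v1")).start(); // locks a's cache
        new Thread(() -> b.put("k", "v2")).start(); // locks b's cache; no contention
    }
}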
This class is designed to run in a Spring Boot controller. The admin data lives in an Oracle table, and there is only one record of it. This is needed because the data could be changed by another application and, if it does, this app needs to read the new data.
So AdminData is an entity bean (Hibernate). In practice, the admin data will almost never be updated, but this is a high volume web application so that data is read very frequently. It's needed on every call to GET and POST.
I considered using AtomicReference<> but in this case I'm not sure it's any better than just using the volatile keyword.
I am thinking this is thread safe because:
1 - The get() method just returns a reference, and in Java fetching or updating a reference is atomic.
2 - The onDatabaseChangeNotification() call is probably not going to execute atomically because of the call to the repository, but this method can only ever be executed by a call from Oracle, so there would only be one thread ever running it. Again, the reference assignment to cachedAd should be atomic.
3 - I'm thinking that the call to setInitialValue() probably will only ever be executed by one thread as well, but I'm not sure so I added the synchronized.
Am I right? Thanks for your help.
@DependsOn("DecLogger")
@Service
public class AdminDataCacher implements DatabaseChangeListener
{
    @Autowired
    private AdminDataRepository adRep;

    private volatile AdminData cachedAd = null;

    public AdminData get()
    {
        return cachedAd;
    }

    @Override
    public void onDatabaseChangeNotification(oracle.jdbc.dcn.DatabaseChangeEvent e)
    {
        cachedAd = adRep.findByKey(1L);
        DecLogger.DEC_LOGIN.finer(() -> "Oracle DCN Call on Admin Data - Invalidating Cached Data");
    }

    @PostConstruct
    private synchronized void setInitialValue()
    {
        cachedAd = adRep.findByKey(1L);
        DecLogger.DEC_LOGIN.finer(() -> "AdminDataCacher - Initial value set");
    }
}
UPDATE, based on comments and some sleep:
If AdminData is not thread-safe and cannot be made thread-safe (e.g. by making it immutable), perhaps this approach would work, although I am concerned about performance:
public AdminData get()
{
    AdminData tmp = cachedAd;
    return tmp.clone();
}
ANOTHER UPDATE
Based upon more comments and more research, I have rewritten the class.
I decided that what I need is an immutable object to hold the administrative data, so I created an additional, immutable class called AdminDataImmutable. Since it is immutable, it is inherently thread-safe, so I can return it to every caller. This avoids the overhead of cloning the cached instance, and I won't have to worry about another developer misusing it in the future or have to defend/protect copies of it.
When the database changes, as pointed out, I should synchronize on the repository, and I can update the reference to the cached object without concern because, in Java, a reference update is atomic by design.
So now, in the get() method, I can simply return the reference. Code below. Does this new version make sense?
Thanks!
@DependsOn("DecLogger")
@Service
public class AdminDataCacher implements DatabaseChangeListener
{
    private volatile AdminDataImmutable cachedAd;

    @Autowired
    private AdminDataRepository adRep;

    public AdminDataImmutable get()
    {
        return cachedAd;
    }

    @Override
    public void onDatabaseChangeNotification(oracle.jdbc.dcn.DatabaseChangeEvent e)
    {
        DecLogger.DEC_LOGIN.finer(() -> "Oracle DCN Call on Admin Data - Invalidating Cached Data");
        synchronized (adRep)
        {
            AdminDataEntity ade = adRep.findByKey(1L);
            cachedAd = new AdminDataImmutable(ade);
        }
    }

    @PostConstruct
    private void loadInitialValue()
    {
        synchronized (adRep)
        {
            AdminDataEntity ade = adRep.findByKey(1L);
            cachedAd = new AdminDataImmutable(ade);
        }
    }
}
LAST UPDATE
I made cachedAd volatile.
I'd say volatile on the AdminData reference alone doesn't buy you much here: it only secures the update of the reference, not the thread safety of the AdminData object itself. And, as you mentioned about the get() method, in Java updating a reference is always atomic, so you are over-securing the reference. If you want the AdminData object to be thread-safe in itself, you should review the code of the AdminData class.
About 2) and 3): rather than guessing how many threads will call these methods (you seem unsure in both cases), it may be worth looking at the code of the findByKey method and making it thread-safe. Try to make code thread-safe as high up the call stack as possible; it reduces the number of critical sections and the code complexity.
If you cannot rework or review the code of AdminDataRepository, then in case 2) you are merely assuming there will be only one caller. By analogy with 3), it may be worth adding synchronized to be sure, because findByKey can still be called by at least two threads at the same time: at least one call from the non-thread-safe onDatabaseChangeNotification() (there could be more) and one call from the synchronized setInitialValue(). So you secure one method but can still get two concurrent calls to findByKey(). That causes problems if findByKey() touches shared state in the adRep object rather than just reading from the Oracle database (as a simple example, imagine that every call to findByKey increments some internal counter shared across all calls).
Next, just putting synchronized on the onDatabaseChangeNotification() method has one more pitfall. In that case you use this (the AdminDataCacher object) as the lock, which is fine as long as AdminDataRepository adRep is injected only into AdminDataCacher. But if the same singleton AdminDataRepository is injected into some other class, you are in trouble: the synchronized is useless, and you can still get several simultaneous calls to adRep.findByKey() (one from AdminDataCacher and more from the other classes where adRep is injected). In that case you should synchronize on the adRep object:
@Override
public void onDatabaseChangeNotification(oracle.jdbc.dcn.DatabaseChangeEvent e)
{
    synchronized (adRep) {
        cachedAd = adRep.findByKey(1L);
    }
    DecLogger.DEC_LOGIN.finer(() -> "Oracle DCN Call on Admin Data - Invalidating Cached Data");
}

@PostConstruct
private void setInitialValue()
{
    synchronized (adRep) {
        cachedAd = adRep.findByKey(1L);
    }
    DecLogger.DEC_LOGIN.finer(() -> "AdminDataCacher - Initial value set");
}
Sorry for the wall of text, but I'm trying to give you an idea of how to analyze the code and choose the right approach. So, in conclusion, step by step:
1. Try to make the AdminDataRepository object thread-safe and call it without synchronized.
2. If that is not possible, use the adRep object as the lock.
3. Use the AdminData object without volatile, but it is also worth reviewing its internal code.
P.S. This advice applies to plain Java. I'm not 100% sure that Spring doesn't have some internal logic that makes calls to a bean thread-safe.
2 - The onDatabaseChangeNotification() call is probably not going to execute atomically because of the call to the repository, but this method can only ever be executed by a call from Oracle, so there would only be one thread ever running it.
That is not the right way of thinking about thread-safety.
Methods never need synchronization. No matter how many threads call the same method in your program at the same time, nothing bad will ever happen to the method.
It's data that need synchronization.
Your onDatabaseChangeNotification(...) method calls adRep.findByKey(1L). What you need to think about is, what other threads could be accessing or modifying any of the same program state at the same time. That is where you might run into trouble.
Introduction
Suppose I have a ConcurrentHashMap singleton:
public class RecordsMapSingleton {
    private static final ConcurrentHashMap<String, Record> payments = new ConcurrentHashMap<>();

    public static ConcurrentHashMap<String, Record> getInstance() {
        return payments;
    }
}
Then I have three subsequent requests (all processed by different threads) from different sources.
The first service makes a request; its handler gets the singleton, creates a Record instance, generates a unique ID, places the Record into the Map, and then sends this ID to another service.
Then the second service makes another request with that ID. It gets the singleton, finds the Record instance, and modifies it.
Finally (perhaps half an hour later) the second service makes yet another request, in order to modify the Record further.
Problem
In some really rare cases, I'm experiencing heisenbug. In logs I can see, that first request successfully placed Record into Map, second request found it by ID and modified it, and then third request tried to find Record by ID, but found nothing (get() returned null).
The only thing I found about ConcurrentHashMap's guarantees is:
Actions in a thread prior to placing an object into any concurrent
collection happen-before actions subsequent to the access or removal
of that element from the collection in another thread.
from here. If I got it right, it literally means that get() could return any value that was in the Map at some point, as long as that doesn't break the happens-before relationship between actions in different threads.
In my case it applies like this: if the third request doesn't care about what happened during the processing of the first and second, it could read null from the Map.
That doesn't suit me, because I really need to get the latest actual Record from the Map.
What have I tried
So I started to think about how to form a happens-before relationship between subsequent Map modifications, and came up with an idea. The JLS says (in 17.4.4) that:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all
subsequent reads of v by any thread (where "subsequent" is defined
according to the synchronization order).
So, let's suppose, I'll modify my singleton like this:
public class RecordsMapSingleton {
    private static final ConcurrentHashMap<String, Record> payments = new ConcurrentHashMap<>();
    private static volatile long revision = 0;

    public static ConcurrentHashMap<String, Record> getInstance() {
        return payments;
    }

    public static void incrementRevision() {
        revision++;
    }

    public static long getRevision() {
        return revision;
    }
}
Then, after each modification of the Map or of a Record inside it, I'll call incrementRevision(), and before any read from the Map I'll call getRevision().
Question
Due to the nature of heisenbugs, no amount of testing is enough to show that this solution is correct. And I'm not an expert in concurrency, so I couldn't verify it formally.
Can someone confirm that following this approach guarantees that I'm always going to get the latest actual value from the ConcurrentHashMap? If this approach is incorrect or inefficient, could you recommend something else?
Your approach will not work, as you are actually repeating the same mistake. ConcurrentHashMap.put and ConcurrentHashMap.get create a happens-before relationship but no time-ordering guarantee, and exactly the same applies to your reads and writes of the volatile variable: they form a happens-before relationship but no time-ordering guarantee. If one thread happens to call get before the other performs put, then likewise the volatile read will simply happen before the volatile write. Besides that, you are adding another error, as applying the ++ operator to a volatile variable is not atomic.
The guarantees made for volatile variables are not stronger than those made for a ConcurrentHashMap. Its documentation explicitly states:
Retrievals reflect the results of the most recently completed update operations holding upon their onset.
The JLS states that external actions are inter-thread actions regarding the program order:
An inter-thread action is an action performed by one thread that can be detected or directly influenced by another thread. There are several kinds of inter-thread action that a program may perform:
…
External Actions. An external action is an action that may be observable outside of an execution, and has a result based on an environment external to the execution.
Simply said, if one thread puts into a ConcurrentHashMap and sends a message to an external entity and a second thread gets from the same ConcurrentHashMap after receiving a message from an external entity depending on the previously sent message, there can’t be a memory visibility issue.
It might be that these actions aren't programmed that way, or that the external entity doesn't have the assumed dependency. The error may lie in a completely different area, but we can't tell, since you didn't post the relevant code; e.g. the key doesn't match or the printing code is wrong. Whatever it is, it won't be fixed by the volatile variable.
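To make the external-action argument concrete, here is a small self-contained sketch (hypothetical; a BlockingQueue stands in for the message channel between the services):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class HappensBeforeDemo {
    static final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    static final BlockingQueue<String> channel = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            map.put("id-1", "record");  // happens-before the send below
            channel.add("id-1");        // "send a message" carrying the key
        }).start();

        String id = channel.take();      // receiving depends on the send
        System.out.println(map.get(id)); // guaranteed to print "record", never null
    }
}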
Background
I have a class whose instances are used to collect and publish data (uses Guava's HashMultimap):
public class DataCollector {
    private final SetMultimap<String, String> valueSetsByLabel
            = HashMultimap.create();

    public void addLabelValue(String label, String value) {
        valueSetsByLabel.put(label, value);
    }

    public Set<String> getLabels() {
        return valueSetsByLabel.keySet();
    }

    public Set<String> getLabelValues(String label) {
        return valueSetsByLabel.get(label);
    }
}
Instances of this class will now be passed between threads, so I need to modify it for thread-safety. Since Guava's Multimap implementations aren't thread-safe, I used a LoadingCache that lazily creates concurrent hash sets instead (see the CacheBuilder and MapMaker javadocs for details):
public class ThreadSafeDataCollector {
    private final LoadingCache<String, Set<String>> valueSetsByLabel
            = CacheBuilder.newBuilder()
            .concurrencyLevel(1)
            .build(new CacheLoader<String, Set<String>>() {
                @Override
                public Set<String> load(String label) {
                    // make and return a concurrent hash set
                    final ConcurrentMap<String, Boolean> map = new MapMaker()
                            .concurrencyLevel(1)
                            .makeMap();
                    return Collections.newSetFromMap(map);
                }
            });

    public void addLabelValue(String label, String value) {
        valueSetsByLabel.getUnchecked(label).add(value);
    }

    public Set<String> getLabels() {
        return valueSetsByLabel.asMap().keySet();
    }

    public Set<String> getLabelValues(String label) {
        return valueSetsByLabel.getUnchecked(label);
    }
}
You'll notice I'm setting the concurrency level for both the loading cache and nested concurrent hash sets to 1 (meaning they each only read from and write to one underlying table). This is because I only expect one thread at a time to read from and write to these objects.
(To quote the concurrencyLevel javadoc, "A value of one permits only one thread to modify the map at a time, but since read operations can proceed concurrently, this still yields higher concurrency than full synchronization.")
Problem
Because I can assume there will only be a single reader/writer at a time, I feel that using many concurrent hash maps per object is heavy-handed. Such structures are meant to handle concurrent reads and writes, and guarantee atomicity of concurrent writes. But in my case atomicity is unimportant - I only need to make sure each thread sees the last thread's changes.
In my search for a more optimal solution I came across this answer by erickson, which says:
Any data that is shared between thread needs a "memory barrier" to ensure its visibility.
[...]
Changes to any member that is declared volatile are visible to all
threads. In effect, the write is "flushed" from any cache to main
memory, where it can be seen by any thread that accesses main memory.
Now it gets a bit trickier. Any writes made by a thread before that
thread writes to a volatile variable are also flushed. Likewise, when
a thread reads a volatile variable, its cache is cleared, and
subsequent reads may repopulate it from main memory.
[...]
One way to make this work is to have the thread that is populating
your shared data structure assign the result to a volatile variable. [...]
When other threads access that variable, not only are they guaranteed
to get the most recent value for that variable, but also any changes
made to the data structure by the thread before it assigned the value
to the variable.
(See this InfoQ article for a further explanation of memory barriers.)
The problem erickson is addressing is slightly different in that the data structure in question is fully populated and then assigned to a variable that he suggests be made volatile, whereas my structures are assigned to final variables and gradually populated across multiple threads. But his answer suggests I could use a volatile dummy variable to manually trigger memory barriers:
public class ThreadVisibleDataCollector {
    private final SetMultimap<String, String> valueSetsByLabel
            = HashMultimap.create();
    private volatile boolean dummy;

    private void readMainMemory() {
        if (dummy) { }
    }

    private void writeMainMemory() {
        dummy = false;
    }

    public void addLabelValue(String label, String value) {
        readMainMemory();
        valueSetsByLabel.put(label, value);
        writeMainMemory();
    }

    public Set<String> getLabels() {
        readMainMemory();
        return valueSetsByLabel.keySet();
    }

    public Set<String> getLabelValues(String label) {
        readMainMemory();
        return valueSetsByLabel.get(label);
    }
}
Theoretically, I could take this a step further and leave it to the calling code to trigger memory barriers, in order to avoid unnecessary volatile reads and writes between calls on the same thread (potentially by using Unsafe.loadFence and Unsafe.storeFence, which were added in Java 8). But that seems too extreme and hard to maintain.
Question
Have I drawn the correct conclusions from my reading of erickson's answer (and the JMM) and implemented ThreadVisibleDataCollector correctly? I wasn't able to find examples of using a volatile dummy variable to trigger memory barriers, so I want to verify that this code will behave as expected across architectures.
The thing you are trying to do is called "premature optimization". You don't have a real performance problem, yet you are making your entire program very complicated and possibly error-prone, without any gain.
The reason why you will never experience any (notable) gain lies in the way a lock works. You can learn a lot about this by studying the documentation of the class AbstractQueuedSynchronizer.
A Lock is built around a simple int value with volatile semantics and atomic updates. In the simplest form, i.e. without contention, locking and unlocking consist of a single atomic update of this int variable. Since you claim you can be sure that only one thread will access the data at a given time, there will be no contention, and the lock-state update has performance characteristics similar to your volatile boolean attempts, with the difference that the Lock code works reliably and is heavily tested.
The ConcurrentMap approach goes a step further and allows a lock-free read that has the potential to be even more efficient than your volatile read (depending on the actual implementation).
So you are creating a potentially slower and possibly error prone program just because you “feel that using many concurrent hash maps per object is heavy-handed”. The only answer can be: don’t feel. Measure. Or just leave it as is as long as there is no real performance problem.
A write to a volatile variable happens-before every subsequent read of that variable. As a consequence, the visibility guarantees you want will be achieved by reading and writing it, so the answer is yes, this solves the visibility issue.
Besides the problems mentioned by Darren Gilroy in his answer, I'd like to point out that in Java 8 there are explicit memory-barrier methods in the Unsafe class:
/**
 * Ensures lack of reordering of loads before the fence
 * with loads or stores after the fence.
 */
void loadFence();

/**
 * Ensures lack of reordering of stores before the fence
 * with loads or stores after the fence.
 */
void storeFence();

/**
 * Ensures lack of reordering of loads or stores before the fence
 * with loads or stores after the fence.
 */
void fullFence();
Although Unsafe is not a public API, I'd still recommend at least considering it if you're using Java 8.
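Since Unsafe has no public accessor, the usual (unsupported) way to obtain an instance is reflection on its theUnsafe field, e.g.:

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class Fences {
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void storeBarrier() { UNSAFE.storeFence(); } // publish prior writes
    static void loadBarrier()  { UNSAFE.loadFence(); }  // see others' published writes
}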
One more solution comes to mind. You have set your concurrencyLevel to 1, which means that only one thread at a time can modify the collection. IMO standard Java synchronized or ReentrantLock (for cases of high contention) would also fit your task, and both provide visibility guarantees. And if you want a one-writer, many-readers access pattern, consider using ReentrantReadWriteLock.
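For illustration, a sketch of the one-writer/many-readers variant (method names follow the question's DataCollector; the lock choice is the assumption here):

import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import com.google.common.collect.HashMultimap;
import com.google.common.collect.SetMultimap;

public class RwLockedDataCollector {
    private final SetMultimap<String, String> valueSetsByLabel = HashMultimap.create();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    public void addLabelValue(String label, String value) {
        rw.writeLock().lock();
        try {
            valueSetsByLabel.put(label, value);
        } finally {
            rw.writeLock().unlock(); // unlock publishes the write to subsequent lockers
        }
    }

    public Set<String> getLabelValues(String label) {
        rw.readLock().lock(); // multiple readers may hold the read lock concurrently
        try {
            return valueSetsByLabel.get(label);
        } finally {
            rw.readLock().unlock();
        }
    }
}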
Well, that's still not particularly safe, b/c it depends a lot on the underlying implementation of HashMultimap.
You might take a look at the following blog post for a discussion: http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
For this type of thing, a common pattern is to load a "most recent version" into a volatile variable and have your readers read immutable versions through that. This is how CopyOnWriteArrayList is implemented.
Something like ...
class Collector {
    private volatile HashMultimap<String, String> values = HashMultimap.create();

    public void add(String k, String v) {
        HashMultimap<String, String> t = HashMultimap.create(values); // copy the current version
        t.put(k, v);
        this.values = t; // this volatile write invokes a memory barrier
    }

    public Set<String> get(String k) {
        return values.get(k); // this volatile read is a memory barrier
    }
}
However, both your solution and mine still have a bit of a problem: we are both returning mutable views of the underlying data structure. I might change the HashMultimap to an ImmutableMultimap to fix the mutability issue. Beware also that callers retain a reference to the full internal map (not just the returned Set) as a side effect of things being a view.
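A sketch of that immutable variant (assuming Guava's ImmutableSetMultimap; get then hands out a truly immutable set):

import java.util.Set;
import com.google.common.collect.ImmutableSetMultimap;

class ImmutableCollector {
    private volatile ImmutableSetMultimap<String, String> values =
            ImmutableSetMultimap.of();

    public void add(String k, String v) {
        // Copy-on-write: build a new immutable multimap including the new
        // entry, then publish it with a single volatile write.
        values = ImmutableSetMultimap.<String, String>builder()
                .putAll(values)
                .put(k, v)
                .build();
    }

    public Set<String> get(String k) {
        return values.get(k); // volatile read; the returned set is immutable
    }
}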
Creating a new copy can seem somewhat wasteful, but I suspect that if you have only one thread writing, you understand the rate of change and can decide whether that's reasonable. For example, if you wanted to return Set<String> instances which update dynamically as things change, then the MapMaker-based solution doesn't seem heavy-handed.
I am implementing a simple cache with the cache stored as an AtomicReference.
private AtomicReference<Map<String, String>> cacheData;
The cache object should be populated (lazily) from a database table.
I provide a method to return the cached data to a caller, but if the data is null (i.e. not yet loaded), the code needs to load it from the database. To avoid synchronized I thought of using the compareAndSet() method:
public Object getCacheData() {
    cacheData.compareAndSet(null, getDataFromDatabase()); // atomic reload only if data not set!
    return Collections.unmodifiableMap(cacheData.get());
}
Is it ok to use compareAndSet in this way ie. to involve a database call as part of the atomic action? Is it any better/worse than just synchronizing the method?
Many thanks for any advice..
You do not achieve the expected behaviour. This expression:
cacheData.compareAndSet(null, getDataFromDatabase())
will always call getDataFromDatabase() first. This means it doesn't matter whether the data was already cached: you still hit the database, but discard the results. The cache is in place, yet the performance is just as poor as with no cache at all.
Consider this instead:
if (cacheData.get() == null) {
    cacheData.compareAndSet(null, unmodifiableMap(getDataFromDatabase()));
}
return cacheData.get();
It's not perfect (getDataFromDatabase() can still be called multiple times at the beginning), but afterwards it works as expected. Also, I moved Collections.unmodifiableMap() earlier so that you don't wrap the same map over and over again.
Which brings us to an even simpler implementation (no synchronized or AtomicReference needed):

private volatile Map<String, String> cacheData;

public Map<String, String> getCacheData() {
    if (cacheData == null) {
        cacheData = unmodifiableMap(getDataFromDatabase());
    }
    return cacheData;
}
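If the duplicate initial loads matter, a double-checked sketch (not part of the answer above, just one common refinement) confines getDataFromDatabase() to a single call, at the cost of one synchronized block on the cold path:

private volatile Map<String, String> cacheData;

public Map<String, String> getCacheData() {
    Map<String, String> result = cacheData;
    if (result == null) {
        synchronized (this) {
            result = cacheData;      // re-check under the lock
            if (result == null) {
                result = Collections.unmodifiableMap(getDataFromDatabase());
                cacheData = result;  // volatile write publishes the loaded map
            }
        }
    }
    return result;
}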