Synchronizing reads to a Java collection

So I want to have an ArrayList that stores a series of stock quotes, where I keep track of the bid price, ask price and last price for each.
Of course, at any time, the bid, ask or last of a given stock can change.
I have one thread that updates the prices and one that reads them.
I want to make sure that while one thread is reading, no other thread is updating a price. So I looked at the synchronized collections, but those only seem to prevent reading while another thread is adding or deleting an entry in the ArrayList.
So now I'm onto the wrapper approach:
public class Qte_List {
    private final ArrayList<Qte> the_list;

    public void UpdateBid(String p_sym, double p_bid) {
        synchronized (the_list) {
            Qte q = Qte.FindBySym(the_list, p_sym);
            q.bid = p_bid;
        }
    }

    public double ReadBid(String p_sym) {
        synchronized (the_list) {
            Qte q = Qte.FindBySym(the_list, p_sym);
            return q.bid;
        }
    }
}
What I want to accomplish with this is that only one thread can be doing anything with the_list's contents - reading or updating - at one time. Am I approaching this right?
Thanks.

Yes, you are on the right track and that should work.
But why not use the existing Hashtable collection, which is synchronized and already provides key-value lookup?

As I understand it you are using the map to store the quotes; the number of quotes never changes, but each quote can be read or modified to reflect current prices. It is important to know that locking the collection only protects against changes to which Quote objects are in the map: it does not in any way restrict the modification of the contents of those Quotes. If you want to restrict that access you will have to provide locking on the Quote object.
Looking at your code however I don't believe you have a significant synchronization problem. If you try to do a read at the same time as a write, you will either get the price before or the price after the write. If you didn't know the write was going to occur that shouldn't matter to you. You may need locking at a higher level so that
if (getBidPrice(mystock) < 10.0) {
    sell(10000);
}
happens as an atomic operation and you don't end up selling at 5.0 rather than 10.0.
If the number of quotes really doesn't change then I would recommend allowing Qte objects to be added only in the constructor of Qte_List. This would make locking the collection irrelevant. The technical term for this is making Qte_List immutable.
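A minimal sketch of that constructor-only approach, combined with per-Qte locking as discussed above. The Qte fields and the lookup are simplified stand-ins for the question's code (the original uses a static Qte.FindBySym), so treat the names as illustrative:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class Qte {
    final String sym;
    double bid;
    Qte(String sym, double bid) { this.sym = sym; this.bid = bid; }
}

class Qte_List {
    // Built once in the constructor and never structurally modified,
    // so locking the collection itself becomes unnecessary.
    private final List<Qte> the_list;

    Qte_List(Collection<Qte> quotes) {
        this.the_list = Collections.unmodifiableList(new ArrayList<>(quotes));
    }

    private Qte find(String sym) {
        for (Qte q : the_list) {
            if (q.sym.equals(sym)) return q;
        }
        throw new IllegalArgumentException("unknown symbol: " + sym);
    }

    // Lock per quote: updates to different symbols don't block each other.
    public void UpdateBid(String p_sym, double p_bid) {
        Qte q = find(p_sym);
        synchronized (q) { q.bid = p_bid; }
    }

    public double ReadBid(String p_sym) {
        Qte q = find(p_sym);
        synchronized (q) { return q.bid; }
    }
}
```

Note that only the list structure is immutable here; the Qte fields still mutate, which is why the per-object synchronized blocks remain.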

That looks like a reasonable approach. Nit-picking, though, you probably shouldn't include the return statement inside the synchronized block:
public double ReadBid(String p_sym) {
    double bid;
    synchronized (the_list) {
        Qte q = Qte.FindBySym(the_list, p_sym);
        bid = q.bid;
    }
    return bid;
}
I'm not sure if it's just my taste or there's some concurrency gotcha involved, but at the very least it looks cleaner to me ;-).

Yes, this will work. Anyway, you don't need to do it yourself, since it is already implemented in the Collections framework:
Collections.synchronizedList

Your approach should do the trick, but as you stated, there can only be one reader or writer at a time. This isn't very scalable.
There are some ways to improve performance without losing thread-safety here.
You could use a ReadWriteLock for example. This will allow multiple readers at a time, but when someone gets the write-lock, all others must wait for him to finish.
Another way would be to use a proper collection. It seems you could exchange your list with a thread-safe implementation of Map. Have a look at the ConcurrentMap documentation for possible candidates.
Edit:
Assuming that you need ordering for your Map, have a look at the ConcurrentNavigableMap interface.
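For instance, a sketch using ConcurrentSkipListMap (the standard ConcurrentNavigableMap implementation). The Quote class and its fields here are assumptions, not the asker's actual code; the key idea is replacing an immutable value atomically so readers never see a half-written quote:

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class QuoteBoard {
    // Hypothetical immutable quote record: swapping the whole value on
    // update means a reader never observes a partially updated bid/ask pair.
    static final class Quote {
        final double bid, ask, last;
        Quote(double bid, double ask, double last) {
            this.bid = bid; this.ask = ask; this.last = last;
        }
    }

    private final ConcurrentNavigableMap<String, Quote> quotes =
            new ConcurrentSkipListMap<>();

    void update(String sym, double bid, double ask, double last) {
        quotes.put(sym, new Quote(bid, ask, last));  // atomic replacement
    }

    Quote read(String sym) {
        return quotes.get(sym);                      // no explicit locking
    }

    String firstSymbol() {
        return quotes.firstKey();                    // sorted order for free
    }
}
```

Readers and writers never block each other here; the cost is allocating a new Quote per update, which is usually cheap compared to contended locks.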

What you have will work, but locking the entire list every time you want to read or update the value of an element is not scalable. If this doesn't matter, then you're fine with what you have. If you want to make it more scalable consider the following...
You didn't say whether you need to be able to make structural changes to the_list (adding or removing elements), but if you don't, one big improvement would be to move the call to FindBySym() outside of the synchronized block. Then instead of synchronizing on the_list, you can just synchronize on q (the Qte object). That way you can update different Qte objects concurrently. Also, if you make the Qte objects immutable as well, you don't need any synchronization at all (to update, just use the_list.set(i, new Qte(...))).
If you do need to be able to make structural changes to the list, you can use a ReentrantReadWriteLock to allow for concurrent reads and exclusive writes.
I'm also curious why you want to use an ArrayList rather than a synchronized HashMap.

Related

Do I need a lock if I change object pointer instead of update the object?

I'm using Java to write a multithreaded application. One question I have: there is a list that is accessed by multiple threads, and one thread tries to update it. However, each time, the update thread will create a new List and then make the public shared list point to the new List, like this:
public List<DataObject> publicDataObject = XXXX; // <- this will be accessed by multiple threads
Then I have one thread updating this List:
List<DataObject> newDataObjectList = CreateNewDataObject();
publicDataObject = newDataObjectList;
When I update the pointer of publicDataObject, do I need a lock to make it thread-safe?
Before going into the answer, let's first check if I understand the situation and the question.
Assumption as stated in problem description: Only a single thread creates new versions of the list, based on the previous value of publicDataObject, and stores that new list in publicDataObject.
Derived assumption: Other threads access DataObject elements from that list, but do not add, remove or change the order of elements.
If this assumption holds, the answer is below.
Otherwise, please make sure your question includes this in its description. This makes the answer much more complex, though, and I advise you to study the topic of concurrency more, for example by reading a book about Java concurrency.
Additional assumption:The DataObject objects themselves are thread-safe.
If this assumption does not hold, this would make the scope of the question too broad and I would suggest to study the topic of concurrency more, for example by reading a book about Java concurrency.
Answer
Given that the above assumptions are true, you do not need a lock; however, you cannot just access publicDataObject from multiple threads using its definition in your code example. The reason is the Java Memory Model, which makes no guarantees whatsoever about threads seeing changes made by other threads, unless you use special language constructs like atomic variables, locks or synchronization.
Simplified, these constructs ensure that a read in one thread that happens after a write in another, can see that written value, as long as you are using the same construct: the same atomic variable, lock or synchronisation on the same object. Locks and intrinsic locks (used by synchronisation) can also ensure exclusive access of a single thread to a block of code.
Given, again, that the above assumptions are true, you can suffice using an AtomicReference, whose get and set methods have the desired relationship:
// Definition
public AtomicReference<List<DataObject>> publicDataObject;
The reasons that a simple construct that "only" guarantees visibility can be used are:
The list that publicDataObject refers to is always a new one (the first assumption). Therefore, other threads can never see a stale value of the list itself. Making sure that threads see the correct value of publicDataObject is therefore enough, as long as other threads don't change the list.
If, in addition, only one thread sets publicDataObject, there can be no race conditions in setting it, for example losing updates that are overwritten by more recent updates before ever being read.
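Putting the pieces together, a minimal sketch of the AtomicReference approach under the assumptions above (the DataObject class and its field are placeholders for whatever your real element type is):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class DataHolder {
    static final class DataObject {
        final int value;
        DataObject(int value) { this.value = value; }
    }

    // AtomicReference.get/set provide the visibility guarantee that a
    // plain (non-volatile) field would lack under the Java Memory Model.
    private final AtomicReference<List<DataObject>> publicDataObject =
            new AtomicReference<>(List.of());

    // Called by the single writer thread: build a fresh list, then publish it.
    void publish(List<DataObject> fresh) {
        publicDataObject.set(List.copyOf(fresh)); // immutable snapshot
    }

    // Called by any reader thread: always sees a fully constructed list.
    List<DataObject> snapshot() {
        return publicDataObject.get();
    }
}
```

A `volatile` field would give the same visibility guarantee for this plain set/get pattern; AtomicReference additionally offers compareAndSet if you ever need it.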

thread safe map operation

I came across the following piece of code ,and noted a few inconsistencies - for a multi-thread safe code.
Map<String, Map<String, Set<String>>> clusters = new HashMap<.........>;
Map<String, Set<String>> servers = clusters.get(clusterkey);
if (servers == null) {
    synchronized (clusterkey) {
        servers = clusters.get(clusterkey);
        if (servers == null) { /* ...initialize new hashmap and put... */ }
    }
}
Set<String> users = servers.get(serverkey);
if (users == null) {
    synchronized (serverkey) {
        users = servers.get(serverkey);
        if (users == null) { /* ...initialize new hashset and put... */ }
    }
}
users.add(userid);
Why would the map be synchronized on clusterkey - shouldn't the map itself be the monitor?
Shouldn't the last users.add(...) be synchronized too?
This seems to be a lot of code to add a single user in a thread-safe manner. What would be a smarter implementation?
Here just some observations:
Synchronizing on a String is a very bad idea -> syncing on clusterKey and serverKey will probably not work the way intended.
Better would be to use ConcurrentHashMaps and concurrent sets backed by them (e.g. via ConcurrentHashMap.newKeySet()).
Though without more context it is not really possible to answer this question. It seems the code-author wanted to safely create just 1 mapping per clusterKey and serverKey so the user can be added just once.
A (probably better) way would be to just synchronize on the clusters map itself and then you're safe as only one thread can read and/or write to said map.
Another way would be to use custom Locks, maybe one for reading, and another one for writing, though this may lead again to inconsistencies if one thread is writing to the Map while another is reading that exact value from it.
The code looks like a non-thought through version of the Double checked locking idiom that sometimes is used for lazy initialisation. Read the provided link for why this is a really bad implementation of it.
The problem with the given code is that it fails intermittently. There is a race condition when there are several threads trying to work on the map using the same key (or keys with the same hashcode) which means that the map created first might be replaced by the second hashmap.
1 - The synch is trying to avoid two threads creating a new entry in that Map at the same time. The second one must wait so that its (servers==null) check doesn't also return true.
2 - That users set seems to be out of scope, but it looks like it doesn't need a synch. Maybe the programmer knows there are no duplicated userIds, or maybe he doesn't care about re-adding the same user again and again.
3- ConcurrentHashMap maybe?
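On Java 8+, a sketch of what the ConcurrentHashMap version could look like; computeIfAbsent guarantees each inner map/set is created exactly once, which removes the need for the double-checked blocks entirely (the class and method names here are illustrative):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ClusterRegistry {
    private final Map<String, Map<String, Set<String>>> clusters =
            new ConcurrentHashMap<>();

    void addUser(String clusterKey, String serverKey, String userId) {
        clusters
            // atomically create the per-cluster map on first use
            .computeIfAbsent(clusterKey, k -> new ConcurrentHashMap<>())
            // atomically create the per-server user set on first use
            .computeIfAbsent(serverKey, k -> ConcurrentHashMap.newKeySet())
            .add(userId);
    }

    boolean hasUser(String clusterKey, String serverKey, String userId) {
        Map<String, Set<String>> servers = clusters.get(clusterKey);
        if (servers == null) return false;
        Set<String> users = servers.get(serverKey);
        return users != null && users.contains(userId);
    }
}
```

Set.add is idempotent, so even concurrent addUser calls for the same user are harmless; no explicit synchronized block is needed anywhere.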

How to handle synchronization of frequent concurrent read/writes on a Java ArrayList

I have a Java class that contains an ArrayList of transaction info objects that get queried and modified by different threads on a frequent basis. At a basic level, the structure of the class looks something like this (currently no synchronization is present):
class Statistics
{
    private List<TranInfo> tranInfoList = new ArrayList<TranInfo>();

    // This method runs frequently - every time a transaction comes in.
    void add(TranInfo tranInfo)
    {
        tranInfoList.add(tranInfo);
    }

    // This method acts like a cleaner and runs occasionally.
    void removeBasedOnSomeCondition()
    {
        // Some code to determine which items to remove
        tranInfoList.removeAll(listOfUnwantedTranInfos);
    }

    // Methods to query stats on the tran info.
    // These methods are called frequently.
    Stats getStatsBasedOnSomeCondition()
    {
        // Iterate over the list of tran info
        // objects and return some stats
    }

    Stats getStatsBasedOnSomeOtherCondition()
    {
        // Iterate over the list of tran info
        // objects and return some stats
    }
}
I need to ensure that read/write operations on the list are synchronized correctly, however, performance is very important, so I don't want to end up locking in every method call (especially for concurrent read operations). I've looked at the following solutions:
CopyOnWriteArrayList
I've looked at the use of a CopyOnWriteArrayList to prevent ConcurrentModificationExceptions being thrown when the list is modified while iterating over it; the problem here is the copy required each time the list is modified... it seems too expensive given how often the list will be modified and the potential size of the list.
ReadWriteLock
A ReadWriteLock could be used to synchronize read/write operations while allowing concurrent read operations to take place. While this approach will work, it ends up resulting in a lot of synchronization code in the class (this isn't the end of the world though).
Are there any other clever ways of achieving this kind of synchronization without a big performance penalty, or are one of the above methods the recommended way? Any advice on this would be greatly appreciated.
I'd use Collections.synchronizedList() until you know for sure that it is indeed the crucial performance bottleneck of your application (needless to say, I doubt it is ;-)). You can only know for sure through thorough testing. I assume you know about "premature optimization"...
If then you strive to optimize access to that list I'd say ReadWriteLock is a good approach.
Another solution that may make sense (especially under heavy read/write) is ConcurrentLinkedQueue (http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html). It is a pretty scalable implementation under contention, based on CAS operations.
The one change that's required to your code is that ConcurrentLinkedQueue does not implement the List interface, and you need to abide by either the Iterable or the Queue type. The only operation you lose really is random access via index, but I don't see that being an issue in your access pattern.
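If you do go the ReadWriteLock route, a sketch of the boilerplate involved (TranInfo and the stats query are simplified placeholders for the question's classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Statistics {
    static final class TranInfo {
        final long amount;
        TranInfo(long amount) { this.amount = amount; }
    }

    private final List<TranInfo> tranInfoList = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void add(TranInfo t) {
        lock.writeLock().lock();     // exclusive: blocks readers and writers
        try {
            tranInfoList.add(t);
        } finally {
            lock.writeLock().unlock();
        }
    }

    long totalAmount() {
        lock.readLock().lock();      // shared: many readers may iterate at once
        try {
            long sum = 0;
            for (TranInfo t : tranInfoList) sum += t.amount;
            return sum;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The try/finally pairs are the price of explicit locks; the payoff is that the frequent read-only stats queries no longer serialize behind each other.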

Missing items from synchronized HashMap in a threaded environment

I know you have to synchronize around anything that changes the structure of a HashMap (put or remove), but it seems to me you also have to synchronize around reads of the HashMap; otherwise you might be reading while another thread is changing its structure.
So I sync around gets and puts to my hashmap.
The only machines I have available to me to test with all only have one processor so I never had any real concurrency until the system went to production and started failing. Items were missing out of my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned down the number of threads to 1 it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();

synchronized (EMREPORTONE)
{
    reportdatacache.put("name.." + eri.recip_map_id, eri.name);
    reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
    // etc...
}
... and elsewhere ...
synchronized (EMREPORTONE)
{
    eri.name = (String) reportdatacache.get("name.." + eri.recip_map_id);
    eri.subject = (String) reportdatacache.get("subjec" + eri.recip_map_id);
    // etc...
}
and that's it. I pass around reportdatacache between functions, but that's just the reference to the hashmap.
Another important point is that this is running as a servlet in an appserver (iplanet to be specific, but I know none of you have ever heard of that)
But regardless, EMREPORTONE is global to the webserver process, no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?
In a servlet container environment, static variables depend on the classloader. So you may think that you're dealing with the same static instance, but in fact it could be a completely different one.
Additionally, check if you do not use the map by escaped reference elsewhere and write/remove keys from it.
And yes, use ConcurrentHashMap instead.
Yes, synchronization is not only important when writing, but also when reading. While a write will be performed under mutual exclusion, a reader might otherwise access an erroneous state of the map.
I cannot recommend under any circumstances synchronizing the Java collections manually; there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them; they will ensure that access to them is safe in a multithreaded environment.
A further hint: it seems that everyone is accessing reportdatacache. Is there only one instance of that object? Why not synchronize on the cache itself then? But forget that when trying to solve your problems - use the sugar from java.util.concurrent.
As I see it there are 3 possibilities here:
You are locking on two different objects. EMREPORTONE is private static however and the code that accesses the reportdatacache is in one file only. Ok, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE however. Cleaner code.
You are missing some read or write to reportdatacache somewhere. There are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition issue. The data in the hashmap is fine, but you are expecting things to be in the cache that haven't been stored by the other thread yet. Maybe 2 requests come in for the same eri at the same time and they are both putting values into the cache? Maybe check whether the old value returned by put(...) is always null? Explaining more about how you know that items are missing from the map would help with this.
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a NameSubject private static class to store the name and subject in the cache. Cleaner.
Hope something here helps.
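A sketch of that suggestion - one cache entry per id, backed by a ConcurrentHashMap. The Eri class here is a hypothetical stand-in for the question's eri objects (only the field names are taken from the question):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ReportCache {
    static final class Eri {
        final String recip_map_id;
        final String name;
        final String subject;
        Eri(String id, String name, String subject) {
            this.recip_map_id = id; this.name = name; this.subject = subject;
        }
    }

    // One entry per recipient instead of several prefixed string keys;
    // ConcurrentHashMap makes the individual get/put calls thread-safe
    // without any external synchronized blocks.
    private final ConcurrentMap<String, Eri> reportdatacache =
            new ConcurrentHashMap<>();

    void store(Eri eri) {
        reportdatacache.put(eri.recip_map_id, eri);
    }

    Eri lookup(String recipMapId) {
        return reportdatacache.get(recipMapId);
    }
}
```

Storing one object per id also makes the "name.." / "subjec" key-prefix scheme unnecessary, so name and subject can never go missing independently of each other.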

Synchronizing on an Integer value [duplicate]

Possible Duplicate:
What is the best way to increase number of locks in java
Suppose I want to lock based on an integer id value. In this case, there's a function that pulls a value from a cache and does a fairly expensive retrieve/store into the cache if the value isn't there.
The existing code isn't synchronized and could potentially trigger multiple retrieve/store operations:
// pseudocode
public Page getPage(Integer id) {
    Page p = cache.get(id);
    if (p == null) {
        p = getFromDataBase(id);
        cache.store(p);
    }
    return p;
}
What I'd like to do is synchronize the retrieve on the id, e.g.
if (p == null) {
    synchronized (id) {
        // ..retrieve, store
    }
}
Unfortunately this won't work because 2 separate calls can have the same Integer id value but a different Integer object, so they won't share the lock, and no synchronization will happen.
Is there a simple way of insuring that you have the same Integer instance? For example, will this work:
synchronized (Integer.valueOf(id.intValue())) {
The javadoc for Integer.valueOf() seems to imply that you're likely to get the same instance, but that doesn't look like a guarantee:
Returns an Integer instance representing the specified int value. If a new Integer instance is not required, this method should generally be used in preference to the constructor Integer(int), as this method is likely to yield significantly better space and time performance by caching frequently requested values.
So, any suggestions on how to get an Integer instance that's guaranteed to be the same, other than the more elaborate solutions like keeping a WeakHashMap of Lock objects keyed to the int? (Nothing wrong with that, it just seems like there must be an obvious one-liner that I'm missing.)
You really don't want to synchronize on an Integer, since you don't have control over what instances are the same and what instances are different. Java just doesn't provide such a facility (unless you're using Integers in a small range) that is dependable across different JVMs. If you really must synchronize on an Integer, then you need to keep a Map or Set of Integer so you can guarantee that you're getting the exact instance you want.
Better would be to create a new object, perhaps stored in a HashMap that is keyed by the Integer, to synchronize on. Something like this:
public Page getPage(Integer id) {
    Page p = cache.get(id);
    if (p == null) {
        synchronized (getCacheSyncObject(id)) {
            p = cache.get(id);  // re-check: another thread may have stored it already
            if (p == null) {
                p = getFromDataBase(id);
                cache.store(p);
            }
        }
    }
    return p;
}

private ConcurrentMap<Integer, Integer> locks = new ConcurrentHashMap<Integer, Integer>();

private Object getCacheSyncObject(final Integer id) {
    locks.putIfAbsent(id, id);
    return locks.get(id);
}
To explain this code, it uses ConcurrentMap, which allows use of putIfAbsent. You could do this:
locks.putIfAbsent(id, new Object());
but then you incur the (small) cost of creating an Object for each access. To avoid that, I just save the Integer itself in the Map. What does this achieve? Why is this any different from just using the Integer itself?
When you do a get() from a Map, the keys are compared with equals() (or at least the method used is the equivalent of using equals()). Two different Integer instances of the same value will be equal to each other. Thus, you can pass any number of different Integer instances of "new Integer(5)" as the parameter to getCacheSyncObject and you will always get back only the very first instance that was passed in that contained that value.
There are reasons why you may not want to synchronize on Integer ... you can get into deadlocks if multiple threads are synchronizing on Integer objects and are thus unwittingly using the same locks when they want to use different locks. You can fix this risk by using the
locks.putIfAbsent(id, new Object());
version and thus incurring a (very) small cost to each access to the cache. Doing this, you guarantee that this class will be doing its synchronization on an object that no other class will be synchronizing on. Always a Good Thing.
Use a thread-safe map, such as ConcurrentHashMap. This will allow you to manipulate a map safely, but use a different lock to do the real computation. In this way you can have multiple computations running simultaneous with a single map.
Use ConcurrentMap.putIfAbsent, but instead of placing the actual value, use a Future with computationally-light construction instead. Possibly the FutureTask implementation. Run the computation and then get the result, which will thread-safely block until done.
Integer.valueOf() only returns cached instances for a limited range. You haven't specified your range, but in general, this won't work.
However, I would strongly recommend you not take this approach, even if your values are in the correct range. Since these cached Integer instances are available to any code, you can't fully control the synchronization, which could lead to a deadlock. This is the same problem people have trying to lock on the result of String.intern().
The best lock is a private variable. Since only your code can reference it, you can guarantee that no deadlocks will occur.
By the way, using a WeakHashMap won't work either. If the instance serving as the key is unreferenced, it will be garbage collected. And if it is strongly referenced, you could use it directly.
Using synchronized on an Integer sounds really wrong by design.
If you need to synchronize each item individually only during retrieve/store, you can create a Set and store there the currently locked items. In other words:
// this contains only those IDs that are currently locked, that is, this
// will contain only very few IDs most of the time
Set<Integer> activeIds = ...

Object retrieve(Integer id) {
    // acquire "lock" on item #id
    synchronized (activeIds) {
        while (activeIds.contains(id)) {
            try {
                activeIds.wait();
            } catch (InterruptedException e) { ... }
        }
        activeIds.add(id);
    }
    try {
        // do the retrieve here...
        return value;
    } finally {
        // release lock on item #id
        synchronized (activeIds) {
            activeIds.remove(id);
            activeIds.notifyAll();
        }
    }
}
The same goes for the store.
The bottom line is: there is no single line of code that solves this problem exactly the way you need.
How about a ConcurrentHashMap with the Integer objects as keys?
You could have a look at this code for creating a mutex from an ID. The code was written for String IDs, but could easily be edited for Integer objects.
As you can see from the variety of answers, there are various ways to skin this cat:
Goetz et al's approach of keeping a cache of FutureTasks works quite well in situations like this where you're "caching something anyway" so don't mind building up a map of FutureTask objects (and if you did mind the map growing, at least it's easy to make pruning it concurrent)
As a general answer to "how to lock on ID", the approach outlined by Antonio has the advantage that it's obvious when the map of locks is added to/removed from.
You may need to watch out for a potential issue with Antonio's implementation, namely that the notifyAll() will wake up threads waiting on all IDs when one of them becomes available, which may not scale very well under high contention. In principle, I think you can fix that by having a Condition object for each currently locked ID, which is then the thing that you await/signal. Of course, if in practice there's rarely more than one ID being waited on at any given time, then this isn't an issue.
Steve,
your proposed code has a bunch of problems with synchronization. (Antonio's does as well).
To summarize:
1 - You need to cache an expensive object.
2 - You need to make sure that while one thread is doing the retrieval, another thread does not also attempt to retrieve the same object.
3 - For n threads all attempting to get the object, only 1 object is ever retrieved and returned.
4 - Threads requesting different objects do not contend with each other.
Pseudocode to make this happen (using a ConcurrentHashMap as the cache):
ConcurrentMap<Integer, java.util.concurrent.Future<Page>> cache =
    new ConcurrentHashMap<Integer, java.util.concurrent.Future<Page>>();

public Page getPage(Integer id) {
    Future<Page> myFuture = new Future<Page>();
    cache.putIfAbsent(id, myFuture);
    Future<Page> actualFuture = cache.get(id);
    if (actualFuture == myFuture) {
        // I am the first, w00t!
        Page page = getFromDataBase(id);
        myFuture.set(page);
    }
    return actualFuture.get();
}
Note:
java.util.concurrent.Future is an interface
java.util.concurrent.Future does not actually have a set() but look at the existing classes that implement Future to understand how to implement your own Future (Or use FutureTask)
Pushing the actual retrieval to a worker thread will almost certainly be a good idea.
See section 5.6 in Java Concurrency in Practice: "Building an efficient, scalable, result cache". It deals with the exact issue you are trying to solve. In particular, check out the memoizer pattern.
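A runnable sketch of that memoizer pattern, adapted to this question's getPage; here getFromDataBase is a trivial stand-in for the expensive retrieval, and FutureTask plays the role of the settable Future from the pseudocode above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class PageCache {
    static final class Page {
        final Integer id;
        Page(Integer id) { this.id = id; }
    }

    private final ConcurrentMap<Integer, Future<Page>> cache =
            new ConcurrentHashMap<>();

    // Stand-in for the expensive database retrieval.
    private Page getFromDataBase(Integer id) {
        return new Page(id);
    }

    public Page getPage(Integer id) {
        Future<Page> f = cache.get(id);
        if (f == null) {
            FutureTask<Page> task = new FutureTask<>(() -> getFromDataBase(id));
            f = cache.putIfAbsent(id, task);  // only one thread wins the race
            if (f == null) {
                f = task;
                task.run();                   // the winner computes the value
            }
        }
        try {
            return f.get();                   // losers block until it is done
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the Future goes into the map before the computation runs, at most one retrieval per id ever happens, and threads asking for different ids never contend.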
