Local variables are thread safe in Java. Is using a HashMap declared inside a method thread safe?
For example:
void usingHashMap()
{
HashMap<Integer, String> map = new HashMap<Integer, String>();
}
When two threads run the same method, here usingHashMap(), they are in no way related. Each thread creates its own version of every local variable, and these variables do not interact with each other in any way.
If variables aren't local, then they are attached to the instance. In this case, two threads running the same method on the same instance both see the one variable, and this isn't thread safe.
public class usingHashMapNotThreadSafe {
HashMap<Integer, String> map = new HashMap<Integer, String>();
public int work() {
//manipulating the hashmap here
return map.size(); // placeholder result
}
}
public class usingHashMapThreadSafe {
public int worksafe() {
HashMap<Integer, String> map = new HashMap<Integer, String>();
//manipulating the hashmap here
return map.size(); // placeholder result
}
}
In usingHashMapNotThreadSafe, two threads running on the same instance will see the same map. This could be dangerous, because both threads are trying to change map! In usingHashMapThreadSafe, two threads running on the same instance will each see a totally different map, and can't affect each other.
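As long as the reference to the HashMap object is not published (i.e. not made reachable by another thread, for example by storing it in a field or passing it to a method that hands it to another thread), it is thread safe.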
The same applies to the keys/values stored in the map. They need to be either immutable (cannot change their states after being created) or used only within this method.
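A minimal sketch of the confinement argument, assuming Java 8+; LocalMapDemo and its methods are hypothetical names used only for illustration. The HashMap is created inside the method and never escapes it, so many threads can call the method at the same time and each one works on its own map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalMapDemo {
    // Each call builds its own HashMap; the reference never leaves this method,
    // so concurrent callers cannot see or modify each other's map.
    static int distinctWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts.size();
    }

    // Calls distinctWords from several pool threads at once and collects the results.
    static List<Integer> runConcurrently(String text, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> distinctWords(text)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // waits for each task
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Every task returns the same answer because no map is shared. Making counts a field instead would reintroduce the race described for usingHashMapNotThreadSafe.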
I think that, to ensure complete concurrency, a ConcurrentHashMap should be used in any case, even if the map is local in scope. ConcurrentHashMap implements ConcurrentMap. As its documentation explains, the partitioning is essentially an attempt:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
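A small sketch, assuming Java 8+ and the hypothetical class ConcurrencyLevelDemo, showing how concurrencyLevel is passed to the three-argument ConcurrentHashMap constructor. In Java 8 and later, the Javadoc describes concurrencyLevel only as a hint for internal sizing, so the partitioning described in the quote applies to the older implementation. merge() is used because it performs the read-modify-write atomically:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrencyLevelDemo {
    // Starts `threads` writers that all update the same map, then waits for them.
    static ConcurrentMap<String, Integer> countConcurrently(int threads, int perThread) {
        // initialCapacity 16, loadFactor 0.75, concurrencyLevel = number of writers
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>(16, 0.75f, threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("key" + (i % 4), 1, Integer::sum); // atomic update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join also makes the workers' updates visible here
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return map;
    }
}
```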
I have a map defined as:
private static HashMap<Object, Object> myMap;
It is populated in a single thread and then that single thread spawns more threads that alter the data inside the map elements, but do not alter the map structure (no removes/puts/etc). Also, each thread alters one and only one unique member of the map (no two threads alter the same member).
My question is: will the main thread see the changes to the hashmap members, once all changes are complete? If not, would adding volatile to the declaration work, or would it only guarantee other threads see changes to the structure? Thanks
EDIT: Here is code that hopefully shows more clearly what I'm doing:
import java.util.HashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class TestingRandomStuff {
public static void main(String[] args) throws Exception {
HashMap<Object, Object> myMap = new HashMap<>();
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
//myMap is populated by this thread, and the objects inside are initialized, but are left largely empty
populate();
for (Object o : myMap.values()) {
Runnable r = new Task(o);
pool.execute(r);
}
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
} catch (InterruptedException e) {;}
//How do I guarantee that the objects inside myMap are displayed correctly with all the data that was just loaded by separate threads,
//or is it already guaranteed?
displayObjectData();
}
public static class Task implements Runnable {
private Object o;
public Task(Object o) {this.o = o;}
public void run() {
try {
o.load(); //o contains many complicated instance variables that will be created and written to
} catch (Exception e) {;}
}
}
}
EDIT: in your example, the map doesn't get accessed in other threads, only objects which are referenced by the map.
The objects themselves should be thread safe due to the way they are being used.
Note: if you used a parallelStream() the code would be simpler.
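Here is a sketch of what the parallelStream() variant might look like, assuming Java 8+. Resource is a hypothetical stand-in for the question's objects, whose load() method fills in instance state. The code relies on the terminal forEach operation not returning until every element has been processed:

```java
import java.util.HashMap;
import java.util.Map;

class ParallelLoadDemo {
    // Hypothetical stand-in for the question's objects: load() fills in instance state.
    static class Resource {
        private final int id;
        private String data; // written by a worker thread
        Resource(int id) { this.id = id; }
        void load() { data = "loaded-" + id; }
        String data() { return data; }
    }

    static Map<Integer, Resource> populateAndLoad(int n) {
        Map<Integer, Resource> myMap = new HashMap<>();
        for (int i = 0; i < n; i++) {
            myMap.put(i, new Resource(i)); // populated by this thread only
        }
        // The map structure is only read here; each Resource is loaded by exactly
        // one worker, and forEach returns only after all of them are done.
        myMap.values().parallelStream().forEach(Resource::load);
        return myMap;
    }
}
```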
will other threads see the changes to the hashmap members?
Probably, but there is no guarantee.
If not, would adding volatile to the declaration work,
volatile on the field only adds a read barrier on the reference to the Map. (Only if you change the field to point to another Map do you get a write barrier.)
or would it only guarantee other threads see changes to the structure?
No, it only guarantees that threads see changes to the myMap reference, not changes to the Map or to anything in the Map; i.e. the guarantee is very shallow.
There are a number of ways you can provide thread safety; however, the simplest is to synchronize on the object for both writes and reads. There are tricks you can do with volatile fields, but whether they will work is highly dependent on what you are doing.
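A sketch of the "synchronize on both write and read" approach, using a hypothetical Holder class in place of the question's objects. The polling loop is only there to keep the reader independent of the writer; real code would usually wait for a completion signal instead:

```java
class SynchronizedHolderDemo {
    // Hypothetical object with mutable state; both the write and the read
    // synchronize on the same monitor (the Holder instance).
    static class Holder {
        private String data;
        synchronized void setData(String d) { data = d; }
        synchronized String getData() { return data; }
    }

    static String writeInOtherThreadAndPoll() {
        Holder h = new Holder();
        new Thread(() -> h.setData("written")).start();
        String seen;
        // Because getData() is synchronized, a read that follows the writer's
        // synchronized setData() sees its value, so the loop cannot keep
        // reading a stale cached null forever.
        while ((seen = h.getData()) == null) {
            Thread.yield();
        }
        return seen;
    }
}
```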
You could use ConcurrentHashMap.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
This means a change made by one thread is visible to another thread that subsequently reads the value for the same key. But the two can still interfere if their operations overlap.
In Java, there are no objects embedded in other objects; all data structures consisting of multiple objects are reference graphs. In other words, a collection of objects is a collection of references, and a map with key and value objects is a map containing references to these objects.
Therefore, your objects never became “hashmap members”, but are just referenced by your hashmap. So, to discuss the thread safety of your code, the existence of the HashMap is just a red herring, as your multi-threaded code never sees any artifact of the HashMap.
You have code creating several distinct objects, then submitting Task instances, each containing a reference to one of these objects, to an ExecutorService to be processed. Assuming that these objects do not share mutable state, this is a straightforward, thread-safe approach.
After waiting for the completion of all jobs, the main thread can be sure to see the result of all actions made within the jobs, i.e. the modifications made to these objects. It will again be entirely irrelevant whether you use that HashMap or anything else to get a reference to one of these objects to look at these modifications.
It would be different if you were modifying keys of a map in a way that it affects their equality or hash code, but that’s independent from thread safety concerns. You must never modify map keys in such a way, even in single threaded code, and violating this contract would even break thread safe maps. But since your objects are referenced as values, there is no such problem.
There is only one corner case to pay attention to. Your code waiting for completion contains the line catch (InterruptedException e) {;}, so after that statement there is no full guarantee that all jobs have truly completed, yet completion is exactly what the visibility guarantee requires. Since you seem to assume that interruption should never happen, you should use something like catch (InterruptedException e) { throw new AssertionError(e); }, so that violations of that assumption do not happen silently. At the same time, you get the full visibility guarantee, as now the displayObjectData(); statement can only be reached when all jobs have been completed (or Long.MAX_VALUE seconds have elapsed, which none of us will ever witness).
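Putting that suggestion together, here is a sketch of the wait-for-completion step, assuming Java 8+. Item and AwaitAllDemo are hypothetical stand-ins for the question's objects and code. The sketch also treats a false return from awaitTermination (a timeout) as an error, which the original code ignores:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AwaitAllDemo {
    static class Item {
        private int value; // plain field, written by a pool thread
        void load() { value = 42; }
        int value() { return value; }
    }

    static List<Item> loadAll(int n) {
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < n; i++) items.add(new Item());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (Item item : items) {
            pool.execute(item::load);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS)) {
                throw new AssertionError("pool did not terminate");
            }
        } catch (InterruptedException e) {
            throw new AssertionError(e); // never continue silently before all jobs finish
        }
        return items; // only reached once every job has completed
    }
}
```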
I am a bit confused regarding one pattern I have seen in some legacy code of ours.
The controller uses a map as a cache, with an approach that should be thread safe, however I am still not confident it indeed is. We have a map, which is properly synchronized during addition and retrieval, however, there is a bit of logic outside of the synchronized block, that does some additional filtering.
(the map itself and the lists are never accessed outside of this method, so concurrent modification is not an issue; the map holds some stable parameters, which basically never change, but are used often).
The code looks like the following sample:
public class FooBarController {
private final Map<String, List<FooBar>> fooBarMap =
new HashMap<String, List<FooBar>>();
public FooBar getFooBar(String key, String foo, String bar) {
List<FooBar> foobarList;
synchronized (fooBarMap) {
if (fooBarMap.get(key) == null) {
foobarList = queryDbByKey(key);
fooBarMap.put(key, foobarList);
} else {
foobarList = fooBarMap.get(key);
}
}
for(FooBar fooBar : foobarList) {
if(foo.equals(fooBar.getFoo()) && bar.equals(fooBar.getBar()))
return fooBar;
}
return null;
}
private List<FooBar> queryDbByKey(String key) {
// ... (simple Hibernate-query)
}
// ...
}
Based on what I know about the JVM memory model, this should be fine, since if one thread populates a list, another one can only retrieve it from the map with proper synchronization in place, ensuring that the entries of the list are visible. (Putting the list happens-before getting it.)
However, we keep seeing cases, where an entry expected to be in the map is not found, combined with the typical notorious symptoms of concurrency issues (e.g. intermittent failures in production, which I cannot reproduce in my development environment; different threads can properly retrieve the value etc.)
I am wondering: is iterating through the elements of the List like this thread-safe?
The code you provided is correct in terms of concurrency. Here are the guarantees:
only one thread at a time adds values to map, because of synchronization on map object
values added by thread become visible for all other threads, that enter synchronized block
Given that, you can be sure that all threads that iterate a list see the same elements. The issues you described are indeed strange but I doubt they're related to the code you provided.
It could be thread safe only if all accesses to fooBarMap are synchronized. A little out of scope, but it may be safer to use a ConcurrentHashMap.
There is a great article on how hashmaps can be synchronized here.
In a situation like this, the best option is to use ConcurrentHashMap.
Verify that all updates and reads happen in the right order.
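Here is a sketch of the ConcurrentHashMap variant of the controller, assuming Java 8+. FooBarCache, the nested FooBar type and the loader function (standing in for queryDbByKey) are hypothetical. computeIfAbsent applies the loader at most once per key. According to its Javadoc, other updates to the map may block while the computation runs, so a slow query there is worth measuring:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class FooBarCache {
    // Hypothetical value type standing in for the question's FooBar entity.
    static final class FooBar {
        final String foo, bar;
        FooBar(String foo, String bar) { this.foo = foo; this.bar = bar; }
    }

    private final Map<String, List<FooBar>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<FooBar>> loader; // e.g. the DB query

    FooBarCache(Function<String, List<FooBar>> loader) { this.loader = loader; }

    FooBar getFooBar(String key, String foo, String bar) {
        // Loads each key at most once; the unmodifiable wrapper stops callers
        // from changing a cached list that other threads are iterating.
        List<FooBar> list = cache.computeIfAbsent(
                key, k -> Collections.unmodifiableList(loader.apply(k)));
        for (FooBar fb : list) {
            if (foo.equals(fb.foo) && bar.equals(fb.bar)) return fb;
        }
        return null;
    }
}
```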
As I understand from your question, there is a fixed set of parameters that never changes. One way I prefer to handle a situation like this is:
I. Create the map cache during startup and keep only one instance of it.
II. Read the map instance anytime, anywhere in the application.
In the for loop you are returning a reference to a fooBar object in the foobarList.
So the method calling getFooBar() has access, through this fooBar reference, to an object held in the Map.
Try to clone fooBar before returning it from getFooBar().
I am making some changes to some code I have written to try and change it into a multi-threaded solution. Some of the elements from my main class were originally static, and have had to be changed as part of the changes I am making. I had the idea to store them in a HashMap, using the Id of the Thread as the key for retrieving the items - that way I could store a reference to the Runnable class in the hash and access the desired attributes for the given thread by using getters/setters. I defined the below code to do this:
import java.util.HashMap;
public class ThreadContext {
private static HashMap<String, HashMap<String, Object>> tContext;
static {
initThreadContext();
}
public static void initThreadContext() {
String id = String.valueOf(Thread.currentThread().getId());
tContext = new HashMap<>();
}
public static void setObject(String key, Object o) {
String id = String.valueOf(Thread.currentThread().getId());
HashMap<String, Object> hash = tContext.get(id);
if( hash == null ) {
hash = new HashMap<>();
tContext.put(id, hash);
}
hash.put(key, o);
}
public static Object getObject(String key) {
String id = String.valueOf(Thread.currentThread().getId());
HashMap<String, Object> hash = tContext.get(id);
if( hash == null ) {
hash = new HashMap<>();
tContext.put(id, hash);
}
Object o = hash.get(key);
return o;
}
}
My question is: is it safe to do this, or should I try and find another way to do this? My example appears to work OK, but I'm unsure of any other side effects which may come about because of this.
EDIT: Example usage:
Foo foo = ((Foo)ThreadContext.getObject(Foo.CLASS_IDENTIFIER));
foo.doStuff();
There is already a way to do this using the JDK's ThreadLocal, which stores distinct references for each (local) thread.
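Here is a minimal sketch of the question's ThreadContext rewritten on top of ThreadLocal, assuming Java 8+ for ThreadLocal.withInitial. ThreadContextLocal is a hypothetical name that keeps the original setObject/getObject API:

```java
import java.util.HashMap;
import java.util.Map;

class ThreadContextLocal {
    // One map per thread, created lazily on first access from that thread.
    private static final ThreadLocal<Map<String, Object>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void setObject(String key, Object o) { CONTEXT.get().put(key, o); }

    static Object getObject(String key) { return CONTEXT.get().get(key); }

    // Drops this thread's map, e.g. before a pooled thread is reused.
    static void clear() { CONTEXT.remove(); }
}
```

No synchronization is needed because each map is only reachable from the thread that created it.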
I'm not sure what you are trying to do, but here are some points you should think about:
HashMap is not synchronized and should only be used in places where you don't need to worry about threads.
In your case, you seem to assume that the thread ID will be unique, which will not be the case when running on application servers. Some application servers reuse thread IDs and even use thread pools to reuse threads.
If you want to have data associated with a single thread, use ThreadLocal. Again, ThreadLocal should be used with caution: if there is a thread pool, the JVM has no way to clear the contents of a ThreadLocal once your task completes, because the thread itself lives on. You will have to set and clear the data yourself.
The ThreadLocal is certainly a better approach.
But you want feedback on this code, so here it is.
The static block and the init can all be inlined on the static declaration.
You could use an IdentityHashMap and store the Thread instances themselves, avoiding the unclear risks around the thread ID value stated above.
You could certainly use static method synchronization for thread safety, but that would create contention. Instead, a ConcurrentHashMap could locate the sub-map for each thread, and the sub-map itself doesn't need synchronization (since only one thread can access it).
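Here is a sketch of that suggestion, using a ConcurrentHashMap keyed by the Thread object itself. ThreadKeyedContext is a hypothetical name, and the sketch assumes Java 8+ for computeIfAbsent. Thread does not override equals or hashCode, so the keys are compared by identity, much as in an IdentityHashMap:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ThreadKeyedContext {
    // Keyed by the Thread object, not by a numeric id that may be reused.
    private static final Map<Thread, Map<String, Object>> CONTEXT = new ConcurrentHashMap<>();

    static void setObject(String key, Object o) {
        // The inner HashMap is only ever touched by its owning thread.
        CONTEXT.computeIfAbsent(Thread.currentThread(), t -> new HashMap<>()).put(key, o);
    }

    static Object getObject(String key) {
        Map<String, Object> m = CONTEXT.get(Thread.currentThread());
        return m == null ? null : m.get(key);
    }

    // Must be called when the thread is done; otherwise the entry keeps the Thread reachable.
    static void clear() { CONTEXT.remove(Thread.currentThread()); }
}
```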
Regarding safety (visibility to other, unintended stack frames) when using a thread pool, an executor and the like, you can write a try/finally or a closure (a Java 8 lambda) yourself to make sure you clean up when you leave your code's stack frames. It is no harder than the lock/unlock discipline.
BIG WARNING: if your code needing this custom thread local (or ANY thread local) runs inside the compute() method of a ForkJoinTask (e.g. a RecursiveTask) executing in a ForkJoinPool, and that compute() calls ForkJoinTask.join(), your thread may run other, identical compute() methods while it waits (because of the thread continuation emulation). Your custom thread local could then be initialized again and again (meaning it will be clobbered) before you even leave the initial compute(). This means you would need a stack of initial values managed in your try/finally to tolerate re-entrance.
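Here is a sketch of the try/finally closure from the two answers above, assuming Java 8+. ScopedContext and withValue are hypothetical names. Instead of an explicit stack, the sketch saves the previous value in a local variable, so nested (re-entrant) calls restore the outer value when they finish:

```java
import java.util.function.Supplier;

class ScopedContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Sets the value for the duration of body, then restores whatever was there
    // before. The call stack acts as the "stack of initial values", so a nested
    // call on the same thread cannot clobber the outer value permanently.
    static <T> T withValue(String value, Supplier<T> body) {
        String previous = CURRENT.get();
        CURRENT.set(value);
        try {
            return body.get();
        } finally {
            if (previous == null) CURRENT.remove(); else CURRENT.set(previous);
        }
    }

    static String current() { return CURRENT.get(); }
}
```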
The following code uses the double-checked locking pattern to initialize variables. I believe the code is thread safe, as the map won't be partially assigned even if two threads enter the getMap() method at the same time. So I don't have to make map volatile as well. Is my reasoning correct? NOTE: The map is immutable once it is initialized.
class A {
private Map<String, Integer> map;
private final Object lock = new Object();
public static Map<String, Integer> prepareMap() {
Map<String, Integer> map = new HashMap<>();
map.put("test", 1);
return map;
}
public Map<String, Integer> getMap() {
if (map == null) {
synchronized (lock) {
if (map == null) {
map = prepareMap();
}
}
}
return map;
}
}
According to the top names in the Java world, no, it is not thread safe. You can read why here: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
You are better off using ConcurrentHashMap or synchronizing your Map.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
Edit: If you only want to make the initialization of the map thread safe (so that two or more maps are not accidentally created), then you can do one of two things: 1) initialize the map when it is declared, or 2) make the getMap() method synchronized.
No, your reasoning is wrong: access to the map is not thread safe, because the threads that call getMap() after the initialization may not enter synchronized(lock) and thus are not in a happens-before relation with the thread that initialized it.
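Here are sketches of both options, assuming the map is never modified after it has been built. EagerInitA and SyncInitA are hypothetical variants of the question's class A:

```java
import java.util.HashMap;
import java.util.Map;

class EagerInitA {
    // Option 1: build the map when the field is declared. Because the field is
    // final, any thread that obtains an EagerInitA reference also sees the map
    // fully populated (provided `this` does not escape the constructor).
    private final Map<String, Integer> map = prepareMap();

    static Map<String, Integer> prepareMap() {
        Map<String, Integer> m = new HashMap<>();
        m.put("test", 1);
        return m;
    }

    Map<String, Integer> getMap() { return map; }
}

class SyncInitA {
    private Map<String, Integer> map; // guarded by this

    // Option 2: synchronize the whole getter, so the null check and the read of
    // the field happen under the same lock as the initializing write.
    synchronized Map<String, Integer> getMap() {
        if (map == null) {
            map = EagerInitA.prepareMap();
        }
        return map;
    }
}
```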
The map has to be volatile.
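Here is a sketch of the volatile version of the double-checked pattern. VolatileDclA is a hypothetical variant of class A. Reading the field into a local variable means the common path performs only one volatile read, and the map is fully populated before it is assigned to the field:

```java
import java.util.HashMap;
import java.util.Map;

class VolatileDclA {
    // volatile: the write of the reference happens-before every read that sees
    // it, so a thread that skips the lock still sees the fully built map.
    private volatile Map<String, Integer> map;
    private final Object lock = new Object();

    Map<String, Integer> getMap() {
        Map<String, Integer> result = map; // single volatile read on the fast path
        if (result == null) {
            synchronized (lock) {
                result = map;
                if (result == null) {
                    Map<String, Integer> m = new HashMap<>();
                    m.put("test", 1); // populate completely before publishing
                    map = result = m;
                }
            }
        }
        return result;
    }
}
```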
After inlining, the code could effectively become:
public Map<String, Integer> getMap() {
if (map == null) {
synchronized (lock) {
if (map == null) {
map = new HashMap<>(); // partial map exposed
map.put("test", 1);
}
}
}
return map;
}
Having a HashMap under concurrent reads and writes is VERY dangerous; don't do it. Search for "HashMap infinite loop".
Solutions:
Expand synchronized to the entire method, so that reading map variable is also under lock. This is a little expensive.
Declare map as volatile, to prevent reordering optimization. This is simple, and pretty cheap.
Use an immutable map. The final fields will also prevent exposing partial object state. In your particular example, we can use Collections.singletonMap. But for maps with more entries, I'm not sure JDK has a public implementation.
This is just one example of how things can go wrong. To fully understand the issues, there is no substitute for reading The "Double-Checked Locking is Broken" Declaration, referenced in a prior answer.
To get anything approaching the full flavor, think about two processors, A and B, each with its own caches, and a main memory that they share.
Suppose Thread A, running on Processor A, first calls getMap. It does several assignments inside the synchronized block. Suppose the assignment to map gets written to main memory first, before Thread A reaches the end of the synchronized block.
Meanwhile, on Processor B, Thread B also calls getMap, and does not happen to have the memory location representing map in its cache. It goes out to main memory to get it, and its read happens to hit just after Thread A's assignment to map, so it sees a non-null map. Thread B does not enter the synchronized block.
At this point, Thread B can go ahead and attempt to use the HashMap, despite the fact that Thread A's work on creating it has not yet been written to main memory. Thread B may even have the memory pointed to by map in its cache because of a prior use.
If you are tempted to try to work around this, consider the following quote from the referenced article:
There are lots of reasons it doesn't work. The first couple of reasons we'll describe are more obvious. After understanding those, you may be tempted to try to devise a way to "fix" the double-checked locking idiom. Your fixes will not work: there are more subtle reasons why your fix won't work. Understand those reasons, come up with a better fix, and it still won't work, because there are even more subtle reasons.
This answer only contains one of the most obvious reasons.
No, it is not thread safe.
The basic reason is that you can have reordering of operations you don't even see in the Java code. Let's imagine a similar pattern with an even simpler class:
class Simple {
int value = 42;
}
In the analogous getSimple() method, you assign /* non-volatile */ simple = new Simple(). What happens here?
the JVM allocates some space for the new object
the JVM sets some bit of this space to 42 (for value)
the JVM returns the address of this space, which is then assigned to simple
Without synchronization instructions to prohibit it, these instructions can be reordered. In particular, steps 2 and 3 can be ordered such that simple gets the new object's address before the constructor finishes! If another thread then reads simple.value, it'll see a value 0 (the field's default value) instead of 42. This is called seeing a partially-constructed object. Yes, that's weird; yes, I've seen things like that happen. It's a real bug.
You can imagine how if the object is a non-trivial object, like HashMap, the problem is even worse; there are a lot more operations, and so more possibilities for weird ordering.
Marking the field as volatile is a way of telling the JVM, "any thread that reads a value from this field must also see all writes that happened before that value was written." That prohibits those weird reorderings, which guarantees you'll see the fully-constructed object.
Unless you declare map as volatile, this code may be compiled into code that is not thread safe.
The compiler may optimize the expression map == null, cache the value of the expression and thus read the map field only once.
Declaring the field as volatile Map<String, Integer> map instructs the Java VM to always read the field map when it is accessed. This forbids such an optimization by the compiler.
Please refer to JLS Chapter 17. Threads and Locks
Suppose I want to lock based on an integer id value. In this case, there's a function that pulls a value from a cache and does a fairly expensive retrieve/store into the cache if the value isn't there.
The existing code isn't synchronized and could potentially trigger multiple retrieve/store operations:
//pseudocode
public Page getPage(Integer id) {
Page p = cache.get(id);
if (p == null)
{
p = getFromDataBase(id);
cache.store(p);
}
return p;
}
What I'd like to do is synchronize the retrieve on the id, e.g.
if (p==null)
{
synchronized (id)
{
..retrieve, store
}
}
Unfortunately this won't work because 2 separate calls can have the same Integer id value but a different Integer object, so they won't share the lock, and no synchronization will happen.
Is there a simple way of ensuring that you have the same Integer instance? For example, will this work:
synchronized (Integer.valueOf(id.intValue())) {
The javadoc for Integer.valueOf() seems to imply that you're likely to get the same instance, but that doesn't look like a guarantee:
Returns a Integer instance representing the specified int value. If a new Integer instance is not required, this method should generally be used in preference to the constructor Integer(int), as this method is likely to yield significantly better space and time performance by caching frequently requested values.
So, any suggestions on how to get an Integer instance that's guaranteed to be the same, other than the more elaborate solutions like keeping a WeakHashMap of Lock objects keyed to the int? (Nothing wrong with that, it just seems like there must be an obvious one-liner that I'm missing.)
You really don't want to synchronize on an Integer, since you don't have control over what instances are the same and what instances are different. Java just doesn't provide such a facility (unless you're using Integers in a small range) that is dependable across different JVMs. If you really must synchronize on an Integer, then you need to keep a Map or Set of Integer so you can guarantee that you're getting the exact instance you want.
Better would be to create a new object, perhaps stored in a HashMap that is keyed by the Integer, to synchronize on. Something like this:
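public Page getPage(Integer id) {
Page p = cache.get(id);
if (p == null) {
synchronized (getCacheSyncObject(id)) {
p = cache.get(id); // re-check: another thread may have loaded it while we waited
if (p == null) {
p = getFromDataBase(id);
cache.store(p);
}
}
}
return p;
}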
private ConcurrentMap<Integer, Integer> locks = new ConcurrentHashMap<Integer, Integer>();
private Object getCacheSyncObject(final Integer id) {
locks.putIfAbsent(id, id);
return locks.get(id);
}
To explain this code, it uses ConcurrentMap, which allows use of putIfAbsent. You could do this:
locks.putIfAbsent(id, new Object());
but then you incur the (small) cost of creating an Object for each access. To avoid that, I just save the Integer itself in the Map. What does this achieve? Why is this any different from just using the Integer itself?
When you do a get() from a Map, the keys are compared with equals() (or at least the method used is the equivalent of using equals()). Two different Integer instances of the same value will be equal to each other. Thus, you can pass any number of different Integer instances of "new Integer(5)" as the parameter to getCacheSyncObject and you will always get back only the very first instance that was passed in that contained that value.
There are reasons why you may not want to synchronize on Integer ... you can get into deadlocks if multiple threads are synchronizing on Integer objects and are thus unwittingly using the same locks when they want to use different locks. You can fix this risk by using the
locks.putIfAbsent(id, new Object());
version and thus incurring a (very) small cost to each access to the cache. Doing this, you guarantee that this class will be doing its synchronization on an object that no other class will be synchronizing on. Always a Good Thing.
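On Java 8 and later, ConcurrentMap.computeIfAbsent offers a variant of the private-lock-object version that allocates the Object only the first time an id is seen. PerIdLocks is a hypothetical name, and, like the original, this map of locks is never pruned:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PerIdLocks {
    // One private lock object per id; no other code can synchronize on these.
    private final ConcurrentMap<Integer, Object> locks = new ConcurrentHashMap<>();

    Object lockFor(int id) {
        // Keys are compared with equals(), so different Integer instances with
        // the same value map to the same lock object.
        return locks.computeIfAbsent(id, k -> new Object());
    }
}
```

Usage would be synchronized (locks.lockFor(id)) { ... } around the retrieve/store.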
Use a thread-safe map, such as ConcurrentHashMap. This will allow you to manipulate a map safely, but use a different lock to do the real computation. In this way you can have multiple computations running simultaneously with a single map.
Use ConcurrentMap.putIfAbsent, but instead of placing the actual value, use a Future with computationally-light construction instead. Possibly the FutureTask implementation. Run the computation and then get the result, which will thread-safely block until done.
Integer.valueOf() only returns cached instances for a limited range. You haven't specified your range, but in general, this won't work.
However, I would strongly recommend you not take this approach, even if your values are in the correct range. Since these cached Integer instances are available to any code, you can't fully control the synchronization, which could lead to a deadlock. This is the same problem people have trying to lock on the result of String.intern().
The best lock is a private variable. Since only your code can reference it, you can guarantee that no deadlocks will occur.
By the way, using a WeakHashMap won't work either. If the instance serving as the key is unreferenced, it will be garbage collected. And if it is strongly referenced, you could use it directly.
Using synchronized on an Integer sounds really wrong by design.
If you need to synchronize each item individually only during retrieve/store, you can create a Set and store the currently locked items there. In other words,
// this contains only those IDs that are currently locked, that is, this
// will contain only very few IDs most of the time
Set<Integer> activeIds = ...
Object retrieve(Integer id) {
// acquire "lock" on item #id
synchronized(activeIds) {
while(activeIds.contains(id)) {
try {
activeIds.wait();
} catch(InterruptedException e){...}
}
activeIds.add(id);
}
try {
// do the retrieve here...
return value;
} finally {
// release lock on item #id
synchronized(activeIds) {
activeIds.remove(id);
activeIds.notifyAll();
}
}
}
The same goes for the store.
The bottom line is: there is no single line of code that solves this problem exactly the way you need.
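Here is a runnable sketch of the same idea, packaged as a hypothetical IdLock class with separate lock/unlock methods so that it can wrap both the retrieve and the store. Callers would use lock(id); try { ... } finally { unlock(id); }:

```java
import java.util.HashSet;
import java.util.Set;

class IdLock {
    // IDs currently "locked"; guarded by the monitor of activeIds.
    private final Set<Integer> activeIds = new HashSet<>();

    void lock(Integer id) throws InterruptedException {
        synchronized (activeIds) {
            while (activeIds.contains(id)) {
                activeIds.wait(); // releases the monitor while waiting
            }
            activeIds.add(id);
        }
    }

    void unlock(Integer id) {
        synchronized (activeIds) {
            activeIds.remove(id);
            activeIds.notifyAll(); // wakes waiters for every id; each re-checks its own
        }
    }
}
```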
How about a ConcurrentHashMap with the Integer objects as keys?
You could have a look at this code for creating a mutex from an ID. The code was written for String IDs, but could easily be edited for Integer objects.
As you can see from the variety of answers, there are various ways to skin this cat:
Goetz et al's approach of keeping a cache of FutureTasks works quite well in situations like this where you're "caching something anyway" so don't mind building up a map of FutureTask objects (and if you did mind the map growing, at least it's easy to make pruning it concurrent)
As a general answer to "how to lock on ID", the approach outlined by Antonio has the advantage that it's obvious when the map of locks is added to/removed from.
You may need to watch out for a potential issue with Antonio's implementation, namely that the notifyAll() will wake up threads waiting on all IDs when one of them becomes available, which may not scale very well under high contention. In principle, I think you can fix that by having a Condition object for each currently locked ID, which is then the thing that you await/signal. Of course, if in practice there's rarely more than one ID being waited on at any given time, then this isn't an issue.
Steve,
your proposed code has a bunch of problems with synchronization. (Antonio's does as well).
To summarize:
You need to cache an expensive object.
You need to make sure that while one thread is doing the retrieval, another thread does not also attempt to retrieve the same object.
For n threads all attempting to get the object, only one object is ever retrieved and returned.
Threads requesting different objects do not contend with each other.
pseudo code to make this happen (using a ConcurrentHashMap as the cache):
ConcurrentMap<Integer, java.util.concurrent.Future<Page>> cache = new ConcurrentHashMap<Integer, java.util.concurrent.Future<Page>>();
public Page getPage(Integer id) {
Future<Page> myFuture = new Future<Page>();
cache.putIfAbsent(id, myFuture);
Future<Page> actualFuture = cache.get(id);
if ( actualFuture == myFuture ) {
// I am the first w00t!
Page page = getFromDataBase(id);
myFuture.set(page);
}
return actualFuture.get();
}
Note:
java.util.concurrent.Future is an interface
java.util.concurrent.Future does not actually have a set() but look at the existing classes that implement Future to understand how to implement your own Future (Or use FutureTask)
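Following that note, here is a sketch that uses FutureTask in the spirit of the memoizer pattern from Java Concurrency in Practice. PageCache and its loader function (standing in for getFromDataBase) are hypothetical. The FutureTask is cheap to create, and only the thread whose putIfAbsent wins actually runs the expensive load:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class PageCache<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for getFromDataBase

    PageCache(Function<K, V> loader) { this.loader = loader; }

    V get(K id) throws InterruptedException, ExecutionException {
        Future<V> f = cache.get(id);
        if (f == null) {
            // Creating the task does no work; the load only happens in run().
            FutureTask<V> task = new FutureTask<>(() -> loader.apply(id));
            f = cache.putIfAbsent(id, task);
            if (f == null) { // we won the race, so this thread performs the load
                f = task;
                task.run();
            }
        }
        return f.get(); // other callers for the same id block here until it is done
    }
}
```

Unlike the version in Java Concurrency in Practice, this sketch does not remove a failed or cancelled Future from the map, so a load that throws stays cached.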
Pushing the actual retrieval to a worker thread will almost certainly be a good idea.
See section 5.6 in Java Concurrency in Practice: "Building an efficient, scalable, result cache". It deals with the exact issue you are trying to solve. In particular, check out the memoizer pattern.