I work on Java 7.
I want to know whether the contains method of a HashSet is thread-safe.
The HashSet is initialized by one thread. Then we wrap the HashSet in an unmodifiable collection (Collections.unmodifiableSet). After initialization, multiple threads call only the contains method.
I have read the Javadoc, but it is unclear to me.
On HashSet Javadoc we can read
This class implements the Set interface, backed by a hash table (actually a HashMap instance).
...
Note that this implementation is not synchronized.
And on the HashMap Javadoc, we can read:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
As I read it, this means that calling contains is not a structural modification.
So are multiple concurrent calls to contains thread-safe?
If so, is that guaranteed on all JVM implementations (such as the IBM JVM)?
In general, there can be no data race (and thus no conflict) solely between read operations. Concurrency problems arise between read and write operations. So interleaved read operations are thread-safe, provided the reads do not themselves modify the internal state of the data structure (and assuming that such a notion of thread-safety is well defined).
Now, there is one more case where there might be a concurrency issue, and that is the initialisation of the data structure, since it can be considered the only modification (write operation) in your case. To make sure that all subsequent contains() invocations see the fully initialised Set, you have to make sure that it is correctly published. This concept is called "safe publication" in Java, and you can read more about it in the book "Java Concurrency in Practice".
To conclude, Collections.unmodifiableSet() publishes the result in a safe way through a final field. So yes, you can be sure that all contains() calls will see the fully initialised Set.
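To make the scenario concrete, here is a minimal sketch written for Java 7. The class name ReadOnlySetDemo, the field ALLOWED and the sample strings are made up for illustration. It assumes the set is built once, wrapped, stored in a final field, and never touched again through the original HashSet reference.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class ReadOnlySetDemo {
    // Built once during class initialization and stored in a final field,
    // so every thread that reads ALLOWED sees the fully populated set.
    private static final Set<String> ALLOWED = buildAllowed();

    private static Set<String> buildAllowed() {
        Set<String> names = new HashSet<String>(Arrays.asList("alpha", "beta", "gamma"));
        // The raw HashSet reference does not escape, so nothing can modify it later.
        return Collections.unmodifiableSet(names);
    }

    // Starts the given number of reader threads; each one only calls contains().
    static int countHits(int threads, final String key) {
        final AtomicInteger hits = new AtomicInteger();
        Thread[] readers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            readers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    if (ALLOWED.contains(key)) {
                        hits.incrementAndGet();
                    }
                }
            });
            readers[i].start();
        }
        try {
            for (Thread reader : readers) {
                reader.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, "beta"));  // prints 4
        System.out.println(countHits(4, "delta")); // prints 0
    }
}
```

The important part is not the wrapper itself but the fact that no code keeps a way to modify the underlying HashSet after the readers start.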
Related
I want to implement a bean where I have a TreeSet that is storing integers in sorted order. The only methods that use this TreeSet are addValue that is adding a new integer to the set, and getHighestValue that is returning the last value in the set using the method last() from SortedSet.
Is there any concurrency issue here? I'm not using any explicit iterator so there shouldn't be any concurrency problem when fetching the highest value but I don't know if last method can throw any ConcurrentModificationException or any other exception if two threads try to add and get the highest value at the same time.
Yes, assuming multiple threads are interacting with the set, and at least one is modifying, there are concurrency concerns. In particular, you mention multiple threads doing the add operation, which is definitely going to cause problems - worse however, they may not raise ConcurrentModificationExceptions.
The purpose of ConcurrentModificationException is to alert you to clearly erroneous concurrency issues, such as attempting to remove an item from a collection as you iterate over it (it's unclear what the collection should even do, in such a case). However the exception can only be raised when the collection is aware of the erroneous modification. Since the collection is not thread-safe it explicitly does not guarantee multi-threaded operations will be done correctly, and you are expected to take care of explicitly protecting the collection yourself.
The easiest, though least efficient, way to do this is to wrap the set with Collections.synchronizedSortedSet() before using it, i.e.:
SortedSet<Integer> synchronizedSet = Collections.synchronizedSortedSet(new TreeSet<Integer>());
This ensures that each method call will be done in serial, that is to say they will block waiting for any earlier calls to finish. However as such, you essentially lose most of the benefits of multi-threading.
Another alternative is to use an explicitly thread-safe SortedSet, namely ConcurrentSkipListSet:
This implementation provides expected average log(n) time cost for the contains, add, and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads.
This implementation allows you to interact with the set from multiple threads without further concern. That isn't to say it is the best way to implement the behavior you're looking for, but given what you've described - add and access the maximum value in a sorted set from multiple threads - it's what you're looking for.
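As a rough sketch of the bean from the question on top of ConcurrentSkipListSet: the class name HighestValueBean is invented here, and returning null for an empty set is my own choice (last() itself throws NoSuchElementException when the set is empty).

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

class HighestValueBean {
    // Sorted and safe for concurrent add/last without external locking.
    private final ConcurrentSkipListSet<Integer> values = new ConcurrentSkipListSet<Integer>();

    void addValue(int value) {
        values.add(value);
    }

    // Returns the largest value, or null if nothing has been added yet.
    Integer getHighestValue() {
        try {
            return values.last();
        } catch (NoSuchElementException e) {
            return null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final HighestValueBean bean = new HighestValueBean();
        Thread low = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 500; i++) bean.addValue(i);
            }
        });
        Thread high = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 500; i < 1000; i++) bean.addValue(i);
            }
        });
        low.start();
        high.start();
        low.join();
        high.join();
        System.out.println(bean.getHighestValue()); // prints 999
    }
}
```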
See also: When is a ConcurrentSkipListSet useful?
What is the best way to synchronize a LinkedHashMap externally, without using Collections.synchronizedMap?
When Collections.synchronizedMap is used, the entire data structure is locked on every call, which hurts performance badly.
What is the best way to lock only the required part of the data structure? For example, if a thread is accessing key K1, it should lock only the key (K1) and value (V1) part of the data structure.
You can't get a fine-grained-locking, FIFO-eviction concurrent map from the built-in Java implementations.
Check out Guava's Cache or the open-source ConcurrentLinkedHashMap project.
I think you may want to synchronize the subsequent operation you do, just on the value coming from the map:
Object value = map.get(key);
synchronized (value) {
    doSomethingWith(value);
}
Synchronizing on the values you get from the Map makes sense, since they can be shared and accessed concurrently; the example above should do what you need. (It assumes the key is present, because synchronizing on a null value throws a NullPointerException.)
By the way, you can also synchronize on the key, using two nested synchronized blocks:
synchronized (key) {
    Object value = map.get(key);
    synchronized (value) {
        doSomethingWith(value);
    }
}
The key is usually only used to look up the object. Keys are matched by hashCode() and equals(), not by identity, so two threads may hold equal but distinct key objects with different monitors; synchronizing on the key therefore doesn't fully make sense to me.
Or, maybe you can subclass ConcurrentHashMap adding what is missing from LinkedHashMap.
Louis Wasserman's suggestion is probably the best because it gives you a lot of useful functionality. However, even if you lock on the entire map, you have to be hitting it really, really hard to make that a bottleneck (as in, your code is mostly doing read/write on the map). If you don't need the additional functionality of Guava's Cache, a synchronized map could be simpler & better. You could also use a ReadWriteLock if you mostly read from the map.
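For the ReadWriteLock idea, a minimal sketch could look like the following. GuardedLinkedMap is a hypothetical name, and the sketch assumes an insertion-ordered LinkedHashMap; with access ordering, get() reorders entries and would have to take the write lock instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedLinkedMap<K, V> {
    // Insertion-ordered (the default), so get() does not modify the map.
    private final Map<K, V> map = new LinkedHashMap<K, V>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers get exclusive access.
    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Oldest inserted key, or null if the map is empty.
    K firstKey() {
        lock.readLock().lock();
        try {
            return map.isEmpty() ? null : map.keySet().iterator().next();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This still locks the whole map for writes; it only helps when reads dominate.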
The best option would be to use java.util.concurrent.ConcurrentHashMap.
I can't see how it would be possible to externally lock only parts of your Map, since you cannot control which shared data structures are accessed internally by a call to any of the map's methods.
If you don't need a LinkedHashMap, use a ConcurrentHashMap from the java.util.concurrent package.
It is specifically designed for both speed and thread safety. It uses the minimal possible locking to achieve its thread safety.
An insertion in a HashMap, or LinkedHashMap, can cause a rehash because it increases the ratio between the size and the number of buckets. Having two or more threads rehash simultaneously would be a disaster.
Even if you are only doing a get, another thread may be removing an entry from the same bucket, so you are scanning a linked list that is being modified under you. You could also have two or more threads appending to the main linked list at the same time.
If you can do without the linking, use java.util.concurrent.ConcurrentHashMap, as already suggested.
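If the linking really is optional, something along these lines shows ConcurrentHashMap handling concurrent writers without any external lock. HitCounter and the key "page" are illustrative only; the sketch uses putIfAbsent so it also works on Java 7.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class HitCounter {
    private final ConcurrentMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<String, AtomicInteger>();

    void hit(String key) {
        AtomicInteger counter = counts.get(key);
        if (counter == null) {
            AtomicInteger fresh = new AtomicInteger();
            // putIfAbsent is atomic: only one thread's counter wins.
            counter = counts.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet();
    }

    int get(String key) {
        AtomicInteger counter = counts.get(key);
        return counter == null ? 0 : counter.get();
    }

    // Two threads hit the same key 1000 times each.
    static int demo() {
        final HitCounter counter = new HitCounter();
        Runnable task = new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 1000; i++) counter.hit("page");
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get("page");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2000
    }
}
```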
If I do the following.
Create a HashMap (in a final field)
Populate HashMap
Wrap HashMap with unmodifiable wrapper Map
Start other threads which will access but not modify the Map
As I understand it the Map has been "safely published" because the other threads were started after the Map was fully populated so I think it is ok to access the Map from multiple threads as it cannot be modified after this point.
Is this right?
This is perfectly fine as far as the map itself is concerned. But you need to realize that making the map unmodifiable only makes the map itself unmodifiable, not its keys and values. So if you have, for example, a Map<String, SomeMutableObject> such as a Map<String, List<String>>, then threads will still be able to alter a value with, for example, map.get("foo").add("bar");. To avoid this, you need to make the keys and values immutable or unmodifiable as well.
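A small sketch of the difference, using made-up names (ShallowUnmodifiableDemo, shallow, deep): the first map only wraps the outer Map, the second also wraps each value list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShallowUnmodifiableDemo {
    // Only the map is wrapped; the list values stay mutable.
    static Map<String, List<String>> shallow() {
        Map<String, List<String>> map = new HashMap<String, List<String>>();
        map.put("foo", new ArrayList<String>(Arrays.asList("a")));
        return Collections.unmodifiableMap(map);
    }

    // The map and every value list are wrapped.
    static Map<String, List<String>> deep() {
        Map<String, List<String>> map = new HashMap<String, List<String>>();
        List<String> value = new ArrayList<String>(Arrays.asList("a"));
        map.put("foo", Collections.unmodifiableList(value));
        return Collections.unmodifiableMap(map);
    }

    public static void main(String[] args) {
        Map<String, List<String>> shallowMap = shallow();
        shallowMap.get("foo").add("bar");          // succeeds
        System.out.println(shallowMap.get("foo")); // prints [a, bar]

        try {
            deep().get("foo").add("bar");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected");        // prints rejected
        }
    }
}
```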
As I understand it the Map has been "safely published" because the other threads were started after the Map was fully populated so I think it is ok to access the Map from multiple threads as it cannot be modified after this point.
Yes. Just make sure that the other threads are started in a synchronized manner, i.e. make sure you have a happens-before relation between publishing the map, and starting the threads.
This is discussed in this blog post:
[...] This is how Collections.unmodifiableMap() works.
[...]
Because of the special meaning of the keyword "final", instances of this class can be shared with multiple threads without using any additional synchronization; when another thread calls get() on the instance, it is guaranteed to get the object you put into the map, without doing any additional synchronization. You should probably use something that is thread-safe to perform the handoff between threads (like LinkedBlockingQueue or something), but if you forget to do this, then you still have the guarantee.
In short, no you don't need the map to be thread-safe if the reads are non-destructive and the map reference is safely published to the client.
In the example there are two important happens-before relationships established here. The final-field publication (if and only if the population is done inside the constructor and the reference doesn't leak outside the constructor) and the calls to start the threads.
Any modification made to the map after these calls is not safely published to the clients reading from it.
We have for example a CopyOnWriteMap that has a non-threadsafe map underlying that is copied on each write. This is as fast as possible in situations where there are many more reads than writes (caching configuration data is a good example).
That said, if the intention really is to not change the map, setting an immutable version of the map into the field is always the best way to go as it guarantees the client will see the correct thing.
Lastly, there are some Map implementations that have destructive reads such as a LinkedHashMap with access ordering, or a WeakHashMap where entries can disappear. These types of maps must be accessed serially.
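To illustrate what a "destructive read" means for an access-ordered LinkedHashMap, here is a single-threaded sketch (AccessOrderDemo and keysAfterGet are invented names). It only shows that get() changes the map's internal order, which is why concurrent get() calls on such a map are writes in disguise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderDemo {
    // Builds a three-entry access-ordered map, reads one key, and returns the key order.
    static List<String> keysAfterGet(String key) {
        // The third constructor argument (true) selects access order.
        Map<String, Integer> map = new LinkedHashMap<String, Integer>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get(key); // moves the entry to the end of the internal linked list
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterGet("a")); // prints [b, c, a]
    }
}
```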
You are correct. Since the map is immutable, there is no need to ensure exclusive access to it from different threads by using mutexes or otherwise. This usually improves performance considerably.
Also note that if you only wrap the original Map rather than creating a copy, i.e. the unmodifiable Map delegates method calls to the inner HashMap, then modifying the underlying Map may still introduce race conditions.
An immutable map is inherently thread-safe. You could use Guava's ImmutableMap.
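The view-versus-copy distinction can be seen in a few lines. ViewVersusCopyDemo and sizesAfterLateWrite are hypothetical names; the point is only that a write to the original map shows through the wrapper but not through a wrapped copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ViewVersusCopyDemo {
    // Returns {size seen through the view, size seen through the wrapped copy}.
    static int[] sizesAfterLateWrite() {
        Map<String, Integer> original = new HashMap<String, Integer>();
        original.put("a", 1);

        // A view: delegates every call to 'original'.
        Map<String, Integer> view = Collections.unmodifiableMap(original);
        // A defensive copy: has its own private HashMap.
        Map<String, Integer> copy =
                Collections.unmodifiableMap(new HashMap<String, Integer>(original));

        original.put("b", 2); // a late write through the original reference

        return new int[] { view.size(), copy.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterLateWrite();
        System.out.println(sizes[0] + " " + sizes[1]); // prints 2 1
    }
}
```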
When do we use synchronized ArrayList? We already have Vector which is synchronized.
I think that you've got this wrong. ArrayList is unsynchronized, Vector is.
Being synchronized means that every operation is thread safe - if you use the same vector from two threads at the same time, they can't corrupt the state. However, this makes it slower.
If you are working in a single threaded environment (or the list is limited to a thread and never shared), use ArrayList. If you are working with multiple threads that share the same collection, either use Vector, or use ArrayList but synchronize in some other way (e.g., manually or via a wrapper).
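For the "ArrayList plus a wrapper" option, a minimal sketch might be the following (SharedListDemo is an invented name). Individual calls such as add are synchronized by the wrapper, but iteration has to be guarded manually by locking on the wrapper itself, as the Collections.synchronizedList documentation requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SharedListDemo {
    private final List<Integer> list =
            Collections.synchronizedList(new ArrayList<Integer>());

    void add(int value) {
        list.add(value); // each single call is synchronized by the wrapper
    }

    int sum() {
        // Iteration spans many calls, so hold the wrapper's lock for the whole loop.
        synchronized (list) {
            int total = 0;
            for (int value : list) {
                total += value;
            }
            return total;
        }
    }

    // Two writer threads each add the numbers 1..100.
    static int demo() {
        final SharedListDemo shared = new SharedListDemo();
        Runnable writer = new Runnable() {
            @Override
            public void run() {
                for (int i = 1; i <= 100; i++) shared.add(i);
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.sum();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 10100
    }
}
```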
ArrayList is not synchronized; see http://java.sun.com/javase/6/docs/api/java/util/ArrayList.html
ArrayList is not synchronized out of the box.
Resizable-array implementation of the List interface. Implements all optional list operations, and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to Vector, except that it is unsynchronized.)
This avoids some performance issues in situations where you know that you won't need thread safety (e.g., entirely encapsulated private data). However, both ArrayList and Vector have issues when you use iterators over them: if elements are added or removed while you are iterating over either type of collection (other than through the iterator itself), the iterator will typically throw a ConcurrentModificationException:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
ArrayList comes in a variety of useful flavors, however, while Vector does not. My personal favorite is the CopyOnWriteArrayList:
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads. The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created. Element-changing operations on iterators themselves (remove, set, and add) are not supported. These methods throw UnsupportedOperationException.
CopyOnWriteArrayLists are tremendously useful in GUI work, especially in situations where you are displaying an updating set of data (e.g., moving icons on a screen). If you can tolerate having your displayed list of data be one frame out of date (because your producer thread is slightly behind your graphical update thread), CopyOnWriteArrayLists are the perfect data structure.
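The snapshot behaviour is easy to see even in a single thread. In this sketch (SnapshotIteratorDemo and the icon names are made up) the list is modified inside its own for-each loop, which would fail with a ConcurrentModificationException on a plain ArrayList.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteratorDemo {
    // Returns {elements seen by the loop, final size of the list}.
    static int[] demo() {
        List<String> icons = new CopyOnWriteArrayList<String>(Arrays.asList("a", "b"));
        int seen = 0;
        for (String icon : icons) {
            // Each add copies the backing array; the running iterator keeps its old snapshot.
            icons.add(icon + "-copy");
            seen++;
        }
        return new int[] { seen, icons.size() };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result[0] + " " + result[1]); // prints 2 4
    }
}
```

The loop sees only the two elements that existed when the iterator was created, while the list itself ends up with four.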
What does it mean that an ArrayList is synchronized in Java?
It means it is thread-safe.
Vectors are synchronized. Any method that touches the Vector's contents is thread safe.
ArrayList, on the other hand, is unsynchronized and therefore not thread safe.
I read that ArrayList is synchronized in Java, but the ArrayList API says:
Note that this implementation is not synchronized
So you use an ArrayList when you are sure that you won't be dealing with concurrency.
Using Vector might be an overkill, and may result in performance issues.
I need to make an ArrayList of ArrayLists thread safe. I also cannot have the client making changes to the collection. Will the unmodifiable wrapper make it thread safe or do I need two wrappers on the collection?
It depends. The wrapper will only prevent changes to the collection it wraps, not to the objects in the collection. If you have an ArrayList of ArrayLists, the global List as well as each of its element Lists need to be wrapped separately, and you may also have to do something for the contents of those lists. Finally, you have to make sure that the original list objects are not changed, since the wrapper only prevents changes through the wrapper reference, not to the original object.
You do NOT need the synchronized wrapper in this case.
On a related topic - I've seen several replies suggesting using synchronized collection in order to achieve thread safety.
Using the synchronized version of a collection doesn't by itself make your code "thread safe": each individual operation (insert, count, etc.) is protected by a mutex, but when you combine two operations there is no guarantee that they will execute atomically.
For example the following code is not thread safe (even with a synchronized queue):
if (queue.size() > 0) {
    queue.add(...);
}
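One possible way to make such a check-then-act sequence atomic is to hold the wrapper's own lock around both calls. The sketch below uses invented names (CompoundOperationDemo, pollRacy, pollAtomic) and a check-then-remove instead of check-then-add, since that version fails more visibly; it relies on the documented fact that the wrappers returned by Collections.synchronizedList lock on the returned list itself.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class CompoundOperationDemo {
    private final List<String> queue =
            Collections.synchronizedList(new LinkedList<String>());

    void offer(String item) {
        queue.add(item);
    }

    // Racy: another thread may empty the queue between isEmpty() and remove(0),
    // in which case remove(0) throws IndexOutOfBoundsException.
    String pollRacy() {
        return queue.isEmpty() ? null : queue.remove(0);
    }

    // Atomic: both calls run while holding the same lock the wrapper uses.
    String pollAtomic() {
        synchronized (queue) {
            return queue.isEmpty() ? null : queue.remove(0);
        }
    }
}
```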
The unmodifiable wrapper only prevents changes to the structure of the list that it applies to. If this list contains other lists and you have threads trying to modify these nested lists, then you are not protected against concurrent modification risks.
From looking at the Collections source, it looks like Unmodifiable does not make it synchronized.
static class UnmodifiableSet<E> extends UnmodifiableCollection<E>
implements Set<E>, Serializable;
static class UnmodifiableCollection<E> implements Collection<E>, Serializable;
The synchronized wrapper classes have a mutex object in them to do the synchronized parts, so it looks like you need to use both wrappers to get both behaviours. Or roll your own!
I believe that because the UnmodifiableList wrapper stores the ArrayList to a final field, any read methods on the wrapper will see the list as it was when the wrapper was constructed as long as the list isn't modified after the wrapper is created, and as long as the mutable ArrayLists inside the wrapper aren't modified (which the wrapper can't protect against).
It will be thread-safe if the unmodifiable view is safely published, and the modifiable original is never ever modified (including all objects recursively contained in the collection!) after publication of the unmodifiable view.
If you want to keep modifying the original, then you can either create a defensive copy of the object graph of your collection and return an unmodifiable view of that, or use an inherently thread-safe list to begin with, and return an unmodifiable view of that.
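A defensive copy of a list of lists could be sketched like this. DeepUnmodifiable and freeze are hypothetical names, and the sketch assumes the elements themselves (String here) are immutable, so copying two levels is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DeepUnmodifiable {
    // Copies the outer list and every inner list, then wraps each level.
    static List<List<String>> freeze(List<List<String>> source) {
        List<List<String>> outer = new ArrayList<List<String>>(source.size());
        for (List<String> inner : source) {
            outer.add(Collections.unmodifiableList(new ArrayList<String>(inner)));
        }
        return Collections.unmodifiableList(outer);
    }

    public static void main(String[] args) {
        List<List<String>> source = new ArrayList<List<String>>();
        source.add(new ArrayList<String>(Arrays.asList("a", "b")));

        List<List<String>> frozen = freeze(source);
        source.get(0).add("c");            // later change to the original

        System.out.println(frozen.get(0)); // prints [a, b]
    }
}
```

The caller still has to take the copy at a moment when no other thread is modifying the source lists.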
You cannot return an unmodifiableList(synchronizedList(theList)) if you still intend to access theList unsynchronized afterwards; if mutable state is shared between multiple threads, then all threads must synchronize on the same locks when they access that state.
An immutable object is by definition thread safe (assuming no-one retains references to the original collections), so synchronization is not necessary.
Wrapping the outer ArrayList using Collections.unmodifiableList()
prevents the client from changing its contents (and thus makes it thread
safe), but the inner ArrayLists are still mutable.
Wrapping the inner ArrayLists using Collections.unmodifiableList() too
prevents the client from changing their contents (and thus makes them
thread safe), which is what you need.
Let us know if this solution causes problems (overhead, memory usage etc);
other solutions may be applicable to your problem. :)
EDIT: Of course, if the lists are modified they are NOT thread safe. I assumed no further edits were to be made.
Not sure if I understood what you are trying to do, but I'd say the answer in most cases is "No".
If you set up an ArrayList of ArrayLists, and neither the outer nor the inner lists can ever be changed after creation (and during creation only one thread has access to the inner and outer lists), they are probably thread-safe behind a wrapper (provided both the outer and the inner lists are wrapped in such a way that modifying them is impossible). All read-only operations on ArrayLists are most likely thread-safe. However, Sun does not guarantee them to be thread-safe (not even for read-only operations), so even though it might work right now, it could break in the future (for example, if Sun adds some internal caching of data for quicker access).
The synchronized wrapper is necessary if:
There is still a reference to the original modifiable list.
The list will possibly be accessed through an iterator.
If you intend to read from the ArrayList by index only, you could assume this is thread-safe.
When in doubt, choose the synchronized wrapper.