So I'm stuck with a thread-hostile singleton implementation that returns an Iterator to a HashSet.
I have two threads that -sometimes- access this Iterator simultaneously to load data. I'll call them the luckyThread and the unluckyThread.
ONE of them (unluckyThread) throws a ConcurrentModificationException.
Question: Is it safe to assume all is well with the other thread?
To be specific: is the data loaded by the luckyThread free of corruption?
(The couple of times this happened, the system chugged along just fine, except for the unluckyThread.)
Don't think this question requires any code sample, but I'd be happy to provide them if required.
Update: (without getting into details) the system is fine as long as one of the threads loads a clean data set. And quite needless to say I fixed this issue, but this got me thinking about recovering from such exceptions and I did not find anything concrete online.
If you look at the documentation of ConcurrentModificationException, it clearly states that:
Note that fail-fast behavior cannot be guaranteed as it is, generally
speaking, impossible to make any hard guarantees in the presence of
unsynchronized concurrent modification. Fail-fast operations throw
ConcurrentModificationException on a best-effort basis. Therefore, it
would be wrong to write a program that depended on this exception for
its correctness: ConcurrentModificationException should be used only
to detect bugs.
Instead, you should probably use some other mechanism to make sure there's no concurrent access (such as a synchronized block on the singleton while accessing the underlying HashSet).
The exception is thrown because the backing store has changed, which makes any usage of an iterator on that store vulnerable to the exception. This can even occur in a single threaded application if poorly written. In your case, neither of your threads is overly lucky because both can suffer this exception when the change occurs.
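The single-threaded case mentioned above can be reproduced with a minimal sketch like the following. The class and method names are made up for illustration; the only point is that a structural change made while a for-each loop is running trips the iterator's fail-fast check on the next step.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not taken from the question's code.
final class SingleThreadCmeDemo {

    // Returns true if modifying the set mid-iteration triggered the exception.
    static boolean modifyWhileIterating() {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(2);
        set.add(3);
        try {
            for (Integer value : set) {
                // Structural change behind the iterator's back: the first
                // call adds a new element and bumps the set's modification count.
                set.add(99);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + modifyWhileIterating());
    }
}
```

No second thread is involved here, which is why the exception name is somewhat misleading: it signals modification during iteration, not necessarily concurrency.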
Even without a change to your underlying HashSet, having two threads access your iterator can lead to indeterminate behaviour since both will change the internal state of the iterator, not to mention that in the best case scenario each thread will be grabbing different items from your set.
The code is not safe as is and has to be rewritten to use a threadsafe Set and to NOT share the iterator between threads.
This is definitely unsafe. At the moment you are seeing the best-case scenario: one thread gets a ConcurrentModificationException. It could be much worse than this. The behaviour of a HashSet is undefined under concurrent access. I'm not sure how stable an iterator on a HashSet is, but a quick look at the sources leads me to think that it could go very wrong. If the keys are rehashed during iteration, you have a good chance of ending up in an infinite loop.
Conclusion: either synchronize the access to the iterator, create a copy of your set (in a synchronized block) or change to a thread safe collection.
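As a rough sketch of the "copy in a synchronized block" option, assuming the singleton can be changed to hand out a snapshot instead of a live Iterator (SnapshotRegistry and its methods are hypothetical names, not the asker's class):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical replacement for a singleton that exposed a shared Iterator.
final class SnapshotRegistry {
    private final Set<String> items = new HashSet<>();

    // All mutation goes through the same monitor as snapshot().
    synchronized void add(String item) {
        items.add(item);
    }

    // The copy is made while holding the lock; each caller then iterates
    // its own private list, so later add() calls cannot disturb it.
    synchronized List<String> snapshot() {
        return new ArrayList<>(items);
    }
}
```

Each thread calls snapshot() and iterates the returned list. The trade-off is that a snapshot may be slightly stale by the time it is read.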
An alternative is to use ConcurrentHashMap instead of HashMap. You do not need synchronized blocks when accessing a ConcurrentHashMap in a multithreaded application.
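Since the question is about a HashSet rather than a HashMap, one set-shaped variant of this idea is ConcurrentHashMap.newKeySet(), available from Java 8. The sketch below (class name invented for illustration) only shows that its iterator tolerates modification during iteration; whether a weakly consistent view is acceptable depends on the application.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class.
final class ConcurrentSetDemo {

    // Adds to the set while iterating it and returns the final size.
    static int iterateWhileAdding() {
        Set<Integer> set = ConcurrentHashMap.newKeySet();
        set.add(1);
        set.add(2);
        set.add(3);
        for (Integer value : set) {
            // Allowed: the iterator is weakly consistent and does not
            // throw ConcurrentModificationException.
            set.add(99);
        }
        return set.size();
    }
}
```

The iterator may or may not visit the element added during the loop, so code that needs a stable view should still take a snapshot.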
Related
Can anyone please explain to me the consequences of mutating a collection in java that is not thread-safe and is being used by multiple threads?
The results are undefined and somewhat random.
With JDK collections that are designed to fail fast, you might receive a ConcurrentModificationException. This is really the only consequence that is specific to thread safety with collections, as opposed to any other class.
Problems that occur generally with thread-unsafe classes may occur:
The internal state of the collection might be corrupted.
The mutation may appear to be successful, but the changes may not, in fact, be visible to other threads at any given time. They might be invisible at first and become visible later.
The changes might actually be successful under light load, but fail randomly under heavy load with lots of threads in contention.
Race conditions might occur, as was mentioned in a comment above.
There are lots of other possibilities, none of them pleasant. Worst of all, these things tend to most commonly reveal themselves in production, when the system is stressed.
In short, you probably don't want to do that.
The most common outcome is it looks like it works, but doesn't work all the time.
This can mean you have a problem which
works on one machine but doesn't on another.
works for a while but something apparently unrelated changes and your program breaks.
Also, whenever you have a bug, you can't tell whether it is a multi-threading issue if you are not using thread-safe data structures.
What can happen is:
you rarely/randomly get an error and strange behaviour
your code goes into an infinite loop and stops working (HashMap used to do this)
The only option is to:
limit the amount of state which is shared between threads, ideally none at all.
be very careful about how data is updated.
don't rely on unit tests; you have to understand what the code is doing and be confident it will behave correctly in all possible situations.
The invariants of the data structure will not be guaranteed.
For example:
If thread 2 does a read whilst thread 1 is adding to the DS thread 1 may consider this element added while thread 2 doesn't see that the element has been added yet.
There are plenty of data structures that aren't thread-safe but will still appear to function (i.e. not throw) in a multi-threaded environment, and they might even perform correctly under certain circumstances (for example, if you aren't doing any writes to the data structure).
To fully understand this topic exploring the different classes of bugs that occur in concurrent systems is recommended: this short document seems like a good start.
http://pages.cs.wisc.edu/~remzi/OSTEP/threads-bugs.pdf
Recently I started reading 'Java 7 Concurrency Cookbook' and, in the section "Creating and running a daemon thread", found code where the main thread creates one instance of ArrayDeque and shares its reference with three producers and one consumer. The producers call deque.addFirst(event) and the consumer calls deque.getLast().
But JavaDoc of ArrayDeque clearly states that:
Array deques are not thread-safe; in the absence of external synchronization, they do not support concurrent access by multiple threads.
So I wonder whether it is a mistake or I just don't understand something?
Array deques are not thread safe, meaning you have to provide external synchronization.
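As for why it appears to work, as holger said:
addFirst(e) is an insert method, which does change the underlying data structure.
getLast() is an examine method, which does not change the underlying data structure.
That is why it appears to work, although three producers calling addFirst() without synchronization are still concurrent writers, so this is luck rather than a guarantee. If you had used removeLast() instead of getLast(), failures such as lost or duplicated elements would be even more likely. Note that a ConcurrentModificationException is not guaranteed in either case: only the deque's iterators are fail-fast, and only on a best-effort basis.
Hope this clears everything up. Cheers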
The documentation clearly says that unless you provide external synchronization, ArrayDeque gives you no thread-safety guarantees, unlike Vector, which synchronizes its methods internally.
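What "external synchronization" could look like for the producer side is sketched below. This is not the book's code: SynchronizedEventDeque and runProducers are hypothetical names, and the events are plain strings. A ready-made alternative would be java.util.concurrent.ConcurrentLinkedDeque.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical wrapper that guards every ArrayDeque call with one lock.
final class SynchronizedEventDeque {
    private final Deque<String> deque = new ArrayDeque<>();

    void addFirst(String event) {
        synchronized (deque) {
            deque.addFirst(event);
        }
    }

    // Returns the oldest event without removing it, or null if empty.
    String peekLast() {
        synchronized (deque) {
            return deque.peekLast();
        }
    }

    int size() {
        synchronized (deque) {
            return deque.size();
        }
    }

    // Starts several producer threads and returns the final element count.
    static int runProducers(int producers, int eventsEach) {
        SynchronizedEventDeque shared = new SynchronizedEventDeque();
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < eventsEach; j++) {
                    shared.addFirst("event");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }
}
```

With the lock in place, no insertions are lost regardless of how the producer threads interleave.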
First, I'll describe what I want and then I'll elaborate on the possibilities I am considering. I don't know which is the best so I want some help.
I have a hash map on which I do read and write operations from a Servlet. Now, since this Servlet is on Tomcat, I need the hash map to be thread safe. Basically, when it is being written to, nothing else should write to it and nothing should be able to read it as well.
I have seen ConcurrentHashMap but noticed its get method is not thread-safe. Then, I have seen locks and something called synchronized.
I want to know which is the most reliable way to do it.
ConcurrentHashMap.get() is thread safe.
You can make HashMap thread safe by wrapping it with Collections.synchronizedMap().
EDIT: removed false information
In any case, the synchronized keyword is a safe bet. It blocks any other thread from entering a synchronized block on the same object while one thread is inside it.
// Anything can modify map at this point, making it not thread safe
map.get(0);
as opposed to
// No other code that also synchronizes on map can run until this block completes
synchronized(map) {
map.get(0);
}
I would suggest going with ConcurrentHashMap for the requirement you describe. I had the same kind of requirement for our application, though we were a little more focused on performance.
I ran both ConcurrentHashMap and the map returned by Collections.synchronizedMap() under various types of load, launching multiple threads at a time using JMeter and monitoring them with JProfiler. After all these tests we came to the conclusion that the map returned by Collections.synchronizedMap() was not as efficient as ConcurrentHashMap.
I have written a post also on the same about my experience with both.
Thanks
Collections.synchronizedMap(new HashMap<K, V>());
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
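A minimal sketch of that manual synchronization, assuming a map of String keys to Integer values (the class and method names are illustrative only):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing iteration over a synchronized map's view.
final class SynchronizedMapIteration {

    static Map<String, Integer> newSyncMap() {
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    static int sumValues(Map<String, Integer> syncMap) {
        int sum = 0;
        // Lock on the wrapper returned by synchronizedMap, not on the
        // collection view and not on the backing HashMap.
        synchronized (syncMap) {
            for (Integer value : syncMap.values()) {
                sum += value;
            }
        }
        return sum;
    }
}
```

Individual calls such as get and put are already synchronized by the wrapper; only iteration, which spans many calls, needs the explicit block.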
This is the point of ConcurrentHashMap class. It protects your collection, when you have more than 1 thread.
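As an illustration, here is a small sketch of a counter map shared between request threads. HitCounter is a made-up class; the part that matters is that merge on a ConcurrentHashMap performs the read-modify-write atomically, which separate get and put calls would not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-page hit counter shared by many threads.
final class HitCounter {
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();

    // Atomic increment: no external lock needed.
    void record(String page) {
        hits.merge(page, 1, Integer::sum);
    }

    int count(String page) {
        return hits.getOrDefault(page, 0);
    }

    // Records the same page from several threads and returns the total.
    static int recordConcurrently(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record("/home");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.count("/home");
    }
}
```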
In order to avoid race condition, we can synchronize the write and access methods on the shared variables, to lock these variables to other threads.
My question is whether there are other (better) ways to avoid race conditions. Locks make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring this container object volatile. (I found this method in the book "Java Concurrency in Practice".)
I'm not sure if they perform faster than the synchronized way. Are there any other, better methods?
thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
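A bare-bones sketch of the queue idea using a ready-made java.util.concurrent.LinkedBlockingQueue. The sentinel value and class name are assumptions made for the example; a real event bus would carry richer message objects.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the consumer thread owns the state, producers only send events.
final class EventQueueSketch {
    private static final int STOP = -1; // made-up sentinel that ends the consumer

    // Sends the events 1..events through the queue and returns their sum.
    static int produceAndConsume(int events) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        int[] total = new int[1]; // touched only by the consumer until join()

        Thread consumer = new Thread(() -> {
            try {
                for (int e = queue.take(); e != STOP; e = queue.take()) {
                    total[0] += e;
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 1; i <= events; i++) {
                queue.put(i);
            }
            queue.put(STOP);
            consumer.join(); // makes the consumer's writes visible here
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return total[0];
    }
}
```

The only shared object is the queue itself, which is already thread-safe, so the application code contains no locks of its own.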
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
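For a single shared counter, a sketch with AtomicInteger might look like this (the class and helper method are illustrative, not from the question):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free counter.
final class AtomicCounterSketch {
    private final AtomicInteger value = new AtomicInteger();

    // Atomic read-modify-write without a synchronized block.
    int increment() {
        return value.incrementAndGet();
    }

    int get() {
        return value.get();
    }

    // Increments from several threads and returns the final value.
    static int incrementConcurrently(int threads, int perThread) {
        AtomicCounterSketch counter = new AtomicCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```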
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
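The read-write lock idea from the list above could be sketched as follows, assuming a read-mostly cache of strings (ReadMostlyCache is a hypothetical name). Many readers may hold the read lock at once; a writer gets exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache guarded by a read-write lock.
final class ReadMostlyCache {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock and do not block each other.
    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Whether this beats a plain synchronized block depends on the read/write ratio and on how long the critical sections are, so it is worth measuring.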
Well, first off, Atomic classes are not free of coordination either: they rely on volatile fields and compare-and-swap operations, much as you would have to arrange yourself by hand.
Second, immutability works great for multi-threading: you no longer need monitor locks and such, but that's because you can only read your immutables; you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). If you want better performance, your best bet is to avoid at least some of the built-in thread-safe classes, which do a more generic kind of locking, and write your own implementation that is more closely tied to your context and thus might allow more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
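One possible shape for this, along the lines of the immutable-holder-plus-volatile pattern mentioned in the question, is sketched below. RangeHolder and Range are hypothetical names; the two fields stand in for any group of values that must change together.

```java
// Hypothetical holder publishing an immutable pair through a volatile field.
final class RangeHolder {

    // Immutable snapshot: both fields are final and set once.
    static final class Range {
        final int lower;
        final int upper;

        Range(int lower, int upper) {
            if (lower > upper) {
                throw new IllegalArgumentException("lower > upper");
            }
            this.lower = lower;
            this.upper = upper;
        }
    }

    // volatile makes each newly published Range visible to other threads.
    private volatile Range current = new Range(0, 0);

    Range get() {
        return current;
    }

    // Replaces both values at once by swapping in a new object.
    void set(int lower, int upper) {
        current = new Range(lower, upper);
    }
}
```

Readers always see a consistent pair because they read one reference. Updates that depend on the previous value still need a lock or an AtomicReference with compare-and-set.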
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient, I suggest using more coarse-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, you are less likely to have contention, and having more locks may just add overhead.
Could you please clarify if we need to use explicit synchronization or locks for using ConcurrentLinkedQueue? I am specifically interested in knowing if sync calls are needed for following ConcurrentLinkedQueue methods.
add
clear
size
Possibly size is the only method which might require explicit sync, since it's not an atomic method, but the ConcurrentLinkedQueue Javadoc says that
"Beware that, unlike in most
collections, the size method is NOT a
constant-time operation. Because of
the asynchronous nature of these
queues, determining the current number
of elements requires a traversal of
the elements. "
which makes me believe that although a size call may be slow, it doesn't require any explicit sync call.
Thanks in advance ...
clear() is not an atomic operation (it is implemented in the AbstractQueue class). As the Javadoc and source say: "This implementation repeatedly invokes poll until it returns null." poll is atomic, but if you call offer while clear() is in progress, you will add something in the middle of clearing, and clear() will delete it...
If you will use clear() you should use LinkedBlockingQueue instead of ConcurrentLinkedQueue.
You don't need any explicit synchronization or locks. As the docs state, it is a thread-safe collection. This means each of these methods is correctly atomic (though as you point out, size() may be slow).
You should not and do not need to use explicit locking on any of those methods.
Yes, you do not need to use explicit synchronization because this is a thread-safe collection. Any concurrent access is allowed without worry.
It is unnecessary to synchronize to preserve the internal structure of the queue. However it may be necessary to linearise other invariants of your structure.
For instance size() is fairly meaningless in any shared mutable container. All it can ever tell you is something about what it was the last time you asked, not what it is now, unless you stop the world and prevent concurrent modification. It is only useful for indicative monitoring purposes, you should never use it in your algorithm.
Similarly, clear() doesn't really mean much without some kind of external intervention. Clear what? The things that are in it at the time you call clear? In a concurrent structure answering that is a difficult if not impossible question.
So, you are better off using it as a simple thread-safe queue (only offering and polling) and steering clear of the others unless you externally lock.
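A sketch of the "only offering and polling" style: instead of calling size() or clear(), a consumer drains whatever it can by polling until the queue reports empty. The class and method names are made up for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper that uses only offer and poll.
final class QueueDrainSketch {

    static Queue<String> newQueue() {
        return new ConcurrentLinkedQueue<>();
    }

    // Removes elements until poll() returns null and reports how many
    // this caller actually removed.
    static int drain(Queue<String> queue) {
        int drained = 0;
        while (queue.poll() != null) {
            drained++;
        }
        return drained;
    }
}
```

The returned count describes only what this caller removed; under concurrent offers the queue may already be non-empty again by the time drain returns.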