Concurrency design principles in practice - java

I have a Results object which is written to by several threads concurrently. However, each thread has a specific purpose and owns certain fields, so that no data is actually modified by more than one thread. The consumer of this data will not try to read it until all of the writer threads are done writing it. Because I know this to be true, there is no synchronization on the data writes and reads.
There is a RunningState object associated with this Results object which serves to coordinate this work. All of its methods are synchronized. When a thread is done with its work on this Results object, it calls done() on the RunningState object, which does the following: decrements a counter, checks if the counter has gone to 0 (indicating that all writers are done), and if so, puts this object on a concurrent queue. That queue is consumed by a ResultsStore which reads all of the fields and stores data in the database. Before reading any data, the ResultsStore calls RunningState.finalizeResult(), which is an empty method whose sole purpose is to synchronize on the RunningState object, to ensure that writes from all of the threads are visible to the reader.
Here are my concerns:
1) I believe that this will work correctly, but I feel like I'm violating good design principles to not synchronize on the data modifications to an object that is shared by multiple threads. However, if I were to add synchronization and/or split things up so each thread only saw the data it was responsible for, it would complicate the code. Anyone who modifies this area had better understand what's going on in any case or they're likely to break something, so from a maintenance standpoint I think the simpler code with good comments explaining how it works is a better way to go.
2) The fact that I need to call this do-nothing method seems like an indication of wrong design. Is it?
Opinions appreciated.

This seems mostly right, if a bit fragile (if you change the thread-local nature of one field, for instance, you may forget to synchronize it and end up with hard-to-trace data races).
The big area of concern is in memory visibility; I don't think you've established it. The empty finalizeResult() method may be synchronized, but if the writer threads didn't also synchronize on whatever it synchronizes on (presumably this?), there's no happens-before relationship. Remember, synchronization isn't absolute -- you synchronize relative to other threads that are also synchronized on the same object. Your do-nothing method will indeed do nothing, not even ensure any memory barrier.
You somehow need to establish a happens-before relationship between each thread doing its writes, and the thread that eventually reads. One way to do this without synchronization is via a volatile variable, or an AtomicInteger (or other atomic classes).
For instance, each writer thread can invoke counter.incrementAndGet() on the object, and the reading thread can then check that counter.get() == THE_CORRECT_VALUE. There's a happens-before relationship between a volatile/atomic field being written and it being read, which gives you the needed visibility.
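A minimal sketch of that idea, assuming a hypothetical CountedResults class standing in for the Results object (the class, its field names, and the writer count are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Results object from the question.
class CountedResults {
    static final int WRITERS = 2;

    // Plain, unsynchronized fields; each one is owned by exactly one writer thread.
    int alpha;
    int beta;

    // Each writer bumps this after its last write; the reader checks it before reading.
    final AtomicInteger finished = new AtomicInteger();

    static int demo() {
        CountedResults r = new CountedResults();
        new Thread(() -> { r.alpha = 1; r.finished.incrementAndGet(); }).start();
        new Thread(() -> { r.beta = 2; r.finished.incrementAndGet(); }).start();

        // Once get() returns WRITERS, both atomic increments happened-before
        // this read, and so did the plain writes that preceded them.
        while (r.finished.get() != WRITERS) {
            Thread.yield();
        }
        return r.alpha + r.beta;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The busy-wait loop is only there to keep the sketch short; a real reader would more likely block on a queue or a latch.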

Your design is sound, but it can be improved if you are using a true concurrent queue since a concurrent queue from the java.util.concurrent package already guarantees a happens before relationship between the thread putting an item into the queue, and the thread taking an item out, so this precludes needing to call finalizeResult() in the taking thread (so no need for that "do nothing" method call).
From java.util.concurrent package description:
The methods of all classes in java.util.concurrent and its subpackages
extend these guarantees to higher-level synchronization. In
particular:
Actions in a thread prior to placing an object into any
concurrent collection happen-before actions subsequent to the access
or removal of that element from the collection in another thread.
The comments in another answer concerning using an AtomicInteger instead of synchronization are also wise (as using an AtomicInteger to do your thread counting will likely perform better than synchronization), just make sure to get the value of the count after the atomic decrement (e.g. decrementAndGet()) when comparing to 0 in order to avoid adding to the queue twice.
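Putting the two suggestions together might look roughly like this. It is a sketch under assumptions: QueuedResults is a hypothetical class that merges the question's Results and RunningState, and LinkedBlockingQueue is just one possible choice of concurrent queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class combining the question's Results and RunningState.
class QueuedResults {
    int alpha; // written only by writer A
    int beta;  // written only by writer B

    private final AtomicInteger remaining = new AtomicInteger(2);
    private final BlockingQueue<QueuedResults> out;

    QueuedResults(BlockingQueue<QueuedResults> out) {
        this.out = out;
    }

    // Called by each writer after its last write.
    void done() {
        // decrementAndGet returns the value after the decrement, so exactly
        // one caller sees 0 and the object is enqueued exactly once.
        if (remaining.decrementAndGet() == 0) {
            out.add(this);
        }
    }

    static int demo() {
        BlockingQueue<QueuedResults> queue = new LinkedBlockingQueue<>();
        QueuedResults r = new QueuedResults(queue);
        new Thread(() -> { r.alpha = 10; r.done(); }).start();
        new Thread(() -> { r.beta = 20; r.done(); }).start();
        try {
            // take() blocks until the last writer has enqueued the object;
            // no empty synchronized method is needed on the reading side.
            QueuedResults ready = queue.take();
            return ready.alpha + ready.beta;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The earlier writer's fields become visible through the atomic counter, and the last writer's through the queue hand-off, so the consumer needs no extra synchronization of its own.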

What you've described is indeed safe, but it also sounds, frankly, brittle and (as you note) maintenance could become an issue. Without sample code, it's really hard to tell what's really easiest to understand, so an already subjective question becomes frankly unanswerable. Could you ask a coworker for a code review? (Particularly one that's likely to have to deal with this pattern.) I'm going to trust you that this is indeed the simplest approach, but doing something like wrapping synchronized blocks around writes would increase safety now and in the future. That said, you obviously know your code better than I do.

Related

Threadsafe vs Synchronized

I'm new to Java.
I'm a little bit confused between thread safe and synchronized.
Thread safe means that a method or class instance can be used by multiple threads at the same time without any problems occurring.
Whereas synchronized means only one thread can operate at a time.
So how are they related to each other?
The definition of thread safety given in Java Concurrency in Practice is:
A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.
For example, a java.text.SimpleDateFormat object has internal mutable state that is modified when a method that parses or formats is called. If multiple threads call the methods of the same dateformat object, there is a chance a thread can modify the state needed by the other threads, with the result that the results obtained by some of the threads may be in error. The possibility of having internal state get corrupted causing bad output makes this class not threadsafe.
There are multiple ways of handling this problem. You can have every place in your application that needs a SimpleDateFormat object instantiate a new one every time it needs one, you can make a ThreadLocal holding a SimpleDateFormat object so that each thread of your program can access its own copy (so each thread only has to create one), you can use an alternative to SimpleDateFormat that doesn't keep state, or you can do locking using synchronized so that only one thread at a time can access the dateFormat object.
Locking is not necessarily the best approach; avoiding shared mutable state is best whenever possible. That's why in Java 8 they introduced a date formatter that doesn't keep mutable state.
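Two of the options above could be sketched as follows. The class and method names are hypothetical, and the "yyyy-MM-dd" pattern is only an example.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class DateFormatting {
    // Option 1: one SimpleDateFormat per thread, created lazily on first use.
    private static final ThreadLocal<SimpleDateFormat> PER_THREAD =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    static String formatLegacy(Date date) {
        return PER_THREAD.get().format(date);
    }

    // Option 2: the Java 8 DateTimeFormatter is immutable, so a single
    // instance can be shared by all threads without locking.
    private static final DateTimeFormatter SHARED = DateTimeFormatter.ISO_LOCAL_DATE;

    static String formatModern(LocalDate date) {
        return SHARED.format(date);
    }
}
```

Neither version needs a synchronized block, because no mutable formatter state is ever shared between threads.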
The synchronized keyword is one way of restricting access to a method or block of code so that otherwise thread-unsafe data doesn't get corrupted. This keyword protects the method or block by requiring that a thread has to acquire exclusive access to a certain lock (the object instance, if synchronized is on an instance method, or the class instance, if synchronized is on a static method, or the specified lock if using a synchronized block) before it can enter the method or block, while providing memory visibility so that threads don't see stale data.
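To make the three lock choices concrete, here is a small sketch. LockTargets and its members are invented names for illustration only.

```java
class LockTargets {
    private static int instances;
    private final Object guard = new Object();
    private int hits;
    private int guarded;

    // Instance method: the lock is the object instance (this).
    synchronized void hit() {
        hits++;
    }

    // Static method: the lock is the class object (LockTargets.class).
    static synchronized void countInstance() {
        instances++;
    }

    // Block: the lock is whatever object is named in the parentheses.
    void touch() {
        synchronized (guard) {
            guarded++;
        }
    }

    synchronized int hits() {
        return hits;
    }

    static int demo() {
        LockTargets target = new LockTargets();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    target.hit();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // No increment is lost, because hit() runs under the instance lock.
        return target.hits();
    }
}
```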
Thread safety is a desired behavior of the program, and the synchronized block helps you achieve that behavior. There are other ways of obtaining thread safety, e.g. immutable classes/objects. Hope this helps.
Thread safety: A thread safe program protects its data from memory consistency errors. In a highly multi-threaded program, a thread safe program does not cause any side effects with multiple read/write operations from multiple threads on shared data (objects). Different threads can share and modify object data without consistency errors.
synchronized is one basic method of achieving ThreadSafe code.
Refer to below SE questions for more details:
What does 'synchronized' mean?
You can achieve thread safety by using advanced concurrency API. This documentation page provides good programming constructs to achieve thread safety.
Lock Objects support locking idioms that simplify many concurrent applications.
Concurrent Collections make it easier to manage large collections of data, and can greatly reduce the need for synchronization.
Atomic Variables have features that minimize synchronization and help avoid memory consistency errors.
ThreadLocalRandom (in JDK 7) provides efficient generation of pseudorandom numbers from multiple threads.
Refer to java.util.concurrent and java.util.concurrent.atomic packages too for other programming constructs.
Related SE question:
Synchronization vs Lock
Synchronized: only one thread can operate at same time.
Threadsafe: a method or class instance can be used by multiple threads at the same time without any problems occurring.
If you rephrase this question as "Why are synchronized methods thread safe?", you can get a better idea.
Going by the definitions this appears confusing, but it is not once you think it through.
Synchronized means: sequentially, one by one in order, not concurrently (not at the same time).
A synchronized method does not allow another thread to act on it while a thread is already working on it. This avoids concurrency.
An example of synchronization: if you want to buy a movie ticket and stand in a queue, you will get the ticket only after the person in front of you gets theirs.
Thread safe means: a method is safe to be accessed by multiple threads without any problem. The synchronized keyword is one of the ways to achieve thread safety. But remember: when multiple threads try to access a synchronized method they take turns, so access becomes safe. Even if the threads run at the same time, they cannot access the same resource (method/block) at the same time, because of the synchronized behavior of the resource.
Because a method is synchronized, it is safe to let multiple threads act on it without any problem. Remember: multiple threads do not act on it at the same time, hence we call synchronized methods thread safe.
Hope this helps to understand.
After patiently reading through a lot of answers and not being too technical at the same time, I could say something definite but close to what Nayak had already replied to fastcodejava above, which comes later on in my answer but look
synchronization is not even close to brute-forcing thread-safety; it's just making a piece of code (or method) safe and incorruptible for a single authorized thread by preventing it from being used by any other threads.
Thread safety is about how all threads accessing a certain element behave and get their desired results in the same way if they would have been sequential (or even not so), without any form of undesired corruption (sorry for the pleonasm) as in an ideal world.
One of the ways of achieving proximity to thread-safety would be using classes in java.util.concurrent.atomic.
Sad, that they don't have final methods though!
Nayak, when we declare a method as synchronized, all other calls to it from other threads are locked and can wait indefinitely. Java also provides other means of locking with Lock objects now.
You can also declare a field final or volatile to guarantee its visibility to other concurrent threads.
ref: http://www.javamex.com/tutorials/threads/thread_safety.shtml
In practice, performance-wise, thread safe, synchronized, non-thread safe and non-synchronized classes are ordered as:
Hashtable (slowest) < Collections.synchronizedMap < HashMap (fastest)

Synchronization in java - Can we set priority to a synchronized access in java?

Synchronization works by providing exclusive access to an object or method by putting the synchronized keyword in a method declaration. What if I want to give higher precedence to one particular access if two or more accesses to a method occur at the same time? Can we do that?
Or maybe I'm just misunderstanding the concept of synchronization in Java. Please correct me.
I have other questions as well,
Under what requirements should we make a method synchronized?
When to make a method synchronized? And when to make a block synchronized?
Also, if we make a method synchronized, will the class be synchronized too? I'm a little confused here.
Please Help. Thanks.
No. Sadly, Java synchronization and wait/notify appear to have been copied from the very poor example of Unix, rather than almost anywhere else, where there would have been priority queues instead of thundering herds. When Per Brinch Hansen, author of monitors and Concurrent Pascal, saw Java, he commented 'clearly I have laboured in vain'.
There is a solution for almost everything you need in multi-threading and synchronization in the concurrent package, it however requires some thinking about what you do first. The synchronized, wait and notify constructs are like the most basic tools if you have just a very basic problem to solve, but realistically most advanced programs will (/should) never use those and instead rely on the tools available in the Concurrent package.
The way you think about threads is slightly wrong. There is no such thing as a more important thread, there is only a more important task. This is why Java clearly distinguishes between Threads, Runnables and Callables.
Synchronization is a concept to prevent more than one thread from entering a specific part of code, which is - again - the most basic concept of avoiding threading issues. Those issues happen if more than one thread accesses some data, where at least one of those multiple threads is trying to modify that data. Think about an array that is read by Thread A, while it is written by Thread B at the same time. Eventually Thread B will write the cell that Thread A is just about to read. Now as the order of execution of threads is undefined, it is as well undefined whether Thread A will read the old value, the new value or something messed up in between.
A synchronized "lock" around this access is a very blunt way of ensuring that this will never happen. More sophisticated tools are available in the concurrent package, like CopyOnWriteArrayList, which seamlessly handles the above issue by creating a copy for the writing thread, so neither Thread A nor Thread B needs to wait for the other. Other tools are available for other solutions and problems.
If you dig a bit into the available tools you soon learn that they are highly sophisticated, and that the difficulties in using them usually lie with the programmer and not with the tools, because countless hours of thinking, improving and testing have gone into them.
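A small sketch of the snapshot behaviour of java.util.concurrent.CopyOnWriteArrayList follows. It is single-threaded to keep it deterministic, and the class name SnapshotIteration is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    static int demo() {
        List<Integer> cells = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        int sum = 0;
        // The iterator works on the array as it was when iteration started.
        for (int value : cells) {
            // Each add copies the backing array; the running iteration is
            // unaffected. A plain ArrayList would throw
            // ConcurrentModificationException here.
            cells.add(value * 10);
            sum += value;
        }
        return sum;
    }
}
```

The price is that every write copies the whole array, so this structure suits data that is read often and written rarely.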
Edit: to clarify a bit why the importance is on the task even though you set it on the thread:
Imagine a street with 3 lanes that narrows to 1 lane (synchronized block) and 5 cars (threads) are arriving. Let's further assume there is one person (the car scheduler) who has to decide which cars get the first row and which ones get the other rows. As there is only 1 lane, he can at best assign 1 car to the first row and the others need to come behind. If all cars look the same, he will most likely assign the order more or less randomly, while a car already in front is more likely to stay in front, just because it would be too troublesome to move those cars around.
Now let's say one car has a sign on top "President of the USA inside", so the scheduler will most likely give that car priority in his decision. But even though the sign is on the car, the reason for his decision is not the importance of the car (thread), but the importance of the people inside (task). So the sign is nothing but a piece of information for the scheduler that this car transports more important people. Whether or not this is true, however, the scheduler can't say (at least not without inspection), so he just has to trust the sign on the car.
Now if in another scenario all 5 cars have the "President inside" sign, the scheduler doesn't have any way to decide which one goes first, and he is in the same situation again as he was with all the cars having no sign at all.
Well, in the case of synchronized, the order of access is arbitrary if multiple threads are waiting for the lock. But in case you need a first-come, first-served basis, you can probably use ReentrantLock(fairness). This is what the API says:
The constructor for this class accepts an optional fairness parameter.
When set true, under contention, locks favor granting access to the
longest-waiting thread.
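For illustration, a fair lock might be used like this. FairCounter is a hypothetical class; only the ReentrantLock(boolean) constructor and its lock/unlock/isFair methods come from the JDK.

```java
import java.util.concurrent.locks.ReentrantLock;

class FairCounter {
    // Passing true requests the fair ordering policy: under contention the
    // lock favors the longest-waiting thread.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int value;

    void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }
}
```

Fairness is about waiting time, not about priority, so it orders waiters but still gives no way to let one caller jump the queue.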
Else, if you wish to give access based on some other factor, then I guess it shouldn't be complicated to build one. Have a class whose lock call blocks if some other thread is executing. When unlock is called, it unblocks a waiting thread chosen by whatever algorithm you wish.
There's no such thing as "priority" among synchronized methods/blocks or accesses to them. If some other thread is already holding the object's monitor (i.e. if another synchronized method or synchronized (this) {} block is in progress and hasn't relinquished the monitor by a call to this.wait()), all other threads will have to wait until it's done.
There are classes in the java.util.concurrent package that might be able to help you if used correctly, such as priority queues. Full guidance on how to use them correctly is probably beyond the scope of this question - you should probably read a decent tutorial to start with.
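One way to read "priority queues" here is to prioritize the tasks rather than the lock, for example with java.util.concurrent.PriorityBlockingQueue. The sketch below is only an illustration; the Task type and the convention that a lower number means more urgent are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

class PrioritizedWork {
    // Hypothetical task type: a lower number means more urgent.
    static final class Task implements Comparable<Task> {
        final int priority;
        final String name;

        Task(int priority, String name) {
            this.priority = priority;
            this.name = name;
        }

        @Override
        public int compareTo(Task other) {
            return Integer.compare(priority, other.priority);
        }
    }

    static List<String> drainInPriorityOrder() {
        PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>();
        queue.add(new Task(5, "cleanup"));
        queue.add(new Task(1, "urgent"));
        queue.add(new Task(3, "normal"));

        // poll() always hands out the most urgent task that is queued,
        // regardless of the order in which tasks were submitted.
        List<String> order = new ArrayList<>();
        Task next;
        while ((next = queue.poll()) != null) {
            order.add(next.name);
        }
        return order;
    }
}
```

In a real program, worker threads would call take() on the shared queue, so the most important pending task is always picked up next.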

Thread safety within Java

So, while working on something that was having locking issues, a question came to me. Do objects that only can be accessed from a single thread require locks or synchronization at all?
For example, given Thread1, Thread2, and Thread3, along with Buffer1, Buffer2, Buffer3, where each buffer is instanced as a thread is created, meaning that Thread1 will only ever access Buffer1, and the same for Thread2 and Buffer2, along with Thread3 and Buffer3. Thread1 will never touch Buffer2 or Buffer3. While adding/removing/modifying bytes in the stream, are locks needed?
No, you won't need any locks in this case. Locking and synchronization are only required when a resource is shared between multiple threads.
If you go ahead and add synchronization on the private instance of that buffer, it still won't make any difference, as no thread will be waiting to acquire the lock. The only one locking and releasing the buffer will be the owner thread.
1. When more than one thread tries to access an object, locking becomes necessary.
2. Moreover, classes need to be developed to be thread safe if concurrent access by threads is possible.
3. A class is said to be thread safe if it behaves correctly in the presence of interleaving and scheduling by the underlying OS, without any synchronization mechanism from the client.
4. Locking resources can cause overhead, prevent concurrent access, and create bottlenecks.
Only when two or more threads need to access a shared object you need to worry about locking.
No. This strategy for ensuring thread-safety is generally referred to as confinement.
Confinement relies on encapsulation techniques to ensure that multiple threads cannot access an object. "Concurrent Programming in Java" by Doug Lea has a good chapter on the details of confinement and its strengths and weaknesses compared to other exclusion techniques.
Paraphrasing from Lea, in general there are 4 conditions needed for confinement of a reference r, to an object x, within a method m:
1. m cannot pass r as an argument to another method.
2. m cannot pass r as a return value.
3. m cannot record r in a field (instance or static) that is accessible from another thread.
4. m cannot let any other references escape (via 1-3) that may be traversed to r.
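As a rough sketch of confinement for the buffers in the question: each thread below creates its own buffer and never lets the reference escape. ConfinedBuffers and the use of StringBuilder as the buffer type are assumptions made for the example.

```java
class ConfinedBuffers {
    static int[] demo() {
        final int threads = 3;
        final int[] lengths = new int[threads];
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // The buffer is created here and its reference is never
                // passed, returned, or stored in a shared field, so no
                // other thread can reach it and no lock is needed.
                StringBuilder buffer = new StringBuilder();
                for (int n = 0; n <= id; n++) {
                    buffer.append('x');
                }
                // Only a derived value leaves the thread, and each thread
                // writes its own array slot.
                lengths[id] = buffer.length();
            });
            workers[i].start();
        }

        for (Thread worker : workers) {
            try {
                worker.join(); // join() makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return lengths;
    }
}
```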
From what I remember from my studies, if you are using a private buffer for every thread you should not worry about locking it to avoid concurrent access, since you don't have any.
If no one is reading the buffer apart from its creator, the creator can do whatever it wants with it without worrying that someone else is reading or writing it, so you should be fine.
But you have to remember that a thread can be interrupted at any time, so your internal buffer can be in an inconsistent state. (This shouldn't be a problem since you are accessing it only sequentially from the same thread.)
Locks are not needed unless threads are concurrently using the same data structure.
Hence if different data structures are used by each thread, your code is guaranteed to be thread safe.
Incidentally, this is one of the main reasons why the key Java collection classes like java.util.ArrayList are not thread safe: making them thread safe would add a performance overhead which you shouldn't have to pay for if you don't need, and in a lot of cases you don't need it because you can ensure in some other way that only one thread accesses the ArrayList at once.

What are the "Conventional Techniques" to avoid deadlock?

I saw the below statement in Java Specifications.
Programs where threads hold (directly
or indirectly) locks on multiple
objects should use conventional
techniques for deadlock avoidance,
creating higher-level locking
primitives that don't deadlock, if
necessary.
So, what are the "Conventional Techniques" to follow to avoid deadlock? I'm not very clear on this (I haven't understood it properly; an explanation is needed).
The most common technique is to acquire resources (locks) in some consistent well-defined order.
The following article by Brian Goetz might be helpful: http://www.javaworld.com/javaworld/jw-10-2001/jw-1012-deadlock.html
It's pretty old, but explains the issues well.
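A minimal sketch of consistent lock ordering, assuming a hypothetical OrderedAccount class whose unique id field decides which lock is taken first:

```java
class OrderedAccount {
    final int id;   // assumed unique; used only to order lock acquisition
    long balance;

    OrderedAccount(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    static void transfer(OrderedAccount from, OrderedAccount to, long amount) {
        // Always lock the account with the smaller id first, whichever
        // direction the money flows, so two opposite transfers can never
        // each hold one lock while waiting for the other.
        OrderedAccount first = from.id < to.id ? from : to;
        OrderedAccount second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    static long[] demo() {
        OrderedAccount a = new OrderedAccount(1, 1000);
        OrderedAccount b = new OrderedAccount(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.balance, b.balance};
    }
}
```

Locking from first and then to instead would let the two threads in demo() acquire the locks in opposite orders, which is the classic deadlock.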
As a somewhat abstract suggestion, an answer to this might be "Have a plan for handling locks and stick to it".
The danger of locking is where, in short, one thread holds lock A and is trying to get lock B, while another thread holds lock B and is trying to get lock A. As noted by another answer, the classic way to avoid this is to get locks in a consistent order. However, a good discipline is to minimize the amount of work that your code does with a lock held. Any code that calls another function with a lock held is a potential problem: what if that other function tries to get another lock? What if someone else later modifies that function to get a lock? Try to form a clear pattern of what functions can be called with locks held, and what cannot, and make sure the comments in your code make this all clear.
Don't do locking! Seriously. We get immense performance (100k's of transactions at sub-millisecond latency) at my work by keeping all our business logic single threaded.

When do we make a call to use between Synchronised method and Synchronised Block

Can any one please share their experience on
"When do we make a call to use between Synchronised method and Synchronised Block"
Any Performance Issues?
When do we make a call to use between Synchronised method and Synchronised Block.
If you want to lock for the duration of a method call AND you want to lock on this (or the current class, for a static method), then synchronized methods are the right solution.
If you are locking on something else (e.g. a private lock object or some internal data structure), then the synchronized block approach is better.
Similarly, if only some of the code in a procedure call needs to be done holding a lock, it is better to use a synchronized block and put just that code in the block.
Any Performance Issues?
None, apart from the general principle that it is a bad idea to hold a lock longer than you need to. (The longer a lock is held, the more likely it is that other threads will need to wait.)
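The difference might be sketched like this. Stats and its expensive() helper are invented for the example, with expensive() standing in for any slow work that touches no shared state.

```java
class Stats {
    private long total;

    // Synchronized method: the lock on this is held for the whole call,
    // including the slow computation.
    synchronized void addWholeMethod(String text) {
        total += expensive(text);
    }

    // Synchronized block: the slow part runs without the lock, and only
    // the update of the shared field is protected.
    void addNarrowBlock(String text) {
        int cost = expensive(text);
        synchronized (this) {
            total += cost;
        }
    }

    synchronized long total() {
        return total;
    }

    // Placeholder for work that does not touch shared state.
    private static int expensive(String text) {
        return text.length();
    }
}
```

Both methods use the same lock, so they can be mixed safely; the block version simply holds it for less time.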
I'm not sure what you mean by "synchronized statement". You use the synchronized keyword to either denote that a method is synchronized or (as you mention) a block of code within it.
I typically favour keeping methods small and manageable and therefore labelling the entire method as synchronized (when required). This has the advantage that it is immediately evident to a user of the class as to which methods represent critical sections. It also allows you as a programmer to more easily determine whether a class is thread-safe, namely: Are all public methods that access mutable data labelled as synchronized?
There is no performance difference between the approaches as both require obtaining a lock.
Always try to use a synchronized block if possible; only where that is not possible, go for a synchronized method. The performance difference depends on the number of lines in the synchronized method: as the number of lines increases, performance degrades.
I tend to use synchronized methods when it is the public interface that requires synchronization (c.f. synchronized collections) and synchronized blocks for class internal synchronization, such as access to a shared resource which needs to be thread safe.
There is also a readability issue. I find method level synchronization to be neater and more obvious as the code is not cluttered with lock management.
As for performance, I'm not aware of any particular difference in the behaviour of either approach. I think it is more important to avoid excessive synchronization, so a method which only needs access to the shared resource 1 in 10 calls should use block level rather than method level synchronization.
My approach to any given scenario is usually based on a mix of these factors, modified by previous experience.
In terms of overall performance, there is no difference between having a synchronized block or method. The issue is really in terms of coding practices. Synchronizing a method seems like an easy thing to do; however, when working with multiple people on a project, it becomes possible for someone to alter a simple, light method that someone else synchronized into a heavy one. In fact, one really good example (from personal experience) of where you can get into trouble is when you are using a dependency injection framework and you have synchronized methods in a service object that interact with data access objects (DAOs). The expectation is that the DAOs perform quickly, so the locks are only held briefly. Someone else comes along and either alters the DAOs or creates and injects new ones that are much slower, and suddenly things start to really slow down because the service object has synchronized interaction with them.
I don't think synchronized blocks can get around the issue I described above; however, synchronized blocks are at least harder to miss than a declaration on the method.
