In Paul Tyma's presentation, I came across an interview question:
What's harder, synchronizing 2 threads or synchronizing 1000 threads?
From my perspective, synchronizing 1000 threads is of course harder, but I can't think of a good reason for that beyond 'of course'. But since it's an interview question, maybe I'm wrong (interview questions have to be tricky, don't they?).
You could make the case that synchronizing 2 threads correctly is in fact harder than doing it for 1000, because if you have a race condition, it will usually manifest very quickly with 1000 threads, but not so with only 2.
But on the other hand, synchronizing 1000 threads without running into lock contention issues is much harder than when there are only 2.
The real answer is "synchronizing threads is hard in various ways, period."
Synchronizing a thousand threads is just as easy as synchronizing two threads: just lock access to all important data.
Now, synchronizing a thousand threads with good performance is more difficult. If I were asking this question, I'd look for answers mentioning "the thundering herd problem", "lock contention", "lock implementation scalability", "avoiding spinlocks", etc.
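By way of illustration, here is a minimal sketch (class and variable names are made up) of one contention-spreading remedy such an answer might mention: with 1000 threads, a single lock or hot AtomicLong serializes every update, whereas LongAdder splits the counter across internal cells to reduce contention.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ContentionSketch {
    public static void main(String[] args) throws InterruptedException {
        int threads = 1000;                  // try 2 vs 1000
        LongAdder counter = new LongAdder(); // spreads updates across cells
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                for (int j = 0; j < 10_000; j++) {
                    counter.increment();     // no single hot lock to fight over
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(counter.sum());   // threads * 10_000
    }
}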
In an interview, I would say that "exactly two threads" is a very useful special case of multithreading. Things like starvation and priority inversion can occur with as few as three threads, but with only two threads they essentially cannot (strictly speaking, starvation is still possible if one thread keeps releasing and reacquiring a lock without ever letting the other thread in, but with three threads starvation can occur even if locks are grabbed the instant they become available). Going from 2 threads to 3 is a bigger jump than going from 3 to 1,000.
Why would synchronizing 1000 threads be any harder than synchronizing 2 threads?
The only code that would be added would be to spawn the extra threads.
You wouldn't have to add any synchronization code (as long as you were doing everything correctly).
I think the answer is that once you have two threads synchronized correctly, the other 998 will also be synchronized.
It depends what "is easier" means. The complexity of the design/locking mechanisms is roughly the same.
That being said, I think 1000-thread programs might be easier to debug. Latent race conditions have a higher probability of occurring and will probably be easier to reproduce. A race condition between two threads might only appear once every 5 years, if the moon is full and you're on vacation.
I have two answers.
CASE 1: Using existing facilities. Synchronizing 2 threads is the same difficulty as synchronizing 1000 threads, because the existing synchronization primitives are designed for an arbitrary number of threads.
CASE 2: Implementing from scratch. It seems obvious that if you had to implement a synchronization system from scratch, it would be easier to build the 2-thread version.
Take the reader-writer problem. With two threads, you can use mutual exclusion and be done. With more threads, you have to write nontrivial code, since otherwise readers couldn't read simultaneously, or worse, they could starve the writers.
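For example, a minimal sketch of the many-reader case using Java's ReentrantReadWriteLock (SharedValue is a hypothetical class); a plain mutex here would needlessly serialize the readers:
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SharedValue {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private int value;

    int read() {
        rw.readLock().lock();          // many readers may hold this at once
        try { return value; }
        finally { rw.readLock().unlock(); }
    }

    void write(int v) {
        rw.writeLock().lock();         // a writer gets exclusive access
        try { value = v; }
        finally { rw.writeLock().unlock(); }
    }
}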
However, good synchronization code should work for any number of threads. In some cases, like mutual exclusion, you can add Java's synchronized keyword and it's as hard for 2 threads as for 1000.
In other words, if your program uses only 2 threads, you can take advantage of that and make assumptions that wouldn't be true with more threads. Obviously it's not a good practice, but it is possible.
It's one of those questions to which the only real answer is "it depends". In this case, it depends on what you're doing with them.
A scenario could be as simple as a single background worker thread that the foreground waits for while displaying a progress meter. Or it could spawn 1000 threads and simply wait for them all to finish before doing something else.
Alternatively, if as few as 2 threads are accessing shared resources, then the concepts are the same. You have to be very careful about concurrency issues and locking strategies whether there are 2 threads or 1000. With any number of threads greater than one, you can't guarantee that something else isn't simultaneously trying to read or write the same resource that you are.
I would agree with those saying "it depends". If the threads are identical, there might not be such a big difference between 2 and 1000 threads. However, if there are multiple resources that need mutually exclusive access (synchronized, in Java terms), the likelihood of deadlocks may increase as the number of threads grows.
As I read through your answers, I found several interesting thoughts. I think that at the interview this matters more than the answer itself: the conversation, the thoughts.
It's equally hard. But synchronization over 2 threads will most likely perform better, since there are only 2 threads contending for one lock instead of a thousand, where there will most likely be much more overhead due to locked resources.
Hope that helped
Objects are synchronized not threads. Creating a synchronized method or code block prevents multiple threads from executing the region at the same time - so it doesn't matter if there are 2, 1,000 or 1,000,000 threads.
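A minimal sketch (Counter is a hypothetical class): the lock acquired is the monitor of the object the method is invoked on, and the same code serves 2 threads or 1,000,000.
class Counter {
    private int count;

    synchronized void increment() { // locks this Counter instance's monitor
        count++;
    }

    synchronized int get() {        // same monitor, so reads see prior writes
        return count;
    }
}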
In terms of performance, if you expect to double parallelism (halve execution time) when you double the number of threads, then any synchronised region is going to be a bottleneck, because it is essentially serial code which cannot be parallelised.
If you use a programming language like Scala with the Actor design pattern, you do not have to synchronize anything. http://www.scala-lang.org/node/242
Another option (in Java) is to use a compare-and-swap/set mechanism (http://en.wikipedia.org/wiki/Compare-and-swap), so you do not have to synchronize any threads: reads through atomic variables are non-blocking, and you only retry on a contended write, which can yield huge performance gains depending on your solution.
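For example, a minimal sketch of a compare-and-set retry loop on an AtomicInteger (the variable name is made up):
import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger hits = new AtomicInteger();

// non-blocking update: read, compute, and retry the compare-and-set
// until no other thread has changed the value in between
int current;
do {
    current = hits.get();
} while (!hits.compareAndSet(current, current + 1));
// hits.incrementAndGet() does effectively the same thing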
I have a dilemma regarding the use of multithreading in the application I am working on. I have a workflow in which the state of an object changes, which presents no issues for single-threaded operation. However, in order to improve performance, I am planning to use multiple threads.
It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading? It seems like multiple threads won't produce any actual concurrency, so it wouldn't be any better than single threaded.
Is my analysis correct? If I am misunderstanding then would someone please clarify the concept?
The short answer: concurrency is hard. Real concurrency, with multiple concurrent writers, is really hard.
What you need to determine is what your actual consistency guarantees need to be. Does every reader need to be able to see every write, guaranteed? Then you'll be forced into linearizing all the threads somehow (e.g. using locks) -- your next effort should be to ensure you do as much work as possible outside of the lock, to keep the lock held for the shortest possible time.
One way to avoid holding a lock at all is to use a lock-free algorithm. Most lock-free algorithms are based on an atomic compare-and-set primitive, such as those provided by the java.util.concurrent.atomic package. These can be very high-performance, but designing a correct lock-free algorithm can be subtle. One simple kind of lock-free algorithm is to build a new (immutable) state object and then atomically make it the "live" state, retrying in a loop if a different state was made live by another writer in the interim. (This approach is good enough for many applications, but it's vulnerable to livelock if you have too many writers.)
If you can get by with a looser consistency guarantee, then many other optimizations are possible. For example, you can use thread-local caches so that each thread sees its own view of the data and can be writing in parallel. Then you need to deal with the consequences of data being stale or inconsistent. Most techniques in this vein strive for eventual consistency: writes may not be visible to all readers immediately, but they are guaranteed to be visible to all readers eventually.
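As a hypothetical sketch of that idea (all names made up): each thread buffers its writes in thread-local storage and flushes them into the shared map from time to time, so readers may briefly see stale totals, but every write eventually becomes visible.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EventuallyConsistentCounter {
    private final ConcurrentHashMap<String, Long> shared = new ConcurrentHashMap<>();
    private final ThreadLocal<Map<String, Long>> local = ThreadLocal.withInitial(HashMap::new);

    void add(String key, long delta) {      // fast, uncontended write
        local.get().merge(key, delta, Long::sum);
    }

    void flush() {                          // called periodically by each thread
        local.get().forEach((k, v) -> shared.merge(k, v, Long::sum));
        local.get().clear();
    }

    long read(String key) {                 // may lag behind un-flushed writes
        return shared.getOrDefault(key, 0L);
    }
}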
This is an active area of research, and a complete answer could fill a book (really, several books!). If you're just getting started in this area, I'd recommend you read Java Concurrency in Practice by Goetz et al, as it provides a good introduction to the subject and lots of practical advice about how to successfully build concurrent systems.
Your interpretation of the limits of multithreading and concurrency is correct. Since a thread must acquire and hold the state in order to perform work (while the others wait), you are essentially splitting the work of a single thread among multiple threads.
The best way to fix this is to adjust your program design to limit the size of the critical section. As we learned in my operating systems course on process synchronization,
only one thread may execute in its critical section at any given time
The specific term "critical section" may not directly apply to Java concurrency, but it still illustrates the concept.
What does it mean to limit this critical section? For example, say you have a program managing a single bank account (unrealistic, but it illustrates the point). If a thread must acquire a lock on the account to update the balance, the basic option would be to have a single thread doing all balance updates (no concurrency at all); the critical section would be the entire program. However, suppose there is also other logic to execute, such as alerting other banks of the balance update. You could require the lock on the bank account state only while updating the balance, and not while alerting other banks, shrinking the critical section and allowing other threads to do useful work (alerting other banks) while one thread updates the balance.
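A minimal sketch of that shape (BankAccount and alertOtherBanks are hypothetical): the lock guards only the balance update, while the slow notification happens outside it.
import java.util.concurrent.locks.ReentrantLock;

class BankAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    void deposit(long amount) {
        long newBalance;
        lock.lock();                  // critical section: only the update
        try {
            balance += amount;
            newBalance = balance;
        } finally {
            lock.unlock();
        }
        alertOtherBanks(newBalance);  // slow work done outside the lock
    }

    private void alertOtherBanks(long balance) {
        // hypothetical notification, e.g. a network call to other banks
    }
}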
Please comment if this was unclear. You seem to already understand the constraints of concurrency, but hopefully this reveals possible steps toward implementing it.
Your need is not totally clear, but you have guessed the limitations of multithreading well.
Running parallel threads makes sense if some relatively autonomous tasks can be performed concurrently by distinct threads or groups of threads.
If your scenario looks like this: you start 5 threads, and at any moment only a single thread is active while the others wait on a locked resource, then multithreading makes no sense and could even introduce overhead because of CPU context switches.
I think that in your use case, multithreading could be used for:
tasks that don't change the state
performing a task that changes the state, if that task can be divided into multiple units of processing whose critical sections are small enough to make multithreading profitable.
It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading?
The short answer is "it depends". It is rare to have a multithreaded application with no shared data, so sharing data, even if it needs a full lock, doesn't necessarily defeat the performance improvements of making a single-threaded application multi-threaded.
The big question is how frequently the state needs to be updated by each thread. If the threads read the state, do their time-consuming processing concurrently, and then alter the state at the end, you may see performance gains. On the other hand, if every step of the processing needs to be coordinated between threads, they may all spend their time contending for the state object. Reducing this dependence on shared state will improve your multi-threaded performance.
There are also more efficient ways to update a state variable which can avoid locks. Something like the following pattern is used a lot:
import java.util.concurrent.atomic.AtomicReference;

private AtomicReference<State> sharedState;
...
// inside a thread processing loop
// do the processing job, then publish the result
while (true) {
    State existingState = sharedState.get();
    // create a new state object from the existing state and our processing
    State newState = updateState(existingState);
    // if no other thread changed the state in the meantime, publish ours
    if (sharedState.compareAndSet(existingState, newState)) {
        break;
    }
    // otherwise another thread won the race: re-read the state and try again
}
One way to handle state changes is to have a coordinating thread. It is the only thread which reads from the state and generates jobs. As jobs finish they put updates to the state on a BlockingQueue which is then read by the coordinating thread which updates the state in turn. Then the processing threads don't have to all be contending for access to the shared state.
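A hypothetical sketch of that arrangement (State, StateUpdate, and applyTo are assumed types):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class Coordinator {
    private State state;  // only ever touched by the coordinating thread
    private final BlockingQueue<StateUpdate> updates = new LinkedBlockingQueue<>();

    // workers call this when they finish a job; they never touch `state`
    void submit(StateUpdate update) throws InterruptedException {
        updates.put(update);
    }

    void runLoop() throws InterruptedException {
        while (true) {
            StateUpdate u = updates.take(); // blocks until a worker finishes
            state = u.applyTo(state);       // single writer: no lock needed
        }
    }
}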
Imagine it this way:
Synchronization is blocking.
Concurrency is parallelization.
You don't have to use synchronization. You can use an atomic reference (AtomicReference) as a wrapper for your shared mutable state.
You can also use stamped locks, which improve concurrency by allowing optimistic reads. You may also use accumulators to write concurrent code. These features are part of Java 8.
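A sketch of the usual StampedLock optimistic-read pattern (Point is a hypothetical class, adapted from the pattern shown in the class's documentation):
import java.util.concurrent.locks.StampedLock;

class Point {
    private final StampedLock sl = new StampedLock();
    private double x, y;

    double distanceFromOrigin() {
        long stamp = sl.tryOptimisticRead(); // no blocking on the read path
        double curX = x, curY = y;
        if (!sl.validate(stamp)) {           // a writer intervened: fall back
            stamp = sl.readLock();
            try {
                curX = x;
                curY = y;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return Math.hypot(curX, curY);
    }

    void move(double dx, double dy) {
        long stamp = sl.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            sl.unlockWrite(stamp);
        }
    }
}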
Another way to avoid synchronization is to use immutable objects, which can be shared and published freely and need no synchronization. I should add that you should use immutable objects anyway, regardless of concurrency, since they make an object's state space easier to reason about.
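A minimal sketch of such an immutable value (Range is hypothetical): a final class with final fields, where "mutators" return new instances.
// an immutable value: final class, final fields, no setters
public final class Range {
    private final int low;
    private final int high;

    public Range(int low, int high) {
        if (low > high) throw new IllegalArgumentException("low > high");
        this.low = low;
        this.high = high;
    }

    public int low()  { return low; }
    public int high() { return high; }

    // "mutation" returns a new instance instead of changing this one
    public Range withHigh(int newHigh) { return new Range(low, newHigh); }
}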
Executors.newFixedThreadPool(): are there any real-world scenarios where we prefer to keep a fixed set of active threads even when there is nothing to process?
In practice, having a fixed number of threads is always better than spawning a new thread every time a task has to be processed.
Threads are expensive to create and maintain, and not bounding the number of active threads in your application can actually end up harming performance. Fixed thread pools reuse already-created threads, which removes the thread-creation overhead.
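A minimal sketch of the usage (the tasks collection is hypothetical):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(8); // 8 threads, created once

for (Runnable task : tasks) { // `tasks` is a hypothetical collection of jobs
    pool.submit(task);        // idle pool threads are reused; nothing is spawned per task
}
pool.shutdown();              // accept no new work, finish what is queued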
When you keep a fixed number of threads, you can predict your memory and CPU usage better, at least IMHO.
Of course, there is no recipe that fits all use cases and, before choosing what paradigm is best for your particular situation, you should do rigorous testing and measurements. Experimenting with different configurations will give you a better understanding and point you to the best solution.
I am developing an application and, at a given moment, I start about 10000 threads to stress-test a database. I want to synchronize this in the following way: I want all the threads to read all the data from a table, then wait for the other threads to finish reading. After all threads have finished reading, I delete all records from that table, and then I want all the threads to insert the data they read previously. Now, how do I synchronize my threads to wait for each other in the aforementioned order? What is the best solution?
Use CyclicBarrier:
CyclicBarriers are useful in programs involving a fixed sized party of threads that must occasionally wait for each other.
The example in the JavaDoc quoted above solves the exact same problem.
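A minimal sketch of that shape (readAllData, deleteAllRecords, and insertData are hypothetical): the barrier action clears the table exactly once, after the last reader arrives.
import java.util.concurrent.CyclicBarrier;

int parties = 10_000;
// the barrier action runs exactly once, after the last thread arrives
CyclicBarrier doneReading = new CyclicBarrier(parties, () -> deleteAllRecords());

Runnable worker = () -> {
    try {
        readAllData();        // phase 1: every thread reads the table
        doneReading.await();  // block until all 10,000 threads have read;
                              // then deleteAllRecords() runs once
        insertData();         // phase 2: every thread re-inserts its data
    } catch (Exception e) {
        Thread.currentThread().interrupt();
    }
};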
10 thousand threads? Make sure you are testing your database, not your CPU and memory (the context-switching overhead might be tremendous). Have you considered JMeter in distributed mode?
This may not be exactly what you're looking for, but you could give CountDownLatch a look:
A synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes.
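A hypothetical sketch with two latches, one marking the end of the read phase and one gating the insert phase (readAllData, deleteAllRecords, and insertData are made up):
import java.util.concurrent.CountDownLatch;

int workers = 10_000;
CountDownLatch doneReading = new CountDownLatch(workers);
CountDownLatch mayInsert = new CountDownLatch(1);

Runnable worker = () -> {
    try {
        readAllData();            // phase 1: read
        doneReading.countDown();  // signal: this thread finished reading
        mayInsert.await();        // block until the table is cleared
        insertData();             // phase 2: insert
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
};

// in the coordinating thread:
// doneReading.await();    // wait for all 10,000 readers
// deleteAllRecords();     // clear the table
// mayInsert.countDown();  // release every worker to insert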
These were the only two questions I couldn't answer in the interview I got rejected from last night.
Q: When should you use multithreading?
A: "Your question is very broad. There are few non-trivial systems where the functionality can be met simply, quickly and reliably with only one thread. For example: [pick out a typical system that the target company sells and pick out a couple aspects of its function that would be better threaded off - heavy CPU, comms, multi-user - just pick out something likely & explain].
Q: Would multithreading be beneficial if the different threads execute mutually independent tasks?
A: "Depends on what you mean by 'executing tasks'. Multithreading would surely be beneficial if the threads process mutually independent data in a concurrent fashion - it reduces requirements for locks and probabilty of deadlocks increases in a super-linear fashion with the number of locks. OTOH, there is no issue with threads executing the same code, this is safe and very common."
You should use multithreading when you want to perform heavy operations without "blocking" the flow.
An example is UIs, where you do heavy processing in a background thread while the UI stays responsive.
If the threads execute mutually independent tasks, that is the best case, since no synchronization overhead between the threads is needed.
Multithreading is a way to introduce parallelism into your program. Whenever there are parallel paths (parts that do not depend on the result of another part) in your program, you can make use of it.
Especially with all the multi-core machines these days, this is a feature one should exploit.
Some examples would be processing of large data sets, where you can divide the data into chunks and process them in multiple threads; file processing; long-running I/O work like network data transfers; and so on.
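For the chunked-data case, a minimal sketch (data, splitIntoChunks, and sumChunk are hypothetical; exception handling elided):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// split the work into independent chunks and run them in parallel
ExecutorService pool =
    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Callable<Long>> jobs = new ArrayList<>();
for (long[] chunk : splitIntoChunks(data)) {
    jobs.add(() -> sumChunk(chunk));     // one independent task per chunk
}
long total = 0;
for (Future<Long> done : pool.invokeAll(jobs)) { // waits for all tasks
    total += done.get();                         // combine partial results
}
pool.shutdown();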
To your second question: it would be best if the tasks are mutually independent, because
no shared data means no contention
no ordered processing (dependencies) is needed, so each thread can work whenever it has resources
it is easier to implement
You should definitely use multithreading in GUI applications when you invoke time consuming tasks from the main event loop. Same applies for server application that might block while doing the I/O.
For the second question, the answer is usually yes when you have a machine with multiple CPU cores; in that case the independent tasks can be executed in parallel.
You can use multithreading if the tasks can be broken into parts that can be executed in parallel, like produce and consume, validate and save, read and validate.
For the second question: yes, it is beneficial to make a program multithreaded if the threads are executing independent tasks.
This article gives very good reasons:
https://marcja.wordpress.com/2007/04/06/four-reasons-to-use-multithreading/
To summarize, the reasons are:
Keep your program responsive.
Make better use of your CPU. The CPU may be blocked by IO or other work; while it waits, why not let other threads use it?
Multiple threads can be scheduled to multiple CPU cores
Some problems are naturally solved by multithreading, and such a solution can simplify your code.
In general, multithreading is used in cases where execution time is bottlenecked by the CPU, as opposed to other areas such as IO. The second question really depends on the circumstances; for example, if the tasks are mutually independent but both do heavy IO, you might not necessarily see a large gain.
Multithreading is used when we can divide a job into several independent parts. For example, suppose you have to execute a complex database query to fetch data, and you can divide that query into several independent queries; then it is better to assign a thread to each query and run them all in parallel.
That way, your final output will be produced faster.
Again, this is an example for when you have the leverage to run multiple database queries.
And to answer your second question, it is better to have threads for independent tasks. Otherwise, you will have to take care of synchronization, global variables, etc.
When should you use multithreading?
Multithreading is the execution of multiple threads simultaneously. You should use multithreading when you can perform multiple operations together and thereby save time.
Would multithreading be beneficial if the different threads execute mutually independent tasks?
Usually, yes. Multithreading would usually be beneficial when the different threads execute mutually independent tasks, since an exception in one thread then does not affect the other threads.
I'm using a 1-producer/1-consumer design in my app, built on a SynchronousQueue. Right now I'm constructing it with fair=true, and I'm wondering how fair=false would affect the system (performance, and especially concurrency behaviour).
Here is what the docs say:
SynchronousQueue
public SynchronousQueue()
Creates a SynchronousQueue with nonfair access policy.
SynchronousQueue
public SynchronousQueue(boolean fair)
Creates a SynchronousQueue with the specified fairness policy.
Parameters:
fair - if true, waiting threads contend in FIFO order for access; otherwise the order is unspecified.
Thanks in advance.
Your question contains the answer, more or less. Anyway, the short answer is that it will make no effective difference in your single-consumer case (with perhaps an infinitesimal performance decrease).
If you set the fair flag to true, then as you've pasted in your question, waiting threads contend in FIFO order for access. This places specific constraints on the scheduling of waiting threads as to how they are reawakened; an unfair system has no such constraints (and consequently the compiler/runtime is free to do things which may run a little faster).
Note that this only ever affects which thread is chosen to wake up out of the set of threads that are waiting; and with only one thread that will ever wait, the decision algorithm is irrelevant, as it will always pick the same thread. The distinction arises when you have multiple threads waiting: is it acceptable for one individual thread never to get anything from the queue, so long as the other threads are able to handle the whole workload between them?
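For reference, a minimal sketch of the handoff in question; with a single consumer, the fairness flag only matters when several consumers are waiting at once.
import java.util.concurrent.SynchronousQueue;

SynchronousQueue<String> handoff = new SynchronousQueue<>(true); // fair=true: FIFO wakeups

Thread producer = new Thread(() -> {
    try {
        handoff.put("job-1");   // blocks until some consumer is ready to take
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});

Thread consumer = new Thread(() -> {
    try {
        String job = handoff.take(); // blocks until a producer hands something over
        System.out.println(job);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});

producer.start();
consumer.start();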
Wrt. performance, have you tried measuring this? It will most likely give you more of an indication of what's going on than any answer here.
From the doc:
Fairness generally decreases throughput but reduces variability and avoids starvation
but it would be interesting to run a repeatable test and study how much this affects your particular circumstances. As you have only one consumer thread, I don't think it will affect your application beyond a small (perhaps imperceptible) performance decrease. But I would reiterate that you should try to measure it.