I'm using a one-producer, one-consumer design in my app, built on a SynchronousQueue. Right now I'm using the default constructor, which (as the docs below say) gives the nonfair access policy, and I'm wondering how fair=true would affect the system (performance and especially concurrency behaviour).
Here's what the docs say:
SynchronousQueue
public SynchronousQueue()
Creates a SynchronousQueue with nonfair access policy.
SynchronousQueue
public SynchronousQueue(boolean fair)
Creates a SynchronousQueue with the specified fairness policy.
Parameters:
fair - if true, waiting threads contend in FIFO order for access; otherwise the order is unspecified.
Thanks in advance.
Your question contains the answer, more or less. Anyway, the short answer is that it will make no effective difference in your single-consumer case (with perhaps an infinitesimal performance decrease).
If you set the fair flag to true, then as you've pasted in your question, waiting threads contend in FIFO order for access. This places specific constraints on the scheduling of waiting threads as to how they are reawakened; an unfair system has no such constraints (and consequently the compiler/runtime is free to do things which may run a little faster).
Note that this only ever affects which thread is chosen to wake up out of the set of threads that are waiting; with only one thread that will ever wait, the decision algorithm is irrelevant, as it will always pick the same thread. The distinction matters when you have multiple threads waiting: is it acceptable for one individual thread to never get anything from the queue, so long as the other threads are able to handle the whole workload between them?
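For concreteness, here is how the two policies are selected (a minimal sketch; the element type is arbitrary):

import java.util.concurrent.SynchronousQueue;

class QueueSetup {
    // default constructor: nonfair hand-off, waiting-thread order unspecified
    final SynchronousQueue<Runnable> nonfair = new SynchronousQueue<>();

    // fair = true: waiting threads contend in FIFO order
    final SynchronousQueue<Runnable> fair = new SynchronousQueue<>(true);
}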
Regarding performance, have you tried measuring this? It will most likely give you more of an indication of what's going on than any answer here.
From the doc:
Fairness generally decreases throughput but reduces variability and avoids starvation
but it would be interesting to run a repeatable test and study how much it affects you in your particular circumstances. As you have only one consumer thread, I don't think it will affect your application beyond (perhaps) a small (perhaps imperceptible) performance decrease. But I would reiterate that you should try to measure it.
Related
Well, the title basically says it all, with the small addition that I would really like to know when to use them. And it might be simple enough - I've read the documentation for them both, but I still can't really tell the difference.
There are answers like this here that basically say:
Yielding also was useful for busy waiting...
I can't quite agree with them, for the simple reason that ForkJoinPool uses Thread::yield internally, and that is a pretty recent addition to the JDK.
The thing that really bothers me is usages like this in the JDK itself (StampedLock::tryDecReaderOverflow):
else if ((LockSupport.nextSecondarySeed() & OVERFLOW_YIELD_RATE) == 0)
    Thread.yield();
else
    Thread.onSpinWait();
return 0L;
So it seems there are cases when one would be preferred over the other. And no, I don't have an actual example where I might need this - the only one I have actually used is Thread::onSpinWait, because 1) I happened to be busy-waiting and 2) the name makes it pretty much self-explanatory that it belongs in a busy spin.
When blocking a thread, there are a few strategies to choose from: spin, wait() / notify(), or a combination of both. Pure spinning on a variable is a very low latency strategy but it can starve other threads that are contending for CPU time. On the other hand, wait() / notify() will free up the CPU for other threads but can cost thousands of CPU cycles in latency when descheduling/scheduling threads.
So how can we avoid pure spinning as well as the overhead associated with descheduling and scheduling the blocked thread?
Thread.yield() is a hint to the thread scheduler to give up its time slice if another thread with equal or higher priority is ready. This avoids pure spinning but doesn't avoid the overhead of rescheduling the thread.
The latest addition is Thread.onSpinWait(), which inserts architecture-specific instructions to hint to the processor that the thread is in a spin loop. On x86, this is probably the PAUSE instruction; on aarch64, it is the YIELD instruction.
What's the use of these instructions? In a pure spin loop, the processor will speculatively execute the loop over and over again, filling up the pipeline. When the variable the thread is spinning on finally changes, all that speculative work will be thrown out due to memory order violation. What a waste!
A hint to the processor could prevent the pipeline from speculatively executing the spin loop until prior memory instructions are committed. In the context of SMT (hyperthreading), this is useful as the pipeline will be freed up for other hardware threads.
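To make that concrete, here is a minimal sketch of a spin loop that uses Thread.onSpinWait() (Java 9+); the class and field names are illustrative:

import java.util.concurrent.atomic.AtomicBoolean;

class SpinFlag {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    void awaitReady() {
        // busy-spin, but hint the processor that we are in a spin loop
        while (!ready.get()) {
            Thread.onSpinWait();
        }
    }

    void setReady() {
        ready.set(true);
    }
}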
I have a dilemma regarding the use of multithreading in the application I am working on. I have a workflow in which the state of an object changes, which presents no issues for single-threaded operation. However, in order to improve performance, I am planning to use multiple threads.
It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading? It seems like multiple threads won't produce any actual concurrency, so it wouldn't be any better than single threaded.
Is my analysis correct? If I am misunderstanding then would someone please clarify the concept?
The short answer: concurrency is hard. Real concurrency, with multiple concurrent writers, is really hard.
What you need to determine is what your actual consistency guarantees need to be. Does every reader need to be able to see every write, guaranteed? Then you'll be forced into linearizing all the threads somehow (e.g. using locks) -- your next effort should be to ensure you do as much work as possible outside of the lock, to keep the lock held for the shortest possible time.
One way to keep the lock held for the shortest possible time is to use a lock-free algorithm. Most lock-free algorithms are based on an atomic compare-and-set primitive, such as those provided by the java.util.concurrent.atomic package. These can be very high-performance, but designing a successful lock-free algorithm can be subtle. One simple kind of lock-free algorithm is to just build a new (immutable) state object and then atomically make it the "live" state, retrying in a loop if a different state was made live by another writer in the interim. (This approach is good enough for many applications, but it's vulnerable to livelock if you have too many writers.)
If you can get by with a looser consistency guarantee, then many other optimizations are possible. For example, you can use thread-local caches so that each thread sees its own view of the data and can be writing in parallel. Then you need to deal with the consequences of data being stale or inconsistent. Most techniques in this vein strive for eventual consistency: writes may not be visible to all readers immediately, but they are guaranteed to be visible to all readers eventually.
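A minimal sketch of that thread-local-cache idea, assuming writes can be published lazily (all names here are illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LocalView {
    private static final Map<String, String> shared = new ConcurrentHashMap<>();
    private static final ThreadLocal<Map<String, String>> local =
            ThreadLocal.withInitial(HashMap::new);

    static void write(String key, String value) {
        local.get().put(key, value);    // fast, uncontended write
    }

    static void publish() {
        shared.putAll(local.get());     // local writes become visible eventually
    }
}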
This is an active area of research, and a complete answer could fill a book (really, several books!). If you're just getting started in this area, I'd recommend you read Java Concurrency in Practice by Goetz et al, as it provides a good introduction to the subject and lots of practical advice about how to successfully build concurrent systems.
Your interpretation of the limits of multithreading and concurrency is correct. Since the state must be acquired and controlled by threads in order for them to perform work (and they must wait when not working), you are essentially splitting the work of a single thread among multiple threads.
The best way to fix this is to adjust your program design to limit the size of the critical section. As we learned in my operating systems course with process synchronization,
only one thread may be executing in a critical section at any given time
The specific term critical section may not directly apply to Java concurrency, but it still illustrates the concept.
What does it mean to limit this critical section? For example, let's say you have a program managing a single bank account (unrealistic, but it illustrates my point). If a lock on the account must be acquired by a thread for the balance to be updated, the basic option would be to have a single thread working on updating the balance at all times (without concurrency); the critical section would be the entire program. However, suppose there was also other logic to execute, such as alerting other banks of the balance update. You could require the lock on the bank account state only while updating the balance, and not while alerting other banks, decreasing the size of the critical section and allowing other threads to do work (alerting other banks) while one thread updates the balance.
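A minimal sketch of that bank-account example (hypothetical names; the point is only how small the synchronized region is):

class Account {
    private final Object lock = new Object();
    private long balanceCents;

    void deposit(long amountCents) {
        long newBalance;
        synchronized (lock) {             // small critical section
            balanceCents += amountCents;
            newBalance = balanceCents;
        }
        alertOtherBanks(newBalance);      // slow work done outside the lock
    }

    private void alertOtherBanks(long balance) { /* network calls, etc. */ }
}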
Please comment if this was unclear. You seem to already understand the constraints of concurrency, but hopefully this will reveal possible steps towards implementing it.
Your exact need is not totally clear, but you have guessed the limitations that multithreading may have.
Running parallel threads makes sense if some "relatively autonomous" tasks can be performed concurrently by distinct threads or groups of threads.
If your scenario looks like this - you start 5 threads and finally only a single thread is active while the others wait for a locked resource - then multithreading makes no sense and could even introduce overhead because of CPU context switches.
I think that in your use case, the multithreading could be used for :
tasks that don't change the state
performing a task that changes the state, if the task can be divided into multiple units of work with only a minimal section of shared-state instructions, so that multithreading is profitable.
It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading?
The short answer is "it depends". It is rare to have a multithreaded application with no shared data, so sharing data, even if it needs a full lock, doesn't necessarily defeat the performance improvements of making a single-threaded application multi-threaded.
The big question is how frequently the state needs to be updated by each thread. If the threads read the state, do their concurrent processing (which takes time), and then alter the state at the end, you may see performance gains. On the other hand, if every step in the processing needs to be coordinated between threads, they may all spend their time contending for the state object. Reducing this dependence on shared state will improve your multi-threaded performance.
There are also more efficient ways to update a state variable which can avoid locks. Something like the following pattern is used a lot:
private AtomicReference<State> sharedState;
...
// inside a thread's processing loop
while (true) {
    State existingState = sharedState.get();
    // create a new state object from the existing state plus our processing
    State newState = updateState(existingState);
    // if the shared state hasn't changed in the meantime, install ours
    if (sharedState.compareAndSet(existingState, newState)) {
        break;
    }
    // otherwise another writer got in first: re-read and try again
}
One way to handle state changes is to have a coordinating thread. It is the only thread which reads from the state and generates jobs. As jobs finish they put updates to the state on a BlockingQueue which is then read by the coordinating thread which updates the state in turn. Then the processing threads don't have to all be contending for access to the shared state.
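A rough sketch of that coordinator pattern (State and StateUpdate are hypothetical placeholders for your domain types):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

interface State { }
interface StateUpdate { State applyTo(State s); }

class Coordinator implements Runnable {
    private final BlockingQueue<StateUpdate> updates = new LinkedBlockingQueue<>();
    private State state;                        // touched only by this thread

    BlockingQueue<StateUpdate> updateQueue() { return updates; }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                StateUpdate u = updates.take(); // blocks until a worker reports
                state = u.applyTo(state);       // no contention on the state
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and exit
        }
    }
}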
Imagine it this way:
Synchronization is blocking
Concurrency is parallelization
You don't have to use synchronization. You can use an AtomicReference as a wrapper for your shared mutable state.
You can also use stamped locks, which improve concurrency by allowing optimistic reads, or use accumulators to write concurrent code. These features are part of Java 8.
Another way to avoid synchronization is to use immutable objects, which can be shared and published freely and need no synchronization. I should add that you should use immutable objects anyway, regardless of concurrency, because they make an object's state space easier to reason about.
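For illustration, a minimal immutable class (a hypothetical example) that can be shared between threads without any synchronization:

final class Position {
    private final int x;
    private final int y;

    Position(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }

    // "mutation" returns a new object; the original never changes
    Position moveBy(int dx, int dy) {
        return new Position(x + dx, y + dy);
    }
}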
I am reading comparisons between reentrant locks and synchronized blocks in Java, going through various resources on the internet. One disadvantage I discovered of reentrant locks over synchronized blocks is that with the former you have to explicitly use a try/finally block and call unlock() on the acquired lock in the finally block, since your critical section might throw an exception, and it can cause big trouble if the thread doesn't release the lock. With the latter, the JVM itself takes care of releasing the lock in case of an exception.
I am not very convinced by this disadvantage, because it's not a big deal to use a try/finally block; we have been using that pattern for a long time (e.g. for closing streams). Can somebody tell me some other disadvantages of ReentrantLock over synchronized blocks?
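For reference, this is the try/finally idiom I mean (a minimal sketch):

import java.util.concurrent.locks.ReentrantLock;

class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count;

    void increment() {
        lock.lock();
        try {
            count++;           // critical section
        } finally {
            lock.unlock();     // always released, even on exception
        }
    }
}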
A ReentrantLock is a different tool for a different use case. While you can use both for most synchronization issues (that's what they are made for), they come with different advantages and disadvantages.
synchronized is as simple as it gets: you write synchronized and that's it. With modern JVMs it is reasonably fast, but it has the drawback that it puts all threads that try to enter the synchronized block on hold, whether they actually need to be or not. If you use synchronized too often, this can dramatically reduce the speed of multi-threading, in the worst case down to a point where single-threaded execution would have been faster.
As threading issues only occur if someone is writing while someone else is reading or writing the same data, programs often run into the problem that they could theoretically run without synchronization, because most threads just read - but there is the occasional write, which forces the synchronized block. This is what the Locks were made for: you get finer control over when you actually synchronize.
The basic ReentrantLock allows you - besides a fairness parameter in the constructor - to decide when you release the lock, and you can do it at multiple points, whenever it suits you best. Other variations, like ReentrantReadWriteLock, allow many unsynchronized reads except when there is a write. The downside is that this is implemented in Java code, which makes it noticeably slower than the "native" synchronized block. That said: you should only use it if you know that the optimization gained from the lock is bigger than the loss.
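A minimal sketch of the ReentrantReadWriteLock pattern just described (class and field names are illustrative):

import java.util.concurrent.locks.ReentrantReadWriteLock;

class Cache {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private String value;

    String read() {
        rwLock.readLock().lock();      // many readers may hold this at once
        try {
            return value;
        } finally {
            rwLock.readLock().unlock();
        }
    }

    void write(String v) {
        rwLock.writeLock().lock();     // exclusive: blocks readers and writers
        try {
            value = v;
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}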
Under normal situations you can only tell the difference in speed if you actually monitor it, by running a profiler to check the speed before and afterwards in a sophisticated way.
synchronized is almost always faster under low or minimal contention, because it allows the JVM to perform optimizations such as biased locking, lock elision, and others. Here are some more details on how it works:
Let's assume some monitor is held by thread A, and thread B requests this monitor. In that case the monitor changes its state to inflated. In short, this means that all threads trying to acquire the monitor will be put into a wait set at the OS level, which is quite expensive.
If, however, thread A released the monitor before thread B requested it, a so-called rebias operation is performed via a cheap (on modern CPUs) compare-and-swap operation.
Now let's take a look at ReentrantLock. Every call to lock() or lockInterruptibly() causes a locking attempt via a CAS operation.
Conclusion: in low-contention cases, prefer synchronized. In high-contention cases, prefer ReentrantLock. For all cases in between, it is hard to say for sure; consider performing benchmarks to find out which solution is faster.
Assume that I have a set of objects that need to be analyzed in two different ways, both of which take a relatively long time and involve IO calls. I am trying to figure out how/if I could optimize this part of my software, especially by utilizing the multiple processors (the machine I am sitting on, for example, is an 8-core i7 which almost never goes above 10% load during execution).
I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.
Here's what I have thought out so far:
A thread-safe collection holds the objects to be analyzed
As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
Each thread takes care of the initial pre-analysis preparation and then invokes the analyses.
The two analyses are implemented as Runnables/Callables, and thus called on by the thread when necessary.
And my questions are:
Is this a reasonable scheme, if not, how would you go about doing this?
In order to make sure things don't get out of hand, should I implement a ThreadManager or something of that sort, which starts and stops threads and redistributes them when they are complete? For example, if I have 256 objects to be analyzed and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object to be analyzed, and so on.
Is there a dramatic difference between Runnable and Callable, other than the fact that Callable can return a result? If not, should I try to implement my own interface, and if so, why?
Thanks,
You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle: the put() method blocks if the queue is full, until there is more space, and the take() method blocks if the queue is empty, until there are objects in the queue again.
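A rough sketch of that producer-consumer wiring (AnalysisTarget, fetchTargets() and analyze() are hypothetical placeholders for your own types):

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Pipeline {
    static class AnalysisTarget { }

    private final BlockingQueue<AnalysisTarget> queue = new ArrayBlockingQueue<>(100);

    void start() {
        // producer: put() blocks while the queue is full
        new Thread(() -> {
            try {
                for (AnalysisTarget t : fetchTargets()) {
                    queue.put(t);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // consumer: take() blocks until an object is available
        new Thread(() -> {
            try {
                while (true) {
                    analyze(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    private List<AnalysisTarget> fetchTargets() { return List.of(new AnalysisTarget()); }
    private void analyze(AnalysisTarget t) { /* expensive IO-bound work */ }
}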
An ExecutorService can help you manage your pool of threads.
If you are awaiting a result from your spawned threads, then the Callable interface is a good choice, since you can start the computation early and work in your code assuming the results are in Futures. As for the differences from the Runnable interface, from the Callable javadoc:
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
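A minimal sketch of submitting a Callable and collecting its Future (analyzeOne() and AnalysisResult are hypothetical placeholders):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AnalysisRunner {
    static class AnalysisResult { }

    AnalysisResult runAsync() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Callable<AnalysisResult> task = this::analyzeOne; // may throw checked exceptions
            Future<AnalysisResult> future = pool.submit(task);
            // ... other work could happen here ...
            return future.get();   // blocks until the result is ready
        } finally {
            pool.shutdown();
        }
    }

    private AnalysisResult analyzeOne() { return new AnalysisResult(); }
}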
Some general things you need to consider in your quest for java concurrency:
Visibility does not come by default. volatile, AtomicReference, and the other classes in the java.util.concurrent.atomic package are your friends.
You need to carefully ensure atomicity of compound actions using synchronization and locks; a small sketch follows.
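For example, an increment is a compound read-modify-write action; a plain volatile field would give visibility but not atomicity, while an atomic class gives both (a minimal sketch):

import java.util.concurrent.atomic.AtomicLong;

class IdGenerator {
    // volatile alone would NOT make ++ atomic; AtomicLong makes the
    // read-modify-write a single atomic, visible operation
    private final AtomicLong next = new AtomicLong();

    long nextId() {
        return next.incrementAndGet();
    }
}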
Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package. It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the threadsafe queue yourself either.
There's no difference between Callable and Runnable except that the former returns a value. Executors handle both and treat them the same.
It's not clear to me whether you're planning to make the preparation step a separate task from the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of a strong reason to prefer one over the other, but it's a choice you should think about.
The Executors class provides factory methods for creating thread pools. Specifically, Executors#newFixedThreadPool(int nThreads) creates a thread pool with a fixed size that uses an unbounded queue. Also, if a thread terminates due to a failure, a new thread takes its place. So in your specific example of 256 tasks and 16 threads, you would call:
// create pool
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// submit a task (here as a lambda; the body would do one unit of work)
Runnable task = () -> { /* analyze one object */ };
threadPool.submit(task);
The important question is determining the proper number of threads for your thread pool. See if this helps: Efficient Number of Threads.
Sounds reasonable, but it's not as trivial to implement as it may seem.
Maybe you should check the jsr166y project.
That's probably the easiest solution to your problem.
On Paul Tyma's presentation, I found an interview question:
What's harder, synchronizing 2 threads or synchronizing 1000 threads?
From my perspective, synchronizing 1000 threads is of course harder, but I can't think of a good reason for that besides "of course". But since it's an interview question, maybe I'm wrong (interview questions have to be tricky, don't they?).
You could make the case that synchronizing 2 threads correctly is in fact harder than doing it for 1000, because if you have a race condition, it will usually manifest very quickly with 1000 threads, but not so with only 2.
But on the other hand, synchronizing 1000 threads without running into lock contention issues is much harder than when there are only 2.
The real answer is "synchronizing threads is hard in various ways, period."
Synchronizing a thousand threads is just as easy as synchronizing two threads: just lock access to all important data.
Now, synchronizing a thousand threads with good performance is more difficult. If I were asking this question, I'd look for answers mentioning "the thundering herd problem", "lock contention", "lock implementation scalability", "avoiding spinlocks", etc.
In an interview, I would say that "exactly two threads" is a very useful special case of multi-threading. Things like starvation and priority inversion can occur with as few as three threads, but with only two threads priority inversion and starvation essentially cannot occur (strictly speaking, starvation could occur if a thread released and reacquired a lock without ever letting the other thread start, but with three threads starvation can occur even if locks are grabbed instantly when available). Going from 2 threads to 3 is a bigger jump than going from 3 to 1,000.
Why would synchronizing 1000 threads be any harder than synchronizing 2 threads?
The only code that would be added would be to spawn the extra threads.
You wouldn't have to add any synchronization code (as long as you were doing everything correctly).
I think the answer is that after you have two threads synchronized, all the other 998 will also be synchronized.
It depends what "is easier" means. The complexity of the design/locking mechanisms is roughly the same.
That being said, I think 1000 thread programs might be easier to debug. Vulnerable race-conditions have a higher probability of occurring and will probably be easier to replicate. A race condition in two threads might only appear once every 5 years if the moon is full and you're on vacation.
I have two answers.
CASE 1: Utilizing existing resources. Synchronizing 2 threads is the same difficulty as synchronizing 1000 threads, because the existing tools are built to synchronize an arbitrary number of threads.
CASE 2: Implementing from scratch. It seems obvious that if you had to implement a synchronization system from scratch, it would be easier to build the 2-thread system.
Take the reader-writer problem. With two threads, you can use mutual exclusion and you're done. With more threads, you have to write nontrivial code, since otherwise readers couldn't read simultaneously, or worse, they could starve the writers.
However, good synchronization code should work for any number of threads. In some cases, like mutual exclusion, you can add Java's synchronized keyword and it's as hard for 2 threads as for 1000.
In other words, if your program uses only 2 threads, you can take advantage of that and make assumptions that wouldn't be true with more threads. Obviously it's not a good practice, but it is possible.
It's one of those questions to which the only real answer is "it depends". In this case, it depends on what you're doing with them.
A scenario could be as simple as a single background worker thread that the foreground waits for while displaying a progress meter. Or it could spawn 1000 threads and simply wait for them all to finish before doing something else.
Alternatively, if as few as 2 threads are accessing shared resources, then the concepts are the same. You have to be very careful about concurrency issues and locking strategies whether it's 2 threads or 1000. With any number of threads greater than one, you can't guarantee that something else isn't trying to simultaneously read or write the same resource you are.
I would agree with those stating "it depends". If the threads are identical, then there might not be such a big difference between 2 and 1000 threads. However, if there are multiple resources which need mutually exclusive access (synchronized, in Java terms), then the likelihood of deadlocks may increase with the number of threads.
As I read through your answers, I found several interesting thoughts. I think that, in the interview, this is more important than the answer: the conversation, the thoughts.
It's equally hard. But synchronization over 2 threads will most likely perform better, since there are only 2 threads contending for one lock instead of a thousand, where there would most likely be much more overhead due to lock contention.
Hope that helped
Objects are synchronized, not threads. Creating a synchronized method or code block prevents multiple threads from executing the region at the same time - so it doesn't matter whether there are 2, 1,000, or 1,000,000 threads.
In terms of performance, if you are expecting to double parallelism (halve execution time) when you double the number of threads, then any synchronized region is going to be a bottleneck, because it is essentially serial code which cannot be parallelized.
If you use a programming language like Scala with the Actor design pattern, then you do not have to synchronize anything. http://www.scala-lang.org/node/242
Another option (in Java) is to go with a compare-and-swap mechanism (http://en.wikipedia.org/wiki/Compare-and-swap), so you do not have to synchronize any threads: you compare and read atomic variables without blocking, and only retry on write, which can yield some huge performance gains depending on your solution.
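A minimal sketch of that compare-and-swap style, assuming a simple "track the maximum" use case (names are illustrative):

import java.util.concurrent.atomic.AtomicInteger;

class AtomicMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // lock-free update: reads never block, and a write is retried
    // only if another thread raced us between get() and compareAndSet()
    void observe(int value) {
        int current;
        do {
            current = max.get();
            if (value <= current) {
                return;                 // nothing to do, no write needed
            }
        } while (!max.compareAndSet(current, value));
    }
}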