I am trying understand a rationale behind biased locking and making it a default. Since reading this blog post, namely:
"Since most objects are locked by at most one thread during their lifetime, we allow that thread to bias an object toward itself"
I am perplexed... Why would anyone design a synchronized set of methods to be accessed by one thread only? In most cases, people devise certain building blocks specifically for the multi-threaded use-case, and not a single-threaded one. In such cases, EVERY lock aquisition by a thread which is not biased is at the cost of a safepoint, which is a huge overhead! Could someone please help me understand what I am missing in this picture?
The reason is probably that there are a decent number of libraries and classes that are designed to be thread safe but that are still useful outside of such circumstances. This is especially true of a number of classes that predate the Collections framework. Vector and it's subclasses is a good example. If you also consider that most java programs are not multi threaded it is in most cases an overall improvement to use a biased locking scheme, this is especially true of legacy code where the use of such Classes is all to common.
You are correct in a way, but there are cases when this is needed, as Holger very correctly points in his comment. There is so-called, the grace period when no biased-locking is attempted at all, so it's not like this will happen all the time. As I last remember looking at the code, it was 5 seconds. To prove this you would need a library that could inspect Java Object's header (jol comes to my mind), since biased locking is hold inside mark word. So only after 5 seconds will the object that held a lock before will be biased towards the same lock.
EDIT
I wanted to write a test for this, but seems like there is one already! Here is the link for it
Related
Conceptually,
Mutex
Reader's/Writer lock (Better form of Mutex)
Semaphore
Condition Variable
are used as four major synchronization mechanisms, which are purely lock based. Different programming language have different terms/jargon for these 4 mechanisms. POSIX pthread package is one such example for such implementation.
First two get implemented using spin lock(Busy-wait).
Last two get implemented using sleep lock.
Lock based synchronisation is expensive in terms of cpu cycles.
But, I learnt that java.util.concurrent packages do not use lock(sleep/spin) based mechanism to implement synchronisation.
My question:
What is the mechanism used by java concurrent package to implement synchronization? Because spin lock is cpu intensive and sleep lock is more costlier than spin lock due to frequent context switch.
That very much depends on what parts of the java.util.concurrent package you use (and to a lesser degree on the implementation). E.g. the LinkedBlockingQueue as of Java 1.7 uses both ReentrantLocks and Conditions, while e.g. the java.util.concurrent.atomic classes or the CopyOnWrite* classes rely on volatiles + native methods (that insert the appropriate memory barriers).
The actual native implementation of Locks, Semaphores, etc. also varies between architectures and implementations.
Edit: If you really care about performance, you should measure performance of your specific workload. There are folks far more clever than me like A. Shipilev (whose site is a trove of information on this topic) on the JVM team, who do this and care deeply about JVM performance.
This question is best answered by looking at the source code for java.util.concurrent. The precise implementation depends on the class you are referring to.
For example, many of the implementations make use of volatile data and sun.misc.Unsafe, which defers e.g. compare-and-swap to native operations. Semaphore (via AbstractQueuedSynchronizer) makes heavy use of this.
You can browse through the other objects there (use the navigation pane on the left of that site) to take a look at the other synchronization objects and how they are implemented.
The short answer is no.
Concurrent collections are not implemented with locks compared to synchronized collections.
I myself had the exact same issue as what is asked, wanted to always understand the details. What helped me ultimately to fully understand what's going on under the hood was to read the following chapter in java concurrency in practice:
5.1 Synchronized collections
5.2 Concurrent collections
The idea is based on doing atomic operations, which basically requires no lock, since they are atomic.
The OP's question and the comment exchanges appear to contain quite a bit of confusion. I will avoid answering the literal questions and instead try to give an overview.
Why does java.util.concurrent become today's recommended practice?
Because it encourages good application coding patterns. The potential performance gain (which may or may not materialize) is a bonus, but even if there is no performance gain, java.util.concurrent is still recommended because it helps people write correct code. Code that is fast but is flawed has no value.
How does java.util.concurrent encourage good coding patterns?
In many ways. I will just list a few.
(Disclaimer: I come from a C# background and do not have comprehensive knowledge of Java's concurrent package; though a lot of similarities exist between the Java and C# counterparts.)
Concurrent data collections simplifies code.
Often, we use locking when we need to access and modify a data structure from different threads.
A typical operation involves:
Lock (blocked until succeed),
Read and write values,
Unlock.
Concurrent data collections simplify this by rolling all these operations into a single function call. The result is:
Simpler code on the caller's side,
Possibly more optimized, because the library implementation can possibly use a different (and more efficient) locking or lock-free mechanism than the JVM object monitor.
Avoids a common pitfall of race condition: Time of check to time of use.
Two broad categories of concurrent data collection classes
There are two flavors of concurrent data collection classes. They are designed for very different application needs. To benefit from the "good coding patterns", you must know which one to use given each situation.
Non-blocking concurrent data collections
These classes can guarantee a response (returning from a method call) in a deterministic amount of time - whether the operation succeeds or fails. It never deadlocks or wait forever.
Blocking concurrent data collections
These classes make use of JVM and OS synchronization features to link together data operations with thread control.
As you have mentioned, they use sleep locks. If a blocking operation on a blocking concurrent data collection is not satisfied immediately, the thread requesting this operation goes into sleep, and will be waken up when the operation is satisfied.
There is also a hybrid: blocking concurrent data collections that allow one to do a quick (non-blocking) check to see if the operation might succeed. This quick check can suffer from the "Time of check to time of use" race condition, but if used correctly it can be useful to some algorithms.
Before the java.util.concurrent package becomes available, programmers often had to code their own poor-man's alternatives. Very often, these poor alternatives have hidden bugs.
Besides data collections?
Callable, Future, and Executor are very useful for concurrent processing. One could say that these patterns offer something remarkably different from the imperative programming paradigm.
Instead of specifying the exact order of execution of a number of tasks, the application can now:
Callable allows packaging "units of work" with the data that will be worked on,
Future provides a way for different units of work to express their order dependencies - which work unit must be completed ahead of another work unit, etc.
In other words, if two different Callable instances don't indicate any order dependencies, then they can potentially be executed simultaneously, if the machine is capable of parallel execution.
Executor specifies the policies (constraints) and strategies on how these units of work will be executed.
One big thing which was reportedly missing from the original java.util.concurrent is the ability to schedule a new Callable upon the successful completion of a Future when it is submitted to an Executor. There are proposals calling for a ListenableFuture.
(In C#, the similar unit-of-work composability is known as Task.WhenAll and Task.WhenAny. Together they make it possible to express many well-known multi-threading execution patterns without having to explicitly create and destroy threads with own code.)
I am learning multithreading, and I have a little question.
When I am sharing some variable between threads (ArrayList, or something other like double, float), should it be lcoked by the same object in read/write? I mean, when 1 thread is setting variable value, can another read at same time withoud any problems? Or should it be locked by same object, and force thread to wait with reading, until its changed by another thread?
All access to shared state must be guarded by the same lock, both reads and writes. A read operation must wait for the write operation to release the lock.
As a special case, if all you would to inside your synchronized blocks amounts to exactly one read or write operation, then you may dispense with the synchronized block and mark the variable as volatile.
Short: It depends.
Longer:
There is many "correct answer" for each different scenarios. (and that makes programming fun)
Do the value to be read have to be "latest"?
Do the value to be written have let all reader known?
Should I take care any race-condition if two threads write?
Will there be any issue if old/previous value being read?
What is the correct behaviour?
Do it really need it to be correct ? (yes, sometime you don't care for good)
tl;dr
For example, not all threaded programming need "always correct"
sometime you tradeoff correctness with performance (e.g. log or progress counter)
sometime reading old value is just fine
sometime you need eventually correct (e.g. in map-reduce, nobody nor synchronized is right until all done)
in some cases, correct is mandatory for every moment (e.g. your bank account balance)
in write-once, read-only it doesn't matter.
sometime threads in groups with complex cases.
sometime many small, independent lock run faster, but sometime flat global lock is faster
and many many other possible cases
Here is my suggestion: If you are learning, you should thing "why should I need a lock?" and "why a lock can help in DIFFERENT cases?" (not just the given sample from textbook), "will if fail or what could happen if a lock is missing?"
If all threads are reading, you do not need to synchronize.
If one or more threads are reading and one or more are writing you will need to synchronize somehow. If the collection is small you can use synchronized. You can either add a synchronized block around the accesses to the collection, synchronized the methods that access the collection or use a concurrent threadsafe collection (for example, Vector).
If you have a large collection and you want to allow shared reading but exclusive writing you need to use a ReadWriteLock. See here for the JavaDoc and an exact description of what you want with examples:
ReentrantReadWriteLock
Note that this question is pretty common and there are plenty of similar examples on this site.
I've been caught by yet another deadlock in our Java application and started thinking about how to detect potential deadlocks in the future. I had an idea of how to do this, but it seems almost too simple.
I'd like to hear people's views on it.
I plan to run our application for several hours in our test environment, using a typical data set.
I think it would be possible to perform bytecode manipulation on our application such that, whenever it takes a lock (e.g. entering a synchronized block), details of the lock are added to a ThreadLocal list.
I could write an algorithm that, at some later point, compares the lists for all threads and checks if any contain the same pair of locks in opposite order - this would be reported as a deadlock possibility. Again, I would use bytecode manipulation to add this periodic check to my application.
So my question is this: is this idea (a) original and (b) viable?
This is something that we talked about when I took a course in concurrency. I'm not sure if your implementation is original, but the concept of analysis to determine potential deadlock is not unique. There are dynamic analysis tools for Java, such as JCarder. There is also research into some analysis that can be done statically.
Admittedly, it's been a couple of years since I've looked around. I don't think JCarder was the specific tool we talked about (at least, the name doesn't sound familiar, but I couldn't find anything else). But the point is that analysis to detect deadlock isn't an original concept, and I'd start by looking at research that has produced usable tools as a starting point - I would suspect that the algorithms, if not the implementation, are generally available.
I have done something similar to this with Lock by supplying my own implementation.
These days I use the actor model, so there is little need to lock the data (as I have almost no shared mutable data)
In case you didn't know, you can use the Java MX bean to detect deadlocked threads programmatically. This doesn't help you in testing but it will help you at least better detect and recover in production.
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
long[] deadLockedThreadIds = threadMxBean.findMonitorDeadlockedThreads();
// log the condition or even interrupt threads if necessary
...
That way you can find some deadlocks, but never prove their absence. I'd better develop static checking tool, a kind of bytecode analizer, feeded with annotations for each synchronized method. Annotations should show the place of the annotated method in the resource graph. The task is then to find loops in the graph. Each loop means deadlock.
All,
What should be the approach to writing a thread safe program. Given a problem statement, my perspective is:
1 > Start of with writing the code for a single threaded environment.
2 > Underline the fields which would need atomicity and replace with possible concurrent classes
3 > Underline the critical section and enclose them in synchronized
4 > Perform test for deadlocks
Does anyone have any suggestions on the other approaches or improvements to my approach. So far, I can see myself enclosing most of the code in synchronized blocks and I am sure this is not correct.
Programming in Java
Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide on what areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be threadsafe). Then you need to design how those sections will be threadsafe. Methods of making one area of the code threadsafe may be different than making other areas different. For example, understanding whether there will be a high volume of reading vs writing is important and might affect the types of locks you use to protect the data.
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.
You should use final and immutable fields wherever possible, any other data that you want to change add inside:
synchronized (this) {
// update
}
And remember, sometimes stuff brakes, and if that happens, you don't want to prolong the program execution by taking every possible way to counter it - instead "fail fast".
As you have asked about "thread-safety" and not concurrent performance, then your approach is essentially sound. However, a thread-safe program that uses synchronisation probably does not scale much in a multi cpu environment with any level of contention on your structure/program.
Personally I like to try and identify the highest level state changes and try and think about how to make them atomic, and have the state changes move from one immutable state to another – copy-on-write if you like. Then the actual write can be either a compare-and-set operation on an atomic variable or a synchronised update or whatever strategy works/performs best (as long as it safely publishes the new state).
This can be a bit difficult to structure if your new state is quite different (requires updates to several fields for instance), but I have seen it very successfully solve concurrent performance issues with synchronised access.
Buy and read Brian Goetz's "Java Concurrency in Practice".
Any variables (memory) accessible by multiple threads potentially at the same time, need to be protected by a synchronisation mechanism.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
I have been considering adding threaded procedures to my application to speed up execution, but the problem is that I honestly have no idea how to use threads, or what is considered "thread safe". For example, how does a game engine utilize threads in its rendering processes, or in what contexts would threads only be considered nothing but a hindrance? Can someone point the way to some resources to help me learn more or explain here?
This is a very broad topic. But here are the things I would want to know if I knew nothing about threads:
They are units of execution within a single process that happen "in parallel" - what this means is that the current unit of execution in the processor switches rapidly. This can be achieved via different means. Switching is called "context switching", and there is some overhead associated with this.
They can share memory! This is where problems can occur. I talk about this more in depth in a later bullet point.
The benefit of parallelizing your application is that logic that uses different parts of the machine can happen simultaneously. That is, if part of your process is I/O-bound and part of it is CPU-bound, the I/O intensive operation doesn't have to wait until the CPU-intensive operation is done. Some languages also allow you to run threads at the same time if you have a multicore processor (and thus parallelize CPU-intensive operations as well), though this is not always the case.
Thread-safe means that there are no race conditions, which is the term used for problems that occur when the execution of your process depends on timing (something you don't want to rely on). For example, if you have threads A and B both incrementing a shared counter C, you could see the case where A reads the value of C, then B reads the value of C, then A overwrites C with C+1, then B overwrites C with C+1. Notice that C only actually increments once!
A couple of common ways avoid race conditions include synchronization, which excludes mutual access to shared state, or just not having any shared state at all. But this is just the tip of the iceberg - thread-safety is quite a broad topic.
I hope that helps! Understand that this was a very quick introduction to something that requires a good bit of learning. I would recommend finding a resource about multithreading in your preferred language, whatever that happens to be, and giving it a thorough read.
There are four things you should know about threads.
Threads are like processes, but they share memory.
Threads often have hardware, OS, and language support, which might make them better than processes.
There are lots of fussy little things that threads need to support (like locks and semaphores) so they don't get the memory they share into an inconsistent state. This makes them a little difficult to use.
Locking isn't automatic (in the languages I know), so you have to be very careful with the memory they (implicitly) share.
Threads don't speed up applications. Algorithms speed up applications. Threads can be used in algorithms, if appropriate.
Well someone will probably answer this better, but threads are for the purpose of having background processing that won't freeze the user interface. You don't want to stop accepting keyboard input or mouse input, and tell the user, "just a moment, I want to finish this computation, it will only be a few more seconds." (And yet its amazing how many times commercial programs do this.
As far as thread safe, it means a function that does not have some internal saved state. If it did you couldn't have multiple threads using it simutaneously.
As far as thread programming you just have to start doing it, and then you'll start encountering various issues unique to thread programming, for example simultaneuous access to data, in which case you have to decide to use some syncronization method such as critical sections or mutexes or something else, each one having slightly different nuances in their behavior.
As far as the differences between processes and threads (which you didn't ask) processes are an OS level entity, whereas threads are associated with a program. In certain instances your program may want to create a process rather than a thread.
Threads are simply a way of executing multiple things simultaneously (assuming that the platform on which they are being run is capable of parallel execution). Thread safety is simply (well, nothing with threads is truly simple) making sure that the threads don't affect each other in harmful ways.
In general, you are unlikely to see systems use multiple threads for rendering graphics on the screen due to the multiple performance implications and complexity issues that may arise from that. Other tasks related to state management (or AI) can potentially be moved to separate threads however.
First rule of threading: don't thread. Second rule of threading: if you have to violate rule one...don't. Third rule: OK, fine you have to use threads, so before proceeding get your head into the pitfalls, understand locking and the common thread problems such as deadlock and livelocking.
Understand that threading does not speed up anything, it is only useful to background long-running processes allowing the user can do something else with the application. If you have to allow the user to interact with the application while the app does something else in the background, like poll a socket or wait for ansynchronous input from elsewhere in the application, then you may indeed require threading.
The thread sections in both Effective Java and Clean Code are good introductions to threads and their pitfalls.
Since the question is tagged with 'Java', I assume you are familiar with Java, in which case this is a great introductory tutorial
http://java.sun.com/docs/books/tutorial/essential/concurrency/
Orm, great question to ask. I think all serious programmers should learn about threads, cause eventually you will at least consider using them and you really want to be prepared when it happens. Concurrency bugs can be incredibly subtle and the best way to avoid them is to know what idioms are safe(-ish).
I highly recommend you take the time to read the book Concurrent Programming in Java: Design Principles and Patterns by Doug Lea:
http://gee.cs.oswego.edu/dl/cpj/
Lea takes the time not only to teach you the concepts, but also to show you the correct and incorrect ways to use the concurrent programming primitives (in Java but also helpful for any other environment that uses shared-memory locking/signaling style concurrency). Most of all he teaches respect for the difficulty of concurrent programming.
I should add that this style of concurrent programming is the most common but not the only approach. There's also message passing, which is safer but forces you to structure your algorithm differently.
Since the original post is very broad, and also tagged with C++, I think the following pointers are relevant:
Anthony Williams, maintainer of the Boost Thread Library, has been working on a book called "C++ Concurrency in Action", a description of which you can find here. The first (introductory) chapter is available for free in pdf form here.
Also, Herb Sutter (known, among other things, for his "Exceptional C++" series) has been writing a book to be called "Effective Concurrency", many articles of which are available in draft form here.
There's a nice book, Java Concurrency in Practice, http://www.javaconcurrencyinpractice.com/ .