I have a set of counters which will only ever be updated in a single thread.
If I read these values from another thread and I don't use volatile/atomic/synchronized, how out of date can these values be?
I ask as I am wondering if I can avoid using volatile/atomic/synchronized here.
I currently believe that I can't make any assumptions about time to update (so I am forced to use at least volatile). Just want to make sure I am not missing something here.
In practice, the CPU cache is probably going to be synchronized to main memory anyway on a regular basis (how often depends on many parameters), so it sounds like you would be able to see some new values from time to time.
But that is missing the point: the actual problem is that if you don't use a proper synchronization pattern, the compiler is free to "optimise" your code and remove the update part.
For example:
class Broken {
    boolean stop = false;

    void broken() throws Exception {
        while (!stop) {
            Thread.sleep(100);
        }
    }
}
The compiler is authorised to rewrite that code as:
void broken() throws Exception {
    while (true) {
        Thread.sleep(100);
    }
}
because there is no obligation to check if the non-volatile stop might change while you are executing the broken method. Mark the stop variable as volatile and that optimisation is not allowed any more.
Bottom line: if you need to share state you need synchronization.
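A sketch of the fixed version: with stop declared volatile, the loop is required to notice a write made from another thread (the main method here is just a minimal demonstration, not part of the original example).

```java
class Fixed {
    // volatile guarantees the looping thread sees the update promptly
    volatile boolean stop = false;

    void run() throws InterruptedException {
        while (!stop) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws Exception {
        Fixed f = new Fixed();
        Thread t = new Thread(() -> {
            try {
                f.run();
            } catch (InterruptedException ignored) {
            }
        });
        t.start();
        f.stop = true;   // visible to the looping thread because stop is volatile
        t.join();        // the loop exits, so join returns
        System.out.println("stopped");
    }
}
```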
How stale a value can get is left entirely to the discretion of the implementation -- the spec doesn't provide any guarantees. You will be writing code that depends on the implementation details of a particular JVM and which can be broken by changes to memory models or to how the JIT reorders code. The spec seems to be written with the intent of giving the implementers as much rope as they want, as long as they observe the constraints imposed by volatile, final, synchronized, etc.
It looks like the only way that I can avoid the synchronization of these variables is to do the following (similar to what Zan Lynx suggested in the comments):
Figure out the maximum age I am prepared to accept. I will make this
the "update interval".
Each "update interval", copy the unsynchronized counter variables to synchronized variables. This needs to be done on the write thread.
Read thread(s) can only read from these synchronized variables.
Of course, this optimization may only be a marginal improvement and would probably not be worth it considering the extra complexity it would create.
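The steps above can be sketched roughly as follows (class and field names are hypothetical; volatile fields stand in for the "synchronized variables", since a single volatile write publishes each snapshot):

```java
class SnapshotCounters {
    // updated only by the writer thread; no synchronization needed
    private long hits;
    private long misses;

    // published snapshots; volatile makes them safely readable from any thread
    private volatile long hitsSnapshot;
    private volatile long missesSnapshot;

    // called on the writer thread only
    void recordHit()  { hits++; }
    void recordMiss() { misses++; }

    // writer thread calls this once per "update interval"
    void publish() {
        hitsSnapshot = hits;
        missesSnapshot = misses;
    }

    // reader threads see values that are at most one interval old
    long hits()   { return hitsSnapshot; }
    long misses() { return missesSnapshot; }
}
```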
Java8 has a new class called LongAdder which helps with the problem of using volatile on a field. But until then...
If you do not use volatile on your counter then the results are unpredictable. If you do use volatile then there are performance problems since each write must guarantee cache/memory coherency. This is a huge performance problem when there are many threads writing frequently.
For statistics and counters that are not critical to the application, I give users the option of volatile/atomic or none with none the default. So far, most use none.
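For reference, a minimal LongAdder sketch: increments from many threads are spread over internal cells to avoid contention, and sum() combines them on read.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class AdderDemo {
    public static void main(String[] args) {
        LongAdder counter = new LongAdder();
        // many threads can increment concurrently with little contention
        IntStream.range(0, 4).parallel().forEach(i -> {
            for (int j = 0; j < 1_000; j++) {
                counter.increment();
            }
        });
        System.out.println(counter.sum()); // 4000
    }
}
```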
I have a pretty basic method,
handleOrder(IOrder order) {
    //do stuff
}
I was having issues in that new quotes would update the order, so I wanted to synchronize on the order parameter. So my code would look like:
handleOrder(IOrder order) {
    synchronized (order) {
        //do stuff
    }
}
Now however, intellij is complaining that:
Synchronization on method parameter 'order'
Inspection info: Reports synchronization on a local variable or parameter. It is very difficult to guarantee correctness when such synchronization is used. It may be possible to improve code like this by controlling access through e.g. a synchronized wrapper class, or by synchronizing on a field.
Is this something I actually need to be concerned about?
Yes, because this type of synchronization is generally an indication that the code cannot easily be reviewed to ensure that deadlocks don't take place.
When you synchronize on a field, you're combining the synchronization code with the instance being used in a way that permits you to have most, if not all of the competing methods in the same file. This makes it easier to review the file for deadlocks and errors in the synchronization approach. The same idea applies when using a synchronized wrapper class.
When you synchronize on a passed instance (a method parameter or local variable), you need to review all of the code of the entire application for other synchronization efforts on the same instance to get the same level of assurance that a mistake was not made. In addition, this has to be done frequently, as there is little assurance that after the next commit a developer will have done the same code scan to make sure their synchronization didn't impact code that lives in some remote directory (or even in a remote JAR file that doesn't have source code on their machine).
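A hypothetical sketch of the field-based alternative (IOrder and the class name are placeholders for the asker's real types): the lock is owned by the handling class, so every competing method lives in one reviewable file.

```java
interface IOrder {
    // hypothetical placeholder for the real order type
}

class OrderHandler {
    // one lock, owned by this class; every method that touches order
    // state synchronizes on it, so reviewing this file is enough
    private final Object lock = new Object();

    void handleOrder(IOrder order) {
        synchronized (lock) {
            // do stuff with the order here
        }
    }
}
```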
I am learning multithreading, and I have a little question.
When I am sharing some variable between threads (an ArrayList, or something else like a double or float), should it be locked by the same object for both reads and writes? I mean, when one thread is setting the variable's value, can another read it at the same time without any problems? Or should it be locked by the same object, forcing the reading thread to wait until the value is changed by the other thread?
All access to shared state must be guarded by the same lock, both reads and writes. A read operation must wait for the write operation to release the lock.
As a special case, if all you would do inside your synchronized blocks amounts to exactly one read or write operation, then you may dispense with the synchronized block and mark the variable as volatile.
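A minimal sketch of both points (names hypothetical): reads and writes go through the same lock, and because each block here is exactly one read or one write, declaring the field volatile and dropping the blocks would be equivalent.

```java
class Shared {
    private final Object lock = new Object();
    private int value;               // guarded by lock

    void set(int v) {
        synchronized (lock) {        // writer takes the same lock as readers
            value = v;
        }
    }

    int get() {
        synchronized (lock) {        // reader waits if a writer holds the lock
            return value;
        }
    }
}
```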
Short: It depends.
Longer:
There are many "correct answers" for different scenarios (and that makes programming fun).
Does the value being read have to be the latest?
Does a written value have to be visible to all readers immediately?
Do I need to handle race conditions if two threads write?
Is there any issue if an old/previous value is read?
What is the correct behaviour?
Does it really need to be correct? (yes, sometimes you genuinely don't care)
tl;dr
For example, not all threaded programming needs to be "always correct"
sometimes you trade off correctness for performance (e.g. a log or progress counter)
sometimes reading an old value is just fine
sometimes you only need eventual correctness (e.g. in map-reduce, no value is "right" until all the work is done)
in some cases, correctness is mandatory at every moment (e.g. your bank account balance)
for write-once, read-only data it doesn't matter
sometimes threads work in groups, with more complex cases
sometimes many small, independent locks run faster, but sometimes one flat global lock is faster
and many, many other possible cases
Here is my suggestion: if you are learning, you should think "why do I need a lock?", "why does a lock help in DIFFERENT cases?" (not just the given sample from a textbook), and "will it fail, or what could happen, if a lock is missing?"
If all threads are reading, you do not need to synchronize.
If one or more threads are reading and one or more are writing, you will need to synchronize somehow. If the collection is small you can use synchronized: either add a synchronized block around the accesses to the collection, synchronize the methods that access the collection, or use a threadsafe collection (for example, Vector).
If you have a large collection and you want to allow shared reading but exclusive writing you need to use a ReadWriteLock. See here for the JavaDoc and an exact description of what you want with examples:
ReentrantReadWriteLock
Note that this question is pretty common and there are plenty of similar examples on this site.
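A minimal sketch of the shared-read/exclusive-write pattern with ReentrantReadWriteLock (the wrapped list and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedList {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final List<String> items = new ArrayList<>();

    void add(String s) {
        rw.writeLock().lock();       // exclusive: blocks readers and writers
        try {
            items.add(s);
        } finally {
            rw.writeLock().unlock();
        }
    }

    int size() {
        rw.readLock().lock();        // shared: many readers may hold it at once
        try {
            return items.size();
        } finally {
            rw.readLock().unlock();
        }
    }
}
```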
This may be a related question: Java assignment issues - Is this atomic?
I have the same class as the OP that acts on a mutable string reference. But set rarely happens. (basically this string is part of a server configuration that only reloads when forced to).
public class Test {
    private String s;

    public void setS(String str) {
        s = str;
    }

    public String getS() {
        return s;
    }
}
Multiple threads will be pounding this variable to read its value. What is the best method to make it 'safe' while not having to incur the performance degradation by declaring it volatile?
I am currently heading in the direction of a ReadWriteLock, but as far as I understand, read/write locks do not make it safe from thread caching unless some synchronisation happens? Which means I've gone full circle back to: I may as well just use the volatile keyword?
Is my understanding correct? Is there nothing that can manually "notify" other threads about an update to a variable in main memory, so that they can update their local cache just once in a blue moon?
volatile on this seems overkill given that the server application is designed to run for months without restart. By that time, it would've served a few million reads. I'm thinking I might as well just set the String as static final and not allow it mutate without a complete application and JVM restart.
Reads and writes to references are atomic. The problems you can incur are attempting to perform a read and a write together (an update), or guaranteeing that after a write all threads see the change on the next read. However, only you can say what your requirements are.
When you use volatile, it requires a cache coherent copy be read or written. This doesn't require a copy be made to/from main memory as the caches communicate amongst themselves, even between sockets. There is a performance impact but it doesn't mean the caches are not used.
Even if the access did go all the way to main memory, you could still do millions of accesses per second.
Why a mutable String? Why not a Config class with a simple static String. When config is updated, you change this static reference, which is an atomic operation and won't be a problem for reading threads. You then have no synchronization, no locking penalties.
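A sketch of that idea (class and method names are hypothetical; the field is marked volatile here so the swapped-in reference is also promptly visible to readers, which goes slightly beyond what plain atomic assignment guarantees):

```java
class Config {
    // swapping the reference is atomic; volatile additionally guarantees
    // that readers see the new reference after a reload
    private static volatile String serverName = "default";

    static String serverName() {
        return serverName;
    }

    // called rarely, e.g. only on a forced configuration reload
    static void reload(String newName) {
        serverName = newName;
    }
}
```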
To notify the clients of this server you can use the observer pattern: whoever is interested in the server update info can register for your event, and the server delivers the notification. This shouldn't become a bottleneck since, as you mentioned, the reload is not frequent.
Now, to make this thread safe, you can have a separate thread handle the update of the server state. In your get, check the state: if it is 'Updating', wait for it to complete (say, sleep). Once the update thread is done it should change the state from 'Updating' to 'Updated'. When you wake up, check the state again: if it is still 'Updating', go back to sleep; otherwise start servicing the request.
This approach adds an extra if to your code, but it enables you to reload the cache without forcing an application restart. It also shouldn't be a bottleneck, as server updates are not frequent.
Hope this makes some sense.
In order to avoid the volatile keyword, you could add a "memory barrier" method to your Test class that is only called very rarely, for example
public synchronized void sync() {
}
This will force the thread to re-read the field value from main memory.
Also, you would have to change the setter to
public synchronized void setS(String str) {
    s = str;
}
The synchronized keyword will force the setting thread to write directly to main memory.
See here for a detailed explanation of synchronization and memory barriers.
All,
What should be the approach to writing a thread safe program. Given a problem statement, my perspective is:
1 > Start off with writing the code for a single threaded environment.
2 > Underline the fields which would need atomicity and replace with possible concurrent classes
3 > Underline the critical section and enclose them in synchronized
4 > Perform test for deadlocks
Does anyone have any suggestions on the other approaches or improvements to my approach. So far, I can see myself enclosing most of the code in synchronized blocks and I am sure this is not correct.
Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide which areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be threadsafe). Then you need to design how those sections will be threadsafe. The method for making one area of the code threadsafe may differ from that for other areas. For example, understanding whether there will be a high volume of reads vs. writes is important and might affect the types of locks you use to protect the data.
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.
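As a small illustration of the immutability point above (a hypothetical value type, not from the question): a final class with final fields and no setters can be shared freely between threads without any locking.

```java
// Immutable value: final class, final fields, no setters.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "mutation" returns a new instance instead of changing this one
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```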
You should use final and immutable fields wherever possible; any other data that you need to change, update inside:
synchronized (this) {
    // update
}
And remember, sometimes stuff breaks, and when it does you don't want to prolong the program's execution by trying every possible way to work around it; instead, "fail fast".
As you have asked about "thread-safety" and not concurrent performance, then your approach is essentially sound. However, a thread-safe program that uses synchronisation probably does not scale much in a multi cpu environment with any level of contention on your structure/program.
Personally I like to try and identify the highest level state changes and try and think about how to make them atomic, and have the state changes move from one immutable state to another – copy-on-write if you like. Then the actual write can be either a compare-and-set operation on an atomic variable or a synchronised update or whatever strategy works/performs best (as long as it safely publishes the new state).
This can be a bit difficult to structure if your new state is quite different (requires updates to several fields for instance), but I have seen it very successfully solve concurrent performance issues with synchronised access.
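The copy-on-write, compare-and-set strategy described above can be sketched like this (a hypothetical holder class; the whole state is one immutable snapshot behind an AtomicReference):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class CopyOnWriteState {
    // the entire state is one immutable snapshot behind an atomic reference
    private final AtomicReference<List<String>> state =
            new AtomicReference<>(Collections.emptyList());

    void add(String item) {
        while (true) {
            List<String> current = state.get();
            List<String> next = new ArrayList<>(current);
            next.add(item);
            // compare-and-set safely publishes the new state;
            // retry if another thread won the race
            if (state.compareAndSet(current, Collections.unmodifiableList(next))) {
                return;
            }
        }
    }

    List<String> snapshot() {
        return state.get();
    }
}
```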
Buy and read Brian Goetz's "Java Concurrency in Practice".
Any variables (memory) accessible by multiple threads potentially at the same time, need to be protected by a synchronisation mechanism.
Possibly similar question:
Do you ever use the volatile keyword in Java?
Today I was debugging my game; It had a very difficult threading problem that would show up every few minutes, but was difficult to reproduce. So first I added the synchronized keyword to each of my methods. That didn't work. Then I added the volatile keyword to every field. The problem seemed to just fix itself.
After some experimentation I found that the field responsible was a GameState object which kept track of my game's current state, which can be either playing or busy. When busy, the game ignores user input. What I had was a thread that constantly changed the state variable, while the Event thread reads the state variable. However, after one thread changes the variable, it takes several seconds for the other thread to recognize the changes, which ultimately causes the problem.
It was fixed by making the state variable volatile.
Why aren't variables in Java volatile by default and what's a reason not to use the volatile keyword?
To make a long story short, volatile variables, be they in Java or C#, are never cached locally within a thread: every read and write goes to the shared, coherent view of the variable rather than to a thread-private copy. This doesn't have much of an implication unless you're dealing with a multiprocessor/multicore CPU with threads executing on different cores. Declaring a variable volatile also restricts the optimisations the compiler and runtime may perform, so doing it unnecessarily (when most variables don't need to be volatile) would inflict a performance penalty (paltry as it may or may not be) for a relatively small gain.
Volatiles are really only needed when you're trying to write low-level thread-safe, lock-free code. Most of your code probably shouldn't be either thread-safe or lock-free. In my experience, lock-free programming is only worth attempting after you've found that the simpler version which does do locking is incurring a significant performance hit due to the locking.
The more pleasant alternative is to use other building blocks in java.util.concurrent, some of which are lock-free but don't mess with your head quite as much as trying to do it all yourself at a low level.
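For instance, a lock-free building block like AtomicInteger handles the bookkeeping for you (a minimal sketch):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger hits = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < 1_000; i++) {
                hits.incrementAndGet();   // lock-free atomic increment
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(hits.get());   // 2000
    }
}
```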
Volatility has its own performance costs, and there's no reason why most code should incur those costs.
Personally I think fields should have been final by default and mutable only with an extra keyword, but that boat sailed a long time ago. ;)
While others are correct in pointing out why it would be a bad idea to default to volatile, there's another point to make: there is very likely a bug in your code.
Variables seldom need to be made volatile: there is always a way to properly synchronize access to variables (either with the synchronized keyword or using AtomicXxx classes from java.util.concurrent); exceptions would include JNI code manipulating them (which is not bound by synchronization directives).
So instead of adding volatile, you may want to figure out WHY it resolved the problem. It isn't the only way to solve it, and there is probably a better way.
Because the compiler can't optimise volatile variables.
volatile tells the compiler that the variable can change at any time. Therefore, it can't assume that the variable won't change and optimise accordingly.
Declaring variables volatile generally has a huge impact on performance. On traditional single-threaded systems it was relatively easy to know what needed to be volatile: it was the things that accessed hardware.
On multi-threaded systems it can be a little more complex, but I would generally encourage using notifications and event queues to pass data between threads in lieu of magic variables. In Java it may not matter much; in C/C++ you would get into trouble when those variables cannot be set atomically by the underlying hardware.
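A minimal sketch of the event-queue approach in Java: a BlockingQueue hands data between threads, so no shared "magic variable" is needed at all.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> events = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                events.put("state-changed");   // safely hands the event off
            } catch (InterruptedException ignored) {
            }
        });
        producer.start();

        // take() blocks until an event arrives; no shared flag, no volatile
        System.out.println(events.take());
        producer.join();
    }
}
```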