I wonder whether a method is thread safe if it just calls another thread-safe method. Here is an example. As the docs state, ConcurrentSkipListSet is thread-safe.
public class MessageHolder {
    private Set<String> processedIds;

    public MessageHolder() {
        this.processedIds = new ConcurrentSkipListSet<>();
    }

    public void add(String id) {
        processedIds.add(id);
    }

    public boolean contains(String id) {
        return processedIds.contains(id);
    }

    public void remove(String id) {
        processedIds.remove(id);
    }
}
You may ask why I am not using ConcurrentSkipListSet directly. The reason is that I want to create an interface for the actions performed here, and this example will be the in-memory version.
I think you deserve some clarification on the comments, and also on what causes a race condition.
On race conditions: the basic idea of all threading is that at any point during execution the current thread could be rescheduled to run later, or another thread might be executing and accessing the same data in parallel.
For one, processedIds should be final, as mentioned before. Your code will not be thread safe until you do this. Even though ConcurrentSkipListSet is thread safe, that doesn't stop the processedIds variable itself from being reassigned by another thread. Also, Java is a bit odd here: the processedIds field must be marked final to guarantee that it is initialized before the constructor completes. For more reading, I found a Stack Overflow post that explains some of the issues with object construction in Java: Constructor synchronization in Java. Basically, don't mark your constructor as synchronized, but if you want to guarantee field initialization in the constructor, which in this case you do, then mark your field as final.
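Here is a minimal sketch of that change (the same class as above, just with the field made final):

public class MessageHolder {
    // final guarantees the field can never be reassigned and that any thread
    // which sees a MessageHolder reference also sees the fully initialized set.
    private final Set<String> processedIds = new ConcurrentSkipListSet<>();

    public void add(String id) {
        processedIds.add(id);
    }

    public boolean contains(String id) {
        return processedIds.contains(id);
    }

    public void remove(String id) {
        processedIds.remove(id);
    }
}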
To answer your question on whether this is thread safe, the only answer is that it depends on the client usage you are expecting. The methods you provided are indeed thread safe for their intended purposes as written, but a client could use them and still produce a race condition. Your intuition on whether it would be 100% necessary to use the synchronized keyword is correct. Yet what the comments are alluding to is that not making these methods explicitly thread safe might have dire consequences in the future for the maintainability and correctness of your code.
A client could still use the API you provide in an unsafe way that results in a race condition, as mentioned in one of the comments. If you are providing an interface to a client you may not care about this... or you might care and want to provide a mechanism for the client to make multiple accesses to your class with guaranteed thread safety.
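For example, here is a hypothetical caller (process(id) stands in for whatever work the client does): each call is thread safe on its own, but the check-then-act sequence as a whole is not atomic, so two threads can both see contains(id) == false and both process the same message.

public void handleOnce(MessageHolder holder, String id) {
    if (!holder.contains(id)) {
        // another thread may add the same id right here,
        // between the contains() check and the add()
        holder.add(id);
        process(id); // hypothetical processing step; may now run twice for one id
    }
}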
Overall, I would probably recommend that you mark your methods as synchronized, for a couple of reasons: 1) it makes it clear to the client that they are accessing methods that are thread safe, which is very important; I can easily imagine a situation where the client decides to use a lock when it isn't needed, to the detriment of performance. 2) Someone could change your methods in the future to contain different logic that requires the synchronized keyword (this isn't so unlikely, since you already seem to be in a threaded environment).
Consider the following scenario.
public synchronized String getData() {
    return getInfo();
}

private String getInfo() {
    // the operation here should be synchronized
    return "Synchronize Info";
}
If someone bypasses the flow of this operation, the system may become unstable. If the Java compiler forced getInfo to be synchronized, that kind of issue wouldn't exist. But the Java compiler doesn't force this, so the developer is responsible for deciding whether or not to make getInfo synchronized. Why doesn't Java force getInfo to be synchronized?
It's impossible for the Java compiler to enforce this usefully. You could invent a rule that synchronized methods can only call methods of the same object if they are synchronized, but it's not helpful except in quite narrow scenarios. If getInfo() were public, getData() could easily call a different class, which calls back to getInfo() on the original object.
Also, synchronization can be used on individual statements rather than whole methods, and there are other locks too. Trying to design compiler rules to prevent synchronized code using unsynchronized data for all these cases would be complicated or impossible.
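As an illustrative sketch (the field and lock names here are made up), a synchronized block can guard just the statements that touch shared state, and the compiler has no way of knowing whether that narrower scope is intentional or a mistake:

private final Object lock = new Object();
private String info = "Synchronize Info";

public String getData() {
    // Guard only the statement that reads the shared field,
    // rather than marking the whole method synchronized.
    synchronized (lock) {
        return info;
    }
}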
Actually, it's legitimate to want to call a method without the cost of synchronization at times when it's known that only one thread is using the object. Deciding which parts of the object should be locked, and when, is a complicated design problem that only the human programmer can solve.
So Java can't/won't prevent you from writing code that is not thread-safe. It doesn't care. If by "system" you mean "computer", it will not "become unstable". At worst, that one class will be buggy.
From the Java documentation at http://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html, "it is not possible for two invocations of synchronized methods on the same object to interleave".
In other words, if you call getData, then any future calls to getData on the same object will wait until that one is done. So even while the control flow is inside getInfo, the system will still block any further calls to getData until the first one returns, so there is no need for getInfo itself to be synchronized. I hope that helps!
I could find the answer if I read a complete chapter/book about multithreading, but I'd like a quicker answer. (I know this stackoverflow question is similar, but not sufficiently.)
Assume there is this class:
public class TestClass {
    private int someValue;

    public int getSomeValue() { return someValue; }

    public void setSomeValue(int value) { someValue = value; }
}
There are two threads (A and B) that access the instance of this class. Consider the following sequence:
A: getSomeValue()
B: setSomeValue()
A: getSomeValue()
If I'm right, someValue must be volatile, otherwise the 3rd step might not return the up-to-date value (because A may have a cached value). Is this correct?
Second scenario:
B: setSomeValue()
A: getSomeValue()
In this case, A will always get the correct value, because this is its first access, so it can't have a cached value yet. Is this right?
If a class is accessed only in the second way, there is no need for volatile/synchronization, or is there?
Note that this example was simplified, and actually I'm wondering about particular member variables and methods in a complex class, and not about whole classes (i.e. which variables should be volatile or have synced access). The main point is: if more threads access certain data, is synchronized access needed by all means, or does it depend on the way (e.g. order) they access it?
After reading the comments, I try to present the source of my confusion with another example:
1. From the UI thread: threadA.start()
2. threadA calls getSomeValue(), and informs the UI thread
3. The UI thread gets the message (in its message queue), so it calls threadB.start()
4. threadB calls setSomeValue(), and informs the UI thread
5. The UI thread gets the message, and informs threadA (in some way, e.g. via a message queue)
6. threadA calls getSomeValue()
This is a totally synchronized structure, but why does this imply that threadA will get the most up-to-date value in step 6? (if someValue is not volatile, or not put into a monitor when accessed from anywhere)
If two threads are calling the same methods, you can't make any guarantees about the order that said methods are called. Consequently, your original premise, which depends on calling order, is invalid.
It's not about the order in which the methods are called; it's about synchronization. It's about using some mechanism to make one thread wait while the other fully completes its write operation. Once you've made the decision to have more than one thread, you must provide that synchronization mechanism to avoid data corruption.
As we all know, it is the critical state of the data that we need to protect, and the atomic statements that govern that critical state must be synchronized.
I had an example where I used volatile, and then I used two threads that each incremented a counter by 1 until reaching 10000, so the total should be 20000. But to my surprise it didn't always happen.
Then I used the synchronized keyword to make it work.
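A sketch of that experiment (class and method names here are illustrative): with volatile alone, the increments from the two threads can interleave and updates get lost, while the synchronized version always reaches 20000.

public class CounterDemo {
    private static volatile int unsafeCounter = 0;
    private static int safeCounter = 0;

    // volatile makes the value visible, but ++ is still read-modify-write,
    // so two threads can read the same value and lose an increment.
    private static void unsafeIncrement() {
        unsafeCounter++;
    }

    // synchronized makes the whole read-modify-write atomic.
    private static synchronized void safeIncrement() {
        safeCounter++;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                unsafeIncrement();
                safeIncrement();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("volatile only: " + unsafeCounter); // often less than 20000
        System.out.println("synchronized:  " + safeCounter);   // always 20000
    }
}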
Synchronization makes sure that when a thread is accessing a synchronized method, no other thread is allowed to access this or any other synchronized method of that object, ensuring that data corruption does not occur.
A thread-safe class is one that maintains its correctness in the presence of scheduling and interleaving by the underlying runtime environment, without any additional thread-safety mechanism on the part of the client code that accesses it.
Let's look at the book.
A field may be declared volatile, in which case the Java memory model (§17) ensures that all threads see a consistent value for the variable.
So volatile is a guarantee that the declared variable won't be copied into thread local storage, which is otherwise allowed. It's further explained that this is an intentional alternative to locking for very simple kinds of synchronized access to shared storage.
Also see this earlier article, which explains that int access is necessarily atomic (but not double or long).
These together mean that if your int field is declared volatile then no locks are necessary to guarantee atomicity: you will always see a value that was last written to the memory location, not some confused value resulting from a half-complete write (as is possible with double or long).
However, you seem to imply that your getters and setters themselves are atomic. This is not guaranteed. The JVM can interrupt execution at intermediate points during the call or return sequence. In this example, that has no consequences. But if the calls had side effects, e.g. setSomeValue(++val), then you would have a different story.
The issue is that Java is simply a specification. There are many JVM implementations and many physical operating environments, and on any given combination an action may be safe or unsafe. For instance, on single-processor systems the volatile keyword in your example is probably completely unnecessary. Since the writers of the memory and language specifications can't reasonably account for every possible set of operating conditions, they choose to white-list certain patterns that are guaranteed to work on all compliant implementations. Adhering to these guidelines ensures both that your code will work on your target system and that it will be reasonably portable.
In this case "caching" typically refers to activity that is going on at the hardware level. There are certain events in Java that cause cores on a multiprocessor system to "synchronize" their caches. Accesses to volatile variables are an example of this, and synchronized blocks are another. Imagine a scenario where two threads X and Y are scheduled to run on different processors.
X starts and is scheduled on proc 1
Y starts and is scheduled on proc 2
    ... now you have two threads executing simultaneously.
    To speed things up, each processor checks its local cache
    before going to main memory, because main memory is expensive.

X calls setSomeValue('x-value')  // assuming proc 1's cache is empty, the cache is set
                                 // the value is put on the bus to be flushed
                                 // to main memory
                                 // now all gets on proc 1 will read from the cache instead
                                 // of engaging the memory bus to go to main memory

Y calls setSomeValue('y-value')  // the same thing happens for proc 2

// Now, depending on the order in which things are scheduled and which thread
// you are calling from, calls to getSomeValue() may return 'x-value' or
// 'y-value'. The results are completely unpredictable.
The point is that volatile (on compliant implementations) ensures that ordered writes will always be flushed to main memory and that other processors' caches will be flagged as 'dirty' before the next access, regardless of the thread from which that access occurs.
disclaimer: volatile DOES NOT LOCK. This is important especially in the following case:
volatile int counter;

public void incrementSomeValue() {
    counter++; // Bad thread juju - this is at least three instructions:
               // read - increment - write
               // there is no guarantee that this operation is atomic
}
This could be relevant to your question if your intent is that setSomeValue must always be called before getSomeValue.
If the intent is that getSomeValue() must always reflect the most recent call to setSomeValue(), then this is a good place for the volatile keyword. Just remember that without it there is no guarantee that getSomeValue() will reflect the most recent call to setSomeValue(), even if setSomeValue() was scheduled first.
If I'm right, someValue must be volatile, otherwise the 3rd step might not return the up-to-date value (because A may have a cached value). Is this correct?
If thread B calls setSomeValue(), you need some sort of synchronization to ensure that thread A can read that value. volatile won't accomplish this on its own, and neither will making the methods synchronized. The code that does this is ultimately whatever synchronization code you added that made sure that A: getSomeValue() happens after B: setSomeValue(). If, as you suggest, you used a message queue to synchronize threads, this works because the memory changes made by thread B become visible to thread A once thread A acquires the lock on your message queue.
If a class is accessed only in the second way, there is no need for volatile/synchronization, or is there?
If you are really doing your own synchronization, then it doesn't sound like you care whether these classes are thread-safe. Be sure that you aren't accessing them from more than one thread at the same time, though; otherwise, any methods that aren't atomic (assigning an int is) may leave you in an unpredictable state. One common pattern is to put the shared state into an immutable object, so that you are sure the receiving thread isn't calling any setters.
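As a sketch of that immutable-object pattern (the class here is hypothetical): all fields are final and there are no setters, so the receiving thread can only read state that was fixed at construction.

// Immutable: all fields final, no setters, state fixed at construction.
// Safe to hand between threads without further synchronization,
// provided the reference itself is published safely (e.g. via a queue).
public final class SensorReading {
    private final long timestamp;
    private final double value;

    public SensorReading(long timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    public long getTimestamp() { return timestamp; }
    public double getValue()   { return value; }
}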
If you do have a class that you want to be updated and read from multiple threads, I'd probably do the simplest thing to start, which is often to synchronize all public methods. If you really believe this to be a bottleneck, you could look into some of the more complex locking mechanisms in Java.
So what does volatile guarantee?
For the exact semantics, you might have to go read tutorials, but one way to summarize it is that 1) any memory changes made by the last thread to access the volatile will be visible to the current thread accessing the volatile, and 2) that accessing the volatile is atomic (it won't be a partially constructed object, or a partially assigned double or long).
Synchronized blocks have analogous properties: 1) any memory changes made by the last thread to access the lock will be visible to this thread, and 2) the changes made within the block are performed atomically with respect to other synchronized blocks.
(1) means any memory changes, not just changes to the volatile (we're talking post JDK 1.5) or within the synchronized block. This is what people mean when they refer to ordering, and this is accomplished in different ways on different chip architectures, often by using memory barriers.
Also, in the case of synchronized blocks, (2) only guarantees that you won't see inconsistent values if you are within another block synchronized on the same lock. It's usually a good idea to synchronize all access to shared variables, unless you really know what you are doing.
Consider this class, with no instance variables and only non-synchronized methods. Can we infer from this information that the class is thread-safe?
public class test {

    public void test1() {
        // do something
    }

    public void test2() {
        // do something
    }

    public void test3() {
        // do something
    }
}
It depends entirely on what state the methods mutate. If they mutate no shared state, they're thread safe. If they mutate only local state, they're thread-safe. If they only call methods that are thread-safe, they're thread-safe.
Not being thread safe means that if multiple threads try to access the object at the same time, something might change from one access to the next, and cause issues. Consider the following:
int incrementCount() {
    this.count++;
    // ... Do some other stuff
    return this.count;
}
would not be thread safe. Why is it not? Imagine thread 1 accesses it, count is increased, then some processing occurs. While going through the function, another thread accesses it, increasing count again. The first thread, which had it go from, say, 1 to 2, would now have it go from 1 to 3 when it returns. Thread 2 would see it go from 1 to 3 as well, so what happened to 2?
In this case, you would want something like this (keeping in mind that this isn't language-specific code, but it's closest to Java, one of only two languages I've done threading in):
synchronized int incrementCount() {
    this.count++;
    // ... Do some other stuff
    return this.count;
}
The synchronized keyword here makes sure that as long as one thread is accessing it, no other thread can. This means that thread 1 hits it and count goes from 1 to 2, as expected. Thread 2 hits it while thread 1 is processing and has to wait until thread 1 is done. When it is done, thread 1 gets a return value of 2, then thread 2 goes through and gets the expected 3.
Now, an example, similar to what you have there, that would be entirely thread-safe, no matter what:
int incrementCount(int count) {
    count++;
    // ... Do some other stuff
    return count;
}
As the only variables being touched here are fully local to the function, there is no case where two threads accessing it at the same time could try working with data changed from the other. This would make it thread safe.
So, to answer the question, assuming that the functions don't modify anything outside of the specific called function, then yes, the class could be deemed to be thread-safe.
Consider the following quote from an article about thread safety ("Java theory and practice: Characterizing thread safety"):
In reality, any definition of thread safety is going to have a certain degree of circularity, as it must appeal to the class's specification -- which is an informal, prose description of what the class does, its side effects, which states are valid or invalid, invariants, preconditions, postconditions, and so on. (Constraints on an object's state imposed by the specification apply only to the externally visible state -- that which can be observed by calling its public methods and accessing its public fields -- rather than its internal state, which is what is actually represented in its private fields.)
Thread safety
For a class to be thread-safe, it first must behave correctly in a single-threaded environment. If a class is correctly implemented, which is another way of saying that it conforms to its specification, no sequence of operations (reads or writes of public fields and calls to public methods) on objects of that class should be able to put the object into an invalid state, observe the object to be in an invalid state, or violate any of the class's invariants, preconditions, or postconditions.
Furthermore, for a class to be thread-safe, it must continue to behave correctly, in the sense described above, when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, without any additional synchronization on the part of the calling code. The effect is that operations on a thread-safe object will appear to all threads to occur in a fixed, globally consistent order.
So your class itself is thread-safe, as long as it doesn't have any side effects. As soon as the methods mutate any external objects (e.g. some singletons, as already mentioned by others) it's not any longer thread-safe.
Depends on what happens inside those methods. If they manipulate / call any method parameters or global variables / singletons which are not themselves thread safe, the class is not thread safe either.
(Yes, I see that the methods as shown here have no parameters and empty bodies, so this is obviously not full working code.)
Yes, as long as there are no instance variables. Method calls using only input parameters and local variables are inherently thread-safe. You might consider making the methods static too, to reflect this.
If it has no mutable state - it's thread safe. If you have no state - you're thread safe by association.
No, I don't think so.
For example, one of the methods could obtain a (non-thread-safe) singleton object from another class and mutate that object.
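A sketch of what such a method could be hiding (the Registry singleton here is hypothetical):

public class test {
    public void test1() {
        // Looks stateless, but mutates shared state owned by another class.
        // If Registry is not thread safe, concurrent calls to test1()
        // can corrupt its internal map.
        Registry.getInstance().register("some-key", "some-value");
    }
}

// Hypothetical non-thread-safe singleton
class Registry {
    private static final Registry INSTANCE = new Registry();
    private final java.util.Map<String, String> entries = new java.util.HashMap<>();

    static Registry getInstance() { return INSTANCE; }

    void register(String key, String value) {
        entries.put(key, value); // HashMap is not safe for concurrent writes
    }
}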
Yes - this class is thread safe but this does not mean that your application is.
An application is thread safe if the threads in it cannot concurrently access heap state. All objects in Java (and therefore all of their fields) are created on the heap. So, if there are no fields in an object then it is thread safe.
In any practical application, objects will have state. If you can guarantee that these objects are not accessed concurrently then you have a thread safe application.
There are ways of optimizing access to shared state, e.g. atomic variables or careful use of the volatile keyword, but I think this goes beyond what you've asked.
I hope this helps.
From Sun's tutorial:
Synchronized methods enable a simple strategy for preventing thread interference and memory consistency errors: if an object is visible to more than one thread, all reads or writes to that object's variables are done through synchronized methods. (An important exception: final fields, which cannot be modified after the object is constructed, can be safely read through non-synchronized methods, once the object is constructed) This strategy is effective, but can present problems with liveness, as we'll see later in this lesson.
Q1. Do the above statements mean that if an object of a class is going to be shared among multiple threads, then all instance methods of that class (except getters of final fields) should be made synchronized, since instance methods operate on instance variables?
In order to understand concurrency in Java, I recommend the invaluable Java Concurrency in Practice.
In response to your specific question, although synchronizing all methods is a quick-and-dirty way to accomplish thread safety, it does not scale well at all. Consider the much maligned Vector class. Every method is synchronized, and it works terribly, because iteration is still not thread safe.
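A sketch of that problem: each Vector call is individually synchronized, but a loop over the vector is a sequence of separately locked calls, so another thread can shrink the vector between size() and get(i) unless the caller adds its own client-side lock.

import java.util.Vector;

public class VectorIterationRace {
    private final Vector<String> items = new Vector<>();

    // Each call below is individually synchronized by Vector, but the loop
    // as a whole is not: another thread may remove an element between
    // size() and get(i), causing ArrayIndexOutOfBoundsException.
    public void printAllUnsafely() {
        for (int i = 0; i < items.size(); i++) {
            System.out.println(items.get(i));
        }
    }

    // Client-side locking on the vector itself makes the iteration atomic.
    public void printAllSafely() {
        synchronized (items) {
            for (int i = 0; i < items.size(); i++) {
                System.out.println(items.get(i));
            }
        }
    }
}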
No. It means that synchronized methods are a way to achieve thread safety, but they're not the only way and, by themselves, they don't guarantee complete safety in all situations.
Not necessarily. You can synchronize (e.g. lock on a dedicated object) just the part of the method where you access the object's variables, for example. In other cases, you may delegate the job to some inner object(s) that already handle synchronization.
There are lots of choices; it all depends on the algorithm you're implementing. That said, the 'synchronized' keyword is usually the simplest one.
edit
There is no comprehensive tutorial on that, each situation is unique. Learning it is like learning a foreign language: never ends :)
But there are certainly helpful resources. In particular, there is a series of interesting articles on Heinz Kabutz's website.
http://www.javaspecialists.eu/archive/Issue152.html
(see the full list on the page)
If other people have any links I'd be interested to see them too. I find the whole topic quite confusing (and probably the most difficult part of core Java), especially since new concurrency mechanisms were introduced in Java 5.
Have fun!
In the most general form yes.
Immutable objects need not be synchronized.
Also, you can use individual monitors/locks for the mutable instance variables (or groups thereof), which will help with liveness, and synchronize only the portions where data is changed, rather than the entire method.
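A sketch of that idea (field and lock names are illustrative): independent pieces of state get their own lock objects, so a thread updating one doesn't block a thread updating the other, and each critical section covers only the statements that change data.

public class UserStats {
    private final Object loginLock = new Object();
    private final Object purchaseLock = new Object();

    private int loginCount;      // guarded by loginLock
    private int purchaseCount;   // guarded by purchaseLock

    public void recordLogin() {
        synchronized (loginLock) {
            loginCount++;
        }
    }

    public void recordPurchase() {
        synchronized (purchaseLock) {
            purchaseCount++;
        }
    }
}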
synchronized methodName vs synchronized( object )
That's correct, and it is one alternative. I think it would be more efficient to synchronize access to that object only, instead of synchronizing all its methods.
While the difference may be subtle, it matters if you also use that same object from a single thread,
i.e. (using the synchronized keyword on the method):
class SomeClass {
    private int clickCount = 0;

    public synchronized void click() {
        clickCount++;
    }
}
When a class is defined like this, only one thread at a time may invoke the click method.
What happens if this method is invoked very frequently in a single-threaded app? You'll spend extra time checking whether the thread can acquire the object lock when it is never actually needed.
class Main {
    public static void main(String[] args) {
        SomeClass someObject = new SomeClass();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            someObject.click();
        }
    }
}
In this case, the check to see whether the thread can lock the object will be performed unnecessarily Integer.MAX_VALUE (2,147,483,647) times.
So removing the synchronized keyword in this situation will run much faster.
So, how would you do that in a multithreaded application? You just synchronize on the object:
synchronized (someObject) {
    someObject.click();
}
Vector vs ArrayList
As an additional note, this distinction (synchronized methodName vs. synchronized(object)) is, by the way, one of the reasons why java.util.Vector has been replaced by java.util.ArrayList: many of Vector's methods are synchronized.
Most of the time a list is used in a single-threaded app or piece of code (e.g. code inside JSPs/servlets is executed in a single thread), and the extra synchronization of Vector doesn't help performance.
Same goes for Hashtable being replaced by HashMap
In fact, getters should be synchronized too, or the fields should be made volatile. That is because when you get some value, you're probably interested in the most recent version of that value. You see, synchronized block semantics provide not only atomicity of execution (i.e. a guarantee that only one thread executes the block at a time), but also visibility. This means that when a thread enters a synchronized block it invalidates its local cache, and when it exits it flushes any variables that have been modified back to main memory. volatile variables have the same visibility semantics.
No. Even getters have to be synchronized, except when they access only final fields. The reason is that, for example, when accessing a long value, there is a tiny chance that another thread is currently writing it, and you read it while only the first 4 bytes have been written and the other 4 bytes still hold the old value.
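A sketch of the hazard being described (nothing in this code forces a torn read to happen, but the Java memory model permits it for non-volatile long and double fields):

public class TornRead {
    // Without volatile, a 64-bit write may be performed as two 32-bit
    // writes, so a concurrent reader can observe half old / half new bits.
    private long value;                 // unsafe
    private volatile long safeValue;    // volatile reads/writes of long are atomic

    public void set(long v)  { value = v; safeValue = v; }
    public long getUnsafe()  { return value; }     // may be torn
    public long getSafe()    { return safeValue; } // never torn
}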
Yes, that's correct. All methods that modify data or access data that may be modified by a different thread need to be synchronized on the same monitor.
The easy way is to mark the methods as synchronized. If these are long-running methods, you may want to synchronize only the parts that do the reading/writing. In that case you would define the monitor yourself, along with wait() and notify().
The simple answer is yes.
If an object of the class is going to be shared by multiple threads, you need to synchronize the getters and setters to prevent data inconsistency.
If every thread had its own separate copy of the object, then there would be no need to synchronize the methods. If your instance methods do more than simple sets and gets, you must also consider the risk of threads waiting for a long-running getter/setter to finish.
You could use synchronized methods, synchronized blocks, concurrency tools such as Semaphore or if you really want to get down and dirty you could use Atomic References. Other options include declaring member variables as volatile and using classes like AtomicInteger instead of Integer.
It all depends on the situation, but there are a wide range of concurrency tools available - these are just some of them.
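For instance, here is a minimal sketch of the AtomicInteger option mentioned above (class and method names are made up):

import java.util.concurrent.atomic.AtomicInteger;

public class HitCounter {
    private final AtomicInteger hits = new AtomicInteger();

    // incrementAndGet() performs the read-modify-write atomically,
    // without taking a lock, so no synchronized keyword is needed.
    public int recordHit() {
        return hits.incrementAndGet();
    }

    public int currentHits() {
        return hits.get();
    }
}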
Synchronization can result in hold-and-wait deadlock, where two threads each hold the lock of one object and try to acquire the lock held by the other thread.
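A sketch of that hold-and-wait pattern (class and lock names are illustrative): if one thread runs transferAtoB while another runs transferBtoA, each holds one lock and waits forever for the other.

public class DeadlockSketch {
    private final Object lockA = new Object();
    private final Object lockB = new Object();

    public void transferAtoB() {
        synchronized (lockA) {          // thread 1 holds lockA...
            synchronized (lockB) {      // ...and waits for lockB
                // move data from A to B
            }
        }
    }

    public void transferBtoA() {
        synchronized (lockB) {          // thread 2 holds lockB...
            synchronized (lockA) {      // ...and waits for lockA
                // move data from B to A
            }
        }
    }
}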
Synchronization must also be applied consistently across a class, and an easy mistake to make is to forget to synchronize a method. When a thread holds the lock for an object, other threads can still access the non-synchronized methods of that object.