do I need a lock? - java

MyObject myObj ...
public void updateObj(){
    MyObject newObj = getNewMyObject();
    myObj = newObj;
}
public int getSomething(){
    // O(n^2) operation performed in getSomething method
    int something = myObj.getSomething();
    return something;
}
Suppose the main thread periodically calls updateObj() and a child thread calls getSomething() quite often.
Do I need a lock (or to declare the methods synchronized) around myObj = newObj; and int something = myObj.getSomething();?
Someone argued that I don't need a lock here because, in Java, an assignment such as myObj = newObj; is atomic. What I don't get is that myObj.getSomething() is not an atomic operation (it is O(n^2)), so I think locking is still needed. Is this correct?

You need to declare myObj volatile; otherwise getSomething() may not see the updated object. Other than that, I cannot see any synchronization issues in the above code.

Yes, you must synchronize your access to the shared variable properly. According to the Java Memory Model, your code doesn't guarantee a happens-before relationship between the read and the write in all possible executions.
Proper synchronization doesn't always mean using locks; in your case, declaring myObj as volatile can do the job (assuming that you don't mutate the object after construction).

The write to a reference variable is indeed atomic.
But without a lock (or some other kind of synchronization), other threads are not guaranteed to see the updated value of myObj; in the extreme case they may only ever see its initial value (null, if you did not assign it in the constructor).
You will be hard-pressed to write a program that exhibits exactly this extreme behavior, but without synchronization you can get inconsistent results when one thread calls updateObj and then other threads call getSomething and still use the obsolete myObj instance.
The simplest way to ensure that getSomething uses the latest myObj is to declare myObj as volatile.
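A minimal sketch of the volatile approach described above. The names Snapshot and SnapshotHolder are made up for illustration (they stand in for MyObject and the enclosing class), and the sketch assumes the object is never mutated after construction:

```java
// Hypothetical stand-in for MyObject: immutable after construction.
final class Snapshot {
    private final int[] values;

    Snapshot(int[] values) {
        this.values = values.clone(); // defensive copy, never modified afterwards
    }

    // Deliberately O(n^2): counts pairs (i, j) with i < j and equal values.
    int getSomething() {
        int pairs = 0;
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                if (values[i] == values[j]) {
                    pairs++;
                }
            }
        }
        return pairs;
    }
}

// Hypothetical stand-in for the class that owns myObj.
final class SnapshotHolder {
    // volatile: a write by one thread is visible to later reads by other threads.
    private volatile Snapshot myObj = new Snapshot(new int[0]);

    // Called periodically by the updating thread.
    void updateObj(Snapshot newObj) {
        myObj = newObj; // atomic reference write, published through volatile
    }

    // Called often by the reading thread; no lock is held during the O(n^2) work.
    int getSomething() {
        Snapshot current = myObj; // read the field once, then work on that instance
        return current.getSomething();
    }
}
```

The long computation runs on whichever instance was current when it started, so a concurrent updateObj() does not disturb it; the next call simply picks up the newer instance.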

Yes, you need some locking here, because the two methods are called by different threads and getSomething() uses myObj.
Now suppose your child thread is executing getSomething() while the main thread changes myObj: your program is then subject to a race condition and you may not get the desired output.
It does depend on the overall program you write, though. Hope this helps.

The updateObj() needs to be synchronized so that the shared mutable myObj is written to atomically.
MyObject.getSomething() needs to be a synchronized method if it uses shared mutable values.

Related

Is the following code threadsafe?

I saw the following example in the book Java Concurrency in Practice. It says the class is thread-safe, but gives no information about the Person class. If Person is mutable, then after a Person object is added it can still be modified; for example, a value used by the equals method may be changed by another thread. In that case the following code would not be thread-safe. Is that a correct statement?
@ThreadSafe
public class PersonSet {
    @GuardedBy("this")
    private final Set<Person> mySet = new HashSet<Person>();
    public synchronized void addPerson(Person p) {
        mySet.add(p);
    }
    public synchronized boolean containsPerson(Person p) {
        return mySet.contains(p);
    }
}
Yes, it is thread-safe, in the sense that only one thread at a time can read the set via containsPerson or add to it via addPerson.
This class is thread-safe because it has a single piece of state, the Set itself, and it protects that state by allowing only one thread at a time to work on it.
However, it doesn't guarantee that a Person can't be modified by multiple threads. If you want that as well, you can either make Person immutable or make it thread-safe too, i.e. allow only one thread at a time to modify its state.
Your statement is correct: if the Person class is mutable, and an update is done on a field that contributes to hashCode and equals, then the PersonSet will have a problem - no matter in which thread.
The no duplicate Set contract will be broken, silently...
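A small single-threaded sketch of the problem just described. MutablePerson and BrokenSetDemo are hypothetical names; the book does not show the Person class, so this version, whose name field feeds equals and hashCode, is an assumption made for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable Person whose name feeds equals() and hashCode().
final class MutablePerson {
    private String name;

    MutablePerson(String name) {
        this.name = name;
    }

    void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutablePerson
                && ((MutablePerson) other).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

final class BrokenSetDemo {
    // Returns whether the set still finds the person after its name changed.
    static boolean foundAfterRename() {
        Set<MutablePerson> set = new HashSet<>();
        MutablePerson p = new MutablePerson("a");
        set.add(p);
        p.setName("b");         // hash code changes while p is inside the set
        return set.contains(p); // looked up under the new hash code
    }

    // Returns the set size after adding the same, renamed object a second time.
    static int sizeAfterReAdd() {
        Set<MutablePerson> set = new HashSet<>();
        MutablePerson p = new MutablePerson("a");
        set.add(p);
        p.setName("b");
        set.add(p);             // not recognised as a duplicate
        return set.size();
    }
}
```

Here foundAfterRename() returns false and sizeAfterReAdd() returns 2, even though only one thread and one object are involved; synchronizing PersonSet cannot prevent this.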
Here the object lock (this) is being used: the lock belongs to the PersonSet instance on which the methods are called, so no two threads can execute any of that instance's methods at the same time. The locking has no dependency on the Person class, so the PersonSet class is thread-safe.
The thread safety in question here only pertains to the set, not the contents of the set. There are no allowances or checks made on a retrieval (i.e. get) operation on the set, so that portion is thread unsafe. Since that's not exposed, then the operations of adding and checking to see if an element is contained in the set are indeed thread-safe.
Note that while these operations are indeed safe for threads, they're also not concurrent. Since synchronized will block multiple threads from interacting with the method at once, these methods will become a bottleneck if used in a highly concurrent environment.

How to specify an object to be locked in java using Lock

Using the synchronized keyword (intrinsic locking), we could do something like:
public void addSum(int a) {
    synchronized(q) {
        q.add(a); // q is, say, a Queue
    }
}
In the above code, when some object calls the addSum() method, i.e. x.addSum(10), the lock is held on q and not on x. So with synchronized we can lock an object other than the one the method was called on.
Below I'm using Lock from the java.util.concurrent package. Is there a way to specify which object the lock should be on (as in the snippet above, where synchronized specifies that the lock should be on q)? In the code below I haven't specified which object the lock applies to. Can it be done?
public void addSum(int a) {
    lock.tryLock();
    q.add(a);
    lock.unlock();
}
I did refer to http://docs.oracle.com/javase/tutorial/essential/concurrency/newlocks.html . However, I was looking for a much smaller example to clear up the concept.
No, Lock objects don't work the same way as synchronized. synchronized cannot start within a method invocation and reach outside that method invocation. The pattern you've shown
lock.tryLock();
q.add(a);
lock.unlock();
would only be possible if the opposite were true. Lock objects typically work by flipping on/off a switch/flag atomically, indicating they've acquired or released the lock.
I think you misunderstand what the word "lock" means. Suppose this method is called:
void foobar() {
    synchronized(x) {
        y.doSomething();
    }
}
We say that x is "locked" while the thread is in the y.doSomething() call, but that does not prevent other threads from accessing fields or updating fields of x. The synchronized keyword means one thing, and one thing only.
The JVM will not allow two threads to synchronize on the same object at the same time.
That's all it means. How you use it is up to you. My example is using it to prevent y.doSomething() from being called in more than one thread at the same time, but it only works if every call to y.doSomething() is protected in the same way, and it only works if x always refers to the same object.
The java.util.concurrent.ReentrantLock class works much the same way. The only guarantee that the JVM makes is that no two threads can "lock" the same ReentrantLock object at the same time. That's all it does. The rest is up to you.
P.S., Your second example does not test the value returned by lock.tryLock(). That's a mistake. If lock.tryLock() returns false, that means it failed to lock the lock.
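The points above can be sketched as follows. SumQueue is a hypothetical class name; the sketch assumes q is only ever touched inside this class, since the Lock protects q purely by convention:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class SumQueue {
    // The Lock is not attached to q; it protects q only because every
    // access to q in this class acquires this same lock first.
    private final Lock lock = new ReentrantLock();
    private final Queue<Integer> q = new ArrayDeque<>();

    // Blocking variant: waits until the lock is available.
    void addSum(int a) {
        lock.lock();
        try {
            q.add(a);
        } finally {
            lock.unlock(); // always release, even if add() throws
        }
    }

    // Non-blocking variant: returns false when the lock could not be taken.
    boolean tryAddSum(int a) {
        if (!lock.tryLock()) {
            return false; // lock not acquired, so q must not be touched
        }
        try {
            q.add(a);
            return true;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return q.size();
        } finally {
            lock.unlock();
        }
    }
}
```

Using try/finally keeps the lock from staying held if the guarded code throws, and checking the result of tryLock() avoids both touching q without the lock and calling unlock() on a lock the thread does not hold.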

how to locally synchronize two maps?

For instance,
class Test{
    static Map a = new ...
    static Map b = new ...
    public void process(){
        ...
        a.put(...);
        b.put(...);
    }
}
Do I have to lock like this:
synchronized(a){
    a.put(...);
}
synchronized(b){
    b.put(...);
}
This seems awkward. Is there a better way to do this? Thanks.
No, you need both operations in one synchronized block, otherwise another thread may see inconsistencies between the two maps.
One possible option would be using a synchronized method, or you could use some other private object or one of the maps as a monitor. Here is the synchronized method example:
class Test{
    static Map a = new ...
    static Map b = new ...
    public synchronized void process(){
        ...
        a.put(...);
        b.put(...);
    }
}
You can use a dedicated object like
Object mapLock = new Object();
to synchronize on.
Or you can sync on a, keeping in mind that even if you only need access to b you still need to sync on a.
Synchronizing on this is not a good idea in general. It is a bad habit that may eventually result in bad performance or non-obvious deadlocks, if not in this application then in others you write.
Avoid synchronized(this) in Java?
You can also consider using ReadWriteLock from the java.util.concurrent.locks package.
You do need to run both operations within one synchronized block. It is worth noting that in your example the maps are static while process() is an instance method. Synchronizing the method means that calls on the same instance are synchronized with each other, but calls on two different instances are not (the lock used when the synchronized keyword is applied to an instance method is effectively this). You could either make process() static, or use a synchronized(Test.class) {} block instead, to ensure that no race occurs.
You will also need to be careful about how you expose the maps to clients - if you're offering them up as properties, then I would probably wrap them with Collections.unmodifiableMap() to ensure that nothing else can go and screw with them while you're not looking - however that doesn't entirely protect against clients having an "odd" time as they will still see changes happen in potentially unsafe ways. As such, I'd also probably declare the types as ConcurrentHashMap to make things a little safer (although there are still some dangerous operations such as sharing an Iterator between threads)
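One way to put the advice from these answers together is sketched below. PairedMaps, MAP_LOCK and the key and value types are assumptions chosen for illustration; the original question leaves the map contents unspecified:

```java
import java.util.HashMap;
import java.util.Map;

final class PairedMaps {
    // One static lock for both static maps, shared by every instance.
    private static final Object MAP_LOCK = new Object();
    private static final Map<String, Integer> A = new HashMap<>();
    private static final Map<Integer, String> B = new HashMap<>();

    // Both puts happen under the same lock, so no thread going through
    // these methods can see an entry in A without its counterpart in B.
    void process(String key, int value) {
        synchronized (MAP_LOCK) {
            A.put(key, value);
            B.put(value, key);
        }
    }

    // Readers take the same lock to get a consistent view of both maps.
    static boolean consistent() {
        synchronized (MAP_LOCK) {
            return A.size() == B.size();
        }
    }

    static int size() {
        synchronized (MAP_LOCK) {
            return A.size();
        }
    }
}
```

Because the lock is static, calls made through different PairedMaps instances still exclude each other, which a synchronized instance method would not achieve. The maps are private and never handed out, so no outside code can bypass the lock.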

What should I use as a lock object of a synchronized statement in Java

Could anyone explain what is the difference between these examples?
Example # 1.
public class Main {
    private Object lock = new Object();
    private MyClass myClass = new MyClass();
    public void testMethod() {
        // TODO Auto-generated method stub
        synchronized (myClass) {
            // TODO: modify myClass variable
        }
    }
}
Example # 2.
package com.test;
public class Main {
    private MyClass myClass = new MyClass();
    private Object lock = new Object();
    public void testMethod() {
        // TODO Auto-generated method stub
        synchronized (lock) {
            // TODO: modify myClass variable
        }
    }
}
What should I use as a monitor lock if I need to take care about synchronization when modifying the variable?
Assuming that Main is not intended to be a "leaky abstraction", there is minimal difference between the first and second examples.
It may be better to use an Object rather than some other class because an Object instance has no fields and is therefore smaller. And the Object-as-lock idiom makes it clear that the lock variable is intended to only ever be used as a lock.
Having said that, there is a definite advantage in locking on an object that nothing else will ever see. The problem with a Main method synchronizing on a Main (e.g. this) is that other unrelated code could also be synchronizing on it for an unrelated purpose. By synchronizing on a dedicated (private) lock object you avoid that possibility.
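A minimal sketch of the private lock idiom this answer recommends. GuardedCounter is a hypothetical class, and the int counter stands in for whatever state Main would protect:

```java
final class GuardedCounter {
    // Private and final: no code outside this class can synchronize on it,
    // and the reference can never be swapped for a different object.
    private final Object lock = new Object();
    private int count; // guarded by lock

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }
}
```

Declaring the lock field final matters: if the reference could be reassigned, two threads might end up synchronizing on different objects and exclude nothing.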
In response to the comment:
There is a MAJOR difference in the two cases. In the first you're locking the object that you want to manipulate. In the second you're locking some other object that has no obvious relationship to the object being manipulated. And the second case takes more space, since you must allocate the (otherwise unused) Object, rather than using the already-existing instance you're protecting.
I think you are making an INCORRECT assumption - that MyClass is the data structure that needs protecting. In fact, the Question doesn't say that. Indeed the way that the example is written implies that the lock is intended to protect the entire Main class ... not just a part of its state. And in that context, there IS an obvious connection ...
The only case where it would be better to lock the MyClass would be if the Main was a leaky abstraction that allowed other code to get hold of its myClass reference. That would be bad design, especially in a multi-threaded app.
Based on the revision history, I'm pretty sure that is not the OP's intention.
A synchronized statement is useful when changing variables of an object.
You are changing variables of myClass, so you want to lock on the myClass object. If you were to change something in lock, then you would want to lock on the lock object.
In example #2 you are modifying myClass but locking on the lock object, which is nonsense.
In the first case you lock on an object that is known only within this method, so it is unlikely that anybody else will use the same object to lock on, so such a lock is almost useless. The second variant makes much more sense to me.
At the same time, the myClass variable is also known only within this method, so it is unlikely that another thread will access it, so a lock is probably not necessary here at all. A more complete example is needed to say more.
In general, you want to lock on the "root" object of the data you're manipulating. If you're, eg, going to subtract a value from a field in object A and add that value to object B, you need to lock some object that is somehow common (at least by convention) between A and B, possibly the "owner" object of the two. This is because you're doing the lock to maintain a "contract" of consistency between separate pieces of data -- the object locked must be common to and conceptually encompassing of the entire set of data that must be kept consistent.
The simple case, of course, is when you're modifying field A and field B in the same object, in which case locking that object is the obvious choice.
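The first case above, moving a value between two pieces of data, might look like the sketch below. Ledger is a hypothetical owner object, and the sketch follows this answer's convention of locking the owner itself:

```java
// Hypothetical owner of two balances. The Ledger instance is the common
// object that every access to either balance synchronizes on.
final class Ledger {
    private long balanceA; // guarded by this
    private long balanceB; // guarded by this

    Ledger(long balanceA, long balanceB) {
        this.balanceA = balanceA;
        this.balanceB = balanceB;
    }

    // Both updates happen under one lock, so no thread using these
    // methods can observe the amount as missing from both balances.
    synchronized void transferAToB(long amount) {
        balanceA -= amount;
        balanceB += amount;
    }

    synchronized long balanceA() {
        return balanceA;
    }

    // The invariant being protected: the total never changes.
    synchronized long total() {
        return balanceA + balanceB;
    }
}
```

The lock is chosen to cover the whole set of data that must stay consistent, which here is both balances, rather than either balance on its own.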
A little less obvious is when you're dealing with static data belonging to a single class. In that case you generally want to lock the class.
A separate "monitor" object -- created only to serve as a lockable entity -- is rarely needed in Java, but might apply to, say, elements of two parallel arrays, where you want to maintain consistency between element N of the two arrays. In that case, something like a 3rd array of monitor objects might be appropriate.
(Note that this is all just a "quick hack" at laying out some rules. There are many subtleties that one can run into, especially when attempting to allow the maximum of concurrent access to heavily-accessed data. But such cases are rare outside of high-performance computing.)
Whatever you choose, it's critical that the choice be consistent across all references to the protected data. You don't want to lock object A in one case and object B in another, when referencing/modifying the same data. (And PLEASE don't fall into the trap of thinking you can lock an arbitrary instance of Class A and that will somehow serve to lock another instance of Class A. That's a classical beginner's mistake.)
In your above example you'd generally want to lock the created object, assuming the consistency you're assuring is all internal to that object. But note that in this particular example, unless the constructor for MyClass somehow lets the object address "escape", there is no need to lock at all, since there is no way that another thread can get the address of the new object.
The differences are the class of the lock and its scope. Both topics are pretty much orthogonal to synchronization:
- objects of different classes may have different sizes
- objects in different scopes may be available in different contexts
Basically, both will behave the same with respect to synchronization.
Neither example is good synchronization practice.
The lock object should be placed in MyClass as a private field.

Is reference update thread safe?

public class Test{
    private MyObj myobj = new MyObj(); // it is not volatile
    public class Updater extends Thread{
        public void run(){
            myobj = getNewObjFromDb(); // now I am setting a new object
        }
    }
    public MyObj getData(){
        // getting stale data is fine
        return myobj;
    }
}
Updater regularly updates myobj.
Other classes fetch data using getData.
Is this code thread-safe without using the volatile keyword?
I think yes. Can someone confirm?
No, this is not thread safe. (What makes you think it is?)
If you are updating a variable in one thread and reading it from another, you must establish a happens-before relationship between the write and the subsequent read.
In short, this basically means making both the read and write synchronized (on the same monitor), or making the reference volatile.
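A minimal sketch of the first option, synchronizing both the read and the write on the same monitor. DataHolder is a hypothetical stand-in for the question's Test class, and String replaces MyObj to keep the example self-contained:

```java
final class DataHolder {
    private String data = "initial"; // guarded by this

    // Writer: leaving the synchronized method releases the monitor,
    // which publishes the new reference.
    synchronized void setData(String newData) {
        data = newData;
    }

    // Reader: acquiring the same monitor afterwards guarantees that the
    // most recent write made under that monitor is visible.
    synchronized String getData() {
        return data;
    }
}
```

Declaring the field volatile and dropping synchronized from both methods would be the second option; either way, the reader and the writer have to go through the same mechanism.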
Without that, there are no guarantees that the reading thread will see the update - and it wouldn't even be as simple as "well, it would either see the old value or the new value". Your reader threads could see some very odd behaviour with the data corruption that would ensue. Look at how lack of synchronization can cause infinite loops, for example (the comments to that article, especially Brian Goetz', are well worth reading):
The moral of the story: whenever mutable data is shared across threads, if you don’t use synchronization properly (which means using a common lock to guard every access to the shared variables, read or write), your program is broken, and broken in ways you probably can’t even enumerate.
No, it isn't.
Without volatile, calling getData() from a different thread may return a stale cached value.
volatile forces assignments from one thread to be visible on all other threads immediately.
Note that if the object itself is not immutable, you are likely to have other problems.
You may get a stale reference. You may not get an invalid reference.
The reference you get is the value of the reference to an object that the variable points to or pointed to or will point to.
Note that there are no guarantees about how stale the reference may be, but it is still a reference to some object and that object still exists. In other words, writing a reference is atomic (nothing can happen during the write) but not synchronized (it is subject to instruction reordering, thread-local caching, etc.).
If you declare the reference as volatile, you create a synchronization point around the variable. Simply speaking, that means that all cache of the accessing thread is flushed (writes are written and reads are forgotten).
The only types that don't get atomic reads/writes are long and double (unless declared volatile), because they are larger than 32 bits on 32-bit machines.
If MyObj is immutable (all fields are final), you don't need volatile.
The big problem with this sort of code is the lazy initialization. Without volatile or synchronized keywords, you could assign a new value to myobj that had not been fully initialized. The Java memory model allows for part of an object construction to be executed after the object constructor has returned. This re-ordering of memory operations is why the memory-barrier is so critical in multi-threaded situations.
Without a memory-barrier limitation, there is no happens-before guarantee so you do not know if the MyObj has been fully constructed. This means that another thread could be using a partially initialized object with unexpected results.
Here are some more details around constructor synchronization:
Constructor synchronization in Java
Volatile would work for boolean variables but not for references. myobj seems to act as a cached object, so it could work with an AtomicReference. Since your code fetches the value from the DB, I'll leave that part as is and add the AtomicReference to it.
import java.util.concurrent.atomic.AtomicReference;
public class AtomicReferenceTest {
    private AtomicReference<MyObj> myobj = new AtomicReference<MyObj>();
    public class Updater extends Thread {
        public void run() {
            MyObj newMyobj = getNewObjFromDb();
            updateMyObj(newMyobj);
        }
        public void updateMyObj(MyObj newMyobj) {
            // Unconditional replace; compareAndSet(myobj.get(), newMyobj)
            // could silently fail if another thread updated in between.
            myobj.set(newMyobj);
        }
    }
    public MyObj getData() {
        return myobj.get();
    }
}
class MyObj {
}
