Thread Safe - final local method variable passed on to threads?


Would the following code cause the same problems as it would if 'commonSet' were a class-level field instead of a local variable of this method? If it were a class-level field, I would have to wrap the add operation in a synchronized block, since HashSet is not thread safe. Should I do the same here, given that multiple threads add to the set and the current thread may also go on to mutate it?
public void threadCreatorFunction(final String[] args) {
    final Set<String> commonSet = new HashSet<String>();
    final Runnable runnable = new Runnable() {
        @Override
        public void run() {
            while (true) {
                commonSet.add(newValue());
            }
        }
    };
    new Thread(runnable, "T_A").start();
    new Thread(runnable, "T_B").start();
}
The reference to 'commonSet' is 'locked' by declaring it final, but multiple threads operating on it can still corrupt the contents of the set (might it even contain duplicates?). Secondly, since 'commonSet' is a method-level variable, will the same reference live both on the stack of the calling method (threadCreatorFunction) and on the stacks of the run methods? Is this correct?
There are quite a few questions related to this:
Why do variables passed to runnable need to be final?
Why are only final variables accessible in anonymous class?
But I cannot see them addressing the thread-safety aspect of sharing or passing mutable objects this way.

No, this is absolutely not thread-safe. Having it in a final variable only means that both threads will see the same reference, which is fine, but it doesn't make the object any more thread-safe.
Either you need to synchronize access, or use a thread-safe set such as ConcurrentSkipListSet.
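As a minimal sketch of the second option, the local set can be swapped for a concurrent one. The class name SharedSetSketch, the fill method, and the use of integers in place of the question's newValue() strings are assumptions made only for illustration; ConcurrentHashMap.newKeySet() is used here as one of several thread-safe set implementations.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the original question.
class SharedSetSketch {

    // Starts `threads` workers that each add `perThread` distinct values
    // to one shared set, then returns the final size of that set.
    static int fill(int threads, int perThread) {
        // The reference is final AND the set itself is thread-safe.
        final Set<Integer> commonSet = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    commonSet.add(offset + i);
                }
            }, "T_" + t);
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait so the final size is stable
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return commonSet.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 10_000)); // 4 * 10_000 distinct values
    }
}
```

With a plain HashSet in the same position, concurrent adds could lose elements or corrupt the internal table; with the concurrent set every distinct value is retained.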

An interesting example.
The reference commonSet is thread safe and immutable. It is on the stack for the first thread and a field of your anonymous Runnable class as well. (You can see this in a debugger)
The set commonSet refers to is mutable and not thread safe. You need to use synchronized, or a Lock to make it thread safe. (Or use a thread safe collection instead)

I think you're missing a word in your first sentence:
Will the following code cause same problems if variable 'commonSet' of this method was a ??? instead a class level field.
I think you're a little bit confused though. The concurrency issues have nothing to do with whether or not the reference to your mutable data structure is declared final. You need to declare the reference as final because you're closing over it inside the anonymous inner class declaration for your Runnable. If you're actually going to have multiple threads reading and writing the data structure, then you need to either use locks (synchronized) or use a concurrent data structure like java.util.concurrent.ConcurrentHashMap.

The commonSet is shared among two Threads. You have declared it as final and thus you made the reference immutable (you can not re-assign it), but the actual data inside the Set is still mutable. Suppose that one Thread puts some data in and some other Thread reads some data out. Whenever the first thread puts data in, you most probably want to lock that Set so that no other Thread could read until that data is written. Does that happen with a HashSet? Not really.

As others have already commented, you are mistaking some concepts, like final and synchronized.
I think that if you explained what you want to accomplish with your code, it would be much easier to help you. I've got the impression that this code snippet is more an example than the actual code.
Some questions: Why is the set defined inside the function? Should it be shared among threads? Something that puzzles me is that you create two threads with the same instance of the runnable:
new Thread(runnable, "T_A").start();
new Thread(runnable, "T_B").start();

Whether commonSet is used by a single thread or by multiple threads, final makes only the reference immutable (i.e., once assigned, you cannot assign another object reference to it); you can still modify the contents of the referenced object through that reference.
If it were not final, one thread could have initialized it again and changed the reference:
commonSet = new HashSet<String>();
commonSet.add(newValue());
in which case the two threads might use two different sets, which is probably not what you want.

Related

What's the purpose of ThreadLocal here?

public class VPattern implements Pattern
{
    private final TokenKey tokenKey_;
    private final String tokenLabel_;
    private Integer cachedHashCode_ = null;
    private ThreadLocal<Token> token_ = new ThreadLocal<Token>();
    ...
}
I am reading this piece of code and don't understand the use of ThreadLocal here. Is ThreadLocal used to ensure that the 'token_' object is thread safe in any concurrent situation? If so, why are TokenKey and Integer not protected for thread safety as well? I know that String is always thread safe.

Every thread gets its own Token even if they share the same instance of VPattern. Possibly this was done because Token is not thread safe and VPattern wants to avoid synchronizing access to the Token instance. tokenKey_ is final so don't have to worry about the field changing, and maybe it's thread safe on its own. tokenLabel_ is also final and strings are immutable so no issue there. cachedHashCode_ is the odd one out here; is access to it protected somehow? It's hard to say what's going on without seeing the rest of the class.

In general, ThreadLocal can provide a different object for each working thread. So if a given object is neither thread-safe nor a singleton, it can be stored in a ThreadLocal variable. Then every thread can get and safely use its own instance of your class. You can treat it like a map, where the current thread is the key and the actual object is the value.
Let's assume there are two threads working at the same time and sharing one VPattern object. If the threads get tokenKey_ or tokenLabel_, both will get the same instances. But if both threads call token_.get(), they will get different instances of the Token type (if initialized previously; see the set method and the withInitial static factory method).
Unfortunately, it's hard to say what the purpose of ThreadLocal is in your case because it depends heavily on the context. It seems that Token objects cannot be shared by different threads (each thread should have its own token).
You can read more about ThreadLocal in javadoc or here: https://www.baeldung.com/java-threadlocal
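The "one instance per thread" behaviour described above can be checked with a small sketch. The Token class here is an empty stand-in, and PerThreadInstance and its methods are hypothetical names; the real VPattern code is not shown in the question, so nothing below claims to reproduce it.

```java
// Hypothetical demo class; Token is an empty stand-in type.
class PerThreadInstance {

    static final class Token { }

    // Each thread that calls TOKEN.get() lazily receives its own Token.
    private static final ThreadLocal<Token> TOKEN =
            ThreadLocal.withInitial(Token::new);

    // Within one thread, repeated get() calls return the same object.
    static boolean sameWithinThread() {
        return TOKEN.get() == TOKEN.get();
    }

    // A second thread sees a different object than the calling thread.
    static boolean differsAcrossThreads() {
        Token mine = TOKEN.get();
        Token[] other = new Token[1];
        Thread t = new Thread(() -> other[0] = TOKEN.get());
        t.start();
        try {
            t.join(); // join makes the write to other[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return other[0] != null && other[0] != mine;
    }

    public static void main(String[] args) {
        System.out.println(sameWithinThread());
        System.out.println(differsAcrossThreads());
    }
}
```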

Partially constructed object

Is it possible for an object to be visible to other threads during its initialization (visible while initialization is in progress but not yet complete)? If yes, could you please give a simple example to back up your answer?

This can happen in a number of ways.
You pass your object to another thread in the constructor, e.g. you start a thread in your constructor.
You pass the object to another thread, but the other thread sees old, uninitialized values because the fields are not final or volatile and are not accessed in a locked or synchronized block. Other fields are not guaranteed to be thread safe.

The best case in point would be the notoriously broken double-checked locking idiom. I'll extract from it only the part relevant for this argument. Take this code:
public class Holder { public static File f; }
Somewhere in Thread A you do Holder.f = new File("path"); and elsewhere in Thread B you do File xxf = Holder.f; and proceed to use it. There is no guarantee that, even if you read the reference to Holder.f, any field of the File instance will be in any defined state. You may read all nulls, (zeros, falses, depending on type), as well as any combination of non-null and null values.

ThreadLocal vs local variable in Runnable

Which of ThreadLocal or a local variable in a Runnable is preferable for performance reasons? I expect that a local variable gives more opportunity for CPU caching, etc.

Which one among ThreadLocal or a local variable in Runnable will be preferred.
If you have a variable that is declared inside the thread's class (or the Runnable) then a local variable will work and you don't need the ThreadLocal.
new Thread(new Runnable() {
    // no need to make this a thread local because each thread already
    // has their own copy of it
    private SimpleDateFormat format = new SimpleDateFormat(...);
    public void run() {
        ...
        // this is allocated per thread so no thread-local
        format.parse(...);
        ...
    }
}).start();
On the other hand, ThreadLocals are used to save state on a per-thread basis when you are executing common code. For example, SimpleDateFormat is (unfortunately) not thread-safe, so if you want to use it in code executed by multiple threads you would need to store one in a ThreadLocal so that each thread gets its own version of the format.
private final ThreadLocal<SimpleDateFormat> localFormat =
    new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
            return new SimpleDateFormat(...);
        }
    };
...
// if a number of threads run this common code
SimpleDateFormat format = localFormat.get();
// now we are using the per-thread format (but we should be using Joda Time :-)
format.parse(...);
An example of when this is necessary is a web request handler. The threads are allocated up in Jetty land (for example) in some sort of pool that is outside of our control. A web request comes in which matches your path so Jetty calls your handler. You need to have a SimpleDateFormat object but because of its limitations, you have to create one per thread. That's when you need a ThreadLocal. When you are writing reentrant code that may be called by multiple threads and you want to store something per-thread.
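A runnable sketch of that pooled-thread situation follows. The fixed pool of four threads stands in for the container's request threads, and the class name PerThreadFormat, the "yyyy-MM-dd" pattern, and the round-trip check are assumptions chosen only to make the example self-contained.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; the pool plays the role of container threads.
class PerThreadFormat {

    // One SimpleDateFormat per thread, created lazily on first get().
    private static final ThreadLocal<SimpleDateFormat> LOCAL_FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    // Common code that any pool thread may execute.
    static String roundTrip(String text) throws ParseException {
        SimpleDateFormat format = LOCAL_FORMAT.get();
        return format.format(format.parse(text));
    }

    // Runs `tasks` round trips on the pool; returns how many were correct.
    static int countCorrect(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                String text = String.format("2020-01-%02d", (i % 28) + 1);
                Callable<Boolean> task = () -> text.equals(roundTrip(text));
                results.add(pool.submit(task));
            }
            int correct = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    correct++;
                }
            }
            return correct;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countCorrect(200));
    }
}
```

Because no SimpleDateFormat instance is ever touched by two threads, every round trip returns the input unchanged. On Java 8 and later, the immutable java.time.format.DateTimeFormatter avoids the need for a ThreadLocal altogether.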
If instead you want to pass arguments in to your Runnable, then you should create your own class so that you can define the constructor and pass the arguments in through it.
new Thread(new MyRunnable("some important string")).start();
...
private static class MyRunnable implements Runnable {
    private final String someImportantString;
    public MyRunnable(String someImportantString) {
        this.someImportantString = someImportantString;
    }
    // run by the thread
    public void run() {
        // use the someImportantString string here
        ...
    }
}

Whenever your program could correctly use either of the two (ThreadLocal or local variable), choose the local variable: it will be more performant.
ThreadLocal is for storing per-thread state past the execution scope of a method. Obviously local variables can't persist past the scope of their declaration. If you needed them to, that's when you might start using a ThreadLocal.
Another option is using synchronized to manage access to a shared member variable. This is a complicated topic and I won't bother to go into it here as it's been explained and documented by more articulate people than me in other places. Obviously this is not a variant of "local" storage -- you'd be sharing access to a single resource in a thread-safe way.

I was also confused about why I need ThreadLocal when I can just use local variables, since both keep their state inside a thread. But after a lot of searching and experimenting, I see why ThreadLocal is needed.
I found two uses so far -
Saving thread specific values inside the same shared object
Alternative to passing variables as parameters through N-layers of code
1:
If you have two threads operating on the same object and both threads modify this object - then both threads keep losing their modifications to each other.
To make this object have two separate states for each thread, we declare this object or part of it ThreadLocal.
Of course, ThreadLocal is only beneficial here because both threads are sharing the same object. If they are using different objects, there's no need for the objects to be ThreadLocal.
2:
The second benefit of ThreadLocal seems to be a side effect of how it's implemented.
A ThreadLocal variable can be .set() by a thread and then be .get() anywhere else. .get() will retrieve the same value that this thread set elsewhere. To actually write the code, we need a globally available wrapper on which to call .get() and .set().
When we do a threadLocalVar.set(), it's as if the value is put inside some global "map", where the current thread is the key.
As if -
someGlobalMap.put(Thread.currentThread(),threadLocalVar);
So ten layers down, when we do threadLocalVar.get() - we get the value that this thread had set ten layers up.
threadLocalVar = someGlobalMap.get(Thread.currentThread());
So the function at tenth level does not have to lug around this variable as parameter, and can access it with a .get() without worrying about if it is from the right thread.
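Here is a minimal sketch of that second use, with a value set at the top of a call chain and read several layers down without being passed as a parameter. RequestContext, handle, layerOne, layerTwo, and currentUser are hypothetical names invented for this illustration.

```java
// Hypothetical demo class: a ThreadLocal used as implicit per-thread context.
class RequestContext {

    // The "globally available wrapper" around the per-thread value.
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Entry point: sets the value, runs the call chain, then cleans up.
    static String handle(String user) {
        USER.set(user);
        try {
            return layerOne();
        } finally {
            USER.remove(); // avoid leaking the value to later work on this thread
        }
    }

    // Intermediate layer: does not receive or forward the user.
    private static String layerOne() {
        return layerTwo();
    }

    // Deep layer: reads the value set by the current thread.
    private static String layerTwo() {
        return "hello " + USER.get();
    }

    // Returns the value bound to the calling thread, or null if none.
    static String currentUser() {
        return USER.get();
    }

    public static void main(String[] args) {
        System.out.println(handle("alice"));
    }
}
```

The remove() call in the finally block matters when threads are pooled, since a pooled thread would otherwise carry the previous value into unrelated work.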
Lastly, since each thread gets its own copy of a ThreadLocal variable, it does not need synchronization. I earlier misunderstood ThreadLocal as an alternative to synchronization, which it is not. Not needing to synchronize access to this variable is just a side effect.
Hope this has helped.

This question is answered by the simple rule that a variable should be declared in the smallest possible enclosing scope. A ThreadLocal is the largest possible enclosing scope so you should only use it for data that is needed across many lexical scopes. If it can be a local variable, it should be.

When to use Volatile modifier? [duplicate]

Possible Duplicate:
When exactly do you use the volatile keyword in Java?
When and why volatile modifier is required in java?
I am interested in seeing a real world usage of a volatile modified primitive or object reference.

The volatile modifier tells the JVM to be careful about threads that run concurrently. Essentially, volatile is used to indicate that a variable's value will be modified by different threads.
Declaring a volatile Java variable means:
The value of this variable will never be cached thread-locally: all reads and writes will go straight to "main memory"
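Access to the variable acts as though it is enclosed in a synchronized block, synchronized on itself.
We say "acts as though" in the second point, because to the programmer at least (and probably in most JVM implementations) there is no actual lock object involved. Note that this applies to the visibility of individual reads and writes only: a compound operation such as num++ on a volatile variable is still not atomic.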
The volatile modifier tells the compiler that the variable modified by volatile can be changed unexpectedly by other parts of your program. One of these situations involves multithreaded programs.
In a multithreaded program, sometimes, two or more threads share the same instance variable. For efficiency considerations, each thread can keep its own, private copy of such a shared variable.
The real (or master) copy of the variable is updated at various times, such as when a synchronized method is entered. While this approach works fine, it may be inefficient at times. In some cases, all that really matters is that the master copy of a variable always reflects its current state.
To ensure this, simply specify the variable as volatile, which tells the compiler that it must always use the master copy of a volatile variable (or, at least, always keep any private copies up to date with the master copy, and vice versa). Also, accesses to the master variable must be executed in the precise order in which they are executed on any private copy.

If you are working with multi-threaded code, the volatile keyword becomes useful. When multiple threads use the same variable, each thread may keep its own copy of that variable in a local cache. So when a thread updates the value, the update may land in the local cache rather than in main memory, and another thread using the same variable may not see the change. To avoid this problem, declare the variable volatile; it will then not be held in a local cache. Whenever a thread updates the value, it is written to main memory, so other threads can see the updated value.
Declaring a variable volatile means:
No cached copy is maintained, so all changes are made in main memory.
Access to this variable behaves like a synchronized block with respect to visibility, even though no synchronized block is involved.
Example -
public class Snippet implements Runnable {
    volatile int num = 0;
    public void run() {
        Thread t = Thread.currentThread();
        String name = t.getName();
        if (name.equals("Thread1")) {
            num = 10;
        }
        else {
            System.out.println("value of num is :" + num);
        }
    }
    public static void main(String args[]) throws InterruptedException {
        Runnable r = new Snippet();
        Thread t1 = new Thread(r);
        t1.setName("Thread1");
        t1.start();
        Thread.sleep(1000);
        Thread t2 = new Thread(r);
        t2.setName("Thread2");
        t2.start();
    }
}

(This answer assumes Java 5+ -- before that, volatile had weaker guarantees.)
It's useful when you want to ensure a memory barrier, aka a formal "happens-before" relationship, between a write to a field and a subsequent read to that field by a separate thread. Synchronization can also give you that relationship, as well as other multithreading guarantees, but it's a tad slower and can create synchronization bottlenecks.
One use case is in concurrent collection classes (like ConcurrentHashMap, or LinkedBlockingQueue) where, in conjunction with things like atomic compare-and-set (CAS) operations, you can write correct thread-safe code without having to use synchronized.
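A smaller everyday use of the same happens-before guarantee is a stop flag: one thread writes the flag and a worker thread is guaranteed to see the write. The sketch below assumes hypothetical names (StopFlag, stopsWithin) and an arbitrary 50 ms delay before the flag is cleared; it is an illustration of the idea, not code from any of the libraries mentioned above.

```java
// Hypothetical demo class: a volatile boolean used as a stop signal.
class StopFlag implements Runnable {

    // volatile guarantees the worker eventually sees the write from stop().
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            // busy work would go here; the loop re-reads the flag each pass
        }
    }

    void stop() {
        running = false;
    }

    // Starts a worker, asks it to stop, and reports whether it terminated
    // within the given number of milliseconds.
    static boolean stopsWithin(long millis) {
        StopFlag task = new StopFlag();
        Thread worker = new Thread(task, "worker");
        worker.setDaemon(true); // do not keep the JVM alive if it never stops
        worker.start();
        try {
            Thread.sleep(50);
            task.stop();
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsWithin(2000));
    }
}
```

Without volatile on the flag, the JIT compiler is allowed to hoist the read out of the loop, and the worker may spin forever.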

You got good answers for the first question. The second one:
Can any one give me real time scenario of it
IMO, you should never ever use volatile. There are better tools for multithreaded apps. It's a bit bizarre that such a high-level language has this keyword. Here is a good read (it's about C#, but Java is similar in this matter).

Java application stuck when using synchronized keyword

I have a class that starts a few threads. Each thread (extends Thread) calls a new instance of class WH, class WH has a variable that is to be shared among all threads. So the hierarchy looks like:
class S extends Thread {
....
....
WH n = new WH(args);
....
....
}
Now class WH has a variable that is to be shared, declared as:
private static volatile Integer size;
One of the functions tries to access size through synchronized:
synchronized (size) { // Program gets stuck at this line
    ... stuff ...
}
It gets stuck even if I spawn off just one thread. Any idea why this is happening? (FYI- I do not want to use AtomicInteger based on my design choices)
Thanks

Your problem is that locking on a non-final variable reference has useless semantics.
Any time you see synchronized(var) where var is an instance or static variable that isn't marked final, it is an error: anything can come along and do var = new Thing();, and now at least two threads can be inside that block at the same time. This is a logic error, no exceptions. Static analysis tools commonly flag it as a serious problem; the fact that the compiler doesn't catch it doesn't make it useful in any case.
In this case, you are exposing these useless semantics by changing the value of a variable of the immutable Integer class.
Your Integer variable size is non-final and Integer is immutable, which means that every time you change the value you must point the reference at a new object representing the new value, and every thread will get a new and different reference to lock on. Thus no locking.
Use a private static final AtomicInteger size = new AtomicInteger();
Then you can use synchronized(size), since size is now final; you can mutate it in place and get the intended and correct semantics.
Or you can use synchronized(some_other_final_reference) and a regular int. As long as the reference that is synchronized on is final and is in scope for every thread that needs to acquire the lock, it will work.
Personally I would use the AtomicInteger; it is more cohesive that way. You are locking on the thing you don't want any other thread changing, which is self-documenting and makes the intent clear.

I cannot use AtomicInteger since I need to get the value of size,
check a condition on it, and increment or not based on the condition.
So I have to do a get then possibly an increment on it. I still need
locking in that case.
I believe what you are describing is something that AtomicInteger can definitely do, without locking, via the compareAndSet() method, no? Though the only supported test is equality, so maybe that won't work for you.
Also, if you are planning on synchronizing on the variable, then there is no need to also make it volatile.
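The "get, check a condition, then maybe increment" requirement can be sketched as a compareAndSet retry loop. The condition used here (increment only while below a limit) is an assumption, since the asker never states the real one, and BoundedCounter, incrementIfBelow, and race are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: conditional increment without a lock.
class BoundedCounter {

    private final AtomicInteger size = new AtomicInteger();

    // Increments only while the current value is below `limit`.
    // Retries if another thread changed the value between get and CAS.
    boolean incrementIfBelow(int limit) {
        while (true) {
            int current = size.get();
            if (current >= limit) {
                return false; // condition failed, leave the value alone
            }
            if (size.compareAndSet(current, current + 1)) {
                return true; // nobody interfered, increment applied
            }
        }
    }

    int get() {
        return size.get();
    }

    // Lets several threads race on one counter and returns the final value.
    static int race(int threads, int attemptsPerThread, int limit) {
        BoundedCounter counter = new BoundedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    counter.incrementIfBelow(limit);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(race(4, 1000, 100));
    }
}
```

The equality test inside compareAndSet is only used to detect interference; the actual condition is ordinary code evaluated on the value that was read, so it is not limited to equality checks.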
