Synchronization works by providing exclusive access to an object or method, by putting the synchronized keyword before a method name. What if I want to give higher precedence to one particular access when two or more accesses to a method occur at the same time? Can we do that?
Or maybe I'm just misunderstanding the concept of synchronization in Java. Please correct me.
I have other questions as well,
Under what requirements should we make a method synchronized?
When should we make a method synchronized, and when should we make a block synchronized?
Also, if we make a method synchronized, will the class be synchronized too? I'm a little confused here.
Please Help. Thanks.
No. Sadly, Java's synchronization and wait/notify appear to have been copied from the very poor example of Unix, rather than from almost anywhere else, where there would have been priority queues instead of thundering herds. When Per Brinch Hansen, a pioneer of monitors and the author of Concurrent Pascal, saw Java, he commented 'clearly I have laboured in vain'.
There is a solution for almost everything you need in multi-threading and synchronization in the java.util.concurrent package; it does, however, require some thinking about what you want to do first. The synchronized, wait and notify constructs are the most basic tools, suitable if you have just a very basic problem to solve, but realistically most advanced programs will (or should) never use them and will instead rely on the tools available in java.util.concurrent.
The way you think about threads is slightly wrong. There is no such thing as a more important thread, there is only a more important task. This is why Java clearly distinguishes between Threads, Runnables and Callables.
Synchronization is a concept to prevent more than one thread from entering a specific part of code, which is - again - the most basic concept of avoiding threading issues. Those issues happen if more than one thread accesses some data, where at least one of those multiple threads is trying to modify that data. Think about an array that is read by Thread A, while it is written by Thread B at the same time. Eventually Thread B will write the cell that Thread A is just about to read. Now as the order of execution of threads is undefined, it is as well undefined whether Thread A will read the old value, the new value or something messed up in between.
A synchronized "lock" around this access is a very brute-force way of ensuring that this will never happen. More sophisticated tools are available in the concurrent package, like CopyOnWriteArrayList, which seamlessly handles the above issue by creating a copy of the underlying array for each write, so neither the reading Thread A nor the writing Thread B needs to wait for the other (concurrent writers still take turns). Other tools are available for other problems.
If you dig a bit into the available tools you soon learn that they are highly sophisticated, and that the difficulties in using them usually lie with the programmer and not with the tools, because countless hours of thinking, improving and testing have gone into them.
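The copy-on-write idea mentioned above can be sketched with java.util.concurrent.CopyOnWriteArrayList. This is a minimal, single-threaded illustration; the class name CopyOnWriteDemo and the method runDemo are made up for the example. It only shows that an iteration keeps working on the snapshot it started with while the list is being modified.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteDemo {
    // Returns { elements seen by the iteration, final list size }.
    static int[] runDemo() {
        List<Integer> list = new CopyOnWriteArrayList<>(List.of(1, 2, 3));
        int seen = 0;
        // The iterator walks the array snapshot taken when the loop started.
        for (Integer value : list) {
            // Each add() copies the backing array; the running iterator is unaffected.
            // With a plain ArrayList, the next iteration step would throw
            // ConcurrentModificationException.
            list.add(value * 10);
            seen++;
        }
        return new int[] { seen, list.size() };
    }

    public static void main(String[] args) {
        int[] result = runDemo();
        System.out.println("seen=" + result[0] + ", size=" + result[1]);
    }
}
```

The trade-off is that every write copies the whole array, so this structure tends to suit lists that are read far more often than they are written.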
Edit: to clarify a bit why the importance is on the task even though you set it on the thread:
Imagine a street with 3 lanes that narrows to 1 lane (the synchronized block), and 5 cars (threads) are arriving. Let's further assume there is one person (the car scheduler) who has to decide which cars get the first row and which ones get the other rows. As there is only 1 lane, he can at best assign 1 car to the first row and the others need to come behind. If all cars look the same, he will most likely assign the order more or less randomly, while a car already in front is more likely to stay in front, just because it would be too troublesome to move those cars around.
Now let's say one car has a sign on top, "President of the USA inside", so the scheduler will most likely give that car priority in his decision. But even though the sign is on the car, the reason for his decision is not the importance of the car (thread), but the importance of the people inside (task). So the sign is nothing but information for the scheduler that this car transports more important people. Whether or not this is true, however, the scheduler can't say (at least not without inspection), so he just has to trust the sign on the car.
Now if in another scenario all 5 cars have the "President inside" sign, the scheduler doesn't have any way to decide which one goes first, and he is in the same situation again as he was with all the cars having no sign at all.
Well, in the case of synchronized, the order in which waiting threads get the lock is unspecified. But if you need a first-come, first-served basis, then you can probably use ReentrantLock with its fairness parameter. This is what the API says:
The constructor for this class accepts an optional fairness parameter.
When set true, under contention, locks favor granting access to the
longest-waiting thread.
Otherwise, if you wish to grant access based on some other factor, then I guess it shouldn't be complicated to build one. Have a class whose lock call blocks if some other thread is executing. When unlock is called, it unblocks a waiting thread chosen by whatever algorithm you wish.
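As a minimal sketch of the fairness option quoted above: passing true to the ReentrantLock constructor asks for the fair ordering policy. The class FairCounter and its runDemo driver are hypothetical names used only for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

class FairCounter {
    // 'true' requests the fair policy: under contention, the lock favors
    // the longest-waiting thread.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int count;

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical driver: 4 threads, 1000 increments each.
    static int runDemo() throws InterruptedException {
        FairCounter counter = new FairCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

Fair locks usually have lower throughput than the default non-fair mode, so fairness is worth requesting only when the ordering actually matters.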
There's no such thing as "priority" among synchronized methods/blocks or accesses to them. If some other thread is already holding the object's monitor (i.e. if another synchronized method or synchronized (this) {} block is in progress and hasn't relinquished the monitor by a call to this.wait()), all other threads will have to wait until it's done.
There are classes in the java.util.concurrent package that might be able to help you if used correctly, such as priority queues. Full guidance on how to use them correctly is probably beyond the scope of this question - you should probably read a decent tutorial to start with.
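One way to read the priority-queue suggestion above is to prioritize tasks rather than threads: a ThreadPoolExecutor can take its pending work from a PriorityBlockingQueue. The sketch below assumes a hypothetical PrioritizedTask wrapper in which a lower number means higher priority; a single worker is kept busy behind a latch so that the queued tasks can be seen to run in priority order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PriorityTaskDemo {
    // Hypothetical task wrapper: a lower number means "more important".
    static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        private final int priority;
        private final Runnable action;

        PrioritizedTask(int priority, Runnable action) {
            this.priority = priority;
            this.action = action;
        }

        @Override
        public void run() {
            action.run();
        }

        @Override
        public int compareTo(PrioritizedTask other) {
            return Integer.compare(priority, other.priority);
        }
    }

    static List<String> runDemo() throws InterruptedException {
        List<String> order = new CopyOnWriteArrayList<>();
        // One worker thread; pending tasks are kept in priority order.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());
        CountDownLatch gate = new CountDownLatch(1);

        // Keep the single worker busy so the next three tasks pile up in the queue.
        pool.execute(new PrioritizedTask(0, () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        // execute() is used on purpose: submit() would wrap each task in a
        // FutureTask, which is not Comparable.
        pool.execute(new PrioritizedTask(3, () -> order.add("low")));
        pool.execute(new PrioritizedTask(1, () -> order.add("high")));
        pool.execute(new PrioritizedTask(2, () -> order.add("medium")));

        gate.countDown(); // let the worker drain the queue
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

Note that this orders tasks that are still waiting in the queue; it does not change which thread wins a contended synchronized block.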
Say, I have a data object:
class ValueRef { double value; }
Where each data object is stored in a master collection:
Collection<ValueRef> masterList = ...;
I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):
class Job implements Runnable {
    Collection<ValueRef> neededValues = ...;
    public void run() {
        double sum = 0;
        for (ValueRef x: neededValues) sum += x.value;
        System.out.println(sum);
    }
}
Use-case:
for (ValueRef x: masterList) { x.value = Math.random(); }
Populate a job queue with some jobs.
Wake up a thread pool
Wait until each job has been evaluated
Note: During the job evaluation, all of the values are constant. The threads, however, have possibly evaluated jobs in the past, and may retain cached values.
Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?
I understand synchronize from the monitor/lock-perspective, I do not understand synchronize from the cache/flush-perspective (ie. what is being guaranteed by the memory model on enter/exit of the synchronized block).
To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.
My approach: create a global monitor: static Object guard = new Object(); Then, synchronize on guard, while updating the master list. Then finally, before starting the thread pool, once for each thread in the pool, synchronize on guard in an empty block.
Does that really cause a full flush of any value read by that thread? Or just values touched inside the synchronize block? In which case, instead of an empty block, maybe I should read each value once in a loop?
Thanks for your time.
Edit: I think my question boils down to, once I exit a synchronized block, does every first read (after that point) go to main memory? Regardless of what I synchronized upon?
It doesn't matter that threads of a thread pool have evaluated some jobs in the past.
Javadoc of Executor says:
Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.
So, as long as you use standard thread pool implementation and change the data before submitting the jobs you shouldn't worry about memory visibility effects.
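A minimal sketch of that guarantee, assuming a standard ExecutorService from Executors.newFixedThreadPool (the class SubmitVisibilityDemo and method runDemo are made-up names): the values are written with plain, unsynchronized stores, and the only ordering comes from submit() and Future.get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitVisibilityDemo {
    static final class ValueRef {
        double value; // plain field, no volatile, no locking
    }

    static double runDemo() throws Exception {
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            masterList.add(new ValueRef());
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            double last = 0;
            for (int round = 1; round <= 3; round++) {
                // 1. Update the values before submitting (plain writes).
                for (ValueRef ref : masterList) {
                    ref.value = round;
                }
                // 2. Submitting happens-before the task starts running,
                //    so the worker sees the writes made in step 1.
                Future<Double> sum = pool.submit(() -> {
                    double s = 0;
                    for (ValueRef ref : masterList) {
                        s += ref.value;
                    }
                    return s;
                });
                // 3. Wait for the job before touching the values again, so no
                //    write overlaps with a running job.
                last = sum.get();
            }
            return last;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

The pool threads are reused across rounds here, which is the situation the question worries about; the ordering rules of submit() and get() are what make the reuse harmless.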
What you are planning sounds sufficient. It depends on how you plan to "wake up thread pool."
The Java Memory Model guarantees that all writes performed by a thread before it exits a synchronized block are visible to threads that subsequently synchronize on the same lock.
So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) during the time you update the master list, when they wake up and become runnable, the modifications made by the master thread will be visible to these threads.
I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.
Just to clarify: It's almost impossible to control worker threads without using a synchronized block where a check is made to see whether the worker has a task to execute. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable, to act as a memory barrier; however, I can't think how you'd create a thread pool without using one of these.
As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea that one might assume at first glance.
If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.
Why don't you make Collection<ValueRef> and ValueRef immutable, or at least not modify the values in the collection after you have published the reference to the collection? Then you will not have to worry about synchronization.
That is, when you want to change the values of the collection, create a new collection and put the new values in it. Once the values have been set, pass the collection reference to the new job objects.
The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).
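The immutable-snapshot approach described above might look roughly like this. It is a sketch only: ValueRef is reworked with a final field, List.copyOf requires Java 10 or later, and newSnapshot and runDemo are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;

class SnapshotDemo {
    // Immutable value holder: the field is final and never changes after construction.
    static final class ValueRef {
        final double value;

        ValueRef(double value) {
            this.value = value;
        }
    }

    static final class Job implements Runnable {
        private final List<ValueRef> neededValues; // unmodifiable snapshot
        volatile double result;

        Job(List<ValueRef> neededValues) {
            this.neededValues = neededValues;
        }

        @Override
        public void run() {
            double sum = 0;
            for (ValueRef x : neededValues) {
                sum += x.value;
            }
            result = sum;
        }
    }

    // Hypothetical helper: builds a fresh collection instead of mutating an old one.
    static List<ValueRef> newSnapshot(double... values) {
        List<ValueRef> fresh = new ArrayList<>();
        for (double v : values) {
            fresh.add(new ValueRef(v));
        }
        return List.copyOf(fresh); // unmodifiable copy (Java 10+)
    }

    static double runDemo() throws InterruptedException {
        Job job = new Job(newSnapshot(1.0, 2.0, 3.0));
        Thread worker = new Thread(job);
        worker.start();
        worker.join();
        return job.result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

To change the values, a new snapshot is built and handed to new Job objects; nothing that a running job can see is ever modified.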
I do not understand this concept at all.
public class SomeName {
public static void main(String args[]) {
}
}
This is my class SomeName. Now, what is a thread here?
Do we call the class a thread?
Do we call this class a thread when some other object is trying to access its methods or members?
Do we call this class a thread when some other object is trying to access this object?
What does it mean when we call something in java as thread-safe ?
Being thread-safe means avoiding several problems. The most common, and probably the worst, is called deadlock. The old analogy is the story of the dining philosophers. They are very polite and will never reach out their chopsticks to take food when someone else is doing the same. If they all reach out at the same time, then they all stop at the same time and wait... and nothing ever happens, because they're all too polite to go first.
As someone else pointed out, if your app never creates additional threads, but merely runs from a main method, then there is only one thread, or one "dining philosopher," so deadlock can't occur. When you have multiple threads, the simplest way to protect shared data is to use a "monitor", which is just an object that's set aside to act as a lock. In effect, your methods have to obtain a "lock" on this monitor before accessing the shared data, so there are no collisions. However, you can still have deadlock, because two threads might each need two different resources, each guarded by its own monitor. Thread A holds monitor 2 and has to wait for Thread B to release its lock on monitor 1; Thread B holds monitor 1 and has to wait for Thread A to release its lock on monitor 2. So now you're back to deadlock.
In short, thread safety is not terribly difficult to understand, but it does take time, practice and experience. The first time you write a multi-threaded app, you will run into deadlock. Then you will learn, and it soon becomes pretty intuitive. The biggest caveat is that you need to keep the multi-threaded parts of an app as simple as possible. If you have lots of threads, with lots of monitors and locks, it becomes exponentially more difficult to ensure that your dining philosophers never freeze.
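A common way to avoid the two-monitor standoff described above is to always acquire the monitors in the same global order. The following is a minimal sketch under that assumption; the Account class, its id field and the transfer method are hypothetical and exist only to show the ordering rule.

```java
class LockOrderingDemo {
    static final class Account {
        final int id;   // used only to define a global lock order
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always lock the account with the smaller id first, whatever the
    // direction of the transfer, so two threads can never hold one lock
    // each while waiting for the other.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    static long[] runDemo() throws InterruptedException {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        // Two threads transfer in opposite directions at the same time.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                transfer(a, b, 1);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                transfer(b, a, 1);
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return new long[] { a.balance, b.balance };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] balances = runDemo();
        System.out.println(balances[0] + " " + balances[1]);
    }
}
```

If transfer locked "from" first and "to" second instead, the two threads above could each grab one monitor and wait forever for the other.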
The Java tutorial goes over threading extremely well; it was the only resource I ever needed.
You might want to think of a thread as a CPU executing the code that you wrote.
What is thread?
A thread is a single sequential flow of control within a program.
From Java concurrency in practice:
Thread-safe classes encapsulate any needed synchronization so that
clients need not provide their own.
At any time you have "execution points" where the JVM is running your code stepping through methods and doing what your program tells it to do.
For simple programs you only have one. For more complex programs you can have several, usually started with new Thread(...).start() or an Executor.
"Thread-safe" means that your code is written in such a way that one execution point cannot unexpectedly change what another execution point sees. This is usually very desirable, as such changes can be very hard to debug; but as you only have one execution point, there is no other, so this does not apply.
Threads are an advanced subject which you will come back to later, but for now just think that if you do not do anything special with Threads or Swing, this will not apply to you. It will later, but not now.
Well, in your specific example, when your program runs, it has just 1 thread.
The main thread.
A class is thread-safe when an object of that class can be accessed in parallel from multiple threads (and hence from multiple CPUs) without breaking any of the guarantees that it provides when used from a single thread.
You should read first about what exactly threads are, for instance on Wikipedia, which might make it then easier to understand the relation between classes and threads and the notion of threadsafety.
Every piece of code in Java is executed on some thread. By default, there is a "main" thread that calls your main method. All code in your program executes on the main thread unless you create another thread and start it. Threads start when you explicitly call the Thread.start() method; they can also start implicitly when you call an API that indirectly calls Thread.start(). (API calls that start a thread are generally documented to do so.) When Thread.start() is called, it creates a new thread of execution and calls the Thread object's run() method. The thread exits when its run() method returns.
There are other ways to affect threads, but that's the basics. You can read more details in the Java concurrency tutorial.
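The difference between the thread that calls your code and a thread you start yourself can be seen by printing thread names. This is a small sketch; ThreadNameDemo, runDemo and the names "worker-1" and "worker-2" are chosen for the example.

```java
class ThreadNameDemo {
    // Returns { caller's thread name, name seen via start(), name seen via run() }.
    static String[] runDemo() throws InterruptedException {
        String[] names = new String[3];
        names[0] = Thread.currentThread().getName();

        // start() creates a new thread of execution, which then calls run().
        Thread worker = new Thread(
                () -> names[1] = Thread.currentThread().getName(), "worker-1");
        worker.start();
        worker.join();

        // Calling run() directly is an ordinary method call: no new thread is
        // created, and the body executes on the calling thread.
        Thread notStarted = new Thread(
                () -> names[2] = Thread.currentThread().getName(), "worker-2");
        notStarted.run();

        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String name : runDemo()) {
            System.out.println(name);
        }
    }
}
```

When launched from main, the first and third names are both the main thread's name, and only the second is "worker-1".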
Under what circumstances would an unsynchronized collection, say an ArrayList, cause a problem? I can't think of any, can someone please give me an example where an ArrayList causes a problem and a Vector solves it? I wrote a program that has 2 threads, both modifying an ArrayList that has one element. One thread puts "bbb" into the ArrayList while the other puts "aaa" into the ArrayList. I don't really see an instance where the string is half modified, I am on the right track here?
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Many Thanks in advance.
There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.
The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:
public void add(T element) {
if (!haveSpace(size + 1)) {
expand(size + 1);
}
elements[size] = element;
// HERE
size++;
}
Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, and overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)
The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When one thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int) for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that this kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)
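For the visibility scenario just described, a volatile field is one of the simplest synchronization points. The sketch below uses made-up names (VolatileFlagDemo, done, payload): the writer stores a plain field and then sets a volatile flag, and the reader spins on the flag before reading the plain field.

```java
class VolatileFlagDemo {
    private volatile boolean done; // volatile: writes become visible to readers
    private int payload;           // plain field, published via the volatile write

    static int runDemo() throws InterruptedException {
        VolatileFlagDemo shared = new VolatileFlagDemo();
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            // Without 'volatile' on done, this loop would be allowed to spin
            // forever on a stale value.
            while (!shared.done) {
                Thread.yield();
            }
            // The volatile read above happens-after the volatile write below,
            // so the earlier plain write to payload is visible here as well.
            seen[0] = shared.payload;
        });
        reader.start();

        shared.payload = 42; // plain write first...
        shared.done = true;  // ...then the volatile write that publishes it

        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

ArrayList's internal size field has no such marker, which is why a reader thread may see stale or inconsistent state without external synchronization.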
The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.
List list = Collections.synchronizedList(new ArrayList());
// Thread 1
List list2 = ...
for (Object element : list2) {
list.add(element);
}
// Thread 2
List list3 = ...
for (Object element : list) {
list3.add(element);
}
If the list that thread 2 iterates over wraps an ArrayList or LinkedList (as it does here) and the two threads run simultaneously, thread 2 is likely to fail with a ConcurrentModificationException. If it wraps some other (home brew) list, then the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make it thread-safe with respect to a sequence of list operations performed by different threads. To get that, the application would typically need to synchronize at a higher level / coarser grain.
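Synchronizing "at a coarser grain" for the iteration case can be sketched as follows: the Collections.synchronizedList documentation requires callers to synchronize on the returned list while iterating over it. The class SynchronizedIterationDemo and method runDemo are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedIterationDemo {
    // Returns the final list size, or -1 if the reader failed with an exception.
    static int runDemo() throws InterruptedException {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        boolean[] failed = new boolean[1];

        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                list.add(i); // each add() locks the wrapper internally
            }
        });
        Thread reader = new Thread(() -> {
            try {
                for (int pass = 0; pass < 100; pass++) {
                    int count = 0;
                    // Hold the wrapper's lock for the WHOLE iteration, so that
                    // no add() can slip in between two iterator steps.
                    synchronized (list) {
                        for (Integer element : list) {
                            count++;
                        }
                    }
                }
            } catch (RuntimeException e) {
                failed[0] = true; // e.g. ConcurrentModificationException
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
        return failed[0] ? -1 : list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

Removing the synchronized (list) block around the loop brings back the risk of a ConcurrentModificationException in the reader.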
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU).
Correct. If there is only one core available to run the application, obviously only one thread gets to run at a time. This makes some of the hazards impossible, and others become much less likely to occur. However, it is possible for the OS to switch from one thread to another thread at any point in the code, and at any time.
If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yup. That's possible. The probability of it happening is very small [1], but that just makes this kind of problem more insidious.
[1] This is because thread time-slicing events are extremely infrequent when measured on the timescale of hardware clock cycles.
A practical example. At the end list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?
static class ListTester implements Runnable {
private List<Integer> a;
public ListTester(List<Integer> a) {
this.a = a;
}
public void run() {
try {
for (int i = 0; i < 20; ++i) {
a.add(i);
Thread.sleep(10);
}
} catch (InterruptedException e) {
}
}
}
public static void main(String[] args) throws Exception {
ArrayList<Integer> a = new ArrayList<Integer>();
Thread t1 = new Thread(new ListTester(a));
Thread t2 = new Thread(new ListTester(a));
t1.start();
t2.start();
t1.join();
t2.join();
System.out.println(a.size());
for (int i = 0; i < a.size(); ++i) {
System.out.println(i + " " + a.get(i));
}
}
Edit:
There are more comprehensive explanations around (for example, Stephen C's post), but I'll add a little comment since mfukar asked. (I should have done it right away, when posting the answer.)
This is the famous problem of incrementing an integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. Only in that example they have --i and ++i, and we have ++size twice. (++size is part of the ArrayList#add implementation.)
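For comparison, here is a sketch of the same two-writer experiment with the list wrapped by Collections.synchronizedList, so that each add() (including its size update) runs as one atomic step. The class name SafeListTester and the method runDemo are made up for the example, and the sleep from the original is dropped to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SafeListTester {
    static int runDemo() throws InterruptedException {
        // Every method of the wrapper synchronizes on the wrapper itself.
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());

        Runnable adder = () -> {
            for (int i = 0; i < 20; i++) {
                shared.add(i);
            }
        };
        Thread t1 = new Thread(adder);
        Thread t2 = new Thread(adder);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return shared.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

With the wrapper in place, no increment of size is lost, so all 40 elements are present after both threads finish.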
I don't really see an instance where the string is half modified, I am on the right track here?
That won't happen. However, what could happen is that only one of the strings gets added. Or that an exception occurs during the call to add.
can someone please give me an example where an ArrayList causes a problem and a Vector solves it?
If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:
// broken, even though vector is "thread-safe"
if (vector.isEmpty())
vector.add(1);
The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have.
You need to synchronize in your application code (and then there is no need to use Vector).
synchronized(list){
if (list.isEmpty())
list.add(1);
}
The java.util.concurrent package also has a number of collections that provide the atomic operations necessary for thread-safe queues and such.
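As one example of such an atomic operation, ConcurrentMap.putIfAbsent performs the check and the insert as a single step, which is exactly what the broken isEmpty()/add() pattern lacks. This sketch uses a map rather than a list, and the names PutIfAbsentDemo and runDemo are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    // Returns how many threads managed to insert the value (expected: exactly one).
    static int runDemo() throws InterruptedException {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();

        Runnable insertOnce = () -> {
            // Check and insert happen atomically; null is returned only to
            // the one thread whose value was actually stored.
            if (map.putIfAbsent("key", 1) == null) {
                winners.incrementAndGet();
            }
        };

        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(insertOnce);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return winners.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

No external synchronized block is needed here because the compound operation is provided by the collection itself.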
The first part of your query has already been answered. I will try to answer the second part:
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
In the wait-notify framework, the thread acquiring the lock on the object releases it when waiting on some condition. A great example is the producer-consumer problem.
When will it cause trouble?
Any time that one thread is reading the ArrayList while another one is writing, or when they are both writing. Here's a very well-known example.
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yes. Single-core CPUs can execute only one instruction at a time (not really, pipelining has been here for a while, but as a professor once said, that's "free" parallelism). Even so, each process running on your computer is only executed for a period of time, then it goes to an idle state. At that moment, another process may start or continue its execution, and then go into an idle state or finish. Process executions are interleaved.
With threads the same thing happens, only that they are contained inside a process. How they execute is dependent on the operating system, but the concept remains the same. They change from active to idle constantly throughout their lifetime.
You cannot control when one thread will be stopped and another will start. Thread 1 will not wait until it has completely finished adding data. It is always possible to corrupt data.
What are the disadvantages of making a large Java non-static method synchronized? Large method in the sense it will take 1 to 2 mins to complete the execution.
If you synchronize the method and try to call it twice at the same time, one thread will have to wait two minutes.
This is not really a question of "disadvantages". Synchronization is either necessary or not, depending on what the method does.
If it is critical that the code is run by only one thread at a time, then you need synchronization.
If you want to limit how many threads run the code at the same time in order to preserve system resources, you may want to consider a counting Semaphore, which gives more flexibility (such as being able to configure the number of concurrent executions).
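A counting Semaphore used this way might look like the sketch below. The class SemaphoreDemo, the method runDemo and the 20 ms sleep standing in for the long-running work are all assumptions made for illustration; the method reports the highest number of threads that were inside the guarded section at once.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreDemo {
    static int runDemo(int permits, int threadCount) throws InterruptedException {
        Semaphore slots = new Semaphore(permits);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();

        Runnable work = () -> {
            try {
                slots.acquire(); // blocks while all permits are taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            try {
                int now = running.incrementAndGet();
                maxRunning.accumulateAndGet(now, Math::max);
                Thread.sleep(20); // stand-in for the long-running work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
                slots.release(); // hand the permit to the next waiting thread
            }
        };

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(work);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return maxRunning.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max concurrent executions: " + runDemo(2, 6));
    }
}
```

With one permit this behaves much like a synchronized method; raising the permit count is the extra flexibility mentioned above.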
Another interesting aspect is that synchronization can only really be used to control access to resources within the same JVM. If you have more than one JVM and need to synchronize access to a shared file system or database, the synchronized keyword is not at all sufficient. You will need to get an external (global) lock for that.
If the method takes on the order of minutes to execute, then it may not need to be synchronized at such a coarse level, and it may be possible to use a more fine-grained system, perhaps by locking only the portion of a data structure that the method is operating on at the moment. Certainly, you should try to make sure that your critical section isn't really 2 minutes long - any method that takes that long to execute (regardless of the presence of other threads or locks) should be carefully studied as a candidate for parallelization. For a computation this time-consuming, you could be acquiring and releasing hundreds of locks and still have it be negligible. (Or, to put it another way, even if you need to introduce a lot of locks to parallelize this code, the overhead probably won't be significant.)
Since your method takes a huge amount of time to run, the relatively tiny amount of time it takes to acquire the synchronized lock should not be important.
A bigger problem could appear if your program is multithreaded (which I'm assuming it is, since you're making the method synchronized), and more than one thread needs to access that method, it could become a bottleneck. To prevent this, you might be able to rewrite the method so that it does not require synchronization, or use a synchronized block to reduce the size of the protected code (in general, the smaller the amount of code that is protected by the synchronize keyword, the better).
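The idea of shrinking the protected region can be sketched like this, assuming the slow part of the method does not touch shared state. ResultStore, processCoarse, processFine and slowComputation are hypothetical names; slowComputation is a trivial stand-in for the minutes-long work.

```java
import java.util.ArrayList;
import java.util.List;

class ResultStore {
    private final List<String> results = new ArrayList<>();

    // Coarse: the lock is held for the entire (slow) computation.
    synchronized void processCoarse(int input) {
        String result = slowComputation(input);
        results.add(result);
    }

    // Finer: the computation runs outside the lock; only the update of the
    // shared list is protected.
    void processFine(int input) {
        String result = slowComputation(input); // assumed to touch no shared state
        synchronized (this) {
            results.add(result);
        }
    }

    synchronized int size() {
        return results.size();
    }

    // Hypothetical stand-in for the expensive part of the method.
    private static String slowComputation(int input) {
        return "result-" + (input * input);
    }

    static int runDemo() throws InterruptedException {
        ResultStore store = new ResultStore();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100; j++) {
                    store.processFine(j);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return store.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

With processFine, several threads can compute at the same time and only queue up briefly for the list update; whether this is safe depends entirely on the computation really being independent of shared state.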
You can also look at the java.util.concurrent classes, as you may find a better solution there as well.
If the object is shared by multiple threads and one thread tries to call the synchronized method on the object while another's call is in progress, it will be blocked for 1 to 2 minutes. In the worst case, you could end up with a bottleneck where the throughput of your system is dominated by executing these computations one at a time.
Whether this is a problem or not depends on the details of your application, but you probably should look at more fine-grained synchronization ... if that is practical.
In two simple lines, the disadvantages of synchronized methods in Java:
Increased waiting time for threads
Possible performance problems
The first drawback is that threads that are blocked waiting to execute synchronized code can't be interrupted. Once they're blocked, they're stuck there until they get the lock for the object the code is synchronizing on.
The second drawback is that the synchronized block must be within a single method. In other words, we can't start a synchronized block in one method and end it in another, for obvious reasons.
The third drawback is that we can't test whether an object's intrinsic lock is available, or find out any other information about the lock. Also, if the lock isn't available, we can't time out after waiting for it for a while. When we reach the beginning of a synchronized block, we either get the lock and continue executing, or block at that line of code until we get the lock.
The fourth drawback is that if multiple threads are waiting to get the lock, it's not first come, first served. There isn't a set order in which the JVM will choose the next thread that gets the lock, so the first thread that blocked could be the last thread to get the lock, and vice versa.
So instead of using synchronized, we can prevent thread interference using classes that implement the java.util.concurrent.locks.Lock interface.
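Two of the capabilities that synchronized lacks, testing whether a lock is free and giving up after a timeout, can be sketched with ReentrantLock as follows. The class TryLockDemo and the method runDemo are made-up names; a helper thread holds the lock while the caller probes it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class TryLockDemo {
    // Returns { immediate tryLock, timed tryLock, tryLock after release }.
    static boolean[] runDemo() throws InterruptedException {
        Lock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();  // signal that the lock is now taken
                release.await();   // keep holding it until told to let go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        held.await();

        // 1. Non-blocking probe: returns at once instead of waiting.
        boolean immediate = lock.tryLock();
        if (immediate) lock.unlock();

        // 2. Bounded wait: gives up after 50 ms instead of blocking forever.
        boolean timed = lock.tryLock(50, TimeUnit.MILLISECONDS);
        if (timed) lock.unlock();

        release.countDown();
        holder.join();

        // 3. Once the holder has released the lock, acquisition succeeds.
        boolean afterRelease = lock.tryLock(1, TimeUnit.SECONDS);
        if (afterRelease) lock.unlock();

        return new boolean[] { immediate, timed, afterRelease };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = runDemo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

The Lock interface also offers lockInterruptibly(), which addresses the first drawback: a thread waiting for the lock can be interrupted.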