I have been reading the source code of PriorityBlockingQueue in Java and I was wondering:
Why does the tryGrow() method release the lock acquired in offer(), do its work without blocking, and then take the lock again when it is ready to replace the contents of the queue? It could have just kept the lock it already had...
How does this work? Growing the queue involves an array copy, so why does it not misbehave when concurrent adds arrive while the current add is increasing the size of the queue?
Because the memory allocation can be comparatively slow and can be done while the array is unlocked.
By releasing the lock it is allowing other threads to continue functioning while it is allocating the (potentially large) new array.
As this process can be done without locks it is good practice to do so. You should only hold a lock for the minimum length of time that you have to.
Sufficient checks are made to ensure no other thread is doing this at the same time.
UNSAFE.compareAndSwapInt(this, allocationSpinLockOffset, 0, 1)
will allow only one thread into this section of code at a time.
Note the
lock.lock();
if (newArray != null && queue == array) {
This grabs the lock again and then confirms that the array it is about to replace is the same one it grabbed a copy of at the start. If it has been replaced meanwhile then it just abandons the one it has just created on the assumption that some other thread has grown the array.
If it is still the same it then copies the old data into the new bigger array and plants it back into the field.
As Kamil nicely explains.
Purpose of that unlock is only to be sure that the faster thread will grow the queue, so we will not waste time while locking the "better ones".
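The unlock, allocate, re-lock and re-check sequence described above can be sketched as follows. This is a simplified illustration, not the JDK source: GrowableBuffer and its members are hypothetical names, an AtomicBoolean stands in for the allocationSpinLock field that the JDK updates through UNSAFE.compareAndSwapInt, and the array field is declared volatile here purely to keep the unlocked check simple.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class for illustration; not the JDK implementation.
class GrowableBuffer {
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for allocationSpinLock: at most one thread allocates at a time.
    private final AtomicBoolean allocating = new AtomicBoolean(false);
    private volatile Object[] queue = new Object[4];
    private int size; // guarded by lock

    void add(Object e) {
        lock.lock();
        int n;
        Object[] array;
        // Re-read size and queue after every grow attempt: both may have
        // changed while the lock was released inside tryGrow.
        while ((n = size) >= (array = queue).length) {
            tryGrow(array, array.length);
        }
        try {
            array[n] = e;
            size = n + 1;
        } finally {
            lock.unlock();
        }
    }

    // Called with the lock held; returns with the lock held.
    private void tryGrow(Object[] array, int oldCap) {
        lock.unlock(); // let other threads run while we allocate
        Object[] newArray = null;
        if (allocating.compareAndSet(false, true)) {
            try {
                if (queue == array) {
                    newArray = new Object[oldCap * 2];
                }
            } finally {
                allocating.set(false);
            }
        }
        if (newArray == null) {
            Thread.yield(); // someone else is allocating; give them a chance
        }
        lock.lock();
        // Install the new array only if nobody replaced the old one meanwhile;
        // otherwise the freshly allocated array is simply dropped.
        if (newArray != null && queue == array) {
            System.arraycopy(array, 0, newArray, 0, oldCap);
            queue = newArray;
        }
    }

    int size() {
        lock.lock();
        try {
            return size;
        } finally {
            lock.unlock();
        }
    }

    // Adds perThread elements from each of several threads and reports the size.
    static int demo(int threads, int perThread) {
        GrowableBuffer buffer = new GrowableBuffer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    buffer.add(j);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

The element is written only after the loop has confirmed, with the lock held, that the current array has room, which is why a concurrent grow cannot lose an add.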
My main thread has a private LinkedList which contains task objects for the players in my game. I then have a separate thread that runs every hour, accesses and clears that LinkedList, and runs my algorithm, which randomly adds new uncompleted tasks to every player's LinkedList. Right now I have made the getter method synchronized so that I don't run into any concurrency issues. This works fine, but the synchronized keyword has a lot of overhead, especially since the getter is accessed a ton from the main thread while it is only accessed hourly from my second thread.
I am wondering if there is a way to prioritize the main thread. For example, on the 2nd thread I could loop through the players, make a new LinkedList, run my algorithm, add all the tasks to that LinkedList, and then quickly assign the new LinkedList in place of the old one. This would slightly increase memory usage while improving main thread speed.
Basically I am trying to avoid making my main thread use synchronization when it will only be needed once an hour at most, and I am willing to greatly degrade the performance of the 2nd thread to keep the main thread's speed. Is there a way I can use the 2nd thread to notify the 1st that it will be locking a method, instead of having the 1st thread physically go through all of the synchronization overhead steps? I feel like this would be possible, since if the 2nd thread shares a cache with the main thread it could change a boolean denoting that the main thread has to wait until that variable is changed back. The main thread would have to check that boolean every time it tries to run that method, and if the 2nd thread is telling it to wait, the main thread would then freeze until the boolean is changed.
Of course the 2nd thread would have to specify which object and method has the lock, along with a binary 0 or 1 denoting whether it is locked or not. Then the main thread would just need to check its shared cache for the object and the binary boolean value once it reaches that method, which seems way faster than normal synchronization. Anyway, this would result in the main thread running at normal speed while the 2nd thread handles a bunch of work behind the scenes without degrading main thread performance. Does this exist? If so, how can I do it, and if it does not exist, how hard would it actually be to implement?
Premature optimization
It sounds like you are overly worried about the cost of synchronization. Doing a dozen, or a hundred, or even a thousand synchronizations once an hour is not going to impact the performance of your app by any significant amount.
If your concern has not yet been validated by careful study with a profiling tool, you’ve fallen into the common trap of premature optimization.
AtomicReference
Nevertheless, I can suggest an alternative approach.
You want to replace a list once an hour. If you do not mind letting any threads continue using the current list already accessed while you swap out for a new list, then use AtomicReference. An object of this class holds the reference to another object of a specified type.
I generally like the Atomic… classes for thread-safety work because they scream out to the reader that a concurrency problem is at hand.
AtomicReference < List < Task > > listRef = new AtomicReference<>( originalList ) ;
A different thread is able to replace that reference to the old list with a reference to the new list.
listRef.set( newList ) ;
Access by the other thread:
List< Task > list = listRef.get() ;
Note that this approach does not make thread-safe the payload, the list itself. But you claim that only a single thread will ever be manipulating the content of the list. You claim a different thread will only replace the entire list. So this AtomicReference serves the purpose of replacing the list in a thread-safe manner while making the issue of concurrency quite obvious.
volatile
Using AtomicReference accomplishes the same goal as volatile. I’m wary of volatile because (a) its use may go unnoticed by the reader, and (b) I suspect many Java programmers do not understand volatile, especially since its meaning was redefined.
For more info about why plain reference assignment is not thread-safe, see this Question.
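A minimal sketch of the AtomicReference swap described above. The names TaskBoard, current and replaceWith are hypothetical, tasks are plain strings here, and the sketch assumes (beyond what the answer states) that each published list is an immutable snapshot, so a reader that is still holding the old list never sees it change.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names for illustration only.
class TaskBoard {
    private final AtomicReference<List<String>> tasksRef =
            new AtomicReference<>(List.of());

    // Main thread: a cheap, lock-free read of whichever list is current.
    List<String> current() {
        return tasksRef.get();
    }

    // Hourly thread: build the new list privately, then publish it in one step.
    void replaceWith(List<String> freshTasks) {
        tasksRef.set(List.copyOf(freshTasks)); // immutable snapshot (assumption)
    }

    public static void main(String[] args) {
        TaskBoard board = new TaskBoard();
        List<String> before = board.current();
        board.replaceWith(List.of("gather wood", "build hut"));
        // The old reference is untouched; new readers see the new list.
        System.out.println(before.size() + " -> " + board.current().size());
    }
}
```

A thread that called current() just before the swap keeps working with the old list until it asks again, which is the trade-off the answer mentions.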
Q1) I created a linked-list-based implementation of a DB connection pool. Threads which need a connection poll() it from the list, and threads release connections using add() or addFirst(). During testing I noticed that even though one thread had locked the list using synchronized(ll) {some code here}, other threads were able to poll() connections out of the list. This test makes me conclude that only the block of code within {} is guaranteed to be executed by one thread at a time, but the object itself, i.e. ll, does not get locked and other threads can still write to it. Is that correct? Then what is the use of putting ll as the monitor? I could just as well use synchronized(this).
Q2) If I create the linked list as thread-safe using Collections.synchronizedList() during creation of the list, can I then get rid of the synchronized blocks? Assume I have 2 separate methods for obtaining a connection and releasing a connection. Currently both methods use synchronized blocks to obtain/release connections.
Q3) If I decide to use a non-blocking collection such as ConcurrentLinkedQueue (we have JDK 1.5), will that help in our case? Our peak connection usage is 30, but we have not imposed any limits in our code, so the number of connections can grow without bound. We were planning to write a TimerTask which will run at night and close some connections from the head of the list (old connections), but for executing business logic we would prefer to use connections from the tail of the queue, since those are the most recently released connections and so the chances of getting a non-stale connection are high. But since it is a FIFO queue I cannot poll() data from the tail of the queue, so I am forced to use maybe-stale connections from the head of the queue. So basically what I need is a stack-like feature for executing business logic, but a queue-like feature for implementing the timer task. Can you suggest any data structure?
OK, you have some list:
LinkedList ll = ...;
Synchronizing on ll does not prevent other threads from accessing the list or modifying the list:
synchronized(ll) {
...protected code...
// This does NOT prevent other threads from examining or updating
// ll while the protected code runs.
}
The only thing that a synchronized block prevents is, it prevents other threads from synchronizing on the same object at the same time.
The reason for synchronizing some data structure like a linked list is to prevent other threads from seeing the structure in an invalid state when one thread must create that temporary invalid state in order to do its work.
In order to make it work, the block of code that creates the temporary bad state must be synchronized, and every block of code that must be prevented from seeing the bad state must also be synchronized on the same object.
This is useless because no other thread will ever be allowed to synchronize on ll:
public void run() {
synchronized(ll) {
while(true) {
...
}
}
}
You generally want to keep synchronized blocks as short as possible: You want the code to get in and get out quickly to minimize the amount of time that other threads will spend waiting for their turn.
Synchronization doesn't lock objects. It locks pieces of code that are synchronizing on the same object. If other pieces of code are using the same object without synchronizing on it, they will proceed.
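To make the point concrete, here is a small sketch (the class and method names are made up for illustration) in which one thread holds the monitor of ll while another thread polls the list without synchronizing. If synchronized really locked the object, the poll() call would hang; it does not.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; names are illustrative.
class MonitorDemo {
    static String pollWhileMonitorHeld() {
        LinkedList<String> ll = new LinkedList<>();
        ll.add("connection-1");
        CountDownLatch monitorHeld = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (ll) {           // holder now owns ll's monitor...
                monitorHeld.countDown();
                try {
                    done.await();         // ...and keeps it until told to stop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        try {
            monitorHeld.await();
            // No synchronized block here, so nothing makes this call wait,
            // even though another thread holds the monitor of ll right now.
            String polled = ll.poll();
            done.countDown();
            holder.join();
            return polled;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pollWhileMonitorHeld());
    }
}
```

The holder thread never touches the list in this sketch, so the unsynchronized poll() is harmless here; in a real pool it would be exactly the kind of unprotected access that corrupts the structure.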
I have a java class which receives inputs from the outside (i.e., many threads which run concurrently), and then stores inputs into two circular buffers. These buffers work together to carry out the same job and differ only by their level of priority. That is, the buffers are named "primary" and "secondary": when an input arrives, the primary buffer is checked first, and in case it is full the secondary buffer is checked. Should even the secondary buffer be full, the input waits for a slot in one of the buffers to be available. I thought I could manage the concurrency by first locking access on the primary buffer, and requesting lock for the secondary buffer only if necessary and while still holding the previous lock.
I don't know why, but something sounds strange to me. Is holding two locks at the same time a good/safe practice as long as it doesn't lead to deadlocks or heavy starvation scenarios?
Thank you for your attention.
If you cannot guarantee that the two locks are always acquired in the same order, this risks deadlock.
If you can guarantee that the outer lock will always be held when entering the inner lock, the inner lock is redundant. (*)
If you can guarantee that if code holds the inner lock, no other code will ever request the outer lock, then this does not incur deadlock. However, this is fragile, because now you are reasoning about code that is usually pretty far away. Unless there is an outer-outer lock that guarantees that the outer lock isn't acquired concurrently, but then you are back to one of the two above situations.
So yes you should avoid nested locks: Either you get deadlock, or it is useless, or you have a fragile system.
(*) There is an exception here: If the outer-locked code region starts threads which coordinate using the inner lock, this does not apply.
This kind of code is pretty rare, and so different from the nested-lock scenarios you see in practice, that I am even reluctant to call this a nested lock even though it is, technically.
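For the two-buffer case in the question, one way to avoid nesting altogether is to guard both buffers with a single lock, since the inner lock would be redundant anyway. The following is only a sketch under assumptions: TwoTierBuffer is a hypothetical name, ArrayDeque stands in for the circular buffers, and both buffers share one capacity.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class: one lock guards both buffers, so no nested locking.
class TwoTierBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition slotFree = lock.newCondition();
    private final ArrayDeque<T> primary = new ArrayDeque<>();
    private final ArrayDeque<T> secondary = new ArrayDeque<>();
    private final int capacity; // per buffer (assumption: same for both)

    TwoTierBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Tries the primary buffer first, then the secondary; waits if both are full.
    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (primary.size() == capacity && secondary.size() == capacity) {
                slotFree.await();
            }
            if (primary.size() < capacity) {
                primary.addLast(item);
            } else {
                secondary.addLast(item);
            }
        } finally {
            lock.unlock();
        }
    }

    // Removes from the primary buffer if possible; returns null if both are empty.
    T poll() {
        lock.lock();
        try {
            T item = primary.isEmpty() ? secondary.pollFirst() : primary.pollFirst();
            if (item != null) {
                slotFree.signal(); // a slot just became available
            }
            return item;
        } finally {
            lock.unlock();
        }
    }

    String sizes() {
        lock.lock();
        try {
            return primary.size() + "/" + secondary.size();
        } finally {
            lock.unlock();
        }
    }

    // Fills the primary buffer, spills one item into the secondary one.
    static String demo() {
        TwoTierBuffer<Integer> buffer = new TwoTierBuffer<>(2);
        try {
            buffer.put(1);
            buffer.put(2);
            buffer.put(3);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return buffer.sizes();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A single lock trades some concurrency for a design with no lock ordering to reason about; whether that trade is acceptable depends on how contended the buffers are.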
Say, I have a data object:
class ValueRef { double value; }
Where each data object is stored in a master collection:
Collection<ValueRef> masterList = ...;
I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):
class Job implements Runnable {
Collection<ValueRef> neededValues = ...;
public void run() {
double sum = 0;
for (ValueRef x: neededValues) sum += x.value;
System.out.println(sum);
}
}
Use-case:
for (ValueRef x: masterList) { x.value = Math.random(); }
Populate a job queue with some jobs.
Wake up a thread pool
Wait until each job has been evaluated
Note: During the job evaluation, all of the values are all constant. The threads however, have possibly evaluated jobs in the past, and retain cached values.
Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?
I understand synchronize from the monitor/lock-perspective, I do not understand synchronize from the cache/flush-perspective (ie. what is being guaranteed by the memory model on enter/exit of the synchronized block).
To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.
My approach: create a global monitor: static Object guard = new Object(); Then, synchronize on guard, while updating the master list. Then finally, before starting the thread pool, once for each thread in the pool, synchronize on guard in an empty block.
Does that really cause a full flush of any value read by that thread? Or just values touched inside the synchronize block? In which case, instead of an empty block, maybe I should read each value once in a loop?
Thanks for your time.
Edit: I think my question boils down to, once I exit a synchronized block, does every first read (after that point) go to main memory? Regardless of what I synchronized upon?
It doesn't matter that threads of a thread pool have evaluated some jobs in the past.
Javadoc of Executor says:
Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.
So, as long as you use standard thread pool implementation and change the data before submitting the jobs you shouldn't worry about memory visibility effects.
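A minimal sketch of that guarantee, assuming a standard ExecutorService. SubmitAfterUpdate and sumTwoRounds are hypothetical names, and the values are set to fixed numbers instead of Math.random() so the result is predictable. The values are written with plain assignments before each submit(), and the same pool threads are reused across rounds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative.
class SubmitAfterUpdate {
    static final class ValueRef {
        double value; // plain field, no volatile
    }

    static double[] sumTwoRounds() {
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            masterList.add(new ValueRef());
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        double[] sums = new double[2];
        try {
            for (int round = 0; round < 2; round++) {
                // Plain writes made before submit(): no extra synchronization.
                for (ValueRef ref : masterList) {
                    ref.value = round + 1;
                }
                // Actions before submit() happen-before the task starts,
                // even when the task runs on a reused pool thread.
                sums[round] = pool.submit(() -> {
                    double sum = 0;
                    for (ValueRef ref : masterList) {
                        sum += ref.value;
                    }
                    return sum;
                }).get(); // wait for the job before changing the values again
            }
            return sums;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] sums = sumTwoRounds();
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

The important part is the ordering: every write to the values comes before the submit() call for the jobs that read them, and no value is changed while a job may still be running.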
What you are planning sounds sufficient. It depends on how you plan to "wake up thread pool."
The Java Memory Model provides that all writes performed by a thread before it releases a lock (that is, before it exits a synchronized block) are visible to threads that subsequently synchronize on that same lock.
So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) during the time you update the master list, when they wake up and become runnable, the modifications made by the master thread will be visible to these threads.
I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.
Just to clarify: It's almost impossible to control worker threads without using a synchronized block where a check is made to see whether the worker has a task to implement. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable to act as a memory barrier; however, I can't think how you'd create a thread pool without using one of these.
As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea that one might assume at first glance.
If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.
Why don't you make Collection<ValueRef> and ValueRef immutable, or at least avoid modifying the values in the collection after you have published the reference to the collection? Then you will not have to worry about synchronization.
That is, when you want to change the values of the collection, create a new collection and put the new values in it. Once the values have been set, pass the collection reference to new job objects.
The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).
Under what circumstances would an unsynchronized collection, say an ArrayList, cause a problem? I can't think of any. Can someone please give me an example where an ArrayList causes a problem and a Vector solves it? I wrote a program that has 2 threads both modifying an ArrayList that has one element. One thread puts "bbb" into the ArrayList while the other puts "aaa" into it. I don't really see an instance where the string is half modified. Am I on the right track here?
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Many Thanks in advance.
There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.
The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:
public void add(T element) {
if (!haveSpace(size + 1)) {
expand(size + 1);
}
elements[size] = element;
// HERE
size++;
}
Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, and overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)
The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When one thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int) for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that this kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)
The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.
List list = Collections.synchronizedList(new ArrayList());
// Thread 1
List list2 = ...
for (Object element : list2) {
list.add(element);
}
// Thread 2
List list3 = ...
for (Object element : list) {
list3.add(element);
}
If thread2's list is an ArrayList or LinkedList and the two threads run simultaneously, thread 2 will fail with a ConcurrentModificationException. If it is some other (home brew) list, then the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make it thread-safe with respect to a sequence of list operations performed by different threads. To get that, the application would typically need to synchronize at a higher level / coarser grain.
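One common way to get that coarser-grained protection, as the Collections.synchronizedList documentation requires for iteration, is to hold the list's own monitor for the whole traversal. A minimal sketch follows; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; names are illustrative.
class SynchronizedIteration {
    static int copyWhileWriting() {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());

        // Thread 1: keeps adding; each add() locks the list by itself.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                list.add(i);
            }
        });
        writer.start();

        // Thread 2 (here: the calling thread): the iteration is a sequence of
        // operations, so it locks the list for the whole loop. The writer is
        // blocked in add() until the loop finishes, so no
        // ConcurrentModificationException can occur.
        List<Integer> copy = new ArrayList<>();
        synchronized (list) {
            for (Integer element : list) {
                copy.add(element);
            }
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(copyWhileWriting());
    }
}
```

How many elements end up in the copy depends on timing; what the synchronized block guarantees is only that the copy is a consistent snapshot and that the traversal does not fail.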
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU).
Correct. If there is only one core available to run the application, obviously only one thread gets to run at a time. This makes some of the hazards impossible, and others become much less likely to occur. However, it is possible for the OS to switch from one thread to another thread at any point in the code, and at any time.
If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yup. That's possible. The probability of it happening is very small [1], but that just makes this kind of problem more insidious.
[1] This is because thread time-slicing events are extremely infrequent when measured on the timescale of hardware clock cycles.
A practical example. At the end list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?
static class ListTester implements Runnable {
private List<Integer> a;
public ListTester(List<Integer> a) {
this.a = a;
}
public void run() {
try {
for (int i = 0; i < 20; ++i) {
a.add(i);
Thread.sleep(10);
}
} catch (InterruptedException e) {
}
}
}
public static void main(String[] args) throws Exception {
ArrayList<Integer> a = new ArrayList<Integer>();
Thread t1 = new Thread(new ListTester(a));
Thread t2 = new Thread(new ListTester(a));
t1.start();
t2.start();
t1.join();
t2.join();
System.out.println(a.size());
for (int i = 0; i < a.size(); ++i) {
System.out.println(i + " " + a.get(i));
}
}
edit
There're more comprehensive explanations around (for example, Stephen C's post), but I'll make a little comment since mfukar asked. (should've done it right away, when posting answer)
This is the famous problem of incrementing integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. Only in that example they have --i and ++i and we have ++size twice. (++size is part of ArrayList#add implementation.)
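For comparison, here is a sketch of the same two-thread test with the list wrapped by Collections.synchronizedList, so that each add() runs as one atomic step. The class and method names are made up, and the sleep is dropped because it is not needed to show the effect.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; names are illustrative.
class SynchronizedAdds {
    // Two threads each add 20 elements to the shared list.
    static int addFromTwoThreads(List<Integer> shared) {
        Runnable adder = () -> {
            for (int i = 0; i < 20; ++i) {
                shared.add(i);
            }
        };
        Thread t1 = new Thread(adder);
        Thread t2 = new Thread(adder);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        // With the synchronized wrapper no add is lost, so this prints 40.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(addFromTwoThreads(safe));
    }
}
```

Passing a plain ArrayList to the same method brings back the original problem: the size may come out below 40, or an add may throw.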
I don't really see an instance where the string is half modified, I am on the right track here?
That won't happen. However, what could happen is that only one of the strings gets added. Or that an exception occurs during the call to add.
can someone please give me an example where an ArrayList causes a problem and a Vector solves it?
If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:
// broken, even though vector is "thread-safe"
if (vector.isEmpty())
vector.add(1);
The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have.
You need to synchronize in your application code (and then there is no need to use Vector).
synchronized(list){
if (list.isEmpty())
list.add(1);
}
The concurrency utility package also has a number of collections that provide the atomic operations necessary for thread-safe queues and such.
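As one example of such an atomic operation, CopyOnWriteArrayList.addIfAbsent performs the check and the add as a single step. Note that this is "add unless the element is already present", which is close to, but not the same as, the isEmpty() check above; the class and method names in this sketch are hypothetical.

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; names are illustrative.
class AtomicCheckThenAct {
    // Several threads race to add the same element; exactly one add wins.
    static int raceToAdd(int threads) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            // The check and the add happen atomically inside addIfAbsent.
            workers[i] = new Thread(() -> list.addIfAbsent(1));
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(raceToAdd(8));
    }
}
```

CopyOnWriteArrayList copies its backing array on every write, so it suits lists that are read far more often than they are modified.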
The first part of your query has already been answered. I will try to answer the second part:
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
In the wait-notify framework, the thread acquiring the lock on the object releases it when waiting on some condition. A great example is the producer-consumer problem.
When will it cause trouble?
Any time one thread is reading the ArrayList while another one is writing to it, or when both are writing. Here's a well-known example.
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yes, single-core CPUs can execute only one instruction at a time (not really, pipelining has been here for a while, but as a professor once said, that's "free" parallelism). Even so, each process running on your computer is only executed for a period of time, and then it goes into an idle state. At that moment, another process may start or continue its execution, and then go into an idle state or finish. Process execution is interleaved.
With threads the same thing happens, only they are contained inside a process. How they execute depends on the operating system, but the concept remains the same. They change from active to idle constantly throughout their lifetime.
You cannot control when one thread will be stopped and another will start. Thread 1 will not wait until it has completely finished adding data. It is always possible to corrupt data.