Thread::yield vs Thread::onSpinWait - java

Well, the title basically says it all, with the small addition that I would really like to know when to use them. It might be simple enough - I've read the documentation for both, but I still can't really tell the difference.
There are answers like this here that basically say:
Yielding also was useful for busy waiting...
I can't quite agree with that, for the simple reason that ForkJoinPool uses Thread::yield internally, and it is a pretty recent addition to the JDK world.
The thing that really bothers me is usages like this in the JDK itself (StampedLock::tryDecReaderOverflow):
else if ((LockSupport.nextSecondarySeed() & OVERFLOW_YIELD_RATE) == 0)
    Thread.yield();
else
    Thread.onSpinWait();
return 0L;
So it seems there are cases when one would be preferred over the other. And no, I don't have an actual example where I might need this - the only one I have actually used is Thread::onSpinWait, because 1) I happened to be busy waiting and 2) the name is pretty much self-explanatory, so it seemed like the right choice for the busy spin.

When blocking a thread, there are a few strategies to choose from: spin, wait() / notify(), or a combination of both. Pure spinning on a variable is a very low latency strategy but it can starve other threads that are contending for CPU time. On the other hand, wait() / notify() will free up the CPU for other threads but can cost thousands of CPU cycles in latency when descheduling/scheduling threads.
So how can we avoid pure spinning as well as the overhead associated with descheduling and scheduling the blocked thread?
Thread.yield() is a hint to the thread scheduler to give up its time slice if another thread with equal or higher priority is ready. This avoids pure spinning but doesn't avoid the overhead of rescheduling the thread.
The latest addition is Thread.onSpinWait(), which inserts architecture-specific instructions to hint to the processor that the thread is in a spin loop. On x86 this is probably the PAUSE instruction; on aarch64 it is the YIELD instruction.
What's the use of these instructions? In a pure spin loop, the processor will speculatively execute the loop over and over again, filling up the pipeline. When the variable the thread is spinning on finally changes, all that speculative work will be thrown out due to memory order violation. What a waste!
A hint to the processor could prevent the pipeline from speculatively executing the spin loop until prior memory instructions are committed. In the context of SMT (hyperthreading), this is useful as the pipeline will be freed up for other hardware threads.
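For illustration, here is a minimal sketch of a spin-then-yield wait on a flag (the class name, flag and spin bound are made up for this example, not taken from the JDK):

import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: spin briefly with onSpinWait(), then fall back to yield()
// so a longer wait doesn't monopolise the core.
final class SpinThenYield {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    void awaitReady() {
        int spins = 0;
        while (!ready.get()) {
            if (spins++ < 1_000) {
                Thread.onSpinWait();   // processor hint: we are in a spin loop (PAUSE/YIELD)
            } else {
                Thread.yield();        // scheduler hint: let another thread run for a while
            }
        }
    }

    void signalReady() {
        ready.set(true);
    }
}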

Related

Running while loop infinitely without any code inside in java

Let's say I have written an infinite write loop but didn't put any statement inside it. Will it create any issue, like memory filling up, or will the JVM stop responding after some time?
Why would you do something like that?
To answer: it wouldn't consume endless memory, but CPU usage could be a pain even with no instructions in the loop at all.
At minimum, you should help CPU preemption by allowing the thread to yield (see the sketch after the Javadoc quote below):
Thread.yield();
You can read this in Java Api Javadoc:
A hint to the scheduler that the current thread is willing to yield its current use of a processor. The scheduler is free to ignore this hint.
Yield is a heuristic attempt to improve relative progression between threads that would otherwise over-utilise a CPU. Its use should be combined with detailed profiling and benchmarking to ensure that it actually has the desired effect.
It is rarely appropriate to use this method. It may be useful for debugging or testing purposes, where it may help to reproduce bugs due to race conditions. It may also be useful when designing concurrency control constructs such as the ones in the java.util.concurrent.locks package.
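As a minimal illustration of that suggestion (the class and flag are hypothetical, made up for this example):

// Illustrative only: a loop with no real work that at least yields each iteration,
// so the scheduler can hand the core to other runnable threads.
class YieldingLoop {
    private volatile boolean done = false;

    void spinUntilDone() {
        while (!done) {
            Thread.yield();
        }
    }

    void stop() {
        done = true;
    }
}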
An infinite loop might, and probably will, result in 100% utilization of a CPU core. Depending on what you mean by "write loop", a similar technique is called Busy Waiting or Spinning.
spinning as a time delay technique often produces unpredictable or even inconsistent results unless code is implemented to determine how quickly the processor can execute a "do nothing" loop, or the looping code explicitly checks a real-time clock
You'll certainly keep one hardware thread busy. It won't create any objects, so memory isn't a direct issue as such.
However, the context is important.
If it is a high priority thread, the system may become unresponsive. This is implementation specific. Twenty years ago I wrote an infinite loop that made a Windows NT system unresponsive. (I think this was a TCP proxy and only happened when an IBM 3090 running CICS sent an empty keep alive frame to a 3270 terminal. Good times.)
If the thread is holding any locks, they won't be released.
If the thread does something useful, that useful thing won't happen. For instance, if you were to write the loop in a finaliser (and the system only has one finaliser thread), no other object would get finalised, and therefore not garbage collected either. The application may behave peculiarly. It's always fun to run random code on the finaliser thread.
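To make that last point concrete, a hypothetical sketch (finalize() is deprecated since Java 9; this only illustrates how one stuck finaliser blocks the rest):

// Hypothetical illustration only: an infinite loop inside finalize() pins the single
// finaliser thread, so no object queued after this one is ever finalised.
class StuckFinalizer {
    @Override
    protected void finalize() {
        while (true) {
            // never returns; the finaliser thread is stuck here
        }
    }
}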

Thread.sleep() VS Executor.scheduleWithFixedDelay()

Goal: Execute certain code every once in a while.
Question: In terms of performance, is there a significant difference between:
while (true) {
    execute();
    Thread.sleep(10 * 1000);
}
and
executor.scheduleWithFixedDelay(runnableWithoutSleep, 0, 10, TimeUnit.SECONDS);
?
Of course, the latter option is more kosher. Yet, I would like to know whether I should embark on an adventure called "Spend a few days refactoring legacy code to say goodbye to Thread.sleep()".
Update:
This code runs in super/mega/hyper high-load environment.
You're dealing with sleep times termed in tens of seconds. The possible savings by changing your sleep option here is likely nanoseconds or microseconds.
I'd prefer the latter style every time, but if you have the former and it's going to cost you a lot to change it, "improving performance" isn't a particularly good justification.
EDIT re: 8000 threads
8000 threads is an awful lot; I might move to the scheduled executor just so that you can control the amount of load put on your system. Your point about varying wakeup times is something to be aware of, although I would argue that the bigger risk is a stampede of threads all sleeping and then waking in close succession and competing for all the system resources.
I would spend the time to throw these all in a fixed thread pool scheduled executor. Only have as many running concurrently as you have available of the most limited resource (for example, # cores, or # IO paths) plus a few to pick up any slop. This will give you good throughput at the expense of latency.
With the Thread.sleep() method it will be very hard to control what is going on, and you will likely lose out on both throughput and latency.
If you need more detailed advice, you'll probably have to describe what you're trying to do in more detail.
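A minimal sketch of the fixed-size scheduled pool described above (the pool size and task body are placeholders, not from the question):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: size the pool to the number of cores plus a little slack,
// and reschedule each task with a fixed delay instead of sleeping in a loop.
public class ScheduledPoolSketch {
    public static void main(String[] args) {
        int poolSize = Runtime.getRuntime().availableProcessors() + 2; // "+2" is arbitrary slack
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(poolSize);

        Runnable task = () -> {
            // placeholder for the work previously done before Thread.sleep(10 * 1000)
        };

        executor.scheduleWithFixedDelay(task, 0, 10, TimeUnit.SECONDS);
    }
}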
Since you haven't mentioned the Java version, things might differ.
As I recall from the Java source code, the main difference is in how things are written internally.
For Sun Java 1.6, if you use the second approach, the native code also brings wait and notify calls into the system, so in a way it is more thread-efficient and CPU-friendly.
But then again you lose control and it becomes more unpredictable for your code - consider that you want to sleep for exactly 10 seconds.
So, if you want more predictability - surely you can go with option 1.
Also, on a side note, when you encounter things like this in legacy systems, 80% of the time there are now better ways of doing it - but the magic numbers are there for a reason (the remaining 20%), so change it at your own risk :)
There are different scenarios,
The Timer creates a queue of tasks that is continually updated. When the Timer is done, it may not be garbage collected immediately, so creating more Timers only adds more objects onto the heap. Thread.sleep() only pauses the thread, so its memory overhead would be extremely low.
Timer/TimerTask also takes into account the execution time of your task, so it will be a bit more accurate. And it deals better with multithreading issues (such as avoiding deadlocks etc.).
If your thread gets an exception and is killed, that is a problem. But a TimerTask will take care of it: it will run regardless of a failure in the previous run.
The advantage of TimerTask is that it expresses your intention much better (i.e. code readability), and it already has the cancel() feature implemented.
Reference is taken from here
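A minimal sketch of the Timer/TimerTask variant mentioned above (the period and task body are placeholders):

import java.util.Timer;
import java.util.TimerTask;

// Sketch only: schedule a repeating TimerTask with a 10-second period and show
// the built-in cancel() mentioned above.
public class TimerSketch {
    public static void main(String[] args) throws InterruptedException {
        Timer timer = new Timer("periodic-worker", true); // daemon timer thread

        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                // placeholder for the periodic work
            }
        };

        timer.schedule(task, 0, 10_000); // start now, repeat every 10 seconds

        Thread.sleep(60_000); // let it run for a minute in this toy example
        task.cancel();        // stop further executions
        timer.cancel();       // release the timer thread
    }
}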
You said you are running in a "mega... high-load environment", so if I understand you correctly you have many such threads simultaneously sleeping, as in your code example. It takes less CPU time to reuse a thread than to kill one and create a new one, and the refactoring may allow you to reuse threads.
You can create a thread pool by using a ScheduledThreadPoolExecutor with a corePoolSize greater than 1. Then when you call scheduleWithFixedDelay on that thread pool, if a thread is available it will be reused.
This change may reduce CPU utilization as threads are reused rather than destroyed and created, but the degree of reduction will depend on the tasks they're doing, the number of threads in the pool, etc. Memory usage will also go down if some of the tasks overlap, since there will be fewer threads sitting idle at once.

Which one is better for performance to check another thread's boolean in java

while(!anotherThread.isDone());
or
while(!anotherThread.isDone())
    Thread.sleep(5);
If you really need to wait for a thread to complete, use
anotherThread.join()
(You may want to consider specifying a timeout in the join call.)
You definitely shouldn't tight-loop like your first snippet does... and sleeping for 5ms is barely better.
If you can't use join (e.g. you're waiting for a task to complete rather than a whole thread) you should look at the java.util.concurrent package - chances are there's something which will meet your needs.
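For example, if the "done" signal is about a unit of work rather than a whole thread, a CountDownLatch is a minimal option (the names here are made up):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch only: the worker counts the latch down when its task is done,
// and the waiter blocks (with a timeout) instead of spinning on a flag.
public class LatchSketch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            // ... do the actual work ...
            done.countDown(); // signal completion
        });
        worker.start();

        if (!done.await(30, TimeUnit.SECONDS)) { // blocks without burning CPU
            System.out.println("timed out waiting for the worker");
        }
    }
}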
IMHO, avoid using such logic altogether. Instead, perhaps implement some sort of notification system using property change listeners.
As others have said, it's better to just use join in this case. However, I'd like to generalize your question and ask the following:
In general, when a thread is waiting for an event that depends on another thread, is it better to:
Use a blocking mechanism (i.e. join, conditional variable, etc.) or
Busy spin without sleep or
Busy spin with sleep?
Now let's see what are the implications for each case:
In the first case, using a blocking call will effectively take your thread off the CPU and not schedule it again until the expected event occurs. This is good for resource utilization (the thread would waste CPU cycles otherwise), but not very efficient if the event may occur very frequently and at small intervals (i.e. a context switch is much more time-consuming than the time it takes for the event to occur). It is generally good when the event will occur eventually, but you don't know how soon.
In case two, you are busy spinning, meaning that you are actively using the CPU without performing useful work. This is the opposite of case 1: it is useful when the event is expected to occur very very soon, but otherwise may occupy the CPU unnecessarily.
The third case is a sort of trade-off. You are busy spinning, but at the same time allowing other threads to run by giving up the CPU. This is generally employed when you don't want to saturate the CPU, but the event is expected to occur soon and you want to be sure that you will still be there in almost real time to catch it when it occurs.
I would recommend utilizing the wait/notify mechanism that is built into all Java objects (or using the new Lock code in Java 5).
Thread 1 (waiting for Thread2)
synchronized (thread2.lockObject) {
    while (!thread2.isDone()) {
        thread2.lockObject.wait();
    }
}
Thread 2
// finish work, set isDone=true, then notify T1 (the notify must hold the same lock)
synchronized (thread2.lockObject) {
    thread2.lockObject.notify();
}
'lockObject' is just a plain (Object lockObject = new Object()) -- all Java objects support the wait/notify calls.
After that last call to notify(), Thread1 will wake up, hit the top of the while, see that T2 is now done, and continue execution.
You should account for interrupt exceptions and the like, but using wait/notify is hugely helpful for scenarios like this.
If you use your existing code, with or without sleep, you are burning a huge number of cycles doing nothing... and that's never good.
ADDENDUM
I see a lot of comments saying to use join - if the executing thread you are waiting on will complete, then yes, use join. If you have two parallel threads that run at all times (e.g. a producer thread and a consumer) and they don't "complete", they just run in lock-step with each other, then you can use the wait/notify paradigm I provided above.
The second one.
Better, though, is to use the join() method of a thread to block the current thread until it is complete :).
EDIT:
I just realised that this only addresses the question as it applies to the two examples you gave, not the question in general (how to wait for a boolean value to be changed by another Thread, not necessarily for the other Thread to actually finish).
To answer the question in general I would suggest that rather than using the methods you described, to do something like this I would recommend using the guarding block pattern as described here. This way, the waiting thread doesn't have to keep checking the condition itself and can just wait to be notified of the change. Hope this helps!
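A small self-contained sketch of that guarded-block idea (the class and method names are made up):

// Sketch only: the waiter blocks in wait() inside a loop guarding the condition,
// and the setter flips the flag and notifies under the same lock.
public class GuardedFlag {
    private final Object lock = new Object();
    private boolean changed = false;

    public void awaitChange() throws InterruptedException {
        synchronized (lock) {
            while (!changed) {
                lock.wait(); // releases the lock while waiting
            }
        }
    }

    public void signalChange() {
        synchronized (lock) {
            changed = true;
            lock.notifyAll();
        }
    }
}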
Have you considered: anotherThread.join() ? That will cause the current one to be 'parked' without any overhead until the other one terminates.
The second is better than the first, but neither is very good. You should use anotherThread.join() (or anotherThread.join(timeout)).
Neither, use join() instead:
anotherThread.join();
// anotherThread has finished executing.

Thread.sleep() implementation

Today I had an interview in which I asked the candidate a quite usual and basic question about the difference between Thread.sleep() and Object.wait(). I expected him to answer something like this, but he said these methods are basically the same thing, and that most likely Thread.sleep() uses Object.wait() internally, except that sleep itself doesn't require an external lock. That is not exactly a correct answer, because in JDK 1.6 the method has the following signature:
public static native void sleep(long millis) throws InterruptedException;
But my second thought was that it's not that ridiculous. It's possible to use timed wait to achieve the same effect. Take a look at the following code snippet:
public class Thread implements Runnable {
    private final Object sleepLock = new Object();
    // other implementation details are skipped
    public static void sleep(long millis) throws InterruptedException {
        synchronized (getCurrentThread().sleepLock) {
            getCurrentThread().sleepLock.wait(millis);
        }
    }
}
In this case sleepLock is an object used specifically for the synchronization block inside the sleep method. I assume that Sun/Oracle engineers are aware of Occam's razor, so sleep has a native implementation on purpose. My question is why it uses native calls.
The only idea I came up with was an assumption that someone may find an invocation like Thread.sleep(0) useful. It makes sense for scheduler management, according to this article:
This has the special effect of clearing the current thread's quantum and putting it to the end of the queue for its priority level. In other words, all runnable threads of the same priority (and those of greater priority) will get a chance to run before the yielded thread is next given CPU time.
So a synchronized block would add unnecessary overhead.
Do you know any other reasons for not using timed wait in Thread.sleep() implementation?
One could easily say Occam's Razor cuts the other way. The normal/expected implementation of the JVM underlying the JDK is assumed to bind Java 'threads' onto native threads most of the time, and putting a thread to sleep is a fundamental function of the underlying platform. Why reimplement it in Java if the thread code is going to be native anyway? The simplest solution is to use the function that's already there.
Some other considerations:
Uncontested synchronization is negligible in modern JVMs, but this wasn't always so. It used to be a fairly "expensive" operation to acquire that object monitor.
If you implement thread sleeping inside Java code, and the way you implement it does not also bind to a native thread wait, the operating system has to keep scheduling that thread in order to run the code that checks whether it's time to wake up. As hashed out in the comments, this would obviously not be true for your example on a modern JVM, but it's tough to say:
1) what may have been in place and expected at the time the Thread class was first specified that way.
and
2) If that assertion works for every platform one may have ever wanted to implement a JVM on.
Do you know any other reasons for not using timed wait in Thread.sleep() implementation?
Because the native thread libraries provide a perfectly good sleep function: http://www.gnu.org/software/libc/manual/html_node/Sleeping.html
To understand why native threads are important, start at http://java.sun.com/docs/hotspot/threads/threads.html
Version 1.1 is based on green threads and won't be covered here. Green threads are simulated threads within the VM and were used prior to going to a native OS threading model in 1.2 and beyond. Green threads may have had an advantage on Linux at one point (since you don't have to spawn a process for each native thread), but VM technology has advanced significantly since version 1.1 and any benefit green threads had in the past is erased by the performance increases over the years.
Thread.sleep() will not be woken up early by spurious wakeups. If using Object.wait(), to do it properly (i.e. to ensure you wait long enough) you would need a loop that checks the elapsed time (e.g. via System.currentTimeMillis()) to make sure you have waited enough.
Technically you could achieve the same functionality as Thread.sleep() with Object.wait(), but you would need to write more code to do it correctly.
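A minimal sketch of what that elapsed-time loop might look like (an illustration only, not the JDK's actual implementation):

// Sketch only: a sleep-like wait that tolerates spurious wakeups by re-checking
// how much time remains before waiting again.
public final class TimedWaitSketch {
    private static final Object LOCK = new Object();

    public static void sleepLike(long millis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + millis;
        synchronized (LOCK) {
            long remaining = millis;
            while (remaining > 0) {
                LOCK.wait(remaining); // may return early on a spurious wakeup
                remaining = deadline - System.currentTimeMillis();
            }
        }
    }
}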
This is also a relevant and useful discussion.
When a thread calls the sleep method, it is added to a sleep queue. If the computer's clock frequency is 100 Hz, that means the currently running process is interrupted every 10 ms. After saving the thread's current context, the remaining sleep time of each sleeping thread is decreased (by 10 ms). When it reaches zero, the thread is moved to the "waiting for CPU" queue, and it runs again when it is next given a time slice. Because it does not immediately become running, the time actually slept is larger than the value that was set.

SynchronousQueue fairness

I'm using a 1-producer/1-consumer design in my app with a SynchronousQueue. So far I'm using it with the default constructor (fair=true), and I'm wondering how "fair=false" would affect the system (performance, and especially concurrency behaviour).
Here is what the docs say:
SynchronousQueue
public SynchronousQueue()
Creates a SynchronousQueue with nonfair access policy.
SynchronousQueue
public SynchronousQueue(boolean fair)
Creates a SynchronousQueue with the specified fairness policy.
Parameters:
fair - if true, waiting threads contend in FIFO order for access; otherwise the order is unspecified.
Thanks in advance.
Your question contains the answer, more or less. Anyway, the short answer is that it will make no effective difference in your single-consumer case (with perhaps an infinitesimal performance decrease).
If you set the fair flag to true, then as you've pasted in your question, waiting threads contend in FIFO order for access. This places specific constraints on the scheduling of waiting threads as to how they are reawakened; an unfair system has no such constraints (and consequently the compiler/runtime is free to do things which may run a little faster).
Note that this only ever affects which thread is chosen to wake up out of the set of threads that are waiting; and with only one thread that will ever wait, the decision algorithm is irrelevant, as it will always pick the same thread. The distinction comes when you have multiple threads waiting - is it acceptable for one individual thread to never get anything from the queue as long as the other threads are able to handle the whole workload between them?
Wrt. performance, have you tried measuring this? It'll most likely give you more of an indication as to what's going on than any answer here.
From the doc:
Fairness generally decreases throughput but reduces variability and avoids starvation
but it would be interesting to run a repeatable test and study how much that will affect you in your particular circumstances. As you have only one consumer thread, I don't think it'll affect your application beyond (perhaps) a small (perhaps imperceptible?) performance decrease. But I would reiterate that you should try to measure it.
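If you do want to measure it, a rough sketch (not a proper JMH benchmark, so treat the numbers with caution) comparing the two policies could look like this:

import java.util.concurrent.SynchronousQueue;

// Rough sketch only: time N handoffs between one producer and one consumer
// for a fair and an unfair SynchronousQueue.
public class FairnessSketch {
    static long timeHandoffs(boolean fair, int n) throws InterruptedException {
        SynchronousQueue<Integer> queue = new SynchronousQueue<>(fair);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    queue.put(i); // blocks until the consumer takes it
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        long start = System.nanoTime();
        producer.start();
        for (int i = 0; i < n; i++) {
            queue.take();
        }
        producer.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 1_000_000;
        System.out.println("fair:   " + timeHandoffs(true, n) / 1_000_000 + " ms");
        System.out.println("unfair: " + timeHandoffs(false, n) / 1_000_000 + " ms");
    }
}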
