Is there a way to get the number of idle monitors


This link mentions the number of monitors currently in a running JVM (i.e. 250000)
https://bugs.openjdk.org/browse/JDK-8153224
How were they able to obtain that number?
Specifically, I want to obtain the number of idle monitors in a running system. We have observed a long "Stopping threads" phase in a GC run.
The safepoint log shows that the time is spent in the "sync" phase.
We are looking for culprits.

It is true that every Java object can be used as a monitor, but the "idle monitors" that https://bugs.openjdk.org/browse/JDK-8153224 refers to are not simply objects. That wouldn't make sense. It would mean that the mere existence of lots of objects in the heap would delay GC "sync".
So what does it mean?
First we need to understand a bit about how Java monitors (primitive locks) work.
The default state of a monitor is "thin". In this state, the monitor is represented by a couple of lock bits in the object header. When a thread attempts to acquire a monitor that is in the thin state, it uses (I believe) a CAS instruction (or similar) to atomically test that the lock is thin+unlocked and flip it to thin+locked. If that succeeds, the thread has acquired the monitor lock and it proceeds on its way.
However, if the lock bits say that the monitor is already locked, the CAS will fail: we have lock contention. The locking code now needs to add the current thread to a queue for the monitor, and "park" it. But the object header bits cannot represent the queue and so on. So the code creates a "fat" lock data structure to hold the extra state needed. This is called lock inflation. (The precise details don't matter for the purpose of this explanation.)
The problem is that "fat" locks use more memory than "thin" locks, and have more complicated and expensive lock acquire and release operations. So we want to turn them back into "thin" locks; i.e. "deflate" them. But we don't want to do this as soon as the lock is released, because there is a fair chance that the lock contention will recur and the lock will need to be reinflated.
So there is a mechanism that scans for "fat" locks that are currently not locked and don't have threads waiting to acquire them. These are the "idle monitors" that JDK-8153224 is talking about. If I understand what the bug report is saying, when the number of these "idle monitors" is large, the deflater mechanism can significantly delay the GC "sync" step.
This brings us to your problem. How can you tell if the deflater is what is causing your "syncs" to take a long time?
I don't think you can ... directly.
However, for this phenomenon to occur, you would need a scenario where there is a lot of thread contention on a large number of monitors. My advice would be to examine / analyze your application to see if that is plausible. (Note that the Twitter example involved thousands of threads and hundreds of thousands of monitors. It sounds a bit extreme to me ...)
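As a rough first check on whether heavy monitor contention is plausible, the standard java.lang.management API can report how often each live thread has blocked while entering a monitor. The sketch below is only a proxy: it does not count idle (inflated) monitors, and the class and method names (ContentionProbe, totalBlockedCount) are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ContentionProbe {
    /**
     * Sums, over all currently live threads, the number of times each thread
     * has blocked to enter or re-enter a monitor. This is a contention proxy,
     * not a count of inflated or idle monitors.
     */
    static long totalBlockedCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long total = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) { // a thread may have exited since its id was listed
                total += info.getBlockedCount();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("monitor-entry blocks so far: " + totalBlockedCount());
    }
}
```

Sampling this value periodically inside the application (or reading the same data over JMX) shows whether the blocked count grows quickly. A count that barely moves would suggest that the "idle monitor" theory is unlikely.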
There are other things (culprits) that can cause delays in the GC "sync" step. These include:
certain kinds of long running loops,
long running System.arraycopy() calls, and
having a large number of threads in RUNNING state.
So before you spend lots of time on the "idle monitor" theory, I think you should do some safepoint profiling as mentioned in "How to reduce time taken on threads reaching Safepoint - Sync state". Look for application threads that may be the cause.

Related

Thread::yield vs Thread::onSpinWait

Well the title basically says it all, with the small addition that I would really like to know when to use them. And it might be simple enough - I've read the documentation for them both, still can't tell the difference much.
There are answers like this here that basically say:
Yielding also was useful for busy waiting...
I can't agree much with them for the simple reason that ForkJoinPool uses Thread::yield internally and that is a pretty recent addition in the jdk world.
The thing that really bothers me is usages like this in the JDK too (StampedLock::tryDecReaderOverflow):
else if ((LockSupport.nextSecondarySeed() & OVERFLOW_YIELD_RATE) == 0)
Thread.yield();
else
Thread.onSpinWait();
return 0L;
So it seems there are cases when one would be preferred over the other. And no, I don't have an actual example where I might need this - the only one I actually used was Thread::onSpinWait because 1) I happened to busy wait 2) the name is pretty much self explanatory that I should have used it in the busy spin.

When blocking a thread, there are a few strategies to choose from: spin, wait() / notify(), or a combination of both. Pure spinning on a variable is a very low latency strategy but it can starve other threads that are contending for CPU time. On the other hand, wait() / notify() will free up the CPU for other threads but can cost thousands of CPU cycles in latency when descheduling/scheduling threads.
So how can we avoid pure spinning as well as the overhead associated with descheduling and scheduling the blocked thread?
Thread.yield() is a hint to the thread scheduler to give up its time slice if another thread with equal or higher priority is ready. This avoids pure spinning but doesn't avoid the overhead of rescheduling the thread.
The latest addition is Thread.onSpinWait() which inserts architecture-specific instructions to hint the processor that a thread is in a spin loop. On x86, this is probably the PAUSE instruction, on aarch64, this is the YIELD instruction.
What's the use of these instructions? In a pure spin loop, the processor will speculatively execute the loop over and over again, filling up the pipeline. When the variable the thread is spinning on finally changes, all that speculative work will be thrown out due to memory order violation. What a waste!
A hint to the processor could prevent the pipeline from speculatively executing the spin loop until prior memory instructions are committed. In the context of SMT (hyperthreading), this is useful as the pipeline will be freed up for other hardware threads.
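A minimal sketch of how the two calls can be combined in a busy-wait, loosely modelled on the StampedLock snippet quoted in the question: spin with Thread.onSpinWait() most of the time and occasionally call Thread.yield(). The class name, method name and the threshold of 1,000 iterations are assumptions chosen for illustration, not tuned values.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinThenYield {
    // Hypothetical threshold, picked only for the example.
    private static final int SPINS_BEFORE_YIELD = 1_000;

    /** Busy-waits until the flag becomes true; returns the number of loop iterations. */
    static long awaitFlag(AtomicBoolean flag) {
        long iterations = 0;
        while (!flag.get()) {
            iterations++;
            if (iterations % SPINS_BEFORE_YIELD == 0) {
                Thread.yield();      // occasionally offer the CPU to other runnable threads
            } else {
                Thread.onSpinWait(); // hint to the processor that this is a spin loop
            }
        }
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        long n = awaitFlag(ready);
        setter.join();
        System.out.println("flag observed after " + n + " iterations");
    }
}
```

Thread.onSpinWait() requires Java 9 or later. The loop stays correct if either call is a no-op on a given platform, since both are only hints.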

Why Non blocking Concurrency is better than blocking concurrency

I just want to know why non-blocking concurrency is better than blocking concurrency. In blocking concurrency, your thread must wait until the other thread completes its execution, so the thread would not be consuming CPU in that case.
But with non-blocking concurrency, threads do not wait to get a lock; they return immediately if another thread is holding the lock.
For example, in the ConcurrentHashMap class, inside the put() method there is a tryLock() in a loop. The other thread will be active and continuously checking whether the lock has been released, because tryLock() is non-blocking. I assume that in this case CPU is used unnecessarily.
So is it not better to suspend the thread until the other thread completes its execution, and wake it up when the work is finished?

Whether or not blocking or non-blocking concurrency is better depends on how long you expect to have to wait to acquire the resource you're waiting on.
With a blocking wait (i.e. a mutex lock, in C parlance), the operating system kernel puts the waiting thread to sleep. The CPU scheduler will not allocate any time to it until after the required resource has become available. The advantage here is that, as you said, this thread won't consume any CPU resources while it is sleeping.
There is a disadvantage, however: the process of putting the thread to sleep, determining when it should be woken, and waking it up again is complex and expensive, and may negate the savings achieved by not having the thread consume CPU while waiting. In addition (and possibly because of this), the OS may choose not to wake the thread immediately once the resource becomes available, so the lock may be waited on longer than is necessary.
A non-blocking wait (also known as a spinlock) does consume CPU resource while waiting, but saves the expense of putting the thread to sleep, determining when it should be woken, and waking it. It also may be able to respond faster once the lock becomes free, as it is less at the whim of the OS in terms of when it can proceed with execution.
So, as a very general rule, you should prefer a spinlock if you expect to only wait a short time (e.g. the few CPU cycles it might take for another thread to finish with an entry in ConcurrentHashMap). For longer waits (e.g. on synchronized I/O, or a number of threads waiting on a single complex computation), a mutex (blocking wait) may be preferable.
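To make the spinlock side of that trade-off concrete, here is a minimal sketch of a spin lock built on a compare-and-set, used for a very short critical section. It is an illustration under simple assumptions (not reentrant, no fairness, no back-off), and the names SpinLock and SpinLockDemo are invented for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // Try to flip false -> true atomically; keep spinning while another thread holds it.
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        held.set(false);
    }
}

final class SpinLockDemo {
    /** Increments a shared counter from several threads, guarded by the spin lock. */
    static int countWithSpinLock(int threads, int incrementsPerThread) {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++; // short critical section: the case where spinning pays off
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithSpinLock(4, 5_000));
    }
}
```

If the critical section were long (I/O, a large computation), every waiting thread would burn a core for that whole time, which is where a blocking lock such as synchronized or ReentrantLock becomes the better fit.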

If you consider ConcurrentHashMap as an example, the overhead you describe (multiple threads performing update operations such as put, and waiting for locks to be released while actively and continuously checking them) is not always going to occur.
Compared to Hashtable, concurrency control in ConcurrentHashMap is split up, so multiple threads can acquire locks (on segments of the table).
Originally, the ConcurrentHashMap class supported a hard-wired preset concurrency level of 32. This allows a maximum of 32 put and/or remove operations to proceed concurrently (factors other than synchronization tend to be bottlenecks when more than 32 threads concurrently attempt updates).
Also, successful retrievals (when the key is present) using get(key) and containsKey(key) usually run without locking.
So for instance, while one thread is in the process of adding an element, what cannot be done with such a locking strategy is an operation like adding an element only if it is not already present (ConcurrentReaderHashMap provides such facilities).
Also, the size() and isEmpty() methods require accumulations across 32 control segments, and so might be slightly slower.
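For comparison, the java.util.concurrent.ConcurrentHashMap that ships with the JDK does offer atomic check-then-act operations such as putIfAbsent() and merge(), so callers do not need to lock or spin themselves. A small sketch (the class and method names are made up for the example):

```java
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentMapDemo {
    /** Shows that putIfAbsent keeps the first value and reports the existing one. */
    static boolean firstWriterWins() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String first = map.putIfAbsent("k", "a");  // null: there was no previous mapping
        String second = map.putIfAbsent("k", "b"); // "a": the existing mapping is kept
        return first == null && "a".equals(second) && "a".equals(map.get("k"));
    }

    /** Counts hits from several threads using the atomic merge operation. */
    static int concurrentCount(int threads, int hitsPerThread) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    map.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(firstWriterWins() + " " + concurrentCount(4, 1_000));
    }
}
```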

Can a running thread become runnable on entering an uncontested synchronized block?

There is a strange thing happening on our production box.
Code functionality:
A UI servlet takes a monitor lock on the document object that the user is acting on and performs some computation on it. The monitor lock is acquired to prevent the same document object from being modified concurrently by multiple users.
Issue Observed in Prod:
A few user actions are getting timed out.
Log Analysis:
The thread corresponding to the timed out user actions is printing all logs prior to acquiring the monitor lock on the document object. Then there is a gap of over 1 hour where the thread is not surfacing up in the logs and then it suddenly becomes alive and does the computation and attempts to send back a response which obviously errors out as the HTTP request has already timed out.
We have checked the logs and code and can confirm that there is no other thread which had acquired the monitor lock on that particular document object. So the lock was uncontested at the point in question.
What could be the possible issue? Is it just that the thread was put into a Runnable state on encountering a synchronized block and for the next 60-80 mins, the CPU never got a chance to run this particular runnable thread?

Ensure the application code is not messing around with thread priority via the Thread.setPriority() method or the like. If you're using an IDE like IntelliJ and the Java sources are available, and assuming you can run the application and the relevant flow locally on your development machine, you can put a breakpoint in Thread.setPriority() to see whether it is invoked anywhere. This is an excerpt from Java Concurrency in Practice, Goetz 2006, regarding how unpredictable behavior can be if you try to set thread priority manually:
10.3.1. Starvation
Starvation occurs when a thread is perpetually denied access to resources it needs in order to make progress; the most
commonly starved resource is CPU cycles. Starvation in Java applications can be caused by inappropriate use of thread
priorities. It can also be caused by executing nonterminating constructs (infinite loops or resource waits that do not
terminate) with a lock held, since other threads that need that lock will never be able to acquire it.
The thread priorities defined in the Thread API are merely scheduling hints. The Thread API defines ten priority levels
that the JVM can map to operating system scheduling priorities as it sees fit. This mapping is platform-specific, so two
Java priorities can map to the same OS priority on one system and different OS priorities on another. Some operating
systems have fewer than ten priority levels, in which case multiple Java priorities map to the same OS priority.
Operating system schedulers go to great lengths to provide scheduling fairness and liveness beyond that required by the
Java Language Specification. In most Java applications, all application threads have the same priority,
Thread.NORM_PRIORITY. The thread priority mechanism is a blunt instrument, and it's not always obvious what effect changing
priorities will have; boosting a thread's priority might do nothing or might always cause one thread to be scheduled in
preference to the other, causing starvation.
It is generally wise to resist the temptation to tweak thread priorities. As soon as you start modifying priorities, the
behavior of your application becomes platform specific and you introduce the risk of starvation. You can often spot a
program that is trying to recover from priority tweaking or other responsiveness problems by the presence of
Thread.sleep or Thread.yield calls in odd places, in an attempt to give more time to lower priority threads.
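As a complement to the breakpoint approach, a running application can also list the live threads whose priority differs from Thread.NORM_PRIORITY. This is only a sketch with an invented class name (PriorityAudit). JVM-internal threads may legitimately run at other priorities, so not every entry points at application code.

```java
import java.util.ArrayList;
import java.util.List;

final class PriorityAudit {
    /** Returns the live threads whose priority is not Thread.NORM_PRIORITY. */
    static List<Thread> threadsWithNonDefaultPriority() {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getPriority() != Thread.NORM_PRIORITY) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Thread t : threadsWithNonDefaultPriority()) {
            System.out.println(t.getName() + " priority=" + t.getPriority());
        }
    }
}
```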

What will happen if the locks themselves get contended upon?

All objects in Java have intrinsic locks and these locks are used for synchronization. This concept prevents objects from being manipulated by different threads at the same time, or helps control execution of specific blocks of code.
What will happen if the locks themselves get contended upon - i.e. 2 threads asking for the lock at the exact microsecond.
Who gets it, and how does it get resolved?

What will happen if the locks themselves get contended upon - i.e. 2 threads asking for the lock at the exact microsecond.
One thread will get the lock, and the other will be blocked until the first thread releases it.
(Aside: some of the other answers assert that there is no such thing as "at the same time" in Java. They are wrong!! There is such a thing! If the JVM is using two or more cores of a multi-core system, then two threads on different cores could request the same Object lock in exactly the same hardware clock cycle. Clearly, only one will get it, but that is a different issue.)
Who gets it, and how does it get resolved?
It is not specified which thread will get the lock.
It is (typically) resolved by the OS's thread scheduler ... using whatever mechanisms it uses. This aspect of the JVM's behaviour is (obviously) platform specific.
If you really, really want to figure out precisely what is going on, the source code for OpenJDK and Linux are freely available. But to be frank, you don't need to know.
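The "one gets the lock, the other is blocked" behaviour can be observed from Java itself through Thread.getState(). In the sketch below (class and method names are invented for illustration), the calling thread holds a monitor while a second thread tries to enter it. The second thread is reported as BLOCKED until the monitor is released.

```java
final class MonitorContentionDemo {
    /** Returns the state of a second thread observed while the caller holds the lock. */
    static Thread.State stateOfWaiter() {
        Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // entered only after the caller has released the monitor
            }
        });
        Thread.State observed;
        synchronized (lock) {
            waiter.start();
            // Wait until the waiter has actually reached the contended monitor.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.onSpinWait();
            }
            observed = waiter.getState(); // still BLOCKED: we have not released the lock yet
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(stateOfWaiter());
    }
}
```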

When it comes to concurrency, there is no such thing as "at the same time"; Java ensures that someone is first.
If you are asking about simultaneous contended access to lock objects, that is the essence of concurrent programming - nothing to say other than "it happens by design"
If you are asking about simultaneously using an object as a lock and as a regular object, it's not a problem: It happens all the time when using non synchronized methods during a concurrent call to a synchronized method (which uses this as the lock object)

The thing handling lock requests can only handle one thing at a time; therefore, 2 threads can't ask for the lock at the same time.
Even if it is in the same microsecond, one will still be ahead of the other one (perhaps faster by a nanosecond). The one that asks first will get the lock. The one who asks second will then wait for the lock to be released.
An analogy will be ... stacking papers together... Suppose I have one hand and that hand can only hold one piece of paper. Different people(threads) are handing me a single piece of paper. If two people "offer me papers at the same time" I will handle one before the other
In reality, there is no such thing as at the same time. The phrase exists because our brains can not work at the micro...nano...pico second speeds
http://docs.oracle.com/javase/tutorial/essential/concurrency/locksync.html

Locks are implemented not only in JVM but also at OS and hardware level so the mechanisms may differ. We rely on Java API and JVM specs and they say that one of the threads will acquire the lock the other will block.

How to remove deadlock in Java code using NetBeans

I have old code in Java which deadlocks... I have never used NetBeans as a development tool... however, I need to fix the code.
I ran the application in debug mode, clicked on check for deadlock and NetBeans brought up a screen. Two out of four threads were in red... see the screen dump below.
I'm new to multithreading, and on the top of that code is not mine...
What's most likely causing the problem?

As far as I can tell the problem is very likely related to the way in which (or more specifically the order in which) the multiple threads acquire and release locks.
In the above example the two threads need access to two locks (or monitors):
nano.toolbox.strategies.ESMarketMaker
nano.toolbox.strategies.ExecutionManager
From the stack trace on the two threads currently in a deadlock, we can see that thread 'ExecutionManager' has acquired the ExecutionManager monitor but is awaiting acquisition (while still holding the 'ExecutionManager' monitor) of the 'ESMarketMaker' monitor.
The 'StrategyManager' thread, on the other hand, has acquired the 'ESMarketMaker' monitor but is awaiting acquisition (while still holding the 'ESMarketMaker' monitor) of the 'ExecutionManager' monitor.
This is a classic example of a deadlock, and of the way the order of lock acquisition can cause one.
There are many ways to address these kind of problems:
If possible, all threads that need some set of locks must acquire the shared locks in the same order (the inverted order is the problem in the above deadlock). But this is not always possible: multiple threads may have only semi-overlapping lock usage under different conditions, so it may be hard or impossible to design an acquisition protocol that ensures uniform ordering.
You may also use tryLock() instead, which is a non-blocking acquisition: it returns a flag to indicate success or failure and gives you the option to do something else before retrying. One thing I would recommend in this case is that, if acquisition fails, the thread drops all the locks it currently owns and tries again from scratch (thus giving way to any thread blocked on any of the locks the current thread holds, so that it can complete its work, maybe freeing the locks this thread needs when it retries).
One thing to note, though, is that sometimes when deciding on the protocol to use, you need more explicit control over your locks than normal synchronization in Java gives you. In these cases, explicit ReentrantLock instances can be a benefit, as they allow you to inspect whether a lock is currently locked, and to do non-blocking try-locks as described above.
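The tryLock() approach described above might look roughly like this. It is a sketch under assumptions, not the fix for the code in the question: the class name, the helper withBoth and the randomized pause are all invented for illustration. The point is that a thread never blocks while holding one lock; if the second lock is unavailable it releases the first and retries.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

final class TryLockBackoff {
    /** Runs the action while holding both locks, backing off instead of blocking. */
    static void withBoth(ReentrantLock a, ReentrantLock b, Runnable action) {
        while (true) {
            if (a.tryLock()) {
                try {
                    if (b.tryLock()) {
                        try {
                            action.run();
                            return;
                        } finally {
                            b.unlock();
                        }
                    }
                } finally {
                    a.unlock(); // give up the first lock whether or not we got the second
                }
            }
            // Short randomized pause so two threads do not keep retrying in lock-step.
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(1_000, 100_000));
        }
    }

    /** Two threads take the same two locks in opposite order; returns the final counter. */
    static int runOpposingThreads(int iterations) {
        ReentrantLock first = new ReentrantLock();
        ReentrantLock second = new ReentrantLock();
        int[] counter = {0};
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) withBoth(first, second, () -> counter[0]++);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) withBoth(second, first, () -> counter[0]++);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runOpposingThreads(1_000));
    }
}
```

With plain synchronized blocks the same opposite-order pattern could deadlock. Here both threads finish because neither ever waits while holding a lock the other needs.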
I hope this helps, I'm sorry I can't be more specific, but I would need to see the source code for that. :-)
(Oh, and P.S.: a third thing one might opt for, if deadlock must be avoided at all costs, is to look into modeling tools. These let you model a state machine over the states of the program and its locks, which can be used together with analysis tools that check such a model for possible deadlocks and give you examples if any are found.)
