There is a strange thing happening on our production box.
Code functionality:
A UI servlet takes a monitor lock on the document object the user is acting on and performs some computation on it. The monitor lock is acquired to prevent the same document object from being modified by multiple users concurrently.
Issue Observed in Prod:
A few user actions are timing out.
Log Analysis:
The thread handling a timed-out user action prints all of its logs up to the point of acquiring the monitor lock on the document object. Then there is a gap of over an hour in which the thread does not appear in the logs at all; after that it suddenly comes back to life, performs the computation, and attempts to send back a response, which obviously errors out because the HTTP request has long since timed out.
We have checked the logs and code and can confirm that there is no other thread which had acquired the monitor lock on that particular document object. So the lock was uncontested at the point in question.
What could be the possible issue? Is it just that the thread was put into a runnable state on encountering the synchronized block and, for the next 60-80 minutes, the CPU never got a chance to run this particular runnable thread?
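For reference, the locking pattern described in the question can be sketched like this (Document, DocumentStore and handleUserAction are invented names for illustration, not the actual application's classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Document {
    private String content = "";
    void compute() { content = content + "x"; } // stands in for the real computation
    String content() { return content; }
}

public class DocumentStore {
    private static final Map<String, Document> DOCS = new ConcurrentHashMap<>();

    static Document get(String id) {
        return DOCS.computeIfAbsent(id, k -> new Document());
    }

    // The pattern from the question: one monitor lock per document object,
    // so two users acting on the same document serialize on its monitor.
    static void handleUserAction(String docId) {
        Document doc = get(docId);
        synchronized (doc) {   // the thread blocks here if another user holds the lock
            doc.compute();     // ... and performs the computation once it acquires it
        }
    }

    public static void main(String[] args) {
        handleUserAction("doc-1");
        handleUserAction("doc-1");
        System.out.println(get("doc-1").content()); // "xx": both actions ran, one at a time
    }
}
```

If the monitor really was uncontended, entering the synchronized block should be essentially free, which is what makes the one-hour gap so surprising.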
Make sure the application code is not tampering with thread priorities via Thread.setPriority() or the like. If you're using an IDE like IntelliJ and the Java sources are available, and assuming you can run the application and the relevant flow locally on your development machine, you can put a breakpoint in Thread.setPriority() to see whether it is being invoked anywhere. Here is an excerpt from Java Concurrency in Practice (Goetz, 2006) on how unpredictable the behavior can become if you set thread priorities manually:
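As a quick supplement to the breakpoint idea, you can also audit thread priorities at runtime with Thread.getAllStackTraces(). The sketch below (countNonDefaultPriorities is an invented helper name) lists every live thread whose priority differs from NORM_PRIORITY; note that some JVM-internal threads (the reference handler, for example) legitimately run at other priorities, so look specifically for your application threads in the output:

```java
public class PriorityAudit {
    // Lists every live thread whose priority is not the default NORM_PRIORITY (5)
    // and returns how many there were. JVM-internal threads will show up here too.
    public static int countNonDefaultPriorities() {
        int suspicious = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getPriority() != Thread.NORM_PRIORITY) {
                System.out.println(t.getName() + " has priority " + t.getPriority());
                suspicious++;
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        countNonDefaultPriorities();
    }
}
```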
10.3.1. Starvation
Starvation occurs when a thread is perpetually denied access to resources it needs in order to make progress; the most commonly starved resource is CPU cycles. Starvation in Java applications can be caused by inappropriate use of thread priorities. It can also be caused by executing nonterminating constructs (infinite loops or resource waits that do not terminate) with a lock held, since other threads that need that lock will never be able to acquire it.

The thread priorities defined in the Thread API are merely scheduling hints. The Thread API defines ten priority levels that the JVM can map to operating system scheduling priorities as it sees fit. This mapping is platform-specific, so two Java priorities can map to the same OS priority on one system and different OS priorities on another. Some operating systems have fewer than ten priority levels, in which case multiple Java priorities map to the same OS priority. Operating system schedulers go to great lengths to provide scheduling fairness and liveness beyond that required by the Java Language Specification. In most Java applications, all application threads have the same priority, Thread.NORM_PRIORITY. The thread priority mechanism is a blunt instrument, and it's not always obvious what effect changing priorities will have; boosting a thread's priority might do nothing or might always cause one thread to be scheduled in preference to the other, causing starvation.

It is generally wise to resist the temptation to tweak thread priorities. As soon as you start modifying priorities, the behavior of your application becomes platform-specific and you introduce the risk of starvation. You can often spot a program that is trying to recover from priority tweaking or other responsiveness problems by the presence of Thread.sleep or Thread.yield calls in odd places, in an attempt to give more time to lower-priority threads.[5]
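The other cause the excerpt names, a nonterminating construct executed with a lock held, is easy to reproduce. In this made-up sketch, one thread spins while holding a monitor, and a second thread that needs the same monitor stays BLOCKED for as long as the spin lasts:

```java
public class LockHeldDemo {
    static final Object LOCK = new Object();
    static volatile boolean keepSpinning = true;

    // Returns the waiter's state observed while the holder spins with the lock held.
    public static Thread.State run() throws InterruptedException {
        keepSpinning = true;
        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                while (keepSpinning) { }   // busy loop with the lock held
            }
        });
        Thread waiter = new Thread(() -> {
            synchronized (LOCK) { }        // cannot be entered while holder spins
        });
        holder.start();
        Thread.sleep(100);                 // give holder time to take the lock
        waiter.start();
        Thread.sleep(200);
        Thread.State observed = waiter.getState();  // expected: BLOCKED
        keepSpinning = false;              // release the lock so both threads finish
        holder.join();
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter was " + run());
    }
}
```

The sleeps are only there to sequence the two threads deterministically enough for a demonstration.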
Related
This link mentions the number of monitors currently in a running JVM (i.e. 250000)
https://bugs.openjdk.org/browse/JDK-8153224
How were they able to obtain that number?
Specifically, I want to obtain the number of idle monitors in a running system. We have observed a long "Stopping threads" phase in a GC run.
The safepoint log shows that the time is spent in the "sync" phase.
We are looking for culprits.
It is true that every Java object can be used as a monitor, but the "idle monitors" that https://bugs.openjdk.org/browse/JDK-8153224 refers to are not simply objects. That wouldn't make sense. It would mean that the mere existence of lots of objects in the heap would delay GC "sync".
So what does it mean?
First we need to understand a bit about how Java monitors (primitive locks) work.
The default state of a monitor is "thin". In this state, the monitor is represented by a couple of lock bits in the object header. When a thread attempts to acquire a monitor that is in the thin state, it uses (I believe) a CAS instruction (or similar) to atomically test that the lock is thin+unlocked and flip it to thin+locked. If that succeeds, the thread has acquired the monitor lock and it proceeds on its way.
However, if the lock bits say that the monitor is already locked, the CAS will fail: we have lock contention. The locking code now needs to add the current thread to a queue for the monitor, and "park" it. But the object header bits cannot represent the queue and so on. So the code creates a "fat" lock data structure to hold the extra state needed. This is called lock inflation. (The precise details don't matter for the purpose of this explanation.)
The problem is that "fat" locks use more memory than "thin" locks, and have more complicated and expensive lock acquire and release operations. So we want to turn them back into "thin" locks; i.e. "deflate" them. But we don't want to do this immediately when the lock is released, because there is a fair chance that the contention will recur and the lock will need to be reinflated.
So there is a mechanism that scans for "fat" locks that are currently not locked and have no threads waiting to acquire them. These are the "idle monitors" that JDK-8153224 is talking about. If I understand the bug report correctly, when the number of these "idle monitors" is large, the deflation mechanism can significantly delay the GC "sync" step.
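You cannot see the thin/fat state from pure Java (a tool like OpenJDK's JOL can decode the object header for you), but the sketch below shows the kind of event that forces inflation: a second thread arriving at an already-locked monitor, which needs the fat monitor's queue to park on. Details vary across JDK versions (biased locking existed in older JDKs, for example), so treat this purely as an illustration:

```java
public class InflationTrigger {
    // Returns true once both threads have finished; the interesting part is the
    // contended acquire in the middle, which is what inflates the monitor.
    public static boolean demo() throws InterruptedException {
        Object monitor = new Object();

        // Uncontended acquire/release: can stay a "thin" lock (header bits only).
        synchronized (monitor) { }

        Thread first = new Thread(() -> {
            synchronized (monitor) {
                try { Thread.sleep(200); } catch (InterruptedException e) { }
            }
        });
        Thread second = new Thread(() -> {
            // Arrives while 'first' holds the lock: it must be queued and parked,
            // which requires the "fat" monitor structure -> lock inflation.
            synchronized (monitor) { }
        });
        first.start();
        Thread.sleep(50);   // give 'first' time to take the lock
        second.start();
        first.join();
        second.join();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished: " + demo());
    }
}
```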
This brings us to your problem. How can you tell if the deflater is what is causing your "syncs" to take a long time?
I don't think you can ... directly.
However, for this phenomenon to occur, you would need a scenario where there is a lot of thread contention on a large number of monitors. My advice would be to examine / analyze your application to see if that is plausible. (Note that the Twitter example involved thousands of threads and hundreds of thousands of monitors. It sounds a bit extreme to me ...)
There are other things (culprits) that can cause delays in the GC "sync" step. These include:
certain kinds of long running loops,
long running System.arraycopy() calls, and
having a large number of threads in the RUNNABLE state.
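The "long running loops" entry usually means int-counted loops. HotSpot emits safepoint polls at method returns and loop back-edges, but a loop whose trip count is a known int may be compiled with no poll in its body, so a thread inside it cannot stop for the safepoint until the loop exits. A sketch of the shape to look for (whether a poll is actually elided depends on the JIT and JVM version; newer HotSpots mitigate this, and -XX:+UseCountedLoopSafepoints forces polls in counted loops):

```java
public class CountedLoopExample {
    // An int-counted loop like this may be JIT-compiled with no safepoint poll
    // inside the body; a thread stuck in it delays the "sync" phase until the
    // loop finishes. Using a long counter, or calling a non-inlined method in
    // the body, usually keeps a poll in place.
    static long sumCounted(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {   // int counter: candidate for poll elision
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCounted(1_000_000)); // 499999500000
    }
}
```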
So before spending lots of time on the "idle monitor" theory, I think you should do some safepoint profiling, as mentioned in How to reduce time taken on threads reaching Safepoint - Sync state. Look for application threads that may be the cause.
The linked documentation says: Thread.sleep causes the current thread to suspend execution for a specified period.
What does the term current thread mean? If the processor has only one core, it makes sense to designate one of the threads as the current thread; but if all the threads (say, four of them) are running individually on separate cores, then which one is the current thread?
The "current thread" is the thread which calls Thread.sleep(delay).
Also, if a thread sleeps it does not block the entire CPU core; some other thread can run on the same core while your thread is asleep.
Every single statement and method call you execute is run by exactly one thread, and from that thread's perspective it is itself the current thread. In other words: Thread.sleep(delay) pauses the thread that executes the Thread.sleep() call.
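A small demonstration of that: while the calling thread sleeps, another thread keeps making progress, so sleep clearly pauses only its caller. (SleepDemo and demo are invented names for this sketch.)

```java
import java.util.concurrent.atomic.AtomicLong;

public class SleepDemo {
    // Returns {milliseconds the caller actually slept, how far the worker counted}.
    static long[] demo() throws InterruptedException {
        AtomicLong counter = new AtomicLong();

        // A worker that keeps incrementing while the calling thread sleeps.
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                counter.incrementAndGet();
            }
        });
        worker.start();

        long before = System.nanoTime();
        Thread.sleep(200);                 // pauses ONLY the thread executing this call
        long elapsedMs = (System.nanoTime() - before) / 1_000_000;

        worker.interrupt();
        worker.join();
        return new long[] { elapsedMs, counter.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = demo();
        System.out.println("caller slept ~" + r[0] + " ms; worker counted to " + r[1]);
    }
}
```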
Also, keep in mind that multi-threading and multiple cores only have a very distant relationship.
Even before multi-core CPUs were commonplace, pretty much every operating system supported heavily multi-threaded (or multi-tasking, which is basically the same thing for the purposes of this discussion) operation.
In modern OSes this is done with a technique called preemptive multitasking. This basically means that the OS can forcibly pause the currently running process and allow another one to run for a short time, providing the illusion of actual parallel processing.
And since a process often spends a lot of its time waiting for some external I/O (network, disk, ...), this even means you can use the CPU more efficiently: the time one process would spend waiting for I/O, another process can spend doing actual computation.
As an example at the time of writing this, my laptop has 1311 threads (most of which are probably sleeping and only a handful will actually run and/or wait to run), even though it has only 4 cores.
tl;dr: while multiple cores allow more than one thread to actually execute at the exact same time, you can have multi-threading even with a single core, and there's very little noticeable difference if you do (besides raw performance, obviously).
The name, "Current thread," was chosen for the convenience of the authors of the operating system, not for the authors of applications that have to run under the operating system.
In the source code for an operating system, it makes sense to have a variable current_thread[cpu_id] that points to a data structure that describes the thread that is running on that cpu_id at any given moment.
From the point-of-view of an application programmer, any system call that is supposed to do something to the "current thread," is going to do it to the thread that makes the call. If a thread that is running on CPU 3 calls Thread.sleep(n), the OS will look up current_thread[3] (i.e., the thread that made the call) and put that thread to sleep.
From the application point-of-view, Thread.sleep(n) is a function that appears to do nothing, and always takes at least n milliseconds to do it.
In general, you should substitute "the caller" or "the calling thread" any time you see "current thread" in any API document.
What I can't find is any statement on whether changing a thread's priority is a costly operation, time-wise. I would like to do it frequently, but if each switch carries a significant time penalty it is probably not worth the trouble.
Any answer here is going to be very OS-dependent. I suspect that with most Unix variants the answer is no, it's not costly. It may require some sort of data synchronization, but otherwise it is just setting a value in the thread's administrative information. I suspect that there is no rescheduling of the threads, as discussed in the comments.
That said, without knowing more about your particular use case, I doubt it is going to be worth the trouble. As I say in the answer listed below, about the only time thread prioritization will make a difference is if all of the threads are completely CPU bound and you want one task or another to get more cycles.
Also, thread priorities are very non-linear and small changes to them may have little to no effect so any overhead incurred in setting the thread priorities will overwhelm any benefits gained by changing them.
See my answer here:
Guide for working with Linux thread priorities and scheduling policies?
Also, check out this article about Java thread priorities and some real life testing of them under Linux. To quote:
As can be seen, thread priorities 1-8 end up with a practically equal share of the CPU, whilst priorities 9 and 10 get a vastly greater share (though with essentially no difference between 9 and 10). The version tested was Java 6 Update 10.
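For reference, the Java side of all this is just three constants plus a getter/setter pair; everything beyond that is the JVM/OS mapping discussed above:

```java
public class PriorityBasics {
    public static void main(String[] args) {
        Thread t = new Thread(() -> { });

        System.out.println(Thread.MIN_PRIORITY);   // 1
        System.out.println(Thread.NORM_PRIORITY);  // 5
        System.out.println(Thread.MAX_PRIORITY);   // 10

        System.out.println(t.getPriority());       // inherited from the creating thread
        t.setPriority(Thread.MAX_PRIORITY);        // a scheduling hint, not a guarantee
        System.out.println(t.getPriority());       // 10
    }
}
```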
In the case of Windows, a call to SetThreadPriority to change the priority of a ready-to-run thread is a system call that will move the thread from its current priority ready queue to a different priority ready queue, which is more costly than just setting some value in a thread object.
If SetThreadPriority is used to increase the priority of a thread, and if that results in the now higher priority thread preempting a lower priority thread, the preemption occurs at call time, not at the next time slice.
Ready queues are mentioned here:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms682105(v=vs.85).aspx
Context switching related to a priority change is mentioned here: "The following events might require thread dispatching ... A thread’s priority changes, either because of a system service call or because Windows itself changes the priority value." and "Preemption ... a lower-priority thread is preempted when a higher-priority thread becomes ready to run. This situation might occur for a couple of reasons: A higher-priority thread’s wait completes ... A thread priority is increased or decreased." Ready queues are also mentioned: "Windows multiprocessor systems have per-processor dispatcher ready queues"
https://www.microsoftpressstore.com/articles/article.aspx?p=2233328&seqNum=7
I asked about this at MSDN forums. The fourth post agrees with the sequence I mention in the first and third post in this thread:
https://social.msdn.microsoft.com/Forums/en-US/d4d40f9b-bfc9-439f-8a76-71cc5392669f/setthreadpriority-to-higher-priority-is-context-switch-immediate?forum=windowsgeneraldevelopmentissues
In the case of current versions of Linux, run queues indexed by priority were replaced by a red-black tree. Changing a thread's priority would involve removal and reinsertion of a thread object within the red-black tree. Preemption would occur if the thread object is moved sufficiently to the "left" of the red-black tree.
https://www.ibm.com/developerworks/library/l-completely-fair-scheduler
In response to the comments about the app that "examines a full-speed stream of incoming Bluetooth data packets": the receiving thread should have the highest priority, hopefully spending most of its time blocked while waiting for the reception of a packet. The received packets would be queued up to be processed by another thread just lower in priority than the receiving thread. Multiple processing threads could take advantage of multiple cores if needed.
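That arrangement can be sketched roughly as follows (ReceiverPipeline and the packet handling are invented for illustration; a real receiver would block on the Bluetooth API rather than fabricate packets):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReceiverPipeline {
    // Hypothetical sketch: the receiver runs at top priority and only enqueues;
    // a slightly lower-priority worker does the actual processing.
    public static int process(int packetCount) throws InterruptedException {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
        final byte[] POISON = new byte[0];
        int[] processed = { 0 };

        Thread receiver = new Thread(() -> {
            for (int i = 0; i < packetCount; i++) {
                queue.add(new byte[] { (byte) i }); // stands in for reading a packet
            }
            queue.add(POISON);                      // tell the worker we're done
        });
        receiver.setPriority(Thread.MAX_PRIORITY);  // ideally mostly blocked, waiting

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet = queue.take();
                    if (packet == POISON) break;
                    processed[0]++;                 // stands in for real processing
                }
            } catch (InterruptedException ignored) { }
        });
        worker.setPriority(Thread.MAX_PRIORITY - 1); // just below the receiver

        receiver.start();
        worker.start();
        receiver.join();
        worker.join();
        return processed[0];
    }
}
```

The design point is that the receiver does as little as possible at high priority; the BlockingQueue is the hand-off that lets the lower-priority worker absorb bursts.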
What does thread priority mean? Will a thread with MAX_PRIORITY complete its execution before a thread with MIN_PRIORITY? Will a MAX_PRIORITY thread be given more execution time than a MIN_PRIORITY thread? Or something else?
The javadoc for Thread only says this, "Threads with higher priority are executed in preference to threads with lower priority." That can mean different things depending on what JVM you are running and, more likely, on what operating system you are running.
In the simplest interpretation of "priority", as implemented by some real-time, embedded operating systems; a thread with a lower priority will never get to run when a higher priority thread is waiting to run. The lower priority thread will be immediately preempted by whatever event caused the higher priority thread to become runnable. That kind of absolute priority is easy to implement, but it puts a burden on the programmer to correctly assign priorities to all of the different threads of all of the different processes running in the box. That is why you usually don't see it outside of embedded systems.
Most general-purpose operating systems assume that not all processes are designed to cooperate with one another. They try to be fair, giving an equal share to each thread that wants CPU time. Usually that is accomplished by continually adjusting the thread's true priorities according to some formula that accounts for how much CPU different threads have wanted in the recent past, and how much each got. There usually is some kind of a weighting factor, to let a programmer say that this thread should get a larger "share" than that thread. (e.g., the "nice" value on a Unix-like system.)
Because any practical JVM must rely on the OS to provide thread scheduling, and because there are so many different ways to interpret "priority", Java does not attempt to dictate what "priority" really means.
I have a Java application where I am using PooledExecuter from oswego.
I strongly suspect that there is thread contention in my application because, despite using the PooledExecuter, requests are taking roughly (number of requests) * (time for one request).
I want to gather evidence that there is definitely a thread contention.
Is there any way I can set some JVM parameters that show me what the different threads are doing, or any other way to detect thread contention?
From your description it sounds like you have a resource that is single-threaded and your code isn't able to use multiple threads efficiently. You should be able to see this by taking a thread dump while this is happening (a few times). You should see one thread doing "real" work, and all the other threads in the pool waiting for something or idle.
I don't know of JVM options that can tell you this. I would attach a profiler and see whether the threads are contending (blocked/waiting) a lot, and then see which locks they are contending for.
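If you want evidence without attaching a profiler, the same information is available in-process from ThreadMXBean: dump all threads and look for BLOCKED ones, along with the lock they want and its current owner. A sketch, with the contention manufactured on purpose (reportBlockedThreads is an invented helper name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ContentionSnapshot {
    // Prints every BLOCKED thread plus the lock it wants and who holds it,
    // and returns the number of blocked threads seen.
    public static int reportBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int blocked = 0;
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                System.out.println(info.getThreadName()
                        + " is BLOCKED on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(500); } catch (InterruptedException e) { }
            }
        });
        Thread contender = new Thread(() -> {
            synchronized (lock) { }
        });
        holder.start();
        Thread.sleep(50);        // let holder take the lock
        contender.start();
        Thread.sleep(50);        // let contender block on it
        reportBlockedThreads();  // should report the contender
        holder.join();
        contender.join();
    }
}
```

jstack against the running JVM gives you the same picture from outside the process.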