We have a C++ application with an embedded JVM (Sun's). Because we register our own signal handlers, it's recommended that we do so before initializing the JVM, since the JVM installs its own handlers (see here).
From what I understand, the JVM knows internally whether a signal originated from its own code; if not, it passes the signal along the chain to our handlers.
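For context, the order we follow looks roughly like the sketch below (not our real code; the handler name and the empty JVM options are placeholders): our handler is installed with sigaction() before JNI_CreateJavaVM() is called.

    // Sketch only: register our handler first, then create the JVM so that the
    // JVM's own handlers can chain to ours. Names and options are placeholders.
    #include <jni.h>
    #include <signal.h>
    #include <string.h>

    extern "C" void our_signal_handler(int sig, siginfo_t* info, void* ctx) {
        // ... application-specific handling ...
        (void)sig; (void)info; (void)ctx;
    }

    int main() {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = our_signal_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGPIPE, &sa, nullptr);          // installed before the JVM starts

        JavaVM* jvm = nullptr;
        JNIEnv* env = nullptr;
        JavaVMInitArgs vm_args;
        memset(&vm_args, 0, sizeof(vm_args));
        vm_args.version = JNI_VERSION_1_6;
        vm_args.nOptions = 0;                      // real options omitted here
        vm_args.ignoreUnrecognized = JNI_TRUE;
        JNI_CreateJavaVM(&jvm, reinterpret_cast<void**>(&env), &vm_args);

        // ... run the application, then jvm->DestroyJavaVM() ...
        return 0;
    }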
What we started seeing is that we're getting SIGPIPEs, with a call stack that looks roughly like this (the top entry is our signal handler):
/.../libos_independent_utilities.so(_ZN2os32smart_synchronous_signal_handlerEiP7siginfoPv+0x9) [0x2b124f7a3989]
/.../jvm/jre/lib/amd64/server/libjvm.so [0x2aaaab05dc6c]
/.../jvm/jre/lib/amd64/server/libjvm.so [0x2aaaab05bffb]
/.../jvm/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x718) [0x2aaaab05e878]
/.../jvm/jre/lib/amd64/server/libjvm.so [0x2aaaab05bf0e]
/lib64/libpthread.so.0 [0x3c2140e4c0]
/lib64/libpthread.so.0(send+0x91) [0x3c2140d841]
/.../jvm/jre/lib/amd64/libnet.so [0x2aaabd360269]
/.../jvm/jre/lib/amd64/libnet.so(Java_java_net_SocketOutputStream_socketWrite0+0xee) [0x2aaabd35cf4e]
[0x2aaaaeb3bf7f]
It seems like the JVM is deciding that the SIGPIPE raised from send should be passed along to our signal handler. Is it right to do so?
Also, why is the call stack incomplete? It obviously can't show me the Java code above socketWrite0, but why can't I see the stack before the Java code?
The JVM can't tell whether the SIGPIPE came from its own code or from your code. That information just isn't carried by the signal. Because it doesn't want you to miss any events you could be interested in, it has to pass you all SIGPIPEs, even the ones that turn out to have come from its own code.
Unix signals come in two flavors -- "synchronous" and "asynchronous". A few exceptional conditions encountered while just executing code cause traps and result in "synchronous" signals. These are things such as unaligned memory access (SIGBUS), illegal memory access, often via a NULL pointer (SIGSEGV), division by zero and other math errors (SIGFPE), undecodable instructions (SIGILL), and so forth. These have a precise execution context and are delivered directly to the thread that caused them. The signal handler can look up the stack and see "hey, I got an illegal memory access executing Java code, and the pointer was a NULL. Let me go fix that up."
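To make the synchronous case concrete, here is a rough illustrative sketch (not JVM code) of a handler installed with SA_SIGINFO: for SIGSEGV the kernel fills in siginfo_t with the faulting address, so the handler can at least tell a NULL dereference apart from other bad accesses.

    // Sketch: a synchronous signal (here SIGSEGV) is delivered on the exact
    // thread that faulted, with a precise context the handler can inspect.
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static void segv_handler(int, siginfo_t* info, void*) {
        // si_addr is the address whose access faulted; use only
        // async-signal-safe calls like write() in here.
        if (info->si_addr == nullptr) {
            static const char msg[] = "SIGSEGV: looks like a NULL dereference\n";
            write(STDERR_FILENO, msg, sizeof(msg) - 1);
        } else {
            static const char msg[] = "SIGSEGV: bad access at a non-NULL address\n";
            write(STDERR_FILENO, msg, sizeof(msg) - 1);
        }
        _exit(1);  // a real handler (like the JVM's) might patch things up or chain instead
    }

    int main() {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, nullptr);

        volatile int* p = nullptr;
        return *p;  // faults here, synchronously, on this thread
    }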
In contrast, the signals that interact with the outside world are the "asynchronous" variety, and include such things as SIGTERM, SIGQUIT, SIGUSR1, etc. These do not have a fixed execution context; for threaded programs they are delivered pretty much at random to any thread. Importantly, SIGPIPE is among these. Yes, in some sense it is normally associated with one system call, but it is quite possible to (for instance) have two threads listening on two separate connections, both of which close before either thread is scheduled. The kernel just makes sure that there is a SIGPIPE pending (the usual implementation is a bitmask of pending signals) and deals with it on rescheduling any of the threads in the process. This is only one of the simpler cases in which the JVM might not have enough information to rule out your client code being interested in the signal.
(As to what happens to the read calls: they return "there was an error: EINTR" and continue on. At that point the JVM can turn this into an exception, but the return happens only after the signal has been delivered and the signal handler has fired.)
The upshot is you'll just have to deal with false-positives. (And deal with getting only one signal where two might have been expected.)
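One way to live with the false positives, as a sketch rather than a prescription: keep the SIGPIPE handler trivial and async-signal-safe, and let the code around each send()/write() decide what a broken pipe means by checking for EPIPE itself (the flag name below is made up).

    // Sketch: the handler only notes that *some* SIGPIPE arrived; it cannot
    // reliably tell whose socket it was, or how many pipes actually broke.
    #include <signal.h>
    #include <string.h>

    static volatile sig_atomic_t g_sigpipe_seen = 0;   // illustrative name

    static void sigpipe_handler(int, siginfo_t*, void*) {
        g_sigpipe_seen = 1;   // async-signal-safe: set a flag and return
    }

    void install_sigpipe_handler() {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigpipe_handler;
        sa.sa_flags = SA_SIGINFO | SA_RESTART;
        sigaction(SIGPIPE, &sa, nullptr);
        // Each send()/write() call site still checks errno == EPIPE to learn
        // whether *its* connection is the one that went away.
    }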
Related
I'm writing a service that uses JNA to delegate work from Java to a native C++ library. The C++ library makes an async call for a computationally expensive task, and then gets a callback (on a different OS thread) when that task is complete. I would like to route the result of this work back to the correct thread in the JVM.
What I'm wondering is: can I be guaranteed that a JVM thread will always have a one-to-one mapping with a native thread id? I.e., if I record the thread id in C++ via
std::this_thread::get_id()
then kick off some expensive work and block on a condition variable, can I rely on that thread still being there once the work is complete, so that I'll be able to return results to the JVM correctly? Will any behind-the-scenes JVM work like JIT, GC, or stop-the-world collections be a cause for concern with this pattern?
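For concreteness, this is roughly what I mean by recording the thread id on the native side (a Linux-only sketch; the struct and function names are just illustrative).

    // Sketch: record both ids when the JNA call enters native code so they can
    // be compared later when the callback fires on a different thread.
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <iostream>
    #include <thread>

    struct CallerIdentity {                 // illustrative name
        std::thread::id cpp_id;             // std::this_thread::get_id()
        pid_t           kernel_tid;         // Linux thread id of the calling JVM thread
    };

    CallerIdentity capture_caller_identity() {
        return CallerIdentity{ std::this_thread::get_id(),
                               static_cast<pid_t>(::syscall(SYS_gettid)) };
    }

    int main() {
        CallerIdentity id = capture_caller_identity();
        std::cout << "C++ id: " << id.cpp_id
                  << ", kernel tid: " << id.kernel_tid << "\n";
        return 0;
    }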
The answer is not specified in the JLS, the JVM spec or the Javadocs.
Indeed, it is possibly platform specific. For example, in JVMs for Solaris it is (or was) possible to do N:M mapping of user-space threads to kernel-space threads; see this document. It is not clear what that means / meant for the native thread id.
So will a thread's native thread_id be constant for the JVM that you are using?
There is only one way to be sure: download the JVM source code and check.
Warning: it is fearsomely complicated!
(And you should probably take that as a hint that you shouldn't be doing this kind of thing ... if you have to resort to asking on StackOverflow if it will work!)
It seems like a bad design to me, one that requires you to know how the JVM works.
If you marshal some Java data to the C++ layer anyway, why not marshal a callback + context as well? When the C++ thread finishes processing the data, it calls the Java callback with the provided context, and in the Java layer you push the result back to the Java thread.
The C++ layer shouldn't know anything about how Java threads work - all it has to do is call a callback and let the callback deal with the implementation details.
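A rough sketch of what that could look like on the C++ side (the names and the int64 payload are purely illustrative; the real signature would be whatever callback type you register through JNA):

    // Sketch: the native layer stores only an opaque callback + context and
    // invokes them from whatever worker thread finishes the job. It never
    // needs to know anything about Java threads.
    #include <cstdint>
    #include <thread>

    extern "C" {

    // Callback type the Java layer would register through JNA (illustrative).
    typedef void (*ResultCallback)(void* context, int64_t result);

    // Kick off the expensive work; returns immediately.
    void start_expensive_work(int64_t input, ResultCallback cb, void* context) {
        std::thread([input, cb, context]() {
            int64_t result = input * 2;     // stand-in for the real computation
            cb(context, result);            // invoked on the worker thread
        }).detach();
    }

    }  // extern "C"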
I've actually done this a few times in the past, but with C# and P/Invoke, which easily lets you marshal a C# function as a C function pointer. I'm sure it's possible with JNI as well.
The beginning of https://en.wikipedia.org/wiki/Interrupt says that a software interrupt can be caused by an exceptional condition in the processor itself (often called a trap or exception).
In many programming languages (C++, Java, Python, ...), there is language support for catching and handling exceptions, both those defined by the language and those you define yourself, e.g. try {...} catch .... Let me call both kinds "language-supported exceptions" (because I don't know the right terminology).
Do the language-supported exceptions count as software interrupts?
When a language-supported exception happens, does the same thing happen as when a software interrupt is handled? Specifically, does the CPU save the current process's state onto the stack, switch to running the OS kernel which then calls the exception handler, and, after the handler finishes, resume running the saved process?
No, a Java language exception has nothing to do with software interrupts.
A Java language exception just runs some exception-handling code within the same process and thread.
No.
An interrupt causes an interrupt handler to be invoked. Once that handler is complete the original code will continue executing from the place it was at when the interrupt happened.
Exceptions are handled in a catch block. Program flow is directly affected.
From your link:
The processor responds by suspending its current activities, saving its state, and executing a function called an interrupt handler (or an interrupt service routine, ISR) to deal with the event. This interruption is temporary, and, after the interrupt handler finishes, the processor resumes normal activities.
Question 1: No. As per your wiki reference, one explanation is:
The former is often called a trap or exception and is used for errors or events occurring during program execution that are exceptional enough that they cannot be handled within the program itself.
You are able to handle any Java exception within your program, so that is one difference. A Java exception could be triggered by an exceptional condition within the processor, but the exception handler in your program is responding to an event generated within the JVM, not directly responding to a software interrupt. The more conventional way to think of software interrupts is:
Software interrupt instructions function similarly to subroutine calls and are used for a variety of purposes, such as to request services from low-level system software such as device drivers.
Do a little research about how to invoke BIOS or MS/DOS services using the INT x86 instruction (this generates a software interrupt).
Question 2: Not necessarily. The JVM can generate an exception that has nothing to do with an exceptional processor condition. Think null reference.
No. Exceptions don't count as software interrupts, nor do they act as software interrupts.
Specifically, language-supported exceptions don't need to call the operating system; a context switch is generally unnecessary. Instead, throwing an exception makes a call to user-side code which understands how to look for handlers, unwind the call stack, and so forth, for that particular language.
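For example, a handled C++ exception never needs to enter the kernel: the throw compiles into a call into the language runtime (e.g. __cxa_throw in the common Itanium C++ ABI), which unwinds the stack and transfers control to the matching catch entirely in user space.

    // Sketch: nothing here requires a software interrupt or a context switch;
    // the runtime's unwinder finds the catch block and jumps to it.
    #include <iostream>
    #include <stdexcept>

    int parse_positive(int value) {
        if (value < 0) {
            throw std::invalid_argument("value must be non-negative");
        }
        return value;
    }

    int main() {
        try {
            parse_positive(-1);
        } catch (const std::invalid_argument& e) {
            // Control arrives here via user-space stack unwinding.
            std::cout << "caught: " << e.what() << "\n";
        }
        return 0;
    }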
To look at it another way: a general-purpose operating system will not know or care about the language-specific details necessary to handle language-supported exceptions. Software interrupts fall under the category of the operating system's ABI, which does not need to be remotely similar to the internal standards of a given language implementation.
In Java, we use System.exit(int) to exit the program.
The reason for an "exit value" in C was that it was used to check for errors in a program. But in Java, errors are reflected by an Exception being thrown, so they can be handled easily. So why do we have exit values in Java at all?
Exit values are returned to the calling program, e.g. the shell. An Exception cannot be caught by an external program.
BTW, when you throw an Exception, it is either caught within that thread or that thread dies, and the finally blocks are still run for that thread. When you call System.exit(), all threads stop immediately and finally blocks are not run.
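To see the calling-program side, here is a small sketch of a native launcher that runs a Java program and reads back the value passed to System.exit() from the process exit status (com.example.Main is a made-up class name).

    // Sketch: the parent (here a tiny C++ launcher, but a shell works the same
    // way) sees the System.exit() value in the child's exit status.
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: run the JVM; com.example.Main is a placeholder class.
            execlp("java", "java", "com.example.Main", static_cast<char*>(nullptr));
            _exit(127);  // exec failed
        }
        int status = 0;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            // WEXITSTATUS is the int passed to System.exit(...) (low 8 bits on Linux).
            std::printf("java exited with code %d\n", WEXITSTATUS(status));
        }
        return 0;
    }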
For the same reason.
Exit codes are exclusively used by parties and applications outside of the program for debugging and handling purposes. A super-application can definitely handle a return code better than trying to parse a stack trace.
Also, if you are creating an application for an end-user, you would much rather exit gracefully from your app than post a bunch of stack trace information, for a couple of reasons: one, you will just be scaring them with lots of crazy-looking techno-gibberish, and two, stack traces often reveal sensitive and confidential information about the way the program is structured fundamentally (giving a potential attacker more knowledge about the system).
For a real-world example, I was working on a Java Batch program which used exit codes for its jobs. A user could see whether the job executed successfully or not based on whether the exit code was "0". If it was anything else, they could contact technical support, armed with the additional information of the exit code, and the help desk would have all the necessary information based on that exit code to help them out. It works much nicer than trying to ask a non-technical end-user, "Okay, so what Exception are you getting?"
Exit values are returned to the caller to signal the successful or unsuccessful completion of the program. The caller may not be able to catch an exception and handle it accordingly.
For example, an exit value of 0 means successful completion, whereas a non-zero value means there was some error in execution.
Also, System.exit() makes all the threads in the application stop at that point.
Long story short, exit codes are simplified signals to the user who encounters an exception while running a Java program. Since we assume that most users do not understand the stack trace of an exception, a simple non-zero custom code tells them that something is wrong and should be reported to the vendor. The vendor gets the code, knows the stack trace associated with that code, and tries to repair the system. This is an abstraction provided by the programmers so that users don't have to read and report voluminous stack traces. A good analogy here is the getErrorCode() method in the SQLException class.
System.exit(int) also shuts down the JVM that is running on the client machine, which means it terminates all the threads in that JVM. It calls the exit method of the class java.lang.Runtime; if you go to the documentation of that method, you will see how the virtual machine is shut down.
This is the link
http://docs.oracle.com/javase/6/docs/api/java/lang/Runtime.html#exit%28int%29
I am working on a project where I need to get the native stack of the Java application. I am able to achieve this partially thanks to ptrace, multiprocessing, and signals.
On Linux, a normal Java application has, at a minimum, 14 threads. Out of these 14, I am interested only in the main thread, whose native stack I have to get. With this objective, I have started a separate process using fork() which monitors the native stack of the main thread. In short, I have 2 separate processes: one is being monitored and the other does the monitoring using ptrace and signal handling.
Steps in the monitoring process:
Get the main thread ID out of the 14 threads from the monitored process.
ptrace_attach on the main ID.
ptrace_cont on the main ID.
continuous loop starts
{
kill(main_ID, SIGSTOP)
nanosleep and check the status from the /proc/[pid]/stat file.
ptrace_peekdata to read the stack and navigate.
ptrace_cont on the main ID.
nanosleep and check the status from the /proc/[pid]/stat file.
}
ptrace_detach on the main ID.
This perfectly gives the native stack information continuously. However, sometimes I encounter an issue:
When I kill(main_ID, SIGSTOP) the main thread, the other threads in the process get into a finished or stopped state (T) and the entire process blocks. This behavior is not consistent; sometimes the entire process executes correctly. I cannot understand this, as I am only signaling the main thread. Why are the other threads affected?
Can someone help me analyze this problem?
I also tried sending SIGCONT and SIGSTOP to all of the threads of the process but the issue still occurs sometimes.
Thanks,
Sandeep
Assuming you are using Linux, you should be using tkill(2) or tgkill(2) instead of kill(2). On FreeBSD, you should use the SYS_thr_kill2 syscall. Per the tkill(2) manpage:
tgkill() sends the signal sig to the thread with the thread ID tid in the thread group tgid. (By contrast, kill(2) can only be used to send a signal to a process (i.e., thread group) as a whole, and the signal will be delivered to an arbitrary thread within that process.)
Ignore the stuff about tkill(2) and friends being for internal thread library usage, it is commonly used by debuggers/tracers to send signals to specific threads.
Also, you should use waitpid(2) (or some variation of it) to wait for the thread to receive the SIGSTOP instead of polling on /proc/[pid]/stat. This approach will be more efficient and more responsive.
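Putting those two suggestions together, the stop-and-wait step might look roughly like this on Linux (a sketch with minimal error handling; main_pid and main_tid stand for the process id and main thread id you already collect):

    // Sketch of the stop-and-wait step, assuming main_tid is already
    // PTRACE_ATTACHed and main_pid is the thread-group (process) id.
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <unistd.h>

    bool stop_and_wait(pid_t main_pid, pid_t main_tid) {
        // Deliver SIGSTOP to this one thread only; plain kill() would let the
        // kernel pick an arbitrary thread in the process.
        if (::syscall(SYS_tgkill, main_pid, main_tid, SIGSTOP) != 0) {
            return false;
        }
        // Block until the tracee reports the stop; __WALL is needed to wait
        // on a thread that is not a child in the usual sense.
        int status = 0;
        if (::waitpid(main_tid, &status, __WALL) != main_tid) {
            return false;
        }
        // Caller now reads the stack with PTRACE_PEEKDATA, then resumes with
        // ptrace(PTRACE_CONT, main_tid, nullptr, 0) -- passing 0 discards the
        // SIGSTOP so it is never actually delivered to the thread group.
        return WIFSTOPPED(status);
    }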
Finally, it appears that you are doing some sort of stack sampling. You may want to check out Google PerfTools as these tools include a CPU sampler that is doing stack sampling to obtain estimates of what functions are consuming the most CPU time. You could maybe reuse the work these tools have already done, as stack sampling can be tricky to make robust.
I have multiple threads, each with its own private concurrent queue, and all they do is run an infinite loop retrieving messages from it. It could happen that one of the queues doesn't receive messages for a period of time (maybe a couple of seconds), and messages could also come in big bursts where fast processing is necessary.
I would like to know what would be the most appropriate thing to do in the first case: use a blocking queue and block the thread until I have more input, or do a Thread.yield()?
I want to have as many CPU resources available as possible at a given time, as the number of concurrent threads may increase with time, but I also don't want the message processing to fall behind, as there is no guarantee of when the thread will be rescheduled for execution after a yield(). I know that hardware, operating system and other factors play an important role here, but setting that aside and looking at it from a Java (JVM?) point of view, what would be optimal?
Always just block on the queues. Java yields in the queues internally.
In other words: You cannot get any performance benefit in the other threads if you yield in one of them rather than just block.
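If it helps to see why, this is roughly what "blocking on the queue" amounts to underneath, sketched here as a tiny C++ condition-variable queue rather than the real java.util.concurrent implementation: the waiting consumer is descheduled entirely instead of burning cycles in a yield loop.

    // Sketch of a blocking take(): the consumer sleeps on a condition variable
    // while the queue is empty and wakes promptly when a producer pushes.
    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    class BlockingQueue {
    public:
        void put(std::string msg) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                items_.push(std::move(msg));
            }
            cv_.notify_one();
        }

        std::string take() {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !items_.empty(); });  // sleeps, no spinning
            std::string msg = std::move(items_.front());
            items_.pop();
            return msg;
        }

    private:
        std::mutex mutex_;
        std::condition_variable cv_;
        std::queue<std::string> items_;
    };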
You certainly want to use a blocking queue - they are designed for exactly this purpose (you want your threads to not use CPU time when there is no work to do).
Thread.yield() is an extremely temperamental beast - the scheduler plays a large role in exactly what it does; and one simple but valid implementation is to simply do nothing.
Alternatively, consider converting your implementation to use one of the managed ExecutorService implementations - probably ThreadPoolExecutor.
This may not be appropriate for your use case, but if it is, it removes the whole burden of worrying about thread management from your own code - and these questions about yielding or not simply vanish.
In addition, if better thread management algorithms emerge in future - for example, something akin to Apple's Grand Central Dispatch - you may be able to convert your application to use it with almost no effort.
Another thing that you could do is use a ConcurrentHashMap for your queue. When you do a read, it gives you a reference to the object you were looking for, so it is possible you may miss a message that was just put into the queue. But if all this thread is doing is listening for messages, you will catch it on the next iteration. It would be different if the messages could be updated by other threads. But there doesn't really seem to be a reason to block that I can see.