JVM hangs on exit - java

I am working on a small app that should sign documents using digital signature and quit.
The signature can be in a PKCS#12 archive (.pfx file) or on a smartcard device.
Working with the pfx file is easy and working fine.
However, sometimes using the smartcard device, the process hangs on Windows 8 PCs.
The document is signed correctly, but the process doesn't terminate. It just hangs.
I'm using Sun's PKCS#11 provider, sun.security.pkcs11.SunPKCS11.
Basically, I'm doing this:
SunPKCS11 provider = new SunPKCS11(configuration);
Security.addProvider(provider);
// ... some work ...
provider.logout();
Security.removeProvider(provider.getName());
Now... even if I call System.exit(0) or throw an exception at the end of the main method, I can see the stacktrace in the output but the process doesn't terminate.
I've added a shutdown hook to see if it is executed and it is, i.e. the JVM is trying to stop.
The hang occurs rarely and only on Windows 8 PCs. I tried different smartcards, and it happens only with cards that use cmp11.dll (the DLLs are provided by the smartcard vendors).
Using the same DLL to communicate with the smartcard works fine on Windows 7, XP, and some Windows 8 PCs.
I'm running Java 8 Update 45 on both x86 and x64 Windows 8.
Tried to get a thread dump to see what is hanging:
public static void main(String[] args) throws InterruptedException {
    // do my job, register provider, sign documents, remove provider ...
    for (int i = 0; i < 20; ++i) {
        System.err.println("Sleep... " + i);
        Thread.sleep(2 * 1000);
    }
    System.err.println("Exiting...");
}
If I execute jstack -l 3232 > dump.log 2>&1 when Sleep... x is printing, everything looks OK.
However, if I execute jstack -F -l 3232 > dump2.log 2>&1 after Exiting... is printed and the app hangs (using -F because the process hangs), I get the following:
Attaching to process ID 3232, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.45-b02
Deadlock Detection:
No deadlocks found.
Thread Exception in thread "main"
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
at sun.tools.jstack.JStack.main(JStack.java:106)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: Windbg Error: GetThreadIdBySystemId failed!
at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.getThreadIdFromSysId0(Native Method)
at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.getThreadIdFromSysId(WindbgDebuggerLocal.java:284)
at sun.jvm.hotspot.debugger.windbg.amd64.WindbgAMD64Thread.getThreadID(WindbgAMD64Thread.java:88)
at sun.jvm.hotspot.debugger.windbg.amd64.WindbgAMD64Thread.toString(WindbgAMD64Thread.java:81)
at java.lang.String.valueOf(String.java:2982)
at java.io.PrintStream.print(PrintStream.java:683)
at sun.jvm.hotspot.runtime.win32_amd64.Win32AMD64JavaThreadPDAccess.printThreadIDOn(Win32AMD64JavaThreadPDAccess.java:114)
at sun.jvm.hotspot.runtime.JavaThread.printThreadIDOn(JavaThread.java:265)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:79)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
... 6 more
I can see the process with PID 3232 in the task manager!
Any idea why it is not terminating or why jstack fails?
EDIT
Ok, tried to extract the signing in a separate process, execute it with Runtime.exec and then kill it with Process.destroy but... doesn't seem to help. The child process still stays in the task manager.
Aaaaand... now I have no other choice but to make it kill itself ;(
try {
    String name = java.lang.management.ManagementFactory.getRuntimeMXBean().getName();
    Runtime.getRuntime().exec("taskkill.exe /F /PID " + name.split("@")[0]);
}
catch(Throwable t) {
    Runtime.getRuntime().exec("taskkill.exe /F /IM java.exe");
}
EDIT 2
Tried with Runtime.halt as well. Still doesn't terminate the process...
I would appreciate any ideas!

This won't address your root cause, but this method can be used to force the JVM to terminate:
http://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#halt(int)
As the Javadoc says, use with extreme caution ;-)

I have the same problem with sun.security.pkcs11.SunPKCS11 on Windows 8 PCs. This is working for me:
Runtime.getRuntime().exec("taskkill.exe /F /PID " + name.split("@")[0]);
Thread.sleep(500);
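For anyone copying the workaround above, here is a minimal sketch of the PID lookup it relies on. It assumes the runtime name has the form pid@hostname, which is what HotSpot returns but is not guaranteed by the API. The class and method names (SelfKillSketch, parsePid, taskkillCommand) are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

final class SelfKillSketch {

    // Extracts the PID from a runtime name such as "3232@my-host".
    // The "pid@hostname" shape is a HotSpot convention, not a spec guarantee.
    static long parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            throw new IllegalArgumentException("Unexpected runtime name: " + runtimeName);
        }
        return Long.parseLong(runtimeName.substring(0, at));
    }

    // Builds the taskkill argument list; passing the arguments separately
    // avoids the whitespace splitting that Runtime.exec(String) performs.
    static List<String> taskkillCommand(long pid) {
        return Arrays.asList("taskkill.exe", "/F", "/PID", Long.toString(pid));
    }

    public static void main(String[] args) {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        long pid = parsePid(name);
        // Printed rather than executed so the sketch is safe to run anywhere;
        // on Windows the list could be handed to new ProcessBuilder(...).start().
        System.out.println(String.join(" ", taskkillCommand(pid)));
    }
}
```

On Java 9 and later, ProcessHandle.current().pid() returns the PID directly, so the string parsing is only needed on Java 8.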


Zeppelin on WSL. java.io.IOException: Fail to launch interpreter process

I am new to Zeppelin and want to install it on my Windows 10 Pro/WSL machine.
I used these installation scripts: https://github.com/x4ax/lxss-install-zeppelin .
Since they are three years old, I had to modify them a bit, so I have:
Ubuntu 20.04, bash
zeppelin-0.9.0-bin-all
hadoop-3.3.0
spark-3.0.1-bin-hadoop2.7
I also linked python3 to python.
Everything is installed, and Hadoop and Spark were successfully tested with the provided scripts. In the end I managed to see the "Welcome to Zeppelin!" landing page.
First, I opened the provided Python tutorial note "1.IPython Basic" and ran the first cell, which contains only %md. I got this error message:
org.apache.zeppelin.interpreter.InterpreterException: java.io.IOException: Fail to launch interpreter process:
null
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:129)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:271)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:444)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:72)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:182)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Fail to launch interpreter process:
null
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:126)
at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:68)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:104)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:154)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:126)
... 13 more
Info from md-shared.log:
INFO [2021-01-14 18:54:46,610] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:193) - Launching ThriftServer at 169.254.120.3:52579
INFO [2021-01-14 18:54:47,785] ({RegisterThread} RemoteInterpreterServer.java[run]:609) - Registering interpreter process
ERROR [2021-01-14 18:54:47,790] ({RegisterThread} RemoteInterpreterServer.java[run]:613) - Error while registering interpreter: RegisterInfo(host:169.254.120.3, port:52579, interpreterGroupId:md-shared_process), cause: {}
java.lang.RuntimeException: java.io.IOException: org.apache.zeppelin.shaded.org.apache.thrift.transport.TTransportException: java.net.SocketException: Network is unreachable (connect failed)
This suggests a network problem.
Steps I took to find a solution:
In the logs I see a command like this, which is run by the RemoteInterpreter Java object:
/usr/local/zeppelin/bin/interpreter.sh -d /usr/local/zeppelin/interpreter/md -c 169.254.120.3 -p 52579 -r : -i md-shared_process -l /usr/local/zeppelin/local-repo/md -g md
It runs silently.
zeppelin-daemon.sh start/stop run fine, and 'status' also shows the correct status, so restarting does not help.
Reinstalling Zeppelin and WSL did not help.
I also tested with the firewall turned off.
I am puzzled.
Following the question "Hello world in zeppelin failed", I managed to run the md interpreter by setting this in conf/zeppelin-env.sh:
ZEPPELIN_LOCAL_IP=127.0.0.1
I saw some statements (I cannot confirm them!) that:
Microsoft WSL blocks random addressing or
WSL listens to localhost only if it is really local.
PS. Now I have trouble running the Python interpreter, but that is another problem.
(Besides, it may be linked to my aliasing of python to python3 or similar shell settings.)
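One observation that may help explain the log: the address 169.254.120.3 falls in the IPv4 link-local range (169.254.0.0/16), and such auto-configured addresses are often not routable, which would be consistent with "Network is unreachable". The sketch below only classifies the two addresses from this question using the standard InetAddress checks. It does not prove the cause, and the class and method names are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AddressKindSketch {

    // Classifies an IP literal. getByName performs no DNS lookup for literals.
    static String kindOf(String ipLiteral) throws UnknownHostException {
        InetAddress address = InetAddress.getByName(ipLiteral);
        if (address.isLoopbackAddress()) {
            return "loopback";
        }
        if (address.isLinkLocalAddress()) {
            return "link-local";
        }
        return "other";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Address Zeppelin picked on its own, according to md-shared.log.
        System.out.println("169.254.120.3 -> " + kindOf("169.254.120.3"));
        // Address forced through ZEPPELIN_LOCAL_IP.
        System.out.println("127.0.0.1 -> " + kindOf("127.0.0.1"));
    }
}
```

This prints link-local for the first address and loopback for the second.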

Groovy cannot run batch command "Caused: java.io.IOException: Cannot run program "cmd.exe": error=2, No such file or directory"

I'm trying to run some command on a Windows agent with cmd using the groovy String.execute() in a Jenkins pipeline.
I know there is a bat step I could use, but I intentionally don't want to use it here because it bloats the logs, and the Blue Ocean plugin limits the number of steps it can show for any stage.
What I have is a cleanup function that I call very often; it does a lot of checks and runs multiple commands (at the moment with sh and bat, plus isUnix() and similar). The result is usually a very bloated log full of sh and bat steps, which makes the logs larger and more difficult to analyze.
I decided to use a native way of running shell commands "silently" and print stdout and stderr only if the command fails.
I have written this function to execute the command:
@NonCPS
def exec(cmd) {
    def isUnix = Jenkins.instance.getNodes().find { it.getNodeName() == env.NODE_NAME }.toComputer().isUnix()
    cmd = (isUnix ? ["/bin/bash", "-c"] : ["cmd.exe", "/c"]) + [cmd]
    def proc = cmd.execute()
    proc.waitFor()
    def result = [out: proc.in.text.trim(), err: proc.err.text.trim(), exitCode: proc.exitValue()]
    return result
}
The above function works perfectly for Linux agents but not on Windows. I always get the following error:
java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at java.base/java.lang.Runtime.exec(Runtime.java:591)
at java.base/java.lang.Runtime.exec(Runtime.java:415)
at java.base/java.lang.Runtime.exec(Runtime.java:312)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:47)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
Caused: java.io.IOException: Cannot run program "cmd.exe": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at java.base/java.lang.Runtime.exec(Runtime.java:591)
at java.base/java.lang.Runtime.exec(Runtime.java:415)
at java.base/java.lang.Runtime.exec(Runtime.java:312)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
The key line is: Cannot run program "cmd.exe": error=2, No such file or directory.
I tried replacing "cmd.exe" with the full path "C:\Windows\System32\cmd.exe" or "C:\Windows\SysWOW64\cmd.exe", replacing cmd.exe with ["start", "cmd.exe", "/C"], using Runtime.getRuntime().exec(), and many other variations, but I cannot make it work.
"git version".execute() works fine, but I cannot run more complex commands such as "echo hi && git version".execute(), which prints "hi && git version" instead of "hi" followed by the real git version. I thought the PATH environment variable might not be set properly, so I ran "set".execute(), which resulted in the same error as above ("set": no such file or directory).
I have been trying various combinations of this for the past 2 days and that is not going anywhere.
Any help is much appreciated. Thanks in advance.
Update:
I found the reason why it fails on Windows but not on Linux nodes with Jenkins, and hit another, "harder" wall.
The problem is that the String.execute() method behaves internally like Runtime.getRuntime().exec(), which uses the runtime environment of the master node even when the call is wrapped in a node() directive. This means that when it appeared to work on Linux, it was actually running on the master node all along, which is also Linux. cmd, start, and similar commands cannot run because they (obviously) do not exist on Linux.
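To illustrate the point in plain Java, here is a minimal sketch of a local "run and capture" helper. Whatever JVM executes it decides where the process is started, so inside a pipeline that would be the Jenkins master, not the agent. The class and method names are hypothetical, and the shell choice is driven by the os.name of the JVM running the code.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocalShellSketch {

    // Picks the shell wrapper from an OS name such as System.getProperty("os.name").
    static List<String> shellCommand(String osName, String cmd) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows
                ? Arrays.asList("cmd.exe", "/c", cmd)
                : Arrays.asList("/bin/bash", "-c", cmd);
    }

    // Runs the command on the machine hosting this JVM and returns "exitCode:output".
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        // Read the output before waitFor so a full pipe buffer cannot block the child.
        String output = new String(process.getInputStream().readAllBytes(), Charset.defaultCharset());
        int exitCode = process.waitFor();
        return exitCode + ":" + output.trim();
    }

    public static void main(String[] args) throws InterruptedException {
        // os.name describes the machine running this JVM, nothing else.
        String osName = System.getProperty("os.name");
        try {
            System.out.println(run(shellCommand(osName, "echo hi")));
        } catch (IOException e) {
            System.out.println("Shell not available on this machine: " + e.getMessage());
        }
    }
}
```

Passing the whole command line as a single argument to the shell is also what makes operators like && work, which a bare "echo hi && git version".execute() does not do.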

Shell script not recording Java exit status on failure

I am trying to write a shell script that records the exit status of a Java program. The script should simply launch a Java app, and if the Java app doesn't run for some reason, the shell script should detect this and take mitigating measures.
The following is my script:
#!/bin/bash
APPNAME="app"
APPFOLDER=$APPNAME
BACKUP=$APPFOLDER"-backup"
LOGFOLDER=$APPNAME"-log"
echo "Starting new app"
java -jar $APPFOLDER/$APPNAME*.jar > $LOGFOLDER/$APPNAME"_$(date +%Y.%m.%d.%s).log"
wait
STATUS=$?
if [ $STATUS -eq 0 ]
then
echo "Deployment successful" $?
else
echo "Deployment failed: ... derp" $?
fi
I have written a simple Swing GUI that runs fine. However, I packaged it as a jar without specifying an entry point. Hence, I should get the error:
Exception in thread "main" java.lang.NoClassDefFoundError: Demo$1
and the script should detect that the application failed to start.
All of this works FINE until I try to launch the Java app in the background using &. Whenever I do this:
java -jar $APPFOLDER/$APPNAME*.jar > $LOGFOLDER/$APPNAME"_$(date +%Y.%m.%d.%s).log" &
the script always returns a 0 for $?, indicating it passed.
What am I doing wrong? Is there a better way to go about detecting if the app failed to launch?
Thanks!
Wait! You are recording the exit status of wait!
This is why you see an unexpected result from your script. Look at the man page for bash (wait is a bash built-in, so you need to read the bash manual):
wait [-n] [n ...]
Wait for each specified child process and return its termination status. Each n may be a process ID... If n is not given, all currently active child processes are waited for, and the return status is zero(!). If n specifies a non-existent process or job, the return status is 127. Otherwise, the return status is the exit status of the last process ... waited for.
Since you have not specified n (the child PID to wait for), the return status is zero, as per the spec.
Another question is: do you really need wait at all?
If you don't need to run your app in the background, just do this:
echo "Starting new app"
java -jar $APPFOLDER/$APPNAME*.jar > $LOGFOLDER/$APPNAME"_$(date +%Y.%m.%d.%s).log"
STATUS=$?
The only difference is that I removed the unnecessary wait.
If for some reason you need to run your app in the background and read the exit status later, then you need to wait for that PID. To find the PID of the last background process, use the special variable $!:
echo "Starting new app"
java -jar $APPFOLDER/$APPNAME*.jar > $LOGFOLDER/$APPNAME"_$(date +%Y.%m.%d.%s).log" &
CHILDPID=$!
wait "${CHILDPID}"
STATUS=$?
Here's a short example of how it works:
user@s:~$ (sleep 10 && exit 42)&
[1] 27792
user@s:~$ wait "$!"
[1]+ Exit 42 ( sleep 10 && exit 42 )
user@s:~$ echo $?
42
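For comparison only, since the question is about bash: if the launcher were written in Java instead, the same "start in the background, collect the status later" pattern might look like the sketch below. The Process handle plays the role of $!, and waitFor() plays the role of wait "$CHILDPID". The sh -c "exit 42" command is a stand-in for the real java -jar command line, and the class and method names are made up.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class ChildExitStatusSketch {

    // Starts the child without blocking, like `command &` in a shell.
    // inheritIO lets the child's stdout and stderr show up in our console.
    static Process launch(List<String> command) throws IOException {
        return new ProcessBuilder(command).inheritIO().start();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical stand-in for the real `java -jar ...` command line.
        Process child = launch(Arrays.asList("sh", "-c", "exit 42"));
        // ... other work could happen here while the child runs ...
        int status = child.waitFor(); // like `wait "$CHILDPID"; STATUS=$?`
        System.out.println("Child exited with status " + status);
    }
}
```

As with the shell version, the status belongs to that specific child, so there is no ambiguity about which process is being reported.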
What I want to know is if the app fails on startup or not. In the case of the former, my script would bag up the app and roll out the previous version.
This purpose is too vague. Are you only interested in missing dependencies?
I don't think there is an easy way to distinguish between a non-zero exit code from the JRE and a non-zero exit code from your Java application.
I can imagine lots of other reasons to roll back a deployment, many of which do not lead to a non-zero exit code.

Finding total file descriptors throws exception

I'm trying to find the total number of open file descriptors and found that the Sigar API provides that information. However, when I try the following:
Sigar sigar = new Sigar();
sigar.getProcFd(<pid>);
(with <pid> replaced by an actual process ID), it throws the following exception:
org.hyperic.sigar.SigarNotImplementedException: This method has not been implemented on this platform
at org.hyperic.sigar.SigarNotImplementedException.<clinit>(SigarNotImplementedException.java:28)
at org.hyperic.sigar.ProcFd.gather(Native Method)
at org.hyperic.sigar.ProcFd.fetch(ProcFd.java:30)
at org.hyperic.sigar.Sigar.getProcFd(Sigar.java:531)
From the exception it's clear that the native Method - gather() hasn't been implemented/available on my OS (Mac OS X). How do I fix this? I tried adding the "libsigar-universal64-macosx.dylib" to the classpath but with no luck.
Also, I tried creating ProcFd like below instead of getting it from sigar:
ProcFd proc = new ProcFd();
System.out.println("Total FD: " + proc.getTotal());
In this case the output is always 0. Based on the api doc it looks like it should be providing the total number of open file descriptor (http://cpansearch.perl.org/src/DOUGM/hyperic-sigar-1.6.3-src/docs/javadoc/org/hyperic/sigar/ProcFd.html). Not sure if it's returning 0 because of the same reason as above i.e. missing implementation for my OS. Is that correct?
I also wonder why getting ProcFd via "sigar.getProcFd()" throws the above exception, while creating it with "ProcFd proc = new ProcFd()" does not, yet proc.getTotal() always returns 0.
I ended up using lsof in a shell script instead of the Sigar library. I never got this to work on Mac. I tried it on Linux and it worked without any issues.
The answer is in the documentation (http://cpansearch.perl.org/src/DOUGM/hyperic-sigar-1.6.3-src/docs/javadoc/org/hyperic/sigar/ProcFd.html), and as per your finding: OSX is not supported.
getTotal
public long getTotal()
Get the Total number of open file descriptors.
Supported Platforms: AIX, HPUX, Linux, Solaris, Win32.
System equivalent commands:
AIX: lsof
Darwin: lsof
FreeBSD: lsof
HPUX: lsof
Linux: lsof
Solaris: lsof
Win32:
Returns:
Total number of open file descriptors
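If the goal is only the descriptor count of the current JVM (not an arbitrary PID, which is what Sigar and lsof cover), a different technique that needs no native library is the JDK's own management bean. This is a minimal sketch, assuming a HotSpot-style JDK that ships com.sun.management.UnixOperatingSystemMXBean; the class and method names of the sketch itself are made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

final class OpenFdCountSketch {

    // Returns the open file descriptor count of the current JVM, or -1 when
    // the platform bean does not expose it (for example on Windows).
    static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("Open file descriptors: " + openFileDescriptorCount());
    }
}
```

For another process's descriptors, shelling out to lsof, as described above, remains the more general route.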

Bash ignore exit from Java app

I have a simple BASH script that wraps a java program with the intention of restarting it if that application crashes:
STOP=0
while [ "$STOP" -eq 0 ]
do
echo "Starting"
exec java com.site.app.Worker
echo "Crashed"
sleep 3
done
However if the Java process exits it also quits the bash script so the process is never started again.
E.g. (pointing at a fake class):
$ ./RestartApp.ksh
Starting
Exception in thread "main" java.lang.NoClassDefFoundError: com/site/app/Worker
Caused by: java.lang.ClassNotFoundException: com.site.app.Worker
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: com.site.app.Worker. Program will exit.
$
Is there a way I can catch the errors (but still display them) to allow the script to continue running?
Remove the exec. That's completely replacing the current process (your shell) with the Java VM.
Just remove that and it should work fine.
As Mat said, exec replaces the current shell process with the Java process. If it fails, nothing is left waiting to relaunch it. exec can be a very useful tool, but it is rather advanced.
A proper use for it would be a script that sets variables or priorities in the current shell and then execs the process you are wrapping.
The variable "STOP" does not seem to be used. I would simply go for:
while ! java com.site.app.Worker
do
echo Failed: Sleeping and restarting >&2
sleep 3
done
