Cloud Run graceful shutdown - java

I am following up on:
https://cloud.google.com/blog/topics/developers-practitioners/graceful-shutdowns-cloud-run-deep-dive
How to process SIGTERM signal gracefully in Java?
I have a Cloud Run service that runs several while loops, which currently do not seem to end after the Cloud Run revision is replaced. I guess the Cloud Run manager is waiting for the revision to end gracefully, which unfortunately does not happen.
I tried adding:
Runtime.getRuntime().addShutdownHook()
and ending the loops from this hook. Unfortunately there are two issues:
It is not possible to register multiple shutdown hooks (one for each running loop): I get errors saying "Hook already running". This is an implementation issue and I wonder if there is a different way to do this.
When I send SIGTERM or SIGINT to the service running locally, it ends immediately and the shutdown hook is never called. This is a logical issue and I am not sure if this is how Cloud Run ends revisions. It seems not, otherwise the loops would end immediately as they do on my localhost.
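One way to avoid needing a hook per loop is to register a single hook that flips a shared flag and then waits for the loops to finish. The sketch below is only an illustration under assumptions: LoopSupervisor and its method names are hypothetical, not part of the JDK or any Cloud Run API, and the timeout you pass should stay below whatever grace period the platform gives you after SIGTERM. Note that Runtime.addShutdownHook() throws IllegalArgumentException if the same Thread is registered twice or has already been started, so each hook must be a fresh, unstarted Thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: one shared flag and one shutdown hook for all loops.
final class LoopSupervisor {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final List<Thread> workers = new CopyOnWriteArrayList<>();

    /** Loops check this instead of using while (true). */
    boolean isRunning() {
        return running.get();
    }

    /** Starts a worker that repeats the step until shutdown is requested. */
    Thread startLoop(String name, Runnable step) {
        Thread t = new Thread(() -> {
            while (isRunning() && !Thread.currentThread().isInterrupted()) {
                step.run();
            }
        }, name);
        workers.add(t);
        t.start();
        return t;
    }

    /** Asks every loop to stop and waits up to timeoutMillis in total. */
    boolean shutdown(long timeoutMillis) {
        running.set(false);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean allStopped = true;
        for (Thread t : workers) {
            t.interrupt(); // wakes loops blocked in sleep() or wait()
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            try {
                if (remaining > 0) {
                    t.join(remaining);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            allStopped &= !t.isAlive();
        }
        return allStopped;
    }

    /** Registers ONE hook for all loops; call it once during startup. */
    void installShutdownHook(long timeoutMillis) {
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> shutdown(timeoutMillis), "loop-supervisor-hook"));
    }
}
```

At startup you would call installShutdownHook(...) once and then startLoop(...) for each loop. If the hook never fires locally, it is worth checking that the signal really reaches the JVM process: when the JVM is launched through a wrapper (a shell script or a build tool), the signal may go to the wrapper instead.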

Related

Why would a server process stop receiving requests?

I have a few Java programs, Appserver1 and Appserver2, which handle command line arguments to external programs, and VocsServer, an intermediary to the MySQL database.
These programs are always supposed to be running and waiting to receive requests. However, at seemingly random times they stop receiving requests. E.g. when one of my programs makes a database request, it just hangs because the VocsServer has not received the request.
The thing is, all these processes are still running. Nothing appears to have changed, but they stop working. The only thing that fixes them is killing the processes and running them again.
What could be causing them to fail?

Wait with systemd until a service socket becomes available and then start a dependent service

Currently I have a slow-starting Java service in systemd that takes about 60 seconds until it opens its HTTP port and serves other clients.
Another service is a client of this one and expects it to be available; otherwise it dies after a certain number of retries. It is also started with systemd. To be clear, it is a service as well, but it uses the former like a database.
Can I configure systemd to wait until the first service has made its socket available? (Something like: if the socket is actually listening, then the second, client service should start.)
Initialization Process Requires Forking
systemd waits for a daemon to initialize itself if the daemon forks. In your situation, that's pretty much the only way you have to do this.
The daemon offering the HTTP service must do all of its initialization in the main thread. Once that initialization is done and the socket is listening for connections, it will fork(). The main process then exits. At that point systemd knows whether your process was successfully initialized (exit 0) or not (exit 1).
Such a service uses forking as its Type=... value, as follows:
[Service]
Type=forking
...
Note: If you are writing new code, consider not using fork. systemd already creates a new process for you so you do not have to fork. That was an old System V boot requirement for services.
"Requires" will make sure the process waits
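The other services have to wait, so they have to require the first one and be ordered after it. Requires= on its own only pulls the unit in; without After=, both units are started in parallel. Say your first service is called A, you would have something like this:
[Unit]
...
Requires=A
After=A
...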
Program with Patience in Mind
Of course, there is always another way which is for the other services to know to be patient. That means try to connect to the HTTP port, if it fails, sleep for a bit (in your case, 1 or 2 seconds would be just fine) then try again, until it works.
I have developed both methods and they both work very well.
Note: A powerful aspect of this method is that if service A gets restarted, you get a new socket, and the client can auto-reconnect to it when it detects that the old one went down. This means you don't have to restart the other services when restarting service A. I like this method, but it's a bit more work to make sure it's all properly implemented.
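For a Java client, the "be patient" approach could look roughly like the sketch below. The class name PatientClient, the method name and the 2-second connect timeout are assumptions made for illustration; tune the attempt count and delay to how slowly your service starts.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: keeps trying to connect until the service is up.
final class PatientClient {
    /**
     * Tries to connect up to maxAttempts times, sleeping delayMillis between
     * attempts. Returns the connected socket; the caller must close it.
     */
    static Socket connectWithRetry(String host, int port, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                // 2000 ms connect timeout is an arbitrary choice for this sketch.
                socket.connect(new InetSocketAddress(host, port), 2000);
                return socket;
            } catch (IOException e) {
                socket.close();
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw new IOException("service not reachable after " + maxAttempts + " attempts", last);
    }
}
```

With a service that needs about 60 seconds to start, something like connectWithRetry("localhost", 8080, 90, 1000) would cover the startup window with some margin; the host and port here are placeholders.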
Use the systemd Auto-Restart Feature?
Another way, maybe, would be to use the restart on failure. So if the child attempts to connect to that HTTP service and fails, it should fail, right? systemd can automatically restart your process over and over again until it succeeds. It's sucky, but if you have no control over the code of those daemons, it's probably the easiest way.
[Service]
...
Restart=on-failure
RestartSec=10
#SuccessExitStatus=3 7 # if success is not always just 0
...
This example waits 10 seconds after a failure before attempting to restart.
Hack (last resort, not recommended)
You could attempt a hack, although I do not ever recommend such things because something could happen that breaks such... in the services, change the files so that they have a sleep 60 then start the main process. For that, just write a script like so:
#!/bin/sh
sleep 60
# "$@" expands to all arguments; "$#" would only be their count.
exec "$@"
Then in the .service files, call that script as in:
ExecStart=/path/to/script /path/to/service args to service
This will run the script instead of running your code directly. The script will first sleep for 60 seconds and then try to run your service. So if for some reason the HTTP service takes 90 seconds this time, it will still fail.
Still, this can be useful to know since that script could do all sorts of things, such as use the nc tool to probe the port before actually starting the service process. You could even write your own probing tool.
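#!/bin/sh
# Poll once per second until probe exits with status 0.
while true
do
sleep 1
if probe
then
break
fi
done
# Replace the shell with the real service, passing all arguments through.
exec "$@"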
However, notice that such a loop is blocking until probe returns with exit code 0.
You have several options here.
Use a socket unit
The most elegant solution is to let systemd manage the socket for you. If you control the source code of the Java service, change it to use System.inheritedChannel() instead of allocating its own socket, and then use systemd units like this:
# example.socket
[Socket]
ListenStream=%t/example
[Install]
WantedBy=sockets.target
# example.service
[Service]
ExecStart=/usr/bin/java ...
StandardInput=socket
StandardOutput=socket
StandardError=journal
systemd will create the socket immediately (%t is the runtime directory, so in a system unit, the socket will be /run/example), and start the service as soon as the first connection attempt is made. (If you want the service to be started unconditionally, add an Install section to it as well, with WantedBy=multi-user.target.) When your client program connects to the socket, it will be queued by the kernel and block until the server is ready to accept connections on the socket. One additional benefit from this is that you can restart the service without any downtime on the socket – connection attempts will be queued until the restarted service is ready to accept connections again.
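On the Java side, picking up the socket could look roughly like this. It is a sketch under assumptions: InheritedListener and resolve are made-up names, and the fallback to binding a port on loopback is only there so the same program still runs from a shell. System.inheritedChannel() returns null when the JVM was not started with a socket on standard input. Whether a Unix domain socket such as %t/example is handed over as a ServerSocketChannel depends on the JDK version, so verify that on your runtime or use a TCP ListenStream= instead.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.Channel;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper: prefer the socket passed in by systemd, else bind our own.
final class InheritedListener {
    /**
     * Returns the inherited listening channel if there is one; otherwise
     * binds fallbackPort on loopback (0 picks a free port).
     */
    static ServerSocketChannel resolve(Channel inherited, int fallbackPort) throws IOException {
        if (inherited instanceof ServerSocketChannel) {
            return (ServerSocketChannel) inherited;
        }
        ServerSocketChannel own = ServerSocketChannel.open();
        own.bind(new InetSocketAddress("127.0.0.1", fallbackPort));
        return own;
    }
}
```

At startup the service would call resolve(System.inheritedChannel(), 8080) (the port is a placeholder) and run its accept loop on the returned channel.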
Make the service signal readiness to systemd
Alternatively, you can set up the service so that it signals to systemd when it is ready, and order the client after it. (Note that this requires After=example.service, not just Requires=example.service! Dependencies and ordering are orthogonal – without After=, both will be started in parallel.) There are two main service types that might make this possible:
Type=forking: systemd will consider the service to be ready as soon as the main program exits. Since you can’t fork in Java, I think you would have to write a small shell script which starts the server in the background and then waits until the socket is available (while ! test -S /run/example; do sleep 1s; done). Once the script exits, the service is considered ready.
Type=notify: systemd will wait for a message from the service before it is considered ready. Ideally, the message should be sent from the service PID itself: check if you can call the sd_notify function from libsystemd via JNI/JNA/whatever (specifically, sd_notify(0, "READY=1")). If that’s not possible, you can use the systemd-notify command-line tool (--ready option), but then you need to set NotifyAccess=all in the service unit (by default, only the main process may send notifications), and even then it likely will not work (systemd needs to process the message before systemd-notify exits, otherwise it will not be able to verify which cgroup the message came from).

Tomcat8 kills my threads on shutdown

I created a web application that needs to do some cleanup on shutdown. This cleanup will take about a minute and it's completely OK for it to do so.
When I deploy my webapp onto Tomcat 8 and then stop it, my ContextListener gets called and the cleanup begins. But it seems like Tomcat stops my thread the hard way, so the cleanup never completes. At least on Tomcat 6 that wasn't an issue.
Any ideas how to configure Tomcat 8 to stop it from misbehaving?
Partial Answer:
I found out it has something to do with a performance optimization I did. I used startStopThreads="2" to start my applications in parallel, which works out well, but on shutdown this also seems to kill my threads.
If you have a task which is to be performed on shutdown, I would add it as a shutdown hook. Most likely Tomcat 8 is calling System.exit(), which is a normal thing to do; this kills all user threads but starts the shutdown hooks.
A better solution is to never leave the system in a state where you really need this, i.e. you cannot assume an application will die gracefully.
If you are waiting for clients to disconnect, I suggest you add a shutting-down phase. During this phase you refuse new connections, move connections to another server or attempt to gracefully tell existing ones you are going away. After a short period or timeout, you then shut down.
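A shutting-down phase like that can be sketched with a plain ExecutorService: shutdown() stops new work from being accepted, awaitTermination() gives running work a bounded time to finish, and shutdownNow() is the fallback. DrainingWorker and its methods are hypothetical names, and the pool size is arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: refuse new work first, then wait for running work.
final class DrainingWorker {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    /** Returns false instead of accepting work once draining has begun. */
    boolean submit(Runnable job) {
        try {
            pool.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /**
     * Phase 1: stop accepting work. Phase 2: wait up to the timeout for
     * running jobs, then interrupt whatever is left. Returns true if
     * everything finished within the timeout.
     */
    boolean drain(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        if (pool.awaitTermination(timeout, unit)) {
            return true;
        }
        pool.shutdownNow();
        return false;
    }
}
```

drain(...) could be called from a shutdown hook or from contextDestroyed(), with a timeout shorter than the time the container or platform allows before it kills the process.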

How to process servlet requests during long shutdown

We need to implement a graceful shutdown mechanism into our Servlet application.
EDIT: We want to make it as simple as possible, which would be handling a kill signal sent via the operating system. This would allow system admins to use built-in shell utilities (kill, or taskkill on Windows); otherwise they would have to install another utility just to "talk" to the server.
This mechanism works in two phases:
upon shutdown request, deny certain critical activities
block until previously initiated critical actions are completed; these may take several hours
Phase #1 is implemented in our DAO layer.
Phase #2 is implemented in our ServletContextListener#contextDestroyed method
Our problem is that once contextDestroyed is called the Servlet container stops servicing further HTTP requests.
EDIT: contextDestroyed is called when someone is calling the operating system's kill function on server's process.
We would like to keep the application alive during Phase #2, notifying the users that some activities are unavailable.
Use a filter to keep a list of all critical requests.
When the "prepare shutdown" request is received, the filter should start denying some requests.
Write a servlet that tells you how many critical jobs are still left in the queue.
In the shutdown tool, send the "prepare shutdown". Then poll the servlet for the number of critical jobs. When this reaches 0, send the actual shutdown command.
To make this happen, create a service in the business layer which orchestrates this. Note that everything must happen before contextDestroyed() is called! Your special application shutdown doesn't fit into the J2EE view of the world, so you have to manage it yourself.
The service should be able to tell interested parties when a shutdown is in progress, how many critical jobs are still running, etc. Servlets and filters can then use this service to deny requests or tell how many jobs are left.
When all jobs are done, deny all requests except access to the "shutdown info" servlet which should then tell that the app is now ready for death.
Write a tool which gives the administrators a nice UI to initiate shutdown of your app.
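The orchestrating service described above could be as small as the following sketch. ShutdownCoordinator and its method names are hypothetical; the filter would call tryBeginCriticalJob() and endCriticalJob() around critical requests, and the "shutdown info" servlet would report remainingCriticalJobs() and readyForShutdown().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical business-layer service tracking the two-phase shutdown.
final class ShutdownCoordinator {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final AtomicInteger criticalJobs = new AtomicInteger(0);

    /**
     * Called before a critical request starts. Returns false if the request
     * must be denied. The counter is incremented before the flag is checked,
     * so readyForShutdown() never reports true while a job is admitted.
     */
    boolean tryBeginCriticalJob() {
        criticalJobs.incrementAndGet();
        if (shuttingDown.get()) {
            criticalJobs.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Called in a finally block after an admitted critical request ends. */
    void endCriticalJob() {
        criticalJobs.decrementAndGet();
    }

    /** Handles the "prepare shutdown" request: deny new critical jobs. */
    void prepareShutdown() {
        shuttingDown.set(true);
    }

    boolean isShuttingDown() {
        return shuttingDown.get();
    }

    int remainingCriticalJobs() {
        return criticalJobs.get();
    }

    /** True once shutdown was requested and no critical job is running. */
    boolean readyForShutdown() {
        return shuttingDown.get() && criticalJobs.get() == 0;
    }
}
```

The shutdown tool would then poll until readyForShutdown() is true before sending the real shutdown command.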
[EDIT] You may feel tempted to prevent the OS from shutting down your application. Don't do that.
What you should do is write a special tool to shut down your application using the two phase process that I described above. This should be the standard way to shutdown.
Yes, administrators will complain about it. On Unix, you can hide this tool by putting it into the init script, so no one will notice. There might be a similar solution on Windows.
Killing the server should always be possible to be able to stop it in case of (un)expected circumstances like: bugs in your shutdown code, emergency shutdown during power failure, bugs in your application code, or when Murphy happens.

How can I get this external JMS client to terminate?

I'm working through the 'Simple Point-to-Point Example' section of the Sun JMS tutorial (sender source, receiver source), using Glassfish as my JMS provider. I've set up the QueueConnectionFactory and Queue in the Glassfish admin UI, and added the relevant JARs to my classpath and the receiver is receiving the messages sent by the sender.
However, neither sender nor receiver terminate. The main thread exits normally (after successfully calling queueConnection.close()) but two non-daemon threads are left hanging around:
iMQReadChannel-0
imqConnectionFlowControl-0
It seems (from this java.net thread) that the reason is that queueConnection.close() just returns the connection to the pool, rather than really closing it. I can't find any way to tell the pool to shutdown, so the only option I'm left with is System.exit(), which feels wrong.
I've tried setting the minimum pool size to 0, the maximum pool size to 1 and the idle timeout to 10 seconds but it seems to make no difference. Even when I just lookup the connection factory and don't ask for a connection, these two threads are still started and don't terminate.
Any help much appreciated!
Why don't you simply terminate with a System.exit(0)? Given the sample, the current behavior is correct (a Java program terminates when all non-daemon threads end).
Maybe you can get the samples to shut down properly by playing with the client library's properties (idle time, etc.), but it seems others ( http://www.nabble.com/Simple-JMS-Client-doesn%27t-quit-td15662753.html) still experience the very same problem (and, anyway, I still don't understand what the point is).
Good news for us. "Will not fixed"
http://java.net/jira/browse/GLASSFISH-1429?focusedCommentId=85555&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_85555
