Why would a server process stop receiving requests? - java

I have a few Java programs: Appserver1 and Appserver2, which handle command-line arguments to external programs, and VocsServer, an intermediary to the MySQL database.
These programs are always supposed to be running and waiting to receive requests. However, at seemingly random times they stop receiving requests. E.g. when one of my programs makes a database request, it just hangs because the VocsServer has not received the request.
The thing is, all these processes are still running. Nothing appears to have changed, but they stop working. The only thing that fixes them is killing the processes and running them again.
What could be causing them to fail?

Related

Cloud Run graceful shutdown

I am following up on:
https://cloud.google.com/blog/topics/developers-practitioners/graceful-shutdowns-cloud-run-deep-dive
How to process SIGTERM signal gracefully in Java?
I have a Cloud Run service running some while loops, which currently do not seem to end after the Cloud Run revision is replaced. My guess is that the Cloud Run manager is waiting for the revision to end gracefully, which unfortunately does not happen.
I tried adding a hook with:
Runtime.getRuntime().addShutdownHook()
and ending the loops from that hook. Unfortunately there are two issues:
1. It is not possible to register more than one shutdown hook (one per running loop); I get "Hook already running" errors. This is an implementation issue, and I wonder if there is a different way to do this.
2. When I send SIGTERM or SIGINT to the locally running service, it ends immediately and the shutdown hook is never called. This is a logical issue, and I am not sure whether this is how Cloud Run ends revisions. It seems not; otherwise the loops would end immediately, as they do on my localhost.
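One way around the first issue is to register a single hook that clears a flag shared by all loops, rather than one hook per loop. The following is a minimal sketch of that idea, not a Cloud Run specific recipe; the class name LoopShutdown, the worker names and the 5 second wait are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LoopShutdown {
    // Shared flag checked by every worker loop; cleared once by the single hook.
    private static final AtomicBoolean running = new AtomicBoolean(true);

    /** Starts a worker whose loop exits once the shared flag is cleared. */
    public static Thread startWorker(String name, CountDownLatch done) {
        Thread t = new Thread(() -> {
            try {
                while (running.get()) {
                    Thread.sleep(50); // placeholder for one unit of real work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown(); // report that this loop has finished
            }
        }, name);
        t.start();
        return t;
    }

    /** Clears the flag and waits up to timeoutMs for all loops to finish. */
    public static boolean stopAll(CountDownLatch done, long timeoutMs) {
        running.set(false);
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(2);
        startWorker("loop-1", done);
        startWorker("loop-2", done);
        // One hook for all loops: it runs when the JVM receives SIGTERM or SIGINT.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> stopAll(done, 5_000)));
    }
}
```

Because the hook waits on the latch, the JVM does not finish exiting until the loops have finished or the timeout has passed. This does not explain why the hook never runs locally; that would need to be checked separately, for example by confirming that the signal reaches the JVM process itself and not a wrapper script.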

weblogic freeze and block all processing thread

We deployed our web application on WebLogic Server 10.3.6 and are running a stress test against it. Sometimes a single thread, doing something we have not yet identified, makes the whole WebLogic server freeze: it processes no requests and runs no other threads, only that one.
We found that one cause was the jxl library, which calls the GC when it reads or creates a file. After we disabled that, our application ran much more smoothly, but it still freezes for some other reason.
So I want to ask: is there a way to find out what our server is doing while it is frozen? And are there other possible reasons for the whole server to freeze, as it does when GC is called?
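One possible way to see what the threads are doing is a thread dump taken from inside the JVM with the standard java.lang.management API. This is only a sketch: the class name ThreadSnapshot is made up for illustration, and it assumes the JVM can still run the code, which will not be the case during a stop-the-world pause.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadSnapshot {
    /** Builds a text dump of every live thread plus a deadlock count. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Lock details are requested only where the JVM supports reporting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            out.append(info); // name, state, lock info and the top stack frames
        }
        // Both calls return null when no deadlock is found.
        long[] deadlocked = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Taking several dumps a few seconds apart and comparing them can show whether the same thread stays in the same place while the others wait.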

wait with systemd until a service socket becomes available and then start a dependent service

Currently I have a slow-starting Java service in systemd that takes about 60 seconds before it opens its HTTP port and serves other clients.
Another service is a client of this one and expects it to be available; otherwise it dies after a certain number of retries. It is also started by systemd. To be clear, it is a service too, but it uses the former like a database.
Can I configure systemd to wait until the first service has made its socket available? (Something like: once the socket is actually listening, the second, client service should start.)
Initialization Process Requires Forking
systemd waits for a daemon to initialize itself if the daemon forks. In your situation, that's pretty much the only way you have to do this.
The daemon offering the HTTP service must do all of its initialization in the main thread. Once that initialization is done and the socket is listening for connections, it will fork(), and the main process then exits. At that point systemd knows whether your process was successfully initialized (exit 0) or not (exit 1).
Such a service is given the Type= value forking, as follows:
[Service]
Type=forking
...
Note: If you are writing new code, consider not using fork. systemd already creates a new process for you so you do not have to fork. That was an old System V boot requirement for services.
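"Requires" and "After" will make sure the process waits
The other services have to wait, so they have to require the first service and be ordered after it. Requires= on its own only pulls the first service in; without After= both are started in parallel. Say your first service is called A; you would have something like this:
[Unit]
...
Requires=A.service
After=A.service
...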
Program with Patience in Mind
Of course, there is always another way, which is for the other services to know to be patient. That means: try to connect to the HTTP port; if it fails, sleep for a bit (in your case, 1 or 2 seconds would be just fine), then try again until it works.
I have developed both methods and they both work very well.
Note: A powerful aspect of this method is that if service A gets restarted, you get a new socket. The client can then auto-reconnect to the new socket when it detects that the old one has gone down. This means you don't have to restart the other services when restarting service A. I like this method, but it's a bit more work to make sure it's all properly implemented.
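The patient-client approach described above could look roughly like this in Java. It is a sketch under assumptions: the class name PatientConnect, the 2 second connect timeout and the attempt limit are illustrative choices, not something the original services are known to use.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PatientConnect {
    /**
     * Tries to open a TCP connection, sleeping between failed attempts.
     * Returns the connected socket, or throws once maxAttempts have failed.
     */
    public static Socket connectWithRetry(String host, int port, int maxAttempts, long delayMs)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 2_000);
                return socket; // the service is up
            } catch (IOException e) {
                socket.close();
                last = e;
            }
            try {
                Thread.sleep(delayMs); // be patient before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for " + host + ":" + port, e);
            }
        }
        throw new IOException("no connection after " + maxAttempts + " attempts", last);
    }
}
```

A client would call something like connectWithRetry("localhost", 8080, 120, 1000) at startup (host and port are placeholders) and call it again whenever it detects that the connection has gone down.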
Use the systemd Auto-Restart Feature?
Another way, maybe, would be to use the restart on failure. So if the child attempts to connect to that HTTP service and fails, it should fail, right? systemd can automatically restart your process over and over again until it succeeds. It's sucky, but if you have no control over the code of those daemons, it's probably the easiest way.
[Service]
...
Restart=on-failure
RestartSec=10
#SuccessExitStatus=3 7 # if success is not always just 0
...
This example waits 10 seconds after a failure before attempting to restart.
Hack (last resort, not recommended)
You could attempt a hack, although I never recommend such things because something could happen that breaks them. In the services, change the unit files so that they sleep for 60 seconds and then start the main process. For that, just write a script like so:
#!/bin/sh
# Wait a fixed 60 seconds, then run the command passed as arguments.
sleep 60
exec "$@"
Then in the .service files, call that script as in:
ExecStart=/path/to/script /path/to/service args to service
This will run the script instead of running your code directly. The script will first sleep for 60 seconds and then try to run your service. So if for some reason the HTTP service takes 90 seconds this time, the client will still fail.
Still, this can be useful to know since that script could do all sorts of things, such as use the nc tool to probe the port before actually starting the service process. You could even write your own probing tool.
#!/bin/sh
# "probe" is a placeholder for your own check, e.g. an nc call against the port.
while true
do
    sleep 1
    if probe
    then
        break
    fi
done
exec "$@"
However, notice that such a loop is blocking until probe returns with exit code 0.
You have several options here.
Use a socket unit
The most elegant solution is to let systemd manage the socket for you. If you control the source code of the Java service, change it to use System.inheritedChannel() instead of allocating its own socket, and then use systemd units like this:
# example.socket
[Socket]
ListenStream=%t/example
[Install]
WantedBy=sockets.target
# example.service
[Service]
ExecStart=/usr/bin/java ...
StandardInput=socket
StandardOutput=socket
StandardError=journal
systemd will create the socket immediately (%t is the runtime directory, so in a system unit, the socket will be /run/example), and start the service as soon as the first connection attempt is made. (If you want the service to be started unconditionally, add an Install section to it as well, with WantedBy=multi-user.target.) When your client program connects to the socket, it will be queued by the kernel and block until the server is ready to accept connections on the socket. One additional benefit from this is that you can restart the service without any downtime on the socket – connection attempts will be queued until the restarted service is ready to accept connections again.
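On the Java side, picking up the socket passed in by systemd might look like the sketch below. The class name InheritedSocketServer and the fallback to a locally bound port are assumptions added so the program can also be run outside systemd. Whether a Unix-domain socket such as %t/example is recognized by System.inheritedChannel() depends on the JDK version and should be verified on the target JVM.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.Channel;
import java.nio.channels.ServerSocketChannel;

public class InheritedSocketServer {
    /**
     * Returns the listening channel handed over by the launcher (e.g. systemd),
     * or binds a fresh one on fallbackPort when nothing suitable was inherited.
     */
    public static ServerSocketChannel obtainChannel(int fallbackPort) throws IOException {
        Channel inherited = System.inheritedChannel(); // null when nothing was inherited
        if (inherited instanceof ServerSocketChannel) {
            return (ServerSocketChannel) inherited;
        }
        ServerSocketChannel channel = ServerSocketChannel.open();
        channel.bind(new InetSocketAddress("127.0.0.1", fallbackPort));
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = obtainChannel(8080)) { // 8080 is a placeholder
            System.err.println("listening on " + server.getLocalAddress());
            // server.accept() would follow here in a real service.
        }
    }
}
```

The status message goes to System.err because, with the unit above, standard output is connected to the socket and standard error goes to the journal.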
Make the service signal readiness to systemd
Alternatively, you can set up the service so that it signals to systemd when it is ready, and order the client after it. (Note that this requires After=example.service, not just Requires=example.service! Dependencies and ordering are orthogonal – without After=, both will be started in parallel.) There are two main service types that might make this possible:
Type=forking: systemd will consider the service to be ready as soon as the main program exits. Since you can’t fork in Java, I think you would have to write a small shell script which starts the server in the background and then waits until the socket is available (while ! test -S /run/example; do sleep 1s; done). Once the script exits, the service is considered ready.
Type=notify: systemd will wait for a message from the service before it is considered ready. Ideally, the message should be sent from the service PID itself: check if you can call the sd_notify function from libsystemd via JNI/JNA/whatever (specifically, sd_notify(0, "READY=1")). If that’s not possible, you can use the systemd-notify command-line tool (--ready option), but then you need to set NotifyAccess=all in the service unit (by default, only the main process may send notifications), and even then it likely will not work (systemd needs to process the message before systemd-notify exits, otherwise it will not be able to verify which cgroup the message came from).

Tyrus sometimes stuck when trying to connect

I'm using the tyrus-standalone-client-1.12.jar to maintain a connection to a Websocket server (or set of servers) I have no control over. I'm creating a ClientManager instance that I configure and then use clientManager.asyncConnectToServer(this, new URI(server)), where this is the instance of a class with annotated methods like @OnOpen, @OnMessage and so on.
I also have a ClientManager.ReconnectHandler registered that handles onDisconnect and onConnectFailure and of course outputs debug messages.
Most of the time it connects just fine, but especially when the server has issues and I lose the connection, reconnecting sometimes doesn't work.
I first noticed it when I simply returned true in onDisconnect and it just wouldn't reconnect sometimes (which in this case the ReconnectHandler should have done for me, which it usually did as well). The rest of the program keeps running just fine, but my websocket client just wouldn't do anything after the debug message in onDisconnect.
Since then I changed it to only use the ReconnectHandler to just connect again via asyncConnectToServer (on a delay), in order to be able to switch to another server (I couldn't find a way to do that with just the ReconnectHandler). Even then, the asyncConnectToServer sometimes just seems to not do anything. I'm sure it does something, but it doesn't output the debug message in onConnectFailure and also doesn't call onOpen, even after hours, so the client ends up just being stuck there.
Again, this isn't always the case. It can reconnect just fine several times, triggered either by onDisconnect or by onConnectFailure, and then on one attempt suddenly just hang. When I had two instances of the program running at the same time, both reconnected a few times and then both hung on asyncConnectToServer at the same reconnect attempt, which to me seems to indicate that it is caused by some state of the server or connection.
One time it even failed to connect when initially connecting (not reconnecting), at a time when the server seemed to have issues.
Does anyone have an idea what could cause this?
Am I missing some property to set a connection attempt timeout?
Am I missing some way to retrieve connection failure info other than ReconnectHandler.onConnectFailure?
Am I even allowed to reuse the same ClientManager to connect several times (after the previous connection closed)?
Could there be something in my client endpoint implementation that somehow prevents onOpen or onConnectFailure from being called?
It's kind of hard to tell what's going on without getting any error message and without being able to reproduce it. I used JConsole's Detect Deadlock button on the program with a client hanging like this and it didn't detect anything (not sure if it would, but I thought I'd mention it).
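On the timeout question, one generic option that does not depend on any Tyrus property is to put a time limit on the Future that the asynchronous connect call returns. The sketch below uses only java.util.concurrent; the helper name awaitOrCancel is made up, and whether cancelling the Future actually releases the underlying connection attempt in Tyrus is an open assumption that would need testing.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectTimeout {
    /**
     * Waits for a pending result for at most the given time.
     * Returns the result, or null if the attempt timed out, failed or was interrupted.
     */
    public static <T> T awaitOrCancel(Future<T> pending, long timeout, TimeUnit unit) {
        try {
            return pending.get(timeout, unit);
        } catch (TimeoutException e) {
            pending.cancel(true); // give up on this attempt so the caller can retry
            return null;
        } catch (ExecutionException e) {
            return null; // the attempt itself failed; e.getCause() holds the reason
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pending.cancel(true);
            return null;
        }
    }
}
```

With this helper, a call such as awaitOrCancel(clientManager.asyncConnectToServer(this, new URI(server)), 30, TimeUnit.SECONDS) returning null would be treated as a failed attempt and trigger the next reconnect, possibly against another server.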

Need redirecting output from multi-threaded socket server to other clients.

What I have is a multi-threaded socket server listening for clients. A new thread is created and started for each opened connection. Clients can ask the server to execute commands via the Runtime.exec() method. Each new command received is handled by a new thread (with a PrintWriter passed as a parameter), and all the output (std/err) is sent over the socket with the PrintWriter.
The problem is that when the command takes longer (e.g. a daemon) and the client disconnects for any reason, I can't get the output anymore. I need to find a way of getting the output from that command execution thread on another connection (a new client session, which will be on another thread).
I could try to send all the output from the commands to System.out and then send that (System.out) over the socket with a PrintWriter (I don't know how to do this). And if I'm successful, maybe there is a way of sending all such output to every connected client.
But I'm also saving all the output to the database, so with multiple clients connected I would end up with duplicate entries in my database.
Please give me some ideas as to how I could go about this issue. Thanks
You probably want to make your calls asynchronous. Executing tasks of unknown duration should never be made synchronously.
I would consider using a "reactor"-type server (one thread per client = quick death) and some type of message-passing mechanism for long-running transactions. There is a lot of middleware that does this kind of work; it really depends on what platform you're on.
By the way, using a socket connection to execute command on a remote machine is a security flaw, but you probably already know that!
So, did you consider using a session ID for each connection? This ID will be associated with the output of each execution. So the same output could be retrieved on a subsequent call from the same user. Temporarily, the output could be stored at a repository (e.g. DB, memory, file).
Please correct me if I am not getting your question properly.
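The session-ID idea above could be sketched as a small in-memory store that the command thread writes to and any later connection reads from. The class name SessionOutputStore and its methods are hypothetical; a real server might back this with the database or a file, as the answer suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionOutputStore {
    // Output lines collected per session ID, independent of any client connection.
    private final Map<String, List<String>> linesBySession = new ConcurrentHashMap<>();

    /** Called by the command-execution thread for each line of output. */
    public void append(String sessionId, String line) {
        List<String> lines = linesBySession.computeIfAbsent(sessionId, id -> new ArrayList<>());
        synchronized (lines) {
            lines.add(line);
        }
    }

    /** Returns a copy of the stored lines from index 'from' onward. */
    public List<String> readFrom(String sessionId, int from) {
        List<String> lines = linesBySession.get(sessionId);
        if (lines == null) {
            return new ArrayList<>();
        }
        synchronized (lines) {
            int start = Math.max(from, 0);
            if (start >= lines.size()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(lines.subList(start, lines.size()));
        }
    }
}
```

Because the command thread writes each line once to the store, not once per connected client, the output is saved a single time regardless of how many clients read it. A reconnecting client sends its session ID and the number of lines it has already received, and gets the rest.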
