How to catch a lost connection event in a Netty channel handler - Java

I'm working on an old app that is using Netty to connect to a couple of remote TCP endpoints.
The app contains an implementation of IdleStateAwareChannelHandler and overrides several methods provided by it and SimpleChannelHandler (channelConnected, channelIdle, messageReceived, exceptionCaught, channelClosed).
This implementation is not able to cope with the scenario where the application server loses its connection to the remote server while my application is running.
I had hoped that providing my own implementation of the channelDisconnected() method would allow me to react to connection loss, but in practice I'm seeing something different:
I simulate connection loss by removing my application server from my network, thus cutting it off from both incoming and outgoing traffic
I leave it isolated for 5-10 min and observe the logs
Then I bring the application server back onto the network
Only once I have restored the application server to the network do I start seeing my debug logs from the exceptionCaught and channelDisconnected methods
While the machine is isolated, I see from the logs that the channelIdle method is being invoked regularly
Question: Is it possible to detect and react to connection loss in my channel handler?
Additional Info:
Netty version: 3.2.7.Final
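Since channelIdle keeps firing while the peer is unreachable, one common approach is to treat a long read-idle period as a lost connection and close the channel yourself: TCP will not report a dead peer until something is written, so the local close() is what makes channelDisconnected and channelClosed fire. A minimal sketch against the Netty 3.x API (the handler name and the reconnect detail are illustrative, not taken from the app):

    import org.jboss.netty.channel.ChannelHandlerContext;
    import org.jboss.netty.channel.ChannelStateEvent;
    import org.jboss.netty.handler.timeout.IdleState;
    import org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler;
    import org.jboss.netty.handler.timeout.IdleStateEvent;

    // Assumes an IdleStateHandler is already in the pipeline, which must be the
    // case here since channelIdle is being invoked while the server is isolated.
    public class ConnectionLossAwareHandler extends IdleStateAwareChannelHandler {

        @Override
        public void channelIdle(ChannelHandlerContext ctx, IdleStateEvent e) throws Exception {
            if (e.getState() == IdleState.READER_IDLE) {
                // Nothing has been read for the configured period: assume the
                // connection is dead and close it, which triggers the
                // channelDisconnected/channelClosed callbacks locally.
                e.getChannel().close();
            }
        }

        @Override
        public void channelClosed(ChannelHandlerContext ctx, ChannelStateEvent e) throws Exception {
            // React to the loss here, e.g. schedule a reconnect attempt via a Timer.
            super.channelClosed(ctx, e);
        }
    }

An application-level heartbeat (write a small keep-alive on WRITER_IDLE and require a reply) makes the detection faster and more reliable than waiting for read-idle alone.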

Related

App with event logger on port 8080 listening to calls from an API on port 8090 in Spring Boot

I'm trying to create an app with a notification service that fires whenever a call is made on the API.
Is it possible for me to create a logger on port 8080 so that, when the app is run on the server, it listens to the API running on another server?
Both applications are run on local machine for testing purposes using Docker.
So far I've been reading https://www.baeldung.com/spring-boot-logging in order to implement it but I'm having problems with understanding the path mapping.
Any ideas?
First let's name the two applications:
API - the API service that you want to monitor
Monitor - the app that wants to see what calls are made to the API
There are several ways to achieve this.
a) Open up a socket on Monitor for inbound traffic. Communicate the IP address and socket port manually to the API server, have it open a connection to the Monitor, and send some packet of data down this "pipe". This is the lowest-level approach: simple, but very fragile, as you have to coordinate the starting of the services and decide on a "protocol" for how the applications exchange data.
b) REST: Create a RESTful controller on the Monitor app that accepts a POST. Communicate the IP address and port manually to the API server, and have it initiate a POST request to the Monitor app when needed. This is more robust, but still suffers from needing careful starting of the servers (see the sketch after this list).
c) Message queue: Install a message queue system like RabbitMQ or ActiveMQ (available in Docker containers). The API server publishes a message to a queue, and Monitor subscribes to the queue. Much more robust; it still requires each application to be told the address of the MQ server, but now you can stop/start the two applications in any order.
d) Logging: The Baeldung article you linked is a good introduction to Java logging. Most use cases log to a local file on the local server. There are some logging backends that send logs to remote places (I don't think that article covers them), and there are ways of adding your own custom receiver of this log traffic. With this option the API side uses ordinary logging code with no knowledge of the downstream consumption of the logs, but your Monitor app would need to integrate tightly with a particular logging system.
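For option b), a minimal Spring Boot sketch (the class names, the /notifications path and the hard-coded Monitor address are illustrative only, not part of the question):

    // Monitor application (port 8080): accepts notifications about API calls.
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class NotificationController {

        private static final Logger log = LoggerFactory.getLogger(NotificationController.class);

        @PostMapping("/notifications")
        public ResponseEntity<Void> receive(@RequestBody String event) {
            log.info("API call observed: {}", event);
            return ResponseEntity.ok().build();
        }
    }

    // API application (port 8090): notifies the Monitor whenever it handles a call.
    import org.springframework.web.client.RestClientException;
    import org.springframework.web.client.RestTemplate;

    public class MonitorNotifier {

        private final RestTemplate restTemplate = new RestTemplate();

        public void notifyMonitor(String event) {
            try {
                restTemplate.postForEntity("http://localhost:8080/notifications", event, Void.class);
            } catch (RestClientException e) {
                // The Monitor being down must not break the API itself.
            }
        }
    }

Option c) replaces that direct POST with a message on a queue, which removes the need for the API to know the Monitor's address.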

Application resilience impact using Logstash/Graylog log appender

I have a question about the GELF module (http://logging.paluch.biz/), in particular about what happens when the Graylog server is unavailable for some reason.
Will log4j cache the logs somewhere and send them once the connection to Graylog is recovered?
Will an application using this module stop working while there is an issue with the Graylog server?
Thanks.
GELF appenders are online appenders without a cache. They connect directly to a remote service and submit log events as your application produces them.
If the remote service is down, log events get lost. There are a few options with different impacts:
TCP: TCP comes with transport reliability and requires a connection. If the remote service becomes slow or unresponsive, your application gets affected as soon as the I/O buffers are saturated (logstash-gelf uses NIO in a non-blocking way as long as data can still be flushed). If the TCP connection drops, you will run into connection timeouts when the remote side is unreachable, or connection-refused errors when the remote port is closed. In any case you get reliability, but it can affect your application's performance.
UDP: UDP has no notion of a connection; it is used for fire-and-forget communication. If the remote side becomes unhealthy, your application is usually not affected, but you will encounter log event loss.
Redis: You can use Redis as an intermediate buffer if your Graylog instance is known to fail or to be taken down for maintenance. Once Graylog is available again, it should catch up, and you prevent log event loss to some degree. If your Redis service becomes unhealthy, see the TCP point above.
HTTP: HTTP is another option that gives you a degree of flexibility. You can put your Graylog servers behind a load-balancer to improve availability and reduce the risk of failure. Log event loss is still possible.
If you want to ensure log continuity and reduce the probability of log event loss, write logs to disk. That is still no 100% guarantee against loss (disk failure, disk full), but it decouples the application from the remote service and so protects its performance. The log file (ideally in some JSON-based format) can then be parsed and submitted to Graylog by a shipper that maintains a read offset, which lets it recover from a remote outage.
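A rough sketch of that last approach (file names, host and port are invented here; in practice a ready-made shipper such as Filebeat or Fluentd is usually a better choice than hand-rolling this), assuming one JSON/GELF document per line and a Graylog GELF TCP input, which expects null-byte-delimited messages:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.RandomAccessFile;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class FileToGelfShipper {

        private static final Path LOG_FILE = Paths.get("app-log.json");      // one GELF/JSON document per line
        private static final Path OFFSET_FILE = Paths.get("app-log.offset"); // last successfully shipped position

        public static void main(String[] args) throws IOException {
            long offset = readOffset();
            try (RandomAccessFile log = new RandomAccessFile(LOG_FILE.toFile(), "r");
                 Socket socket = new Socket("graylog.example.com", 12201);
                 OutputStream out = socket.getOutputStream()) {

                log.seek(offset);
                String line;
                while ((line = log.readLine()) != null) {
                    out.write(line.getBytes(StandardCharsets.UTF_8));
                    out.write(0); // GELF TCP inputs expect a null byte after each message
                    writeOffset(log.getFilePointer()); // persist only after a successful send
                }
            }
            // If a send fails, the offset still points at the unsent line, so the
            // next run resumes exactly where the outage happened.
        }

        private static long readOffset() throws IOException {
            return Files.exists(OFFSET_FILE)
                    ? Long.parseLong(Files.readAllLines(OFFSET_FILE).get(0).trim())
                    : 0L;
        }

        private static void writeOffset(long offset) throws IOException {
            Files.write(OFFSET_FILE, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        }
    }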

How could a server check the availability of a client?

I have a classic HTTP client/server application where the server serves data to the clients on demand, but also performs call-backs to the list of client addresses it holds. My two questions are:
1- How would the server know if a client is down (the client did not disconnect but the connection got suddenly interrupted) ?
2- Is there a way to know from the server-side if the process at client-side listening on the call-back port is still up (i.e. client call-back socket is still open) ?
1- How would the server know if a client is down (the client did not disconnect but the connection got suddenly interrupted) ?
Option #1: direct communication
The client tells the server "I'm alive" at a periodic interval. You could make your client ping your server at a configurable interval, and if the server does not receive the signal for a certain time, it marks the client as down (see the sketch at the end of this answer). The client could even send more information (e.g. its status) in each heartbeat if necessary; this is also the approach used in many distributed systems (e.g. Hadoop/HBase).
Option #2: distributed coordination service
You could treat all clients connected to a server as a group and use a third-party distributed coordination service like ZooKeeper to handle the membership management. Each client registers itself with ZooKeeper as a new member of the group right after booting up and leaves the group when it goes down; ZooKeeper notifies the server whenever the membership changes.
2- Is there a way to know from the server-side if the process at client-side listening on the call-back port is still up (i.e. client call-back socket is still open) ?
I think this can only be done in the way Option #1 describes. Either the clients tell the server "my callback port is OK" at a fixed interval, or the server asks the clients "is your callback port OK?" and waits for their response at a fixed interval.
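A minimal sketch of the bookkeeping behind Option #1 (the interval, timeout and the transport that actually delivers the heartbeat are illustrative choices, not something given in the question):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class HeartbeatRegistry {

        private static final long TIMEOUT_MS = 30_000; // mark a client as down after 30s of silence

        private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

        /** Server side: called whenever a heartbeat message arrives from a client. */
        public void onHeartbeat(String clientId) {
            lastSeen.put(clientId, System.currentTimeMillis());
        }

        /** Server side: liveness check, e.g. before attempting a call-back. */
        public boolean isAlive(String clientId) {
            Long seen = lastSeen.get(clientId);
            return seen != null && System.currentTimeMillis() - seen < TIMEOUT_MS;
        }

        /** Client side: schedule the periodic "I'm alive" message every 10 seconds. */
        public static ScheduledExecutorService startHeartbeats(Runnable sendHeartbeat) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(sendHeartbeat, 0, 10, TimeUnit.SECONDS);
            return scheduler;
        }
    }

If the heartbeat message also reports "my callback port is OK", the same registry answers question 2.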
You would have to establish some sort of protocol; simply put, the server keeps track of "messages" that it has tried to send to clients.
If a send is acknowledged, fine; if not, the server might do a limited number of retries, then regard that client as "gone" and drop any further messages for it.
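Sketched very roughly (sendAndAwaitAck stands in for whatever request/acknowledgement exchange your protocol defines; it is not a real API):

    public class RetryingDispatcher {

        private static final int MAX_RETRIES = 3; // illustrative limit

        /** Hypothetical transport abstraction, only here so the sketch compiles. */
        public interface AckTransport {
            boolean sendAndAwaitAck(String clientId, String message);
        }

        public boolean deliver(String clientId, String message, AckTransport transport) {
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                if (transport.sendAndAwaitAck(clientId, message)) {
                    return true;  // acknowledged: the client is alive
                }
            }
            return false;         // regard the client as "gone" and drop further messages for it
        }
    }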
1- How would the server know if a client is down (the client did not disconnect but the connection got suddenly interrupted) ?
A write to the client will fail.
2- Is there a way to know from the server-side if the process at client-side listening on the call-back port is still up (i.e. client call-back socket is still open)?
A write to the client will fail.
The write won't necessarily fail immediately, due to TCP buffering, but the write will eventually provoke retries and retry timeouts that will cause a subsequent read or write to fail.
In Java the failure will manifest itself as an IOException: connection reset.
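In code the detection is simply an exception handler around the write; the class and method names below are made up for illustration:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.Socket;

    public class CallbackSender {

        /**
         * Returns false once TCP has given up on the peer. Because of buffering,
         * the first write after the interruption may still appear to succeed; the
         * failure surfaces on a subsequent write or read as an IOException.
         */
        public boolean trySend(Socket clientSocket, byte[] payload) {
            try {
                OutputStream out = clientSocket.getOutputStream();
                out.write(payload);
                out.flush();
                return true;
            } catch (IOException e) {   // e.g. "Connection reset" or "Broken pipe"
                return false;           // drop the client from the call-back list
            }
        }
    }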

MQTT broker connection management

I'm using Paho to communicate with an MQTT broker, and all the examples I found (like this one) do these 3 steps when performing an action (publish or subscribe):
connect to the broker
do action
disconnect
My question is: are there any drawbacks to holding a connection open for the whole life of the application instead of opening/closing it for each action? Isn't it faster, since it removes the time spent opening the connection?
No, holding a connection open for the lifetime of the application is a fully expected use case; in fact it's the only real way you'd be able to subscribe to a topic and receive messages when they are published.
The protocol has built in ping messages to ensure the broker knows the client is still connected.
The examples tend to be relatively trivial but want to show the full life cycle of the client, which is why they connect, do something, and then disconnect.
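For reference, a long-lived Paho client might be set up roughly like this (broker URL, client id and topic are placeholders; setAutomaticReconnect requires Paho 1.1 or newer):

    import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
    import org.eclipse.paho.client.mqttv3.MqttCallbackExtended;
    import org.eclipse.paho.client.mqttv3.MqttClient;
    import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
    import org.eclipse.paho.client.mqttv3.MqttException;
    import org.eclipse.paho.client.mqttv3.MqttMessage;

    public class LongLivedMqttClient {

        public static void main(String[] args) throws MqttException {
            MqttClient client = new MqttClient("tcp://broker.example.com:1883", "example-client");

            MqttConnectOptions options = new MqttConnectOptions();
            options.setKeepAliveInterval(30);     // seconds between protocol-level pings when idle
            options.setAutomaticReconnect(true);  // let Paho re-establish the connection if it drops
            options.setCleanSession(false);       // keep subscriptions across reconnects

            client.setCallback(new MqttCallbackExtended() {
                @Override public void connectComplete(boolean reconnect, String serverURI) {
                    // re-subscribe here if clean sessions are used
                }
                @Override public void connectionLost(Throwable cause) { }
                @Override public void messageArrived(String topic, MqttMessage message) {
                    System.out.println(topic + ": " + new String(message.getPayload()));
                }
                @Override public void deliveryComplete(IMqttDeliveryToken token) { }
            });

            client.connect(options);
            client.subscribe("sensors/#");
            // The client now stays connected and receives messages until the application
            // shuts down, at which point client.disconnect() should be called once.
        }
    }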

Elasticsearch unclosed client. Live threads after Tomcat shutdown. Memory usage impact?

I am using Elasticsearch 1.5.1 and Tomcat 7. The web application creates a TCP client instance as a singleton during server startup through the Spring Framework.
Just noticed that I failed to close the client during server shutdown.
Through analysis with various tools such as VisualVM, JConsole and MAT in Eclipse, it is evident that threads created by the Elasticsearch client are still live even after the server (Tomcat) shuts down.
Note: after introducing client.close() via a context listener's destroy method, the threads are shut down gracefully.
But my queries here are:
How do I check the memory occupied by these live threads?
What is the memory-leak impact of these threads?
We have had a few "OutOfMemoryError: PermGen space" errors in production. This might be a reason, but I would still like to measure it and provide stats for it.
Any suggestions/help please.
Typically clients run in a different process than the services they communicate with. For example, I can open a web page in a web browser, and then shutdown the webserver, and the client will remain open.
This has to do with the underlying design choices of TCP/IP. Glossing over the details, in most cases a client only detects that its server is gone during the next request to the server. Generally speaking, it does not continually poll the server to see if it is alive, nor does the server generally send a "please disconnect" message on shutting down.
The reason that clients don't generally poll servers is because it allows the server to handle more clients. With a polling approach, the server is limited by the number of clients running, but without a polling approach, it is limited by the number of clients actively communicating. This allows it to support more clients because many of the running clients aren't actively communicating.
The reason that servers typically don't send an "I'm shutting down" message is that many times the server goes down uncontrollably (power outage, operating system crash, fire, short circuit, etc.). This means that a protocol which requires such a message would leave the clients in a corrupt state whenever the server goes down in an uncontrolled manner.
So losing a connection is really a function of a failed request to the server. The client will still typically be running until it makes the next attempt to do something.
Likewise, opening a connection to a server often does nothing most of the time. To validate that you really have a working connection to a server, you must ask it for some data and get a reply. Most protocols do this automatically to simplify the logic; but if you ever write your own service and you don't ask for data from the server, then even if the API says you have a good "connection", you might not. The API can report a good "connection" when you merely have everything configured successfully on your own machine. To really know that it works 100% with the other machine, you need to ask for data (and get it).
Finally, servers sometimes lose their clients, but because they don't waste bandwidth chattering with clients just to see if they are there, the servers will often put a "timeout" on the client connection. Basically, if the server doesn't hear from the client in 10 minutes (or whatever value is configured), it discards the cached connection information for the client (recreating it as necessary if the client comes back).
From your description it is not clear which of the scenarios you might be seeing, but hopefully this general knowledge will help you understand why after closing one side of a connection, the other side of a connection might still think it is open for a while.
There are ways to configure the network connection to report closures more promptly, but I would avoid them unless you are willing to spend a lot of your network bandwidth on keep-alive messages and accept that your servers will not respond as quickly as they otherwise could.
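For completeness, the context-listener shutdown mentioned in the question might look roughly like this (a sketch, assuming the Elasticsearch Client is a Spring-managed singleton; with XML configuration, destroy-method="close" on the bean definition achieves the same thing):

    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;
    import javax.servlet.annotation.WebListener;

    import org.elasticsearch.client.Client;
    import org.springframework.web.context.support.WebApplicationContextUtils;

    @WebListener
    public class ElasticsearchShutdownListener implements ServletContextListener {

        @Override
        public void contextInitialized(ServletContextEvent sce) {
            // nothing to do at startup; the client is created by Spring
        }

        @Override
        public void contextDestroyed(ServletContextEvent sce) {
            Client client = WebApplicationContextUtils
                    .getRequiredWebApplicationContext(sce.getServletContext())
                    .getBean(Client.class);
            client.close(); // stops the transport threads so Tomcat can shut down cleanly
        }
    }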
