Assume a distributed communication system where client and server communicate via a stateless channel.
The client sends requests to the server; the server processes them and keeps internal records for each client.
The server sends notifications back to the clients as various events occur in the system, as needed.
The notification mechanism depends on the internal records.
My question is: what is the standard approach in distributed computing for handling client failures?
That is, in this context, assume that the client process crashes or simply restarts.
The server still has the records for the client, but now client and server are out of sync.
As a result, the client will get notifications based on records created before the restart. This is undesirable.
What is a standardized way to detect client failures, e.g. that a client has restarted and its previous records must be erased?
I thought of making periodic callbacks to clients and erasing a client's records if it is not reachable, but I am not sure this is a good idea. [EDIT] I thought of callbacks because the events sent back to the client can arrive at very large intervals, so a client failure would otherwise not be noticed for a long time.
Can anyone help on this? The context of my application domain is web services.
Thank you!
The standard approach varies from system to system depending on the architecture and domain. How does the server find out that the client is down? I don't think you need callbacks, since you already send notifications and can detect that the client is unreachable. For example:
1. send a notification to the client;
2. if successful, go to step 1;
3. otherwise, erase all queued notifications for the client and set a flag to stop collecting events for that client (see the sketch after these lists).
When a client connects again:
1. unset the flag;
2. start sending notifications again.
Or even a simpler approach:
erase the notification queue for the client when it connects before initializing the conversation;
run a low-priority thread that erases all notifications older than X, to clean up notifications for clients that will never come back.
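A minimal sketch of the first approach in Java; `Transport.send()` and the per-client queue are illustrative names, not from any particular framework:

```java
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative per-client state: a notification queue plus a "suspended" flag.
class ClientChannel {
    final Queue<String> pending = new ConcurrentLinkedQueue<>();
    volatile boolean suspended = false;          // set when the client looks dead

    void drain(Transport transport) {
        String notification;
        while ((notification = pending.peek()) != null) {
            try {
                transport.send(notification);    // hypothetical send, throws on failure
                pending.poll();                  // success: drop it and continue
            } catch (IOException e) {
                pending.clear();                 // client unreachable: erase its queue
                suspended = true;                // stop collecting events for it
                return;
            }
        }
    }

    void onReconnect() {                         // client connected again
        suspended = false;                       // resume collecting and sending
    }
}

interface Transport {                            // stands in for your real channel
    void send(String notification) throws IOException;
}
```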
Update after the original author's comments
It strongly depends on how things are organized in your system. Assuming:
The server starts a thread (let's call it an "agent") to serve each client, one thread per client.
The agent exits when the client shuts down the session properly or goes down.
There is a private record set for each client (not shared among agents/clients).
There is a shared list of current clients, used by another component (not an ordinary agent; let's call it the "dispatcher") to distribute records to clients.
Solution:
1. The server starts an agent and registers the newly connected client in the list of clients. The dispatcher gets notified that a new client has arrived.
2. The agent consumes the records as long as the client is connected. On the client's shutdown and/or failure, the agent unregisters the client and cleans up the record set (a rough sketch follows below).
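Under those assumptions, the bookkeeping might look roughly like this; `ClientConnection`, `RecordSet`, and the registry are hypothetical names, just to show the register/consume/clean-up flow:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface ClientConnection { boolean isOpen(); }         // hypothetical client session

class RecordSet {                                         // private per-client records
    void consumeNext(ClientConnection c) { /* deliver the next record (blocking) */ }
    void clear() { /* drop everything kept for this client */ }
}

// Shared list of current clients; the dispatcher reads it to distribute records.
class ClientRegistry {
    final Map<String, RecordSet> clients = new ConcurrentHashMap<>();
}

class Agent implements Runnable {
    private final String clientId;
    private final ClientConnection connection;
    private final ClientRegistry registry;

    Agent(String clientId, ClientConnection connection, ClientRegistry registry) {
        this.clientId = clientId;
        this.connection = connection;
        this.registry = registry;
    }

    @Override
    public void run() {
        RecordSet records = new RecordSet();
        registry.clients.put(clientId, records);   // step 1: register; the dispatcher sees it
        try {
            while (connection.isOpen()) {
                records.consumeNext(connection);   // step 2: consume records while connected
            }
        } finally {
            registry.clients.remove(clientId);     // shutdown/failure: unregister the client
            records.clear();                       // and clean up its record set
        }
    }
}
```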
If things in your system aren't organized in the way described above, please provide some details.
Related
I have a classic HTTP client/server application where the server serves data to the clients on demand, but also performs some kind of callbacks to the list of client addresses it has. My two questions are:
1- How would the server know if a client is down (the client did not disconnect but the connection got suddenly interrupted) ?
2- Is there a way to know from the server-side if the process at client-side listening on the call-back port is still up (i.e. client call-back socket is still open) ?
1- How would the server know if a client is down (the client did not disconnect but the connection got suddenly interrupted) ?
Option #1: direct communication
The client tells the server "I'm alive" at a periodic interval. You could make your client ping your server at a configurable interval, and if the server does not receive the signal for a certain time, it marks the client as down. The client could even include more info (e.g. its status) in each heartbeat if necessary; this is also the approach used in many distributed systems (e.g. Hadoop/HBase).
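A minimal sketch of the client side, assuming a hypothetical `sendHeartbeat()` that performs the actual request to the server (the interval and names are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class HeartbeatSender {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Ping the server every `intervalSeconds`; the server marks this client as down
    // if no ping arrives within its configured timeout.
    void start(String clientId, long intervalSeconds) {
        scheduler.scheduleAtFixedRate(
                () -> sendHeartbeat(clientId),   // hypothetical call to the server
                0, intervalSeconds, TimeUnit.SECONDS);
    }

    private void sendHeartbeat(String clientId) {
        // In a real system this would be an HTTP/RPC request carrying the client id
        // and, optionally, extra status information.
        System.out.println("heartbeat from " + clientId);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```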
Option #2: distributed coordination service
You could treat all clients connected to a server as a group and use a 3rd-party distributed coordination service like Zookeeper to facilitate the membership management. A client registers itself with Zookeeper as a new member of the group right after booting up and leaves the group if it goes down. Zookeeper notifies the server whenever the membership changes.
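With the standard ZooKeeper Java client this could look roughly as follows; the `/clients` group path and the client id are assumptions made for the example (the parent `/clients` node is assumed to exist already):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.util.List;

class GroupMembershipSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> { });

        // Client side: register as an ephemeral member of the group right after boot.
        // The node disappears automatically when this client's session dies.
        zk.create("/clients/client-42", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Server side: read the membership and set a watch; ZooKeeper fires the
        // watcher whenever a member joins or leaves, so the server can re-read it.
        List<String> members = zk.getChildren("/clients",
                event -> System.out.println("membership changed: " + event));
        System.out.println("current members: " + members);
    }
}
```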
2- Is there a way to know from the server-side if the process at client-side listening on the call-back port is still up (i.e. client call-back socket is still open) ?
I think this can only be done in the way Option #1 above describes. Either the clients tell the server "My callback port is OK" at a fixed interval, or the server asks the clients "Is your callback port OK?" and waits for the response at a fixed interval.
You would have to establish some sort of protocol; simply put: the server keeps track of "messages" that it tried to send to clients.
If that "send" is acknowledged, fine; if not, the server might do a limited number of retries, then regard that client as "gone" and drop any further messages for that client.
1- How would the server know if a client is down (the client did not disconnect but the connection got suddenly interrupted) ?
A write to the client will fail.
2- Is there a way to know from the server-side if the process at client-side listening on the call-back port is still up (i.e. client call-back socket is still open)?
A write to the client will fail.
The write won't necessarily fail immediately, due to TCP buffering, but the write will eventually provoke retries and retry timeouts that will cause a subsequent read or write to fail.
In Java the failure will manifest itself as an IOException: connection reset.
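For example, a server that writes notifications over a plain socket would typically detect the dead client like this (a sketch; the surrounding bookkeeping is up to you):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

class CallbackWriter {
    // Returns false when the client is considered gone. The first write after the
    // client dies may still "succeed" into the TCP buffers; a later one fails with
    // an IOException such as "connection reset".
    static boolean notifyClient(Socket clientSocket, byte[] message) {
        try {
            OutputStream out = clientSocket.getOutputStream();
            out.write(message);
            out.flush();
            return true;
        } catch (IOException e) {   // e.g. java.net.SocketException: Connection reset
            return false;           // drop the client and clean up its state
        }
    }
}
```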
I am using Elasticsearch 1.5.1 and Tomcat 7. The web application creates a TCP client instance as a singleton during server startup through the Spring Framework.
I just noticed that I failed to close the client during server shutdown.
Through analysis with various tools such as VisualVM, JConsole, and MAT in Eclipse, it is evident that the threads created by the Elasticsearch client are still alive even after server (Tomcat) shutdown.
Note: after introducing client.close() in a context listener's destroy method, the threads are shut down gracefully.
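For reference, the shutdown hook mentioned above can be a plain ServletContextListener; this is a sketch only, and `ClientHolder` is a made-up accessor for the singleton client (in practice it could be a Spring bean lookup):

```java
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.elasticsearch.client.Client;

// Made-up holder for the singleton client created at startup.
class ClientHolder {
    private static volatile Client client;
    static void set(Client c) { client = c; }
    static Client get() { return client; }
}

@WebListener
public class ElasticsearchShutdownListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Nothing to do here; the client is created elsewhere (e.g. by Spring at startup).
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        Client client = ClientHolder.get();
        if (client != null) {
            client.close();   // releases the client's internal threads on Tomcat shutdown
        }
    }
}
```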
But my queries here are:
How can I check the memory occupied by these live threads?
What is the memory-leak impact of these threads?
We have had a few OutOfMemoryError: PermGen space errors in production. This might be one reason, but I would still like to measure it and provide stats for it.
Any suggestions/help please.
Typically clients run in a different process than the services they communicate with. For example, I can open a web page in a web browser, then shut down the web server, and the client will remain open.
This has to do with the underlying design choices of TCP/IP. Glossing over the details, in most cases a client only detects that its server is gone during the next request to the server. (Again, generally speaking) it does not continually poll the server to see whether it is alive, nor does the server generally send a "please disconnect" message on shutting down.
The reason that clients don't generally poll servers is because it allows the server to handle more clients. With a polling approach, the server is limited by the number of clients running, but without a polling approach, it is limited by the number of clients actively communicating. This allows it to support more clients because many of the running clients aren't actively communicating.
The reason that servers typically don't send an "I'm shutting down" message is that many times the server goes down uncontrollably (power outage, operating system crash, fire, short circuit, etc.). This means that a protocol which requires such a message will leave the clients in a corrupt state if the server goes down in an uncontrolled manner.
So losing a connection is really a function of a failed request to the server. The client will still typically be running until it makes the next attempt to do something.
Likewise, opening a connection to a server often does nothing by itself. To validate that you really have a working connection to a server, you must ask it for some data and get a reply. Most protocols do this automatically to simplify the logic; but if you ever write your own service and don't ask for data from the server, then even if the API says you have a good "connection", you might not. The API can report a good "connection" when everything is configured successfully on your machine; to really know that it works 100% with the other machine, you need to ask for data (and get it).
Finally servers sometimes lose their clients, but because they don't waste bandwidth chattering with clients just to see if they are there, often the servers will put a "timeout" on the client connection. Basically if the server doesn't hear from the client in 10 minutes (or the configured value) then it closes the cached connection information for the client (recreating the connection information as necessary if the client comes back).
From your description it is not clear which of the scenarios you might be seeing, but hopefully this general knowledge will help you understand why after closing one side of a connection, the other side of a connection might still think it is open for a while.
There are ways to configure the network connection to report closures more immediately, but I would avoid using them, unless you are willing to lose a lot of your network bandwidth to keep-alive messages and don't want your servers to respond as quickly as they could.
I'm working on a multi-client application that will connect with a server in the LAN.
Every client can send a command that changes the status of the server.
This 'ServerStatus', as I will call it, is an object with some values.
Now if the ServerStatus changes, all clients should know about it immediately.
My idea was to work like this:
The server sends a multicast with the versionNumber of the ServerStatus to all listening clients every second, so a new client that joins the multicast group will see whether its versionNumber is the same.
If not, the client will ask the current version of ServerStatus via UDP.
When a client sends a command that changes the ServerStatus,
the server will send his current (and new) ServerStatus to the same multicast group,
while in another thread, the versionNumber of the ServerStatus is still shared every second.
Do you guys think this is a good way to deal with this?
Or will this cause too many problems, etc.?
What happens if the new ServerStatus fails to reach the clients? In my opinion you should not use UDP when sending the new status to the clients, but a reliable protocol instead. So if you intend to use multicast for this, you will have to use a reliable multicast protocol.
On the other hand, you may prefer client synchronization with the server:
Every time a client enters the network, it asks the server for its statusId (if it is not the same, the server sends it the ServerStatus), and the client also registers for status-change events. (TCP)
When leaving, the client could send a UNREGISTER message (UDP).
Each time the ServerStatus changes, the server sends the new ServerStatus to each registered client. On receiving the new ServerStatus, the client sends an ack-like message back to the server. (TCP) A rough sketch of this register/ack flow follows below.
If the ack was not received by the server, the client in question would be unregistered (because it would mean the client had left the network without unregistering - by error).
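A rough server-side sketch of the register/ack idea over TCP; the names, timeout, and one-line-of-text wire format are illustrative only:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StatusBroadcaster {
    // Clients that registered for status-change events, keyed by client id.
    private final Map<String, Socket> registered = new ConcurrentHashMap<>();

    void register(String clientId, Socket socket) { registered.put(clientId, socket); }

    // Push the new ServerStatus to every registered client and expect an ACK line;
    // clients that fail to answer are unregistered (they presumably left by error).
    void pushStatus(String serverStatus) {
        registered.forEach((clientId, socket) -> {
            try {
                socket.setSoTimeout(3000);   // don't wait forever for the ack
                PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                out.println("STATUS " + serverStatus);
                String reply = in.readLine();
                if (!"ACK".equals(reply)) {
                    throw new IOException("no ack from " + clientId);
                }
            } catch (IOException e) {
                registered.remove(clientId); // treat the client as gone
            }
        });
    }
}
```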
Hope this helps.
Basically, your idea sounds good to me.
I would suggest you dig more into the principles of "group communication" and look at frameworks such as jGroups; I know that JBoss Cache uses it to distribute data among its nodes.
Maybe, for reliability, clients should also query the server once every X seconds to check that they have the correct version number,
or at least do this when they start up / recover from a crash.
I am looking to build an instant messenger in Java.
Clients will connect to the server to log in.
They will start a conversation with one or more other clients.
They will then post messages to the server that will relay the messages to all the clients.
The client needs to be continually updated when users post messages or log in.
So the way I see it, the client needs to run a server itself in a separate thread so that the main server can send things to it. Otherwise the client would have to poll the main server every X seconds to get the latest updates. And that would need a separate thread anyway, as it would be used purely for getting updates, whereas the 'main' thread would be used when the client initiates actions such as posting messages, inviting others to conversations, etc.
So, any recommendations on how to write this instant messenger? Does it sound like a good idea to make the connection a 'two-way' connection where both the client and server act as servers? Or is polling a better option? Does anyone know how the IRC protocol does this?
There's no real advantage in having two connections unless they can be handled independently (for example, receiving/sending a file is usually done over a separate connection). A connection itself is already a two-way communication channel, so it can be used to both send and receive messages, events, etc. You don't need to poll the server, since the client can maintain a persistent connection and just wait for data to appear (optionally sending a periodic PING-like message to ensure the connection is alive).
IRC uses a single connection to server to exchange text commands. For example one of the main commands:
PRIVMSG <msgtarget> <message>
This command can be originated either by the client or by the server. The client sends PRIVMSG to indicate that it wants to deliver a message to one or more destinations (in IRC, either user(s) or channel(s)). The server's task here is to properly broadcast this message to the appropriate clients.
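To illustrate the single-connection model, a client can keep one socket open and dedicate a thread to reading whatever the server pushes. This is a sketch only; the host, port, and line-based protocol below are made up, not actual IRC framing:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

class ChatClientSketch {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket("chat.example.com", 6667);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

        // Reader thread: blocks on the same connection and handles whatever the
        // server sends (incoming messages, presence updates, PINGs, ...).
        Thread reader = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("server says: " + line);
                }
            } catch (IOException e) {
                System.out.println("connection lost: " + e.getMessage());
            }
        });
        reader.start();

        // Main thread: the same connection is used to send the client's own commands.
        out.println("PRIVMSG #room :hello everyone");
    }
}
```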
If you're using raw input/output streams then yes, this is a good way of doing it. You create one thread on the client side that acts in a similar fashion to the server thread: it waits for incoming updates and, when one arrives, updates the client. I wouldn't call it a server, though. So you'd ideally have two TCP/UDP connections: one for requests made by the client and one to notify the client of server changes.
In an enterprise environment this would probably be done through some kind of messaging framework such as Spring Integration, but dig deep enough and it will essentially work in a similar way to what you mentioned.
Do you need a fully custom protocol, or would it be sufficient to use XMPP? There are several open source libraries implementing XMPP.
http://xmpp.org/xmpp-software/libraries/
e.g. http://www.igniterealtime.org/projects/smack/
Personally, to develop an instant messaging service, I would use the WebSocket protocol instead of a plain Java socket, because a plain socket does not work well with the HTTP protocol, and moreover some network providers and firewalls block custom ports. If you build it on a plain socket, your service cannot be accessed by web clients.
Did you plan to develop the instant messaging service yourself? How about using other protocols such as Jabber?
I'm building a Client / Server app that has some very specific needs. There are 2 kinds of servers: the first kind provide most of the remote procedures and clients connect to these directly, while the second kind is a single server that should keep track of what users are active (clients) and how many servers of the first kind are active when a method is called.
The main thing is that the monitor should ONLY connect to the servers and not the clients directly. My first idea was to implement simple login/logout RMI methods called when a client connects/disconnects and to keep track of those in a list, but the main problem is when a client or server ends abnormally.
For example, if a client goes offline abruptly, the server should somehow be notified and update the list accordingly, while if a server goes down, all of the clients connected to it should be marked as inactive in the control server.
Any ideas of how to implement this functionality would be appreciated.
I would suggest implementing a "session" approach to the problem, where the servers and the clients send a "heartbeat" method call to the monitoring server every few minutes (maybe seconds or hours, depending on your needs). If the monitoring server doesn't receive a "heartbeat" from a server or client within a certain amount of time, then you consider it gone (terminated abnormally) and notify accordingly.
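On the monitoring server, one plausible way to do this (a sketch with made-up names and intervals) is to keep the last heartbeat time per node and periodically expire the silent ones:

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SessionMonitor {
    private final Map<String, Instant> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a server or client sends its periodic heartbeat.
    void onHeartbeat(String nodeId) {
        lastHeartbeat.put(nodeId, Instant.now());
    }

    // Periodically drop nodes that have been silent longer than the timeout
    // and notify whoever needs to know that they terminated abnormally.
    void startExpiryCheck(long timeoutSeconds) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            Instant cutoff = Instant.now().minusSeconds(timeoutSeconds);
            lastHeartbeat.forEach((nodeId, seen) -> {
                if (seen.isBefore(cutoff)) {
                    lastHeartbeat.remove(nodeId);
                    System.out.println(nodeId + " considered gone (no heartbeat)");
                }
            });
        }, timeoutSeconds, timeoutSeconds, TimeUnit.SECONDS);
    }
}
```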
Zookeeper may be something to look at. Have each clientserver register an ephemeral node for itself, and for each client that is connected to it. When the clientserver goes down, the ephemeral nodes will die. The monitor server just needs to watch zookeeper to see who is up and connected.
For detecting clients going down, you will need some kind of heartbeating so that the clientserver can detect when a client dies. If the client can talk to Zookeeper directly, then simply have the client register an ephemeral node in Zookeeper as well, and the clientserver can watch the client's ephemeral node and know when the client is down.
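For that last variant, a clientserver could watch the client's ephemeral node directly and react when it disappears. A minimal sketch with the standard ZooKeeper Java client; the paths and ids are assumptions, and the `/clients` parent node is assumed to exist:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class ClientWatchSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> { });

        // The client registers an ephemeral node for itself; it is deleted
        // automatically when the owning session dies.
        zk.create("/clients/client-42", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // The clientserver sets a one-shot watch on that node; NodeDeleted means
        // the client's session is gone and it can be treated as down.
        zk.exists("/clients/client-42", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                System.out.println("client-42 is down");
            }
        });
    }
}
```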