In the PLC4X project we use Netty for clients that connect to PLCs, which act as servers. Sometimes, through user error or a PLC error, connections are rejected rather than accepted. If we retry the connection immediately, several times in a row, we run into the error message Too many open files.
I try to clean up everything in my code, so I would assume that no file descriptors can leak:
try {
final NioEventLoopGroup workerGroup = new NioEventLoopGroup();
Bootstrap bootstrap = new Bootstrap();
bootstrap.group(workerGroup);
bootstrap.channel(NioSocketChannel.class);
bootstrap.option(ChannelOption.SO_KEEPALIVE, true);
bootstrap.option(ChannelOption.TCP_NODELAY, true);
// TODO we should use an explicit (configurable?) timeout here
// bootstrap.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 1000);
bootstrap.handler(channelHandler);
// Start the client.
final ChannelFuture f = bootstrap.connect(address, port);
f.addListener(new GenericFutureListener<Future<? super Void>>() {
@Override public void operationComplete(Future<? super Void> future) throws Exception {
if (!future.isSuccess()) {
logger.info("Unable to connect, shutting down worker thread.");
workerGroup.shutdownGracefully();
}
}
});
// Wait for sync
f.sync();
f.awaitUninterruptibly(); // jf: unsure if we need that
// Wait till the session is finished initializing.
return f.channel();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new PlcConnectionException("Error creating channel.", e);
} catch (Exception e) {
throw new PlcConnectionException("Error creating channel.", e);
}
From my understanding, the listener should always shut down the group and free all descriptors it uses.
But in reality, when running on macOS Catalina, I see that about 1% of the failures are not due to "rejection" but due to "Too many open files".
Is this a ulimit thing, since Netty (on macOS) simply needs a certain number of fds? Or am I leaking something?
Thanks for clarification!
I found the solution, more or less by myself.
There are two issues (probably even three) in the original implementation, none of which is really specific to macOS:
connect and addListener should be chained
workerGroup.shutdownGracefully() is triggered in another thread, so the main (calling) thread has already finished
nothing waits for the workerGroup to actually finish shutting down.
Taken together, this can apparently lead to situations where new groups are spawned faster than old groups are closed.
Thus, I changed the implementation to
try {
final NioEventLoopGroup workerGroup = new NioEventLoopGroup();
Bootstrap bootstrap = new Bootstrap();
bootstrap.group(workerGroup);
bootstrap.channel(NioSocketChannel.class);
bootstrap.option(ChannelOption.SO_KEEPALIVE, true);
bootstrap.option(ChannelOption.TCP_NODELAY, true);
// TODO we should use an explicit (configurable?) timeout here
// bootstrap.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 1000);
bootstrap.handler(channelHandler);
// Start the client.
logger.trace("Starting connection attempt on tcp layer to {}:{}", address.getHostAddress(), port);
final ChannelFuture f = bootstrap.connect(address, port);
// Wait for sync
try {
f.sync();
} catch (Exception e) {
// Shutdown worker group here and wait for it
logger.info("Unable to connect, shutting down worker thread.");
workerGroup.shutdownGracefully().awaitUninterruptibly();
logger.debug("Worker Group is shutdown successfully.");
throw new PlcConnectionException("Unable to Connect on TCP Layer to " + address.getHostAddress() + ":" + port, e);
}
// Wait till the session is finished initializing.
return f.channel();
}
catch (Exception e) {
throw new PlcConnectionException("Error creating channel.", e);
}
which addresses the issues above: the call now only returns once everything has been properly cleaned up.
My tests now show a constant number of open file descriptors.
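The "shut down and wait before returning" idea can also be shown without Netty. The following is only a minimal sketch under that assumption: a plain JDK ExecutorService stands in for the NioEventLoopGroup, and the class AwaitShutdownSketch and its methods are hypothetical names, not part of PLC4X or Netty.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in: the pool plays the role of the Netty worker group.
class AwaitShutdownSketch {
    final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Runs the connect action on the pool. On failure the pool is shut down
    // and this method only returns after its threads have terminated.
    boolean connect(Callable<Void> connectAction) {
        try {
            workers.submit(connectAction).get();
            return true; // success: the caller keeps using the pool
        } catch (ExecutionException | InterruptedException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            shutDownAndWait();
            return false;
        }
    }

    // Initiates shutdown and blocks until the pool has terminated (or 5 s pass).
    void shutDownAndWait() {
        workers.shutdown();
        try {
            if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
                workers.shutdownNow();
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Because connect() blocks until the pool has terminated, a caller that retries in a tight loop cannot create new pools faster than old ones are released.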
I have a server that can receive multiple requests at the same time.
In my server, I have to do some processing and wait for the response. This processing is done by an external library, so I don't know how long I will have to wait.
So the server looks like:
public class MyServer{
@Override
//method from the library
public void workonRequest(){
//---
response=[...]
}
public void listenRequest() {
new Thread(() -> {
while (true) {
try {
socket = server.accept();
ObjectInputStream input = new ObjectInputStream(socket.getInputStream());
ObjectOutputStream output = new ObjectOutputStream(socket.getOutputStream());
socket.setTcpNoDelay(true); //TODO : Not sure !
new Thread(() -> {
try {
handleRequest(input, output);
} catch (IOException e) {
throw new RuntimeException(e);
}
}).start();
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}).start();
}
And the handle request method is :
public void handleRequest(ObjectInputStream input, ObjectOutputStream output) throws IOException {
try {
while (true) {
//forward the request to the library
//work on it [means using the library and waiting]
// return response
}
}
}
The response object is the result that I want to return to the client.
How do I deal with the problem of waiting for the answer?
How can I make sure that there will be no problems when more than 2 clients send requests at the same time?
Thanks in advance
How to deal with the problem of waiting for the answer?
Using while(true) can cause problems because it blocks the thread, and opening a sub-thread and several streams per client makes things more complex. An easier way is reactive programming, which handles this kind of multi-threading issue for you; Quarkus and Spring both offer reactive/async solutions (see the references below). If you still want to manage your sockets from Java code, you can use Akka.
How can I make sure that there will be no problems when more than 2 clients send requests at the same time?
That can be done by not blocking the main thread. If you use a reactive and/or async approach, you will not have that problem.
Reference
https://quarkus.io/guides/getting-started-reactive
https://docs.spring.io/spring-framework/docs/current/reference/html/web-reactive.html
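As a rough illustration of the non-blocking idea without any framework, here is a minimal sketch that uses only the JDK's CompletableFuture and a bounded thread pool. The class AsyncRequestSketch and the method slowLibraryCall are hypothetical stand-ins for the server and the external library; the 100 ms sleep and the pool size of 4 are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncRequestSketch {

    // Hypothetical stand-in for the slow external library call.
    static String slowLibraryCall(String request) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response to " + request;
    }

    // Submits every request to a bounded pool so that no request waits for
    // another one to finish, then collects the responses in request order.
    static List<String> handleAll(List<String> requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String request : requests) {
                pending.add(CompletableFuture.supplyAsync(() -> slowLibraryCall(request), pool));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                responses.add(future.join());
            }
            return responses;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real server the accept loop would hand each request to supplyAsync and write the response in a thenAccept callback instead of joining, so the accepting thread never waits on the library.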
The program below acts as a TCP client and uses NIO to open a socket to a remote server:
private Selector itsSelector;
private SocketChannel itsChannel;
public boolean getConnection(Selector selector, String host, int port)
{
try
{
itsSelector = selector;
itsChannel = SocketChannel.open();
itsChannel.configureBlocking(false);
itsChannel.register(itsSelector, SelectionKey.OP_CONNECT);
itsChannel.connect(new InetSocketAddress(host, port));
if (itsChannel.isConnectionPending())
{
while (!itsChannel.finishConnect())
{
// waiting until connection is finished
}
}
itsChannel.register(itsSelector, SelectionKey.OP_WRITE);
return (itsChannel != null);
}
catch (IOException ex)
{
close();
if(ex instanceof ConnectException)
{
LOGGER.log(Level.WARNING, "The remoteserver cannot be reached");
}
}
}
public void close()
{
try
{
if (itsChannel != null)
{
itsChannel.close();
itsChannel.socket().close();
itsSelector.selectNow();
}
}
catch (IOException e)
{
LOGGER.log(Level.WARNING, "Connection cannot be closed");
}
}
This program runs on Red Hat Enterprise Linux Server release 6.2 (Santiago)
When a number of concurrent sockets are in the establishment phase, the file descriptor limit is reached and I see the exception below while trying to establish more socket connections.
java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
This happens only when the remote node is down; while it is up, all is fine.
When the remote TCP server is down, the exception below is thrown and is handled as an IOException in the above code
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
Is there any way to forcefully close the underlying file descriptor in this case?
Thanks in advance for all the help.
private Selector itsSelector;
I cannot see the point of this declaration. You can always get the selector the channel is registered with, if you need it, which you never do. Possibly you are leaking Selectors?
itsChannel.configureBlocking(false);
itsChannel.register(itsSelector, SelectionKey.OP_CONNECT);
Here you are registering for OP_CONNECT but never making the slightest use of the facility.
itsChannel.connect(new InetSocketAddress(host, port));
Here you are starting a pending connection.
if (itsChannel.isConnectionPending())
It is. You just started it. The test is pointless.
{
while (!itsChannel.finishConnect())
{
// waiting until connection is finished
}
}
This is just a complete waste of time and space. If you don't want to use the selector to detect when OP_CONNECT fires, you should call connect() before setting the channel to non-blocking, and get rid of this pointless test and loop.
itsChannel.register(itsSelector, SelectionKey.OP_WRITE);
return (itsChannel != null);
itsChannel cannot possibly be null at this point. The test is pointless. You would be better off allowing the IOExceptions that can arise to propagate out of this method, so that the caller can get some idea of the failure mode. That also places the onus on the caller to close on any exception, not just the ones you're catching here.
catch (IOException ex)
{
close();
if(ex instanceof ConnectException)
{
LOGGER.log(Level.WARNING, "The remoteserver cannot be reached");
}
}
See above. Remove all this. If you want to distinguish ConnectException from the other IOExceptions, catch it, separately. And you are forgetting to log anything that isn't a ConnectException.
public void close()
{
try
{
if (itsChannel != null)
{
itsChannel.close();
itsChannel.socket().close();
itsSelector.selectNow();
The second close() call is pointless, as the channel is already closed.
catch (IOException e)
{
LOGGER.log(Level.WARNING, "Connection cannot be closed");
}
I'm glad to see you finally logged an IOException, but you're not likely to get any here.
Don't write code like this.
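For comparison, the advice above (connect in blocking mode, drop the pointless tests, let IOExceptions propagate, and close on any failure) could look roughly like the following. This is only a sketch: the class name BlockingConnectSketch and the method open are hypothetical, and selector registration is left to the caller.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

class BlockingConnectSketch {

    // Connects while the channel is still in blocking mode, then switches it to
    // non-blocking so the caller can register it with a selector afterwards.
    // On any failure the channel is closed (releasing its file descriptor)
    // and the exception propagates to the caller.
    static SocketChannel open(String host, int port) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.connect(new InetSocketAddress(host, port));
            channel.configureBlocking(false);
            return channel;
        } catch (IOException | RuntimeException e) {
            channel.close();
            throw e;
        }
    }
}
```

The RuntimeException branch is there because connect() can throw unchecked exceptions such as UnresolvedAddressException, and the descriptor should be released in that case as well.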
I made a Java application which has a client side and a server side. Both sides communicate via sockets. This works well until my server application is killed by something and can't close or shut down the ServerSocket.
The client does not seem to notice the broken connection and just hangs while trying to read the next object.
I also tried sending a test object from the client every 5 seconds to detect that the server is offline, but that also does not work.
I might have to mention this only occurs when running the server app on Windows and the client on Linux (Ubuntu in VirtualBox). Windows-Windows works fine. Netstat even gives me an ESTABLISHED on Linux, although I already killed the server.
Client code:
requestSocket = new Socket("192.168.1.3", 1234);
out = new ObjectOutputStream(new CipherOutputStream(requestSocket.getOutputStream(), ec));
in = new ObjectInputStream(new CipherInputStream(requestSocket.getInputStream(), dc));
new Thread() {
public void run() {
while(true) {
try {
out.writeObject(obj);
out.flush();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("sent");
try {
Thread.sleep(5000);
} catch (InterruptedException e) {}
}
}
}.start();
Server code:
serverSocket = new ServerSocket(1234);
clientSocket = serverSocket.accept();
out = new ObjectOutputStream(new CipherOutputStream(clientSocket.getOutputStream(), ec));
in = new ObjectInputStream(new CipherInputStream(clientSocket.getInputStream(), dc));
//do-while-reading on the socket[...]
I read multiple threads which told me how to detect a lost connection on the server side, but found none for the client side or the answers did not work for me.
Set a read timeout on the socket, of suitable duration, enough to include all normal transfers, and catch SocketTimeoutException.
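A minimal sketch of that suggestion, using Socket.setSoTimeout from the JDK. The class name ReadTimeoutSketch, the method readWithTimeout, and the TIMED_OUT marker value are hypothetical choices made for this illustration; a suitable timeout depends on the application.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {
    // Hypothetical marker value returned when no data arrived in time.
    static final int TIMED_OUT = -2;

    // Reads one byte, waiting at most timeoutMillis. Returns the byte (0..255),
    // -1 at end of stream, or TIMED_OUT if the peer stayed silent for too long.
    static int readWithTimeout(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        try {
            return socket.getInputStream().read();
        } catch (SocketTimeoutException e) {
            // The socket itself is still usable; the caller decides whether to
            // treat the silence as a dead connection and close it.
            return TIMED_OUT;
        }
    }
}
```

The same setSoTimeout call applies to reads through an ObjectInputStream wrapped around the socket's input stream, since the timeout is a property of the underlying socket.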
The problem seemed to be the VM. When testing it on my Laptop with Manjaro Linux, everything worked as it should have in the beginning!
Thank you for your contributions anyway. :)
Here's what I know so far (please correct me):
In the RabbitMQ Java client, operations on a channel throw IOException when there is a general network failure (malformed data from broker, authentication failures, missed heartbeats).
Operations on a channel can also throw the ShutdownSignalException unchecked exception, typically an AlreadyClosedException when we tried to perform an action on the channel/connection after it has been shut down.
The shutdown process happens in the event of "network failure, internal failure or explicit local shutdown" (e.g. via channel.close() or connection.close()). The shutdown event propagates down the "topology", from Connection -> Channel -> Consumer, and when it reaches the Channel, the Consumer's handleShutdown() method gets called.
A user can also add a shutdown listener which is called after the shutdown process completes.
Here is what I'm missing:
Since an IOException indicates a network failure, does it also initiate a shutdown request?
How does using auto-recovery mode affect shutdown requests? Does it cause channel operations to block while it tries to reconnect to the channel, or will the ShutdownSignalException still be thrown?
Here is how I'm handling exceptions at the moment, is this a sensible approach?
My setup is that I'm polling a QueueingConsumer and dispatching tasks to a worker pool. The RabbitMQ client is encapsulated in MyRabbitMQWrapper here. When an exception occurs while polling the queue, I just gracefully shut down everything and restart the client. When an exception occurs in the worker, I also just log it and finish the worker.
My biggest worry (related to Question 1): Suppose an IOException occurs in the worker, then the task doesn't get acked. If the shutdown does not then occur, I now have an un-acked task that will be in limbo forever.
Pseudo-code:
class Main {
public static void main(String[] args) {
while(true) {
run();
//Easy way to restart the client, the connection has been
//closed so RabbitMQ will re-queue any un-acked tasks.
log.info("Shutdown occurred, restarting in 5 seconds");
Thread.sleep(5000);
}
}
public void run() {
MyRabbitMQWrapper rw = new MyRabbitMQWrapper("localhost");
try {
rw.connect();
while(!Thread.currentThread().isInterrupted()) {
try {
//Wait for a message on the QueueingConsumer
MyMessage t = rw.getNextMessage();
workerPool.submit(new MyTaskRunnable(rw, t));
} catch (InterruptedException | IOException | ShutdownSignalException e) {
//Handle all AMQP library exceptions by cleaning up and returning
log.warn("Shutting down", e);
workerPool.shutdown();
break;
}
}
} catch (IOException e) {
log.error("Could not connect to broker", e);
} finally {
try {
rw.close();
} catch(IOException e) {
log.info("Could not close connection");
}
}
}
}
class MyTaskRunnable implements Runnable {
....
public void run() {
doStuff();
try {
rw.ack(...);
} catch (IOException | ShutdownSignalException e) {
log.warn("Could not ack task");
}
}
}
Retry Connection in Netty
I am building a client socket system. The requirements are:
First attempt to connect to the remote server
When the first attempt fails, keep on trying until the server is online.
I would like to know whether there is such a feature in Netty, or how best I can solve that.
Thank you very much
This is the code snippet I am struggling with:
protected void connect() throws Exception {
this.bootstrap = new ClientBootstrap(new NioClientSocketChannelFactory(
Executors.newCachedThreadPool(),
Executors.newCachedThreadPool()));
// Configure the event pipeline factory.
bootstrap.setPipelineFactory(new SmpPipelineFactory());
bootstrap.setOption("writeBufferHighWaterMark", 10 * 64 * 1024);
bootstrap.setOption("sendBufferSize", 1048576);
bootstrap.setOption("receiveBufferSize", 1048576);
bootstrap.setOption("tcpNoDelay", true);
bootstrap.setOption("keepAlive", true);
// Make a new connection.
final ChannelFuture connectFuture = bootstrap
.connect(new InetSocketAddress(config.getRemoteAddr(), config
.getRemotePort()));
channel = connectFuture.getChannel();
connectFuture.addListener(new ChannelFutureListener() {
@Override
public void operationComplete(ChannelFuture future)
throws Exception {
if (connectFuture.isSuccess()) {
// Connection attempt succeeded:
// Begin to accept incoming traffic.
channel.setReadable(true);
} else {
// Close the connection if the connection attempt has
// failed.
channel.close();
logger.info("Unable to Connect to the Remote Socket server");
}
}
});
}
Assuming Netty 3.x, the simplest example would be:
// Configure the client.
ClientBootstrap bootstrap = new ClientBootstrap(
new NioClientSocketChannelFactory(
Executors.newCachedThreadPool(),
Executors.newCachedThreadPool()));
ChannelFuture future = null;
while (true)
{
future = bootstrap.connect(new InetSocketAddress("127.0.0.1", 80));
future.awaitUninterruptibly();
if (future.isSuccess())
{
break;
}
}
Obviously you'd want to have your own logic for the loop that sets a max number of tries, etc. Netty 4.x has a slightly different bootstrap but the logic is the same. This is also synchronous, blocking, and ignores InterruptedException; in a real application you might register a ChannelFutureListener with the Future and be notified when the Future completes.
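A bounded version of such a loop, independent of the Netty version, might look like the sketch below. It uses only the JDK; the class BoundedRetrySketch, the method retry, and its parameters are hypothetical, and the connect action is whatever blocking call actually establishes the connection.

```java
import java.util.concurrent.Callable;

class BoundedRetrySketch {

    // Calls connectAction up to maxAttempts times, sleeping delayMillis after
    // each failed attempt except the last. Returns the first successful result
    // or rethrows the last failure. Interruption aborts the loop immediately.
    static <T> T retry(Callable<T> connectAction, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connectAction.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

Like the loop above, this is synchronous and blocks the calling thread, so it should not be run on a Netty I/O thread.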
Added after the OP edited the question:
You have a ChannelFutureListener that is getting notified. If you want to then retry the connection you're going to have to either have that listener hold a reference to the bootstrap, or communicate back to your main thread that the connection attempt failed and have it retry the operation. If you have the listener do it (which is the simplest way) be aware that you need to limit the number of retries to prevent an infinite recursion - it's being executed in the context of the Netty worker thread. If you exhaust your retries, again, you'll need to communicate that back to your main thread; you could do that via a volatile variable, or the observer pattern could be used.
When dealing with async you really have to think concurrently. There's a number of ways to skin that particular cat.
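One possible shape for the listener-driven variant, sketched with plain JDK types rather than Netty's: a CompletableFuture reports the final outcome back to the caller, and a ScheduledExecutorService delays each retry so that no I/O thread has to sleep. All names here (AsyncRetrySketch, connectWithRetry, tryOnce) are hypothetical, and the connect supplier is assumed to return a future for one connection attempt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class AsyncRetrySketch {

    // Starts up to maxAttempts asynchronous connection attempts and returns a
    // future that completes with the first success or with the last failure.
    static <T> CompletableFuture<T> connectWithRetry(Supplier<CompletableFuture<T>> connect,
            int maxAttempts, long delayMillis, ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        tryOnce(connect, maxAttempts, delayMillis, scheduler, result);
        return result;
    }

    // One attempt; on failure it schedules the next attempt instead of recursing
    // directly, which bounds both the number of retries and the stack depth.
    private static <T> void tryOnce(Supplier<CompletableFuture<T>> connect, int attemptsLeft,
            long delayMillis, ScheduledExecutorService scheduler, CompletableFuture<T> result) {
        connect.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptsLeft <= 1) {
                result.completeExceptionally(error);
            } else {
                scheduler.schedule(
                        () -> tryOnce(connect, attemptsLeft - 1, delayMillis, scheduler, result),
                        delayMillis, TimeUnit.MILLISECONDS);
            }
        });
    }
}
```

The main thread can then wait on the returned future, or attach its own callback, instead of polling a volatile flag.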
Thank you Brian Roach. The connected variable is volatile and can be accessed outside this code for further processing.
final InetSocketAddress sockAddr = new InetSocketAddress(
config.getRemoteAddr(), config.getRemotePort());
final ChannelFuture connectFuture = bootstrap
.connect(sockAddr);
channel = connectFuture.getChannel();
connectFuture.addListener(new ChannelFutureListener() {
@Override
public void operationComplete(ChannelFuture future)
throws Exception {
if (future.isSuccess()) {
// Connection attempt succeeded:
// Begin to accept incoming traffic.
channel.setReadable(true);
connected = true;
} else {
// Close the connection if the connection attempt has
// failed.
channel.close();
if(!connected){
logger.debug("Attempt to connect within " + ((double)frequency/(double)1000) + " seconds");
try {
Thread.sleep(frequency);
} catch (InterruptedException e) {
logger.error(e.getMessage());
}
bootstrap.connect(sockAddr).addListener(this);
}
}
}
});