I am running a server performance test using a Netty-based tester app as a client. The connection is over SSL sockets; I send a registration and the server starts streaming data back. I try to create as many connections as the server can handle.
I get up to about 4000 sockets on my tester before it (the client OS process) runs out of file descriptors due to too many open sockets. This would be fine if I got a proper error message from Netty. However, the only thing Netty gives me is java.nio.channels.ClosedChannelException, and it does not even have a stack trace.
After various runs with the debugger, I believe this is due to io.netty.handler.ssl.SslHandler handling such errors as:
private static final ClosedChannelException CHANNEL_CLOSED = new ClosedChannelException();

static {
    CHANNEL_CLOSED.setStackTrace(EmptyArrays.EMPTY_STACK_TRACE);
}

@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    // Make sure to release SSLEngine,
    // and notify the handshake future if the connection has been closed during handshake.
    setHandshakeFailure(ctx, CHANNEL_CLOSED);
    super.channelInactive(ctx);
}
In the end this results in the ClosedChannelException being thrown with no stack trace. If I run this under the debugger and set breakpoints earlier in Netty, the trigger seems to be an SSL handshake timeout. I believe this timeout is due to running out of file descriptors; I have no idea why Netty treats it as a timeout, though.
The reason I believe it is running out of file descriptors is that earlier versions of this test got "too many open files" exceptions. After reducing the use of files elsewhere in the code it now gets this far, but I no longer get a meaningful error message. I also still get the "too many open files" error if I concurrently run other software that keeps opening files at the time Netty hangs on this.
I am wondering if there is some trick for me to get Netty to properly report the actual cause of failure?
Here is the main relevant client init code:
private static final EventLoopGroup group = new NioEventLoopGroup();

public SSLClientNetty() throws Exception {
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(keyManagers, trustManagers, null);
    SSLEngine sslEngine = context.createSSLEngine();
    sslEngine.setUseClientMode(true);
    SslHandler sslHandler = new SslHandler(sslEngine);
    // This is the time Netty waits before throwing the ClosedChannelException after reaching the file limit.
    sslHandler.setHandshakeTimeoutMillis(5000);
    try {
        Bootstrap b = new Bootstrap();
        b.group(group)
         .channel(NioSocketChannel.class)
         .handler(new MyInitializer(sslHandler));
        ch = b.connect("localhost", 5555).sync().channel();
    } catch (Exception e) {
        log.error("Error connecting to server", e);
        throw new RuntimeException("Error connecting to server", e);
    }
}
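One thing I have been sketching (not verified that it actually surfaces the root cause) is attaching listeners to the SslHandler handshake future and to the connect future, replacing the single connect line above, so whatever failure cause Netty has gets logged explicitly:

// Sketch only: log the cause of a failed handshake or connect explicitly.
// Uses io.netty.util.concurrent.FutureListener and io.netty.channel.ChannelFutureListener.
sslHandler.handshakeFuture().addListener(new FutureListener<Channel>() {
    @Override
    public void operationComplete(Future<Channel> future) {
        if (!future.isSuccess()) {
            log.error("SSL handshake failed", future.cause());
        }
    }
});

ChannelFuture connectFuture = b.connect("localhost", 5555);
connectFuture.addListener(new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) {
        if (!future.isSuccess()) {
            log.error("Connect failed", future.cause());
        }
    }
});
ch = connectFuture.sync().channel();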
The main relevant code for MyInitializer:
@Override
protected void initChannel(SocketChannel ch) throws Exception {
    ch.pipeline().addLast(sslHandler);
    ch.pipeline().addLast("bytesEncoder", new ByteArrayEncoder());
    ch.pipeline().addLast(new MyDecoder());
}

@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
    super.exceptionCaught(ctx, cause);
    log.error("Error in initializing connection", cause);
}
In MyDecoder, just to make sure I also log any Exceptions:
@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
    super.exceptionCaught(ctx, cause);
    log.error("Error in decoder", cause);
    throw new RuntimeException("Error in decoder", cause);
}
The main loop for tester creating connections:
while (true) {
    SSLClientNetty client = new SSLClientNetty();
    client.register();
    Thread.sleep(10);
}
Right now the error message is only this:
4000:...................java.nio.channels.ClosedChannelException
Where the tester is printing a dot for every successfully opened connection, and it ends with Netty throwing ClosedChannelException with no stacktrace (as explained above).
So, just to reiterate: I am looking to get a better error report for what is actually causing the connections to fail, and to understand how Netty handles running out of sockets and how I should manage that.
I'm wondering how to log information when a server has successfully started. I cannot do it as simply as this:
createServer().start(Exit.NEVER);
System.out.println("Server is running...");
because the call createServer().start(Exit.NEVER) does not return. It is a call into an external library whose method contains a loop similar to while(true).
I also cannot simply run the server in a new thread and then log a successful start, because the server may throw an exception, in which case the start actually failed.
public void start() {
    new Thread("Server") {
        @Override
        public void run() {
            try {
                createServer().start(Exit.NEVER);
            } catch (final IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }.start();
    System.out.println("Server is running...");
}
The last solution I can think of is to wait a couple of seconds after createServer().start(Exit.NEVER) and then log the successful start because no exception was thrown. This is not a perfect solution: we could wait, say, 5 seconds and then log the successful start, but one second later the server may still throw an exception.
How, then, can I tell whether the server has started successfully so I can log this information?
EDIT
The server I'm using is Takes https://github.com/yegor256/takes.
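For what it's worth, one sketch I have been considering (assuming the Takes server binds its port before it starts blocking, and using a hypothetical PORT constant) is to poll the port from the launching thread and only log the successful start once a TCP connection is accepted:

public void start() {
    new Thread("Server") {
        @Override
        public void run() {
            try {
                createServer().start(Exit.NEVER);
            } catch (final IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }.start();
    // Probe the port (java.net.Socket) until the server accepts a connection, or give up after 10 seconds.
    final long deadline = System.currentTimeMillis() + 10_000L;
    while (System.currentTimeMillis() < deadline) {
        try (Socket ignored = new Socket("localhost", PORT)) {
            System.out.println("Server is running...");
            return;
        } catch (final IOException notYetUp) {
            try {
                Thread.sleep(100L);
            } catch (final InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
    System.out.println("Server did not start within 10 seconds");
}

This does not catch a crash that happens after the port is open, but at least "Server is running..." is only printed once the server really accepts connections.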
I recently changed from using a standard RabbitTemplate in my Spring Boot application to using an AsyncRabbitTemplate. In the process, I switched from the standard send method to the sendAndReceive method.
Making this change does not seem to affect the publishing of messages to RabbitMQ, however I do now see stack traces as follows when sending messages:
org.springframework.amqp.core.AmqpReplyTimeoutException: Reply timed out
at org.springframework.amqp.rabbit.AsyncRabbitTemplate$RabbitFuture$TimeoutTask.run(AsyncRabbitTemplate.java:762) [spring-rabbit-2.3.10.jar!/:2.3.10]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.3.9.jar!/:5.3.9]
I have tried modifying various settings including the reply and receive timeouts but all that changes is the time it takes to receive the above error. I have also tried setting useDirectReplyToContainer to true as well as setting useChannelForCorrelation to true.
I have managed to recreate the issue in a main method, included below, using a RabbitMQ broker running in Docker.
public static void main(String[] args) {
    com.rabbitmq.client.ConnectionFactory cf = new com.rabbitmq.client.ConnectionFactory();
    cf.setHost("localhost");
    cf.setPort(5672);
    cf.setUsername("<my-username>");
    cf.setPassword("<my-password>");
    cf.setVirtualHost("<my-vhost>");

    ConnectionFactory connectionFactory = new CachingConnectionFactory(cf);

    RabbitTemplate rabbitTemplate = new RabbitTemplate(connectionFactory);
    rabbitTemplate.setExchange("primary");
    rabbitTemplate.setUseDirectReplyToContainer(true);
    rabbitTemplate.setReceiveTimeout(10000);
    rabbitTemplate.setReplyTimeout(10000);
    rabbitTemplate.setUseChannelForCorrelation(true);

    AsyncRabbitTemplate asyncRabbitTemplate = new AsyncRabbitTemplate(rabbitTemplate);
    asyncRabbitTemplate.start();
    System.out.printf("Async Rabbit Template Running? %b\n", asyncRabbitTemplate.isRunning());

    MessageBuilderSupport<MessageProperties> props = MessagePropertiesBuilder.newInstance()
            .setContentType(MessageProperties.CONTENT_TYPE_TEXT_PLAIN)
            .setMessageId(UUID.randomUUID().toString())
            .setHeader(PUBLISH_TIME_HEADER, Instant.now(Clock.systemUTC()).toEpochMilli())
            .setDeliveryMode(MessageDeliveryMode.NON_PERSISTENT);

    asyncRabbitTemplate.sendAndReceive(
            "1.1.1.csv-routing-key",
            new Message(
                    "a,test,csv".getBytes(StandardCharsets.UTF_8),
                    props.build()
            )
    ).addCallback(new ListenableFutureCallback<>() {
        @Override
        public void onFailure(Throwable ex) {
            System.out.printf("Error sending message:\n%s\n", ex.getLocalizedMessage());
        }

        @Override
        public void onSuccess(Message result) {
            System.out.println("Message successfully sent");
        }
    });
}
I am sure that I am just missing a configuration option, but any help would be appreciated.
Thanks. :)
asyncRabbitTemplate.sendAndReceive(..) will always expect a response from the consumer of the message, hence the timeout you are receiving.
To fire and forget, use the standard RabbitTemplate.send(...) and catch any exceptions in a try/catch block:
try {
    rabbitTemplate.send("1.1.1.csv-routing-key",
            new Message(
                    "a,test,csv".getBytes(StandardCharsets.UTF_8),
                    props.build()));
} catch (AmqpException ex) {
    log.error("failed to send rabbit message, routing key = {}", routingKey, ex);
}
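Alternatively, if you actually want the reply semantics of sendAndReceive, the consuming side has to send a response back. With Spring AMQP that is typically a listener method with a non-void return type; the queue name below is an assumption and just needs to be the queue bound to your "primary" exchange with the "1.1.1.csv-routing-key" routing key:

import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

@Component
public class CsvListener {

    // Returning a value makes Spring AMQP publish it to the replyTo address of the
    // incoming message, which completes the AsyncRabbitTemplate future instead of
    // letting it time out. "csv-queue" is a placeholder queue name.
    @RabbitListener(queues = "csv-queue")
    public String handleCsv(String body) {
        return "received " + body.length() + " characters";
    }
}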
Set the reply timeout to a larger value and see the effect:
rabbitTemplate.setReplyTimeout(60000);
https://docs.spring.io/spring-amqp/reference/html/#reply-timeout
I started to familiarize myself with Netty since I plan to use it in a future project.
But I stumbled upon some weird behaviour.
Since I will be using a text protocol for the project, I started with the standard "text" pipeline with StringDecoder, StringEncoder and DelimiterBasedFrameDecoder. But now I have reduced this to the following:
EventLoopGroup workerGroup = new NioEventLoopGroup();
try {
    Bootstrap b = new Bootstrap();
    b.group(workerGroup);
    b.channel(NioSocketChannel.class);
    b.handler(new ChannelInitializer<SocketChannel>() {
        @Override
        public void initChannel(SocketChannel ch) throws Exception {
            ch.pipeline().addLast(new TestClientHandler());
        }
    });

    ChannelFuture f = b.connect("some.working.web.server", 80).sync();
    f.channel().closeFuture().sync();
} finally {
    workerGroup.shutdownGracefully();
}
And my TestClientHandler is:
public static class TestClientHandler extends SimpleChannelInboundHandler<ByteBuf> {

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        System.out.println("channel active");
        ctx.writeAndFlush(Unpooled.copiedBuffer(
                "GET /index.html HTTP/1.0\r\n", CharsetUtil.US_ASCII));
        System.out.println("after write");
    }

    @Override
    public void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
        System.out.println("got" + msg);
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
        ctx.close();
    }
}
I looked at the traffic with Wireshark and when this is executed only the TCP handshake is performed and nothing is sent. After a while the program exits (when the HTTP server closes the connection).
The weirdest thing is that if I change the line:
ctx.writeAndFlush(Unpooled.copiedBuffer(
"GET /index.html HTTP/1.0\r\n", CharsetUtil.US_ASCII));
to
ctx.writeAndFlush(Unpooled.copiedBuffer(
"GET /index.html HTTPA1.0\r\n", CharsetUtil.US_ASCII));
Then the "request" gets sent to the server which of course rejects it and replies with an error.
I played with the "request string" a bit more and it seems that Netty for some reason does not like the following:
ctx.writeAndFlush(Unpooled.copiedBuffer(
" HTTP/1.0\n", CharsetUtil.US_ASCII));
This string does not get sent. But removing the leading whitespace, or changing a random character gets the string sent to the server.
And to make things even stranger, if I test this with a SMTP server, the "request" gets sent without problems to the server. The only difference is that the SMTP server sends the HELO message before my string is sent to the server...
The behaviour is also the same with Netty 4.0.23 and 4.1.0.Beta3, on Java 1.7.0_21 and 1.8.0_20.
It also stays the same if I change to OioEventLoopGroup and OioSocketChannel.
Also changing the channelActive method a bit:
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
    System.out.println("channel active");
    ByteBuf bb = Unpooled.buffer();
    bb.writeBytes("GET index.htm HTTP/1.0\r\n".getBytes());
    System.out.println(ByteBufUtil.hexDump(bb));
    ctx.writeAndFlush(bb);
    System.out.println("after write");
}
shows that my "request" got turned into the proper sequence of bytes:
47455420696e6465782e68746d20485454502f312e300d0a
G E T i n d e x . h t m H T T P / 1 . 0 \r\n
I would appreciate it if someone has an explanation for this weirdness...
This problem turned out to be caused by the antivirus software.
Apparently the software I am using has a hacky parser for HTTP monitoring, and it ate my "request", which was in fact invalid: it was missing the extra \r\n at the end.
Turning the antivirus off solved the problem. Correcting the request to:
GET /index.html HTTP/1.0\r\n\r\n
also kept the antivirus happy.
But why it decided to eat
GET /index.html HTTP/1.0\r\n
and not
GET /index.html HTTPA/1.0\r\n
is beyond me...
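For completeness, the corrected write in channelActive (same handler as above, only the request string changed to include the terminating blank line) looks like this:

@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
    // A complete HTTP/1.0 request ends with an empty line, hence the double \r\n.
    ctx.writeAndFlush(Unpooled.copiedBuffer(
            "GET /index.html HTTP/1.0\r\n\r\n", CharsetUtil.US_ASCII));
}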
I have an application running on Tomcat. I use Netty 4 for WebSocket handling.
The Netty server is started in a ServletContextListener's contextInitialized method and stopped in contextDestroyed.
This is my class for the Netty server:
public class WebSocketServer {

    private final int port;
    private final EventLoopGroup bossGroup;
    private final EventLoopGroup workerGroup;
    private Channel serverChannel;

    public WebSocketServer(int port) {
        this.port = port;
        bossGroup = new NioEventLoopGroup(1);
        workerGroup = new NioEventLoopGroup();
    }

    public void run() throws Exception {
        final ServerBootstrap b = new ServerBootstrap();
        b.group(bossGroup, workerGroup).channel(NioServerSocketChannel.class)
                .childHandler(new WebSocketServerInitializer());
        serverChannel = b.bind(port).sync().channel();
        System.out.println("Web socket server started at port " + port + '.');
        System.out.println("Open your browser and navigate to http://localhost:" + port + '/');
    }

    public void stop() {
        if (serverChannel != null) {
            ChannelFuture chFuture = serverChannel.close();
            chFuture.addListener(new ChannelFutureListener() {
                @Override
                public void operationComplete(ChannelFuture future) throws Exception {
                    shutdownWorkers();
                }
            });
        } else {
            shutdownWorkers();
        }
    }

    private void shutdownWorkers() {
        bossGroup.shutdownGracefully();
        workerGroup.shutdownGracefully();
    }
}
It works fine while running, but when I try to stop Tomcat I get an exception:
INFO: Illegal access: this web application instance has been stopped already. Could not load io.netty.util.concurrent.DefaultPromise$3. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1610)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1569)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:592)
at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:403)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:139)
at java.lang.Thread.run(Thread.java:662)
After that, Tomcat hangs.
What can be the reason?
I assume you call shutdownWorkers() somewhere from Servlet.destroy(), or use some other mechanism that ensures your server goes down when the servlet stops / unloads.
Then you need to do:
void shutdownWorkers() {
    Future<?> fb = bossGroup.shutdownGracefully();
    Future<?> fw = workerGroup.shutdownGracefully();
    try {
        fb.await();
        fw.await();
    } catch (InterruptedException ignore) {
    }
}
This is because shutdownGracefully() returns a Future, and without waiting for it to complete you leave the code that is trying to close the connections in a very stressful environment. It also makes sense to first initiate both shutdowns and only then wait for the futures, so that they run in parallel and complete faster.
It fixed the issue for me. Obviously, you can make it nicer to your system by not swallowing InterruptedException, wrapping each call in a tidy method, and putting a reasonable timeout on each await(). A nice exercise in general, but in reality you most probably wouldn't care at this point in your code.
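A slightly more defensive variant of the same idea, with a bounded wait per future (sketch; the 10-second timeout is an arbitrary choice and TimeUnit is java.util.concurrent.TimeUnit):

private void shutdownWorkers() {
    Future<?> fb = bossGroup.shutdownGracefully();
    Future<?> fw = workerGroup.shutdownGracefully();
    try {
        // Bound each wait so a stuck shutdown cannot hang the container forever.
        fb.await(10, TimeUnit.SECONDS);
        fw.await(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}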
Side note: and yes, for WebSockets you will be better off with Tomcat's native, standards-compliant and robust implementation. Netty is awesome for many other things, but it would be the wrong tool here.
We just finished building a server to store data to disk and fronted it with Netty. During load testing we were seeing Netty scale to about 8,000 messages per second. Given our systems, this looked really low. For a benchmark, we wrote a Tomcat front-end and ran the same load tests. With these tests we were getting roughly 25,000 messages per second.
Here are the specs for our load testing machine:
Macbook Pro Quad core
16GB of RAM
Java 1.6
Here is the load test setup for Netty:
10 threads
100,000 messages per thread
Netty server code (pretty standard) - our Netty pipeline on the server is two handlers: a FrameDecoder and a SimpleChannelHandler that handles the request and response.
Client side JIO using Commons Pool to pool and reuse connections (the pool was sized the same as the # of threads)
Here is the load test setup for Tomcat:
10 threads
100,000 messages per thread
Tomcat 7.0.16 with default configuration using a Servlet to call the server code
Client side using URLConnection without any pooling
My main question is: why is there such a huge difference in performance? Is there something obvious with respect to Netty that can get it to run faster than Tomcat?
Edit: Here is the main Netty server code:
NioServerSocketChannelFactory factory = new NioServerSocketChannelFactory();
ServerBootstrap server = new ServerBootstrap(factory);
server.setPipelineFactory(new ChannelPipelineFactory() {
    public ChannelPipeline getPipeline() {
        RequestDecoder decoder = injector.getInstance(RequestDecoder.class);
        ContentStoreChannelHandler handler = injector.getInstance(ContentStoreChannelHandler.class);
        return Channels.pipeline(decoder, handler);
    }
});

server.setOption("child.tcpNoDelay", true);
server.setOption("child.keepAlive", true);

Channel channel = server.bind(new InetSocketAddress(port));
allChannels.add(channel);
Our handlers look like this:
public class RequestDecoder extends FrameDecoder {

    @Override
    protected ChannelBuffer decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) {
        if (buffer.readableBytes() < 4) {
            return null;
        }

        buffer.markReaderIndex();
        int length = buffer.readInt();

        if (buffer.readableBytes() < length) {
            buffer.resetReaderIndex();
            return null;
        }

        return buffer;
    }
}

public class ContentStoreChannelHandler extends SimpleChannelHandler {

    private final RequestHandler handler;

    @Inject
    public ContentStoreChannelHandler(RequestHandler handler) {
        this.handler = handler;
    }

    @Override
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
        ChannelBuffer in = (ChannelBuffer) e.getMessage();
        in.readerIndex(4);

        ChannelBuffer out = ChannelBuffers.dynamicBuffer(512);
        out.writerIndex(8); // Skip the length and status code

        boolean success = handler.handle(new ChannelBufferInputStream(in), new ChannelBufferOutputStream(out), new NettyErrorStream(out));
        if (success) {
            out.setInt(0, out.writerIndex() - 8); // length
            out.setInt(4, 0); // Status
        }

        Channels.write(e.getChannel(), out, e.getRemoteAddress());
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) {
        Throwable throwable = e.getCause();

        ChannelBuffer out = ChannelBuffers.dynamicBuffer(8);
        out.writeInt(0); // Length
        out.writeInt(Errors.generalException.getCode()); // status

        Channels.write(ctx, e.getFuture(), out);
    }

    @Override
    public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent e) {
        NettyContentStoreServer.allChannels.add(e.getChannel());
    }
}
UPDATE:
I've managed to get my Netty solution to within 4,000/second of Tomcat. A few weeks back I was testing a client-side PING in my connection pool as a safeguard against idle sockets, but I forgot to remove that code before I started load testing. This code effectively PINGed the server every time a socket was checked out from the pool (using Commons Pool). I commented that code out and I'm now getting 21,000/second with Netty and 25,000/second with Tomcat.
Although this is great news on the Netty side, I'm still getting 4,000/second less with Netty than with Tomcat. I can post my client side (which I thought I had ruled out, but apparently not) if anyone is interested in seeing it.
The messageReceived method is executed on a worker thread that may be getting blocked by RequestHandler#handle, which may be busy doing some I/O work.
You could try adding an OrderedMemoryAwareThreadPoolExecutor to the channel pipeline (recommended) for executing the handlers, or alternatively dispatch your handler work to a new ThreadPoolExecutor and pass a reference to the socket channel for writing the response back to the client later. Ex.:
@Override
public void messageReceived(ChannelHandlerContext ctx, final MessageEvent e) {
    // Hand the blocking work off to a separate thread pool so the NIO worker is freed up.
    executor.submit(new Runnable() {
        @Override
        public void run() {
            processHandlerAndRespond(e);
        }
    });
}

private void processHandlerAndRespond(MessageEvent e) {
    ChannelBuffer in = (ChannelBuffer) e.getMessage();
    in.readerIndex(4);

    ChannelBuffer out = ChannelBuffers.dynamicBuffer(512);
    out.writerIndex(8); // Skip the length and status code

    boolean success = handler.handle(new ChannelBufferInputStream(in), new ChannelBufferOutputStream(out), new NettyErrorStream(out));
    if (success) {
        out.setInt(0, out.writerIndex() - 8); // length
        out.setInt(4, 0); // Status
    }

    Channels.write(e.getChannel(), out, e.getRemoteAddress());
}
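The first option, the OrderedMemoryAwareThreadPoolExecutor, would be wired into the pipeline factory roughly like this (sketch against the Netty 3 API used above; the pool size and memory limits are placeholder values):

// org.jboss.netty.handler.execution.ExecutionHandler / OrderedMemoryAwareThreadPoolExecutor
final ExecutionHandler executionHandler = new ExecutionHandler(
        new OrderedMemoryAwareThreadPoolExecutor(16, 1048576, 16777216));

server.setPipelineFactory(new ChannelPipelineFactory() {
    public ChannelPipeline getPipeline() {
        RequestDecoder decoder = injector.getInstance(RequestDecoder.class);
        ContentStoreChannelHandler handler = injector.getInstance(ContentStoreChannelHandler.class);
        // The execution handler hands messages off to its own thread pool, so blocking
        // work inside RequestHandler no longer stalls the NIO worker threads.
        return Channels.pipeline(decoder, executionHandler, handler);
    }
});

The ExecutionHandler instance is created once and shared across channels, which is why it sits outside the getPipeline() method.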