Java NIO Selector can select no more than 50 SelectionKeys?

I use siege to stress test my hand-built file server. It works pretty well for small files (less than 1 KB), but when tested with a 1 MB file, it does not work as expected.
The following is the result of the test with a small file:
neevek@~$ siege -c 1000 -r 10 -b http://127.0.0.1:9090/1KB.txt
** SIEGE 2.71
** Preparing 1000 concurrent users for battle.
The server is now under siege.. done.
Transactions: 10000 hits
Availability: 100.00 %
Elapsed time: 9.17 secs
Data transferred: 3.93 MB
Response time: 0.01 secs
Transaction rate: 1090.51 trans/sec
Throughput: 0.43 MB/sec
Concurrency: 7.29
Successful transactions: 10000
Failed transactions: 0
Longest transaction: 1.17
Shortest transaction: 0.00
The following is the result of a test with a 1MB file:
neevek@~$ siege -c 1000 -r 10 -b http://127.0.0.1:9090/1MB.txt
** SIEGE 2.71
** Preparing 1000 concurrent users for battle.
The server is now under siege...[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
When siege terminates with the above errors, my file server still spins with a fixed number of WRITABLE SelectionKeys, i.e. Selector.select() keeps returning a fixed number, say 50.
With the above tests, it looks to me that my file server cannot accept more than 50 concurrent connections: when running the test with the small file, I notice that the server selects 1 or 2 SelectionKeys; when running with the big file, it selects up to 50 every time.
I tried increasing the backlog in ServerSocket.bind(), with no luck.
What could be the cause of the problem?
EDIT
More info:
When testing with a 1MB file, I noticed that siege terminated with a Broken pipe error, and the file server only accepted 198 connections, though I specified 1000 concurrent connections × 10 rounds (1000 × 10 = 10000) to flood the server.
EDIT 2
I have tested with the following code (a single class) to reproduce the same problem. In this code I only accept connections; I don't read or write. The siege client terminates with Connection reset or Broken pipe errors before the connections time out. I also noticed that the Selector can only select fewer than 1000 keys. You may try the code below to witness the problem.
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class TestNIO implements Runnable {
    ServerSocketChannel mServerSocketChannel;
    Selector mSelector;

    public static void main(String[] args) throws Exception {
        new TestNIO().start();
    }

    public TestNIO() throws Exception {
        mSelector = Selector.open();
    }

    public void start() throws Exception {
        mServerSocketChannel = ServerSocketChannel.open();
        mServerSocketChannel.configureBlocking(false);
        mServerSocketChannel.socket().bind(new InetSocketAddress(9090));
        mServerSocketChannel.socket().setSoTimeout(150000);
        mServerSocketChannel.register(mSelector, SelectionKey.OP_ACCEPT);

        int port = mServerSocketChannel.socket().getLocalPort();
        String serverName = "http://" + InetAddress.getLocalHost().getHostName() + ":" + port;
        System.out.println("Server start listening on " + serverName);

        new Thread(this).start();
    }

    @Override
    public void run() {
        try {
            Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
            while (true) {
                int num = mSelector.select();
                System.out.println("SELECT = " + num + "/" + mSelector.keys().size());
                if (num > 0) {
                    Iterator<SelectionKey> keys = mSelector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        final SelectionKey key = keys.next();
                        if (key.isValid() && key.isAcceptable()) {
                            accept(key);
                        }
                    }
                    // clear the selected keys
                    mSelector.selectedKeys().clear();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void accept(SelectionKey key) throws IOException {
        SocketChannel socketChannel = mServerSocketChannel.accept();
        socketChannel.configureBlocking(false);
        socketChannel.socket().setSoTimeout(1000000);
        socketChannel.socket().setKeepAlive(true);
        // since we are connected, we are ready to READ
        socketChannel.register(mSelector, SelectionKey.OP_READ);
    }
}

It is actually related to the default backlog value set for the ServerSocketChannel. When bind is called without a backlog, or with a value of less than 1, the JDK falls back to a default queue length of 50 pending connections, which matches the limit you are seeing:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/ServerSocketChannelImpl.java#138
You can fix the issue by passing the backlog value as a second (int) parameter to the bind method:
mServerSocketChannel.socket().bind(new InetSocketAddress(9090), backlog);
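For example (a sketch; 1000 here simply matches the siege concurrency used above):

int backlog = 1000; // pending-connection queue length; values below 1 fall back to the JDK default of 50
mServerSocketChannel.socket().bind(new InetSocketAddress(9090), backlog);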

Check the ulimit and the hard limit on the number of open files (file descriptors).
I'm guessing you're using Linux. You can look in limits.conf:
/etc/security/limits.conf
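For example, raising the limit for all users in /etc/security/limits.conf would look like this (hypothetical values; you need to log in again for them to take effect):

* soft nofile 65536
* hard nofile 65536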

This problem may not be related to my code. I ran the same test against an nginx server running locally (Mac OS X), and the same error occurred. So it most likely relates to the hardware or to the siege client.

Related

reactor-netty TcpClient connection pool does not release connections

Using a reactor-netty ConnectionProvider, the connection is not released to the pool, so it cannot be reused until it is disconnected.
I found a workaround: disconnecting the connection after the response is received. But it is terrible, and it is only appropriate in my case, where I connect to a neighbour server on the same network.
clientsMap.computeIfAbsent("$host:$port") { hostPort ->
    TcpClient.create(
        ConnectionProvider.builder(hostPort)
            .maxConnections(maxConnections)
            .pendingAcquireTimeout(Duration.ofSeconds(4))
            .maxIdleTime(Duration.ofSeconds(10))
            .build()
    )
        .host(host)
        .port(port)
        .observe { _, state -> println("state: $state") }
        // workaround, as one packet is certainly less than 4096 bytes and the connection is stable enough
        .option(ChannelOption.RCVBUF_ALLOCATOR, FixedRecvByteBufAllocator(4096))
}.connect().flatMap { connection ->
    connection
        .outbound()
        .sendByteArray(Mono.just(send))
        .then()
        .thenMany(connection.inbound().receive().asByteArray())
        .takeUntil { it.size < 4096 }
        .map { it.toHexString() }
        .collect(Collectors.joining())
        .timeout(timeout)
        // the next line causes disconnection after the response has been read
        .doFinally { connection.channel().disconnect() }
}
When requesting data, the output is:
state: [connected]
state: [configured]
state: [disconnecting]
state: [connected]
state: [configured]
state: [disconnecting]
But this solution is much worse than returning the connection to the pool and reusing it later.
I expected a kind of connectionProvider.release(connection) method to forcibly return the connection to the pool, and then I'd call it in doFinally, but there is nothing like that.
If we comment out the doFinally call, then state: [disconnecting] is not written, and when more than maxConnections connections have been acquired and not yet disconnected by the remote host, we get
Pool#acquire(Duration) has been pending for more than the configured timeout of 4000ms
The tested reactor-netty versions are 1.0.6 - 1.0.13.
I would appreciate a solution that returns the connection to the pool and reuses it instead of disconnecting.

Websocket blocks without timing out on client plug pull/power loss

I have an odd issue with my Tomcat + Spring websocket application. When a user disconnects without sending a "closing" signal, due to power loss or a plug pull, the thread will block about 10 seconds later.
The thread blocks on this function:
org.springframework.web.socket.WebSocketSession.sendMessage(WebSocketMessage<?> wsm) throws IOException;
I have tried putting a line in my AppConfig to try to set a timeout of 3 seconds, but it does not seem to work properly, as the block seems to go on for upwards of 15 minutes before throwing an exception.
@Bean(name="servletServerContainerFactoryBean")
public int maxSessionIdleTimeout() {
    return 3000;
}
Here is the stack trace after an eventual SocketTimeoutException:
Step: 2304
SendB -> test isOpen -> sendMes -> Done -> Finished Send.
SendB -> test2 isOpen -> sendMes -> User closed connection during packet sending: s01
Propogating exception up!
java.io.IOException: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:197)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:102)
at org.infpls.noxio.game.module.game.session.NoxioSession.sendPacket(NoxioSession.java:40)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby.step(GameLobby.java:117)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby$GameLoop.run(GameLobby.java:274)
Caused by: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:494)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:309)
... 8 more
## CRITICAL ## Ejecting player: test2 :: Exception thrown on packet send...
java.io.IOException: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:197)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:102)
at org.infpls.noxio.game.module.game.session.NoxioSession.sendPacket(NoxioSession.java:40)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby.step(GameLobby.java:117)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby$GameLoop.run(GameLobby.java:274)
Caused by: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:494)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:309)
... 8 more
Finished Send. Step Finished.
Having threads be blocked for 15 minutes at a time is a major problem. Any info on why this happens and how to fix it would be greatly appreciated. Thank you!
Found an answer finally. It's actually a system setting.
Source
How many times to retry before deciding that something is wrong and
it is necessary to report this suspicion to network layer. Minimal RFC
value is 3, it is default, which corresponds to 3sec-8min depending on
RTO.
/proc/sys/net/ipv4/tcp_retries2
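To lower it temporarily for a test, a sketch (Linux; the value is a retransmission count, not seconds):

sysctl -w net.ipv4.tcp_retries2=8

Separately, Tomcat also lets you cap how long a blocking websocket send may stall through its org.apache.tomcat.websocket.BLOCKING_SEND_TIMEOUT session user property, which bounds sendMessageBlock directly. A minimal sketch, assuming you have access to the underlying javax.websocket.Session (for example in an @OnOpen method):

// Hedged sketch: Tomcat reads this Long (in milliseconds) from the
// session's user properties; blocking sends that exceed it fail fast.
public void capBlockingSend(javax.websocket.Session session) {
    session.getUserProperties().put("org.apache.tomcat.websocket.BLOCKING_SEND_TIMEOUT", 3000L);
}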

Short code to test HTTP latency?

There are some nice libraries, like this one from Apache, but that's a little too complex for my purpose. All I need is the best estimate of HTTP latency (the time it takes to get connected to the server, regardless of transfer speed).
I tried the HTTP connection code from this answer:
private void doPing() {
    // Remember time before connection
    long millis = System.currentTimeMillis();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openStream(), "UTF-8"))) {
        // We don't need any data
        reader.close();
        // times is the array where we store our results
        times.add((int) (System.currentTimeMillis() - millis));
        // I see lots of these in console, so it's probably working
        System.out.println("Request done...");
    }
    // If internet is dead, does it throw exception?
    catch (Exception e) {
        times.add(-1);
    }
}
The thing is that I am not so sure what I am measuring. Looping through the values gave me these results:
Testing connection to http://www.seznam.cz
Threads: 5
Loops per thread: 50
Given up waiting for results.
Average time to connection: 53.8 [ms]
Failures: 0.0%
Testing connection to http://www.seznam.cz
Threads: 5
Loops per thread: 100
Average time to connection: 43.58 [ms]
Failures: 0.0%
Testing connection to http://www.seznam.cz
Threads: 5
Loops per thread: 400
Average time to connection: 30.145 [ms]
Failures: 0.0%
Testing connection to http://www.stackoverflow.com
Threads: 5
Loops per thread: 30
Given up waiting for results.
Average time to connection: 4006.1111111111113 [ms]
Failures: 0.0%
Testing connection to http://www.stackoverflow.com
Threads: 5
Loops per thread: 80
Given up waiting for results.
Average time to connection: 2098.695652173913 [ms]
Failures: 0.0%
Testing connection to http://www.stackoverflow.com
Threads: 5
Loops per thread: 200
Given up waiting for results.
Average time to connection: 0.0 [ms]
//Whoops, connection dropped again
Failures: 100.0%
//Some random invalid url
Testing connection to http://www.sdsfdser.tk/
Threads: 4
Loops per thread: 20
Average time to connection: 0.0 [ms]
Failures: 100.0%
Not only am I not sure that I calculated what I wanted (though it reflects my experience), I am also not sure what happens in non-standard cases.
Does the URL handle timeouts?
Will it always throw an exception on timeout?
While keeping in mind that this project is supposed to be simple and lightweight, could you tell me if I'm doing it right?
I think hailin suggested that you create a raw Socket and connect it to the server instead of using URLConnection. I tried both, and I'm getting much higher latency with your version. Opening a URLConnection must be doing some additional stuff in the background, though I'm not sure what.
Anyway, here's the version using a Socket (add exception handling as needed):
Socket s = new Socket();
SocketAddress a = new InetSocketAddress("www.google.com", 80);
int timeoutMillis = 2000;
long start = System.currentTimeMillis();
try {
    s.connect(a, timeoutMillis);
} catch (SocketTimeoutException e) {
    // timeout
} catch (IOException e) {
    // some other exception
}
long stop = System.currentTimeMillis();
times.add(stop - start);
try {
    s.close();
} catch (IOException e) {
    // closing failed
}
This resolves the hostname (www.google.com in the example), establishes a TCP connection on port 80, and adds the elapsed milliseconds to times. If you don't want the time for the DNS resolution included, you can create an InetAddress with InetAddress.getByName("hostname") before you start the timer and pass that to the InetSocketAddress constructor.
Edit: InetSocketAddress's constructor also resolves the host name right away, so constructing it from a resolved IP address shouldn't make a difference.
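A sketch of that variant, building the address before the timer starts (UnknownHostException handling omitted):

// InetSocketAddress resolves the host name in its constructor,
// so constructing it here keeps DNS time out of the measurement
SocketAddress a = new InetSocketAddress("www.google.com", 80);
long start = System.currentTimeMillis();
// ... s.connect(a, timeoutMillis) and the timing logic as above ...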

Jedis Exception java.net.ConnectException: Address already in use

I have a Redis server, and I made a separate RedisManager for managing the Jedis connections. The code for RedisManager is as follows:
package RedisServerPackage;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisManager {
    private static final RedisManager instance = new RedisManager();
    private static final JedisPoolConfig poolConfig = new JedisPoolConfig();
    private static JedisPool pool = null;

    private RedisManager() {}

    public final static RedisManager getInstance() {
        if (pool == null) {
            poolConfig.setMaxTotal(-1); // -1 = no upper bound on pooled connections
            pool = new JedisPool(poolConfig, "localhost");
        }
        return instance;
    }

    public void release() {
        pool.destroy();
    }

    public Jedis getJedis() {
        return pool.getResource();
    }

    public void returnJedis(Jedis jedis) {
        pool.returnResource(jedis);
    }
}
Now I execute my code, where I have about 1000 clients hitting the server and performing certain operations using the Pub/Sub model. I monitored the redis-server and found that at most 45 clients were active at a time, and the maximum number of blocked clients was around 39. After running the client code for about 5 minutes or so, I get the exception:
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:50)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:88)
at RedisServerPackage.RedisManager.getJedis(RedisManager.java:31)
at RedisServerPackage.RedisQueue.dequeue(RedisQueue.java:45)
at RedisServerPackage.QueueProcessor.run(QueueProcessor.java:22)
at java.lang.Thread.run(Thread.java:745)
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Address already in use
at redis.clients.jedis.Connection.connect(Connection.java:148)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:75)
at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1572)
at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:69)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at redis.clients.util.Pool.getResource(Pool.java:48)
... 5 more
Caused by: java.net.ConnectException: Address already in use
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at redis.clients.jedis.Connection.connect(Connection.java:142)
... 12 more
I am not able to find out what is causing this exception. Also, I am reusing the Jedis instances. An example:
public void JedisExample(String temporaryString) {
    Jedis jedis = manager.getJedis();
    try {
        // Some code here
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        manager.returnJedis(jedis);
        // manager is an instance of the RedisManager class shown above
    }
}
I had this exception happening intermittently on macOS when trying to load test my server app.
Turns out, the problem was related to the fact that macOS has only 16K ephemeral ports available, and they won't be released until the socket's TIME_WAIT has passed. The default timeout for TIME_WAIT is 15 seconds.
You can check yours via
sysctl net.inet.tcp.msl
To fix it temporarily to allow load testing, I used
sudo sysctl -w net.inet.tcp.msl=1000
This reduced TIME_WAIT to 1 second, allowing connections to be created and released faster, which in turn enabled Tomcat to convert REST requests to Redis PUBSUB messages at a rate of about 4000 qps, with 0 errors after 4 hours of bombardment under 16 concurrent siege threads. Before, about 1% of requests would error out with the exception above.
The author of the question did not state the OS, but I hope this answer helps someone else running into a similar situation, because this entry comes up on top when searching for this exception in Jedis. Basically, check your TIME_WAIT when load testing, regardless of the OS.
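While load testing, you can watch the TIME_WAIT pile-up with something like:

netstat -an | grep TIME_WAIT | wc -l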
UPDATE
Warning: do not do this in production! Ideally, increase it back to 15 seconds after each load-testing round on your workstation. Decreasing TIME_WAIT can be dangerous, because sockets become available again faster after closing, and delayed packets might arrive at a newly opened connection, causing unpredictable errors or even compromising security. Read up on TCP/IP and TIME_WAIT before you decide to follow the instructions above, or consult your network engineer.
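As a side note, recent Jedis versions implement Closeable, and close() returns a pooled instance to its pool, so the checkout/return pair from the question can be written with try-with-resources (a sketch against the RedisManager above; verify the behavior for your Jedis version):

public void jedisExample(String temporaryString) {
    // close() at the end of the block hands the instance back to the pool
    try (Jedis jedis = RedisManager.getInstance().getJedis()) {
        // Some code here
    }
}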

netstat showing last_ack after closing socket

I'm testing a scenario where my local IP is pinging server1_ip and server2_ip, but it's causing hogging on the server, as there is more than one connection on the same IP's port, as shown below:
[root@local ~]# netstat -antup -p|grep 8000
tcp 1 1 ::ffff:local_ip:58972 ::ffff:server1_ip:8000 LAST_ACK -
tcp 1 1 ::ffff:local_ip:49169 ::ffff:server2_ip:8000 LAST_ACK -
tcp 1 0 ::ffff:local_ip:49172 ::ffff:server2_ip:8000 CLOSE_WAIT 25544/java
tcp 1 0 ::ffff:local_ip:58982 ::ffff:server1_ip:8000 CLOSE_WAIT 25544/java
tcp 1 1 ::ffff:local_ip:58975 ::ffff:server1_ip:8000 LAST_ACK -
tcp 1 1 ::ffff:local_ip:49162 ::ffff:server2_ip:8000 LAST_ACK -
There are 2 threads. On some functionality I need to stop a thread and also close its socket connection on port 8000, which I'm doing with the following method, which is part of my thread:
protected void disconnect() {
    if (this.mSocket != null) {
        try {
            this.mSocket.shutdownInput();
            this.mSocket.shutdownOutput();
            this.mOutputStream.flush();
            this.mOutputStream.close();
            this.mInputStream.close();
            this.mSocket.close();
        } catch (Exception vException) {
            vException.printStackTrace();
        }
    }
    this.mInputStream = null;
    this.mOutputStream = null;
    this.mSocket = null;
}
But when this method is called, it leaves the connection in the LAST_ACK state.
Please let me know the cause of this and a solution to the problem.
CLOSE_WAIT and LAST_ACK are intermediate states of a TCP connection, reached just before closing. The TCP connection should eventually reach the CLOSED state: CLOSE_WAIT -> LAST_ACK -> CLOSED. So what you are seeing in netstat is normal.
See this diagram of a tcp connection transition state: http://www.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf
The only issue in your code is that you call flush on the OutputStream after shutting down output. You can also remove the shutdownInput and shutdownOutput calls; they are only useful when you want to close a single direction of the communication: input or output.
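A minimal sketch of the simplified version, assuming the same fields as in the question (Socket.close() also closes the streams obtained from the socket):

protected void disconnect() {
    if (this.mSocket != null) {
        try {
            this.mOutputStream.flush(); // flush while the stream is still open
            this.mSocket.close();       // closes the socket and both of its streams
        } catch (Exception vException) {
            vException.printStackTrace();
        }
    }
    this.mInputStream = null;
    this.mOutputStream = null;
    this.mSocket = null;
}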
