reactor-netty TcpClient connection pool does not release connections - java

Using a reactor-netty ConnectionProvider, a connection is not released back to the pool, and so cannot be reused, until it is disconnected.
I found a workaround: disconnecting the connection after the response is received. But it is terrible, and it is acceptable only in my case, where I connect to a neighbouring server on the same network.
clientsMap.computeIfAbsent("$host:$port") { hostPort ->
    TcpClient.create(
        ConnectionProvider.builder(hostPort)
            .maxConnections(maxConnections)
            .pendingAcquireTimeout(Duration.ofSeconds(4))
            .maxIdleTime(Duration.ofSeconds(10))
            .build()
    )
        .host(host)
        .port(port)
        .observe { _, state -> println("state: $state") }
        // workaround: one packet is certainly less than 4096 bytes and the connection is stable enough
        .option(ChannelOption.RCVBUF_ALLOCATOR, FixedRecvByteBufAllocator(4096))
}.connect().flatMap { connection ->
    connection
        .outbound()
        .sendByteArray(Mono.just(send))
        .then()
        .thenMany(connection.inbound().receive().asByteArray())
        .takeUntil { it.size < 4096 }
        .map { it.toHexString() }
        .collect(Collectors.joining())
        .timeout(timeout)
        // the next line disconnects after the response has been read
        .doFinally { connection.channel().disconnect() }
}
When requesting data, the output is:
state: [connected]
state: [configured]
state: [disconnecting]
state: [connected]
state: [configured]
state: [disconnecting]
But this solution is much worse than returning the connection to the pool and reusing it later.
I expected a kind of connectionProvider.release(connection) method to forcibly return the connection to the pool, which I would call in doFinally, but there is nothing like that.
If we comment out the doFinally call, then state: [disconnecting] is not printed, and once more than maxConnections connections have been acquired and not yet disconnected by the remote host, we get:
Pool#acquire(Duration) has been pending for more than the configured timeout of 4000ms
Tested reactor-netty versions are 1.0.6 through 1.0.13.
I would appreciate a solution that returns the connection to the pool and reuses it instead of disconnecting.
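One avenue I have not been able to verify (so treat it as an assumption, not a confirmed API contract): Connection.dispose(), which for a connection obtained from a pooled ConnectionProvider is supposed to release the underlying channel back to the pool rather than close the socket. The observe() callback above is a convenient way to check which of the two actually happens. A minimal Java sketch of that idea:

import java.time.Duration;
import java.util.stream.Collectors;
import reactor.core.publisher.Mono;
import reactor.netty.resources.ConnectionProvider;
import reactor.netty.tcp.TcpClient;

public class PooledTcpCall {

    // same pool settings as in the Kotlin snippet above
    static final ConnectionProvider PROVIDER = ConnectionProvider.builder("example")
            .maxConnections(10)
            .pendingAcquireTimeout(Duration.ofSeconds(4))
            .maxIdleTime(Duration.ofSeconds(10))
            .build();

    static Mono<String> call(String host, int port, byte[] send) {
        return TcpClient.create(PROVIDER)
                .host(host)
                .port(port)
                .observe((conn, state) -> System.out.println("state: " + state))
                .connect()
                .flatMap(connection -> connection
                        .outbound()
                        .sendByteArray(Mono.just(send))
                        .then()
                        .thenMany(connection.inbound().receive().asByteArray())
                        .takeUntil(bytes -> bytes.length < 4096)
                        .map(PooledTcpCall::toHex)
                        .collect(Collectors.joining())
                        .timeout(Duration.ofSeconds(5))
                        // assumption: dispose() releases the pooled connection
                        // back to the provider, unlike channel().disconnect()
                        .doFinally(signal -> connection.dispose()));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}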

Related

How to solve "Socket read timed out" when using hikari connection pool

I am developing an application using Play Framework (version 2.8.0) and Java (version 1.8) with an Oracle database (version 12c).
There are only zero or one hits to the database in a day, and I am getting the error below.
java.sql.SQLRecoverableException: IO Error: Socket read timed out
    at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:919)
    at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:2005)
    at com.zaxxer.hikari.pool.PoolBase.quietlyCloseConnection(PoolBase.java:138)
    at com.zaxxer.hikari.pool.HikariPool.lambda$closeConnection$1(HikariPool.java:447)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Socket read timed out
    at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:174)
    at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
    at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
    at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
    at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
    at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
    at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
    at oracle.jdbc.driver.T4C7Ocommoncall.doOLOGOFF(T4C7Ocommoncall.java:62)
    at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:908)
    ... 6 common frames omitted
db {
  default {
    driver = oracle.jdbc.OracleDriver
    url = "jdbc:oracle:thin:@XXX.XXX.XXX.XX:XXXX/XXXXXXX"
    username = "XXXXXXXXX"
    password = "XXXXXXXXX"
    hikaricp {
      dataSource {
        cachePrepStmts = true
        prepStmtCacheSize = 250
        prepStmtCacheSqlLimit = 2048
      }
    }
  }
}
It seems to be caused by an inactive database connection. How can I solve this?
Please let me know if any other information is required.
You can enable TCP keepalive for JDBC, either by setting the corresponding directive or by adding "ENABLE=BROKEN" to the connection string.
Usually Cisco/Juniper gear cuts off a TCP connection when it has been inactive for more than an hour,
while the Linux kernel starts sending keepalive probes only after two hours (tcp_keepalive_time). So if you decide to turn TCP keepalive on, you will also need root access to change this kernel tunable to a lower value (10-15 minutes).
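For illustration, a sketch of the long-form JDBC URL with ENABLE=BROKEN (host, port, and service name are placeholders; verify the exact syntax against your Oracle driver's documentation):

jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=TCP)(HOST=db-host)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=my_service)))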
Moreover, HikariCP should not keep any connection open for longer than 30 minutes by default.
So if your firewall, the Linux kernel, and HikariCP all use default settings, this error should not occur in your system.
See the official HikariCP documentation:
maxLifetime:
This property controls the maximum lifetime of a connection in the pool. An in-use connection will never be retired, only when it is closed will it then be removed. On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool. We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit. A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting. The minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30 minutes)
I have added the configuration below for HikariCP to the configuration file, and it is working fine.
## Database Connection Pool
play.db.pool = hikaricp
play.db.prototype.hikaricp.connectionTimeout=120000
play.db.prototype.hikaricp.idleTimeout=15000
play.db.prototype.hikaricp.leakDetectionThreshold=120000
play.db.prototype.hikaricp.validationTimeout=10000
play.db.prototype.hikaricp.maxLifetime=120000

Jedis Exception java.net.ConnectException: Address already in use

I have a Redis server, and I made a separate RedisManager class for managing the Jedis connections. The code for RedisManager is as follows:
package RedisServerPackage;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisManager {
    private static final RedisManager instance = new RedisManager();
    private static final JedisPoolConfig poolConfig = new JedisPoolConfig();
    private static JedisPool pool = null;

    private RedisManager() {}

    public final static RedisManager getInstance() {
        if (pool == null) {
            poolConfig.setMaxTotal(-1);
            pool = new JedisPool(poolConfig, "localhost");
        }
        return instance;
    }

    public void release() {
        pool.destroy();
    }

    public Jedis getJedis() {
        return pool.getResource();
    }

    public void returnJedis(Jedis jedis) {
        pool.returnResource(jedis);
    }
}
Now I execute my code, where I have about 1000 clients hitting the server and performing certain operations using the Pub/Sub model. I have monitored the redis-server and found that at any given time at most 45 clients were active, and the maximum number of blocked clients was around 39. After running the client code for about 5 minutes or so, I get the exception:
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at redis.clients.util.Pool.getResource(Pool.java:50)
    at redis.clients.jedis.JedisPool.getResource(JedisPool.java:88)
    at RedisServerPackage.RedisManager.getJedis(RedisManager.java:31)
    at RedisServerPackage.RedisQueue.dequeue(RedisQueue.java:45)
    at RedisServerPackage.QueueProcessor.run(QueueProcessor.java:22)
    at java.lang.Thread.run(Thread.java:745)
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Address already in use
    at redis.clients.jedis.Connection.connect(Connection.java:148)
    at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:75)
    at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1572)
    at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:69)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at redis.clients.util.Pool.getResource(Pool.java:48)
    ... 5 more
Caused by: java.net.ConnectException: Address already in use
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at redis.clients.jedis.Connection.connect(Connection.java:142)
    ... 12 more
I am not able to find out what is causing this exception. Also, I am reusing the Jedis instances. Example code:
public void JedisExample(String temporaryString) {
    Jedis jedis = manager.getJedis();
    try {
        // Some code here
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        // manager is an instance of the RedisManager class provided before
        manager.returnJedis(jedis);
    }
}
I had this exception happening intermittently on MacOS when trying to load test my server app.
It turns out the problem was related to the fact that macOS has only 16K ports available, and they are not released until the socket's TIME_WAIT has passed. The default timeout for TIME_WAIT is 15 seconds.
You can check yours via
sysctl net.inet.tcp.msl
To fix it temporarily to allow load testing, I used
sudo sysctl -w net.inet.tcp.msl=1000
This reduced TIME_WAIT to 1 second, allowing connections to be created and released faster, which in turn enabled me to get Tomcat to convert REST requests to Redis PUBSUB messages at a rate of about 4000 qps, with 0 errors after 4 hours of bombardment under 16 concurrent Siege threads. Before, about 1% of requests would error out with the exception above.
The author of the question did not state the OS, but I hope this answer might help someone else running into similar situation, because this entry comes on top when searching for such exception in Jedis. Basically, check your TIME_WAIT when load testing, regardless of OS.
UPDATE
Warning: do not do this in production! Ideally, increase it back to 15 seconds after each load-testing round on your workstation. Decreasing TIME_WAIT can be dangerous, because sockets become available faster after closing, and delayed packets might arrive at a newly opened connection, causing unpredictable errors or even compromising security. Read more on TCP/IP and TIME_WAIT before you decide to follow the instructions above, or consult your networking engineer.
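To restore the default afterwards (15 seconds, matching the numbers above):

sudo sysctl -w net.inet.tcp.msl=15000

A side note on the pooling code in the question, assuming Jedis 2.x or later (an assumption, since the version is not stated): setMaxTotal(-1) makes the pool unbounded, which permits exactly the kind of connection churn that exhausts ephemeral ports, and Jedis implements Closeable, with close() returning a pooled instance to its pool. A hedged sketch of a bounded pool with try-with-resources:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class BoundedPoolExample {
    private static final JedisPoolConfig CONFIG = new JedisPoolConfig();
    static {
        CONFIG.setMaxTotal(128); // cap concurrent connections instead of -1 (unbounded)
    }
    private static final JedisPool POOL = new JedisPool(CONFIG, "localhost");

    public String ping() {
        // close() on a pooled Jedis returns it to the pool automatically
        try (Jedis jedis = POOL.getResource()) {
            return jedis.ping();
        }
    }
}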

netstat showing last_ack after closing socket

I'm trying a scenario where my local IP is pinging server1_ip and server2_ip, but it's causing hogging on the server, as there is more than one connection on the same IP's port, as shown below:
[root@local ~]# netstat -antup -p | grep 8000
tcp 1 1 ::ffff:local_ip:58972 ::ffff:server1_ip:8000 LAST_ACK -
tcp 1 1 ::ffff:local_ip:49169 ::ffff:server2_ip:8000 LAST_ACK -
tcp 1 0 ::ffff:local_ip:49172 ::ffff:server2_ip:8000 CLOSE_WAIT 25544/java
tcp 1 0 ::ffff:local_ip:58982 ::ffff:server1_ip:8000 CLOSE_WAIT 25544/java
tcp 1 1 ::ffff:local_ip:58975 ::ffff:server1_ip:8000 LAST_ACK -
tcp 1 1 ::ffff:local_ip:49162 ::ffff:server2_ip:8000 LAST_ACK -
There are 2 threads; for some functionality I need to stop a thread and also close the socket connection on port 8000,
which I'm doing with the following method, which is part of my thread:
protected void disconnect() {
    if (this.mSocket != null) {
        try {
            this.mSocket.shutdownInput();
            this.mSocket.shutdownOutput();
            this.mOutputStream.flush();
            this.mOutputStream.close();
            this.mInputStream.close();
            this.mSocket.close();
        } catch (Exception vException) {
            vException.printStackTrace();
        }
    }
    this.mInputStream = null;
    this.mOutputStream = null;
    this.mSocket = null;
}
But when this method is called, it puts the connection into the LAST_ACK state.
Please let me know the cause of this and a solution to this problem.
CLOSE_WAIT and LAST_ACK are intermediate states of a TCP connection, reached just before closing. The TCP connection should eventually reach the CLOSED state: CLOSE_WAIT -> LAST_ACK -> CLOSED. So what you are seeing in netstat is normal.
See this diagram of a tcp connection transition state: http://www.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf
The only issue in your code is that you call flush() on the OutputStream after shutting down output. You can remove the shutdownInput and shutdownOutput calls; they are useful only when you want to close a single direction of the communication, input or output.
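A sketch of disconnect() with that advice applied (same fields as in the question; the flush happens before anything is closed, and the half-close calls are gone):

protected void disconnect() {
    if (this.mSocket != null) {
        try {
            // flush pending output first, then close streams and the socket
            this.mOutputStream.flush();
            this.mOutputStream.close();
            this.mInputStream.close();
            this.mSocket.close();
        } catch (Exception vException) {
            vException.printStackTrace();
        }
    }
    this.mInputStream = null;
    this.mOutputStream = null;
    this.mSocket = null;
}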

Java NIO Selector can select no more than 50 SelectionKeys?

I use siege to stress test my hand-built file server. It works pretty well for small files (less than 1KB), but when tested with a 1MB file, it does not work as expected.
The following is the result of the test with a small file:
neevek@~$ siege -c 1000 -r 10 -b http://127.0.0.1:9090/1KB.txt
** SIEGE 2.71
** Preparing 1000 concurrent users for battle.
The server is now under siege.. done.
Transactions: 10000 hits
Availability: 100.00 %
Elapsed time: 9.17 secs
Data transferred: 3.93 MB
Response time: 0.01 secs
Transaction rate: 1090.51 trans/sec
Throughput: 0.43 MB/sec
Concurrency: 7.29
Successful transactions: 10000
Failed transactions: 0
Longest transaction: 1.17
Shortest transaction: 0.00
The following is the result of a test with a 1MB file:
neevek@~$ siege -c 1000 -r 10 -b http://127.0.0.1:9090/1MB.txt
** SIEGE 2.71
** Preparing 1000 concurrent users for battle.
The server is now under siege...[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
When siege terminates with the above errors, my file server still spins with a fixed number of WRITABLE SelectionKeys, i.e. Selector.select() keeps returning a fixed number, say 50.
With the above tests, it looks to me as though my file server cannot accept more than 50 concurrent connections: when running the test with the small file, I notice that the server selects 1 or 2 SelectionKeys, whereas when running with the big file, it selects up to 50 every time.
I tried increasing the backlog in Socket.bind(), but it did not help.
What could be the cause of the problem?
EDIT
More info:
When testing with a 1MB file, I noticed that siege terminated with a Broken pipe error, and the file server only accepted 198 connections, though I specified 1000 concurrent connections x 10 rounds (1000*10=10000) to flood the server.
EDIT 2
I have tested with the following code (a single class) to reproduce the same problem. In this code I only accept connections; I don't read or write. The siege client terminates with a Connection reset or Broken pipe error before the connections time out. I also noticed that the Selector can only select fewer than 1000 keys. You may try the code below to witness the problem.
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class TestNIO implements Runnable {
    ServerSocketChannel mServerSocketChannel;
    Selector mSelector;

    public static void main(String[] args) throws Exception {
        new TestNIO().start();
    }

    public TestNIO() throws Exception {
        mSelector = Selector.open();
    }

    public void start() throws Exception {
        mServerSocketChannel = ServerSocketChannel.open();
        mServerSocketChannel.configureBlocking(false);
        mServerSocketChannel.socket().bind(new InetSocketAddress(9090));
        mServerSocketChannel.socket().setSoTimeout(150000);
        mServerSocketChannel.register(mSelector, SelectionKey.OP_ACCEPT);

        int port = mServerSocketChannel.socket().getLocalPort();
        String serverName = "http://" + InetAddress.getLocalHost().getHostName() + ":" + port;
        System.out.println("Server start listening on " + serverName);

        new Thread(this).start();
    }

    @Override
    public void run() {
        try {
            Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
            while (true) {
                int num = mSelector.select();
                System.out.println("SELECT = " + num + "/" + mSelector.keys().size());
                if (num > 0) {
                    Iterator<SelectionKey> keys = mSelector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        final SelectionKey key = keys.next();
                        if (key.isValid() && key.isAcceptable()) {
                            accept(key);
                        }
                    }
                    // clear the selected keys
                    mSelector.selectedKeys().clear();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void accept(SelectionKey key) throws IOException {
        SocketChannel socketChannel = mServerSocketChannel.accept();
        socketChannel.configureBlocking(false);
        socketChannel.socket().setSoTimeout(1000000);
        socketChannel.socket().setKeepAlive(true);
        // since we are connected, we are ready to READ
        socketChannel.register(mSelector, SelectionKey.OP_READ);
    }
}
It is actually related to the default backlog value set for the ServerSocketChannel:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/ServerSocketChannelImpl.java#138
You can fix the issue by passing the backlog value as the second parameter to the bind method:
mServerSocketChannel.socket().bind(new InetSocketAddress(9090), backlog); // backlog is an int, e.g. 1000
Check the ulimit soft and hard limits on the number of open files (file descriptors).
I'm guessing you're using Linux; you can look in limits.conf:
/etc/security/limits.conf
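For reference, a hypothetical limits.conf entry raising the open-file limits for a load-test user (user name and values are placeholders):

# /etc/security/limits.conf
loadtest  soft  nofile  65535
loadtest  hard  nofile  65535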
This problem may not be related to my code: I ran the same test against an nginx server running locally (Mac OS X), and the same error occurred. So it most likely relates to the hardware or to the siege client.

Query on ThreadSafeClientConnManager (Apache HttpClient 4.1.1)

I am using ThreadSafeClientConnManager from Apache HttpComponents Client 4.1.1 for my connection pool.
When I release the connection back to the pool, I say:
cm.releaseConnection(client, -1, TimeUnit.SECONDS);
cm.closeExpiredConnections();
cm.closeIdleConnections(20, TimeUnit.SECONDS);
(Here cm is an object of ThreadSafeClientConnManager.)
As mentioned in the javadoc of releaseConnection(ManagedClientConnection conn, long validDuration, TimeUnit timeUnit), I am setting the valid duration to a negative (<= 0) value.
But when I look at the server logs I find:
org.apache.http.impl.conn.DefaultClientConnection] Connection shut down
2011-08-17 14:12:48.992 DEBUG Other Thread-257 org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager Released connection is not reusable.
2011-08-17 14:12:48.992 DEBUG Other Thread-257 org.apache.http.impl.conn.tsccm.ConnPoolByRoute Releasing connection [HttpRoute[{}->http://server-name:port][null]
2011-08-17 14:12:48.992 DEBUG Other Thread-257 [org.apache.http.impl.conn.tsccm.ConnPoolByRoute] Notifying no-one, there are no waiting threads
2011-08-17 14:12:48.993 DEBUG Other Thread-257 [org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager] Closing expired connections
2011-08-17 14:12:48.993 DEBUG Other Thread-257 [shaded.org.apache.http.impl.conn.tsccm.ConnPoolByRoute] Closing expired connections
2011-08-17 14:12:48.993 DEBUG Other Thread-257 [shaded.org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager] Closing connections idle longer than 20 SECONDS
Here I see in the logs that "Released connection is not reusable".
Does that mean that -1 is not making the connection reusable, and connections are closed instead of being returned to the pool?
If so, can anyone please suggest how I can make it reusable?
Thanks in advance.
By default, HTTP connections released back to the manager are considered non-reusable. If a connection is in a consistent state, it should be marked as reusable by calling ManagedClientConnection#markReusable() prior to being released back to the manager.
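A hedged sketch of that acquire/mark/release cycle against the 4.1-era connection-manager API (the route and the request handling in the middle are hypothetical):

import java.util.concurrent.TimeUnit;
import org.apache.http.HttpHost;
import org.apache.http.conn.ClientConnectionManager;
import org.apache.http.conn.ManagedClientConnection;
import org.apache.http.conn.routing.HttpRoute;

public class MarkReusableExample {
    void useAndRelease(ClientConnectionManager cm) throws Exception {
        HttpRoute route = new HttpRoute(new HttpHost("server-name", 8080)); // hypothetical route
        ManagedClientConnection conn =
                cm.requestConnection(route, null).getConnection(10, TimeUnit.SECONDS);
        try {
            // ... execute the request and fully consume the response ...
            conn.markReusable(); // only if the connection is still in a consistent state
        } finally {
            // a positive validDuration lets the pool keep the connection for reuse
            cm.releaseConnection(conn, 20, TimeUnit.SECONDS);
        }
    }
}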
