The current connections count keeps increasing in my Elasticache Redis node - java

I am using Jedis in a Tomcat web app to connect to an ElastiCache Redis node. The app is used by hundreds of users during the day. I am not sure if this is normal, but whenever I check the current connections count in the CloudWatch metrics, I see it increasing without ever dropping.
This is my Jedis pool configuration:
public static JedisPool getPool(){
    if(pool == null){
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMinIdle(5);
        config.setMaxIdle(35);
        config.setMaxTotal(1500);
        config.setMaxWaitMillis(3000);
        config.setTestOnBorrow(true);
        config.setTestWhileIdle(true);
        pool = new JedisPool(config, PropertiesManager.getInstance().getRedisServer());
    }
    return pool;
}
and this is how I always use the pool connections to execute redis commands:
Jedis jedis = JedisUtil.getPool().getResource();
try{
    //Redis commands
}
catch(JedisException e){
    e.printStackTrace();
    throw e;
}finally{
    if (jedis != null) JedisUtil.getPool().returnResource(jedis);
}
With this configuration, the count is currently over 200. Am I missing something that is supposed to discard or kill unused connections? I set maxIdle to 35 and expected the count to fall to 35 when traffic is very low, but this never happened.

We had the same problem. After investigating a little further, we came across this (from the official Redis documentation, http://redis.io/topics/clients):
By default recent versions of Redis don't close the connection with the client if the client is idle for many seconds: the connection will remain open forever.
By default, AWS sets the timeout value to 0. Therefore, any connection that has been opened to your Redis instance is kept by Redis even if the client side of that connection is gone.
Create a new cache parameter group in AWS with a timeout other than 0 and you should be good.

In the cache parameter group you can edit timeout. It defaults to 0, which leaves idle connections open in Redis. If you set it to 100, Redis will close connections that have been idle for 100 seconds.

You can check the pool size using JMX. Activating the idle evictor thread is a good idea. You can do so by setting the timeBetweenEvictionRunsMillis parameter on the JedisPoolConfig.
If you don't use transactions (EXEC) or blocking operations (BLPOP, BRPOP), you could stick to one connection if connection count is a concern for you. The lettuce client is thread-safe with one connection.
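To make the idle evictor idea concrete, here is a minimal toy model in plain Java. It is not the Jedis or commons-pool implementation; the class and method names are made up for illustration. It only shows the rule an evictor applies on each run: drop connections that have been idle longer than some limit, but never go below minIdle. Without such a periodic run, idle connections simply stay open, which matches the ever-growing count in the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Toy model of an idle evictor; hypothetical names, not the Jedis or commons-pool code. */
class IdleEvictorSketch {
    // Timestamp (ms) at which each idle connection was returned, oldest first.
    private final Deque<Long> idleSince = new ArrayDeque<>();
    private final int minIdle;
    private final long maxIdleMillis;

    IdleEvictorSketch(int minIdle, long maxIdleMillis) {
        this.minIdle = minIdle;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Records a connection coming back to the pool at the given time. */
    void returnConnection(long nowMillis) {
        idleSince.addLast(nowMillis);
    }

    /** One eviction run; returns how many idle connections were dropped. */
    int evict(long nowMillis) {
        int evicted = 0;
        Iterator<Long> it = idleSince.iterator();
        // Stop as soon as only minIdle connections remain.
        while (it.hasNext() && idleSince.size() > minIdle) {
            if (nowMillis - it.next() > maxIdleMillis) {
                it.remove(); // a real pool would close the socket here
                evicted++;
            }
        }
        return evicted;
    }

    int idleCount() {
        return idleSince.size();
    }

    public static void main(String[] args) {
        IdleEvictorSketch pool = new IdleEvictorSketch(5, 60_000);
        for (int i = 0; i < 35; i++) {
            pool.returnConnection(0);
        }
        System.out.println("evicted=" + pool.evict(120_000) + " idle=" + pool.idleCount());
    }
}
```

In a real pool the evict step would run on a timer, which is what setting timeBetweenEvictionRunsMillis switches on.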

Understanding the lifecycle of a connection managed by PoolingHttpClientConnectionManager in Apache HTTP client

I felt very confused after reading the Connection Management doc of the Apache HTTP components module, and also a few other resources on connection keep alive strategy and connection eviction policy.
There are a bunch of adjectives used in there to describe the state of a connection like stale, idle, available, expired and closed etc. There isn't a lifecycle diagram describing how a connection changes among these states.
My confusion mainly arose from the situation below.
I set a ConnectionKeepAliveStrategy that provides a keep-alive duration of 5 seconds via the following code snippet.
ConnectionKeepAliveStrategy keepAliveStrategy = ( httpResponse, httpContext ) -> {
    HeaderElementIterator iterator =
        new BasicHeaderElementIterator( httpResponse.headerIterator( HTTP.CONN_KEEP_ALIVE ) );
    while ( iterator.hasNext() )
    {
        HeaderElement header = iterator.nextElement();
        if ( header.getValue() != null && header.getName().equalsIgnoreCase( "timeout" ) )
        {
            return Long.parseLong( header.getValue(), 10) * 1000;
        }
    }
    return 5 * 1000;
};
this.client = HttpAsyncClients.custom()
    .setDefaultRequestConfig( requestConfig )
    .setMaxConnTotal( 500 )
    .setMaxConnPerRoute( 500 )
    .setConnectionManager( this.cm )
    .setKeepAliveStrategy( keepAliveStrategy )
    .build();
The server I am talking to does support keep-alive connections. When I printed the pool stats of the connection manager after executing about 200 requests asynchronously in a single batch, I observed the following.
Total Stats:
-----------------
Available: 139
Leased: 0
Max: 500
Pending: 0
After waiting for 30 seconds (by which time the keep-alive timeout had long been exceeded), I started a new batch of the same HTTP calls. Upon inspecting the connection manager pool stats, the number of available connections is still 139.
Shouldn't it be zero since the keep-alive timeout had been reached? The PoolStats Java doc states that Available is "the number of idle persistent connections". Are idle persistent connections considered alive?
I think Apache HttpClient: How to auto close connections by server's keep-alive time is a close hit but hope some expert could give an insightful explanation about the lifecycle of a connection managed by PoolingHttpClientConnectionManager.
Some other general questions:
Does the default connection manager used in HttpAsyncClients.createDefault() handle connection keep-alive strategy and connection eviction on its own?
What are the requirements/limitations that could call for implementing them on a custom basis? Will they contradict each other?
Documenting some of my further findings, which might partially serve as an answer.
Whether using a ConnectionKeepAliveStrategy to set a timeout on the keep alive session or not, the connections will end up in the TCP state of ESTABLISHED, as inspected via netstat -apt. And I observed that they are automatically recycled after around 5 minutes in my Linux test environment.
When NOT using a ConnectionKeepAliveStrategy, upon a second request batch the established connections will be reused.
When using a ConnectionKeepAliveStrategy and its timeout has NOT been reached, upon a second request batch the established connections will be reused.
When using a ConnectionKeepAliveStrategy and its timeout has been exceeded, upon a second request batch, the established connections will be recycled into the TIME_WAIT state, indicating that client side has decided to close the connections.
This recycling can be actively exercised by calling connectionManager.closeExpiredConnections(); in a separate connection-evicting thread, which will move the connections into the TIME_WAIT state.
I think the general observation is that ESTABLISHED connections are deemed as Available by the connection pool stats, and the connection keep alive strategy with a timeout does put the connections into expiry, but it only takes effect when new requests are processed, or when we specifically instruct the connection manager to close expired connections.
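The observation above can be sketched as a toy model in plain Java. This is not Apache HttpClient code, and all names here are hypothetical; it only mimics the behaviour I observed. Each released connection gets a keep-alive deadline, the "available" count ignores deadlines, and expired entries only disappear when a lease walks over them or when an explicit closeExpired call is made.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of lazy keep-alive expiry; hypothetical names, not Apache HttpClient code. */
class LazyExpiryPoolSketch {
    // Keep-alive deadline (ms) of each idle connection, oldest first.
    private final Deque<Long> expiryDeadlines = new ArrayDeque<>();
    private final long keepAliveMillis;

    LazyExpiryPoolSketch(long keepAliveMillis) {
        this.keepAliveMillis = keepAliveMillis;
    }

    /** A connection is handed back and becomes "available" until its deadline passes. */
    void release(long nowMillis) {
        expiryDeadlines.addLast(nowMillis + keepAliveMillis);
    }

    /** Counts every idle connection, expired or not, like the pool stats in the question. */
    int available() {
        return expiryDeadlines.size();
    }

    /** Returns true if a live connection was reused; expired ones met on the way are dropped. */
    boolean lease(long nowMillis) {
        Long deadline;
        while ((deadline = expiryDeadlines.pollFirst()) != null) {
            if (deadline >= nowMillis) {
                return true; // still within keep-alive: reuse it
            }
            // expired: a real pool would close the socket here and try the next one
        }
        return false; // nothing reusable: the caller has to open a new connection
    }

    /** Explicit eviction, as an evictor thread would do; returns the number closed. */
    int closeExpired(long nowMillis) {
        int before = expiryDeadlines.size();
        expiryDeadlines.removeIf(deadline -> deadline < nowMillis);
        return before - expiryDeadlines.size();
    }

    public static void main(String[] args) {
        LazyExpiryPoolSketch pool = new LazyExpiryPoolSketch(5_000);
        for (int i = 0; i < 139; i++) {
            pool.release(0);
        }
        System.out.println("available after 30 s without eviction: " + pool.available());
        System.out.println("closed by explicit eviction: " + pool.closeExpired(30_000));
    }
}
```

In this model nothing changes the available count between batches unless something calls lease or closeExpired, which is the same pattern as the 139 available connections that persisted after the 30 second wait.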
TCP state diagram from Wikipedia for reference.

Spymemcached and Connection Failures

I thought Spymemcached attempts to re-establish the connection to the server when the connection gets lost.
But I am seeing something different; wondering what I misunderstand or what I do wrong. Here is some sample code:
MemcachedClient c=new MemcachedClient(AddrUtil.getAddresses("server:11211"));
while(true)
    try {
        System.out.println(c.get("hatsts"));
        Thread.sleep(10000);
    } catch(Exception e) {
        e.printStackTrace();
    }
It runs initially without problem. Then I pull the network plug. Subsequently, the client detects a network failure and throws following exception:
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
But then, when I re-establish the network, the client does not recover and continues throwing the exception, even after 5 minutes. I tried SpyMemcached 2.10.6 and 2.9.0.
What am I missing?
The problem here is that because you pulled the network cable, the TCP socket on your client still thinks the connection is valid. The TCP keepalive varies from operating system to operating system and can be as high as 30 minutes. As a result, the application (in this case Spymemcached) is not notified by the TCP layer that the connection is no longer valid, and therefore Spymemcached doesn't try to reconnect.
The way Spymemcached detects this situation is by counting the number of consecutive operation timeouts. The last time I checked, the default value was 99. Once this many ops time out, Spymemcached will reconnect. You can change this value in the ConnectionFactory if you want to set it to some other value. There's a function called getContinuousTimeout(), which is where Spymemcached gets 99 from by default. You can construct your own ConnectionFactory with the ConnectionFactoryBuilder.
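The counting rule is easier to see in code. Below is a minimal sketch in plain Java; it is not Spymemcached's source, and the class and method names are hypothetical. A success resets the counter, and a reconnect is only triggered once the number of consecutive timeouts reaches the threshold.

```java
/** Toy model of reconnect-after-N-consecutive-timeouts; hypothetical names, not Spymemcached code. */
class TimeoutTrackerSketch {
    private final int threshold;
    private int consecutiveTimeouts = 0;
    private int reconnects = 0;

    TimeoutTrackerSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Any successful operation proves the connection works, so the streak is reset. */
    void onSuccess() {
        consecutiveTimeouts = 0;
    }

    /** Returns true if this timeout completed the streak and triggered a reconnect. */
    boolean onTimeout() {
        consecutiveTimeouts++;
        if (consecutiveTimeouts >= threshold) {
            reconnects++;            // a real client would tear down and reopen the socket here
            consecutiveTimeouts = 0; // start counting again on the new connection
            return true;
        }
        return false;
    }

    int reconnects() {
        return reconnects;
    }

    public static void main(String[] args) {
        TimeoutTrackerSketch tracker = new TimeoutTrackerSketch(99);
        int operations = 0;
        boolean reconnected = false;
        while (!reconnected) {
            operations++;
            reconnected = tracker.onTimeout();
        }
        System.out.println("operations needed before reconnect: " + operations);
    }
}
```

With the loop from the question, which issues one get every 10 seconds, a threshold of 99 means at least 99 x 10 s = 990 s, roughly 16.5 minutes, of failing operations before a reconnect is attempted. That would explain why nothing had recovered after 5 minutes.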
Hopefully this is enough information to answer your question and get you going in the right direction. If not let me know and I can add some more details.

How to monitor where my connections go?

I run a Tomcat application that uses a JNDI connection pool.
After some time the connection pool stops handing out connections and the application hangs.
It seems that some code obtains a connection and doesn't return it to the pool.
How can I monitor which code does this?
More generally, I want to see what all connections are doing at the moment.
I cannot change the application, but I can adjust Tomcat, maybe add some interceptors.
Most connection pool implementations can be configured to detect connections that are not returned to the pool. E.g. for Tomcat's JDBC connection pool there are various configuration options for "abandoned connections" (connections for which the lease expired). If you search for "Abandoned" on this web page, you'll find the options:
removeAbandoned
removeAbandonedTimeout
logAbandoned
suspectTimeout
As mentioned on the web-page, these settings will add a little overhead but at least your application will not hang. When testing your application, set a low value for removeAbandonedTimeout and a low value for maxActive so that you can catch unreturned connections early.
I never use the connection pool API itself, I always wrap it in a helper.
That way, I can do this in the helper:
private Exception created = (0 == 1) ? new Exception() : null;
When I run into problems like yours, I just change one character (0 -> 1) to have a stack trace of who created this instance in my debugger.
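A slightly larger sketch of the same trick, in plain Java with hypothetical names: the helper records a Throwable at borrow time for every lease, so the stack trace of whoever still holds a connection can be inspected later. This is only an illustration of the idea, not Tomcat's abandoned-connection logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy lease tracker that remembers where each connection was borrowed; hypothetical names. */
class LeaseTrackerSketch {
    // Lease id -> Throwable captured at borrow time (its stack trace is the borrow site).
    private final Map<Integer, Throwable> openLeases = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Call when a connection is handed out; returns an id to pass to returned(). */
    int borrowed() {
        int id = nextId.incrementAndGet();
        openLeases.put(id, new Throwable("connection borrowed here"));
        return id;
    }

    /** Call when the connection comes back to the pool. */
    void returned(int id) {
        openLeases.remove(id);
    }

    int openCount() {
        return openLeases.size();
    }

    /** Stack trace of the code that took the lease, or null if it was already returned. */
    StackTraceElement[] borrowSite(int id) {
        Throwable origin = openLeases.get(id);
        return origin == null ? null : origin.getStackTrace();
    }

    public static void main(String[] args) {
        LeaseTrackerSketch tracker = new LeaseTrackerSketch();
        int returnedLease = tracker.borrowed();
        int leakedLease = tracker.borrowed(); // never returned
        tracker.returned(returnedLease);
        System.out.println("open leases: " + tracker.openCount());
        for (StackTraceElement frame : tracker.borrowSite(leakedLease)) {
            System.out.println("  at " + frame);
        }
    }
}
```

Capturing a stack trace on every borrow has a cost, which is why the one-character switch above is handy: the tracking stays off until it is needed.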

Putting Jsch into connection pool in details

I put Jsch into commons-pool (with spring pool support) with initial success
http://docs.spring.io/spring/docs/3.2.4.RELEASE/spring-framework-reference/htmlsingle/#aop-ts-pool
However:
Should we pool the channels within the Session instead of pooling the sessions? Each Jsch session creates one thread. Pooling Jsch sessions will create x threads. Pooling channels, there will really be only one Jsch thread.
(commons-pool) what happens if the Jsch session went stale? How to regenerate the session in the context of the commons-pool or using spring pool support? How to detect whether it goes stale?
Thanks
Figured out my own question. I will share my project in the next day or two.
Pooling channels is much more effective. There is really no need to create multiple sessions (if the sessions connect to the same SFTP endpoint).
I implemented a JSch connection pool (pooling channels) with spring pool and commons-pool. I will post it to GitHub in the next day or two. The most important question is what happens if the connection goes stale.
I found that with my implementation of one Session with multiple channels, if the connection goes stale, the pooled objects (in this case, the channels) go stale too. Such a pooled object should be invalidated and removed from the pool. When the connection comes back up and a new application thread "borrows" from the pool, new pool objects will be created.
To validate my observation, my not-so-automated test:
a) Create a set (say 10) of app threads checking out channel resource from the pool.
b) Have the threads sleep 20 seconds
c) Create another set of app threads checking out channel resources from the pool.
At a), set a breakpoint when i==7 and break the connection by "iptable drop (linux) or pfctl -e; pfctl -f /etc/pf.conf (mac, google how to do!)". This first set of app threads will get an exception because the channel is broken.
At b), restart the connection
At c), the 2nd set of app threads will be successfully completing the operation because the broken connection has been restored.
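The invalidate-and-recreate behaviour described above can be sketched in plain Java without JSch or commons-pool. Everything here is a hypothetical stand-in: the factory plays the role of opening a new channel and the validity check plays the role of asking whether a channel is still connected.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Toy pool that validates on borrow and drops stale objects; hypothetical names. */
class ValidatingPoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;   // stands in for "open a new channel"
    private final Predicate<T> isValid;  // stands in for "is this channel still connected?"

    ValidatingPoolSketch(Supplier<T> factory, Predicate<T> isValid) {
        this.factory = factory;
        this.isValid = isValid;
    }

    /** Hands out the first valid idle object, discarding stale ones; creates one if none is left. */
    T borrow() {
        T candidate;
        while ((candidate = idle.pollFirst()) != null) {
            if (isValid.test(candidate)) {
                return candidate;
            }
            // stale: dropped from the pool, a real pool would also close it
        }
        return factory.get();
    }

    /** Only objects that are still valid go back into the pool. */
    void giveBack(T object) {
        if (isValid.test(object)) {
            idle.addLast(object);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // A one-element boolean array stands in for a channel; [0] means "connected".
        ValidatingPoolSketch<boolean[]> pool =
            new ValidatingPoolSketch<>(() -> new boolean[] {true}, channel -> channel[0]);
        boolean[] first = pool.borrow();
        pool.giveBack(first);
        first[0] = false; // the connection breaks while the channel sits idle
        boolean[] second = pool.borrow();
        System.out.println("got a fresh channel: " + (second != first));
    }
}
```

As far as I know, commons-pool offers the same idea through validation on borrow, so the hand-rolled loop here is only meant to show the principle.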

The use of c3p0.idle_test_period.

I'm new to c3p0, and confused about the use of:
c3p0.idle_test_period
In this link : HowTo configure the C3P0 connection pool
idleTestPeriod: Must be set in hibernate.cfg.xml (or hibernate.properties). Hibernate default: 0. If this is a number greater than 0, c3p0 will test all idle, pooled but unchecked-out connections, every this number of seconds.
What is the purpose of this kind of test (idle, pooled connections), and what is the relationship between c3p0.idle_test_period and c3p0.timeout?
The database server may close a connection on its side after a certain amount of time - causing some error in your application, because it'll attempt to send a query on a connection which is no longer available on the server side.
In order to avoid this you can let the pool periodically check a connection (think of a ping) for its validity. This is what idle_test_period is for.
timeout is the timespan after which the pool will remove a connection from the pool, because the connection wasn't checked out (used) for a while and the pool contains more connections than c3p0.min_size.
I think this setting is used in Hibernate to validate pooled connections every few seconds. Consider a scenario where the password is changed on the database side. Hibernate has already pooled connections under the old password, so keeping a pool that still uses the old password is a security breach. When Hibernate validates the connections after a few seconds, it will invalidate those pooled connections.
