I have a Redis server and I have written a separate RedisManager class for managing the Jedis connections. The code for RedisManager is as follows:
package RedisServerPackage;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisManager {

    private static final RedisManager instance = new RedisManager();
    private static final JedisPoolConfig poolConfig = new JedisPoolConfig();
    private static JedisPool pool = null;

    private RedisManager() {}

    public static final RedisManager getInstance() {
        if (pool == null) {
            poolConfig.setMaxTotal(-1);
            pool = new JedisPool(poolConfig, "localhost");
        }
        return instance;
    }

    public void release() {
        pool.destroy();
    }

    public Jedis getJedis() {
        return pool.getResource();
    }

    public void returnJedis(Jedis jedis) {
        pool.returnResource(jedis);
    }
}
Now I execute my code with about 1000 clients hitting the server and performing certain operations using the Pub/Sub model. Monitoring the redis-server, I found that at most 45 clients were active at a time and the maximum number of blocked clients was around 39. After running the client code for about 5 minutes or so, I get this exception:
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:50)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:88)
at RedisServerPackage.RedisManager.getJedis(RedisManager.java:31)
at RedisServerPackage.RedisQueue.dequeue(RedisQueue.java:45)
at RedisServerPackage.QueueProcessor.run(QueueProcessor.java:22)
at java.lang.Thread.run(Thread.java:745)
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Address already in use
at redis.clients.jedis.Connection.connect(Connection.java:148)
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:75)
at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1572)
at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:69)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at redis.clients.util.Pool.getResource(Pool.java:48)
... 5 more
Caused by: java.net.ConnectException: Address already in use
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at redis.clients.jedis.Connection.connect(Connection.java:142)
... 12 more
I am not able to figure out what is causing this exception. Also, I am reusing the Jedis instances; example code:
public void jedisExample(String temporaryString) {
    // manager is an instance of the RedisManager class shown above.
    Jedis jedis = manager.getJedis();
    try {
        // Some code here
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        manager.returnJedis(jedis);
    }
}
I had this exception happening intermittently on macOS when trying to load test my server app.
It turned out the problem was that macOS has only about 16K ephemeral ports available, and a port is not released until the socket's TIME_WAIT has passed. The default value of the controlling timeout is 15 seconds.
You can check yours via
sysctl net.inet.tcp.msl
To fix it temporarily to allow load testing, I used
sudo sysctl -w net.inet.tcp.msl=1000
This reduced TIME_WAIT to one second, allowing connections to be created and released faster. In turn it enabled Tomcat to convert REST requests into Redis Pub/Sub messages at a rate of about 4000 qps, with zero errors after four hours of bombardment from 16 concurrent Siege threads. Before the change, about 1% of requests would error out with the exception above.
The author of the question did not state the OS, but I hope this answer helps someone else running into a similar situation, because this entry comes up on top when searching for this Jedis exception. Basically, check your TIME_WAIT settings when load testing, regardless of OS.
UPDATE
Warning: do not do this in production! Ideally, increase the value back to 15 seconds after each load-testing round on your workstation. Decreasing TIME_WAIT can be dangerous, because sockets become reusable sooner after closing, and delayed packets may arrive at a newly opened connection, causing unpredictable errors or even compromising security. Read up on TCP/IP and TIME_WAIT, or consult your network engineer, before following the instructions above.
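A complementary measure, regardless of OS, is to bound the Jedis pool itself rather than leaving it unlimited (the question's code sets maxTotal to -1), so that socket churn is capped at the source. A minimal sketch, with assumed pool limits and host/port values:

import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class BoundedRedisPool {

    public static JedisPool create() {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        // Cap the pool instead of -1 (unbounded); 128 is an assumed value, sized
        // to the observed peak of ~45 active clients plus some headroom.
        poolConfig.setMaxTotal(128);
        poolConfig.setMaxIdle(64);
        poolConfig.setMinIdle(8);
        // Fail fast instead of opening ever more sockets when the pool is busy.
        poolConfig.setBlockWhenExhausted(true);
        poolConfig.setMaxWaitMillis(2000);
        // localhost, port 6379 and the 2000 ms connection timeout are assumptions.
        return new JedisPool(poolConfig, "localhost", 6379, 2000);
    }
}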
Related
I am developing an application using Play Framework (version 2.8.0) and Java (version 1.8) with an Oracle database (version 12c).
There is at most one hit to the database per day (often none), yet I am getting the error below:
java.sql.SQLRecoverableException: IO Error: Socket read timed out
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:919)
at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:2005)
at com.zaxxer.hikari.pool.PoolBase.quietlyCloseConnection(PoolBase.java:138)
at com.zaxxer.hikari.pool.HikariPool.lambda$closeConnection$1(HikariPool.java:447)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Socket read timed out
at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:174)
at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
at oracle.jdbc.driver.T4C7Ocommoncall.doOLOGOFF(T4C7Ocommoncall.java:62)
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:908)
... 6 common frames omitted
db {
default {
driver=oracle.jdbc.OracleDriver
url="jdbc:oracle:thin:#XXX.XXX.XXX.XX:XXXX/XXXXXXX"
username="XXXXXXXXX"
password="XXXXXXXXX"
hikaricp {
dataSource {
cachePrepStmts = true
prepStmtCacheSize = 250
prepStmtCacheSqlLimit = 2048
}
}
}
}
It seems this is caused by an inactive database connection. How can I solve it? Please let me know if any other information is required.
You can enable TCP keepalive for JDBC, either by setting the corresponding driver directive or by adding "ENABLE=BROKEN" to the connection string.
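As a sketch of the second option, "ENABLE=BROKEN" goes into a full connect descriptor form of the url (the host, port, and service name below mirror the masked placeholders from the question; verify the exact descriptor syntax against the Oracle Net documentation):
url="jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=TCP)(HOST=XXX.XXX.XXX.XX)(PORT=XXXX))(CONNECT_DATA=(SERVICE_NAME=XXXXXXX)))"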
Usually Cisco/Juniper firewalls cut off a TCP connection when it has been inactive for more than one hour, while the Linux kernel only starts sending keepalive probes after two hours (tcp_keepalive_time). So if you decide to turn TCP keepalive on, you will also need root access to lower this kernel tunable to a smaller value (10-15 minutes).
Moreover, HikariCP should not keep any connection open for longer than 30 minutes by default.
So if your firewall, the Linux kernel, and HikariCP all use default settings, this error should not occur in your system.
See the official HikariCP documentation:
maxLifetime:
This property controls the maximum lifetime of a connection in the pool. An in-use connection will never be retired, only when it is closed will it then be removed. On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool. We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit. A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting. The minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30 minutes)
I have added the configuration below for HikariCP in the configuration file and it is working fine.
## Database Connection Pool
play.db.pool = hikaricp
play.db.prototype.hikaricp.connectionTimeout=120000
play.db.prototype.hikaricp.idleTimeout=15000
play.db.prototype.hikaricp.leakDetectionThreshold=120000
play.db.prototype.hikaricp.validationTimeout=10000
play.db.prototype.hikaricp.maxLifetime=120000
We are using a TCP server. Sometimes we get the following exception:
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
I googled it and found that it happens when the timeout set by the setSoTimeout(timeInMillis) method is exceeded, but we didn't invoke that method.
Sample TCP server:
class TCPServer {

    // serverSocket_backlog, workerManager, TCPConnectionHandler, log() and
    // logException() come from the rest of the application (not shown here).

    public static void main(String[] args) {
        // in the real application, ip and port come from configuration
        new TCPServer().initializeConnectionHandler("127.0.0.1", 9090);
    }

    public void initializeConnectionHandler(String ip, int port) {
        try {
            ServerSocket serverSocket = new ServerSocket(port, serverSocket_backlog, InetAddress.getByName(ip));
            log("Waiting for client on port: " + serverSocket.getLocalPort());
            while (true) {
                Socket socket = serverSocket.accept();
                log("Just connected to " + socket.getRemoteSocketAddress());
                Runnable tcpConnectionHandler = new TCPConnectionHandler(serverSocket, socket, workerManager);
                new Thread(tcpConnectionHandler).start();
            }
        } catch (Exception e) {
            log("Exception occurred while initializing ConnectionHandlers: " + e);
            logException(e);
            System.exit(0);
        }
    }
}
You're right that this exception should only occur if you've set a timeout by calling setSoTimeout. Nevertheless, if you go through the search results you can find occasional discussions of this effect, e.g. in Tomcat, where they assumed they had found a bug in the JVM and added a workaround to keep the server running.
Another result (in German, sorry) reporting this exception with Jenkins indicates that there may also be problems when IPv6 is active. The solution there was to set the system property java.net.preferIPv4Stack to true.
I'm not a fan of the latter "solution", but as for the workaround the Tomcat team implemented, I suggest you do the same. At the very least, don't stop the whole accept loop when an exception like this occurs.
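A minimal sketch of such a resilient accept loop (handleClient and the logging calls are placeholders for your own TCPConnectionHandler and logging code):

import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ResilientAcceptLoop {

    public void serve(String ip, int port, int backlog) throws IOException {
        ServerSocket serverSocket = new ServerSocket(port, backlog, InetAddress.getByName(ip));
        while (true) {
            try {
                Socket socket = serverSocket.accept();
                // hand the connection off to a worker, as in the original code
                new Thread(() -> handleClient(socket)).start();
            } catch (SocketTimeoutException e) {
                // spurious "Accept timed out": log it and keep accepting
                System.err.println("accept() timed out, retrying: " + e);
            } catch (IOException e) {
                // other I/O errors: log and decide whether to retry or shut down,
                // but do not System.exit() on every exception
                System.err.println("accept() failed: " + e);
            }
        }
    }

    private void handleClient(Socket socket) {
        // placeholder for the real connection-handling logic
    }
}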
Internally it calls the socketAccept method of the TwoStacksPlainSocketImpl class.
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
The method is native, so you cannot fully trace the reason from the Java side, but below is the native code from OpenJDK:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/windows/native/java/net/TwoStacksPlainSocketImpl.c
There you can find where the mentioned exception is thrown.
I have a web app served by Jetty + MySQL. I'm running into an issue where my database connection pool gets exhausted and all threads start blocking while waiting for a connection. I've tried two database connection pool libraries: (1) BoneCP (2) HikariCP. Both exhibit the same behavior with my app.
I've taken several thread dumps when I see this state, and all the blocked threads look like this (not picking on BoneCP, I'm sure it's something on my end now):
"qtp1218743501-131" prio=10 tid=0x00007fb858295800 nid=0x669b waiting on condition [0x00007fb8cd5d3000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000763f42d20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:82)
at com.jolbox.bonecp.AbstractConnectionStrategy.getConnection(AbstractConnectionStrategy.java:90)
at com.jolbox.bonecp.BoneCP.getConnection(BoneCP.java:553)
at com.me.Foo.start(Foo.java:30)
...
I'm not sure where to go from here. I was expecting to see some stack traces in the thread dump where my code was stuck doing some lengthy operation, not waiting for a connection. For example, if my code looks like this:
public class Foo {
    public void start() {
        Connection conn = threadPool.getConnection();
        work(conn);
        conn.close();
    }

    public void work(Connection conn) {
        // .. something lengthy, like scanning every row in the database, etc ..
    }
}
I would expect one of the threads above to have a stack trace that shows it working away in the work() method:
...
at com.me.mycode.Foo.work()
at com.me.mycode.Foo.start()
but instead they're just all waiting for a connection:
...
at com.jolbox.bonecp.BoneCP.getConnection() // ?
at com.me.mycode.Foo.work()
at com.me.mycode.Foo.start()
Any thoughts on how to continue debugging would be great.
Some other background: the app operates normally for about 45 minutes, and memory and thread dumps show nothing out of the ordinary. Then the condition is triggered and the thread count spikes. I started thinking it might be some combination of SQL statements the app performs that results in some sort of lock on the MySQL side, but again I would expect some of the stack traces above to show threads stuck in that part of the code.
The thread dumps were taken using visualvm.
Thanks
Take advantage of the configuration options for the connection pool (see BoneCPConfig / HikariConfig). First of all, set a connection timeout (HikariCP connectionTimeout) and a leak-detection timeout (HikariCP leakDetectionThreshold; I could not find the counterpart in BoneCP). There may be more configuration options that dump stack traces when something is not quite right.
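For illustration, a minimal HikariCP setup along those lines might look like the sketch below; the JDBC URL, credentials, pool size, and timeout values are assumptions, so adjust them to your environment.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource createDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        config.setUsername("user");                            // placeholder credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(20);
        // Fail after 30 s instead of parking forever when the pool is exhausted.
        config.setConnectionTimeout(30_000);
        // Log a stack trace for any connection held longer than 60 s; this points
        // directly at code paths that never return their connection to the pool.
        config.setLeakDetectionThreshold(60_000);
        return new HikariDataSource(config);
    }
}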
My guess is that your application does not always return a connection to the pool and after 45 minutes has no connection in the pool anymore (and thus blocks forever trying to get a connection from the pool). Treat a connection like opening/closing a file, i.e. always use try/finally:
public void start() {
    Connection conn = null;
    try {
        work(conn = dbPool.getConnection());
    } finally {
        if (conn != null) {
            conn.close();
        }
    }
}
Finally, both connection pools have options to allow JMX monitoring. You can use this to monitor for strange behavior in the pool.
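If you go the JMX route, recent HikariCP versions also expose the pool counters programmatically; a small sketch, assuming HikariDataSource.getHikariPoolMXBean() is available in your HikariCP version:

import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

public class PoolMonitoring {
    // Logs how many connections are active and idle, and how many threads are
    // waiting. Call this periodically while reproducing the hang.
    public static void logPoolStats(HikariDataSource ds) {
        HikariPoolMXBean pool = ds.getHikariPoolMXBean();
        System.out.println("active=" + pool.getActiveConnections()
                + " idle=" + pool.getIdleConnections()
                + " waiting=" + pool.getThreadsAwaitingConnection()
                + " total=" + pool.getTotalConnections());
    }
}

Set registerMbeans=true on the pool configuration if you also want these values visible in JMX tools such as jconsole.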
I question the whole design.
If you have a blocking wait in multithreaded network IO, you need a better implementation of the connection handling.
I suggest you take a look at non-blocking IO (the java.nio channels package), or make your locking more granular.
I use siege to stress test my hand-built file server. It works pretty well for small files (less than 1KB), but when tested with a 1MB file it does not work as expected.
The following is the result of the test with a small file:
neevek@~$ siege -c 1000 -r 10 -b http://127.0.0.1:9090/1KB.txt
** SIEGE 2.71
** Preparing 1000 concurrent users for battle.
The server is now under siege.. done.
Transactions: 10000 hits
Availability: 100.00 %
Elapsed time: 9.17 secs
Data transferred: 3.93 MB
Response time: 0.01 secs
Transaction rate: 1090.51 trans/sec
Throughput: 0.43 MB/sec
Concurrency: 7.29
Successful transactions: 10000
Failed transactions: 0
Longest transaction: 1.17
Shortest transaction: 0.00
The following is the result of a test with a 1MB file:
neevek@~$ siege -c 1000 -r 10 -b http://127.0.0.1:9090/1MB.txt
** SIEGE 2.71
** Preparing 1000 concurrent users for battle.
The server is now under siege...[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
[error] socket: read error Connection reset by peer sock.c:460: Connection reset by peer
When siege terminates with the above errors, my file server still spins with a fixed number of WRITABLE SelectionKeys, i.e. Selector.select() keeps returning a fixed number, say 50.
From the above tests, it looks to me like my file server cannot accept more than 50 concurrent connections: when running the test with the small file, I notice that the server selects 1 or 2 SelectionKeys, while with the big file it selects up to 50 every time.
I tried increasing the backlog in Socket.bind() with no help.
What could be the cause of the problem?
EDIT
More info:
When testing with a 1MB file, I noticed that siege terminated with a Broken pipe error, and the file server only accepted 198 connections, though I specified 1000 concurrent connections x 10 rounds(1000*10=10000) to flood the server.
EDIT 2
I have tested with the following code (a single class) to reproduce the same problem. In this code I only accept connections; I don't read or write. The siege client terminated with Connection reset or Broken pipe errors before the connections timed out. I also noticed that the Selector can only select fewer than 1000 keys. You may try the code below to witness the problem.
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class TestNIO implements Runnable {

    ServerSocketChannel mServerSocketChannel;
    Selector mSelector;

    public static void main(String[] args) throws Exception {
        new TestNIO().start();
    }

    public TestNIO() throws Exception {
        mSelector = Selector.open();
    }

    public void start() throws Exception {
        mServerSocketChannel = ServerSocketChannel.open();
        mServerSocketChannel.configureBlocking(false);
        mServerSocketChannel.socket().bind(new InetSocketAddress(9090));
        mServerSocketChannel.socket().setSoTimeout(150000);
        mServerSocketChannel.register(mSelector, SelectionKey.OP_ACCEPT);

        int port = mServerSocketChannel.socket().getLocalPort();
        String serverName = "http://" + InetAddress.getLocalHost().getHostName() + ":" + port;
        System.out.println("Server start listening on " + serverName);

        new Thread(this).start();
    }

    @Override
    public void run() {
        try {
            Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
            while (true) {
                int num = mSelector.select();
                System.out.println("SELECT = " + num + "/" + mSelector.keys().size());
                if (num > 0) {
                    Iterator<SelectionKey> keys = mSelector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        final SelectionKey key = keys.next();
                        if (key.isValid() && key.isAcceptable()) {
                            accept(key);
                        }
                    }
                    // clear the selected keys
                    mSelector.selectedKeys().clear();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void accept(SelectionKey key) throws IOException {
        SocketChannel socketChannel = mServerSocketChannel.accept();
        socketChannel.configureBlocking(false);
        socketChannel.socket().setSoTimeout(1000000);
        socketChannel.socket().setKeepAlive(true);
        // since we are connected, we are ready to READ
        socketChannel.register(mSelector, SelectionKey.OP_READ);
    }
}
It is actually related to the default backlog value set for the ServerSocketChannel:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/ServerSocketChannelImpl.java#138
You can fix the issue by passing the backlog value as the second parameter to the bind method:
mServerSocketChannel.socket().bind(new InetSocketAddress(9090), backlog); // backlog is an int, e.g. 1000
Check the ulimit (soft limit) and the hard limit on the number of open files (file descriptors).
I'm guessing you're using Linux. You can look in limits.conf:
/etc/security/limits.conf
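For example, the current soft and hard limits for open files can be checked with
ulimit -n
ulimit -Hn
and raised per user in /etc/security/limits.conf.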
This problem may not be related to my code. I ran the same test against an nginx server running locally (macOS) and the same error occurred, so it most likely relates to the hardware or the siege client.
We are porting a simple Java application between Tandem NonStop systems, from the G-Series to the H-Series. The Java version is 1.5.0_02.
When performing basic I/O tasks like getting an output stream from a connection or opening a client socket, we receive exceptions like:
java.io.IOException: Value out of range
or
java.net.SocketException: Value out of range
("value out of range" is Tandem native jargon for, well, quite everything I suppose).
Has anybody had similar issues, i.e. I/O corruption while, for example, messing with JNI?
I suppose there is something wrong with the system, but where might it be?
Thank you.
EDIT:
adding snippets as requested
sample snippet (a) - using Runtime.exec () (adapted)
Properties envVars = new Properties();
Runtime r = Runtime.getRuntime(); // r was implicit in the original snippet
Process p = r.exec("/bin/env");
envVars.load(p.getInputStream());
Stack trace (a):
java.io.IOException: Value out of range (errno:4034)
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:194)
at java.lang.UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:221)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at util.Environment.getVariables(Environment.java:39)
The last line fails, and the output gets redirected to the console (!).
sample snippet (b) - using HttpURLConnection:
public WorkerThread (HttpURLConnection conn, String requestData, Logger logger)
{
    this.conn = conn;
    ...
}

public void run ()
{
    OutputStream out = conn.getOutputStream ();
}
Stack trace (b):
java.net.SocketException: Value out of range (errno:4034)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.Socket.connect(Socket.java:507)
at sun.net.NetworkClient.doConnect(NetworkClient.java:155)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:365)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:477)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:280)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:337)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:176)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:736)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:162)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:828)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:230)
Case (a) can be avoided, since it was a workaround for other issues with a previous JRE version (!), but the same behaviour with sockets is really nasty.
Error code 4034 seems to indicate that a specific server is not running in your NonStop cluster. Are you sure that your system is set up properly?
Update: The problem was caused by a spurious .so library.