We are using Jersey Server-Sent Events (SSE) to allow remote components of our application to listen to events raised by our Jersey/Tomcat server. This works great.
However, it is crucial that our server have an accurate list of currently-connected listeners (our remote components). To this end, our server sends a tiny message to each caller (via eventOutput.write) once every five seconds. If our remote component is shut down while SSE-connected, or if the remote computer is powered off while SSE-connected, our server's eventOutput.write throws the ClientAbortException/SocketException exception shown below. That's perfect: we catch the exception, mark that caller as no longer connected, and move on.
Now, for the problem. As I mentioned, eventOutput.write throws an exception in cases where our remote component software is not running, or where the computer it runs on has been powered down. However, there are two cases where calling eventOutput.write to a no-longer-connected computer does NOT throw an exception: 1) if the Ethernet cable of the remote computer is simply pulled while the caller is SSE-connected, and 2) if the network adapter in the remote computer is turned off (e.g., by an administrative action) while the caller is SSE-connected. In these two cases, we can call eventOutput.write to the remote computer every five seconds for hours and no exception is ever thrown. This makes it impossible to detect that the remote computer is no longer connected.
I see that EventOutput (and ChunkedOutput) has very few methods and properties, but I wonder if there is any way to configure or use it that will cause an exception to be thrown when writing to a remote computer that has been disconnected by having its Ethernet cable pulled or network adapter turned off.
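For reference, here is a minimal sketch of the kind of keep-alive write described above (the method names pingListener and markDisconnected are illustrative, not our actual code; the SSE types are Jersey's org.glassfish.jersey.media.sse classes):

import java.io.IOException;

import org.glassfish.jersey.media.sse.EventOutput;
import org.glassfish.jersey.media.sse.OutboundEvent;

// Illustrative only: write a tiny event every five seconds; a failed write is
// how a dead listener gets detected.
void pingListener(String listenerId, EventOutput eventOutput) {
    try {
        eventOutput.write(new OutboundEvent.Builder()
                .name("keep-alive")
                .data(String.class, "ping")
                .build());
    } catch (IOException e) {
        // ClientAbortException / SocketException ends up here when the write fails:
        // mark this listener as no longer connected.
        markDisconnected(listenerId);   // hypothetical bookkeeping call
    }
}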
And here is the (good/useful) exception we get in cases where eventOutput.write DOES throw the exception we want:
org.apache.catalina.connector.ClientAbortException: null
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371) ~[catalina.jar:7.0.53]
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333) ~[catalina.jar:7.0.53]
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101) ~[catalina.jar:7.0.53]
at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.flush(ResponseWriter.java:303) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.message.internal.CommittingOutputStream.flush(CommittingOutputStream.java:292) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:240) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.internal.Errors.process(Errors.java:242) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:347) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput.flushQueue(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput.write(ChunkedOutput.java:180) ~[jaxrs-ri-2.13.jar:2.13.]
at com.appserver.webservice.AgentSsePollingManager$ConnectionChecker.run(AgentSsePollingManager.java:174) ~[AgentSsePollingManager$ConnectionChecker.class:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_71]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_71]
at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_71]
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) ~[tomcat-coyote.jar:7.0.53]
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.http11.InternalOutputBuffer.flush(InternalOutputBuffer.java:119) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:799) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.Response.action(Response.java:174) ~[tomcat-coyote.jar:7.0.53]
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:366) ~[catalina.jar:7.0.53]
... 19 common frames omitted
I do not think it will be possible to account for every possible failure by adding code around the SSE sockets, even if Jersey exposed all of the information available from the socket interface. The only viable solution is proper two-way communication. With an SSE chunked output stream, a pulled cable does not cause any interruption, because nothing tells the server that the remote host is now unreachable (until the OS eventually closes the socket).
Your first step is right: implement heartbeats every N seconds. Then all you need is for each listener to report back, with another tiny HTTP call, every so often that it is still listening. Whether you acknowledge every 5 seconds or every minute is up to you; it depends on how quickly you need to detect a problem.
You can do this in the same Jersey resource by implementing a #POST method (in RESTful terms it reads "create a new acknowledgment that you are receiving events"); see the sketch after the note below.
Note: browsers are good at re-establishing SSE connections on their own after network interruptions, so there is no need to handle that yourself.
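A minimal sketch of such an acknowledgment endpoint (the resource path, identifiers, and bookkeeping map are made up for illustration; only the JAX-RS annotations are standard):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/events")
public class EventAckResource {

    // listenerId -> timestamp of the last acknowledgment received from it
    private static final ConcurrentMap<String, Long> lastAck = new ConcurrentHashMap<>();

    @POST
    @Path("/ack/{listenerId}")
    public Response acknowledge(@PathParam("listenerId") String listenerId) {
        lastAck.put(listenerId, System.currentTimeMillis());
        return Response.noContent().build();
    }
}

A background task (for example the existing ConnectionChecker) can then treat any listener whose last acknowledgment is older than some threshold as disconnected, regardless of whether eventOutput.write ever throws.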
Related
I know there are a ton of these posts, but this is a little different. We are using vended code for part of our data processing system, and part of that system sends emails to clients when certain events take place on data insertion or deletion. Recently we have started getting address-already-in-use exceptions. We checked the repository history, and nothing has changed in our code for this system in the last six months. We have already tried the typical solutions for this issue, including increasing the number of connections allowed to the port, with little success.

We had a meeting with the vendor, and I asked whether anything had changed in their code, and whether they would assure us that all connections in their code are explicitly closed. They indicated that they are explicitly closing all sockets. However, they didn't show us the code, so there is no way for us to know whether this is true other than taking their word for it.

So, the only thing I can think of to do is to keep increasing the number of connections to the port until we stop getting bind exceptions. What is the industry standard for the maximum number of connections to port 25; is there one? Also, if anyone has any other suggestions, I would greatly appreciate them. Thanks so much in advance, Robert
20210505112127.716 ERROR m.fiserv.ppx.business.notification.EmailNotifier : MessagingException from notify
javax.mail.MessagingException: Could not connect to SMTP host: SERVER.URL.COM, port: 25;
nested exception is:
java.net.BindException: Address already in use: connect
at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:1545)
at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:453)
Caused by:
java.net.BindException: Address already in use: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:90)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:380)
20210505131529.950 ERROR erv.ppx.web.controller.AuditReportViewController : Error while generating HTML
net.sf.jasperreports.engine.JRException: Error writing to OutputStream writer : CorpAdminAuditReport
at net.sf.jasperreports.engine.export.JRHtmlExporter.exportReport(JRHtmlExporter.java:496)
at com.fiserv.ppx.web.controller.AuditReportViewController.generateReport(AuditReportViewController.java:184)
Caused by:
com.ibm.wsspi.webcontainer.ClosedConnectionException: OutputStream encountered error during write
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.write(WCCByteBufferOutputStream.java:188)
at com.ibm.ws.webcontainer.srt.SRTOutputStream.write(SRTOutputStream.java:97)
20210505140706.240 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3834)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3904)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.sso.controller.SSOController : SSO Configuration error
java.lang.NullPointerException
at com.fiserv.ppx.business.db.PPXDbTransactionManager.<init>(PPXDbTransactionManager.java:60)
at com.fiserv.ppx.sso.impl.SSOLoginAuthenticator.authenticateSSOUser(SSOLoginAuthenticator.java:157)
We are running an Ignite cluster with 3 nodes which pre-loads data from a 3rd-party database (using a custom cache store). When we try to connect to the cluster using the Java thin client and the request reaches the cluster before data loading has completed, we get an unknown pair exception and some unstable behavior.
Is there any way we can block the client request (TCP socket connection) until the data loading has completed?
I tried different lifecycle events (NODE_START_COMPLETED) but had no luck.
Stack trace
Caused by: org.apache.ignite.binary.BinaryInvalidTypeException: Unknown pair [platformId=0, typeId=-845247802]
at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:707)
at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1757)
at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
at org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:798)
at org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143)
at org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinary(CacheObjectUtils.java:177)
at org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinaryIfNeeded(CacheObjectUtils.java:67)
at org.apache.ignite.internal.processors.cache.CacheObjectContext.unwrapBinaryIfNeeded(CacheObjectContext.java:125)
at org.apache.ignite.internal.processors.cache.GridCacheContext.unwrapBinaryIfNeeded(GridCacheContext.java:1773)
at org.apache.ignite.internal.processors.cache.GridCacheContext.unwrapBinaryIfNeeded(GridCacheContext.java:1761)
at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter.put(GridCacheStoreManagerAdapter.java:573)
at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter.putAll(GridCacheStoreManagerAdapter.java:627)
at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.batchStoreCommit(IgniteTxAdapter.java:1507)
at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:589)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.localFinish(GridNearTxLocal.java:3646)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:475)
... 41 common frames omitted
Caused by: java.lang.ClassNotFoundException: Unknown pair [platformId=0, typeId=-845247802]
at org.apache.ignite.internal.MarshallerContextImpl.getClassName(MarshallerContextImpl.java:394)
at org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:344)
at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:698)
... 56 common frames omitted
There is no way to forbid thin clients from connecting to a cluster using the Ignite API at the moment. I created a JIRA ticket for this improvement: https://issues.apache.org/jira/browse/IGNITE-12237
The unknown pair exception doesn't seem to be caused by thin clients connecting at the wrong time, though. Usually it's caused by a missing marshaller directory in the work path.
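As a rough illustration only (the directory path is an assumption, not something from your setup), pointing each server node at a stable work directory so the marshaller mappings are not lost between runs could look like this:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();
// Assumed path: keep the work directory (which contains the marshaller
// mappings) in a location that survives restarts.
cfg.setWorkDirectory("/var/lib/ignite/work");
Ignite ignite = Ignition.start(cfg);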
I am writing an HTTP client with Netty 4.1.12.Final and I have unit tests simulating the crash of the HTTP server in order to be able to handle it.
I noticed that, when it happens, the exceptionCaught callback method of my inbound handler is called with:
java.io.IOException: Une connexion existante a dû être fermée par l’hôte distant
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1100)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:372)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
The English equivalent of the exception message is most likely:
java.io.IOException: An existing connection was forcibly closed by the remote host
Since this callback method is also called when an exception is thrown from my channelRead0 method of my inbound handler, I am asking a few questions:
Should I always consider an IOException "received" in the exceptionCaught callback as an indication that there is no point in continuing using the channel?
Since channelRead0 is declared to throw Exception, should I catch all IOException inside it in order to be sure that, when "receiving" an IOException in the exceptionCaught callback, it is related to the Channel?
Is there a way to know if an exception "received" in the exceptionCaught callback is related to I/O operations or to handlers operations?
Thank you for any hint!
1) If we are talking about a TCP connection, then yes: every IOException will result in the connection being closed automatically by Netty, as there is no way to recover.
2) I don't think I understand the question completely, as every exception passed through the exceptionCaught(...) method is related to the channel, which can be obtained via ctx.channel().
3) No, there is no way in general. That said, if it is a TCP connection and the IOException is triggered by the actual transport, we will close the connection.
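For illustration, a minimal handler along the lines of points 1 and 3 (the class name is made up; the callbacks are standard Netty 4.1 API):

import java.io.IOException;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class HttpClientHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        if (cause instanceof IOException) {
            // Transport-level failure (e.g. the server crashed): the channel
            // cannot be recovered, so close it and reconnect at a higher level.
            ctx.close();
        } else {
            // Most likely thrown by this pipeline's own logic (e.g. channelRead0);
            // log it and decide case by case whether the channel is still usable.
            cause.printStackTrace();
        }
    }
}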
Below is a simple code snippet that shows how to connect to a VoltDB server.
ClientConfig clientConfig = new ClientConfig();
Client client = ClientFactory.createClient(clientConfig);
String server = "192.168.43.32";
client.createConnection(server);
Based on my experiments, if the server is down or simply unreachable at the network layer, it takes about 75 seconds to get a response.
SEVERE: Failed to connect to 192.168.43.32, in 75,359 ms
java.net.ConnectException: Operation timed out
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:458)
at sun.nio.ch.Net.connect(Net.java:450)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:154)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:142)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:134)
at org.voltdb.client.Distributer.createConnectionWithHashedCredentials(Distributer.java:878)
at org.voltdb.client.ClientImpl.createConnectionWithHashedCredentials(ClientImpl.java:189)
at org.voltdb.client.ClientImpl.createConnection(ClientImpl.java:682)
at src.java.tutorial.voltdb.integration.ConnectionTest.main(ConnectionTest.java:27)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Is there any way to set the connection timeout so the application does not have to wait that long? A successful connection normally takes just tens of milliseconds, so if a connection cannot be established within 1000 milliseconds, something is almost certainly wrong already.
I have tried the setting below:
clientConfig.setConnectionResponseTimeout(1000);
In this case it has no effect at all, so I guess it is not meant for this purpose.
Normally when the database is down and your client tries to connect it will get an immediate Connection refused exception, for example:
Exception in thread "main" java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:364)
at sun.nio.ch.Net.connect(Net.java:356)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
at java.nio.channels.SocketChannel.open(SocketChannel.java:184)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:165)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:153)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:145)
at org.voltdb.client.Distributer.createConnectionWithHashedCredentials(Distributer.java:890)
at org.voltdb.client.ClientImpl.createConnectionWithHashedCredentials(ClientImpl.java:191)
at org.voltdb.client.ClientImpl.createConnection(ClientImpl.java:684)
at benchmark.Benchmark.<init>(Benchmark.java:17)
at benchmark.Benchmark.main(Benchmark.java:78)
In general, a "java.net.ConnectException: Connection timed out" can occur if there is a firewall that prevents the client from receiving any sort of response, or there could be other causes. The first thing to check might be if you have any firewall or network settings that would prevent access to port 21212 (the default VoltDB database connection port).
The ClientConfig setConnectionResponseTimeout() setting is used to cause a live connection to be closed if it hasn't received a response from a procedure call or a ping for the given number of milliseconds, but it is not used for creating a new connection.
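As a generic workaround (plain java.util.concurrent usage, not a VoltDB client option), the blocking connect attempt could be bounded from the calling side, roughly like this:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.voltdb.client.Client;
import org.voltdb.client.ClientConfig;
import org.voltdb.client.ClientFactory;

public class BoundedConnect {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient(new ClientConfig());

        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Run the blocking createConnection() call on a worker thread...
        Future<Void> attempt = executor.submit(() -> {
            client.createConnection("192.168.43.32");
            return null;
        });
        try {
            // ...and wait at most one second for it to complete.
            attempt.get(1, TimeUnit.SECONDS);
            System.out.println("Connected");
        } catch (TimeoutException e) {
            attempt.cancel(true);   // best effort: interrupt the connect attempt
            System.err.println("No connection after 1 s, treating the server as unreachable");
        } finally {
            executor.shutdownNow();
        }
    }
}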
I have a Quartz Job that executes a Stored Procedure in my MySQL database once every 5 minutes, and for some reason, 1 out of 3 executions fails and gives this weird exception. I have searched and searched for what this exception means, but I could not find a solution. Here is the full stack trace:
java.sql.SQLException: Could not retrieve transation read-only status server
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:951)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:941)
at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3939)
at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3910)
at com.mysql.jdbc.PreparedStatement.checkReadOnlySafeStatement(PreparedStatement.java:1258)
at com.mysql.jdbc.CallableStatement.checkReadOnlySafeStatement(CallableStatement.java:2656)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1278)
at com.mysql.jdbc.CallableStatement.execute(CallableStatement.java:920)
at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:3044)
at org.deadmandungeons.website.tasks.RankUpdateTask.execute(RankUpdateTask.java:30)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 1,198,219 milliseconds ago. The last packet sent successfully to the server was 950,420 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor43.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3673)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3562)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4113)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2812)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2761)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3933)
... 9 more
Caused by: java.net.SocketException: Connection timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3116)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3573)
... 17 more
So I figured it is timing out because it thinks the MySQL server is in read-only status?
This only happens for this quartz job, and not any other time when I communicate with the database. This execution is of course happening in another thread, but I don't think that would have anything to do with it.
Why would it think the server was in read-only mode?
Also, I don't think "transation" is a word, so there's that...
Sorry for posting on an old thread.
As the stack trace says:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
This implies the link between JDBC and the database is broken. Per your observation, 1 out of 3 job invocations fails.
You have these jobs scheduled every 5 minutes, and per the trace the last message successfully sent to the server was ~15 minutes earlier.
Hence I suspect either:
your procedure is not returning (it is waiting on something), or
the JDBC connection has been invalidated by a firewall/proxy.
It would be interesting to see how the connections are managed. From the logs I can see you are using c3p0.
You can try setting unreturnedConnectionTimeout and debugUnreturnedConnectionStackTraces. These will give you more insight into connection leaks or DB calls that take a long time.
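For illustration, a minimal c3p0 sketch with those two settings (the JDBC URL and credentials are placeholders):

import com.mchange.v2.c3p0.ComboPooledDataSource;

ComboPooledDataSource pool = new ComboPooledDataSource();
pool.setJdbcUrl("jdbc:mysql://db-host:3306/app");   // placeholder URL
pool.setUser("app");                                 // placeholder credentials
pool.setPassword("secret");

// If a connection stays checked out longer than 300 s, c3p0 reclaims it and,
// with the debug flag on, logs the stack trace of the code that checked it out.
pool.setUnreturnedConnectionTimeout(300);
pool.setDebugUnreturnedConnectionStackTraces(true);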
Research led nowhere, as you guys said, but the error looks like what happens when a database is being populated by two applications at the same time.
Do you have admin privileges on this MySQL server? If you do, you should try setting
FLUSH TABLES WITH READ LOCK;
SET GLOBAL READ_ONLY=ON;
as a test to reproduce the error. Just to warn you, this command makes your database unwritable, so you will not be able to add data to it until you revert this configuration, obviously with
SET GLOBAL READ_ONLY=0;
UNLOCK TABLES;
If the result of this test is positive (the same error is reproduced), you should try isolating the applications that store data in your database, to find out which one is conflicting with Quartz.
I'm sorry for being vague, but I hope it gives you some help...