Can Tomcat interrupt request-handling thread execution and cause Service unavailable?

We have a mysterious error in our Tomcat and webapp logs. A Java web application runs on Tomcat 6 and uses an Oracle 11g database. All requests are logged. We use Commons DBCP for database connection pooling. One feature of our application is that all connections are request-scoped; we implemented that with Spring's SmartDataSource. Releasing connections is wrapped in try/finally, so even if there's an error during the request, the connection gets released. This worked well the whole time we tested it.
One day our server returned Error 503 "Service unavailable". In the logs we found only one exception: org.apache.commons.dbcp.AbandonedTrace$AbandonedObjectException. So it seems that the Commons DBCP abandoned-connection collector found an abandoned connection and reported it. That is not an error in itself; it is a hint that the real error is somewhere else.
The trace tells us the exact time and code location at which the connection was acquired. We looked at that time in the logs and found a request that began but never ended. This would explain why the connection was not released, but what bothers us is how the try/finally could have been bypassed.
I think this is a Tomcat issue, because there was no other exception in our application logs and because the error code the server returned was not the usual Error 500 "Internal server error".
Does anyone have any suggestions as to why this could happen? Is it possible that Tomcat interrupts a thread in such a way that try/finally is ignored?
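On the narrow question of whether an interrupt can skip a finally block: in standard Java, Thread.interrupt() only sets a flag or makes a blocking call throw InterruptedException, and the finally block still runs as the stack unwinds. The sketch below illustrates this with a stand-in worker thread. The class and method names are hypothetical, and nothing here is specific to Tomcat, Spring or DBCP. A finally block does not run while the thread is still blocked inside the try (for example, waiting indefinitely on a database call) or if the JVM is halted, so a request that "began but never ended" may simply never have reached its finally.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

final class FinallyOnInterrupt {

    /** Interrupts a worker blocked inside try and reports whether its finally block ran. */
    static boolean finallyRanAfterInterrupt() {
        AtomicBoolean released = new AtomicBoolean(false);
        CountDownLatch insideTry = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                insideTry.countDown();
                Thread.sleep(60_000); // stands in for a long-running request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            } finally {
                released.set(true); // stands in for releasing the connection
            }
        });

        try {
            worker.start();
            insideTry.await();  // wait until the worker is inside the try block
            worker.interrupt(); // roughly what a container may do on shutdown
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo thread was interrupted", e);
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println("finally ran: " + finallyRanAfterInterrupt());
    }
}
```

If this matches your situation, a thread dump taken while the request is stuck would show where the thread is blocked.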

Related

WARNING c3p0: Another error has occurred: Connection is Closed

We are using c3p0 as the connection pool in our application with a Microsoft SQL Server database. The connections are tested on checkout with a validation query so that the application doesn't work with stale connections.
Recently, we have started seeing the following warning in the application logs (many of these messages appear in sequence). Has anyone seen this sort of exception, and what does it mean?
2017-03-29 09:34:24 [WARNING] [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
2017-03-29 09:34:24 [WARNING] [c3p0] Another error has occurred [ com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed. ] which will not be reported to listeners!
2017-03-29 09:34:24 com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
2017-03-29 09:34:24 at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:190)
2017-03-29 09:34:24 at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:388)
2017-03-29 09:34:24 at com.microsoft.sqlserver.jdbc.SQLServerConnection.prepareStatement(SQLServerConnection.java:2166)
2017-03-29 09:34:24 at com.microsoft.sqlserver.jdbc.SQLServerConnection.prepareStatement(SQLServerConnection.java:1853)
2017-03-29 09:34:24 at com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement(NewProxyConnection.java:1076)
My concerns are:
Does this warning (or exception message) mean that the query had actually failed to execute and the code will throw the exception?
Is it just a warning message that is logged by c3p0 because we test connection on checkout and since the connection is closed, it will now acquire a new connection from the database and the application will run without any issue?
Any help will be appreciated. Thanks!

So, there's not enough information here to say what the initial cause of the problem was. Anything could have happened: a network outage, whatever. Testing a Connection on checkout ensures that the Connection worked at the time of checkout, but once in client-land, nothing prevents a break. It should be very rare, unless you are keeping Connections checked out for long periods of time. (Don't do that! With a Connection pool, adopt a just-in-time, quick checkout, immediate check-in strategy.)
Anyway, some attempt by the application to use the Connection threw an Exception. c3p0 internally checked the Connection then, decided the Connection was broken, and emitted an event (specified by the JDBC spec, but of interest only to internal listeners) indicating a Connection error. c3p0 responds to this by marking the Connection for destruction rather than check-in when the application is done.
The application, despite having seen the first Exception, continued to use the Connection. A second Exception occurred (yes, this Connection really is broken). That's what c3p0 is logging here. It's ignoring the second Exception, not signaling a Connection error, because a Connection error has already been signalled for this Connection. But it's a bit surprised and annoyed to find that the Connection is still in use ;)
All exceptions are relayed to the application. Silently swallowing up problems is the very opposite of c3p0's philosophy. But whatever your application was doing with this Connection triggered an Exception, and your application kept doing other things that triggered more.
That doesn't necessarily mean that anything is wrong. An application may tentatively interpret an Exception as something other than a Connection failure. Perhaps an Exception occurred because of a constraint violation, and if so, there is a workaround? If it were something like that, here the application would find further evidence that, yes, the Connection is broken, because this next use of the Connection, after a previous Exception had been handled, will continue to fail.
If I were you, I'd review the application code that triggers this stack trace, and look particularly for Exception handling in prior steps that might be too forgiving, that might catch an Exception and continue when it should instead abort. Again, that's not necessarily the case -- it could be that your application is doing exactly what it should, it's appropriately retrying or attempting to continue after a potentially recoverable error, and it's robust to the possibility that the retry will fail too, in which case you'll just harmlessly see these stack traces in your logs, hopefully very rarely, when already-checked-out Connections fail. But I'd definitely review your Exception handling logic in this code path, during the step that triggered the stack trace, and importantly during prior steps which would have triggered the first Exception. Usually one Exception aborts a database codepath (except for an eventual rollback() and close()), here you are barreling on to a second, which may well be awesome, but make sure it is what you want to do.
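A minimal sketch of the "one Exception aborts the codepath" pattern described above, combined with just-in-time checkout, using only standard JDBC interfaces. TransactionRunner, SqlWork and inTransaction are hypothetical names for illustration and are not part of c3p0; the DataSource is assumed to be your pooled one.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class TransactionRunner {

    /** Hypothetical callback type for one unit of database work. */
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /**
     * Checks a Connection out just in time, runs the work, and checks it back in.
     * The first SQLException aborts the work: roll back, close, rethrow.
     */
    static void inTransaction(DataSource pool, SqlWork work) throws SQLException {
        // try-with-resources guarantees close(), which returns the Connection to the pool
        try (Connection conn = pool.getConnection()) {
            conn.setAutoCommit(false);
            try {
                work.run(conn);
                conn.commit();
            } catch (SQLException first) {
                try {
                    conn.rollback();
                } catch (SQLException second) {
                    // The Connection may really be broken; keep the original cause
                    first.addSuppressed(second);
                }
                throw first; // do not keep using this Connection
            }
        }
    }
}
```

Whether aborting is right depends on the code path: if the application deliberately retries after a recoverable error, the retry would belong outside this method, on a freshly checked-out Connection.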
If you are seeing this a lot, make sure Connection testing on checkout really is configured properly, then try to minimize the period during which the Connection is checked out, then try to understand why your network or something at the server side might be failing occasionally.

Getting "Write attempt on defunct connection" Error From Datastax Cassandra Java Driver

I have a web service application using Cassandra 2.0 and Datastax java driver 2.0.2. I sometimes get the stacktrace below when trying to write to/read from database, especially if the application has been sitting there for a while (like overnight). This error usually goes away when I retry, however, sometimes it persists and I have to restart the web app to get rid of the error.
I wonder if this is some sort of "stale connection" issue. However, the Datastax java driver documentation indicates it is supposed to keep the connection alive.
I did a Google search on the error message and got only two (!) hits. They are related. This is the answer from one of the results:
Sylvain Lebresne, Apr 2:
You're running into https://datastax-oss.atlassian.net/browse/JAVA-250. We'll fix it soon hopefully (I have some half-finished patch that I need to finish), but currently, if you restart a whole cluster without doing queries during the restart, it can sometimes happen that you'll get this before the cluster properly reconnects. In the meantime and as a workaround, you can always make sure to run a few trivial queries while you're doing the cluster restart to avoid it.
However this does not look like my scenario because we are not restarting the cluster at all. I wonder if anyone has some insights about this error?
Stacktrace:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ec2-54-197-xxx-xxx.compute-1.amazonaws.com/54.197.xxx.xxx:9042 (com.datastax.driver.core.ConnectionException: [ec2-54-197-xxx-xxx.compute-1.amazonaws.com/54.197.xxx.xxx:9042] Write attempt on defunct connection))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)

I have what I believe is the exact same issue (Write attempt on defunct connection) on my development machine intermittently.
It seems to happen when my dev machine goes to sleep while the server is up. Obviously there's no power management in the AWS cluster you're running, but it gives you a hint - the key is that something is breaking your control connection or intermittently preventing network connectivity between your hosts.
You should see the reconnection thread in your logs:
21:34:51.616 [Reconnection-1] ERROR c.d.driver.core.ControlConnection - [Control connection] Cannot connect to any host, scheduling retry in 2000 milliseconds
The next request after this will always succeed in my experience.
TL;DR: check for networking issues or any intermittent shutdown of servers that could break the control connection. The driver should do a better job of re-establishing broken control connections; it sounds like they're working on that under JAVA-250.
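Since a retry usually succeeds, one stopgap is a small bounded retry around the query. This is a generic sketch, not a driver feature: BoundedRetry and withRetries are hypothetical names, and it catches any RuntimeException on the assumption that the failure surfaces as an unchecked exception. In real code you would likely narrow that to the specific exception you see in the stack trace.

```java
import java.util.function.Supplier;

final class BoundedRetry {

    /** Runs the action up to maxAttempts times, sleeping backoffMillis between attempts. */
    static <T> T withRetries(Supplier<T> action, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if attempts remain
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break; // stop retrying if this thread is being shut down
                    }
                }
            }
        }
        throw last;
    }
}
```

A call might look like BoundedRetry.withRetries(() -> session.execute(statement), 3, 2000L), where session and statement are whatever your application already uses. This only hides the symptom; the underlying connectivity issue still needs investigating.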

An SQLException was provoked, java.lang.InterruptedException, am I running out of db connections?

So we run a Hibernate, Spring, Spring Webflow stack. From what I've read so far it might also be important to know we use c3p0-0.9.1.2.
Over the last couple of days we've noticed the server suddenly stop. Users cannot log into the website, nothing appears to happen, the browser simply sits loading the page forever. The server logs also simply halt.
When we notice this we shut down the Tomcat instance, and all of a sudden quite a few of the following errors get logged:
13:05:57.492 [TP-Processor7] WARN o.h.util.JDBCExceptionReporter - SQL Error: 0, SQLState: null
13:05:57.492 [TP-Processor7] ERROR o.h.util.JDBCExceptionReporter - An SQLException was provoked by the following failure: java.lang.InterruptedException
Any ideas what these mean? Google hasn't been too helpful. Are we leaking db connections somewhere and the pool cannot gain a new session?
We have just put in a couple new Spring Webflow flows and are experiencing a slightly increased amount of website traffic but we haven't seen this behaviour before.

I suspect those InterruptedExceptions come from the container actually shutting down those threads, and simply indicate that those threads still exist when Tomcat shuts down.
Instead, I would grab a thread dump from Tomcat when it next freezes. I would also get a DBA to tell you what's happening in the database. From the above I'm guessing you're hung on a database resource, but a thread dump and analysis from a DBA will certainly point you in the right direction.
Here's a Thread Dump JSP as an alternative means of generating thread dumps.
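For reference, a thread dump can also be produced from inside the JVM with the standard Thread.getAllStackTraces() call, which is roughly what such a JSP would do. The sketch below is an assumption about that approach, not the linked JSP itself; ThreadDumper is a hypothetical name. From outside the process, the JDK's jstack tool gives a more detailed dump.

```java
import java.util.Map;

final class ThreadDumper {

    /** Returns the name, state and stack trace of every live thread as text. */
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

In the dump, look for request-handling threads (such as the TP-Processor threads in the log above) that are all waiting at the same place, for example inside the connection pool or a JDBC call.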

How to fix error: [BEA][SQLServer JDBC Driver]No more data available to read

My Java application uses DB connection pooling. One of its functions started failing today with this error:
[BEA][SQLServer JDBC Driver]No more data available to read
This doesn't occur daily. Once I restart my application server, things look fine for some days, and then this error comes back.
Has anyone encountered this error? The reasons might vary, but I would like to know what they are so I can mitigate my issue.

Is it possible that the database or network connection has briefly had an outage? You might expect any currently open result sets then to become invalid with resulting errors.
I've never seen this particular error, but then I don't work with BEA or SQL Server, but a quick google does show other folks suggesting such a cause.
When you're using a connection pool, if you do get such a glitch, then all connections in the pool become "stale" or invalid. My application server (WebSphere) has the option to discard the entire connection pool after particular errors are detected. The result then is that one unlucky request sees the error, but subsequent requests get a new connection and recover. If you don't discard the whole pool, then you get a failure as each stale connection is used and discarded.
I suggest you investigate (a) whether your app server has such a capability, and (b) how your application responds if the database is bounced. If this replicates the error, then maybe you've found the cause.

Jboss Server Exception

The JBoss server suddenly started throwing an exception: "You are trying to use a connection factory that has been shut down: ManagedConnectionFactory is null". No changes had been made to the datasources prior to this. Everything returned to normal after a server bounce...
What are the possible causes of this?

After digging into this issue further, we found that a dependency was not responding, which increased the thread count. There was also a memory leak. Because of this, the server restarted by itself... The logs made it clear that the server had restarted, and that could be the reason for this exception. As mentioned in the comments above, a Google search suggests that this exception can be thrown when there is a problem loading the datasources... Everything was fine after bouncing the server... Thanks, all...
