I have a Grails application that uses a Quartz job to automatically augment documents with data supplied by an external service. The Quartz job uses a non-transactional Service to query and update documents in MongoDB. The actual querying and updating uses Mongo's native driver (no GORM). The Quartz job and Service do not return database connections to the connection pool, which results in the error "Connection wait timeout" once all connections have been consumed.
I can fix the problem by adding a call to DB.requestDone() after querying and updating in the spawned thread. I am not sure about the ramifications of using requestDone for this purpose.
Are there negative consequences for calling requestDone without ever calling requestStart?
Are there any threading issues with requestStart/requestDone? For example, what happens if another thread is in the middle of querying Mongo when requestDone is called?
Is there a better way to ensure a database connection is returned to the connection pool?
FYI, I tried adding cursor.close() but that did not resolve the problem.
We are using H2 started as database server process and listening on standard TCP/IP port 9092.
Our application is deployed in a Tomcat using connection pooling. We do a purge during idle time which at the end results in closing all connections to H2. From time to time we observe errors when the application tries to open the connection to H2 again:
SCHEDULERSERVICE schedule: Exception: Database may be already in use: "Waited for database closing longer than 1 minute". Possible solutions: close all other connection(s); use the server mode [90020-199]
org.h2.jdbc.JdbcSQLNonTransientConnectionException: Database may be already in use: "Waited for database closing longer than 1 minute". Possible solutions: close all other connection(s); use the server mode [90020-199]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:617)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:427)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.engine.Engine.openSession(Engine.java:209)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:178)
at org.h2.engine.Engine.createSession(Engine.java:161)
at org.h2.server.TcpServerThread.run(TcpServerThread.java:160)
at java.lang.Thread.run(Thread.java:748)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:617)
at org.h2.engine.SessionRemote.done(SessionRemote.java:607)
at org.h2.engine.SessionRemote.initTransfer(SessionRemote.java:143)
at org.h2.engine.SessionRemote.connectServer(SessionRemote.java:431)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:317)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:169)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:148)
at org.h2.Driver.connect(Driver.java:69)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
The problem occurs when the Tomcat connection pool closes all idle (unused) connections and one connection that was still in use is closed afterwards.
The next attempt to open a new connection fails; a retry succeeds after some wait time.
Under which circumstances does this exception happen?
What does the exception mean?
Are there any recommendations to follow to avoid the problem?
It sounds to me that H2 does a database close after the last connection has been closed.
When does the database close occur?
How can database closure be controlled?
Thx in advance
Thorsten
Embedded database in web applications needs careful handling of its lifecycle.
You can add a javax.servlet.ServletContextListener implementation (marked with the @WebListener annotation or registered in web.xml) and add an explicit database shutdown to its contextDestroyed() method.
You can force database shutdown here with connection.createStatement().execute("SHUTDOWN"). If your application needs to write something to database during unload, it should do it before that command.
Without the explicit shutdown H2 closes the database when all connections are closed, if some other behavior wasn't configured explicitly (with parameters in JDBC URL, for example). For example, DB_CLOSE_DELAY sets the additional delay, maybe your application uses that setting and therefore H2 doesn't close the database immediately, or application doesn't close all connections immediately.
Anyway, when you update the web application on the fly, Tomcat tries to initialize the new version before the old version is unloaded. If H2 is on the classpath of the web application itself, the new version will be unable to connect to the database during the short period when the new version is already online but the old version isn't unloaded yet.
If you don't like it, you can run the standalone H2 Server process and use remote connections to it in your web applications.
Another option is to move H2 to the classpath of Tomcat itself and configure the connection pool as resource in the server.xml, in that case it shouldn't be affected by the lifecycle of your applications.
In both these cases you shouldn't use the SHUTDOWN command.
UPDATED
With client-server connections to a remote server, this exception means that the server decided to close the database because there were no active connections. This operation can't be interrupted or reverted midway. If a new connection to the same database is attempted during this process, it waits at most 1 minute for the close to complete so that the database can be re-opened. This timeout is not configurable.
There are two possible solutions.
DB_CLOSE_DELAY setting can be used with some large value in seconds. When all connections are closed, database will stay online for the specified number of seconds. -1 also can be used to set an infinite timeout.
You can try to speed up the shutdown process, but you have to figure out what takes so much time by yourself. The file compaction procedure is limited to 200 milliseconds by default, it may take a longer time, but I think it shouldn't be that long. Maybe you have a lot of temporary objects or uncommitted data. Maybe you have a very high fragmentation of database file. It's hard to say what's going wrong without further investigation.
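As a rough illustration of the first option, the setting can be appended to the JDBC URL. The sketch below only assembles the URL string; the host, port, and database path used in the usage note are placeholders, not values from the question, and whether a remote client is allowed to change this setting may depend on the H2 version and user rights.

```java
// Builds an H2 client-server JDBC URL with DB_CLOSE_DELAY appended.
final class H2Urls {
    private H2Urls() {}

    /**
     * @param closeDelaySeconds seconds the database stays open after the last
     *                          connection closes; -1 keeps it open indefinitely
     */
    static String tcpUrl(String host, int port, String dbPath, int closeDelaySeconds) {
        return "jdbc:h2:tcp://" + host + ":" + port + "/" + dbPath
                + ";DB_CLOSE_DELAY=" + closeDelaySeconds;
    }
}
```

For example, `H2Urls.tcpUrl("localhost", 9092, "~/appdb", -1)` yields a URL that asks H2 to keep the database open after the pool has closed its last connection.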
I am trying to achieve Application Continuity with an Oracle 12c database and Oracle UCP (Universal Connection Pool). Following the official documentation, I have implemented the following in my application. I am using ojdbc8.jar along with the matching ons.jar and ucp.jar.
PoolDataSource pds = oracle.ucp.jdbc.PoolDataSourceFactory.getPoolDataSource();
Properties as per oracle documentation:
pds.setConnectionFactoryClassName("oracle.jdbc.replay.OracleDataSourceImpl");
pds.setUser("username");
pds.setPassword("password");
pds.setInitialPoolSize(10);
pds.setMinPoolSize(10);
pds.setMaxPoolSize(20);
pds.setFastConnectionFailoverEnabled(true);
pds.setONSConfiguration("nodes=IP_1:ONS_PORT_NUMBER,IP_2:ONS_PORT_NUMBER");
pds.setValidateConnectionOnBorrow(true);
pds.setURL("jdbc:oracle:thin:@my_scan_name.my_domain_name.com:PORT_NUMBER/my_service_name");
// I have also tried using the TNS-Like URL as well. //
However, I am not able to achieve application continuity. I have some in-flight transactions that I expect to be replayed when I bring down the RAC node on which my database service is running. What I observe is that my service migrates to the next available RAC node in the cluster, but my in-flight transactions fail. What I expect to happen is that the driver automatically replays the failed in-flight transactions. However, I don't see this happening. Sometimes I see the queries that I fire at the database being triggered again on the database side, but we get a Connection Closed Exception on the client side.
According to some documentation, application continuity allows the application to mask outages from the user. My question is whether my understanding is correct that application continuity will replay the SQL statements that were in flight when the outage occurred, or whether the true meaning of application continuity is something else.
I have referred to some blogs such as this one:
https://martincarstenbach.wordpress.com/2013/12/13/playing-with-application-continuity-in-rac-12c/
The example mentioned here does not seem to be intended for replaying of in-flight SQL statements.
Is application continuity capable of replaying the in-flight SQL statements during an outage, or do FCF and application continuity only restore the state of the connection object and make it usable again after the outage has occurred? If the former is true, then please tell me whether I am missing anything in the application-level settings in my code that keeps me from achieving replay.
Yes your understanding is correct. With the replay driver, Application Continuity can replay in-flight work so that an outage is invisible to the application and the application can continue, hence the name of the feature. The only thing that's visible from the application is a slight delay on the JDBC call that hit the outage. What's also visible is an increase in memory usage on the JDBC side because the driver maintains a queue of calls. What happens under the covers is that, upon outage, your physical JDBC connection will be replaced by a brand new one and the replay driver will replay its queue of calls.
Now there could be cases where replay fails. For example replay will fail if the data has changed. Replay will also be disabled if you have multiple transactions within a "request". A "request" starts when a connection is borrowed from the pool and ends when it's returned back to the pool. Typically a "request" matches a servlet execution. If within this request you have more than one "commit" then replay will be disabled at runtime and the replay driver will stop queuing. Also note that auto-commit must be disabled.
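To make the request boundaries concrete, here is a minimal sketch of a unit of work shaped the way the paragraph above describes: borrow a connection, disable auto-commit, do the work, commit exactly once, and return the connection. It uses only `javax.sql.DataSource` (which the UCP `PoolDataSource` from the question extends); the class name, the `accounts` table, and the SQL are invented for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

final class SingleCommitRequest {
    private SingleCommitRequest() {}

    /** One "request": borrow, one transaction, one commit, return to the pool. */
    static void debit(DataSource pool, long accountId, long amount) throws SQLException {
        // Borrowing from the pool starts the request.
        try (Connection conn = pool.getConnection()) {
            conn.setAutoCommit(false); // auto-commit must be disabled
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE accounts SET balance = balance - ? WHERE id = ?")) { // hypothetical table
                ps.setLong(1, amount);
                ps.setLong(2, accountId);
                ps.executeUpdate();
                conn.commit(); // the only commit in this request
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        } // close() returns the connection to the pool and ends the request
    }
}
```

Code that commits several times on the same borrowed connection would, per the answer above, have replay disabled after the first commit, so splitting such work into separate borrow/return cycles is one option to consider.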
[I'm part of the Oracle team that designed and implemented this feature]
I think the jdbc connection string could be your problem:
pds.setURL("jdbc:oracle:thin:@my_scan_name.my_domain_name.com:PORT_NUMBER/my_service_name");
You are using a so-called EZConnect string, which is not supported with AC. Use a full connect descriptor instead:
Alias (or URL) = (DESCRIPTION=
(CONNECT_TIMEOUT=120)(RETRY_COUNT=20)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)
(ADDRESS_LIST=(LOAD_BALANCE=on)
(ADDRESS=(PROTOCOL=TCP)(HOST=primary-scan)(PORT=1521)))
(ADDRESS_LIST=(LOAD_BALANCE=on)
(ADDRESS=(PROTOCOL=TCP)(HOST=secondary-scan)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=gold-cloud)))
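In JDBC terms, the descriptor goes after the `@` of the thin URL. Here is a minimal sketch that assembles such a URL as a plain string; the class and method names are made up, and the host and service names passed in would be the placeholders from the descriptor above, not real endpoints.

```java
// Assembles a long-form (connect descriptor) Oracle thin JDBC URL.
final class AcUrl {
    private AcUrl() {}

    static String descriptorUrl(String primaryScan, String secondaryScan, int port, String service) {
        return "jdbc:oracle:thin:@(DESCRIPTION="
                + "(CONNECT_TIMEOUT=120)(RETRY_COUNT=20)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)"
                + addressList(primaryScan, port)
                + addressList(secondaryScan, port)
                + "(CONNECT_DATA=(SERVICE_NAME=" + service + ")))";
    }

    // One ADDRESS_LIST entry with load balancing enabled, as in the descriptor.
    private static String addressList(String host, int port) {
        return "(ADDRESS_LIST=(LOAD_BALANCE=on)"
                + "(ADDRESS=(PROTOCOL=TCP)(HOST=" + host + ")(PORT=" + port + ")))";
    }
}
```

The result of `AcUrl.descriptorUrl("primary-scan", "secondary-scan", 1521, "gold-cloud")` could then be passed to `pds.setURL(...)` in place of the EZConnect string.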
The web application allows users to request huge amounts of information, and to cope with the load and timeouts I use JMS. Every time the data export functionality is called, it is routed via JMS and the web request is released.
From there, JMS calls an Oracle stored procedure, which takes a while (5-10 minutes) to execute. My initial thought was that the call is asynchronous, because the web request is released. However, WebLogic JMS has a 15 s timeout for the database connection, so after a while it kills the connection because there is no data in the pipe (the Oracle stored procedure is busy pulling the necessary data).
So far I found the following solutions:
Increase the timeout. The support staff at the data center were not very happy with this and pointed out that it indicates a problem with the application design: the query should be asynchronous on all layers, including JMS->Oracle.
Run the stored procedure as a job and close the JMS->Oracle connection once the call is initiated. The problem with this approach is that I need to ping the Oracle DB constantly in order to find out when the job has completed.
The same as the second option, but have Oracle call back into JMS. However, a short read gave me the impression that such a solution isn't very popular because it won't be general (hard-coded values, etc.).
What would be your suggestions? Thank you in advance.
If the stored procedure cannot be optimized, then I would simply extend the timeout. I think the timeout on JMS can be overridden by some Java annotation for a particular task, so you do not even have to modify the global WebLogic setting.
Of course there are ways to call the procedure asynchronously: either by using AQ (are you using Advanced Queuing as a JMS provider?) or by submitting it as a scheduler job. But there is a risk that you can kill the database if you submit too many jobs to be executed in parallel. Both DBMS_JOB and the newer, preferred DBMS_SCHEDULER have ways to restrict the number of concurrently running sessions. So instead of calling the stored procedure directly, you call a wrapper procedure, which submits a single-shot, non-repeating DBMS_SCHEDULER job. The scheduler then executes your procedure as a scheduler task. You will have to negotiate this solution with your DBAs. There is a view in Oracle where you can check the status.
If you were using Oracle Advanced Queuing, you could also submit the job into an Oracle AQ queue and have a single PL/SQL stored procedure (running as an "infinite" scheduler job) that takes messages from AQ one by one and executes the desired stored procedure. When the procedure finishes, it can submit its response into another AQ queue. And since AQ can act as a JMS provider on WebLogic, your application will be notified via JMS when the procedure finishes.
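If the scheduler-job route is taken without the AQ notification, the caller still has to learn when the job finished. One hedged way to avoid "pinging constantly" is to poll at a fixed interval with an overall deadline. The sketch below is generic Java: the `isDone` check is a stand-in for whatever status query your DBAs expose (hypothetical here), and the interval and timeout values are up to the caller.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class JobPoller {
    private JobPoller() {}

    /**
     * Calls isDone every interval until it returns true or the timeout elapses.
     *
     * @return true if the job was seen to finish, false on timeout
     */
    static boolean awaitCompletion(BooleanSupplier isDone, Duration interval, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isDone.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; the job may still be running
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

Each status check should borrow a connection, run the query, and release the connection again, so that no connection sits idle long enough to hit the 15 s timeout.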
I created a job which uses reader of type
org.springframework.batch.item.database.HibernateCursorItemReader to execute a query.
The problem is that the database connections in this case hit the connection limit (I get the Oracle error ORA-12519, TNS:no appropriate service handler found) and, surprisingly, I noticed exit_code=EXECUTING and status=STARTED in the BATCH_STEP_EXECUTION table.
If I run the job again, it responds with "A job execution for this job is already running", and if I issue -restart on this task, it complains with the message "No failed or stopped execution found for job".
How does Spring Batch manage these fatal failure situations? Do I have to remove the execution information manually, or is there a reset option?
Thank you for any help
The current release of Spring Batch (2.2.0) doesn't appear to have an out-of-the-box solution for this situation. As discussed in this question, 'manual' intervention in the database may be required. Alternatively, if a particular job is hanging (that is, you know the job name), you can do the following:
Use JobExplorer.findRunningJobExecutions(jobName) to get the running executions.
Go through the list of executions and 'fail' them (JobExecution.upgradeStatus(BatchStatus.FAILED)).
Save the change using JobRepository.update(jobExecution).
Just an FYI on why this connection limit problem occurs when using a cursor-based reader (JdbcCursorItemReader or HibernateCursorItemReader):
The cursorItemReader opens a separate connection even if there is already a connection opened for the transaction (Reader -> Processors -> Writer). So, each step execution needs two connections even if it is in a single transaction and hitting the same db. This causes the connection bottleneck and so the number of db connections should be double the number of threads configured in thread pool to execute the steps in parallel.
This can also be resolved if you provide a separate connection to your cursor reader.
JdbcPagingItemReader is another implementation of ItemReader which uses the same connection opened for the transaction.
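As a quick worked example of the sizing rule above (a rule of thumb taken from this answer, not a Spring Batch API): with a cursor-based reader each concurrently running step holds two connections, otherwise one. The class and method names are made up for the illustration.

```java
final class PoolSizing {
    private PoolSizing() {}

    /** Connections needed for the given number of step threads running in parallel. */
    static int connectionsNeeded(int stepThreads, boolean cursorReader) {
        int perStep = cursorReader ? 2 : 1; // the cursor reader opens its own connection
        return Math.multiplyExact(stepThreads, perStep);
    }
}
```

With 8 step threads and a cursor reader this gives 16 connections, so a pool or listener limit sized for 8 would be exhausted as soon as all threads are busy.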
It seems to me that we have some code that does not start a transaction, yet for read-only operations our queries via JPA/Hibernate as well as straight SQL seem to work. A Hibernate/JPA session would have been opened by our framework, but in a few spots in legacy code we found that no transaction was being opened.
What seems to happen is that the code usually runs as long as it does not use EntityManager.persist or EntityManager.merge. However, once in a while (maybe 1 in 10 times) the servlet container fails with this error...
Failed to load resource: the server responded with a status of 500 (org.hibernate.exception.JDBCConnectionException: The last packet successfully received from the server was 314,024,057 milliseconds ago. The last packet sent successfully to the server was 314,024,057 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.)
As far as I can tell, only the few spots in our application code that do not start a transaction before making a query have this problem. Does anybody else think the non-transactional queries could be causing this behaviour?
FYI here is our stack...
-Guice
-Guice-Persist
-Guice-Servlet
-MySql 5.1.63
-Hibernate/C3P0 4.1.4.Final
-Jetty
Yes, I think so.
If you run a query without opening a transaction, a transaction will be opened automatically by the underlying layer. The connection, with its transaction still open, is then returned to the connection pool and handed to another user, who receives a connection with an already-open transaction, and that can lead to inconsistent state.
At my company we had a lot of problems in the past with read-only non-transactional queries, and we adjusted our framework to handle this.
Besides that, we talked to the BoneCP developers and they agreed to develop a set of features to help with this problem, such as automatically rolling back uncommitted transactions on connections returned to the pool, and printing a stack trace showing which method forgot to commit the transaction.
This matter was discussed here:
http://jolbox.com/forum/viewtopic.php?f=3&t=98
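One way a framework can handle this, sketched under assumptions: run every read-only query inside an explicit transaction and end that transaction before the connection goes back to the pool. The helper below is plain JDBC rather than the Guice-Persist/Hibernate stack from the question, and `ReadOnlyWork` and `Query` are names made up for the example.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class ReadOnlyWork {
    private ReadOnlyWork() {}

    /** A query callback that may throw SQLException. */
    interface Query<T> {
        T run(Connection conn) throws SQLException;
    }

    /** Runs a read-only query in an explicit transaction and always ends it. */
    static <T> T read(DataSource pool, Query<T> query) throws SQLException {
        try (Connection conn = pool.getConnection()) {
            conn.setAutoCommit(false); // make the transaction boundary explicit
            try {
                return query.run(conn);
            } finally {
                // Nothing to persist; never hand back an open transaction.
                conn.rollback();
            }
        } // close() returns the connection to the pool
    }
}
```

This only addresses the open-transaction issue described in the answer; the 'wait_timeout' error itself also points at stale pooled connections, which the error message suggests handling by testing connection validity before use.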