I have a Java program running 24/7. It accesses a MySQL database every 3 seconds, but only from 9 a.m. to 3 p.m. In this case, when should I open and close the MySQL connection?
Should I open and close it every 3 seconds?
Should I open it at 9 a.m. and close it at 3 p.m.?
Should I open it once when the program starts and never close it, only reconnecting when the connection is dropped automatically and an exception is thrown?
Why don't you simply use a connection pool? If that is too tedious, then since the connection will be used frequently you can just reuse the same one, IMHO.
While it is true that setting up and tearing down a MySQL connection is relatively cheap (when compared to, for example, Oracle), doing it every 3 seconds is a waste of resources. I'd cache the connection and save the overhead of creating a new database connection every time.
This depends very much on the situation. Do you connect over a WAN? Is the MySQL server shared with other applications, or will you be the only user (or at least will your application create most of the load)? If the database is mostly yours and it is near enough, there is little benefit in setting up and tearing down the connection daily.
This is what most applications do and this is what I'd recommend you do by default.
If you do not want to leave connections open overnight, you might be able to configure your connection pool to open connections on demand and close them when they have been idle for a certain period of time -- say, 15 minutes. This would give you the benefit of being able to query the database whenever you wish without keeping too many idle connections open.
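For example, with HikariCP (just one possible pool; the JDBC URL and credentials below are placeholders) an on-demand setup with a 15-minute idle timeout might look roughly like this:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    // Builds a pool that opens connections on demand and closes idle ones.
    static HikariDataSource newDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        config.setUsername("user");                            // placeholder credentials
        config.setPassword("secret");
        config.setMinimumIdle(0);               // do not keep idle connections overnight
        config.setIdleTimeout(15 * 60 * 1000L); // close connections idle for 15 minutes
        config.setMaximumPoolSize(5);           // small cap for a mostly idle job
        return new HikariDataSource(config);
    }
}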
Related
If we use a connection pooling framework such as the Tomcat JDBC pool, how costly is it to open and close a DB connection?
Is it good practice to open and close the DB connection frequently, whenever DB operations are required?
Or can the same connection be carried across different methods for DB operations?
A JDBC Connection goes over the network, usually via TCP/IP and optionally with SSL. You can read this post to find out why that is expensive.
You can use a single connection across multiple methods for different DB operations, because for each DB operation you create a separate Statement to execute.
Connection pooling avoids the overhead of creating connections during a request and should be used whenever possible. HikariCP is one of the fastest pools.
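As a rough sketch of that pattern (assuming any javax.sql.DataSource implementation such as HikariCP, and a made-up person table), a method can borrow one pooled connection and create one Statement per operation:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class PersonDao {
    private final DataSource dataSource; // e.g. a HikariDataSource created at startup

    public PersonDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Borrow one connection, run two different operations on it,
    // then return it to the pool via try-with-resources.
    public String renameAndFetch(long id, String newName) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            try (PreparedStatement update =
                     con.prepareStatement("UPDATE person SET name = ? WHERE id = ?")) {
                update.setString(1, newName);
                update.setLong(2, id);
                update.executeUpdate();
            }
            try (PreparedStatement query =
                     con.prepareStatement("SELECT name FROM person WHERE id = ?")) {
                query.setLong(1, id);
                try (ResultSet rs = query.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }
}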
The answer is: it's almost always recommended to reuse DB connections. That's the whole reason connection pools exist -- not only for performance, but also for DB stability. For instance, if you don't limit the number of connections and mistakenly open hundreds of DB connections, the DB might go down. And if DB connections don't get closed for some reason (an out-of-memory error, a shutdown, an unhandled exception, etc.), you have an even bigger issue: not only would this affect your application, it could also drag down other services using the shared DB. A connection pool contains such catastrophes.
What people don't realize is that behind a simple ORM API there are often hundreds of raw SQL statements. Imagine running those SQLs without a connection pool -- we are talking about a very large overhead.
I couldn't fathom running a commercial DB application without using Connection Pools.
Some good resources on this topic:
https://www.cockroachlabs.com/blog/what-is-connection-pooling/
https://stackoverflow.blog/2020/10/14/improve-database-performance-with-connection-pooling/
Whether the maintenance (opening, closing, testing) of the database connections in a connection pool affects the working performance of the application depends on the implementation of the pool and, to some extent, on the underlying hardware.
A pool can be implemented to run in its own thread, or to initialise all connections during startup (of the container), or both. If the hardware provides enough cores, the working thread (the "business payload") will not be affected by the activities of the pool at all.
Other connection pools are implemented to create a new connection only on demand (a connection is requested, but currently none is available in the pool) and within the thread of the caller. In this case, the creation of that connection reduces the performance of the working thread -- but only that one time. It should not happen too often; otherwise your application needs too many connections and/or does not return them fast enough.
But whether you really need a database connection pool at all depends on the kind of application you have!
If we talk about a typical server application that is intended to run forever and to serve a permanently changing crowd of multiple clients at the same time, it will definitely benefit from a connection pool.
If we talk about a tool type application that starts, performs a more or less linear task in a defined amount of time, and terminates when done, then using a connection pool for the database connection(s) may cause more overhead than it provides advantages. For such an application it might be better to keep the connection open for the whole runtime.
From the RDBMS's point of view, the two approaches make no difference: in both cases the connections are seen as open.
If performance is a key parameter, it is better to switch to the HikariCP connection pool. If you are using Spring Boot, HikariCP is used by default and you do not need to add any dependency. The beautiful thing about HikariCP is that the entire connection lifecycle is managed for you and you do not have to do anything.
Also, it is always recommended to close the connection so that it returns to the connection pool and other threads can use it, especially in multi-tenant environments. The best way to do this is with try-with-resources, so the connection is always closed:
try (Connection con = dataSource.getConnection()) {
    // your code here.
}
To create your data source you can pass in the credentials, for example:
DataSource dataSource = DataSourceBuilder.create()
        .driverClassName(JDBC_DRIVER)
        .url(url)
        .username(username)
        .password(password)
        .build();
Link: https://github.com/brettwooldridge/HikariCP
If you want to know the answer in your case, just write two implementations (one with a pool, one without) and benchmark the difference.
Exactly how costly it is depends on so many factors that it is hard to tell without measuring.
But in general, a pool will be more efficient.
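A rough sketch of such a comparison (not a rigorous benchmark -- warm-up, JIT and server-side effects are ignored, and the URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import javax.sql.DataSource;

public class ConnectionBenchmark {
    // Time opening and closing a fresh connection on every iteration.
    static long timeWithoutPool(int iterations) throws SQLException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "secret")) {
                // open and close only
            }
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    // Time borrowing and returning a connection from an already running pool.
    static long timeWithPool(DataSource pool, int iterations) throws SQLException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (Connection con = pool.getConnection()) {
                // borrow and return only
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}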
The cost is always a question of impact.
Consider the following environment:
a web application where each UI transaction (a user click) occupies a thread on the web server, and that thread is coupled to one connection/thread on the database, at a rate of, say,
10 connections per 60,000 ms (1 min), i.e. about 0.167 connections/s
10 connections per 1,000 ms (1 s) => 10 connections/s
10 connections per 100 ms (0.1 s) => 100 connections/s
10 connections per 10 ms (0.01 s) => 1,000 connections/s
I have worked in even bigger environments.
And believe me: the further you exceed 100 conn/s, by factors of 10^x, the more pain you will feel without a clean connection pool.
The more connections you create per second, the higher the latency and the bigger the impact on the database. You also waste more bandwidth recreating, over and over, a new "water pipeline" just to push a few drops of water from one side to the other.
Now, getting back to the question: fetching an existing connection from a connection pool is a matter of microseconds or a few milliseconds. Considered on its own, that is no real impact at all.
If there is a network in between, creating a new connection instead takes on the order of 10¹ to 10² ms.
Now consider the impact on your web server: each user blocks a thread, memory and a network connection, so this also affects the web server's load. In high-load environments you typically run into web-server thread-pool issues (e.g. a reverse proxy Apache in front of Tomcat, or Tomcat alone) if connections get exhausted or take too long (10¹ to 10² ms) to create.
Now consider the database as well.
If you keep connections open, each connection is typically mapped to a thread on the DB. The DB can then use thread-based caches for prepared statements and reuse pre-calculated access plans, which makes access to the data very fast.
You may lose this option if you have to recreate the connection over and over again.
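As an illustrative sketch only (the table and credentials are made up, and MySQL only uses true server-side prepared statements when useServerPrepStmts=true), keeping one connection open lets you prepare a statement once and execute it many times:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ReusePreparedStatement {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/mydb", "user", "secret");
             // Prepared once per connection; the server can keep reusing it
             // instead of re-parsing the SQL for every execution.
             PreparedStatement ps =
                 con.prepareStatement("SELECT name FROM person WHERE id = ?")) {
            for (long id = 1; id <= 1000; id++) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}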
But as said, if you stay at up to roughly 10 connections per second, you will not face any bigger issue without a connection pool, except for the additional delay on the first access to the DB.
If you get to higher levels, you will have to manage resources better and avoid any useless I/O delay such as recreating connections.
Experience hints:
it does not cost you anything to use a connection pool. Whenever I saw issues with a connection pool in my previous performance-tuning projects, it was a matter of bad configuration.
You can configure
a connection check that validates the connection (use a real SQL statement that accesses a real DB field), so that on every new access the connection gets checked and, if defective, gets kicked out of the connection pool;
a lifetime for connections, so that you get a new connection after a defined time.
=> All this together ensures that even if your admins are doing crap and do not inform you (killing connections/threads on the DB), the pool gets rebuilt quickly and the impact stays very low. Read the docs of your connection pool; a configuration sketch follows below.
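For example, with HikariCP the two hints above map roughly to connectionTestQuery and maxLifetime (other pools use different property names, e.g. validationQuery and maxAge in the Tomcat JDBC pool); the URL and credentials are placeholders:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class ValidatedPool {
    static HikariDataSource newDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder
        config.setUsername("user");
        config.setPassword("secret");
        // Check the connection with a real query before handing it out
        // (only needed for drivers without a working isValid()).
        config.setConnectionTestQuery("SELECT 1");
        // Retire every connection after 30 minutes so the pool rebuilds itself
        // even if someone kills sessions on the database side.
        config.setMaxLifetime(30 * 60 * 1000L);
        return new HikariDataSource(config);
    }
}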
Is one connection pool better than another?
A clear no -- it only starts to matter in high-end, distributed/clustered or cloud-based environments. If you already have a connection pool and it is still maintained, stick with it and become a pro at its settings.
We are using H2 started as a database server process, listening on the standard TCP/IP port 9092.
Our application is deployed in Tomcat and uses connection pooling. We purge connections during idle time, which in the end closes all connections to H2. From time to time we observe errors when the application tries to open a connection to H2 again:
SCHEDULERSERVICE schedule: Exception: Database may be already in use: "Waited for database closing longer than 1 minute". Possible solutions: close all other connection(s); use the server mode [90020-199]
org.h2.jdbc.JdbcSQLNonTransientConnectionException: Database may be already in use: "Waited for database closing longer than 1 minute". Possible solutions: close all other connection(s); use the server mode [90020-199]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:617)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:427)
at org.h2.message.DbException.get(DbException.java:205)
at org.h2.message.DbException.get(DbException.java:181)
at org.h2.engine.Engine.openSession(Engine.java:209)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:178)
at org.h2.engine.Engine.createSession(Engine.java:161)
at org.h2.server.TcpServerThread.run(TcpServerThread.java:160)
at java.lang.Thread.run(Thread.java:748)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:617)
at org.h2.engine.SessionRemote.done(SessionRemote.java:607)
at org.h2.engine.SessionRemote.initTransfer(SessionRemote.java:143)
at org.h2.engine.SessionRemote.connectServer(SessionRemote.java:431)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:317)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:169)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:148)
at org.h2.Driver.connect(Driver.java:69)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
The problem occurs when the Tomcat connection pool closes all idle (unused) connections and one connection that is still in use is closed afterwards.
The next attempt to open a new connection fails; a retry succeeds after some wait time.
Under which circumstances does this exception happen?
What does the exception mean?
Are there any recommendations to follow to avoid the problem?
It sounds to me as if H2 closes the database after the last connection has been closed.
When does the database close occur?
How can database closures be controlled?
Thx in advance
Thorsten
An embedded database in a web application needs careful handling of its lifecycle.
You can add a javax.servlet.ServletContextListener implementation (marked with the @WebListener annotation or declared in web.xml) and add an explicit database shutdown to its contextDestroyed() method.
You can force database shutdown there with connection.createStatement().execute("SHUTDOWN"). If your application needs to write something to the database during unload, it should do so before that command.
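A minimal sketch of such a listener, assuming the javax.servlet API; lookupDataSource() is a placeholder for however your application obtains its connections:

import java.sql.Connection;
import java.sql.SQLException;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import javax.sql.DataSource;

@WebListener
public class H2ShutdownListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // nothing to do on startup
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        DataSource dataSource = lookupDataSource();
        try (Connection con = dataSource.getConnection()) {
            con.createStatement().execute("SHUTDOWN"); // closes the embedded H2 database
        } catch (SQLException e) {
            sce.getServletContext().log("H2 shutdown failed", e);
        }
    }

    private DataSource lookupDataSource() {
        // placeholder: retrieve the DataSource from JNDI, Spring, etc.
        throw new UnsupportedOperationException("wire in your DataSource here");
    }
}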
Without an explicit shutdown, H2 closes the database when all connections are closed, unless some other behavior was configured explicitly (with parameters in the JDBC URL, for example). For instance, DB_CLOSE_DELAY sets an additional delay; maybe your application uses that setting and therefore H2 doesn't close the database immediately, or the application doesn't close all its connections immediately.
Anyway, when you try to update the web application on the fly, Tomcat initializes the new version before the old version is unloaded. If H2 is in the classpath of the web application itself, the new version will be unable to connect to the database during the short period of time when the new version is already online but the old version hasn't been unloaded yet.
If you don't like it, you can run the standalone H2 Server process and use remote connections to it in your web applications.
Another option is to move H2 to the classpath of Tomcat itself and configure the connection pool as resource in the server.xml, in that case it shouldn't be affected by the lifecycle of your applications.
In both these cases you shouldn't use the SHUTDOWN command.
UPDATED
With client-server connections to a remote server, such an exception means that the server decided to close the database because there were no active connections. This operation can't be interrupted or reverted in the middle. An attempt to open a new connection to the same database during this process waits for at most 1 minute for it to complete so the database can be re-opened. This timeout is not configurable.
There are two possible solutions.
The DB_CLOSE_DELAY setting can be used with some large value in seconds. When all connections are closed, the database will stay online for the specified number of seconds. -1 can also be used to set an infinite timeout.
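For example, the setting can be appended to the JDBC URL (or executed as SET DB_CLOSE_DELAY -1 on an open connection); the database path and credentials below are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class H2KeepOpen {
    public static void main(String[] args) throws SQLException {
        // DB_CLOSE_DELAY=-1 keeps the database open even when no connection exists;
        // a positive value keeps it open for that many seconds after the last close.
        String url = "jdbc:h2:tcp://localhost:9092/~/appdb;DB_CLOSE_DELAY=-1";
        try (Connection con = DriverManager.getConnection(url, "sa", "")) {
            System.out.println("connected: " + con.getMetaData().getURL());
        }
    }
}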
You can also try to speed up the shutdown process, but you will have to figure out yourself what takes so long. The file compaction procedure is limited to 200 milliseconds by default; it may take somewhat longer, but I don't think it should be that long. Maybe you have a lot of temporary objects or uncommitted data, or a very high fragmentation of the database file. It's hard to say what's going wrong without further investigation.
Let's say I am storing Person(id, country_id, name) data, and the user sends just the id and country_id and we send back the name.
Now I have one DB and two web servers, and each web server keeps a connection pool (e.g. c3p0) of 20 connections.
That means the DB is maintaining 40 connections and each web server is maintaining 20 connections.
Analyzing the above system, we can see that we used a connection pool because people say "creating a DB connection is expensive".
This all makes sense.
Now let's say I shard the table data on country_id, so there may now be 200 DBs; also assume our app is popular now and we need 50 web servers.
The above connection-pooling strategy now fails, because if each web server keeps 20 pooled connections for each DB,
each web server will have 20 × 200 DBs = 4,000 connections,
and each DB will have 50 web servers × 20 = 1,000 connections.
This doesn't sound good, which brought me to the question: why use connection pooling at all, and what is the overhead of creating one connection per web request?
So I ran a test, in which I saw that DriverManager.getConnection() takes an average of 20 ms on localhost.
An extra 20 ms per request is not a game killer.
Question 1: Is there any other downside of using one connection per web request?
Question 2: People all over the internet say "a DB connection is expensive". What are the different expenses?
PS: I also see Pinterest doing the same: https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f
Apart from connection creation and closing being time-consuming (i.e. costly), pooling is also done to control the number of simultaneously open connections to your database, since there is a limit on how many simultaneous connections a DB server can handle. When you use one connection per request, you lose that control and your application is always at risk of crashing at peak load.
Secondly, you would unnecessarily tie your web-server capacity to your database capacity. The goal is also to treat DB connection management not as a developer concern but as an infrastructure concern: would you really want to leave it to each developer's code to decide when a production application opens a database connection?
In traditional monolithic application servers like WebLogic, JBoss or WebSphere, it is the sysadmin who creates a connection pool according to the DB server's capacity and passes a JNDI name on to the developers. The developer's job is only to obtain a connection using that JNDI name, for example:
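A minimal sketch of such a lookup (the JNDI name jdbc/AppDS is a placeholder chosen by whoever configures the server):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class JndiExample {
    // Looks up the container-managed pool and borrows a connection from it.
    public Connection borrowConnection() throws NamingException, SQLException {
        InitialContext ctx = new InitialContext();
        DataSource dataSource = (DataSource) ctx.lookup("java:comp/env/jdbc/AppDS");
        return dataSource.getConnection(); // pooled connection managed by the container
    }
}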
Next, if the database is shared among various independent applications, pooling lets you know what you are handing out to which application. Some apps might be more data-intensive and some might not.
The traditional problem of resource leaks, i.e. when developers forget to cleanly close their connections, is also taken care of with pooling.
All in all, the idea behind pooling is to let developers be concerned only with using a connection to do their job, and not with opening and closing it. If a connection is not used for X minutes, it is returned to the pool as per the configuration.
If you have a busy web site and every request to the database opens and closes a connection, you are dead in the water.
The 20 ms you measured were for a localhost connection. I don't think all of your 50 web servers will be on localhost...
Apart from the time it takes to establish and close a database connection, it also uses resources on the database server. This is mostly the CPU, but there could also be contention on kernel data structures.
Also, if you allow several thousand connections, there is nothing to keep them from all getting busy at the same time, in which case your database server will be overloaded and unresponsive unless it has several thousand cores (and even then you'd be limited by lock contention).
Your solution is an external connection pool like pgBouncer.
We are having a problem with too many Oracle processes being created (over 2,000) when connections are limited to 1,100 (using c3p0).
Two questions:
What's the relationship between an Oracle process and a JDBC connection? Is one Oracle process created for each session? Is one created for every JDBC statement? Or is there no relationship at all?
Did you ever face this scenario, where you are creating more processes than JDBC connections?
Any comment would be really appreciated.
There is one session per connection. This sounds like you have a connection leak: somewhere you're opening a new connection and not closing it properly. One possibility is that you open, use and close a connection inside a try block and handle an exception in a catch, or return early for some other reason. If so, you need to make sure the connection is closed in a finally block, or it may not happen, leaving the connection (and thus the session) hanging. Opening two connections in the same scope without an explicit close in between can also do this. Both patterns are sketched below.
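A minimal sketch of both styles (the DataSource stands in for however c3p0 hands out connections in your code):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class LeakSafeAccess {
    private final DataSource dataSource;

    public LeakSafeAccess(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Pre-Java-7 style: close in finally so the session is released
    // even when an exception is thrown or the method returns early.
    public void doWorkWithFinally() throws SQLException {
        Connection con = dataSource.getConnection();
        try {
            // ... use the connection ...
        } finally {
            con.close();
        }
    }

    // Java 7+: try-with-resources closes the connection automatically.
    public void doWorkWithTryWithResources() throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            // ... use the connection ...
        }
    }
}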
I'm not familiar with c3p0, so I don't know how its connections are handled, or where and how your 1,100 limit is imposed; if it (or you) maintains a connection pool and the 1,100 you refer to is the maximum pool size, then this doesn't sound like the issue, as you'd hit the pool cap before the session cap.
You can look in v$session to confirm that all the sessions are coming from JDBC, and there isn't something else connecting.
You may want to check whether your server runs in dedicated or shared mode (you probably want to switch it to shared mode if you want to decrease the number of active processes).
You can check that by doing
select server from v$session
More information about process architecture
http://docs.oracle.com/cd/B19306_01/server.102/b14220/process.htm
Shared/Dedicated server mode
http://docs.oracle.com/cd/B10501_01/server.920/a96521/manproc.htm
I manage my connections with a JDBC connection pool (BoneCP), and I always close the Connection, the PreparedStatement and the ResultSet.
But when my program has been running for several days, the MySQL server gets slower and slower (for testing, I let my program insert an entry every second). After two days there were several seconds between the entries, which is why I think the MySQL server is getting slower and cannot keep up with the incoming transactions. Am I right?
The MySQL server also uses much more RAM and does not release the resources. Does anyone know how I could find the error causing this behaviour? Thanks in advance!
Use MySQL Workbench to detect open connections. It also gives you a host of options to see the performance of your database server.
Also [I might be mistaken about this part of your question], when you say
I use connection pooling
why do you close the connection? Isn't that the opposite of the purpose of connection pooling?