I have an application that requires connection pooling, because the server has many clients communicating with it at the same time, possibly around 10k. When I limit maxActive to 200, database updates slow down.
The application is written in Java.
Connection pooling is configured through Tomcat's context.xml.
The database is SQL Server 2005.
Please help me set up pooling correctly so that my application does not slow down.
There is no single correct way to pool; you have to measure at how many active connections you get maximum throughput.
1) Check for inactive connections that are not being closed.
2) Do some analysis to find the root cause, or the point at which the application eats up a lot of connections.
In most projects I have seen, the common setting is 200 - 300 active connections. If you need more than that, it is more likely an enterprise application, for which you have to rely on infrastructure rather than on programmatic tuning.
Beyond a certain threshold you should look at clustering the database, because you can only tune the pool so far once you have confirmed there is nothing left to optimize in the program.
If we use a connection pooling framework or the Tomcat JDBC pool, how costly is it to open and close a DB connection?
Is it good practice to open and close the DB connection frequently, whenever DB operations are required?
Or can the same connection be carried across different methods for DB operations?
A JDBC connection goes through the network, usually over TCP/IP and optionally with SSL. You can read this post to find out why it is expensive.
You can use a single connection across multiple methods for different DB operations, because each DB operation creates its own Statement to execute.
Connection pooling avoids the overhead of creating Connections during a request and should be used whenever possible. Hikari is one of the fastest.
The answer is: it's almost always recommended to reuse DB connections. That's the whole reason connection pools exist, not only for performance but also for DB stability. For instance, if you don't limit the number of connections and mistakenly open hundreds of DB connections, the DB might go down. And if DB connections don't get closed for some reason (out-of-memory error, shutdown, unhandled exception, etc.), you would have a bigger issue: not only would this affect your application, it could also drag down other services using the same DB. A connection pool would contain such catastrophes.
What people don't realize is that behind the simple ORM API there are often hundreds of raw SQL statements. Imagine running these without a connection pool - we are talking about a very large overhead.
I couldn't fathom running a commercial DB application without using Connection Pools.
Some good resources on this topic:
https://www.cockroachlabs.com/blog/what-is-connection-pooling/
https://stackoverflow.blog/2020/10/14/improve-database-performance-with-connection-pooling/
Whether the maintenance (opening, closing, testing) of the database connections in a DBConnection Pool affects the working performance of the application depends on the implementation of the pool and to some extent on the underlying hardware.
A pool can be implemented to run in its own thread, or to initialise all connections during startup (of the container), or both. If the hardware provides enough cores, the working thread (the "business payload") will not be affected by the activities of the pool at all.
Other connection pools are implemented to create a new connection only on demand (a connection is requested, but currently there is none available in the pool) and within the thread of the caller. In this case, the creation of that connection reduces the performance of the working thread – this time! It should not happen too often, otherwise your application needs too many connections and/or does not return them fast enough.
But whether you really need a database connection pool at all depends on the kind of application!
If we talk about a typical server application that is intended to run forever and to serve a permanently changing crowd of multiple clients at the same time, it will definitely benefit from a connection pool.
If we talk about a tool type application that starts, performs a more or less linear task in a defined amount of time, and terminates when done, then using a connection pool for the database connection(s) may cause more overhead than it provides advantages. For such an application it might be better to keep the connection open for the whole runtime.
From the RDBMS's point of view, the two make no difference: in both cases the connections are seen as open.
If performance is a key requirement, consider switching to the Hikari connection pool. If you are using Spring Boot, Hikari is the default connection pool and you do not need to add any dependency. The nice thing about the Hikari connection pool is that its entire lifecycle is managed for you, so you do not have to do anything.
Also, it is always recommended to close the connection so that it returns to the connection pool and other threads can use it, especially in multi-tenant environments. The best way to do this is try-with-resources, which guarantees the connection is always closed.
try (Connection con = dataSource.getConnection()) {
    // your code here; con goes back to the pool when this block exits
}
To create your data source, pass the driver class, URL and credentials, for example:
DataSource dataSource = DataSourceBuilder.create()
        .driverClassName(JDBC_DRIVER)
        .url(url)
        .username(username)
        .password(password)
        .build();
Link: https://github.com/brettwooldridge/HikariCP
If you want to know the answer in your case, just write two implementations (one with a pool, one without) and benchmark the difference.
Exactly how costly it is depends on so many factors that it is hard to tell without measuring.
But in general, a pool will be more efficient.
Whether something is costly always depends on its impact.
Consider the following environment.
A web application in which each UI transaction (user click) occupies a thread on the webserver. This thread is coupled to one connection/thread on the database.
10 connections per 60000 ms / 1 min => 0.167 connections/s
10 connections per 1000 ms / 1 s => 10 connections/s
10 connections per 100 ms / 0.1 s => 100 connections/s
10 connections per 10 ms / 0.01 s => 1000 connections/s
I have worked in even bigger environments.
And believe me, the more you exceed 100 conn/s by factors of 10^x, the more pain you will feel without a clean connection pool.
The more connections you create per second, the more latency you generate and the higher the impact on the database. And the more bandwidth you eat by building a new "water pipeline" over and over, just to pass a few drops of water from one side to the other.
Now, getting back: if you take an existing connection from a connection pool, it is a matter of microseconds or a few ms to get access to the database connection. Taken on its own, that is no real impact at all.
If there is a network in between, creating a new connection will probably take on the order of 10¹ to 10² ms.
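To put the rates listed above next to a setup cost, here is a rough back-of-the-envelope sketch. The 20 ms setup time is only an assumed figure inside the 10¹ to 10² ms range mentioned above, and `ConnectionSetupBudget` is a made-up name for illustration, not part of any library.

```java
import java.util.Locale;

/** Rough arithmetic: connection setup work generated per second when no pool is used. */
class ConnectionSetupBudget {

    /** Connections opened per second, given a count and a time window in milliseconds. */
    static double connectionsPerSecond(int connections, long windowMillis) {
        return connections * 1000.0 / windowMillis;
    }

    /** Milliseconds of connection setup work generated per wall-clock second. */
    static double setupMillisPerSecond(double connectionsPerSecond, double setupMillis) {
        return connectionsPerSecond * setupMillis;
    }

    public static void main(String[] args) {
        double assumedSetupMillis = 20.0; // assumption for illustration, not a measurement
        long[] windowsMillis = {60_000, 1_000, 100, 10};
        for (long window : windowsMillis) {
            double rate = connectionsPerSecond(10, window);
            System.out.printf(Locale.ROOT,
                    "%6d ms window: %9.3f conn/s, %9.1f ms of setup work per second%n",
                    window, rate, setupMillisPerSecond(rate, assumedSetupMillis));
        }
    }
}
```

Under that assumption, the highest rate produces many seconds of setup work for every second of wall-clock time, spread across threads on both the webserver and the database; a pool removes that work from the request path.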
Now consider the impact on your webserver: each user blocks a thread, memory and a network connection, so connection handling also affects your webserver load. In high-load environments you typically run into thread pool issues on the webserver (e.g. reverse proxy Apache + Tomcat, or Tomcat alone) if the connections get exhausted or take too long (10¹ to 10² ms) to create.
Now considering also the database.
If you have an open connection, it is typically mapped to a thread on the DB. So the DB can use thread-based caches to keep prepared statements and reuse pre-calculated access plans, which makes access to the data very fast.
You may lose this option if you have to recreate the connection over and over again.
But as said, at up to 10 connections per second you should not face any bigger issue without a connection pool, except the initial additional delay to access the DB.
If you get to higher levels, you will have to manage resources better and avoid any useless IO delay such as recreating the connection.
Experience hints:
It does not cost you anything to use a connection pool. Whenever there were issues with the connection pool in my previous performance tuning projects, it was a matter of bad configuration.
You can configure
a connection check (use a real SQL statement that accesses a real db field), so that the connection is checked on every new access and, if defective, kicked out of the connection pool
a lifetime for connections, so that you get a new connection after a defined time
=> all of this together ensures that even if your admins are doing crap and do not inform you (killing connections / threads on the DB), the pool gets rebuilt quickly and the impact stays very low. Read the docs of your connection pool.
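The two settings above (a check on every access and a maximum lifetime) boil down to a small decision the pool makes before handing out a connection. Here is a minimal sketch of that decision. `ConnectionHealthPolicy` is a hypothetical class, not the API of any pool; real pools expose this through configuration. With JDBC, the validity check could wrap `Connection.isValid(timeoutSeconds)` or a test query.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

/** Hypothetical policy object; real pools configure this instead of exposing a class. */
class ConnectionHealthPolicy {
    private final Duration maxLifetime;

    ConnectionHealthPolicy(Duration maxLifetime) {
        this.maxLifetime = maxLifetime;
    }

    /**
     * Returns true if the connection should be dropped instead of handed out.
     * The validity check only runs when the age limit has not been reached.
     */
    boolean shouldEvict(Instant createdAt, Instant now, BooleanSupplier validityCheck) {
        boolean tooOld = Duration.between(createdAt, now).compareTo(maxLifetime) >= 0;
        if (tooOld) {
            return true;
        }
        return !validityCheck.getAsBoolean();
    }

    public static void main(String[] args) {
        ConnectionHealthPolicy policy = new ConnectionHealthPolicy(Duration.ofMinutes(30));
        Instant created = Instant.parse("2024-01-01T00:00:00Z");
        // Young and healthy: keep it.
        System.out.println(policy.shouldEvict(created, created.plusSeconds(60), () -> true));
        // Young, but the check fails (for example, killed on the DB side): evict.
        System.out.println(policy.shouldEvict(created, created.plusSeconds(60), () -> false));
        // Past its lifetime: evict without running the check.
        System.out.println(policy.shouldEvict(created, created.plus(Duration.ofMinutes(31)), () -> true));
    }
}
```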
Is one connection pool better than the other?
A clear no. It only starts to matter when you get into high-end, distributed/clustered or cloud-based environments. If you already have a connection pool and it is still maintained, stick with it and become a pro at its settings.
Let's say I am storing Person(id, country_id, name) data, and the user sends the id and country_id and we send back the name.
Now I have one db and 2 webservers, and each webserver keeps a connection pool (e.g. c3p0) of 20 connections.
That means the db is maintaining 40 connections and each webserver is maintaining 20 connections.
Analyzing the above system, we can see that we used a connection pool because people say "creating a db connection is expensive".
This all makes sense.
Now let's say I shard the table data on country_id, so there may now be 200 dbs. Let's also assume our app is popular now and we need 50 webservers.
Now the above connection pooling strategy fails, because each webserver keeps 20 connections in the pool for each db.
That means each webserver will have 20 * 200 dbs = 4000 connections
and each db will have 50 webservers * 20 = 1000 connections.
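The arithmetic above, written out as a small sketch so the numbers can be varied. `ShardedPoolMath` and its method names are made up for illustration.

```java
/** Connection counts when every webserver keeps one fixed-size pool per database. */
class ShardedPoolMath {

    /** Connections held by a single webserver across all databases. */
    static int connectionsPerWebServer(int poolSizePerDb, int databases) {
        return poolSizePerDb * databases;
    }

    /** Connections a single database sees from all webservers. */
    static int connectionsPerDatabase(int poolSizePerDb, int webServers) {
        return poolSizePerDb * webServers;
    }

    public static void main(String[] args) {
        // Original setup: 1 db, 2 webservers, pool of 20.
        System.out.println("before sharding, per db: " + connectionsPerDatabase(20, 2));
        // Sharded setup: 200 dbs, 50 webservers, pool of 20 per db.
        System.out.println("after sharding, per webserver: " + connectionsPerWebServer(20, 200));
        System.out.println("after sharding, per db: " + connectionsPerDatabase(20, 50));
    }
}
```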
This doesn't sound good, so I asked myself: why use connection pooling at all, and what is the overhead of creating 1 connection per web request?
So I ran a test and saw that DriverManager.getConnection() takes an average of 20 ms on localhost.
20 ms extra per request is not a game killer.
Question 1: Is there any other downside to using 1 connection per web request?
Question 2: People all over the internet say "db connection is expensive". What are the different expenses?
PS: I also see Pinterest doing the same: https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f
Besides the connection create and close cycle being time consuming (i.e. costly), pooling is also done to control the number of simultaneously open connections to your database, since there is a limit on the number of simultaneous connections a db server can handle. With one connection per request you lose that control, and your application is always at risk of crashing at peak load.
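The "control the number of simultaneous connections" part can be pictured as a counting semaphore in front of the database. The following is only a sketch of that idea, using a hypothetical `ConnectionLimiter` class; real pools enforce the cap internally through their own maximum-size setting.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical cap on simultaneously open connections; a sketch, not a full pool. */
class ConnectionLimiter {
    private final Semaphore permits;

    ConnectionLimiter(int maxOpen) {
        // Fair semaphore: waiting callers are served in arrival order.
        this.permits = new Semaphore(maxOpen, true);
    }

    /** Reserves a slot; returns false if none frees up within the timeout. */
    boolean tryAcquire(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Gives the slot back; call this exactly once per successful tryAcquire. */
    void release() {
        permits.release();
    }

    public static void main(String[] args) {
        ConnectionLimiter limiter = new ConnectionLimiter(2);
        System.out.println("first:  " + limiter.tryAcquire(50));
        System.out.println("second: " + limiter.tryAcquire(50));
        // The cap is reached, so this caller is turned away instead of hitting the DB.
        System.out.println("third:  " + limiter.tryAcquire(50));
        limiter.release();
        System.out.println("after release: " + limiter.tryAcquire(50));
    }
}
```

With one connection per request there is no such gate: every request that arrives at peak load goes straight to the database.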
Secondly, you would unnecessarily tie your web server capacity to your database capacity. The aim is also to treat db connection management as an infrastructure concern rather than a developer concern. Would you want to let each developer's code decide when to open a database connection in a production application?
In traditional monolithic application servers like WebLogic, JBoss, WebSphere etc., it is the sys admin who creates a connection pool according to the db server's capacity and passes the JNDI name on to the developers. The developer's job is only to get a connection using that JNDI name.
Next, if the database is shared among various independent applications, pooling lets you know how much you are giving out to which application. Some apps might be more data intensive than others.
The traditional problem of resource leaks, i.e. developers forgetting to close their connections cleanly, is also taken care of with pooling.
All in all, the idea behind pooling is to let developers be concerned only with using a connection and doing their job, without worrying about opening and closing it. If a connection is not used for X minutes, it will be returned to the pool as per configuration.
If you have a busy web site and every request to the database opens and closes a connection, you are dead in the water.
The 20ms you measured are for a localhost connection. I don't think that all your 50 web servers will be on localhost...
Apart from the time it takes to establish and close a database connection, it also uses resources on the database server. This is mostly the CPU, but there could also be contention on kernel data structures.
Also, if you allow several thousand connections, there is nothing that keeps them from all getting busy at the same time, in which case your database server will be overloaded and unresponsive unless it has several thousand cores (and even then you'd be limited by lock contention).
Your solution is an external connection pool like pgBouncer.
I have a Java app with more than 100 servers. Currently each server opens connections to 7 database schemas in a relational database (logging, this, that, the other). All schemas connect to the same DB cluster, but it's all effectively one database instance.
The server-managed connection pools open a handful of connections (1 - 5) to each database schema per instance, then double that on a redundant pool. So each server opens a minimum of 30 database connections and can grow to a maximum of several hundred per server, and again, there are more than 100 servers.
All in all, the minimum number of database connections used is 3000, and this can grow to ludicrous levels.
Obviously this is just plain wrong. The database cluster can only efficiently handle X concurrent requests, and any number of requests > X introduces unnecessary contention and slows the whole lot down. (X is unknown, but it is way smaller than the 3000 minimum concurrent connections.)
I want to lower the total connections used by implementing the following strategy:
Connect to one schema only (Application-X), have 6 connections per pool maximum.
Write a layer above the pool that will switch to the schema I want. The getConnection(forSchema) function will take a parameter for the target schema (eg. logging), will get a connection that could last be pointing to any schema, and issue a schema switch SQL statement (set search_path to 'target_schema').
Please do not comment on whether this approach is right or wrong. Because 'it depends' needs to be considered, such comments will not add value.
My question is whether there is a DB pool implementation out there that already does this - allows me to have one set of connections and automatically places me at the right schema, or better yet - tracks whether a pooled connection is available for your target schema before making a decision to go ahead and switch the schema (saves a DB round trip).
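To make the idea concrete, here is a minimal sketch of such a layer. It is an illustration under stated assumptions, not an existing pool implementation: `SchemaAwarePool`, `SqlSession` and `Entry` are hypothetical names, and `SqlSession` stands in for a JDBC connection so the sketch runs without a database. It prefers an idle connection already on the target schema and only issues the switch statement when it has to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical single pool shared by all schemas; tracks each connection's last schema. */
class SchemaAwarePool {

    /** Stand-in for a JDBC connection; only the part needed for the schema switch. */
    interface SqlSession {
        void execute(String sql);
    }

    /** A pooled session plus the schema it was last switched to (null if unknown). */
    static final class Entry {
        final SqlSession session;
        String schema;

        Entry(SqlSession session) {
            this.session = session;
        }
    }

    private final Deque<Entry> idle = new ArrayDeque<>();

    SchemaAwarePool(List<SqlSession> sessions) {
        for (SqlSession session : sessions) {
            idle.addLast(new Entry(session));
        }
    }

    synchronized Entry borrow(String schema) {
        // Reject anything that is not a plain identifier before building SQL from it.
        if (!schema.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected schema name: " + schema);
        }
        // First choice: an idle entry already on the target schema (no round trip).
        for (Iterator<Entry> it = idle.iterator(); it.hasNext(); ) {
            Entry candidate = it.next();
            if (schema.equals(candidate.schema)) {
                it.remove();
                return candidate;
            }
        }
        // Otherwise take the entry that has been idle longest and switch it.
        Entry entry = idle.pollFirst();
        if (entry == null) {
            throw new IllegalStateException("pool exhausted");
        }
        entry.session.execute("SET search_path TO " + schema);
        entry.schema = schema;
        return entry;
    }

    synchronized void giveBack(Entry entry) {
        idle.addLast(entry);
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        SqlSession fake = sent::add; // records statements instead of talking to a DB
        SchemaAwarePool pool = new SchemaAwarePool(Arrays.asList(fake, fake));

        Entry first = pool.borrow("logging");
        pool.giveBack(first);
        pool.borrow("logging"); // reuses the entry already on "logging"
        System.out.println("statements sent: " + sent);
    }
}
```

A production version would also need a wait or timeout when the pool is empty, and would have to reset the tracked schema if application code changes the search path itself.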
I would also like to hear from anyone else who has a similar issue (real-world experience) if you solved it in a different way.
Having learned the hard way myself, the best way to stabilize the number of database connections between a web application and a database is to put a reverse proxy in front of the web application.
Here's why it works:
A slow part of a web request can be returning the data to the client. If there's a lot of data or the user is on a slow connection, the connection to the web server can remain open while the data dribbles out to the client slowly. Meanwhile, the application server continues to hold a database connection open to the backend server. While the database connection may only be needed for a fraction of the transaction, it's tied up until the client disconnects from the application server.
Now, consider what happens when a reverse proxy is added in front of the app server. When the app server has a response prepared, it can quickly reply to the reverse proxy in front of it and free up the database connection behind it. The reverse proxy can then handle slowly dribbling out responses to users, without keeping a related database connection tied up.
Before I made this architectural change, there were a number of traffic spikes that resulted in death spirals: the database handle usage would spike to exhaustion, and things would go downhill from there.
After the change, the number of the database handles required was both far less and far more stable.
If a reverse proxy is not already part of your architecture, I recommend it as a first step to control the number of database connections you require.
Not being a database administrator (even less of a MS database admin :), I have received complaints that a piece of code I've written leaves "sleeping connections" behind in the database.
My code is Java, and uses Apache Commons DBCP for connection pooling. I also use Spring's JdbcTemplate to manage the connection's state, so not closing the connections is out of the question (since the library is doing that for me).
My main question is, from a DBA's point of view, can these connections cause outages or poor performance?
This question is related; for now the settings have been left as they were (infinite active/idle connections in the pool).
Really, to answer your question, an idea of the number of these "sleeping" connections would be good. It also matters whether this server's primary purpose is serving your application, or whether your application is one of many. Also relevant is whether there are multiple instances of your app (eg on multiple web servers), or whether it's just the one.
In my experience, there is little to no overhead associated with idle connections on modern hardware, as long as you don't reach into the hundreds. That said, looking at your previous question, allowing the pool to spawn an unbounded number of connections does not sound wise - I'd recommend setting a cap, even if you set it at a few hundred.
I can tell you from at least one painful situation with leaking connection pools, that having a thousand open connections to a single SQL server is expensive, even if they're idle. I seem to recall the server started losing it (failing to accept new connections, simple queries timing out, etc) when nearing the 2,000-connection range (this was SQL 2000 on mid-range hardware a few years ago).
Hope this helps!
Apache DBCP sets both maxIdle and maxActive to 8 by default. This means that at most 8 connections can be active (borrowed) at the same time, and at most 8 connections are kept idle in the pool. DBCP reuses pooled connections when a connection is requested. You can set both values according to your requirements. You can refer to the document below:
DBCP Configuration - Apache
Which of these approaches is better: connection pooling or per-thread JDBC connections?
Connection Pooling for sure and almost always.
Creating a new database connection is very costly for performance. And different DB engines (depending on licensing or just settings) have different maximum numbers of connections (sometimes it is even 1, usually not more than 50).
The only reason to use per-thread connections is if you know that there is a certain small number of persistent threads (10, for example). I can't imagine this situation in the real world.
Definitely connection pooling. Absolutely no reason to create a new connection for each thread. But, it might make sense to use the same connection for an entire HTTP request for example (especially if you need transactions).
You can easily configure the connection pooling framework to have a min and max number of connections depending on the database that you are using. But before going too high with the max number of connections, try using caching if you have performance issues.
For web apps, connection pooling is generally the right answer, for reasons others have already offered.
For most desktop apps running against a database, a connection pool is no good since you need only one connection and having multiple connections consumes resources on the DB server. (Multiply that by the number of users.) Here the choice is between a single persistent connection or else just create the connection on demand. The first leads to faster queries since you don't have the overhead of building up and tearing down the connection. The second is slower but also less demanding on the DB server.