mixing basic DataSource with connection pooling DataSource: when to call close()? - java

I'm in the process of adding connection pooling to our java app.
The app can work with different rdbmses, and both as a desktop app and as a headless webservice. The basic implementation works fine in desktop mode (rdbms = derby). When running as a webservice (rdbms = mysql), we see that connection pooling is needed to get good concurrent behaviour.
My initial approach was to use dependency injection to decide between a basic DataSource or a connection pooling DataSource at startup time.
The problem in this scenario is that I don't know when to call connection.close(). My understanding is that when using connection pooling, you should close() after each query so the connection object can be recycled. But the current non-pooling implementation tries to hang on to the Connection object as long as possible.
Is there a solution to this problem? Is there a way to recycle a basic connection object? What happens if I don't call connection.close() with a thread pooled connection object?
Or is it simply a bad design to mix these two connection strategies?

If I understand you correctly, essentially you are doing your own pooling implementation by holding on to the connection as long as possible. If this strategy is successful (that is the program is behaving as you describe it), then you have your own pool already. The only thing adding a pool will gain you is to improve the response time when you make a new connection (as you won't really be making one, you will be getting it from the pool), which is apparently not happening all that often.
So, I would cycle back to the assumption underlying this question: Is it in fact the case that your concurrent performance problems are related to database pooling? If you are using transactions in the MySQL and not the Derby, that could be a big cause of concurrency issues, as an example of a different potential cause.
To answer your question directly, use a database pool all the time. They are very little overhead, and of course change the code to release connections quickly (that is, when the request is done, not as long as the user has a screen open, for example) rather than holding on to them forever.

Related

How costly is opening and closing of a DB connection in Connection Pool?

If we use any connection pooling framework or Tomcat JDBC pool then how much it is costly to open and close the DB connection?
Is it a good practice to frequently open and close the DB connection whenever DB operations are required?
Or same connection can be carried across different methods for DB operations?
Jdbc Connection goes through the network and usually works over TCP/IP and optionally with SSL. You can read this post to find out why it is expensive.
You can use a single connection across multiple methods for different db operations because for each DB operations you would need to create a Statement to execute.
Connection pooling avoids the overhead of creating Connections during a request and should be used whenever possible. Hikari is one of the fastest.
The answer is - its almost always recommended to re-use DB Connections. Thats the whole reason why Connection Pools exist. Not only for the performance, but also for the DB stability. For instance, if you don't limit the number of connections and mistakenly open 100s of DB connections, the DB might go down. Also lets say if DB connections don't get closed due to some reason (Out of Memory error / shut down / unhandled exception etc), you would have a bigger issue. Not only would this affect your application but it could also drag down other services using the common DB. Connection pool would contain such catastrophes.
What people don't realize that behind the simple ORM API there are often 100s of raw SQLs. Imagine running these sqls independent of connection pools - we are talking about a very large overhead.
I couldn't fathom running a commercial DB application without using Connection Pools.
Some good resources on this topic:
https://www.cockroachlabs.com/blog/what-is-connection-pooling/
https://stackoverflow.blog/2020/10/14/improve-database-performance-with-connection-pooling/
Whether the maintenance (opening, closing, testing) of the database connections in a DBConnection Pool affects the working performance of the application depends on the implementation of the pool and to some extent on the underlying hardware.
A pool can be implemented to run in its own thread, or to initialise all connections during startup (of the container), or both. If the hardware provides enough cores, the working thread (the "business payload") will not be affected by the activities of the pool at all.
Other connection pools are implemented to create a new connection only on demand (a connection is requested, but currently there is none available in the pool) and within the thread of the caller. In this case, the creation of that connection reduces the performance of the working thread – this time! It should not happen too often, otherwise your application needs too many connections and/or does not return them fast enough.
But whether you really need a Database Connection Pool at all depends from the kind of your application!
If we talk about a typical server application that is intended to run forever and to serve a permanently changing crowd of multiple clients at the same time, it will definitely benefit from a connection pool.
If we talk about a tool type application that starts, performs a more or less linear task in a defined amount of time, and terminates when done, then using a connection pool for the database connection(s) may cause more overhead than it provides advantages. For such an application it might be better to keep the connection open for the whole runtime.
Taking the RDBMS view, both does not make a difference: in both cases the connections are seen as open.
If you have performance as a key parameter then better to switch to the Hikari connection pool. If you are using spring-boot then by default Hikari connection pool is used and you do not need to add any dependency. The beautiful thing about the Hikari connection pool is its entire lifecycle is managed and you do not have to do anything.
Also, it is always recommended to close the connection and let it return to the connection pool so that other threads can use it, especially in multi-tenant environments. The best way to do this is using "try with resources" and that connection is always closed.
try(Connection con = datasource.getConnection()){
// your code here.
}
To create your data source you can pass the credentials and create your data source for example:
DataSource dataSource = DataSourceBuilder.create()
.driverClassName(JDBC_DRIVER)
.url(url)
.username(username)
.password(password)
.build();
Link: https://github.com/brettwooldridge/HikariCP
If you want to know the answer in your case, just write two implementations (one with a pool, one without) and benchmark the difference.
Exactly how costly it is, depends on so many factors that it is hard to tell without measuring
But in general, a pool will be more efficient.
The costly is always a definition of impact.
Consider, you have following environment.
A web application with assuming a UI-transaction (user click) and causes a thread on the webserver. This thread is coupled to one connection/thread on the database
10 connections per 60000ms / 1min or better to say 0.167 connections/s
10 connections per 1000ms / 1sec => 10 connections/s
10 connections per 100ms / 0.1sec => 100 connections/s
10 connections per 10ms / 0.01sec => 1000 connections/s
I have worked in even bigger environments.
And believe me the more you exceed the 100 conn/s by 10^x factors the more pain you will feel without having a clean connection pool.
The more connections you generate in 1 second the higher latency you generate and the higher impact is it for the database. And the more bandwidth you will eat for recreating over and over a new "water pipeline" for dropping a few drops of water from one side to the other side.
Now getting back, if you have to access a existing connection from a connection pool it is a matter of micros or few ms to access the database connection. So considering one, it is no real impact at all.
If you have a network in between, it will grow to probably x10¹ to x10² ms to create a new connection.
Considering now the impact on your webserver, that each user blocks a thread, memory and network connection it will impact also your webserver load. Typically you run into webserver (e.g. revProxy apache + tomcat, or tomcat alone) thread pools issues on high load environments, if the connections get exhausted or they need too long time (10¹, 10² millis) to create
Now considering also the database.
If you have open connection, each connection is typically mapped to a thread on a DB. So the DB can use thread based caches to make prepared statements and to reuse pre-calculated access plan to make the accesses to data on database very fast.
You may loose this option if you have to recreate the connection over and over again.
But as said, if you are in up to 10 connections per second you shall not face any bigger issue without a connection pool, except the first additional delay to access the DB.
If you get into higher levels, you will have to manage the resources better and to avoid any useless IO-delay like recreating the connection.
Experience hints:
it does not cost you anything to use a connection pool. If you have issues with the connection pool, in all my previous performance tuning projects it was a matter of bad configuration.
You can configure
a connection check to check the connection (use a real SQL to access a real db field). so on every new access the connection gets checked and if defective it gets kicked from the connection pool
you can define a lifetime of a connections, so that you get new connection after a defined time
=> all this together ensure that even if your admins are doing crap and do not inform you (killing connection / threads on DB) the pool gets quickly rebuilt and the impact stays very low. Read the docs of the connection pool.
Is one connection pool better as the other?
A clear no, it is only getting a matter if you get into high end, or into distributed environments/clusters or into cloud based environments. If you have one connection pool already and it is still maintained, stick to it and become a pro on your connection pool settings.

Opening a new database connection for every client that connects to the server application?

I am in the process of building a client-server application and I would really like an advise on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server a session is created between them through which all the data is exchanged. Now what bothers me is whether i should create a database object/connection for each client session or whether I should create one database object that will handle all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods should become synchronized in order to avoid one client thread not overwriting the variables of the other. However, making it synchronize it will be time consuming in the case of a lot of concurrent requests as each request will be placed in queue until the one running is completed.
Going with option 2, seems a more appropriate solution but creating a database object for every client-server session is a memory consuming task, plus creating a database connection for each client could lead to a problem again when the number of concurrent connected users is big.
These are just my thoughts, so please add any comments that it may help on the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections opened and avoid bringing the database to its knees or run out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use it in Java SE applications, by using a pool as a library (HikariCP, Tomcat connection pool, etc.)
I would suggested a third option, database connection pooling. This way you create a specified number of connections and give out the first available free connection as soon as it becomes available. This gives you the best of both worlds - there will almost always be free connections available quickly and you keep the number of connections the database at a reasonable level. There are plenty of the box java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Both are not good solutions.
Problem with Option 1:
You already stated the problems with synchronizing when there are multiple threads. But apart from that there are many other problems like transaction management (when are you going to commit your connection?), Security (all clients can see precommitted values).. just to state a few..
Problem with Option 2:
Two of the biggest problems with this are:
It takes a lot of time to create a new connection each and every time. So performance will become an issue.
Database connections are extremely expensive resources which should be used in limited numbers. If you start creating DB Connections for every client you will soon run out of them although most of the connections would not be actively used. You will also see your application performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set connections in the pool which are obtained and released appropriately. Almost all Java Frameworks have sophisticated connection pooling solutions.
If you are not using any JDBC framework (most use the Spring JDBC\Hibernate) read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.

Do multiple open/close connections to SQl via Java effect performance?

I am working on a basic code to access a query in a database for an alert system. I understand that the database at work (Oracle based) automatically creates a pool of connections and I wanted to know if I connect, execute the query and close the connection every 5-15seconds would the performance drop dramatically and was it the correct way to do it or would I have to leave the connection open until the infinite loop is closed?
I have someone at work telling me that closing the connection would result in the database having to lookup a query each time from scratch but if I leave it open the query will be in a cache somewhere on the database.
ResultSet rs1 = MyStatement.executeQuery(QUERY_1);
while (rs1.next()){
// do something
}
rs1.close();
rs1 = MyStatement.executeQuery(QUERY_2);
while (rs1.next()){
// do something
}
rs1.close();
Oracle can't pool connections, this has to be done on the client side
You aren't closing any connections in the code example you posted
Opening a connection is a rather slow process, so use either a fixed set of connections (typically the set has size one for things like fat client applications) or a connection pool, that pools open connections for reuse. See this question for what might be viable options for connection pools: Connection pooling options with JDBC: DBCP vs C3P0 If you are running on an application server it will probably already provide a connection pooling solution. check the documentation.
closing stuff like the resultset in your code or a connection (not in your code) should be done in a finally block. Doing the closing (and the neccessary exception handling correct is actually rather difficult. Consider using something like the JdbcTemplate of Spring (http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/jdbc.html)
If you aren't using stuff like VPN (virtual private database) Oracle will cache execution plans of statements no matter if they come from the same connection or not. Also data accessed lately is kept in memory to make queries accessing similar data fast. So the performance decrease is really coming from the latency of establishing the connection. There is some overhead on the DB side for creating connections which in theory could affect the performance of the complete database, but it is likely to be irrelevant.
Every time a client connects to the database that connection has to be authenticated. This is obviously an overhead. Furthermore the database listener can only process a limited number of connections at the same time; if the number of simultaneous connection attempts exceeds that threshold they get put into a queue. That is also an overhead.
So the general answer is, yes, opening and closing connections is an expensive operation.
It is always beneficial to be using DB Connection pools especially if you are using a Java EE app server. Also using the connection pool which is out of the box in the Java EE app server is optimal as it will be optimized and performance tested by the App server development team.

Multithreaded access to JDBC connection for only select case

If I assign the same connection to multiple threads which only executes select statements(no CRUD), is this scenario thread safe ? Each thread creates its own preparedstatement from the same connection and executes it.
Although it seems thread safe since each thread works its own instance of resultsets and prepared statement objects, they are still using the same connection to the database. I am specifically interested in Oracle JDBC driver behavior.
Thank you in advance.
Oracle docs does not prohibit multi-threaded access. Instead they allow it with the statement "If multiple threads must share a connection, use a disciplined begin-using/end-using protocol."
I think this caution may be due to problematic cases when one thread insert/updates and commit/rollsback while the other thread also inserts/updates its own data and issues a commit or rollback
But, in the original poster's case,all threads only issue select statements... still the resultset.next() will travel to database to retrieve the rows using the same tcp/ip stream to database...which is where the confusion begins...
The behaviour is undefined for 'select-only' case.
I Googled artound and found this:
"Oracle® Database JDBC Developer's Guide and Reference"
JDBC and Multithreading
"The Oracle JDBC drivers provide full support for, and are highly optimized for, applications that use Java multithreading. Controlled serial access to a connection, such as that provided by connection caching, is both necessary and encouraged. However, Oracle strongly discourages sharing a database connection among multiple threads. Avoid allowing multiple threads to access a connection simultaneously. If multiple threads must share a connection, use a disciplined begin-using/end-using protocol."
Also bear in mind that when you do update the database, you would need to read from your update connection for transaction isolation to work properly.
As far as I'm aware, getting a connection from a pool is a relatively cheap process.
--EDIT
If you are worried about the number of connections to the server have a look at "Oracle connection mananger"
"Oracle Connection Manager enables large numbers of users to connect to a single server by acting as a connection concentrator to "funnel" multiple client database sessions across a single network connection."
I'm fairly certain that the jdbc spec requires Connections to be thread-safe (but not Statements/ResultSets), so this will work. However, some jdbc driver implementations are less efficient with shared Connections, so you should definitely test whether you are getting decent performance. if not, you may need to switch to multiple Connections.

Reader in .NET and ResultSet in Java

When I read about the reader functionality of .NET then i came to know that we can open only one reader on one db connection. So it is compulsory for me to use connection pools so my application works without crashing and in this case I also can't apply singleton pattern for my connection. But now when I start to develop the same application in Java then I find that I can do many things with the same connection as well as I can also apply singleton pattern on connection. I just need to create statement object every time.
Now my questions are:
Why does .NET have this limitation on db connection object?
If Java allows this then why should we use connection pools?
No, you can't do arbitrarily many things at once with a single Connection in Java. You might get away with some things, but in general a Connection in Java should only be used for a single thing at a time (by a single thread). That's even more important if transactions come into play, because they are bound to the Connection.
From the sound of it it looks like .NET acts exactly the same, but you were lucky with what you tried in Java.
The correct approach in Java is to use a connection pool and every time you do some business operation, grab a connection from the pool, do you database operations with that connection, then return the connection to the pool (by simply calling .close(), the nice thing is that you don't really need to care if the connection is from a pool or gets re-created each time you request one).
I suspect that in .NET it's very similar.
Because the actual connection to the database is limited in that way. The connection object in Java just opens more database connections when needed. There is a setting that makes the connection object in .NET do the same thing, but I reccomend that you use the default mode, as it gives you better control over the resources.
Because the connection object in Java will get the database connections from the pool. If it would open a new connection to the database, you would see quite a performance difference whenever you use the connection object for more than one reader.

Categories