Is there any existing JDBC connection pool implementation that can handle requests with different users? I.e. each query would be tagged with the user it is to be sent as, and the pool would find a suitable idle connection for that specific user, or create a new one (possibly discarding an idle connection created some time ago for another user).
The question has probably been asked in some form already, but seemingly without positive results... At least I couldn't find any (see here: multi-user jdbc connection pool (Spring-jdbc)).
More thorough explanation:
The matter is that we have a Java REST service which is used for sending requests to a database (the requests themselves come from various small scripts). The service is needed as a wrapper around a connection pool (so that the scripts execute faster). Another requirement is that the number of connections executing at any moment must not exceed a certain limit (a few dozen), so the service ensures this by processing queries asynchronously.
However, it recently turned out that there are security restrictions in the organization, and a few tables/databases in this source must be accessed as a different user.
The straightforward solution is to have two services (or a single service with two pools). However, this is hard to make efficient with respect to the total connection limit. E.g. we could create two pools, each with half the limit, but then a given user's queries would run at half the maximum speed even when the other user is idle.
So I'm searching for a ready-made pool implementation, or hints for a makeshift one. This doesn't look extremely hard, but reinventing the wheel is not nice if some handy data structure already exists.
So, to conclude: I do understand that using a single source with different users is not the best idea, but it seems we are constrained by specific infrastructure requirements.
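To make the idea concrete, here is a rough sketch of the makeshift variant I have in mind: one small pool per user plus a shared semaphore for the global limit. It assumes HikariCP; the class name and settings are illustrative, and note that the semaphore limits concurrent checkouts rather than physical connections (idle connections are retired via idleTimeout).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

/** One lazily-created pool per user; a shared semaphore enforces the global limit. */
public class MultiUserPool {
    private final String jdbcUrl;
    private final int globalLimit;
    private final Semaphore permits;
    private final Map<String, HikariDataSource> pools = new ConcurrentHashMap<>();

    public MultiUserPool(String jdbcUrl, int globalLimit) {
        this.jdbcUrl = jdbcUrl;
        this.globalLimit = globalLimit;
        this.permits = new Semaphore(globalLimit, true);
    }

    public Connection getConnection(String user, String password)
            throws SQLException, InterruptedException {
        permits.acquire(); // block until we are under the global limit
        try {
            HikariDataSource ds = pools.computeIfAbsent(user, u -> {
                HikariConfig cfg = new HikariConfig();
                cfg.setJdbcUrl(jdbcUrl);
                cfg.setUsername(u);
                cfg.setPassword(password);
                cfg.setMaximumPoolSize(globalLimit); // any one user may use the full limit...
                cfg.setMinimumIdle(0);               // ...but unused connections are discarded
                cfg.setIdleTimeout(30_000);
                return new HikariDataSource(cfg);
            });
            return ds.getConnection();
        } catch (RuntimeException | SQLException e) {
            permits.release();
            throw e;
        }
    }

    /** Callers must release the permit when they are done with the connection. */
    public void release(Connection c) throws SQLException {
        try {
            c.close(); // returns the connection to its per-user pool
        } finally {
            permits.release();
        }
    }
}
```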
Related
I've seen two ways to deal with database connections:
1) Connection pool
2) Bind a connection to a thread (when we have a fixed, constant number of threads)
But I don't understand the purpose of #2. What are the advantages of the second approach over the first one?
If you're working with a single thread or a very small set of threads (that need database functionality), binding a connection to a thread would act like a poor man's connection pool. Instead of checking out a connection from the pool every time you use it, you would just use the single connection bound to the thread. This would allow for quick execution of database queries, even with code that hasn't been very well designed.
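As a rough illustration, thread binding is usually done with a ThreadLocal. This is a minimal sketch; the URL and credentials are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Poor man's pool: each thread lazily opens and then keeps its own connection. */
public final class ThreadBoundConnection {
    private static final ThreadLocal<Connection> BOUND = ThreadLocal.withInitial(() -> {
        try {
            return DriverManager.getConnection("jdbc:postgresql://localhost/db", "user", "pass");
        } catch (SQLException e) {
            throw new IllegalStateException("Could not open connection", e);
        }
    });

    /** Every call from the same thread returns the same connection, with no checkout step. */
    public static Connection get() {
        return BOUND.get();
    }
}
```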
However in many cases you're not working with a single thread or a small set of threads. As soon as you're developing an application with even dozens of simultaneous users, you're better off working with a connection pool as it will become impossible to dedicate a connection to every thread (see next paragraph).
Some people also have the misunderstanding that a connection pool can and should have a lot of connections (100 or more), even though it's often more advantageous to have fewer. Since all of the connections use the database's resources, the effect is similar to having a store with a single cash register. It's not more efficient to have 10 doors to the store instead of 1, since it will just fill up with customers but the payments won't happen any faster.
Due to some previous questions I've had answered about the synchronous nature of MySQL, I'm starting to question why people use connection pools, and whether in my scenario I should move to a pool.
Currently my application keeps a single connection active. There's only a single connection, statement, and result set being used in my application, and it's recycled. All of my database tasks are placed in a queue and executed back to back on a separate thread: one thread for database queries, one connection for database access. In the event that the connection has an issue, it disposes of the connection and creates a new one.
From my understanding, regardless of how many queries are sent to MySQL, they will all be processed synchronously in the order they are received. It does not matter whether the queries come from a single source or from several; they are executed in the order received.
With this being said, what's the point of having multiple connections and threads pushing queries into the database's processing queue, when it's going to process them one by one anyway? A query will not execute until the query before it has completed, and likewise, in my scenario without a pool, the next query will not be executed until the previous one has completed.
Now you may say:
The amount of time spent on processing the results provided by the MySQL query will increase the amount of time between queries being executed.
That's obviously correct, which is why I have a worker thread that handles the results of a query. When a query completes, I convert the results into Map<> format, release the statement/result set from memory, and start processing the next query. The Map<> is sent off to a separate worker thread for processing, so it doesn't congest the query execution thread.
Can anyone tell me whether the way I'm doing things is alright, and whether I should take the time to move to a pool of connections rather than a single persistent connection? The most important thing is why. I'm starting this thread strictly for informational purposes.
EDIT: 4/29/2016
I would like to add that I know what a connection pool is; however, I'm more curious about the benefits of using a pool over a single persistent connection, given that the table locks out requests from all connections during query processing to begin with.
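For reference, my current setup looks roughly like this (a simplified sketch; the URL, credentials, and processing method are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** One thread owns the single connection; a second thread consumes the results. */
public class SingleConnectionQueue {
    private final ExecutorService queryThread = Executors.newSingleThreadExecutor();
    private final ExecutorService resultWorker = Executors.newSingleThreadExecutor();
    private Connection connection; // only ever touched from queryThread

    public void submit(String sql) {
        queryThread.execute(() -> {
            try (Statement st = connection().createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                final List<Map<String, Object>> rows = new ArrayList<>();
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    Map<String, Object> row = new HashMap<>();
                    for (int i = 1; i <= cols; i++) {
                        row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
                resultWorker.execute(() -> process(rows)); // hand off; keep the query thread free
            } catch (Exception e) {
                connection = null; // dispose; a fresh one is created for the next query
            }
        });
    }

    private Connection connection() throws Exception {
        if (connection == null || connection.isClosed()) {
            connection = DriverManager.getConnection("jdbc:mysql://localhost/db", "user", "pass");
        }
        return connection;
    }

    private void process(List<Map<String, Object>> rows) { /* business logic goes here */ }
}
```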
Just trying this StackOverflow thing out, but:
Most of the time, a connection to a database sits idle. When you execute a query that INSERTs into or UPDATEs a table, it locks the table, preventing concurrent edits. While this is good, preventing data from being overwritten or corrupted, it means that no other connection may make edits while the first connection/query is still running.
However, starting a new connection takes time, and in larger infrastructures that try to shave off every bit of excess time, this is not good. That's why connection pools keep a whole group of connections in the idle state, ready for the next query.
Lastly, if you are running a small project, there's usually no need for a connection pool, but if you are running a large site with UPDATEs and INSERTs flying in every millisecond, a connection pool reduces overhead.
Slightly related answer:
a pool can do additional "connection health checks" (by examining SQL exception codes)
and refresh connections to reduce memory usage (see the note on "maxLifeTime" in that answer).
But all those things might not outweigh the simplicity of using one connection.
Another factor to consider is (blocking) network I/O times. Consider this (rough) scenario:
client prepares query
--> client sends data over the network
--> server receives data from the network
--> server executes query, prepares results
--> server sends data over the network
--> client receives data from the network
--> client prepares result set
If the database is local (on the same machine as the client) then network times are barely noticeable.
But if the database is remote, network I/O times can become measurable and impact performance.
Assuming the isolation level is "read committed", running select statements in parallel could become faster.
In my experience, using 4 connections at the same time instead of 1 generally improves performance (or throughput).
This does depend on your specific situation: if MySQL is indeed just mostly waiting on locks to get released,
adding additional connections will not do much in terms of speed.
And likewise, if the client is single-threaded, the client may not actually perceive any noticeable speed improvements.
This should be easy enough to test, though: compare the execution time of one program using 1 thread and 1 connection to execute some X amount of select queries (i.e. re-use your current program) with that of another program using 4 threads, each thread with its own separate connection, executing the same X amount of select queries divided over the 4 threads (or just run the first program 4 times in parallel).
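A rough harness for that comparison might look like this (a sketch; the URL, credentials, and query are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Runs the same total number of queries with 1 thread/connection, then with 4. */
public class ThroughputTest {
    static final String URL = "jdbc:mysql://localhost/db"; // placeholder
    static final int TOTAL_QUERIES = 10_000;

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 4}) {
            long start = System.nanoTime();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            final int perThread = TOTAL_QUERIES / threads;
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> runQueries(perThread));
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            System.out.printf("%d thread(s): %d ms%n",
                    threads, (System.nanoTime() - start) / 1_000_000);
        }
    }

    static void runQueries(int count) {
        // Each thread opens its own separate connection.
        try (Connection c = DriverManager.getConnection(URL, "user", "pass");
             Statement st = c.createStatement()) {
            for (int i = 0; i < count; i++) {
                try (ResultSet rs = st.executeQuery("SELECT 1")) {
                    while (rs.next()) { /* consume the result */ }
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```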
One note on connection pools (like HikariCP): the pool must ensure that no transaction remains open when a connection is returned to the pool, and this can mean a "rollback" is sent each time a connection is returned (closed) when auto-commit is off and no "commit" or "rollback" was sent previously. This in turn can increase network I/O time instead of reducing it. So make sure to test with auto-commit on, or make sure to always send a commit or rollback after your query or set of queries is done.
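For example, a HikariCP setup along these lines keeps auto-commit on and refreshes connections via maxLifetime (a sketch; the URL, credentials, and values are illustrative):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost/db"); // placeholder
        config.setUsername("user");
        config.setPassword("pass");
        config.setMaximumPoolSize(4);
        config.setAutoCommit(true);       // avoids the extra rollback when connections are returned
        config.setMaxLifetime(1_800_000); // retire and replace connections after 30 minutes
        return new HikariDataSource(config);
    }
}
```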
A connection pool and a persistent connection are not the same thing. One limits the number of SQL connections; the other deals with the problems of a single pipe.
The problem is generally the time taken to transfer the SQL output back to the caller rather than the query execution time. So if you open two CLI SQL clients and fire two queries, one with a large output and one with a small output (in that sequence), the smaller one finishes first while the larger one is still scrolling its output.
The point here is that multiple connections do solve problems in cases like the above.
When you have multiple front-end requests issuing queries, you may prefer persistent connections, because they give you the benefit of multiplexing over different connections (large versus small outputs) while avoiding the overhead of session setup/teardown.
Connection pool APIs have built-in error checks and handling, but most APIs still expect you to declare manually whether you want a persistent connection or not.
So in effect there are three variables: pooling, persistence, and the config parameters exposed via the API. One has to mix and match pool size, persistence, and number of connections to suit one's environment.
I have a List which contains a lot of objects.
The problem is that I have to process these objects (processing includes cloning, deep copying, making DB calls, running business logic, etc.).
Doing this in the normal first-come-first-served fashion is really time consuming, and in a web application it generally results in transaction timeouts on the server side (as this processing is async from the client's perspective).
How do I process these objects so as to take minimal time and not overload the DB?
I'm using Java 7 in a server environment.
I'm already using a messaging solution, RabbitMQ, which gets me the item and its quantity. The problem occurs when I try to deep copy items to mimic real items (per the business logic, every item should be uniquely processed) and save them to the DB.
After some discussions, the viable solution is using an ABQ (ArrayBlockingQueue) processed by a pool of threads, as sketched below.
The expected benefits are:
1) we won't have to manage the 3rd-party queues we created, e.g. RabbitMQ
2) at no point in time will the blocking queue hold all the items to be processed, since the consumer threads will be processing them simultaneously, so it leaves a smaller memory footprint
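A minimal sketch of the idea, kept to Java 7 syntax since that's what we're on (class and method names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded queue plus fixed consumer threads; producers block when the queue is full. */
public class ItemPipeline {
    private final BlockingQueue<Item> queue = new ArrayBlockingQueue<>(1000);

    public ItemPipeline(int consumerThreads) {
        for (int i = 0; i < consumerThreads; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    consume();
                }
            }, "item-consumer-" + i);
            t.setDaemon(true);
            t.start();
        }
    }

    /** Called by the RabbitMQ listener; blocks if the consumers fall behind. */
    public void submit(Item item) throws InterruptedException {
        queue.put(item);
    }

    private void consume() {
        try {
            while (true) {
                Item item = queue.take();
                Item copy = item.deepCopy(); // every item is processed uniquely
                saveToDb(copy);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void saveToDb(Item item) { /* DB call through a bounded connection pool */ }

    /** Placeholder item type. */
    public static class Item {
        Item deepCopy() { return new Item(); }
    }
}
```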
#cody123 I'm using Spring Batch for the retry mechanisms in this case.
After another round of profiling, I found that the bottleneck was the DB connection pool having a low maximum number of connections.
I deduced this by running the same transaction without the DB thread pool: it went perfectly well and completed without any exception.
However, combining this with the previous approach, i.e. managing an ABQ and light commits against an HA DB, will be the best solution.
Trying to figure out how to manage/use long-living DB connections. I have too little experience of this kind, as I have used a DB only with small systems (up to some 150 concurrent users, each with their own DB user/pass, so there were up to 150 long-living DB connections at any time) or web pages (each page request has its own DB connection that lasts less than a second, so the number of concurrent DB connections isn't huge).
This time there will be a Java server and a Flash client. Java connects to PostgreSQL. Connections are expected to be long-living, i.e. a connection is expected to start when the Flash client connects to the Java server and to end when the Flash client disconnects. Would it be better to share a single connection between all users (clients) or to make a private connection for every client? Or would some other solution be better?
*) Single/shared connection:
(+) pros
only one DB connection for whole system
(-) cons:
transactions can't be used (e.g. "user1.startTransaction(); user1.updateBooks(); user2.updateBooks(); user1.rollback();" on a single shared connection would roll back the changes made by user2)
long queries of one user might affect other users (not sure about this, though)
*) Private connections:
(+) pros
no problems with transactions :)
(-) cons:
a huge number of concurrent connections might be required, i.e. if there are 10000 users online, 10000 DB connections are required, which seems too high a number :) I don't know anything about the expected number of users though, as we are still researching and planning.
One solution would be to introduce timeouts, i.e. if a DB connection is not used for 15/60/900(?) seconds, it gets disconnected. When the user needs the DB again, it gets reconnected. This seems like a good solution to me, but I would like to know what reasonable limits might be, e.g. the max number of concurrent DB connections, what timeout should be used, etc.
Another solution would be to group queries into two types: those that can safely use a single shared long-living connection (e.g. "update user set last_visit = now() where id = :user_id"), and those that need a private short-living connection (e.g. anything that can potentially do heavy work or use transactions). This solution doesn't seem appealing to me, though if that's the way it should be done, I could try it...
So... What do other developers do in such cases? Are there any other reasonable solutions?
I don't use long-lived connections. I use a connection pool to manage connections, and I keep them only for as long as it takes to perform an operation: get the connection, perform my SQL operation, return the connection to the pool. It's much more scalable and doesn't suffer from transaction problems.
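In code, that checkout pattern typically looks like this (a sketch; the DataSource stands for whatever pool the container provides, and the table/column names are made up):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import javax.sql.DataSource;

public class UserDao {
    private final DataSource dataSource; // any pool: container-managed, c3p0, ...

    public UserDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void touchLastVisit(long userId) throws SQLException {
        // Closing the connection returns it to the pool rather than tearing it down.
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "update users set last_visit = now() where id = ?")) {
            ps.setLong(1, userId);
            ps.executeUpdate();
        }
    }
}
```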
Let the container manage the pool for you - that's what it's for.
By using a single connection, you also get very low performance, because the database server will only allocate the resources of one connection to you.
You definitely need a connection pool. If your app runs inside an application server, use the container's pool. Otherwise you can use a connection pool library like c3p0.
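A minimal standalone c3p0 setup might look like this (a sketch; driver, URL, credentials, and sizes are placeholders):

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class C3p0Setup {
    public static ComboPooledDataSource create() throws Exception {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        cpds.setDriverClass("org.postgresql.Driver");      // loads the JDBC driver
        cpds.setJdbcUrl("jdbc:postgresql://localhost/db"); // placeholder
        cpds.setUser("user");
        cpds.setPassword("pass");
        cpds.setMinPoolSize(2);
        cpds.setMaxPoolSize(20);
        cpds.setMaxIdleTime(300); // seconds before an idle connection is retired
        return cpds; // implements javax.sql.DataSource
    }
}
```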
I have a Java program consisting of about 15 methods, and these methods are invoked very frequently during the execution of the program. At the moment, I am creating a new connection in every method and invoking statements on it (the database is set up on another machine on the network).
What I would like to know is: should I create only one connection in the main method and pass it as an argument to all the methods that require a connection object, since that would significantly reduce the number of connection objects in the program, instead of creating and closing connections very frequently in every method?
I suspect I am not using the resources very efficiently with the current design, and there is a lot of scope for improvement, considering that this program might grow a lot in the future.
Yes, you should consider re-using connections rather than creating a new one each time. The usual procedure is:
make some guess as to how many simultaneous connections your database can sensibly handle (e.g. start with 2 or 3 per CPU on the database machine until you find out that this is too few or too many; it'll tend to depend on how disk-bound your queries are)
create a pool of this many connections: essentially a class that you can ask for "the next free connection" at the beginning of each method and then "pass back" to the pool at the end of each method
your getFreeConnection() method needs to return a free connection if one is available, else either (1) create a new one, up to the maximum number of connections you've decided to permit, or (2) if the maximum number are already created, wait for one to become free
I'd recommend the Semaphore class to manage the connections; I actually have a short article on my web site on managing a resource pool with a Semaphore, with an example I think you could adapt to your purpose. A minimal sketch follows this list.
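Something along these lines (an untested sketch; the class name is illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

/** Semaphore-guarded pool: at most maxConnections are ever checked out at once. */
public class SimplePool {
    private final Semaphore available;
    private final Queue<Connection> idle = new ConcurrentLinkedQueue<>();
    private final String url, user, password;

    public SimplePool(String url, String user, String password, int maxConnections) {
        this.url = url;
        this.user = user;
        this.password = password;
        this.available = new Semaphore(maxConnections, true);
    }

    public Connection getFreeConnection() throws SQLException, InterruptedException {
        available.acquire();        // waits if the maximum are already checked out
        Connection c = idle.poll(); // reuse an idle connection if one is available...
        return (c != null) ? c : DriverManager.getConnection(url, user, password);
    }

    public void returnConnection(Connection c) {
        idle.offer(c); // this is also where old connections could be retired
        available.release();
    }
}
```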
A couple of practical considerations:
For optimum performance, you need to be careful not to "hog" a connection while you're not actually using it to run a query. If you take a connection from the pool once and then pass it to various methods, you need to make sure you're not accidentally doing this.
Don't forget to return your connections to the pool! (try/finally is your friend here...)
On many systems, you can't keep connections open 'forever': the O/S will close them after some maximum time. So in your 'return a connection to the pool' method, you'll need to think about 'retiring' connections that have been around for a long time (build in some mechanism for remembering how old a connection is, e.g. a wrapper object around the actual JDBC Connection object in which you can store metrics such as this; see the sketch after this list).
You may want to consider using prepared statements.
Over time, you'll probably need to tweak the connection pool size.
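The 'retiring' idea could be supported with a wrapper like this (a sketch; the 30-minute limit is just an example):

```java
import java.sql.Connection;

/** Wrapper that remembers when the underlying JDBC connection was created. */
public class PooledConnection {
    private static final long MAX_AGE_MILLIS = 30L * 60 * 1000; // e.g. retire after 30 minutes

    private final Connection delegate;
    private final long createdAt = System.currentTimeMillis();

    public PooledConnection(Connection delegate) {
        this.delegate = delegate;
    }

    public Connection raw() {
        return delegate;
    }

    /** The pool's return method can close and replace connections for which this is true. */
    public boolean shouldRetire() {
        return System.currentTimeMillis() - createdAt > MAX_AGE_MILLIS;
    }
}
```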
You can either pass in the connection or, better yet, use something like Jakarta Commons Database Connection Pooling (DBCP).
http://commons.apache.org/dbcp/
You should use a connection pool for that.
That way you can ask for a connection and release it back to the pool when you are finished with it.
If another thread wants a connection while that one is in use, a new one can be created. If no other thread is using a connection, the same one can be re-used.
This way you can leave your app somewhat the way it is (and not pass the connection all around) and still use the resources properly.
Unfortunately, first-class connection pools are not very easy to use in standalone applications (they are the default in application servers). A microcontainer (such as Spring) or a good framework (such as Hibernate) can let you use one.
They're not too hard to code from scratch, though.
:)
This google search will help you find more about how to use one; skim through the results.
Many JDBC drivers do connection pooling for you, so there is little advantage to doing additional pooling in this case. I suggest you check the documentation for your JDBC driver.
Another approach to connection pools is to:
1) Have one connection for all database access, with synchronized access (sketched below). This doesn't allow concurrency but is very simple.
2) Store the connections in a ThreadLocal variable (override initialValue()). This works well if there is a small, fixed number of threads.
Otherwise, I would suggest using a connection pool.
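For contrast with the ThreadLocal sketch shown earlier, the first option (one connection, synchronized access) might look like this (a minimal sketch):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** One shared connection, serialized with a lock: simple, but no concurrency. */
public class SynchronizedDb {
    private final Connection connection;

    public SynchronizedDb(Connection connection) {
        this.connection = connection;
    }

    /** All callers queue up here; only one statement runs at a time. */
    public synchronized int update(String sql, Object... params) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                ps.setObject(i + 1, params[i]);
            }
            return ps.executeUpdate();
        }
    }
}
```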
If your application is single-threaded, or does all its database operations from a single thread, it's ok to use a single connection. Assuming you don't need multiple connections for any other reason, this would be by far the simplest implementation.
Depending on your driver, it may also be feasible to share a connection between threads - this would be ok too, if you trust your driver not to lie about its thread-safety. See your driver documentation for more info.
Typically the objects below Connection cannot safely be used from multiple threads, so it's generally not advisable to share ResultSet, Statement objects etc. between threads; by far the best policy is to use them in the same thread that created them. This is normally easy, because those objects are not generally kept around for long.