Trying to figure out how to manage/use long-living DB connections. I have too little experience of this kind, as I have only used databases with small systems (up to some 150 concurrent users, each with its own DB user/pass, so there were up to 150 long-living DB connections at any time) or web pages (each page request gets its own DB connection that lasts less than a second, so the number of concurrent DB connections isn't huge).
This time there will be a Java server and a Flash client. Java connects to PostgreSQL. Connections are expected to be long-living, i.e., to start when the Flash client connects to the Java server and to end when the Flash client disconnects. Would it be better to share a single connection between all users (clients) or to make a private connection for every client? Or would some other solution be better?
*) Single/shared connection:
(+) pros
only one DB connection for the whole system
(-) cons:
transactions can't be used (e.g., "user1.startTransaction(); user1.updateBooks(); user2.updateBooks(); user1.rollback();" on a single shared connection would roll back the changes made by user2)
long queries from one user might affect other users (not sure about this, though)
*) Private connections:
(+) pros
no problems with transactions :)
(-) cons:
a huge number of concurrent connections might be required, i.e., if there are 10000 users online, 10000 DB connections are required, which seems to be too high a number :) I don't know anything about the expected number of users, though, as we are still in the process of researching and planning.
One solution would be to introduce timeouts, i.e., if a DB connection is not used for 15/60/900(?) seconds, it gets disconnected; when the user needs the DB again, it gets reconnected. This seems like a good solution to me, but I would like to know what reasonable limits might be, e.g., what the max number of concurrent DB connections might be, what timeout should be used, etc.
Another solution would be to group queries into two "types": one type that can safely use a single shared long-living connection (e.g., "update user set last_visit = now() where id = :user_id"), and another type that needs a private short-living connection (e.g., something that can potentially do heavy work or use transactions). This solution does not seem appealing to me, though if that's the way it should be done, I could try to do it...
So... What do other developers do in such cases? Are there any other reasonable solutions?
I don't use long-lived connections. I use a connection pool to manage connections, and I keep them only for as long as it takes to perform an operation: get the connection, perform my SQL operation, return the connection to the pool. It's much more scalable and doesn't suffer from transaction problems.
Let the container manage the pool for you - that's what it's for.
By using a single connection, you also get very low performance, because the database server will only allocate one connection for you.
You definitely need a connection pool. If your app runs inside an application server, use the container's pool. Otherwise you can use a connection pool library like c3p0.
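The checkout/use/return pattern those answers describe can be sketched with a toy pool. This is only an illustration of the mechanics, not a real implementation: plain Strings stand in for java.sql.Connection, and a real pool (the container's DataSource, c3p0, HikariCP) would also validate connections and cache prepared statements.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy pool illustrating the checkout/use/return pattern. Plain Strings stand
// in for real JDBC connections.
public class ToyPool {
    private final BlockingQueue<String> idle;

    public ToyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i);          // pre-create the fixed set of connections
        }
    }

    public String acquire() {
        try {
            return idle.take();             // blocks while every connection is checked out
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public void release(String conn) {
        idle.add(conn);                     // hand the connection back for re-use
    }

    public int available() {
        return idle.size();
    }
}
```

A caller follows the pattern from the answer above: acquire(), run the SQL operation, and release() in a finally block, so the connection always returns to the pool however the operation ends.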
Related
Is there currently any JDBC connection pool implementation that can handle requests for different users? I.e., each query specifies the user it should be sent as, and the pool finds a suitable idle connection for that specific user, or creates a new one (possibly discarding some idle connection created earlier for another user).
The question was probably already asked in some form, but it seems without positive results... At least I couldn't find them (see multi-user jdbc connection pool ( Spring-jdbc ))
More thorough explanation:
The thing is, we have a Java REST service whose job is to send requests to a database (the requests themselves come from various small scripts). The service is needed as a wrapper around a connection pool (so the scripts execute faster). Another requirement is that the number of connections in use must not exceed a certain limit (a few dozen), so the service ensures this by processing queries asynchronously.
However, it recently turned out that there are security limitations in the organization, and a few tables/databases in this source must be accessed as a different user.
The straightforward solution is to have two services (or a single service with two pools). However, this is hard to make efficient with respect to the total connection limit. E.g., we can create two pools, each with half the limit, but then a given user gets at most half the throughput (even if the other user is idle).
So I'm searching for a ready pool implementation, or hints for a makeshift implementation. This doesn't look extremely hard, but reinventing the wheel is not nice if some handy data structure already exists.
So, to conclude: I do understand that using a single source with different users is not the best idea, but it seems we are limited by specific infrastructure requirements.
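As a rough illustration of the makeshift implementation the question asks about, one could keep an idle queue per user plus a single Semaphore enforcing the global limit. This is only a sketch: Strings stand in for real connections, and openConnection() is a hypothetical factory (a real one would call DriverManager.getConnection(url, user, password)).

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Makeshift multi-user pool sketch: one idle queue per DB user, plus a single
// Semaphore enforcing the global connection limit across all users.
public class MultiUserPool {
    private final Map<String, Queue<String>> idleByUser = new ConcurrentHashMap<>();
    private final Semaphore totalLimit;
    private int opened = 0;

    public MultiUserPool(int maxTotal) {
        totalLimit = new Semaphore(maxTotal);
    }

    public synchronized String acquire(String user) {
        Queue<String> idle = idleByUser.computeIfAbsent(user, u -> new ConcurrentLinkedQueue<>());
        String conn = idle.poll();
        if (conn != null) {
            return conn;                    // re-use an idle connection of this user
        }
        if (totalLimit.tryAcquire()) {
            return openConnection(user);    // still under the global limit: open a new one
        }
        // At the limit: discard an idle connection of some other user, if any,
        // and open a fresh one for this user in the freed slot.
        for (Queue<String> q : idleByUser.values()) {
            if (q.poll() != null) {
                return openConnection(user);
            }
        }
        throw new IllegalStateException("all connections are busy");
    }

    public synchronized void release(String user, String conn) {
        idleByUser.computeIfAbsent(user, u -> new ConcurrentLinkedQueue<>()).add(conn);
    }

    // Hypothetical factory; a real version would call
    // DriverManager.getConnection(url, user, password).
    private String openConnection(String user) {
        return user + "-conn-" + (opened++);
    }
}
```

This way one user can use the full connection budget while the other is idle, which is exactly the efficiency the two-half-pools approach loses.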
I've seen two ways to deal with database connections:
1) Connection pool
2) Bind a connection to a thread (when we have a fixed and constant thread count)
But I don't understand the purpose of using #2. What are the advantages of the second approach over the first one?
If you're working with a single thread or a very small set of threads (that need database functionality), binding a connection to a thread would act like a poor man's connection pool. Instead of checking out a connection from the pool every time you use it, you would just use the single connection bound to the thread. This would allow for quick execution of database queries, even with code that hasn't been very well designed.
However in many cases you're not working with a single thread or a small set of threads. As soon as you're developing an application with even dozens of simultaneous users, you're better off working with a connection pool as it will become impossible to dedicate a connection to every thread (see next paragraph).
Some people also have the misunderstanding that a connection pool can and should have a lot of connections (100 or more), even though it's often more advantageous to have fewer. Since all of the connections use the database's resources, the effect is similar to having a store with a single cash register. It's not more efficient to have 10 doors to the store instead of 1, since it will just fill up with customers but the payments won't happen any faster.
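The thread-binding approach described above is often done with a ThreadLocal: each thread lazily opens its own connection on first use and then re-uses it for every query. The sketch below only shows the mechanics; Strings again stand in for java.sql.Connection.

```java
// Sketch of approach #2: bind one connection per thread via ThreadLocal.
public class ThreadBoundConnection {
    private static int counter = 0;

    private static final ThreadLocal<String> BOUND =
            ThreadLocal.withInitial(ThreadBoundConnection::open);

    private static synchronized String open() {
        return "conn-" + (counter++);       // one connection opened per thread
    }

    public static String current() {
        return BOUND.get();                 // same object on every call from one thread
    }
}
```

Two different threads calling current() each get their own connection, which is why the approach only stays cheap while the thread count is small and fixed.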
Due to some previous questions that I've had answered about the synchronous nature of MySQL I'm starting to question the reason people use Connection pools, and if in my scenario I should move to a pool.
Currently my application keeps a single connection active. There's only a single connection, statement, and result set being used in my application, and they are recycled. All of my database tasks are placed in a queue and executed back to back on a separate thread. One thread for database queries, one connection for database access. In the event that the connection has an issue, it will dispose of the connection and create a new one.
From my understanding, regardless of how many queries are sent to MySQL, they will all be processed synchronously in the order they are received. It does not matter whether these queries come from a single source or multiple; they will be executed in the order received.
With this being said, what's the point of having multiple connections and threads pushing queries into the database's processing queue, when it's going to process them one by one anyway? A query is not going to execute until the query before it has completed, and likewise in my scenario, where I'm not using a pool, the next query is not going to execute until the previous query has completed.
Now you may say:
The amount of time spent on processing the results provided by the MySQL query will increase the amount of time between queries being executed.
That's obviously correct, which is why I have a worker thread that handles the results of a query. When a query is completed, I convert the results into Map<> format and release the statement/result set from memory, then start processing the next query. The Map<> is sent off to a separate worker thread for processing, so it doesn't congest the query-execution thread.
Can anyone tell me whether the way I'm doing things is alright, and whether I should take the time to move to a pool of connections rather than a single persistent connection? The most important thing is the why. I'm starting this thread strictly for informational purposes.
EDIT: 4/29/2016
I would like to add that I know what a connection pool is; I'm more curious about the benefits of using a pool over a single persistent connection, given that the table locks out requests from all other connections during query processing to begin with.
Just trying this StackOverflow thing out but,
In every connection to a database, most of the time the connection is idle. When you execute an INSERT or UPDATE query over the connection, it locks the table, preventing concurrent edits. While this is good, preventing data overwrites and corruption, it means that no other connection may make edits while the first connection/query is still running.
However, starting a new connection takes time, and in larger infrastructures that try to shave off all excess time wastage, this is not good. As such, a connection pool is a group of connections kept in the idle state, ready for the next query.
Lastly, if you are running a small project, there's usually no reason for a connection pool; but if you are running a large site with UPDATEs and INSERTs flying around every millisecond, a connection pool reduces overhead time.
Slightly related answer: a pool can do additional "connection health checks" (by examining SQL exception codes) and refresh connections to reduce memory usage (see the note on "maxLifeTime" in the answer). But all those things might not outweigh the simpler approach of using one connection.
Another factor to consider is (blocking) network I/O times. Consider this (rough) scenario:
client prepares query --> client sends data over the network
--> server receives data from the network --> server executes query, prepares results
--> server sends data over the network --> client receives data from the network
--> client prepares resultset
If the database is local (on the same machine as the client) then network times are barely noticeable.
But if the database is remote, network I/O times can become measurable and impact performance.
Assuming the isolation level is at "read committed", running select-statements in parallel could become faster.
In my experience, using 4 connections at the same time instead of 1 generally improves performance (or throughput).
This does depend on your specific situation: if MySQL is indeed just mostly waiting on locks to get released,
adding additional connections will not do much in terms of speed.
And likewise, if the client is single-threaded, the client may not actually perceive any noticeable speed improvements.
This should be easy enough to test, though: compare execution times of one program with 1 thread using 1 connection to execute an X amount of select queries (i.e., re-use your current program) against another program using 4 threads, each with 1 separate connection, to execute the same X amount of select queries divided over the 4 threads (or just run the first program 4 times in parallel).
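The suggested comparison can be sketched like this. To stay self-contained, Thread.sleep stands in for the per-query network round-trip plus execution time (the 5 ms latency is an assumption); in a real test each worker thread would hold its own JDBC connection and run actual SELECTs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Rough sketch of the 1-thread vs 4-thread throughput comparison.
public class ThroughputTest {
    static final int QUERIES = 40;
    static final long LATENCY_MS = 5;       // assumed round-trip time per query

    // Returns the wall-clock milliseconds needed to run all queries with N threads.
    static long runQueries(int threads) {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < QUERIES; i++) {
            workers.submit(() -> {
                try {
                    Thread.sleep(LATENCY_MS);   // stand-in for one SELECT round-trip
                } catch (InterruptedException ignored) {
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With latency dominated by network I/O, the 4-thread run finishes in roughly a quarter of the time; if the server were instead mostly waiting on locks, the gap would shrink, as the answer notes.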
One note on connection pools (like HikariCP): the pool must ensure no transaction remains open when a connection is returned to the pool, and this can mean a "rollback" is sent each time a connection is returned to the pool (closed) when auto-commit is off and no "commit" or "rollback" was sent previously. This in turn can increase network I/O times instead of reducing them. So make sure to test with auto-commit on, or make sure to always send a commit or rollback after your query or set of queries is done.
A connection pool and a persistent connection are not the same thing. One is about limiting the number of SQL connections; the other is about single-pipe issues.
The problem is generally the time taken to transfer the SQL output back to the client rather than the query execution time. So if you open two CLI SQL clients and fire two queries, one with a large output and one with a small output (in that sequence), the smaller one finishes first while the larger one is still scrolling its output.
The point here is that multiple connections do solve problems for cases like the above.
When you have multiple front-end requests issuing queries, you may prefer persistent connections because they give you the benefit of multiplexing over different connections (large versus small outputs) and avoid the overhead of session setup/teardown.
Connection pool APIs have built-in error checks and handling, but most APIs still expect you to declare manually whether you want a persistent connection or not.
So in effect there are three variables: the pool, persistence, and config parameters via the API. One has to mix and match pool size, persistence, and number of connections to suit one's environment.
I have a MySQL database with ~8,000,000 records. Since I need to process them all, I use a BlockingQueue whose Producer reads from the database and puts 1000 records at a time into the queue. The Consumer is the processor that takes records from the queue.
I am writing this in Java, but I'm stuck figuring out how I can (in a clean, elegant way) read from my database and 'suspend' reading once the BlockingQueue is full. At that point control is handed to the Consumer until there are free spots in the BlockingQueue again. From then on the Producer should continue reading records from the database.
Is it clean/elegant/efficient to keep my database connection open so that it can read continuously? Or should I, once control shifts from Producer to Consumer, close the connection, store the id of the last record read, and later reopen the connection and continue reading from that id? The latter doesn't seem good to me, since my database would have to open/close a lot! But the former isn't so elegant in my opinion either.
With persistent connections:
You cannot implement transaction processing effectively
User sessions cannot be kept separate on the same connection
The application is not scalable
Over time you may need to extend it, which will require management/tracking of persistent connections
If a script, for whatever reason, fails to release a lock on a table, then all following scripts will block indefinitely and the DB server has to be restarted
When using transactions, an open transaction block will also carry over to the next script (using the same connection) if the script's execution ends before the transaction block completes, etc.
Persistent connections do not give you anything that you cannot do with non-persistent connections.
Then why use them at all?
The only possible reason is performance: use them when the overhead of creating a link to your MySQL server is high. And this depends on many factors, like:
Database type
Whether the MySQL server is on the same machine and, if not, how far away; it might be outside your local network/domain
How heavily the machine MySQL sits on is loaded by other processes
One can always replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!
Commercial RDBMSs might be licensed by the number of concurrently open connections, and here persistent connections can do a disservice.
If you are using a bounded BlockingQueue by passing a capacity value in the constructor, then the producer will block when it attempts to call put() until the consumer removes an item by calling take().
It would help to know more about when or how the program is going to execute in order to decide how to deal with database connections. Some easy choices are: have the producer and each consumer get an individual connection; have a connection pool for all consumers while the producer holds a connection; or have all producers and consumers use a connection pool.
You can facilitate minimizing the number of connections by using something such as Spring to manage your connection pool and transactions; however, it would only be necessary in some execution situations.
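The bounded-queue behaviour described in this answer can be sketched as follows. Integers stand in for records fetched over one open connection; put() blocks automatically once the queue is full, suspending the producer's read until the consumer frees a slot with take(), which is exactly the 'suspend' the question asks for.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal producer/consumer sketch over a bounded BlockingQueue.
public class PipelineDemo {
    static final int POISON = -1;           // sentinel marking the end of input

    public static int process(int totalRecords, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int[] processed = {0};

        Thread producer = new Thread(() -> {
            try {
                for (int id = 0; id < totalRecords; id++) {
                    queue.put(id);          // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException ignored) {
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int record = queue.take();
                    if (record == POISON) {
                        break;
                    }
                    processed[0]++;         // stand-in for real processing work
                }
            } catch (InterruptedException ignored) {
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed[0];
    }
}
```

Note the producer's connection can simply stay open across the blocked put() calls; no manual suspend/resume or reconnect logic is needed.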
I have to develop a highly scalable web service, but the connection pool size (Oracle DB) is set to 50.
Does having this size mean that at most 50 concurrent requests can be served, since otherwise no new connections will be available?
But is it possible, by configuration, for the WebLogic or Glassfish server to accept more than 50 requests simultaneously?
I read that the server accepts requests, which are 'queued' until a thread handles them.
Regarding 'scalability' I have a question mark as well, because the average DB call takes 1.2 sec; plus the SOAP overhead, this results in a 2.3 sec response time per call.
Can I estimate how many concurrent users the server (WebLogic or Glassfish, 4 GB) will support?
Thank you
Having a maximum of 50 connections in the pool doesn't mean you can only handle 50 users at any one time. Each page request should generate queries that can interleave with each other: so while you can only have 50 queries running at any one time, you should be able to handle many more page requests. This is helped by making sure you only connect to the database for short periods.
The use of connection pools is primarily to avoid the cost of setting up new connections all the time (plus prepared statements are cached etc.), so the intention is to re-use them as frequently as possible.
When you say the average DB call takes 1.2 secs: if this is a single query, I think you want to look at the query or the table indexes to reduce this time (otherwise I'm afraid you are going to get scalability problems no matter what); but if it is multiple queries, then they should interleave with other requests quite happily.
As regards queuing: WebLogic will queue queries, but you can set a timeout so a query is returned unfulfilled after a set time. You can then decide to retry or tell the user the system is busy and to perhaps try again later.
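The give-up-after-a-set-time behaviour mentioned above can be sketched with a timed poll on the pool's idle queue: poll() with a timeout returns null instead of blocking forever when no connection frees up, so the caller can retry or report that the system is busy. Strings stand in for real pooled connections; this is a toy, not a WebLogic mechanism.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy pool whose acquire gives up after a configurable timeout.
public class TimedPool {
    private final BlockingQueue<String> idle;

    public TimedPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i);
        }
    }

    // Returns a connection, or null if none became free within the timeout.
    public String tryAcquire(long timeoutMs) {
        try {
            return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public void release(String conn) {
        idle.add(conn);
    }
}
```

A null return is the signal to show the "system is busy, try again later" response instead of letting the request wait indefinitely.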
When you are talking about a web service, you need to keep an optimum balance between your connection pool and concurrent requests. For the concept you can refer to: https://dzone.com/articles/optimum-database-connection-pool-size