I need some help analyzing a problem I have been encountering over the last few weeks.
Every now and then my application hangs. It is basically based on Postgres, but for some very speedy interactions we transfer critical data into MongoDB, which is kept synced. This works out - it's pretty fast and we do not have problems keeping it synced.
We use Java 1.6 and Spring 3.2. I instantiate one Mongo instance and @Autowire it in about 15 business logic classes.
Now my problem is: After about 2 days I get exceptions:
com.mongodb.DBPortPool$ConnectionWaitTimeOut: Connection wait timeout after 120000 ms
at com.mongodb.DBPortPool.get(DBPortPool.java:186)
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:327)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:212)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:189)
at com.mongodb.DBApiLayer$Result._advance(DBApiLayer.java:452)
at com.mongodb.DBApiLayer$Result.hasNext(DBApiLayer.java:418)
at com.mongodb.DBCursor._hasNext(DBCursor.java:503)
at com.mongodb.DBCursor.hasNext(DBCursor.java:523)
I suppose the pool is running out of connections.
Now my question:
Do I always have to call db.requestDone()?
According to this page:
http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency - first paragraph, I read:
The Mongo object maintains an internal pool of connections to the
database (default pool size of 10). For every request to the DB (find,
insert, etc) the java thread will obtain a connection from the pool,
execute the operation, and release the connection.
To me this sounded like the connection is handed back whenever the thread is done, which should have happened after each request finished.
We do not have any long-living threads in our app accessing mongodb.
Thanks for any hint!
Jan
We ran into this problem when we failed to close our cursors after reading. After adding a cursor.close() after all our cursor operations, the problem went away.
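For reference, a minimal sketch of that fix with the legacy 2.x Java driver looks roughly like this (the collection and query are placeholders, not taken from the original code):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

public void readDocuments(DBCollection collection) {
    DBCursor cursor = collection.find(new BasicDBObject("status", "OPEN")); // placeholder query
    try {
        while (cursor.hasNext()) {
            DBObject doc = cursor.next();
            // ... process doc ...
        }
    } finally {
        // Always close the cursor so its resources (and the pooled connection) are released
        cursor.close();
    }
}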
UPDATE: Added Read/Write Throughput, IOPS, and Queue-Depth graph metrics, and marked each graph at the time position where the errors I describe started.
NOTE: Hi, I am just looking for suggestions about what could possibly be causing this issue from experienced DBAs or database developers (or anyone who would have knowledge, for that matter). Some of the logs/data I have are sensitive, so I cannot repost them here, but I did my best to provide screenshots and data from my debugging so people can help me. Thank you.
Hello, I have a Postgres RDS database (version 12.7 engine) hosted on Amazon (AWS). This database is "hit", or called, by an API client (a Spring Boot/Web/Hibernate/JPA Java API) thousands of times per hour. It only executes one Hibernate SQL query on the backend, against a Postgres view across 5 tables. The DB instance (class = db.m5.2xlarge) specs are:
8 vCPU
32 GB RAM
Provisioned IOPS SSD Storage Type
800 GiB Storage
15000 Provisioned IOPS
The issue I am seeing is that on Saturdays I wake up to many logs of JDBCConnectionExceptions, and I notice my API Docker containers (defined as Service Tasks on ECS), which are hosted on AWS Elastic Container Service (ECS), start failing and return an HTTP 503 error, e.g.
org.springframework.dao.DataAccessResourceFailureException: Unable to acquire JDBC Connection; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
Upon checking the AWS RDS DB status, I can also see the sessions/connections increase dramatically, as seen in the image below with ~600 connections. It keeps increasing, seemingly without stopping.
Upon checking the postgres database pg_locks and pg_stat_activity tables when I started getting all these JDBCConnectionExceptions and the DB Connections jumped to around ~400 (at this specific time), I did indeed see many of my API queries logged with interesting statuses. I exported the data to CSV and have included an excerpt below:
wait_event_type   wait_event         state    count in pg_stat_activity   query
---------------   ----------------   ------   --------------------------   -----
IO                DataFileRead       active   480                          SELECT * ... FROM ... (API query on the Postgres view)
IO                DataFileRead       idle     13                           SELECT * ... FROM ... (API query on the Postgres view)
IO                DataFilePreFetch   active   57                           SELECT * ... FROM ... (API query on the Postgres view)
IO                DataFilePreFetch   idle     2                            SELECT * ... FROM ... (API query on the Postgres view)
Client            ClientRead         idle     196                          SELECT * ... FROM ... (API query on the Postgres view)
Client            ClientRead         active   10                           SELECT * ... FROM ... (API query on the Postgres view)
LWLock            BufferIO           idle     1                            SELECT * ... FROM ... (API query on the Postgres view)
LWLock            BufferIO           active   7                            SELECT * ... FROM ... (API query on the Postgres view)
If I look at my pg_stat_activity table when my API and DB are running and stable, the majority of the rows from the API query are simply in the Client / ClientRead / idle state, so I feel something is wrong here.
You can see in the "performance metrics" below what the DB looked like at the time this happened (roughly 19:55 UTC or 2:55 PM CST): the DataFileRead and DataFilePrefetch waits are astronomically high and keep increasing, which backs up the pg_stat_activity data I posted above. Also, as I stated above, during normal DB use when it is stable, the API queries are simply in the Client / ClientRead / idle state in pg_stat_activity, so the numerous DataFileRead/Prefetch IO waits and ExclusiveLocks confuse me.
I don't expect anyone to debug this for me, though I would appreciate it if a DBA or someone who has experienced something similar could help narrow down the issue for me. I honestly wasn't sure if it was an API query taking too long (which wouldn't make sense, because the API has been running stable for years), something running on the Postgres DB without my knowledge on Saturdays (I really think something like this is going on), or a bad PostgreSQL query coming into the DB that locks up the resources and causes a deadlock (which doesn't completely make sense to me, as I read that Postgres resolves deadlocks on its own). Also, as I stated before, all the API calls that make an SQL query on the backend are just doing SELECT ... FROM ... on a Postgres view, and from what I understand you can do concurrent SELECTs with ExclusiveLocks, so.....
I would take any advice here or suggestions for possible causes of this issue.
Read-Throughput (the first JdbcConnectionException occurred around 2:58 PM CST or 14:58, so I marked the graph where READ throughput starts to drop, since the DB queries are timing out and API containers are failing)
Write-Throughput (the API only READS, so I'm assuming the spikes here are from writing to the replica RDS to keep it in sync)
Total IOPS (IOPS gradually increasing from the morning, i.e. 8 AM, but that is expected as API calls were increasing; these total counts of API calls match other days when there are 0 issues, so this doesn't really point to the cause of the issue)
Queue-Depth (you can see where I marked the graph; the spike is exactly around 14:58 or 2:58 PM, where the first JdbcConnectionExceptions start occurring, API queries start timing out, and DB connections start to increase exponentially)
EBS IO Balance (burst balance basically dropped to 0 at this time as well)
Performance Insights (DataFileRead, DataFilePrefetch, buffer_io, etc)
This just looks like your app server is getting more and more demanding and the database can't keep up. Most of the rest of your observations are just a natural consequence of that. Why it is happening is probably best investigated from the app server, not from the database server. Either it is making more and more requests, or each one takes more IO to fulfill. (You could maybe fix this on the database side by making it more efficient, for example by adding a missing index, but that would require you to share the query and/or its execution plan.)
It looks like your app server is configured to maintain 200 connections at all times, even if almost all of them are idle. So, that is what it does.
And that is what the ClientRead wait_event is: the connection is just sitting there idle, trying to read the next request from the client but not getting any. There are probably a handful of other connections which are actively receiving and processing requests, doing all the real work but occupying only a small fraction of pg_stat_activity. All of those extra idle connections aren't doing any good. But they probably aren't doing any real harm either, other than making pg_stat_activity look untidy and confusing you.
But once the app server starts generating requests faster than they can be serviced, the in-flight requests start piling up, and the app server is configured to keep adding more and more connections. But you can't bully the disk drives into delivering more throughput just by opening more connections (at least not once you have passed the threshold where they are fully saturated). So the more active connections you have, the more they have to divide the same amount of IO between them, and the slower each one gets. Having these 700 extra connections all waiting isn't going to make the data arrive any faster. Having more connections isn't doing any good, and is probably doing some harm, as it creates contention, and dealing with contention is itself a resource drain.
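As a side note, if the API is a Spring Boot 2.x service using the default HikariCP pool (an assumption on my part, not something stated above), the pool ceiling and idle floor are set with properties along these lines; the numbers are purely illustrative, not recommendations:

# application.properties -- illustrative values only
# Hard cap on connections each API container may open
spring.datasource.hikari.maximum-pool-size=50
# Don't keep hundreds of idle sessions parked on the database
spring.datasource.hikari.minimum-idle=10
# Milliseconds a request waits for a free connection before failing
spring.datasource.hikari.connection-timeout=30000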
The ExclusiveLocks you mention are probably the locks each active session has on its own transaction ID. They wouldn't be a cause of problems, just an indication you have a lot of active sessions.
The BufferIO is what you get when two sessions want the exact same data at the same time. One asks for the data (DataFileRead) and the other asks to be notified when the first one is done (BufferIO).
Some things to investigate.
Query performance can degrade over time. The amount of data being requested can increase, especially with date-predicated queries. In Performance Insights you can see how many blocks are read (disk I/O) versus hit (from the buffer); you want as many hits as possible. The loss of burst balance is a real indicator that this is what is happening. It's not an issue during the week because you have fewer requests.
Check the amount of shared buffers you have to service these queries: the default is 25% of RAM, and you could tweak this higher (some say 40%). It's a dark art and you are unlikely to find an answer outside of tweak-and-test.
Vacuum and analyze your tables. The data comes from somewhere, right? With updates, deletes, and inserts, tables grow and get full of garbage. At a certain point the autovacuum processes aren't enough at their default levels. You can tweak them to be more aggressive, fire them manually at night, etc.
Index management, same as above.
Autovacuum docs
Resource Consumption
Based on what you've shared I would guess your connections are not being properly closed.
I am having lots of connection problems when working with Microsoft SQL Server and creating the connection from Java.
I need to communicate with the DB on average every 2 seconds, for a variety of things. Most of my queries complete within 500 milliseconds.
About every 15 minutes or so, a connection drops and one of my queries fails. I have a retry mechanism that always succeeds within 3 tries.
My only problem is that a 500-millisecond query turns into 2 seconds or longer when there is a connection drop.
What would be the best approach for connecting to SQL Server? The way I do it now is:
create Connection
create Statement
execute it
and close both the statement and the connection
Or should I keep the connection open and only create multiple statements for each of my queries?
Let us take one issue at a time; first, the Connection.
A connection to a database is resource intensive to create and to hold on to. It would be wiser to create a pool of connections and hold on to them until your application stops. A connection pool manager may be of great help in this context. I have personally used Apache's DBCP and found it to be convenient and efficient. There could be other alternatives too. When you need a connection, borrow one from the pool and return it when its use is complete.
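A rough sketch of that pattern with Commons DBCP2 (the URL, credentials, query, and pool sizes are placeholders; DBCP 1.x uses slightly different setter names such as setMaxActive):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.commons.dbcp2.BasicDataSource;

public class Db {
    // One pool for the whole application, created once and kept until shutdown
    private static final BasicDataSource POOL = new BasicDataSource();
    static {
        POOL.setUrl("jdbc:sqlserver://dbhost:1433;databaseName=mydb"); // placeholder
        POOL.setUsername("user");      // placeholder
        POOL.setPassword("secret");    // placeholder
        POOL.setMaxTotal(10);          // maximum connections held by the pool
        POOL.setMaxWaitMillis(5000);   // fail fast instead of hanging when the pool is exhausted
    }

    public static int countRows() throws Exception {
        // getConnection() borrows a connection from the pool,
        // and close() returns it to the pool instead of really closing it
        try (Connection con = POOL.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM some_table")) { // placeholder
            rs.next();
            return rs.getInt(1);
        }
    }
}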
Due to some previous questions I've had answered about the synchronous nature of MySQL, I'm starting to question why people use connection pools, and whether in my scenario I should move to a pool.
Currently my application keeps a single connection active. There's only a single connection, statement, and result set being used in my application, and they are recycled. All of my database tasks are placed in a queue and executed back to back on a separate thread: one thread for database queries, one connection for database access. If the connection has an issue, it is disposed of and a new one is created.
From my understanding, regardless of how many queries are sent to MySQL to be processed, they will all be processed synchronously in the order they are received. It does not matter whether these queries come from a single source or multiple sources; they will be executed in the order received.
With this being said, what's the point in having multiple connections and threads pushing queries into the database's processing queue when it is going to process them one by one anyway? A query is not going to execute until the query before it has completed processing, and likewise in my scenario, where I'm not using a pool, the next query is not going to be executed until the previous query has completed processing.
Now you may say:
The amount of time spent on processing the results provided by the MySQL query will increase the amount of time between queries being executed.
That's obviously correct, which is why I have a worker thread that handles the results of a query. When a query is completed, I convert the results into Map<> format and release the statement/result set from memory, then start processing the next query. The Map<> is sent off to a separate worker thread for processing, so it doesn't congest the query execution thread.
Can anyone tell me whether the way I'm doing things is alright, and whether I should take the time to move to a pool of connections rather than a persistent connection? The most important thing is why. I'm starting this thread strictly for informational purposes.
EDIT: 4/29/2016
I would like to add that I know what a connection pool is; however, I'm more curious about the benefits of using a pool over a single persistent connection when the table locks out requests from all connections during query processing to begin with.
Just trying this StackOverflow thing out but,
Every connection to a database is idle most of the time. When you execute a query over the connection to INSERT into or UPDATE a table, it locks the table, preventing concurrent edits. While this is good, preventing data overwriting or corruption, it means that no other connection may make edits while the first connection/query is still running.
However, starting a new connection takes time, and in larger infrastructures that try to skim off all excess time wastage, this is not good. As such, a connection pool is a whole group of connections left in the idle state, ready for the next query.
Lastly, if you are running a small project there's usually no reason for a connection pool, but if you are running a large site with UPDATEs and INSERTs flying around every millisecond, a connection pool reduces overhead time.
Slightly related answer: a pool can do additional "connection health checks" (by examining SQL exception codes) and refresh connections to reduce memory usage (see the note on "maxLifeTime" in the answer). But all those things might not outweigh the simpler approach of using one connection.
Another factor to consider is (blocking) network I/O times. Consider this (rough) scenario:
client prepares query
--> client sends data over the network
--> server receives data from the network
--> server executes query, prepares results
--> server sends data over the network
--> client receives data from the network
--> client prepares resultset
If the database is local (on the same machine as the client) then network times are barely noticeable.
But if the database is remote, network I/O times can become measurable and impact performance.
Assuming the isolation level is at "read committed", running select-statements in parallel could become faster.
In my experience, using 4 connections at the same time instead of 1 generally improves performance (or throughput).
This does depend on your specific situation: if MySQL is indeed just mostly waiting on locks to get released,
adding additional connections will not do much in terms of speed.
And likewise, if the client is single-threaded, the client may not actually perceive any noticeable speed improvements.
This should be easy enough to test though: compare execution times for one program with 1 thread using 1 connection to execute an X-amount of select-queries
(i.e. re-use your current program) with another program using 4 threads with each thread using 1 separate connection
to execute the same X-amount of select-queries divided by the 4 threads (or just run the first program 4 times in parallel).
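A rough sketch of such a test harness (the URL, credentials, and query are placeholders; set THREADS to 1 for the single-connection baseline and to 4 for the parallel run):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelSelectTest {
    static final int THREADS = 4;          // set to 1 for the single-connection baseline
    static final int TOTAL_QUERIES = 1000; // the X-amount of select-queries

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long start = System.nanoTime();
        for (int t = 0; t < THREADS; t++) {
            pool.submit(() -> {
                // Each thread gets its own connection
                try (Connection con = DriverManager.getConnection(
                        "jdbc:mysql://localhost/test", "user", "secret"); // placeholders
                     Statement st = con.createStatement()) {
                    for (int i = 0; i < TOTAL_QUERIES / THREADS; i++) {
                        try (ResultSet rs = st.executeQuery("SELECT ...")) { // your query here
                            while (rs.next()) { /* consume the results */ }
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("Elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}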
One note on connection pools (like HikariCP): the pool must ensure that no transaction remains open when a connection is returned to the pool, and this can mean a "rollback" is sent each time a connection is returned to the pool (closed) when auto-commit is off and no "commit" or "rollback" was sent previously. This in turn can increase network I/O times instead of reducing them. So make sure to test either with auto-commit on, or always sending a commit or rollback after your query or set of queries is done.
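For example, with HikariCP the auto-commit behaviour can be made explicit when building the pool (URL, credentials, and pool size are placeholders):

import javax.sql.DataSource;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    static DataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost/test"); // placeholder
        config.setUsername("user");                       // placeholder
        config.setPassword("secret");                     // placeholder
        config.setMaximumPoolSize(4);
        // Explicit auto-commit: with it on, the pool does not need to issue a
        // rollback for an open transaction every time a connection is returned
        config.setAutoCommit(true);
        return new HikariDataSource(config);
    }
}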
A connection pool and a persistent connection are not the same thing. One is about limiting the number of SQL connections; the other is about the issues of a single pipe.
The problem is generally the time taken to transfer the SQL output over the network rather than the query execution time. So if you open two CLI SQL clients and fire two queries, one with a large output and one with a small output (in that sequence), the smaller one finishes first while the larger one is still scrolling its output.
The point here is that multiple connections do solve problems for cases like the above.
When you have multiple front-end requests asking for queries, you may prefer persistent connections because they give you the benefit of multiplexing over different connections (large versus small outputs) and avoid the overhead of session setup/teardown.
Connection pool APIs have built-in error checks and handling, but most APIs still expect you to declare explicitly whether you want a persistent connection or not.
So in effect there are 3 variables: pool, persistence, and configuration parameters via the API. You have to mix and match pool size, persistence, and number of connections to suit your environment.
I have a Java server and PostgreSQL database.
There is a background process that queries (inserts some rows into) the database 2-3 times per second. And there is a servlet that queries the database once per request (it also inserts a row).
I am wondering should I have separate Connection instances for them or share a single Connection instance between them?
Also, does this even matter? Or does the PostgreSQL JDBC driver internally just send all requests through a unified pool anyway?
One more thing: should I make a new Connection instance for every servlet request thread, or share one Connection instance among all servlet threads and keep it open for the entire uptime?
By separate I mean that every thread creates its own Connection instance, like this:
Connection connection = DriverManager.getConnection(url, user, pw);
If you use a single connection and share it, only one thread at a time can use it and the others will block, which will severely limit how much your application can get done. Using a connection pool means that the threads can have their own database connections and can make concurrent calls to the database server.
See the postgres documentation, "Chapter 10. Using the Driver in a Multithreaded or a Servlet Environment":
A problem with many JDBC drivers is that only one thread can use a
Connection at any one time --- otherwise a thread could send a query
while another one is receiving results, and this could cause severe
confusion.
The PostgreSQL™ JDBC driver is thread safe. Consequently, if your
application uses multiple threads then you do not have to worry about
complex algorithms to ensure that only one thread uses the database at
a time.
If a thread attempts to use the connection while another one is using
it, it will wait until the other thread has finished its current
operation. If the operation is a regular SQL statement, then the
operation consists of sending the statement and retrieving any
ResultSet (in full). If it is a fast-path call (e.g., reading a block
from a large object) then it consists of sending and retrieving the
respective data.
This is fine for applications and applets but can cause a performance
problem with servlets. If you have several threads performing queries
then each but one will pause. To solve this, you are advised to create
a pool of connections. When ever a thread needs to use the database,
it asks a manager class for a Connection object. The manager hands a
free connection to the thread and marks it as busy. If a free
connection is not available, it opens one. Once the thread has
finished using the connection, it returns it to the manager which can
then either close it or add it to the pool. The manager would also
check that the connection is still alive and remove it from the pool
if it is dead. The down side of a connection pool is that it increases
the load on the server because a new session is created for each
Connection object. It is up to you and your applications'
requirements.
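The manager described in that quote can be sketched in a few lines; this is a toy illustration only, and real pools such as DBCP, c3p0, or HikariCP handle many more edge cases:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.LinkedBlockingQueue;

public class TinyPool {
    private final LinkedBlockingQueue<Connection> idle = new LinkedBlockingQueue<>();
    private final String url, user, pw;

    public TinyPool(String url, String user, String pw) {
        this.url = url;
        this.user = user;
        this.pw = pw;
    }

    // Hand out a free connection, opening a new one if none is available or it is dead
    public Connection borrow() throws SQLException {
        Connection con = idle.poll();
        if (con == null || !con.isValid(2)) {
            con = DriverManager.getConnection(url, user, pw);
        }
        return con;
    }

    // Return the connection so it can be reused instead of closed
    public void giveBack(Connection con) throws SQLException {
        if (con != null && !con.isClosed()) {
            idle.offer(con);
        }
    }
}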
As per my understanding, you should defer this task to the container and let it manage connection pooling for you.
You're using servlets, which run in a servlet container, and all major servlet containers that I'm aware of provide connection pool management.
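With a container-managed pool, the servlet typically just looks the DataSource up via JNDI; the resource name "jdbc/MyDB" below is a placeholder that has to match your container's configuration:

import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class RequestDao {
    // The JNDI name is a placeholder; it must match the pool defined in the container
    private static final String JNDI_NAME = "java:comp/env/jdbc/MyDB";

    public void handleRequest() throws Exception {
        InitialContext ctx = new InitialContext();
        DataSource ds = (DataSource) ctx.lookup(JNDI_NAME);
        // Borrow a pooled connection per request; close() returns it to the pool
        try (Connection con = ds.getConnection()) {
            // execute this request's query here
        }
    }
}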
See Also
Best way to manage database connection for a Java servlet
I am considering creating an HBase table when my application starts up and leaving it open as long as my application is running. My application may run indefinitely.
What happens if I never close the HBase table?
Is there a maximum time the connection can be open/idle before it needs to be reinitialized?
How is the connection closed if the system crashes?
I have HBase The Definitive Guide but I have not found the information I am looking for in there. If there are any online references for this then please provide them.
This was extracted from "HBase in Action" page 25:
"Closing the table when you’re finished with it allows the underlying
connection resources to be returned to the pool."
This blog post is about timeouts in HBase. Generally speaking, there are a lot of them:
ZK session timeout (zookeeper.session.timeout)
RPC timeout (hbase.rpc.timeout)
RecoverableZookeeper retry count and retry wait (zookeeper.recovery.retry, zookeeper.recovery.retry.intervalmill)
Client retry count and wait (hbase.client.retries.number, hbase.client.pause)
You may try to raise them a bit and set a really high value for the retry count. This can keep your sessions alive for a very long period of time.
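A sketch of raising them programmatically (the values are illustrative only; the same keys can also be set in hbase-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseTimeouts {
    public static Configuration tunedConfiguration() {
        Configuration conf = HBaseConfiguration.create();
        // Illustrative values only -- tune for your cluster
        conf.setInt("zookeeper.session.timeout", 120000);
        conf.setInt("hbase.rpc.timeout", 60000);
        conf.setInt("hbase.client.retries.number", 20);
        conf.setLong("hbase.client.pause", 1000);
        conf.setInt("zookeeper.recovery.retry", 10);
        conf.setInt("zookeeper.recovery.retry.intervalmill", 1000);
        // Pass this Configuration when creating the table/connection handle
        return conf;
    }
}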
If the system running the HBase client crashes, the connection is closed after a timeout.