I have a Spring Boot (v2.0.8) application which makes use of a HikariCP (v2.7.9) Pool (connecting to MariaDB) configured with:
minimumIdle: 1
maximumPoolSize: 10
leakDetectionThreshold: 30000
The issue is that our production component, once every few weeks, repeatedly throws SQLTransientConnectionException: "Connection is not available, request timed out after 30000ms...". It never recovers from this and consistently throws the exception, so a restart of the component is required.
From looking at the HikariPool source code, it would seem that this is happening because every time it calls connectionBag.borrow(timeout, MILLISECONDS) the poolEntry is null and hence the timeout exception is thrown. For it to be null, the connection pool must have no free entries, i.e. all PoolEntry objects in the sharedList are marked IN_USE.
I am not sure why the component would not recover from this since eventually I would expect a PoolEntry to be marked NOT_IN_USE and this would break the repeated Exceptions.
Possible scenarios I can think of:
All entries are IN_USE and the DB goes down temporarily. I would expect exceptions to be thrown for the in-flight queries. Perhaps at this point the PoolEntry status is never reset and is therefore stuck at IN_USE. I would have thought that when an exception is thrown the status is changed so that the connection can be cleared from the pool (see the sketch after this list). Can anyone confirm whether this is the case?
A flood of REST requests are made to the component which in turn require DB queries to be executed. This fills the connection pool, and subsequent requests therefore time out waiting for previous requests to complete. This makes sense, however I would expect the component to recover once the requests complete, which it does not.
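For scenario 1 in particular, it is worth noting that an entry only flips back to NOT_IN_USE when the application closes the connection; an exception by itself does not return it to the pool. As a point of reference, here is a minimal sketch of the pattern that guarantees the return even when the query throws (dataSource is just a placeholder for the pool-backed DataSource):

// try-with-resources closes the Connection in all cases, which hands the
// underlying PoolEntry back to Hikari and marks it NOT_IN_USE again
try (Connection connection = dataSource.getConnection();
     PreparedStatement statement = connection.prepareStatement("SELECT 1");
     ResultSet resultSet = statement.executeQuery()) {
    while (resultSet.next()) {
        // consume the row
    }
}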
Does anyone have an idea of what might be the issue here? I have tried configuring the various timeouts that are in the Hikari documentation but have had no luck diagnosing / resolving this issue. Any help would be appreciated.
Thanks!
Scenario 2 is most likely what is happening. I ran into the same issue when using it with Cloud Dataflow and receiving a large number of connection requests. The only solution I found was to play with the config to find a combination that worked for my use case.
I'll leave you my code that works for 50-100 requests per second and wish you luck.
private static DataSource pool;

final HikariConfig config = new HikariConfig();
config.setMinimumIdle(5);              // keep a few idle connections warm
config.setMaximumPoolSize(50);         // hard cap on concurrent connections
config.setConnectionTimeout(10000);    // fail a borrow after 10s instead of the default 30s
config.setIdleTimeout(600000);         // retire connections idle for 10 minutes
config.setMaxLifetime(1800000);        // recycle every connection after 30 minutes
config.setJdbcUrl(JDBC_URL);
config.setUsername(JDBC_USER);
config.setPassword(JDBC_PASS);
pool = new HikariDataSource(config);
I am working on a legacy project which uses Java 8, Spring, HikariCP, and MySQL. The microservices' methods are triggered by a Kafka topic and start a reporting operation. Almost all of the triggered methods contain the snippet below, and some of them repeat the same pattern inside their own blocks.
new ForkJoinPool().submit(() -> users.parallelStream().forEach(user -> {
    // ... per-user get-or-create work
}));
The application creates 8-9k threads, all of which try to get or create a record. The database could not handle these requests and started to throw exceptions, and Zabbix sends mails about heap memory usage being above 90%:
Caused by: java.sql.SQLTransientConnectionException: HikariPool-2 -
Connection is not available, request timed out after 30000ms.
When I check the database I see the variable max_connections = 600, but this is not enough.
I want to set a limit on the thread count at the application level.
I tried setting these parameters, but the thread count does not decrease:
SPRING_TASK_EXECUTION_POOL_QUEUE-CAPACITY, SPRING_TASK_EXECUTION_POOL_MAX-SIZE, -Djava.util.concurrent.ForkJoinPool.common.parallelism
Is there any property to solve this problem?
I changed all new ForkJoinPool() calls to ForkJoinPool.commonPool() and used -Djava.util.concurrent.ForkJoinPool.common.parallelism to control thread creation; after that my problem was fixed.
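For reference, a minimal sketch of what that change looks like (the surrounding reporting logic is elided, and the parallelism value of 16 is only an example, not a recommendation):

// java.util.concurrent.ForkJoinPool: all submissions now share the common pool,
// whose size is capped globally with
// -Djava.util.concurrent.ForkJoinPool.common.parallelism=16
ForkJoinPool.commonPool().submit(() ->
        users.parallelStream().forEach(user -> {
            // per-user get-or-create work against the database
        })
).join(); // wait for the task instead of leaking it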
I am running a batch job in AWS which consumes messages from an SQS queue and writes them to a Kafka topic using Akka. I've created an SqsAsyncClient with the following parameters:
private static SqsAsyncClient getSqsAsyncClient(final Config configuration, final String awsRegion) {
var asyncHttpClientBuilder = NettyNioAsyncHttpClient.builder()
.maxConcurrency(100)
.maxPendingConnectionAcquires(10_000)
.connectionMaxIdleTime(Duration.ofSeconds(60))
.connectionTimeout(Duration.ofSeconds(30))
.connectionAcquisitionTimeout(Duration.ofSeconds(30))
.readTimeout(Duration.ofSeconds(30));
return SqsAsyncClient.builder()
.region(Region.of(awsRegion))
.httpClientBuilder(asyncHttpClientBuilder)
.endpointOverride(URI.create("https://sqs.us-east-1.amazonaws.com/000000000000")).build();
}
private static SqsSourceSettings getSqsSourceSettings(final Config configuration) {
    // SqsSourceSettings is immutable, so the result of each with* call must be
    // reassigned, otherwise the setting is silently dropped
    SqsSourceSettings sqsSourceSettings = SqsSourceSettings.create().withCloseOnEmptyReceive(false);
    if (configuration.hasPath(ConfigPaths.SqsSource.MAX_BATCH_SIZE)) {
        sqsSourceSettings = sqsSourceSettings.withMaxBatchSize(10);
    }
    if (configuration.hasPath(ConfigPaths.SqsSource.MAX_BUFFER_SIZE)) {
        sqsSourceSettings = sqsSourceSettings.withMaxBufferSize(1000);
    }
    if (configuration.hasPath(ConfigPaths.SqsSource.WAIT_TIME_SECS)) {
        sqsSourceSettings = sqsSourceSettings.withWaitTime(Duration.of(20, SECONDS));
    }
    return sqsSourceSettings;
}
But, whilst running my batch job I get the following AWS SDK exception:
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
The exception still seems to occur even after I try tweaking the parameters mentioned here:
Consider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate. Increasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process. If you already are fully utilizing your network interface or cannot further increase your connection count, increasing the acquire timeout gives extra time for requests to acquire a connection before timing out. If the connections doesn't free up, the subsequent requests will still timeout. If the above mechanisms are not able to fix the issue, try smoothing out your requests so that large traffic bursts cannot overload the client, being more efficient with the number of times you need to call AWS, or by increasing the number of hosts sending requests
Has anyone run into this issue before?
I encountered the same issue, and I ended up firing 100 async batch requests, then waiting for those 100 to clear before firing the next 100, and so on.
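A rough sketch of that batching approach, assuming the async SDK call returns a CompletableFuture (sendOneBatchAsync and entries are placeholder names, not real APIs):

List<CompletableFuture<?>> inFlight = new ArrayList<>();
for (Entry entry : entries) {
    inFlight.add(sendOneBatchAsync(entry)); // e.g. an SqsAsyncClient.sendMessageBatch(...) call
    if (inFlight.size() == 100) {
        // block until the current 100 requests have completed before firing more
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }
}
// wait for the final partial batch
CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();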
I'm dealing with the Tomcat configuration on Spring Boot.
Let's suppose I have the following configuration:
server:
  tomcat:
    min-spare-threads: ${min-tomcat-threads:20}
    max-threads: ${max-tomcat-threads:20}
    accept-count: ${accept-concurrent-queue:1}
    max-connections: ${max-tomcat-connections:100}
I have a simple RestController with this code:
public String request(@Valid @RequestBody Info info) throws InterruptedException {
    log.info("Thread sleeping");
    Thread.sleep(8000);
    return "OK";
}
Then I run the following test:
I send 200 HTTP requests per second.
I check the log and as I expected I see 100 simultaneous executions and after 8 seconds I see the last one (queued).
Other executions are rejected.
The main problem I have with this is that if there is a timeout on the client call (for example, 5 seconds), the queued operation will still be processed on the server even though it has already been rejected on the client.
I want to avoid this situation, so I tried:
server:
  tomcat:
    min-spare-threads: ${min-tomcat-threads:20}
    max-threads: ${max-tomcat-threads:20}
    accept-count: ${accept-concurrent-queue:0}
    max-connections: ${max-tomcat-connections:100}
But this "0" is totally ignored (i think in this case it means "infinite").
So, my question is:
Is it possible to configure Tomcat not to queue operations once the max-connections limit is reached?
Or maybe:
Is it possible to configure Tomcat to reject any queued operation?
Thank you very much in advance.
Best regards.
The value of the acceptCount parameter is passed directly to the operating system: e.g. on UNIX systems it is passed to listen(). Since an incoming connection is always put in the OS queue before the JVM accepts it, values lower than 1 make no sense; Tomcat explicitly ignores such values and keeps its default of 100.
However, the real queue in Tomcat consists of the connections that were accepted from the OS queue but are not being processed due to a lack of processing threads (maxThreads). There can be at most maxConnections - maxThreads + 1 such connections; in your case that is 81 connections waiting to be processed.
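If the goal is to keep that internal queue as small as possible, one option (a sketch only, and it ignores the effect on keep-alive connections) is to bring maxConnections down to maxThreads programmatically, so that excess connections stay in the OS accept queue instead of being accepted and parked inside Tomcat:

// Spring Boot 2: customize the embedded Tomcat connector at startup
@Bean
public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatCustomizer() {
    return factory -> factory.addConnectorCustomizers(connector -> {
        connector.setProperty("maxThreads", "20");
        connector.setProperty("maxConnections", "20"); // maxConnections - maxThreads + 1 = 1 queued connection
    });
}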
Here is the thing,
I'm creating an SQS connection and using the same connection to create consumers that listen to two different queues (Q1, Q2).
Enabling and disabling each queue's consumer is handled by the application's admin user through a UI.
So, whenever I disable the Q1 consumer I shouldn't close the connection; I should close it only when both the Q1 and Q2 consumers are disabled, and I can't afford to write complex code to check whether both consumers are disabled.
Is there a way to check the idle time of an open SQSConnection?
or
I would like to know the cost of keeping an SQSConnection open all the time
or
How about opening two different connections?
Here is how I'm creating the connection:
SQSConnectionFactory connectionFactory = new SQSConnectionFactory(
        new ProviderConfiguration(),
        ((AmazonSQSClientBuilder) AmazonSQSClientBuilder.standard().withRegion(sqsRegion))
                .withCredentials(_getCredentialsProvider(awsSecretKey, awsAccessKey)));
_connection = connectionFactory.createConnection();
The entire question, here, seems premised on the unfortunate name SQSConnectionFactory, which isn't what this really is. A more accurate name might have been something like SQSConfiguredClientFactory.
None of the createConnection methods sets up a physical connection to SQS:
https://github.com/awslabs/amazon-sqs-java-messaging-lib/blob/master/src/main/java/com/amazon/sqs/javamessaging/SQSConnectionFactory.java
...because SQS doesn't actually use established/continuous "connections."
The service API interactions take place over HTTPS, with transient connections being created, kept alive, and destroyed as other methods (e.g. receiveMessage(queueUrl)) need them.
So with regard to your questions: 1. connections are not left "open" in any meaningful/relevant sense, so there is nothing to check; 2. the only cost comes from actually using the connections to send/receive/delete messages; and 3. this seems unnecessary for the reasons indicated above.
Problem Statement
We have been using H2 in embedded mode for a while now. It has a connection pool configured above it. Following is the current pool configuration:
h2.datasource.min-idle=10
h2.datasource.initial-size=10
h2.datasource.max-active=200
h2.datasource.max-age=600000
h2.datasource.max-wait=3000
h2.datasource.min-evictable-idle-time-millis=60000
h2.datasource.remove-abandoned=true
h2.datasource.remove-abandoned-timeout=60
h2.datasource.log-abandoned=true
h2.datasource.abandonWhenPercentageFull=100
H2 config:
spring.h2.console.enabled=true
spring.h2.console.path=/h2
h2.datasource.url=jdbc:h2:file:~/h2/cartdb
h2.server.properties=webAllowOthers
spring.h2.console.settings.web-allow-others=true
h2.datasource.driver-class-name=org.h2.Driver
*skipping username and password properties.
We have verified that the above configuration takes effect by logging the pool properties.
The issue with this setup is that we are observing regular (though intermittent) connection pool exhaustion, and once the pool hits the max limit it starts throwing the following exception for some queries.
SqlExceptionHelper.logExceptions(SqlExceptionHelper.java:129) - [http-apr-8080-exec-38] Timeout: Pool empty. Unable to fetch a connection in 3 seconds, none available[size:200; busy:200; idle:0; lastwait:3000].
And thereafter it fails to recover from this state, even after many hours, until we restart the web server (Tomcat in this case).
H2 driver dependency:
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>1.4.196</version>
<scope>runtime</scope>
</dependency>
Query Pattern & Throughput
We use H2 to load some data for every request, then execute a few (about 50) SELECT queries, and finally delete the data. This results in a consistent 30k-40k calls per minute (except during off hours) on H2 (according to New Relic monitoring).
Every read operation acquires a new connection and releases it after execution.
EntityManager entityManager = null;
try {
    entityManager = entityManagerFactory.createEntityManager();
    Query query = entityManager.createNativeQuery(sqlQuery);
    query.setParameter("cartId", cartId);
    List<String> resultList = query.getResultList();
    return resultList;
} finally {
    if (null != entityManager) {
        entityManager.close();
    }
}
Observations
After an application restart the pool utilization is minimal, until at one moment it abruptly shoots up and eventually reaches the max limit. This happens over the course of 1-2 days.
Once the pool hits the maximum connection limit, the borrowed connection count increases at a faster pace than the returned connection count, whereas otherwise the two remain very close to one another.
At the same time the abandoned connection count also starts increasing along with the abandon logs.
Interestingly, the query response times remain the same after pool exhaustion, so this pretty much rules out slow queries.
This issue has happened even at the oddest hours when traffic is minimal, so it has no relation to the traffic.
Please guide us in the right direction to solve this issue.
UPDATE
Recently we discovered the following causes in our stack trace when one such incident occurred:
Caused by: org.h2.jdbc.JdbcSQLException: Database may be already in
use: null. Possible solutions: close all other connection(s); use the
server mode [90020-196]
Caused by: java.lang.IllegalStateException:The file is locked:
nio:/root/h2/cartdb.mv.db [1.4.196/7]
Caused by: java.nio.channels.OverlappingFileLockException
So after digging into this, we have decided to move to in-memory mode, as we don't need to persist the data beyond the application's lifetime. As a result, the file lock should not occur, thereby reducing or eliminating this issue.
Will come back and update in either case.
Since the last update on the question:
After observing the performance for quite some time, we have come to the conclusion that using H2 in file mode (embedded) was somehow leading to periodic (though irregular) file lock exceptions.
Since our application does not need to persist data beyond the application's lifetime, we decided to move to pure in-memory mode.
The mystery of the file lock exception, though, still remains to be solved.
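For anyone hitting the same thing, the switch to in-memory mode boils down to the JDBC URL; something along these lines (the database name and the DB_CLOSE_DELAY flag, which keeps the in-memory DB alive while no connection is open, are assumptions based on the file-mode setup shown above):

h2.datasource.url=jdbc:h2:mem:cartdb;DB_CLOSE_DELAY=-1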