We are working on an e-commerce site built with the Hybris framework, and we currently have an issue with database connections (I suppose) and no idea how to solve it. It happens only in the production environment and only on the servers used by the ESB (2 servers out of 40 in total).
Basically, sometimes (1-3 times a day), we discover database sessions waiting on an idle session ("SQL*Net message from client" wait event). We can only free these sessions by manually killing the holder.
All the servers share the same application code; the main differences between the ESB and frontend servers are the controllers that are called and the request counts.
ESB Server: 10 requests per minute
Frontend Server: 300 requests per minute
In the application log I found a lot of Closed Connection errors on these 2 servers, and I think this is related to our problem, but I don't know how.
In access.log I have this request:
[26/Mar/2019:09:04:39 +0100] "GET /blockorder?orderCode=XXXX&access_token=XXXX HTTP/1.1" 400 122 "-" "AHC/1.0"
and in the console.log I have this:
hybrisHTTP8 2019-03-26 09:04:39,184 ERROR [[10.125.31.2] ] () [de.hybris.platform.jdbcwrapper.ConnectionImpl] error resetting AutoCommit
java.sql.SQLRecoverableException: Closed Connection
at oracle.jdbc.driver.PhysicalConnection.setAutoCommit(PhysicalConnection.java:3763)
at de.hybris.platform.jdbcwrapper.ConnectionImpl.doSetAutoCommit(ConnectionImpl.java:431)
at de.hybris.platform.jdbcwrapper.ConnectionImpl.restoreAutoCommit(ConnectionImpl.java:185)
at de.hybris.platform.jdbcwrapper.ConnectionImpl.unsetTxBound(ConnectionImpl.java:175)
at de.hybris.platform.tx.Transaction.unsetTxBoundConnection(Transaction.java:920)
at de.hybris.platform.tx.Transaction.clearTxBoundConnectionAndNotify(Transaction.java:897)
at de.hybris.platform.tx.Transaction.clearTxBoundConnectionAndNotifyRollback(Transaction.java:887)
at de.hybris.platform.tx.Transaction.rollbackOuter(Transaction.java:1084)
at de.hybris.platform.tx.Transaction.rollback(Transaction.java:1028)
at de.hybris.platform.tx.Transaction.commit(Transaction.java:690)
at de.hybris.platform.tx.Transaction.finishExecute(Transaction.java:1218)
at de.hybris.platform.tx.Transaction.execute(Transaction.java:1205)
at de.hybris.platform.tx.Transaction.execute(Transaction.java:1160)
at de.hybris.platform.jalo.Item.setAllAttributes(Item.java:2082)
at de.hybris.platform.jalo.Item.setAllAttributes(Item.java:2057)
at de.hybris.platform.servicelayer.internal.converter.impl.ItemModelConverter.storeAttributes(ItemModelConverter.java:1503)
at de.hybris.platform.servicelayer.internal.converter.impl.ItemModelConverter.save(ItemModelConverter.java:730)
at de.hybris.platform.servicelayer.internal.model.impl.wrapper.ModelWrapper.save(ModelWrapper.java:336)
at de.hybris.platform.servicelayer.internal.model.impl.ResolvingModelPersister.saveOthers(ResolvingModelPersister.java:64)
at de.hybris.platform.servicelayer.internal.model.impl.ResolvingModelPersister.persist(ResolvingModelPersister.java:49)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.saveViaJalo(DefaultModelService.java:1059)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.doJaloPersistence(DefaultModelService.java:648)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.persistWrappers(DefaultModelService.java:1002)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.performPersistenceOperations(DefaultModelService.java:626)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.saveAllInternal(DefaultModelService.java:620)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.saveAll(DefaultModelService.java:600)
at de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService.save(DefaultModelService.java:548)
at com.test.fulfilment.process.impl.DefaultOrderProcessService.requestForcedOrderCancellation(DefaultOrderProcessService.java:131)
at com.test.application.order.facades.impl.DefaultOrderFacade.forcedOrderCancel(DefaultOrderFacade.java:62)
at com.test.application.controllers.OrderController.blockOrder(OrderController.java:520)
Our pool config is the following:
{
"maxIdle": 90,
"minIdle": 2,
"maxActive": 90,
"maxWait": 10000,
"whenExhaustedAction": 1,
"testOnBorrow": true,
"testOnReturn": true,
"testWhileIdle": true,
"timeBetweenEvictionRunsMillis": 10000,
"numTestsPerEvictionRun": 100,
"minEvictableIdleTimeMillis": 300000,
"softMinEvictableIdleTimeMillis": -1,
"lifo": true
}
Our tomcat config is:
tomcat.generaloptions.JDBC=-Doracle.jdbc.ReadTimeout=60000
tomcat.generaloptions.TIMEOUT=-Dsun.net.client.defaultConnectTimeout\=60000 -Dsun.net.client.defaultReadTimeout\=60000
tomcat.ajp.acceptCount=100
tomcat.ajp.maxThreads=400
tomcat.maxthreads=400
tomcat.minsparethreads=50
tomcat.maxidletime=10000
tomcat.connectiontimeout=120000
tomcat.acceptcount=100
We tried removing oracle.jdbc.ReadTimeout, but the result was that we started to see Closed Connection errors on the other servers as well.
The code that triggers this error is pretty simple (and it works 95% of the time):
@Override
public boolean requestForcedOrderCancellation(final OrderModel order) {
    Transaction.current().begin();
    try {
        modelService.lock(order.getPk());
        modelService.refresh(order);
        order.setForcedCancelled(true);
        modelService.save(order);
        Transaction.current().commit();
        return true;
    } catch (Exception e) {
        LOG.error(e.getMessage(), e);
        Transaction.current().rollback();
        return false;
    }
}
We also tried without the explicit locking, and the problem is exactly the same.
It seems like the connection is already closed, so we cannot roll back (or commit) the transactions that are still waiting in the DB.
I would like to avoid these locks and these Closed Connection errors.
Your connection pool is probably already fixing this for you. Try increasing the logging to see whether it does.
Background: databases hate long-lived connections because they can starve them, so they tend to close connections after some time. Another culprit is firewalls, which tend to drop idle connections from their tables. Connection pools know how to handle this by testing the connections (all those test* options in your config above).
Sometimes you need to tell your pool how to test a connection; check the documentation. For Oracle, a good test is select 1 from dual.
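For example, if your pool is (or wraps) Apache Commons DBCP, which the option names in your config resemble, the validation query can be set programmatically. A minimal sketch, assuming a DBCP2 BasicDataSource; adjust to whatever pool implementation Hybris actually wires in:

// Assumption: the pool is Apache Commons DBCP2's BasicDataSource
BasicDataSource ds = new BasicDataSource();
ds.setUrl(jdbcUrl); // jdbcUrl comes from your existing config
ds.setValidationQuery("select 1 from dual"); // the query run by the test* checks
ds.setTestOnBorrow(true);
ds.setTestOnReturn(true);
ds.setTestWhileIdle(true);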
I think your real problem is those stuck sessions. Find out what they are waiting for by looking at a Java thread dump, which you can create with the jstack tool that ships with the JDK.
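For example, assuming you know the Tomcat process id:

jstack <pid> > /tmp/threaddump.txt

Then look for threads stuck in socket reads or waiting on locks around the JDBC calls.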
We found that the issue was due to an uncaught exception/error in transactional code.
The server answered with an error, and Hybris did not roll back the transaction, which stayed open.
The same thread is reused some time later (maybe days later), and the old transaction is still open.
When this corrupted thread is used to lock some rows in the database, the transaction is not committed to the database even if we commit it in the code, because internally Hybris keeps a transaction counter to handle nested transactions (e.g. in called methods). The transaction is committed/rolled back on the DB only when we call commit/rollback and the transaction counter is 1.
Request 1:
Transaction.begin() // Hybris counter = 1
doSomething() // This throws an exception, the application exits, the Hybris counter is still 1
try {
    Transaction.commit()
} catch (Exception e) {
    Transaction.rollback();
}

Request 2 on the same thread:
Transaction.begin() // Hybris counter is now 2
doSomething() // Works OK, Hybris counter is still 2
try {
    Transaction.commit() // Hybris counter -= 1
    // The transaction is not committed to the DB because the Hybris counter is now 1
} catch (Exception e) {
    Transaction.rollback();
}

Request 3 on the same thread:
Transaction.begin() // Hybris counter is now 2
lockRow()
// The row is locked for the whole transaction (the same one opened in Request 1)
// Everything seems OK
try {
    Transaction.commit() // Hybris counter -= 1
    // The transaction is not committed to the DB because the Hybris counter is now 1
    // The row is still locked
    // Subsequent requests touching the same row will wait on the lock forever
} catch (Exception e) {
    Transaction.rollback();
}
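The fix, therefore, is to make sure every begin() is balanced by exactly one commit or rollback, even when the transactional body throws. A minimal sketch of that defensive pattern (the stack trace above shows Hybris' own Transaction.execute(...), which wraps a TransactionBody to the same effect):

Transaction tx = Transaction.current();
tx.begin();
boolean success = false;
try {
    doSomething(); // even if this throws, the counter is decremented below
    tx.commit();
    success = true;
} finally {
    if (!success) {
        tx.rollback(); // keeps the Hybris transaction counter balanced
    }
}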
Related
I have a REST API implemented with Spring Boot 2. To check some client behavior on timeout, how can I simulate that condition in my testing environment? The server should regularly receive the request and process it (in production, timeouts happen due to random network slowdowns and large response payloads).
Would adding a long sleep be the proper simulation technique? Is there any better method to really have the server "drop" the response?
Needing sleeps to test your code is considered bad practice. Instead, you want to replicate the exception you receive on timeout, e.g. a java.net.SocketTimeoutException when using RestTemplate.exchange.
Then you can write a test like this (note that RestTemplate wraps the underlying SocketTimeoutException in a ResourceAccessException, which is what the mock should throw):

@RunWith(MockitoJUnitRunner.class)
public class FooTest {

    @Mock
    RestTemplate restTemplate;

    @Before
    public void setup() {
        when(restTemplate.exchange(...))
            .thenThrow(new ResourceAccessException("timeout", new SocketTimeoutException()));
    }

    @Test
    public void test() {
        // TODO
    }
}
That way you won't be twiddling your thumbs waiting for something to happen.
Sleep is one way to do it, but if you're writing dozens of tests like that, having to wait for each sleep will create a really long-running test suite.
The alternative is to lower the timeout threshold on the client side for testing. If in production your client is supposed to wait 5 seconds for a response, then in the test change it to 0.5 seconds (assuming your server takes longer than that to respond), but keep the same error-handling logic.
The latter might not work in all scenarios, but it will definitely save you from a test suite that takes 10+ minutes to run.
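For example, with RestTemplate the client-side timeouts can be lowered for tests through the request factory (a sketch; the 500 ms values are illustrative):

SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(500); // fail fast when connecting
factory.setReadTimeout(500);    // fail fast while waiting for the response body
RestTemplate restTemplate = new RestTemplate(factory);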
You can do one thing which I did in my case.
In my case, when my application is running in a production environment, we keep polling trades from the API, and sometimes it drops the connection by throwing an SSLProtocolException.
What we did:

int retryCount = 5;
int count = 0;
while (true) {
    count++;
    try {
        // send an api request here
        break; // success, stop retrying
    } catch (Exception e) {
        if (retryCount == count) {
            throw e;
        }
        // here we can do a thread sleep and try to reconnect after that period
    }
}
Similarly, in your case it will throw some exception; catch that exception and put your thread to sleep for a while, and after that try the connection again. You can modify retryCount as per your requirement; in my case it was 5.
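A self-contained sketch of that sleep-and-retry idea (sendApiRequest and the backoff interval are hypothetical placeholders):

void callWithRetry() throws Exception {
    final int retryCount = 5;
    int count = 0;
    while (true) {
        count++;
        try {
            sendApiRequest(); // hypothetical method performing the API call
            return; // success, stop retrying
        } catch (Exception e) {
            if (count == retryCount) {
                throw e; // give up after the last attempt
            }
            try {
                Thread.sleep(1000L * count); // wait a bit longer before each retry
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw e;
            }
        }
    }
}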
I have a Spring Boot (v2.0.8) application which makes use of a HikariCP (v2.7.9) Pool (connecting to MariaDB) configured with:
minimumIdle: 1
maximumPoolSize: 10
leakDetectionThreshold: 30000
The issue is that our production component, once every few weeks, repeatedly throws SQLTransientConnectionException: "Connection is not available, request timed out after 30000ms...". It never recovers from this and consistently throws the exception, so a restart of the component is required.
From looking at the HikariPool source code, it would seem that this is happening because every time it calls connectionBag.borrow(timeout, MILLISECONDS) the poolEntry is null, and hence it throws the timeout exception. For it to be null, the connection pool must have no free entries, i.e. all PoolEntry objects in the sharedList are marked IN_USE.
I am not sure why the component does not recover from this, since eventually I would expect a PoolEntry to be marked NOT_IN_USE, which would break the cycle of repeated exceptions.
Possible scenarios I can think of:
1. All entries are IN_USE and the DB goes down temporarily. I would expect exceptions to be thrown for the in-flight queries. Perhaps at this point the PoolEntry status is never reset and is therefore stuck at IN_USE. I would have thought that if an exception is thrown, the status is changed so that the connection can be cleared from the pool. Can anyone confirm whether this is the case?
2. A flood of REST requests is made to the component, which in turn requires DB queries to be executed. This fills the connection pool, so subsequent requests time out waiting for previous requests to complete. This makes sense, however I would expect the component to recover once the requests complete, which it does not.
Does anyone have an idea of what might be the issue here? I have tried configuring the various timeouts that are in the Hikari documentation but have had no luck diagnosing / resolving this issue. Any help would be appreciated.
Thanks!
Scenario 2 is most likely what is happening. I ran into the same issue when using it with Cloud Dataflow and receiving a large number of connection requests. The only solution I found was to play with the config until I found a combination that worked for my use case.
I'll leave you the config that works for me at 50-100 requests per second and wish you luck.
private static DataSource pool;

static {
    final HikariConfig config = new HikariConfig();
    config.setMinimumIdle(5);
    config.setMaximumPoolSize(50);
    config.setConnectionTimeout(10000); // 10 s max wait for a free connection
    config.setIdleTimeout(600000);      // retire connections idle for 10 min
    config.setMaxLifetime(1800000);     // recycle connections after 30 min
    config.setJdbcUrl(JDBC_URL);
    config.setUsername(JDBC_USER);
    config.setPassword(JDBC_PASS);
    pool = new HikariDataSource(config);
}
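One thing worth checking alongside the config (and what the leakDetectionThreshold in the question is there to catch): a connection that is borrowed but never closed keeps its pool entry IN_USE forever, which produces exactly this kind of non-recovering exhaustion. A minimal sketch of the safe borrowing pattern:

// try-with-resources guarantees the connection goes back to the pool
try (Connection conn = pool.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT 1")) {
    ps.executeQuery();
}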
Problem Statement
We have been using H2 in embedded mode for a while now, with a connection pool configured on top of it. The current pool configuration is as follows:
h2.datasource.min-idle=10
h2.datasource.initial-size=10
h2.datasource.max-active=200
h2.datasource.max-age=600000
h2.datasource.max-wait=3000
h2.datasource.min-evictable-idle-time-millis=60000
h2.datasource.remove-abandoned=true
h2.datasource.remove-abandoned-timeout=60
h2.datasource.log-abandoned=true
h2.datasource.abandonWhenPercentageFull=100
H2 config:
spring.h2.console.enabled=true
spring.h2.console.path=/h2
h2.datasource.url=jdbc:h2:file:~/h2/cartdb
h2.server.properties=webAllowOthers
spring.h2.console.settings.web-allow-others=true
h2.datasource.driver-class-name=org.h2.Driver
*skipping username and password properties.
We have verified that the above configuration takes effect by logging the pool properties.
The issue with this setup is that we are observing regular (though intermittent) connection pool exhaustion, and once the pool hits the max limit it starts throwing the following exception for some queries:
SqlExceptionHelper.logExceptions(SqlExceptionHelper.java:129) - [http-apr-8080-exec-38] Timeout: Pool empty. Unable to fetch a connection in 3 seconds, none available[size:200; busy:200; idle:0; lastwait:3000].
Thereafter it fails to recover from this state, even after many hours, until we restart the web server (Tomcat in this case).
H2 driver dependency:
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>1.4.196</version>
<scope>runtime</scope>
</dependency>
Query Pattern & Throughput
We use H2 to load up some data for every request, then execute a few (about 50) SELECT queries, and finally delete the data. This results in a consistent 30k-40k calls per minute (except during off hours) on H2 (according to New Relic monitoring).
Every read operation acquires a new connection and releases it after execution:
EntityManager entityManager = null;
try {
    entityManager = entityManagerFactory.createEntityManager();
    Query query = entityManager.createNativeQuery(sqlQuery);
    query.setParameter("cartId", cartId);
    List<String> resultList = query.getResultList();
    return resultList;
} finally {
    if (null != entityManager) { entityManager.close(); }
}
Observations
After an application restart the pool utilization is minimal, until at some moment it abruptly shoots up and eventually reaches the max limit. This happens over the course of 1-2 days.
Once the pool hits the maximum connection limit, the borrowed-connection count increases at a faster pace than the returned-connection count, whereas otherwise the two remain very close to one another.
At the same time the abandoned-connection count also starts increasing, along with the abandon logs.
Interestingly, the query response times remain the same after pool exhaustion, so this pretty much rules out slow queries.
This issue has happened even at the oddest of hours, when traffic is at its minimum, so it has no relation to traffic.
Please guide us in the right direction to solve this issue.
UPDATE
Recently, when one such incident occurred, we discovered the following causes in our stack trace:
Caused by: org.h2.jdbc.JdbcSQLException: Database may be already in use: null. Possible solutions: close all other connection(s); use the server mode [90020-196]
Caused by: java.lang.IllegalStateException: The file is locked: nio:/root/h2/cartdb.mv.db [1.4.196/7]
Caused by: java.nio.channels.OverlappingFileLockException
After digging into this, we have decided to move to in-memory mode, as we don't need to persist the data beyond the application's lifetime. As a result, the file lock should not occur, thereby reducing or eradicating this issue.
Will come back and update in either case.
Since the last update on the question:
After observing the performance for quite some time, we have come to the conclusion that using H2 in file mode (embedded) was somehow leading to file-lock exceptions periodically (though irregularly).
Since our application does not need to persist the data beyond its lifetime, we decided to move to pure in-memory mode.
The mystery of the file-lock exception still remains to be solved, though.
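For reference, the switch is essentially a URL change; an in-memory equivalent of the datasource URL above (assuming the same database name) would be:

h2.datasource.url=jdbc:h2:mem:cartdb;DB_CLOSE_DELAY=-1

DB_CLOSE_DELAY=-1 keeps the in-memory database alive for the lifetime of the JVM instead of dropping it when the last connection closes.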
I am currently working on a Java Swing application in NetBeans, with Hibernate, guided by this wonderful repo from GitHub.
The example code found here basically urges new programmers to open and close the SessionFactory every time certain queries have been executed:
try {
    HibernateSessionFactory.Builder.configureFromDefaultHibernateCfgXml()
        .createSessionFactory();
    new MySqlExample().doSomeDatabaseStuff();
} catch (Throwable th) {
    th.printStackTrace();
} finally {
    HibernateSessionFactory.closeSessionFactory();
}

private void doSomeDatabaseStuff() {
    deleteAllUsers();
    insertUsers();
    countUsers();
    User user = findUser(USER_LOGIN_A);
    LOG.info("User A: " + user);
}
Is this good programming practice? Isn't it more efficient to open the SessionFactory on app startup and close it on the WindowClosing event? What are the drawbacks of each approach?
Thanks.
Using a persistent connection means you are going to have as many open connections on your database as open clients, plus you'll have to make sure the connection stays open (very often it will be closed if it stays idle for a long time).
On the other hand, executing a query will be significantly faster if the connection is already open.
So it really depends on how often your clients use the database. If they use it very rarely, a persistent connection is useless.
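For completeness, a minimal sketch of the open-at-startup variant the question suggests, reusing the HibernateSessionFactory helper from the snippet above (the Swing wiring is illustrative):

import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;

public class App {
    public static void main(String[] args) {
        // Build the expensive SessionFactory once, at startup
        HibernateSessionFactory.Builder.configureFromDefaultHibernateCfgXml()
            .createSessionFactory();

        JFrame frame = new JFrame("My App");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent e) {
                // Release the factory exactly once, at shutdown
                HibernateSessionFactory.closeSessionFactory();
            }
        });
        frame.setVisible(true);
    }
}

Note that the expensive object is the SessionFactory; individual Sessions remain cheap to open and close per unit of work.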
I am wondering if there is a possibility of Hibernate delaying its writes to the DB. I have Hibernate configured for MySQL. The scenario I hope to support is 80% reads and 20% writes, so I do not want to optimize my writes; I would rather have the client wait until the DB has been written to than return a bit earlier. My tests currently have 100 clients in parallel, and the CPU sometimes maxes out. I need this flush method to write to the DB immediately and return only when the data is written.
On the client side, I send a write request and then a read request, but the read request sometimes returns null. I suspect Hibernate is not writing to the DB immediately:
public final ThreadLocal<Session> session = new ThreadLocal<>();

public Session currentSession() {
    Session s = session.get();
    // Open a new Session, if this thread has none yet
    if (s == null || !s.isOpen()) {
        s = sessionFactory.openSession();
        // Store it in the ThreadLocal variable
        session.set(s);
    }
    return s;
}

public synchronized void flush(Object dataStore) throws DidNotSaveRequestSomeRandomError {
    Transaction txD;
    Session session;
    session = currentSession();
    txD = session.beginTransaction();
    session.save(dataStore);
    try {
        txD.commit();
    } catch (ConstraintViolationException e) {
        e.printStackTrace();
        throw new DidNotSaveRequestSomeRandomError(dataStore, feedbackManager);
    } catch (TransactionException e) {
        log.debug("txD state isActive" + txD.isActive() + " txD is participating" + txD.isParticipating());
        log.debug(e);
    } finally {
        // session.flush();
        txD = null;
        session.close();
    }
    // mySession.clear();
}
@Siddharth Hibernate does not really delay its writes, and your code does not suggest that either. I have faced a similar issue earlier, and I suspect you might be facing the same thing: when there are numerous write requests, many threads share the same instance of your DB, and even with consecutive commits by Hibernate you don't actually see the changes.
You may also catch this by simply looking at your MySQL logs during the transaction to see what exactly went wrong!
Thanks for your hint. It took me some time to debug, but MySQL logs are amazing.
This is what I run to check the timestamps of my inserts against the MySQL writes. MySQL logs all DB operations in a binlog; to read it we need the tool called mysqlbinlog, and my.cnf also needs to be configured to enable binary logging: http://publib.boulder.ibm.com/infocenter/tpfhelp/current/index.jsp?topic=%2Fcom.ibm.ztpf-ztpfdf.doc_put.cur%2Fgtpm7%2Fm7enablelogs.html
I check which is the latest MySQL binlog file and run the command below to grep for the line above the match, to get the timestamp. Then in Java I call Calendar.getInstance().getTimeInMillis() to compare with that timestamp.
sudo mysqlbinlog mysql/mysql-bin.000004 | grep "mystring" -1
So I debugged my problem: it was a delayed-write problem. I implemented a synchronous write instead of all-async; in other words, the server call won't return until the DB has been flushed for this object.