Background
We have two services written in Java - one handles database operations on files (CRUD on the database), the other handles long-running processing of those records (complicated background tasks). Put simply, they are a producer and a consumer.
The intended behavior is as follows:
Service 1 (uses the code below):
Store file into DB
If the file is of type 'C' put it into message queue for further processing
Service 2:
Receive the message from message queue
Load the file from the database (by ID)
Perform further processing
The code of Service 1 is as follows (I changed some names for corporate reasons)
private void persist() throws Exception {
    try (SqlSession sqlSession = sessionFactory.openSession()) {
        FileType fileType = FileType.fromFileName(filename);
        FileEntity dto = new FileEntity(filename, currentTime(), null, user.getName(), count, data);

        // store the file and commit before notifying the consumer
        oracleFileStore.create(sqlSession, dto);
        auditLog.logFileUploaded(user, filename, count);
        sqlSession.commit();

        // only files of type 'C' are handed over to Service 2 via the MQ
        if (fileType == FileType.C) {
            mqClient.submit(new Record(dto.getId(), dto.getName(), user));
            auditLog.logCFileDetected(user, filename);
        }
    }
}
Additional info
ActiveMQ 5.15 is used for the message queue
Database is Oracle 12c
Database access is handled by MyBatis 3.4.1
Problem
From time to time it happens that Service 2 receives the message from the MQ, tries to read the file from the database and, surprisingly, the file is not there. The incident is rare, but it happens. When we check the database afterwards, the file is there. It almost looks as if the background processing of the file started before the file was committed to the database.
Questions
Is it possible that the MQ call could be faster than the database commit? I create the file in the DB, call commit, and only after that do I put the message into the MQ. The message even contains the ID, which is generated by the database itself (a sequence).
Does the connection need to be closed to be sure the commit was performed? I always thought that once I commit, the data is in the database regardless of whether my transaction ended or not.
Can the problem be MyBatis? I've read about some problems regarding MyBatis transactions/sessions, but none of them seems similar to my problem.
Update
I can provide some additional code, although please understand that I cannot share everything for corporate reasons. If you don't see anything obvious in this, that's fine. Unfortunately I cannot go into much deeper analysis than this.
Also, I basically wanted to confirm whether my understanding of SQL and MyBatis is correct, and I can accept such a response as the answer as well.
SessionFactory.java (excerpt)
private SqlSessionFactory createLegacySessionFactory(DataSource dataSource) throws Exception
{
    Configuration configuration = prepareConfiguration(dataSource);
    return new SqlSessionFactoryBuilder().build(configuration);
}

//javax.sql.DataSource
private Configuration prepareConfiguration(DataSource dataSource)
{
    //classes from package org.apache.ibatis
    TransactionFactory transactionFactory = new JdbcTransactionFactory();
    Environment environment = new Environment("development", transactionFactory, dataSource);
    Configuration configuration = new Configuration(environment);
    addSettings(configuration);
    addTypeAliases(configuration);
    addTypeHandlers(configuration);
    configuration.addMapper(PermissionMapper.class);
    addMapperXMLs(configuration); //just add all the XML mappers
    return configuration;
}

public SqlSession openSession()
{
    //Initialization of the factory is above
    return new ForceCommitSqlSession(factory.openSession());
}
ForceCommitSqlSession.java (excerpt)
/**
 * ForceCommitSqlSession is a wrapper around the MyBatis {@link SqlSession}.
 * <p>
 * Its purpose is to force commit/rollback during standard commit/rollback operations. The default implementation (according to the javadoc)
 * does not commit/rollback if there were no changes to the database - this can lead to problems when operations are executed outside the
 * MyBatis session (e.g. via {@link #getConnection()}).
 */
public class ForceCommitSqlSession implements SqlSession
{
    private final SqlSession session;

    /**
     * Force the commit all the time (despite the "generic contract")
     */
    @Override
    public void commit()
    {
        session.commit(true);
    }

    /**
     * Force the rollback all the time (despite the "generic contract")
     */
    @Override
    public void rollback()
    {
        session.rollback(true);
    }

    @Override
    public int insert(String statement)
    {
        return session.insert(statement);
    }

    ....
}
OracleFileStore.java (excerpt)
public int create(SqlSession session, FileEntity fileEntity) throws Exception
{
    //the mybatis xml is a simple insert SQL query
    return session.insert(STATEMENT_CREATE, fileEntity);
}
Is it possible that the MQ call could be faster than the database commit?
Once the database commit is done, the changes are in the database, and the creation of the task in the queue only happens after that. The main thing to verify is that the commit really happens synchronously when you invoke commit on the session. From the configuration you have provided so far it seems OK, unless there is some mangling with the Connection itself - I can imagine, for example, that there is some wrapper over the native Connection. I would check in a debugger that the commit call actually results in a call to Connection.commit on the implementation from the Oracle JDBC driver. It is even better to check the logs on the DB side.
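If attaching a debugger is inconvenient, another way to confirm this is to wrap the DataSource that is handed to prepareConfiguration so that every commit()/rollback() reaching the JDBC driver is logged. This is only a diagnostic sketch under the assumption that you can swap the DataSource; the class name CommitLoggingDataSource is invented for illustration:

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import javax.sql.DataSource;

/**
 * Diagnostic-only wrapper: every Connection handed out by the wrapped DataSource
 * logs commit()/rollback() before delegating to the real JDBC driver.
 */
public final class CommitLoggingDataSource {

    public static DataSource wrap(DataSource real) {
        return (DataSource) Proxy.newProxyInstance(
                DataSource.class.getClassLoader(),
                new Class<?>[]{DataSource.class},
                (proxy, method, args) -> {
                    Object result = invoke(real, method, args);
                    // wrap every connection handed out by the pool/driver
                    return result instanceof Connection ? wrapConnection((Connection) result) : result;
                });
    }

    private static Connection wrapConnection(Connection real) {
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[]{Connection.class},
                (proxy, method, args) -> {
                    if ("commit".equals(method.getName()) || "rollback".equals(method.getName())) {
                        // proves (and timestamps) that the call reached the real connection
                        System.out.println(method.getName() + " on " + real.getClass().getName()
                                + " at " + System.currentTimeMillis());
                    }
                    return invoke(real, method, args);
                });
    }

    private static Object invoke(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the original SQLException etc.
        }
    }

    private CommitLoggingDataSource() {
    }
}

If a commit shows up in this log before Service 2 reports the missing file, the ordering on the producer side is fine and the problem lies elsewhere (reader isolation, replication, etc.).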
Does the connection need to be closed to be sure the commit was performed? I always thought that once I commit, the data is in the database regardless of whether my transaction ended or not.
You are correct. There is no need to close a connection that obeys the JDBC specification (the native JDBC connection does). Of course, you can always create some wrapper that does not obey the Connection API and does some magic (like delaying the commit until the connection is closed).
Can the problem be MyBatis? I've read about some problems regarding MyBatis transactions/sessions, but none of them seems similar to my problem.
I would say it is unlikely. You are using JdbcTransactionFactory, which does commit to the database. You need to trace what happens on commit to be sure.
Have you checked that the problem is not on the reader side? For example, it may use a long-running transaction with SERIALIZABLE isolation level; in that case it would not see newer changes in the database.
In Postgres, if replication is used and replicas serve read queries, the reader may see outdated data even if the commit completed successfully on the master. I'm not that familiar with Oracle, but it seems that if replication is used you may see the same issue:
A table snapshot is a transaction-consistent reflection of its master data as that data existed at a specific point in time. To keep a snapshot's data relatively current with the data of its master, Oracle must periodically refresh the snapshot
I would check the setup of the DB to find out whether this is the case. If replication is used, you need to change your approach accordingly.
Related
I have a Java Spring Boot application reading and writing data to a local Oracle 19c database.
I have the following CommandLineRunner:
@Override
public void run(String... args) {
    final EntityManager em = entityManagerFactory.createEntityManager();
    final EntityTransaction transaction = em.getTransaction();
    transaction.begin();
    em.persist(customer());
    //COMMENT1 em.flush();
    /*COMMENT2
    Query q = em.createQuery("from " + Customer.class.getName() + " c");
    @SuppressWarnings("unchecked")
    final Iterator<Object> iterator = (Iterator<Object>) q.getResultList().iterator();
    while (iterator.hasNext()) {
        Object o = iterator.next();
        final Customer c = (Customer) o;
        log.info(c.getName());
    }
    */
    transaction.rollback();
}
When I run this code, using a packet sniffer to monitor TCP traffic between the application and database, I see what I expect: nothing particularly interesting in the conversation, as the em.persist(customer()) will not be flushed.
When I include the code in COMMENT1, then I'm surprised to find that the conversation looks the same - there is nothing interesting after the connection handshake.
When I include the code in COMMENT2, however, then I get a more complete TCP conversation. Now, the captured packets show that the write operation was indeed flushed to the database, and I can also see evidence of the read operation to list all entities following it.
Why is it that the TCP conversation does not reflect the explicit flush() when only COMMENT1 is removed? Why do I need to include COMMENT2 to see an insert into customer... statement captured in the TCP connection?
A call to flush() synchronizes the changes in your persistence context with the database, but it does not commit the transaction. Consider it an optimization to avoid unnecessary DB writes on each flush.
When you un-comment the second block, you see the flush getting executed for sure. This happens because the EntityManager ensures your select query sees the results in their latest state from the DB. It therefore pushes the flushed changes to the database (along with any other pending changes, if any).
em.persist(customer());
persist does not directly insert the object into the database: it just registers it as new in the persistence context (transaction).
em.flush();
flush writes the pending changes to the database but does not commit the transaction. An SQL statement is only issued if there is actually a change to synchronize (insert/update/delete).
transaction.rollback() or transaction.commit() will actually roll back or commit the transaction.
All of these scenarios depend on the flush mode, and the behaviour is vendor dependent. Assuming Hibernate, FlushMode is most probably set to AUTO in your application, hence the result you see in the second scenario.
AUTO Mode : The Session is sometimes flushed before query execution in order to ensure
that queries never return stale state
https://docs.jboss.org/hibernate/orm/3.5/javadocs/org/hibernate/FlushMode.html
In Spring Boot I think you can set it as:
spring.jpa.properties.org.hibernate.flushMode=AUTO
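For completeness, the flush mode can also be set programmatically on the EntityManager. A minimal sketch, reusing the entityManagerFactory and customer() helpers from the question (the method name runWithAutoFlush is invented):

// assumes javax.persistence.{EntityManager, EntityTransaction, FlushModeType} are imported
void runWithAutoFlush() {
    EntityManager em = entityManagerFactory.createEntityManager();
    EntityTransaction tx = em.getTransaction();
    tx.begin();

    em.setFlushMode(FlushModeType.AUTO); // AUTO is usually the default anyway
    em.persist(customer());

    // under AUTO the query below forces a flush first, so the INSERT shows up
    // in the TCP capture even though the transaction is rolled back afterwards
    em.createQuery("from " + Customer.class.getName() + " c").getResultList();

    tx.rollback(); // same ending as the original example
    em.close();
}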
I am trying to perform batch inserts with data that is currently being inserted into the DB one statement per transaction. The transaction code looks similar to the snippet below. Currently, the addHolding() method is called for each quote that comes in from an external feed, and each of these quote updates happens about 150 times per second.
public class HoldingServiceImpl {

    @Autowired
    private HoldingDAO holdingDao;

    @Transactional(propagation = Propagation.REQUIRES_NEW, rollbackFor = Exception.class)
    public void addHolding(Quote quote) {
        Holding holding = transformQuote(quote);
        holdingDao.addHolding(holding);
    }
}
And DAO is getting current session from Hibernate SessionFactory and calling save on object.
public class HoldingDAOImpl {

    @Autowired
    private SessionFactory sessionFactory;

    public void addHolding(Holding holding) {
        sessionFactory.getCurrentSession().save(holding);
    }
}
I have looked at the Hibernate batching documentation, but it is not clear from the document how I would organize the code for batch inserting in this case, since I don't have the full list of data at hand, but rather am waiting for it to stream in.
Does merely setting the Hibernate batching properties in the properties file (e.g. hibernate.jdbc.batch_size=20) "magically" batch-insert these? Or will I need to, say, capture each quote update in a synchronized list, and then insert the list's contents and clear the list when the batch size limit is reached?
Also, the whole purpose of implementing batching is to see if performance improves. If there is a better way to handle inserts in this scenario, let me know.
Setting the property hibernate.jdbc.batch_size=20 is an indication for Hibernate to flush the objects in batches of 20. In your case Hibernate automatically flushes the session after 20 records have been saved.
When you call session.save(), the insert is only queued in the in-memory Hibernate cache. Only once flush is called does Hibernate synchronize these changes with the database. Hence setting the Hibernate batch size is enough to do batch inserts. Fine-tune the batch size according to your needs.
Also make sure your transactions are handled properly: committing a transaction also forces Hibernate to flush the session.
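If you do end up buffering the streamed quotes (the synchronized-list idea from the question) and saving them in one transaction, the usual pattern from the Hibernate batch-processing documentation is to flush and clear the session every batch_size records. A sketch only; the addHoldings method and BATCH_SIZE constant are illustrative additions, reusing transformQuote from the question:

// org.hibernate.Session; BATCH_SIZE should match hibernate.jdbc.batch_size
private static final int BATCH_SIZE = 20;

@Transactional(rollbackFor = Exception.class)
public void addHoldings(List<Quote> quotes) {
    Session session = sessionFactory.getCurrentSession();
    int i = 0;
    for (Quote quote : quotes) {
        session.save(transformQuote(quote));
        if (++i % BATCH_SIZE == 0) {
            // send the queued inserts to the DB as one JDBC batch and detach them
            session.flush();
            session.clear();
        }
    }
}

Note that batching only helps when several inserts happen inside the same session/transaction; with one insert per REQUIRES_NEW transaction, as in the current code, there is nothing to batch.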
I have to write some methods that change values in the database and perform some operations on the file system.
So I have to execute this sequence of steps:
Set the boolean Updating field to true in the database. It is used to prevent access to file system and database information linked with this value (for example a fleet of cars)
Make some operations on the database, for example changing the date, name, value or other fields. These changes affect several database tables.
Make changes to the file system and database
Set the boolean Updating back to false
As you can imagine, I have to manage errors and start a rollback procedure to restore the database and the file system.
I have some doubts about how to write my method. I have:
The entity
The repository interface that extends JpaRepository and uses query creation from method names, plus @Query methods annotated with @Transactional if they write to the database (otherwise I received an error)
The service interface
The service implementation that contains all the methods that make simple changes to the database. This class is annotated with @Transactional
From the other classes I call the service methods to work with the database, but if I call several of these methods, each value is written to the database individually, so it isn't possible to roll everything back - or am I wrong?
Step 1 has to be written immediately to the database, whereas the other changes should use @Transactional semantics - but is just adding @Transactional to my method enough? For file system rollback I create a backup of all subfolders and restore them in case of error.
For example:
@Transactional(rollbackFor = FileSystemException.class)
private void changeDisplacement(int idApplication, int idDisplacement) {
    applicationServices.setUpdating(true); //this has to be written immediately to the database so that the other methods stop using this application
    Application application = applicationServices.getId(idApplication);
    application.setDisplacement(displacementServices.getId(idDisplacement));
    //OTHER OPERATIONS ON DIFFERENT TABLES
    //OPERATIONS ON THE FILE SYSTEM, CATCHING ALL EXCEPTIONS WITH TRY-CATCH; IN THE CATCH, RESTORE THE FILE SYSTEM AND THROW FileSystemException TO START THE DATABASE ROLLBACK
    //In the finally clause, call applicationServices.setUpdating(false)
}
Can it work with this logic, or is @Transactional wrong here?
Thanks
@Transactional is OK here. The only thing is that you need to set the propagation of applicationServices.setUpdating to REQUIRES_NEW so that it gets committed individually:

public class ApplicationServices {

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void setUpdating(boolean b) {
        // update DB here
    }
}
In the case of exceptions, it will still update the DB as long as the call to setUpdating(false) is in the finally block.
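Putting the two pieces together, the calling method could be structured roughly like this. It is only a sketch: doFileSystemWork(), restoreFileSystemBackup() and the FileSystemException constructor stand in for the operations described in the question:

@Transactional(rollbackFor = FileSystemException.class)
public void changeDisplacement(int idApplication, int idDisplacement) throws FileSystemException {
    applicationServices.setUpdating(true); // REQUIRES_NEW: committed right away
    try {
        Application application = applicationServices.getId(idApplication);
        application.setDisplacement(displacementServices.getId(idDisplacement));
        // ...other operations on different tables...

        doFileSystemWork(); // the file system changes
    } catch (Exception e) {
        restoreFileSystemBackup(); // put the backed-up subfolders back
        throw new FileSystemException(e.getMessage()); // triggers rollback of the DB changes above
    } finally {
        // REQUIRES_NEW again, so the flag is reset whether we commit or roll back
        applicationServices.setUpdating(false);
    }
}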
There are multiple questions here and some of them are hard to grasp; here is a bit of input. When you have this:

@Transactional(rollbackFor = FileSystemException.class)
private void changeDisplacement(int idApplication, int idDisplacement) {
    applicationServices.setUpdating(true);

That flag will hit the database only when the @Transactional method finishes. The change stays in the Hibernate context until the end of the @Transactional method.
So while you execute changeDisplacement, if someone else comes and reads that flag, they will see false (because you have not written it to the DB just yet). You could read it via READ_UNCOMMITTED, but it's up to your application whether you allow that.
You could have a method with REQUIRES_NEW that sets the flag to true, and in case of a revert, update the flag back.
Generally, keeping both the DB and the file system in sync is not easy. The way I have done it before (there might be better options) is to register events (once the DB change was successfully made) and only then write to the file system.
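In Spring, one way to implement that "write to the file system only after the DB commit" idea is a transaction synchronization. A minimal sketch, to be placed inside the @Transactional method after the DB changes; writeFilesFor is a hypothetical helper (on Spring versions before 5.3, extend TransactionSynchronizationAdapter instead of implementing the interface directly):

// org.springframework.transaction.support.*
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
    @Override
    public void afterCommit() {
        // runs only if the surrounding transaction committed successfully
        writeFilesFor(idApplication); // hypothetical file system helper
    }
});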
I have to implement a Neo4j server plugin that reacts to changes to the database and gets information about those changes. I need to get all the data that has been added, changed and deleted in a transaction. I use a TransactionEventHandler registered with the database. For performance reasons I have to use the afterCommit callback, which is called after the changes to the database have been made. This way the transaction is not held back by the plugin.
Now inside this callback I do something similar to this:
public void afterCommit(TransactionData data, Void arg1) {
    for (Node n : data.createdNodes()) {
        String firstkey = n.getPropertyKeys().iterator().next();
    }
}
But getPropertyKeys throws an exception because the transaction has already been committed. I don't understand why this is a problem: I don't want to change anything in the transaction, I just want the properties of the node that has been changed. Is there some way to work around this? What is the reason for the exception?
The Exception:
java.lang.IllegalStateException: This transaction has already been completed.
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.assertTransactionOpen(KernelTransactionImplementation.java:376)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.acquireStatement(KernelTransactionImplementation.java:261)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.acquireStatement(KernelTransactionImplementation.java:80)
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.instance(ThreadToStatementContextBridge.java:64)
at org.neo4j.kernel.InternalAbstractGraphDatabase$8.statement(InternalAbstractGraphDatabase.java:785)
at org.neo4j.kernel.impl.core.NodeProxy.getPropertyKeys(NodeProxy.java:358)
at de.example.neo4jVersionControl.ChangeEventListener.afterCommit(ChangeEventListener.java:41)
In afterCommit the transaction has already been committed (hence the name). To access properties of a node you need a transactional context - remember that every operation (even a read-only one) requires this.
The recommended way for implementations of TransactionEventHandler is to rely on TransactionData only. TransactionData.assignedNodeProperties() will return the properties of the newly created nodes as well.
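A sketch of what the callback could look like when it relies on TransactionData only (how the collected keys/values are used afterwards is up to you):

public void afterCommit(TransactionData data, Void state) {
    // PropertyEntry comes from org.neo4j.graphdb.event
    for (PropertyEntry<Node> entry : data.assignedNodeProperties()) {
        // key() and value() were captured while the transaction was still open,
        // so reading them here needs no transactional context; just avoid
        // calling methods on the Node itself
        String key = entry.key();
        Object value = entry.value();
        // ...collect/forward key and value as needed...
    }
}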
Maybe somebody can help me with a transactional issue in Spring (3.1) / PostgreSQL (8.4.11).
My transactional service is as follows:
@Transactional(isolation = Isolation.SERIALIZABLE, readOnly = false)
@Override
public Foo insertObject(Bar bar) {
    // these methods are just examples
    int x = firstDao.getMaxNumberOfAllowedObjects(bar);
    int y = secondDao.getNumerOfExistingObjects(bar);
    // comparison
    if (x - y > 0) {
        secondDao.insertNewObject(...);
    }
    ....
}
The Spring webapp configuration contains:
@Configuration
@EnableTransactionManagement
public class ....{

    @Bean
    public DataSource dataSource() {
        org.apache.tomcat.jdbc.pool.DataSource ds = new DataSource();
        ....configuration details
        return ds;
    }

    @Bean
    public DataSourceTransactionManager txManager() {
        return new DataSourceTransactionManager(dataSource());
    }
}
Let us say a request "x" and a request "y" execute concurrently and both arrive at the comment "comparison" (method insertObject). Then both of them are allowed to insert a new object and their transactions are committed.
Why am I not getting a RollbackException? As far as I know, that is what the SERIALIZABLE isolation level is for. Coming back to the previous scenario: if "x" manages to insert a new object and commits its transaction, then "y"'s transaction should not be allowed to commit, since there is a new object it did not read.
That is, if "y" could re-read the value of secondDao.getNumerOfExistingObjects(bar), it would realize that there is one more object. A phantom read?
The transaction configuration seems to be working fine:
For each request I can see the same connection for firstDao and secondDao
A transaction is created every time insertObject is invoked
Both first and second DAOs are as follows:
@Autowired
public void setDataSource(DataSource dataSource) {
    this.jdbcTemplate = new JdbcTemplate(dataSource);
}

@Override
public Object daoMethod(Object param) {
    //uses jdbcTemplate
}
I am sure I am missing something. Any idea?
Thanks for your time,
Javier
TL;DR: Detection of serializability conflicts improved dramatically in Pg 9.1, so upgrade.
It's tricky to figure out from your description what the actual SQL is and why you expect to get a rollback. It looks like you've seriously misunderstood serializable isolation, perhaps thinking it perfectly tests all predicates, which it doesn't, especially not in Pg 8.4.
SERIALIZABLE doesn't perfectly guarantee that the transactions execute as if they were run in series - doing so would be prohibitively expensive from a performance point of view, if it were possible at all. It only provides limited checking. Exactly what is checked and how varies from database to database and version to version, so you need to read the docs for your version of your database.
Anomalies are possible, where two transactions executing in SERIALIZABLE mode produce a different result to if those transactions truly executed in series.
Read the documentation on transaction isolation in Pg to learn more. Note that SERIALIZABLE changed behaviour dramatically in Pg 9.1, so make sure to read the version of the manual appropriate for your Pg version. Here's the 8.4 version. In particular read 13.2.2.1. Serializable Isolation versus True Serializability. Now compare that to the greatly improved predicate locking based serialization support described in the Pg 9.1 docs.
It looks like you're trying to perform logic something like this pseudocode:
count = query("SELECT count(*) FROM the_table");
if (count < threshold):
query("INSERT INTO the_table (...) VALUES (...)");
If so, that's not going to work in Pg 8.4 when executed concurrently - it's pretty much the same as the anomaly example used in the documentation linked above. Amazingly it actually works on Pg 9.1; I didn't expect even 9.1's predicate locking to catch use of aggregates.
You write that:
Coming back to the previous scenario, if x manages to insert a new
object and commits its transaction, then "y"'s transaction should not
be allowed to commit since there is a new object he did not read.
but 8.4 won't detect that the two transactions are interdependent, something you can trivially prove by using two psql sessions to test it. It's only with the true-serializability stuff introduced in 9.1 that this will work - and frankly, I was surprised it works in 9.1.
If you want to do something like enforce a maximum row count in Pg 8.4, you need to LOCK the table to prevent concurrent INSERTs, doing the locking either manually or via a trigger function. Doing it in a trigger will inherently require a lock promotion and thus will frequently deadlock, but will successfully do the job. It's better done in the application, where you can issue the LOCK TABLE my_table IN EXCLUSIVE MODE before even SELECTing from the table, so it already has the highest lock mode it will need on the table and thus shouldn't need deadlock-prone lock promotion. The EXCLUSIVE lock mode is appropriate because it permits SELECTs but nothing else.
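Translated to the Spring/JdbcTemplate setup from the question, the application-side locking could look roughly like this (a sketch only; the table name, column and threshold handling are invented):

@Transactional
public void insertIfBelowThreshold(int threshold) {
    // take the lock up front so no concurrent INSERT can slip in between
    // the count and our own insert; plain SELECTs by other sessions still work
    jdbcTemplate.execute("LOCK TABLE the_table IN EXCLUSIVE MODE");

    Integer count = jdbcTemplate.queryForObject(
            "SELECT count(*) FROM the_table", Integer.class);

    if (count != null && count < threshold) {
        jdbcTemplate.update("INSERT INTO the_table (payload) VALUES (?)", "some value");
    }
}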
Here's how to test it in two psql sessions:
SESSION 1                                   SESSION 2

create table ser_test( x text );

BEGIN TRANSACTION
ISOLATION LEVEL SERIALIZABLE;
                                            BEGIN TRANSACTION
                                            ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM ser_test ;
                                            SELECT count(*) FROM ser_test ;
INSERT INTO ser_test(x) VALUES ('bob');
                                            INSERT INTO ser_test(x) VALUES ('bob');
COMMIT;
                                            COMMIT;
When run on Pg 9.1, the first COMMIT succeeds, then the second COMMIT fails with:
regress=# COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
but when run on 8.4 both commits succeed, because 8.4 didn't have all the predicate locking code for serializability that was added in 9.1.