Async auditing with JaVers (Java)

I need to audit changes to some entities in our application and am thinking of using JaVers. I like JaVers' support for interrogating the audit data. Hibernate Envers looks good too, but it stores the audit data in the same DB.
Here are my requirements:
async logging - for minimal performance impact
storing audit data in a different DB - also for performance reasons
As far as I can see, JaVers is not designed for the above out of the box, but it seems possible to adapt it. Here's how:
JaVers actually allows audit data to be stored in a different DB: you can provide a connection to any database. It may not be the intended usage, but it works. Code below (note the connectionProvider, which can hand out a connection to any DB):
import java.sql.Connection;
import java.sql.DriverManager;

import org.javers.repository.sql.ConnectionProvider;
import org.javers.repository.sql.DialectName;
import org.javers.repository.sql.JaversSqlRepository;
import org.javers.repository.sql.SqlRepositoryBuilder;

final Connection dbConnection =
        DriverManager.getConnection("jdbc:mysql://localhost:3306/javers", "root", "root");

ConnectionProvider connectionProvider = new ConnectionProvider() {
    @Override
    public Connection getConnection() {
        // suitable only for testing!
        return dbConnection;
    }
};

JaversSqlRepository sqlRepository = SqlRepositoryBuilder
        .sqlRepository()
        .withConnectionProvider(connectionProvider)
        .withDialect(DialectName.MYSQL)
        .build();
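For completeness, the JaversSqlRepository built above would then be plugged into a Javers instance; registerJaversRepository is the standard JaversBuilder API for this:

Javers javers = JaversBuilder.javers()
        .registerJaversRepository(sqlRepository)
        .build();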
The async part can be achieved by moving the execution of the JaVers commit into a thread/executor. The challenge is that if the execution takes too long, the object might change before it's logged. There are two solutions I can think of here:
We could create a snapshot of the object (e.g. serialize it to JSON or the like) and pass that to a thread to log it (sketched after this list).
We could provide our own implementation of the JaVers Repository which processes the differences in the current thread, and then passes the Snapshot objects to be persisted in another thread. This way we'd only read from the DB in the application thread, and do the writing (which is generally more costly performance-wise) in the auditing thread.
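A minimal sketch of the first approach, assuming the audited object can be safely deep-copied via a JSON round-trip before being handed to the executor (AuditedEntity and the Jackson-based copy are illustrative, not JaVers API; in real code the Javers instance would be built with the SQL repository from above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.javers.core.Javers;
import org.javers.core.JaversBuilder;

import com.fasterxml.jackson.databind.ObjectMapper;

public class AsyncAuditor {

    private final Javers javers = JaversBuilder.javers().build();
    private final ExecutorService auditExecutor = Executors.newSingleThreadExecutor();
    private final ObjectMapper mapper = new ObjectMapper();

    // Freeze the object's state in the caller's thread, then run the
    // (potentially slow) JaVers commit on the audit executor.
    public void auditAsync(String author, AuditedEntity entity) throws Exception {
        // cheap deep copy via JSON round-trip, done synchronously
        AuditedEntity frozen = mapper.readValue(
                mapper.writeValueAsString(entity), AuditedEntity.class);
        auditExecutor.submit(() -> javers.commit(author, frozen));
    }
}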
QUESTIONS:
Am I missing anything here? Could this work?
Does JaVers have support for creating a snapshot of an object which can then be moved to another thread? It does this internally somewhere, so maybe it's something we could use.
JUST FYI: Not relevant for the question, but here are some other challenges I can think of and how I'm planning to solve them:
Audits are not done in the same transaction, because a failed transaction would make audit rollback complex; so we need to audit only objects that were successfully committed. I intend to do that using a Hibernate Interceptor, listening to afterTransactionCompletion and committing only the objects updated by that transaction (see the sketch after this list).
In the case of lazy-loaded objects, if we try to access them once the transaction has finished, the lazy-loaded properties might no longer be accessible (as the session might be closed too). I don't know how to fix this yet, but it might not be an issue, as I think we load most properties eagerly.
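A minimal sketch of that interceptor idea, assuming the Hibernate 4 Interceptor API (wasCommitted() was removed in Hibernate 5, where you'd check tx.getStatus() instead); AuditService is a hypothetical facade over the async JaVers commit:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.hibernate.EmptyInterceptor;
import org.hibernate.Transaction;
import org.hibernate.type.Type;

public class AuditInterceptor extends EmptyInterceptor {

    private final AuditService auditService;          // hypothetical async audit facade
    private final List<Object> dirtyEntities = new ArrayList<>();

    public AuditInterceptor(AuditService auditService) {
        this.auditService = auditService;
    }

    @Override
    public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
                                Object[] previousState, String[] propertyNames, Type[] types) {
        dirtyEntities.add(entity);   // remember what this transaction changed
        return false;                // we did not modify the entity's state
    }

    @Override
    public void afterTransactionCompletion(Transaction tx) {
        if (tx.wasCommitted()) {
            // hand only successfully committed objects over to the audit thread
            dirtyEntities.forEach(auditService::auditAsync);
        }
        dirtyEntities.clear();
    }
}

Note that this only works with a session-scoped interceptor, since the dirty-entity list is per transaction.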

Interesting question.
First, a disclaimer: all JaVers core modules are designed to decouple audit data from application data. As you mentioned, the user provides a ConnectionProvider to be used by JaVers. It can point to any database you want.
What isn't designed for multiple databases are the Spring integration modules for SQL, i.e. javers-spring-jpa and javers-spring-boot-starter-sql. They cover only the most common scenario: the same DB for the application and for JaVers.
You are right about the lack of an async commit. Fortunately, it can be implemented in JaversCore alone, without changing the Repositories.
The API could be:
CompletableFuture<Commit> javers.commitAsync(..., Executor);
First, JaVers would take a snapshot of the user's objects; that's fast, so it can be done in the current thread.
Then, the DB reads (loading the latest snapshots) and DB writes (inserting new snapshots) can be done asynchronously, i.e. submitted to the given Executor.
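Sketched roughly, such a method could wrap the existing synchronous commit. This is only an illustration of the proposed API, not an existing JaVers method; a real implementation inside JaversCore would snapshot in the caller's thread and submit only the repository reads/writes to the executor:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.javers.core.Javers;
import org.javers.core.commit.Commit;

public final class CommitAsyncSketch {

    public static CompletableFuture<Commit> commitAsync(
            Javers javers, String author, Object currentVersion, Executor executor) {
        // naive version: the whole commit (diff + DB I/O) runs on the executor
        return CompletableFuture.supplyAsync(
                () -> javers.commit(author, currentVersion), executor);
    }
}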
As you mentioned, this requires a new approach to DB transactions. We plan to implement the Commit Withdrawal feature, so the app would be able to withdraw a JaVers commit after a main DB rollback. See https://github.com/javers/javers/issues/588

Related

Cache options so a @PostLoad-annotated method doesn't get called every time Hibernate is asked to retrieve an object

I have a class annotated with the JPA @Entity annotation, so objects of this class are persisted in the database and managed using Hibernate ORM. In my class constructor, a connection to an MQTT broker is created, so each object establishes a TCP connection during initialization.
When an object's data is fetched from the database, this constructor cannot be used by the ORM, as the ORM uses the default no-argument constructor, so I put the code that establishes the connection in a @PostLoad-annotated method.
The problem is, every time the web application page is refreshed, the ORM is asked to get the object, and the @PostLoad method is executed, so the TCP connection is established again... but I want the connection to be established only the first time the object is fetched from the database, not every time the page is refreshed.
So the solution would be an ORM with an in-memory object cache. This way, the first time the object is loaded from the database the @PostLoad method is called, but the next time the ORM is asked to retrieve the object, it comes from the cache.
I don't know if this is possible with Hibernate. I have been playing with cache options and the @Cacheable annotation, but it seems that the @PostLoad method is called every time I use the findById method of the Repository class, no matter which cache options I set. So I guess the Hibernate cache caches table rows, not objects in memory.
You could use an entity manager with an extended persistence context which spans multiple transactions, though I have no idea how, or whether, Spring Data supports this. This way, the entity would not be reloaded from the database but would remain part of this extended persistence context. Note, though, that this comes with other issues.
Usually, such expensive operations are simply not done in entities. You could move the logic out of the class, or do some kind of connection pooling/caching to avoid reconnects. I don't know why you need a dedicated connection, but connecting to message brokers is usually done differently: such connections are typically pooled by some context object of the library, or maybe Spring offers an integration with pooling options. In Java/Jakarta EE, this is usually done through resource adapters. I bet there is a JMS implementation for MQTT that you could use, probably also with Spring. AFAIK, in Spring Data JPA you usually fire a domain/application event and react to it somewhere else; in that listener you could publish a message to a topic/queue through JMS or the native MQTT library (sketched below).
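A minimal sketch of that event-based idea with Spring application events; DeviceLoadedEvent and MqttConnectionPool are made-up names, and the pool is assumed to cache connections so repeated loads don't reconnect:

import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// plain event carrying only the data needed to connect (hypothetical)
final class DeviceLoadedEvent {
    final String deviceId;
    final String brokerUrl;

    DeviceLoadedEvent(String deviceId, String brokerUrl) {
        this.deviceId = deviceId;
        this.brokerUrl = brokerUrl;
    }
}

@Component
class MqttConnectionListener {

    private final MqttConnectionPool pool; // hypothetical pool that caches open connections

    MqttConnectionListener(MqttConnectionPool pool) {
        this.pool = pool;
    }

    @EventListener
    void onDeviceLoaded(DeviceLoadedEvent event) {
        // connects only if no pooled connection exists yet, so a page refresh is a no-op
        pool.getOrConnect(event.deviceId, event.brokerUrl);
    }
}

The service that loads the entity would call applicationEventPublisher.publishEvent(new DeviceLoadedEvent(...)) instead of connecting inside @PostLoad.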

How to coordinate J2EE and Java EE database access?

We have a somewhat huge application which was started a decade ago and is still under active development. So some parts are still in the J2EE 1.4 architecture, while others use Java EE 5/6.
While testing some new code, I realized that I had data inconsistency between information coming in through old and new code parts, where the old one uses the Hibernate Session directly and the new one an injected EntityManager. This led to the problem that one part couldn't see new data from the other and thus also created a database record, resulting in a primary key constraint violation.
It is planned to migrate the old code completely to get rid of J2EE, but in the meantime: what can I do to coordinate database access between the two parts? And shouldn't both ways come together at some point in the Hibernate layer within the application server, regardless of whether access happens via JPA or directly?
You can mix both the Hibernate Session and the EntityManager in the same application without any problem. The EntityManagerImpl simply delegates calls to a private SessionImpl instance.
What you describe is a transaction configuration anomaly. Every database transaction runs in isolation (unless you use READ_UNCOMMITTED, which I guess is not the case), but once you commit it, the changes are available to any other transaction or connection. So once a transaction is committed, you should see all changes in any other Hibernate Session, JDBC connection, or even your database UI tool.
You said there was a primary key conflict. This can't happen if you use the Hibernate identity or sequence generators. You can run into problems, though, if an external connection tries to insert records into a table for which Hibernate uses the old hi/lo identifier generator.
This problem can also occur if there is a master/master replication anomaly. If you have multiple nodes and replication is not strictly consistent, you can end up with primary key constraint violations.
Update
Solution 1:
When both the new and the old code try to insert the same entity, you could have select-then-insert logic running in a SERIALIZABLE transaction. The SERIALIZABLE transaction acquires the appropriate locks on your behalf, so you can still have a default READ_COMMITTED isolation level while only the problematic service methods are marked as SERIALIZABLE.
So both the old code and the new code run a select to check whether a row satisfying the constraint already exists, and insert it only if nothing is found. The SERIALIZABLE isolation level prevents phantom reads, so it should prevent constraint violations (sketch below).
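A minimal sketch with Spring, assuming a Spring Data repository with a findByName query method (entity and repository names are made up):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class LookupValueService {

    private final LookupValueRepository repository; // hypothetical Spring Data repository

    public LookupValueService(LookupValueRepository repository) {
        this.repository = repository;
    }

    // only this method runs SERIALIZABLE; the rest of the app stays READ_COMMITTED
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public LookupValue findOrCreate(String name) {
        return repository.findByName(name)                                 // select ...
                .orElseGet(() -> repository.save(new LookupValue(name)));  // ... then insert
    }
}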
Solution 2:
If you are open to delegating this task to JDBC, you might also investigate the MERGE SQL statement, if your current database supports it. Basically, this is an upsert operation issuing an update or an insert behind the scenes. This command is much more attractive since you can run it even under READ_COMMITTED. The only drawback is that you can't use Hibernate for it, and only some databases support it.
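A sketch of the JDBC route, using Oracle-style MERGE syntax as an example (table and column names are made up, and the exact syntax depends on your database; some use INSERT ... ON CONFLICT instead):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LookupValueUpsert {

    private static final String MERGE_SQL =
            "MERGE INTO lookup_value t "
            + "USING (SELECT ? AS name FROM dual) s "
            + "ON (t.name = s.name) "
            + "WHEN NOT MATCHED THEN INSERT (name) VALUES (s.name)";

    public void upsert(Connection connection, String name) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(MERGE_SQL)) {
            ps.setString(1, name);
            ps.executeUpdate(); // a single statement, safe under READ_COMMITTED
        }
    }
}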
If you instantiate a SessionFactory for the old code and an EntityManagerFactory for the new code separately, that can lead to different values in the first-level cache. If, during a single HTTP request, you change a value in old code but do not immediately commit, the value will be changed in the session cache but will not be available to new code until it is committed. Independently of any transaction or database locking that would protect persistent values, this mix of two different Hibernate sessions can produce weird results for in-memory values.
I assume that the injected EntityManager still uses Hibernate. IMHO the most robust solution is to get the EntityManagerFactory for the persistence unit and cast it to a Hibernate EntityManagerFactoryImpl. Then you can directly access the underlying SessionFactory:
SessionFactory sessionFactory = entityManagerFactory.getSessionFactory();
You can then safely use this SessionFactory in your old code, because now it is unique in your application and shared between old and new code.
You still have to deal with the problem of session creation/closing and transaction management. I suppose it is already implemented in the old code. Without knowing more, I think you should port it to JPA, because I am pretty sure that if an EntityManager exists, sessionFactory.getCurrentSession() will return its underlying Session, but I cannot guarantee the opposite.
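If you are on JPA 2.1 or later, you can avoid the implementation-specific cast altogether with unwrap(); a sketch (the persistence unit name is made up):

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

import org.hibernate.SessionFactory;

public class SharedSessionFactoryHolder {

    private static final EntityManagerFactory EMF =
            Persistence.createEntityManagerFactory("myUnit"); // hypothetical unit name

    // the portable JPA 2.1 way to reach the native Hibernate API
    private static final SessionFactory SESSION_FACTORY =
            EMF.unwrap(SessionFactory.class);

    public static SessionFactory sessionFactory() {
        return SESSION_FACTORY;
    }
}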
I've run into a similar problem when I had a list of enumerated lookup values, where two pieces of code would check for the existence of a given value in the list, and if it didn't exist, the code would create a new entry in the database. When both came across the same non-existent value, they'd both try to create a new one, and one of them would have its transaction rolled back (throwing away a bunch of other work done in that transaction).
Our solution was to create those lookup values in a separate transaction that committed immediately; if that transaction succeeded, we knew we could use that object, and if it failed, we knew we simply needed to perform a get to retrieve the one saved by another process. Once we had a lookup object that we knew was safe to use in our session, we could happily do the rest of the DB modifications without risking the transaction being rolled back (see the sketch below).
It's hard to know from your description whether your data model would lend itself to a similar approach, where you'd commit at least the initial version of the entity right away, and then, once you're sure you're working with a persistent object, do the rest of the DB modifications you need. But if you can find a way to make that work, it avoids the need to share the Session between the different pieces of code (and would work even if the old and new code were running in separate JVMs).
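A sketch of that pattern with Spring, assuming a Spring Data repository (all names are made up); the lookup is created in its own immediately-committing transaction, and losing the race simply means reading back the winner's row:

import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
class LookupCreator {

    private final LookupValueRepository repository; // hypothetical repository

    LookupCreator(LookupValueRepository repository) {
        this.repository = repository;
    }

    // runs in its own transaction that commits immediately,
    // independent of the caller's (possibly long-running) transaction
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public LookupValue createInOwnTransaction(String name) {
        return repository.save(new LookupValue(name));
    }
}

@Service
class LookupService {

    private final LookupCreator creator; // separate bean, so the transactional proxy applies
    private final LookupValueRepository repository;

    LookupService(LookupCreator creator, LookupValueRepository repository) {
        this.creator = creator;
        this.repository = repository;
    }

    public LookupValue getOrCreate(String name) {
        try {
            return creator.createInOwnTransaction(name);
        } catch (DataIntegrityViolationException raceLost) {
            // another process inserted it first; just read it back
            return repository.findByName(name).orElseThrow();
        }
    }
}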

Is explicit transaction creation required for database reads when using JpaTemplate?

In our application we mainly use Spring @Transactional annotations together with JpaTemplate and Hibernate for database interaction. One part of the project, however, communicates with a different database, and hence we cannot use @Transactional there, as a different transactionManager is required.
There, for database updates, we explicitly create transactions in code using PlatformTransactionManager, and either commit or roll them back after the call to JpaTemplate.execute(JpaCallback), roughly as sketched below.
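For context, the explicit pattern looks roughly like this (using the old JpaTemplate/JpaCallback API this question refers to; names are illustrative):

import javax.persistence.EntityManager;

import org.springframework.orm.jpa.JpaCallback;
import org.springframework.orm.jpa.JpaTemplate;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.DefaultTransactionDefinition;

public class SecondDbDao {

    private final PlatformTransactionManager txManager; // manager for the second database
    private final JpaTemplate jpaTemplate;

    public SecondDbDao(PlatformTransactionManager txManager, JpaTemplate jpaTemplate) {
        this.txManager = txManager;
        this.jpaTemplate = jpaTemplate;
    }

    public void updateSomething() {
        TransactionStatus status =
                txManager.getTransaction(new DefaultTransactionDefinition());
        try {
            jpaTemplate.execute(new JpaCallback<Void>() {
                public Void doInJpa(EntityManager em) {
                    // ... updates against the second database ...
                    return null;
                }
            });
            txManager.commit(status);
        } catch (RuntimeException e) {
            txManager.rollback(status);
            throw e;
        }
    }
}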
I have noticed a few places in this area of our code where we are not doing this for reads, and simply call JpaTemplate.execute(JpaCallback) without wrapping it in a transaction. I am wondering what the dangers of this are. I appreciate that a transaction must be created, as no database query can run without one, but since nothing in our code attempts to commit, could something be holding on to resources?

EntityManager lifecycle and persistent client-server communication

We are developing a (Java SE) application which communicates with many clients via persistent TCP connections. A client connects, performs some/many operations (which are written to a SQL database), and closes the application / disconnects from the server. We're using Hibernate JPA and manage the EntityManager lifecycle on our own, using a ThreadLocal variable. Currently we create a new EntityManager instance on every client request, which has worked fine so far. Recently we profiled a bit and found out that Hibernate performs a SELECT query against the DB before every UPDATE statement. That is because our entities are in the detached state, and every new EntityManager attaches the entity to the persistence context first. This creates massive SQL overhead when the server is under load (we have a write-heavy application), and we are trying to eliminate it.
First, we thought about the second-level cache. However, we discovered that Hibernate invalidates its query and collection caches whenever a new item is added or removed.
On second thought, we are evaluating whether to keep an EntityManager open as long as the client is logged in on the server. But I wonder if this is a best practice, because there are some drawbacks: thread-safety, the management overhead of the EntityManager instances, etc.
In short: we are looking for a way to get rid of those SELECT-statements before every UPDATE. Any ideas out there?
One possible way to get rid of the SELECT statements when reattaching detached entities is to use the Hibernate-specific update() operation instead of merge().
update() unconditionally runs an UPDATE SQL statement and makes the detached object persistent. If a persistent object with the same identifier already exists in the session, it throws an exception. Thus, it's a good choice when you are sure that:
the detached object contains modified state that should be saved in the database;
saving that state is the main goal of opening a session for that request (i.e. no other operations in that session have loaded an entity with the same id).
In JPA 2.0 you can access Hibernate-specific operations as follows:
em.unwrap(Session.class).update(o);
See also:
11.6. Modifying detached objects
One possible option would be to use a StatelessSession for the update statements. I've successfully used it in my "write-heavy" application (sketch below).
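A minimal sketch of the StatelessSession approach; a stateless session has no persistence context, no dirty checking and no cascading, so the update issues a single UPDATE with no prior SELECT (entity handling is illustrative):

import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

public class StatelessUpdater {

    private final SessionFactory sessionFactory;

    public StatelessUpdater(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void update(Object detachedEntity) {
        StatelessSession session = sessionFactory.openStatelessSession();
        try {
            Transaction tx = session.beginTransaction();
            try {
                session.update(detachedEntity); // one UPDATE, no reattach SELECT
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            }
        } finally {
            session.close();
        }
    }
}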

A typical lifecycle of a Hibernate object in a web app?

Please describe a typical lifecycle of a Hibernate object (one that maps to a DB table) in a web app.
Suppose you create a new instance of an object and persist it in the DB.
But during the app's lifetime you'll be working on a detached object, and finally
you need to update it in the database, for example on exit.
What does this look like with Hibernate and Spring?
P.S. Can transactions and sessions live across servlet transitions, so that we open one session and use it in all servlets without needing to reopen it?
I'll try to give a descriptive example.
Suppose that when the app starts, a log record is created. This can be done at once:
Log log = new Log(...) and then something like save(log) -- log corresponds to a table LOG.
Then, as the application processes user input and keeps going, new data accumulates,
and after the second step we could add something to the log object, a collection for example:
// now we have a record of what the user chose: Set thisUserChoice,
// so we can update the persistent object -- we have new data now!
// log.userChoices = thisUserChoice
Here lies the nature of my question. How are we supposed to deal with this if we want to
update the database whenever new data is received from the user?
In a relational model we can work with a row id, so we could fetch the record and update some of the row's other data.
In Hibernate we are also able to load an object by its id.
But is THAT THE WAY TO GO? IS ANYTHING BETTER?
You could do everything in a single session. But that's like doing everything in a single class. It could make sense from a beginner's point of view, but nobody does it like that in practice.
In a web app, you can normally expect to have several threads running at once, each dealing with a different user. Each thread would typically have a separate session, and the session would only have managed instances of the objects that are actually needed by that user. It's not that you can completely ignore concurrency in your own code, but it's useful to have Hibernate's help. If you did everything with one session, you would have to do all the concurrency management yourself.
Hibernate can also manage the concurrency if you have multiple application servers talking to a single database. The separate JVMs can't possibly share the same session in this case...
The lifecycle is described in the Hibernate documentation (which I'm sure you've seen).
Whenever a request comes from the web client to the server, the first thing you should do is load the relevant objects (see section 10.3), so that you have persistent, not detached, entities to deal with. Then you do whatever operations are required. When the session closes (i.e. when the server returns the response to the client), it will write any updates to the database. Or, if your operation involves creating new entities, you'll have to create transient ones (with new) and then call persist() or save() (see section 10.2). That will result in a managed entity; you can make more changes to it, and Hibernate will record those changes when the session closes.
I try to avoid using detached objects. But if I have to (perhaps they're stored in the user's HTTP session), then whenever they might need to be saved to the database, I use update() (see section 10.6). This converts the object into a managed one, so the session will save any changes to the database when it's closed. A sketch of the load-then-modify pattern follows.
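A sketch of that pattern, with one session and transaction per request (the Log entity is the one from the question above; the getUserChoices() accessor is assumed):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class LogRequestHandler {

    private final SessionFactory sessionFactory;

    public LogRequestHandler(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // one session and one transaction per web request
    public void appendUserChoice(Long logId, String choice) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            // load a persistent (managed) instance by id instead of keeping a detached one
            Log log = session.get(Log.class, logId);
            log.getUserChoices().add(choice); // the change is tracked automatically
            tx.commit();                      // the flush writes the UPDATE here
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}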
Spring makes it very easy to generate a new session for each request. You would normally tell Spring to create a sessionFactory, and then every request will be given its own session. Search for "spring hibernate tutorial" and you'll find several examples.
http://scbcd.blogspot.com/2007/01/hibernate-persistence-lifecycle.html This explains transient, persistent objects.
Also have a look at the Lifecycle interface to see what Hibernate does (it provides hooks at all stages for the user to plug in custom behavior).
