Replicate modified data on different database - java

I would like to ask for an starting point of what technology or framework to research.
What I need to accomplish is the following:
We have a Java EE 6 application using JPA for persistance; we would like to use a primary database as some sort of scratchpad, where users can insert/delete records according to the tasks they are are given. Then, at the end of the day an administrator will do some kind of check on their work approving or disapproving it. If he approves the work, all changes will be done permanent and the primary database will be synced - replicated to another one (for security reasons). Otherwise, if administrator do not approve changes they will be rolled back.
Now here I got two problems to figure out:
First.- Is it possible to rollback a bunch of JPA operations done through a certain amount of time?
Second.- Trigger the replication (This can be done by RDBMS engines) process by code.
Now, if RDBMS replication is not possible (maybe because of client requirement) we would need a sync framework for JPA as a backup. I was looking at some JMS solutions, however not clear about the exact process or how to make them work on JPA.
Any help would be greatly appreciated,
Thanks.

I think, your design steps are having too much risk on loosing data. What I understand that you are talking about holding data in memory until admin approves/reject it. You must think about a disaster scenario and saving your data in that case.
Rather this problem statement is more inclined towards a workflow design, where the
data is entered by one entity, it is persisted.
Other entity approve/> reject the data.
All the approved data is further replicated to next database.
All these three steps could be implemented in 3 modules, backed by a persistent storage/ JMS technology. Depending on how real time, each of these steps needs to be; you could think of an elegant design to accomplish this in a cost effective manner.

Add a "workflow state" column to your table. States: Wait for approval, approved, replicated
Persist your data normally using JPA (state: wait for approval)
Approver approves: Update using JPA, change to approved state
As for the replication
In the approve method you could replicate the data synchronously to the other database (using JPA)
You could copy as well the approved data to another table, and use some RDBMS functionality to have the RDBMS replicate the data of that table
You could as well send a JMS message. At the end of the day a job reads the queue and persists the data into the other database
Anyway I suggest using a normal RDBMS cluster with synchronous replication. In that scenario you don't have to develop a self-made replication scheme, and you always have a copy of your data. You always have the workflow state.

Related

How to detect database operation with JPA/Hibernate?

My scenario is the following: I have two applications, one allows the user to interact with the product catalogue (only GET API, totally passive) and the other one allows admins to create/modify/delete (crud API) products from catalogue.
In order to speed up the user application, I have been thinking about implement Spring Cache. The problem is that if an admin does any interaction with the database (Oracle19c), the app for the citizen does not detect anything.
How can I solve this problem?
In the past, I managed something like this with the Change Streams, by using Mongo or thanks to Spring Data Events, so that any operation to database could be perceived.
I would need to detect operation made on database to empty and reload my cache with always last updates and I do not if is possible.
Any advice?

XA transaction involving JMS, Database and Hazelcast distributed collection

One of our application is expected to have a significant load increase soon and I am in the process of evaluating Hazelcast distributed collections to help us eliminate some existing database bottlenecks.
Multiple instances of our application are running on a bunch of different hosts for horizontal scaling. Different modules of the application get deployed to multiple Webshere Application Servers to spread the load to multiple JVMs. A typical work flow consist in:
A message gets pushed to an MDB from an Webshere MQ queue
The MDB parses the message and saves it to the database
The MDB extracts from the message a special key identifying related messages, and inserts that key into a special locking table so once such a key is picked up by a node it will process all related messages on that node. Processing all related messages in sequence is crucial to our application.
This table is one of the things we want to replace with a hazelcast blocking queue.
The same MDB sends a notification in a Webshere MQ topic informing the other JVMs that work arrived for further processing. This topic we consider replacing with a Hazelcast topic but this is optional
All the above flow happens in the same XA transaction so once the other JVMs receive the notification it is certain that the locking table entry is there available for pick up.
The receiving JVMs once they gets the notification will jump on the locking table trying to lock a key and process all the messages belonging to that key. There is a constant flow of messages so there are always keys ready for pick up by all running JVMs.
We noticed part of our stress tests that because of the multiple threads trying to lock keys at the same time the database starts being under an increasing pressure affecting the overall performance of our application.
There are a few such semaphore tables controlling the in sequence processing and this is what we consider moving to an in memory data grid.
The above is pretty much our story. In theory it seems like a good idea and I hope to achieve a performance increase not necessarily because reducing network traffic as this will happen anyway but at least by spreading the pressure on more than one resource.
I tried to google about how to set up an XA transnational context in which JMS, DB and Hazelcast collections take part. Unfortunately Hazelcast documentation about XA is just a few rows of code and nothing more. I am sure I am not the only one facing this problem and I hope for some inputs here. No need for a working solution, just a link to a good example or some more how to tips documentation to get me moving would be enough.
Thanks u in advance
If you use JTXA and the Hazelcast Resource Adapter (github.com/hazelcast/hazelcast-ra), Hazelcast will be part of the overall JTXA transaction which can include any type of transaction.
I suggest you take a look at the XA test classes:
https://github.com/hazelcast/hazelcast/tree/master/hazelcast/src/test/java/com/hazelcast/xa
Also, there are a few code samples here:
https://github.com/hazelcast/hazelcast-code-samples/tree/master/transactions

ORM query? or updated DB table information should updated into hibernete web application?

The problem statement :
Example : I have table name called "STUDENT" and it has 10 rows and consider one of the rows has name as "Jack". So when my server started and running I make the DB database into cache memory so my application has the value of "jack" and I am using it all over my application.
Now external source changed my "STUDENT" table and changed name "Jack" into "Prabhu Jack". I want the updated information asap into my application with out reloading/refresh into my application.. I dont want to run some constant thread to monitor and update my application. All I want is part of hibernate or any feasible solution to achieve this?
..
What you describe is the classic case of whether to pull or push updates.
Pull
This approach relies on the application using some background thread or task system that periodically polls a resource and requests the desired information. Its the responsibility of the application to perform this task.
In order to use a pull mechanism in conjunction with a cache implementation with Hibernate, this would mean that you'd want your Hibernate query results to be stored in a L2 cache implementation, such as ehcache.
Your ehcache would specify the storage capacity and expiration details and you simply query for the student data at each point you require it. The L2 cache would be consulted first, which lives on the application server side, and would only consult the database if the L2 cache had expired.
The downside is you would need to specify a reasonable time-to-live setting for the L2 cache so that the cache got updated by a query within reason after the rows were updated. Depending on the frequency of change and usage, maybe a 5 minute window is sufficient.
Using the L2 cache prevents the need for a useless background poll thread and allows you to specify a reasonable poll time all within the Hibernate framework backed by a cache implementation.
Push
This approach relies on the point where a change occurs to be capable of notifying interested parties that something changed and allowing the interested party to perform some action.
In order to use a push mechanism, your application would need to expose a way to be told a change occurred and preferably what the change actually was. Then when your external source modifies the table in question, that operation would need to raise an event and notify interested parties.
One way to architect this would be to use a JMS broker and have the external source submit a JMS message to a queue and have your application subscribe to the JMS queue to read the message when its sent.
Another solution would be to couple the place where the external source manipulates the data tightly with your application such that the external source doesn't just manipulate the data in question, but also sends a JSON request to your application, allowing it to update its internal cache immediately.
Conclusion
Using a push situation could require the introduction of additional middleware components, should you want to efficiently decouple the external source side & your application. But it does come with the added benefit that the eventual consistency between the database and your application's cache should happen with relative real-time. This solution also has no additional needs for querying the database after startup for those rows.
Using a pull situation doesn't require anything more than what you're likely already using in your application, other than maybe using a supported L2 cache provider rather than some homegrown solution. However, the eventual consistency between the database and your application's cache is completely dependent on your TTL configuration for that entity's cache. But be aware that this solution will continue to query the database to refresh the cache once your TTL has expired.

Java web - sessions design for horizontal scalability

I'm definitely not an expert Java coder, I need to implement sessions in my java Servlet based web application, and as far as I know this is normally done through HttpSession. However this method stores the session data in the local filesystem and I don't want to have such constraints to horizontal scalability. Therefore I thought to save sessions in an external database to which the application communicates through a REST interface.
Basically in my application there are users performing some actions such as searches. Therefore what I'm going to persist in sessions is essentialy the login data, and the meta data associated to searches.
As the main data storage I'm planning to use a graph noSQL database, the question is: let's say I can eventually also use another database of another kind for sessions, which architecture fits better for this kind of situation?
I currently thought to two possible ways. the first one uses another db (such as an SQL db) to store sessions data. In this way I would have a more distributed workload since I'm not using the main storage also for sessions. Moreover I'd also have a more organized environment being session state variables and persisten ones not mixed up.
The second way instead consists in storing every information relative to any session into the "user node" of the main database. The sessionid will be at this point just a "shortcut" for an authentication. This way I dont have to rely on a second database, however I move all the workload to the main db mixing the session data with the persistent ones.
is there any standard general architecture to which I can ake reference? DO I miss some important point which should constraint my architecture?
Your idea to store sessions in a different location is good. How about using an in-memory cache like memcached or redis? Session data is generally not long-lived so you have other options other than a full-blown database. Memcached & Redis can both be clustered and can scale horizontally.

Implementing multi-tenancy for a mature enterprise application

I've been tasked with making an enterprise application multi-tenant. It has a Java/Glassfish BLL using SOAP web services and a PostgreSQL backend. Each tenant has its own database, so (in my case at least) "multi-tenant" means supporting multiple databases per application server.
The current single-tenant appserver initializes a C3P0 connection pool with a connection string that it gets from a config file. My thinking is that now there will need to be one connection pool per client/database serviced by the appserver.
Once a user is logged in, I can map it to the right connection pool by looking up its tenant. My main issue is how to get this far - when a user is first logged in, the backend's User table is queried and the corresponding User object is served up. It seems I will need to know which database to use with only a username to work with.
My only decent idea is that there will need to be a "config" database - a centralized database for managing tenant information such as connection strings. The BLL can query this database for enough information to initialize the necessary connection pools. But since I only have a username to work with, it seems I would need a centralized username lookup as well, in other words a UserName table with a foreign key to the Tenant table.
This is where my design plan starts to smell, giving me doubts. Now I would have user information in two separate databases, which would need to be maintained synchronously (user additions, updates, and deletions). Additionally, usernames would now have to be globally unique, whereas before they only needed to be unique per tenant.
I strongly suspect I'm reinventing the wheel, or that there is at least a better architecture possible. I have never done this kind of thing before, nor has anyone on my team, hence our ignorance. Unfortunately the application makes little use of existing technologies (the ORM was home-rolled for example), so our path may be a hard one.
I'm asking for the following:
Criticism of my existing design plan, and suggestions for improving or reworking the architecture.
Recommendations of existing technologies that provide a solution to this issue. I'm hoping for something that can be easily plugged in late in the game, though this may be unrealistic. I've read about jspirit, but have found little information on it - any feedback on it or other frameworks will be helpful.
UPDATE: The solution has been successfully implemented and deployed, and has passed initial testing. Thanks to #mikera for his helpful and reassuring answer!
Some quick thoughts:
You will definitely need some form of shared user management index (otherwise you can't associate a client login with the right target database instance). However I would suggest making this very lightweight, and only using it for initial login. Your User object can still be pulled from the client-specific database once you have determined which database this is.
You can make the primary key [clientID, username] so that usernames don't need to be unique across clients.
Apart from this thin user index layer, I would keep the majority of the user information where it is in the client-specific databases. Refactoring this right now will probably be too disruptive, you should get the basic multi-tenant capability working first.
You will need to keep the shared index in sync with the individual client databases. But I don't think that should be too difficult. You can also "test" the synchronisation and correct any errors with an batch job, which can be run overnight or by your DBA on demand if anything ever gets out of sync. I'd treat the client databases as the master, and use this to rebuild the shared user index on demand.
Over time you can refactor towards a fully shared user management layer (and even in the end fully shared client databases if you like. But save this for a future iteration.....

Categories