Strategy for Offline/Online data synchronization - java

My requirement is this: I have a server J2EE web application and a client J2EE web application. Sometimes the client can go offline, and when it comes back online it should be able to synchronize changes in both directions. I should also be able to control which rows/tables get synchronized based on some filters/rules. Are there any existing Java frameworks for doing this? If I need to implement it on my own, what are the different strategies you can suggest?
One solution I have in mind is maintaining SQL logs and executing the same statements on the other side during synchronization. Do you see any problems with this strategy?

There are a number of Java libraries for data synchronization/replication. Two that I'm aware of are Daffodil and SymmetricDS. In a previous life I foolishly implemented (in Java) my own data replication process. It seems like the sort of thing that should be fairly straightforward, but if the data can be updated in multiple places simultaneously, it's hellishly complicated. I strongly recommend you use one of the aforementioned projects rather than taking on that complexity yourself.

The biggest issue with synchronization arises when the user edits something offline and the same data is edited online at the same time. You need to merge the two changed pieces of data, or give the user a UI to decide which version is correct. If you eliminate the possibility of both being edited at the same time, you don't have to solve this sticky problem.
The usual method is to add a 'modified' timestamp field to all tables and to compare the client's modified value for a given record against the server's. If they don't match, you replace the server's data.
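For illustration, a minimal sketch of that comparison in Java (the names and the single-timestamp scheme are illustrative, not a full conflict-detection strategy):

    import java.time.Instant;

    class ModifiedCheck {
        // True when the client's copy changed since the last sync, i.e. the
        // server's copy should be replaced. If the server's timestamp is the
        // newer one, both sides may have changed and conflict handling applies.
        static boolean shouldReplaceServer(Instant clientModified, Instant serverModified) {
            return clientModified.isAfter(serverModified);
        }
    }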
Be careful with autogenerated keys - you need to make sure data integrity is maintained when you copy from the client to the server. Naively re-running the SQL statements on the server can put you in a situation where an autogenerated key has changed, and suddenly your foreign keys point to different records than you intended.
Often when importing data from another source, you keep track of the primary key from the foreign source as well as your own primary key. This makes determining the changes and differences between the data sets much easier in difficult synchronization situations.
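A hypothetical sketch of such a key mapping:

    import java.util.HashMap;
    import java.util.Map;

    class KeyMapper {
        // Maps the foreign source's primary key to our own autogenerated key.
        private final Map<Long, Long> sourceToLocal = new HashMap<>();

        void record(long sourceId, long localId) {
            sourceToLocal.put(sourceId, localId);
        }

        // Translate a foreign key from the imported data into the local key,
        // so relationships survive even when autogenerated keys differ.
        Long localIdFor(long sourceForeignKey) {
            return sourceToLocal.get(sourceForeignKey);
        }
    }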

Your synchronizer needs to identify when data can just be updated and when a human being needs to mediate a potential conflict. I have written a paper that explains how to do this using logging and algebraic laws.

What is best suited as the client-side data store in your application? You can choose from an embedded database like SQLite, a message queue, some object store, or (if none of these can be used, since it is a web application) data saved in the browser via HTML5 storage such as Web SQL Database, IndexedDB, or localStorage.
Check the paper Gold Rush: Mobile Transaction Middleware with Java-Object Replication. Microsoft's documentation of occasionally connected systems describes two approaches: service-oriented (message-oriented) and data-oriented. Gold Rush takes the former approach; the latter approach uses database merge replication.

Related

Oracle Incremental Checksum Crypto for Security

I have a unique problem to solve.
I have a legacy Java application which connects to an Oracle RDBMS. There are all sorts of queries and DML statements scattered throughout the application - inserts, updates, deletes and of course selects. It uses JDBC (PreparedStatement), though one recently added module uses JPA.
I have a requirement to add a protection layer/logic to the application/database whereby if any user (even a DBA or an OS root user) tries to modify the data (updates, inserts or deletes) bypassing the app, we are able to identify the operation as part of an audit.
An audit trail seemed to be the go-to thing here, except that we cannot even trust the OS root user, so someone having both DBA and root access could easily modify the data and remove the trace of it from the audit trails.
I was thinking of implementing a chained ('cyclic') crypto/hash scheme on the sensitive tables, so that every DML statement executed by the application updates an incremental hash, and any out-of-band change is easily caught by running an audit from the application.
In theory it seems feasible, except that it might get tricky: after every DML statement we would potentially need to recalculate the hash/checksum of a number of subsequent records, and this might overburden the application/database.
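To illustrate the idea, a simplified sketch of the chained hash (names are made up):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    class RowChain {
        // The hash of row N incorporates the hash of row N-1, so tampering
        // with any historical row invalidates every hash that follows it -
        // which is also why each update forces recalculation downstream.
        static byte[] chain(byte[] previousRowHash, String rowData) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(previousRowHash);
            md.update(rowData.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        }
    }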
Is this a feasible solution?
You are right that computing a hash of every updated row will impose a burden on the system. Are you also going to validate that hash before changes are submitted to the database, to ensure nothing has been changed outside the application? That's even more overhead, and a lot more custom code for your application. It also wouldn't tell you who modified the data, or when - only that it had been updated outside of the app. Database triggers wouldn't work either: they are easily disabled, and they aren't capable of modifying the same table that fires them (you'd need a separate hash table with an entry for every row of every table you wanted to monitor). Auditing is still your best way to go, as it doesn't require any modification to your app or your schemas.
You have a couple of options with regard to auditing, depending on the version of Oracle you're using. If you're on 12c or later, you can use Unified Auditing, which has its own set of permissions and roles to allow separation of duties (i.e. normal DBA vs. security admin). Even in older versions you can put an update/delete audit on the audit trail table itself, so that any attempt to modify the audit data will leave a fingerprint of its own.
Lastly, you can use a tool like Splunk, Elasticsearch, syslog, Oracle's Database Audit Vault, or some other file-monitoring solution to ship your audit records to another system as the database creates them - making them inaccessible to the DBA or the local sysadmin. This will take some work by your DBA and/or sysadmin to configure in the first place, but it goes a long way toward securing your audit data.
All that said, sooner or later you're going to have to trust two people: the sys admin and the DBA. If you can't trust them then you are in deep, deep trouble.
Oracle 20c has blockchain tables. Version 20c is currently only available in Oracle's cloud, but it will probably be available on-premises in a few months.

Is it possible to save state between requests in GAE/Java

I plan to implement a GAE app only for my own usage.
The application will get its data using the URL Fetch service, updating it every x minutes (using scheduled tasks). Then it will serve that information to me when I request it.
I have barely started to look into GAE, but I have one main question that I haven't been able to clear up: can state be maintained in GAE between different requests without using JDO/JPA and the datastore?
As I am the only user, I guess I could keep the info in a servlet field and avoid having to deal with the datastore... but my concern is that, as this app will get very few requests, if the instance is swapped out to disk or shut down (I don't yet know what GAE calls this), it will lose its state?
I am not concerned about having to restart the whole app and start collecting data from scratch from time to time, that is ok.
If this is an app for your own use, and you're double-extra sure that you won't be making it multi-user, and you're not concerned about the possibility that you might be using it from two browsers at once, you can skip using sessions and use a known key for storing information in memcache.
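For example, a minimal sketch using App Engine's low-level memcache API with a fixed, known key (the class and key names are illustrative):

    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class StateStore {
        private static final String KEY = "my-app-state"; // known, fixed key

        // Values must be Serializable; memcache may evict them at any time.
        public static void save(java.io.Serializable state) {
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
            cache.put(KEY, state);
        }

        public static Object load() {
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
            return cache.get(KEY); // null if evicted or never written
        }
    }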
If your reason for avoiding the datastore is concern over performance, then I strongly recommend testing that assumption. You may be pleasantly surprised.
You could use the HTTP session to maintain state between requests, but that will use the datastore itself (although you won't have to write any code to get this behaviour).
You might also consider using the Cache API (i.e. memcache) - it's JSR 107, I think, which Google provides an implementation of. The cache is shared between instances, but it can be emptied at any time. If you're happy with that behaviour, this may be an option; looking at your requirements it may be the most feasible one, if you don't want to write your own persistence code.
You could store data in a static on your class or in an application-scoped object, but when your instance spins down, or requests start being served by another instance, that data is lost, since your classes are loaded fresh in the new instance.
Or you could serialize the state to the client and send it back in with each request.
The most robust option is persistence to the datastore - the JPA code is trivial. Perhaps you should reconsider?

Sharing NHibernate and Hibernate 2nd level cache

Is it possible to share the 2nd level cache between a Hibernate and an NHibernate solution? I have an environment with servers running .NET and servers running Java that both access the same database.
There is some overlap in the data they access, so sharing a 2nd level cache would be desirable. Is it possible?
If this is not possible, what are some of the solutions others have come up with?
There is some overlap in the data they access, so sharing a 2nd level cache would be desirable. Is it possible?
This would require (and this is very likely oversimplified):
Being able to access the cache from both Java and .NET.
Having cache provider implementations for both Hibernate and NHibernate.
Being able to read/write data in a format compatible with both platforms (otherwise there is no point in sharing the cache).
This sounds feasible but:
I'm not aware of an existing ready-to-use solution implementing this (my first idea was memcached, but AFAIK memcached stores a serialized version of the data, so this doesn't meet requirement #3, which is the most important one).
I wonder whether using a language-neutral format to store the data wouldn't generate too much overhead (and somehow defeat the purpose of using a cache).
If this is not possible, what are some of the solutions others have come up with?
I never had to do this, but if we're talking about a read-write cache and you use two separate caches, you'll have to invalidate a given cache region on the Java side from the .NET side and vice versa. You'll have to write the code to handle that.
As Pascal said, it's improbable that sharing the 2nd level cache is technically possible.
However, you can think about this from a different perspective.
It's unlikely that both applications read and write the same data. So, instead of sharing the cache, what you could implement is a cache invalidation service (using the communications stack of your choice).
Example:
Application A mostly reads Customer data and writes Invoice data
Application B mostly reads Invoice data and writes Customer data
Therefore, Application A caches Customer data and Application B caches Invoice data
When Application A, for example, modifies an invoice, it sends a message to Application B and tells it to evict that invoice from its cache (a sketch of the receiving side follows below).
You can also evict whole entity types, collections and regions.
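The eviction itself is small. A rough sketch of the receiving side in the Java/Hibernate application, assuming some messaging layer (not shown) delivers the entity class and id - note that the exact Cache method names vary a little between Hibernate versions:

    import java.io.Serializable;
    import org.hibernate.SessionFactory;

    class CacheInvalidationListener {
        private final SessionFactory sessionFactory;

        CacheInvalidationListener(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        // Called when the other application reports it modified an entity.
        void onEntityModified(Class<?> entityClass, Serializable id) {
            // Evict just that entry from the 2nd level cache; the next read
            // goes to the database and picks up the other app's change.
            sessionFactory.getCache().evictEntity(entityClass, id);
        }
    }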

Pattern / Framework for lazy population of a database from remote source

My application pulls a large amount of data from an external source - normally across the internet - and stores it locally on the application server.
Currently, as a user starts a new project, we aggressively try to pull the data from the external source in the order we predict the user will want to access it. This process can take 2-3 hours.
It seems like a smarter approach here is to provide access to the data in a lazy-loading fashion: if a user wants to access entity A, try to grab it from our database first; if it's not there yet, fetch it from the remote source and populate the database at the same time.
This, combined with continuing to populate the database in the background, would give a much slicker experience for the user.
Are there frameworks which manage this level of abstraction? (My application is in Java).
There are several considerations here - e.g., currently my database enforces relational integrity, something that might have to be relaxed to facilitate this lazy-loading approach. Concurrency also seems like it would cause problems.
Also, it seems like entities and collections could exist in a partially populated state, which requires additional schema data to distinguish the complete from the partially populated.
As I understand it, this is just an aggregated repository pattern - is this correct, or is this a more appropriate pattern I should study?
Have you tried JPA/Hibernate? This seems easily possible in Hibernate.
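For illustration, a minimal read-through sketch assuming plain JPA; EntityA and RemoteSource are hypothetical stand-ins, and transaction handling is omitted:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;

    @Entity
    class EntityA {
        @Id
        long id;
    }

    // Hypothetical client for the external source.
    interface RemoteSource {
        EntityA fetch(long id);
    }

    class ReadThroughRepository {
        private final EntityManager em;
        private final RemoteSource remote;

        ReadThroughRepository(EntityManager em, RemoteSource remote) {
            this.em = em;
            this.remote = remote;
        }

        // Try the local database first; on a miss, fetch from the remote
        // source and persist it so the next access is purely local.
        EntityA find(long id) {
            EntityA entity = em.find(EntityA.class, id);
            if (entity == null) {
                entity = remote.fetch(id);
                em.persist(entity);
            }
            return entity;
        }
    }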

monitoring mysql for changes

I have a Java app using a MySQL database through Hibernate. The database is really just a persistence layer: it is read at the initial load of the program, and the records are then maintained in memory.
However, we are adding extra complexity: another process may change the database as well, and it would be nice for those changes to be reflected in the Java app. Yet I don't particularly like polling mechanisms that query the database every few seconds, especially since the database is rarely updated.
Is there a way to have a callback to listen to database changes? Would triggers help?
Or change both applications so that the Java app is truly the owner of the MySQL database and exposes it as a service. By doing what you're proposing, you're coupling the two apps at the database level.
If you have one owner of the data you can hide schema changes and such behind the service interface. You can also make it possible to have a publish/subscribe mechanism to alert interested parties about database changes. If those things are important to you, I'd reconsider letting another application access MySQL directly.
Is there a way to have a callback to listen to database changes? Would triggers help?
To my knowledge, such a thing doesn't exist and I don't think a trigger would help. You might want to check this similar question here on SO.
So I'd expose a hook at the Java application level (it could be a simple servlet in the case of a webapp), have the other process notify it after an update of the database, and have it invalidate its cache.
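Such a hook can be as small as a servlet like this (AppCache is a hypothetical holder for whatever in-memory state your app keeps):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class InvalidateCacheServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // The other process calls this URL after it updates the database.
            AppCache.getInstance().invalidateAll(); // hypothetical cache holder
            resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
        }
    }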
Another way would be to use a self-compiled MySQL server with the patches from this project:
ProjectPage External Language Stored Procedures
Check this blog post for a more detailed introduction:
Calling Java code in MySQL
One option would be to tail the binary logs (or set up a replication slave) and look for changes relevant to your application. This is likely to be quite an involved solution.
Another would be to add an indexed last_updated column to the relevant tables (you can even have MySQL update it automatically with ON UPDATE CURRENT_TIMESTAMP) and poll for changes since the last time you checked. The queries should be very cheap.
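A sketch of that polling in plain JDBC, assuming a last_updated TIMESTAMP column maintained by MySQL as above (table and column names are illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;

    class ChangePoller {
        private Timestamp lastSeen = new Timestamp(0);

        void pollChanges(Connection conn) throws Exception {
            String sql = "SELECT id, last_updated FROM my_table "
                       + "WHERE last_updated > ? ORDER BY last_updated";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setTimestamp(1, lastSeen);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        long id = rs.getLong("id"); // refresh this record in memory
                        lastSeen = rs.getTimestamp("last_updated");
                    }
                }
            }
        }
    }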
Instead of caching the database contents within the memory space of the Java app, you could use an external cache like memcached or Ehcache. When either process updates (or reads from) the database, have it update the cache as well.
This way whenever either process updates the DB, its updates will be in the cache that the other process reads from.
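For example, with a memcached client (spymemcached here, as one option), whichever process writes to the database refreshes the shared cache entry too:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    class SharedCache {
        private final MemcachedClient client;

        SharedCache() throws Exception {
            client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        }

        // Whoever writes to the database also refreshes the shared entry.
        void put(String key, Object value) {
            client.set(key, 3600, value); // 1-hour expiry, illustrative
        }

        Object get(String key) {
            return client.get(key);
        }
    }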
