I have a unique problem to solve.
I have a legacy Java application which connects to an Oracle RDBMS. There are all sorts of queries and DMLs scattered throughout the application - inserts, updates, deletes and of course selects. It uses JDBC (PreparedStatement), though one recently added module uses JPA.
I have a requirement to add a protection layer / logic to the application / database whereby if any user (could even be a DBA or an OS root user) tries to modify the data (updates, inserts or deletes) bypassing the app, we are able to identify the operation as part of an audit.
An audit trail seemed to be the go-to solution here, except that we cannot even trust the OS root user, and thus someone with both DBA and root access can easily modify the data and remove the trace of it from the audit trails.
I was thinking of implementing a chained (cyclic) cryptographic hash on the sensitive tables, so that on every DML executed by the application a hash is introduced, and it is incremental, so that any out-of-band change is easily caught by doing an audit using the application.
In theory it seems feasible, except that it might get tricky: after every DML we would potentially need to recalculate the hash / checksum of a number of subsequent records, and this might overburden the application / database.
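To make the idea concrete, here is a minimal sketch of the chaining I have in mind (purely illustrative - the scheme and all names are mine, not a vetted design): each row's audit hash is SHA-256 over the previous row's hash concatenated with the row's data, so altering any row invalidates every hash after it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class HashChain {
    // SHA-256(prevHash || rowData), hex-encoded.
    static String link(String prevHash, String rowData) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((prevHash + "|" + rowData).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Build the chain for a list of rows, starting from a fixed seed.
    static List<String> chain(List<String> rows) {
        List<String> hashes = new ArrayList<>();
        String prev = "seed";
        for (String row : rows) {
            prev = link(prev, row);
            hashes.add(prev);
        }
        return hashes;
    }

    // Audit: recompute the whole chain and compare it to the stored hashes.
    static boolean verify(List<String> rows, List<String> stored) {
        return chain(rows).equals(stored);
    }
}
```

Tampering with any row breaks verification from that point onwards - though of course an attacker with full access could recompute the chain unless the seed or the head hash is anchored outside the database.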
Is this a feasible solution?
You are right that computing a hash of every updated row of data will impose a burden on the system. Are you going to also validate that hash before changes are submitted to the database, to ensure nothing has been changed outside the application? That's even more overhead, and a lot more custom code for your application. It also wouldn't help you identify who modified the data, or when - only that it had been updated outside of the app.

Using a database trigger wouldn't work either: triggers are easily disabled, and they aren't capable of modifying the same table that fires them (you'd need a separate hash table with an entry for every row of data in every table you wanted to monitor).

Auditing is still your best way to go, as it wouldn't require any modification to your app or your data schemas.
You have a couple of options with regard to auditing, depending on the version of Oracle you're using. If you're on 12c or later, you can use Unified Auditing, which has its own set of permissions and roles to allow separation of duties (i.e. normal DBA vs. security admin). Even in older versions you can put an update/delete audit on the audit trail table itself, so that any attempt to modify the audit data will itself leave a fingerprint.
Lastly, you can use a tool like Splunk, Elasticsearch, syslog, Oracle Audit Vault, or some other file-monitoring solution to centralize your audit records on another system as they are created by the database - making them inaccessible to the DBA or local sysadmin. This will take some work by your DBA and/or sysadmin to configure in the first place, but it can go a long way towards securing your audit data.
All that said, sooner or later you're going to have to trust two people: the sys admin and the DBA. If you can't trust them then you are in deep, deep trouble.
Oracle 20c has blockchain tables. Version 20c is currently only available in Oracle's cloud, but it will probably be available on-premise in a few months.
I need to run a method in Java every time a specific table of an Oracle DB is updated (any sort of update, including record additions, deletions and modifications).
What is the most efficient way to poll a table for changes from Java - one that performs well and does not put too much pressure on the DB?
Unfortunately I have many constraints:
I can't create additional tables, triggers, stored procedures etc., because I have no control over the DB administration / design.
I'd rather avoid Oracle Change Notification, as proposed in that post, as it seems to involve C/JNI.
Just counting the records is not good enough as I might miss modifications and simultaneous additions/deletions.
A delay of up to 30-60 s between the actual change and the notification is acceptable.
The tables I want to monitor generally have 100k+ records (some 1M+), so I don't think pulling the whole tables is an option (from a DB load / performance perspective).
Starting from Oracle 11g you can use Database Change Notification with the plain JDBC driver (Link).
I've got an Oracle database that has two schemas in it which are identical. One is essentially the "on" schema, and the other is the "off" schema. We update data in the off schema and then switch the schemas behind an alias which our production servers use. Not a great solution, but it's what I've been given to work with.
My problem is that there is a separate application that will now be streaming data to the database (also handed to me) which is currently only updating the alias, which means it is only updating the "on" schema at any given time. That means that when the schemas get switched, all the data from this separate application vanishes from production (the schema it is in is now the "off" schema).
This application is using Hibernate 3.3.2 to update the database. There's Spring 3.0.6 in the mix as well, but not for the database updates. Finally, we're running on Java 1.6.
Can anyone point me in a direction to updating both "on" and "off" schemas simultaneously that does not involve rewriting the whole DAO layer using Spring JDBC to load two separate connection pools? I have not been able to find anything about getting hibernate to do this. Thanks in advance!
You shouldn't be updating two separate databases this way, especially from the application's point of view. All it should know or care about is whether the data is there, not having to mess with two separate databases.
Frankly, this sounds like you may need to purchase an ETL tool. Even if you can't get it to update the 'on' schema from the 'off' one (fast enough to be practical), you will likely be able to use it to keep the two in sync (mirror changes from 'on' to 'off').
HA-JDBC is a replicating JDBC driver we investigated for a short while. It will automatically replicate all inserts and updates, and distribute all selects. There are other database-specific master/slave solutions as well.
On the other hand, I wouldn't recommend doing this for 4-8 hour procedures. Better to lock the database first, update one copy, then backup/restore it onto the other, and unlock again.
My application pulls a large amount of data from an external source - normally across the internet - and stores it locally on the application server.
Currently, as a user starts a new project we aggressively try to pull the data from the external source based on the order that we predict the user will want to access it. This process can take 2 - 3 hours.
It seems like a smarter approach here is to provide access to the data in a lazy-loading fashion. E.g. if a user wants to access entity A, try to grab it from our database first; if it's not yet there, fetch it from the remote source and populate the database at the same time.
This, combined with continuing to populate the database in the background, would give a much slicker experience for the user.
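Roughly what I have in mind, sketched with a Map standing in for our local database and a function standing in for the remote source (all names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class LazyStore<K, V> {
    private final Map<K, V> localDb = new ConcurrentHashMap<>();
    private final Function<K, V> remoteSource;

    public LazyStore(Function<K, V> remoteSource) {
        this.remoteSource = remoteSource;
    }

    // Try the local database first; on a miss, fetch from the remote
    // source and populate the local store in the same step.
    public V get(K key) {
        return localDb.computeIfAbsent(key, remoteSource);
    }

    // The background pre-fetcher uses the same entry point, so a
    // user-triggered load and a predicted load never fetch twice.
    public void prefetch(K key) {
        get(key);
    }
}
```

Incidentally, computeIfAbsent on a ConcurrentHashMap also gives a crude answer to the concurrency concern below: two threads asking for the same missing entity trigger only one remote fetch.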
Are there frameworks which manage this level of abstraction? (My application is in Java).
There are several considerations here. E.g., currently my database enforces referential integrity - something that might have to be turned off to facilitate this lazy-loading approach. Concurrency also seems like it would cause problems here.
Also, it seems like entities and collections could exist in a partially populated state - this requires additional schema data to distinguish the complete from the partially populated.
As I understand it, this is just an aggregated repository pattern - is this correct, or is there a more appropriate pattern I should study?
Have you tried JPA/Hibernate? This seems easily possible in Hibernate.
I have a Java app using a MySQL database through Hibernate. The database is really used as a persistence layer: it is read at the initial load of the program, and the records are then maintained in memory.
However, we are adding extra complexity: another process may change the database as well, and it would be nice for the changes to be reflected in the Java app. Yet I don't particularly like polling mechanisms that query the database every few seconds, especially since the database is rarely updated.
Is there a way to have a callback to listen to database changes? Would triggers help?
Or change both applications so the Java app is truly the owner of the MySQL database and exposes it as a service. You're coupling the two apps at the database level by doing what you're proposing.
If you have one owner of the data you can hide schema changes and such behind the service interface. You can also make it possible to have a publish/subscribe mechanism to alert interested parties about database changes. If those things are important to you, I'd reconsider letting another application access MySQL directly.
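A sketch of that publish/subscribe idea, with the Java app as the single owner of the data and other parties registering listeners instead of reading MySQL directly (the interfaces are illustrative, not from any particular framework):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class DataService {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    // Interested parties subscribe for change events instead of polling MySQL.
    public void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    // The single owner of the data performs the write, then notifies everyone.
    public void update(String record) {
        // ... perform the actual database write here (e.g. via Hibernate) ...
        for (Consumer<String> s : subscribers) {
            s.accept(record);
        }
    }
}
```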
Is there a way to have a callback to listen to database changes? Would triggers help?
To my knowledge, such a thing doesn't exist and I don't think a trigger would help. You might want to check this similar question here on SO.
So, I'd expose a hook at the Java application level (it could be a simple servlet in the case of a webapp) to notify it after an update of the database and have it invalidate its cache.
Another way would be to use a self-compiled MySQL server with the patches from this project:
Project page: External Language Stored Procedures
Check this blog post for a more detailed introduction: Calling Java code in MySQL
One option would be to tail the binary logs (or set up a replication slave) and look for changes relevant to your application. This is likely to be a quite involved solution.
Another would be to add a last_updated indexed column to the relevant tables (you can even have MySQL update this automatically) and poll for changes since the last time you checked. The queries should be very cheap.
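The polling logic itself is simple. A sketch, with the JDBC query replaced by an in-memory list so the high-water-mark bookkeeping is visible (the real version would run something like SELECT id FROM t WHERE last_updated > :lastSeen):

```java
import java.util.ArrayList;
import java.util.List;

public class ChangePoller {
    // Stand-in for a row with an indexed last_updated column.
    public static class Row {
        final long id;
        final long lastUpdated;
        public Row(long id, long lastUpdated) {
            this.id = id;
            this.lastUpdated = lastUpdated;
        }
    }

    private long lastSeen = 0;

    // Return the ids of rows changed since the previous poll,
    // and advance the high-water mark.
    public List<Long> poll(List<Row> table) {
        List<Long> changed = new ArrayList<>();
        long newest = lastSeen;
        for (Row r : table) {
            if (r.lastUpdated > lastSeen) {
                changed.add(r.id);
                newest = Math.max(newest, r.lastUpdated);
            }
        }
        lastSeen = newest;   // remember the high-water mark for the next poll
        return changed;
    }
}
```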
Instead of caching the database contents within the memory space of the Java app, you could use an external cache like memcached or Ehcache. When either process updates (or reads) from the database, have it update memcached as well.
This way whenever either process updates the DB, its updates will be in the cache that the other process reads from.
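A write-through sketch of that pattern; a ConcurrentHashMap stands in here for memcached/Ehcache and another for MySQL, and both processes would go through the same two methods:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stands in for memcached
    private final Map<String, String> db = new ConcurrentHashMap<>();    // stands in for MySQL

    // Whenever either process updates the DB, it updates the cache too,
    // so the other process's reads see the change without polling.
    public void update(String key, String value) {
        db.put(key, value);
        cache.put(key, value);
    }

    // Reads hit the shared cache first and fall back to the DB on a miss.
    public String read(String key) {
        return cache.computeIfAbsent(key, db::get);
    }
}
```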
My requirement is this: I have a server J2EE web application and a client J2EE web application. Sometimes the client can go offline. When the client comes back online, it should be able to synchronize changes in both directions. I should also be able to control which rows/tables need to be synchronized based on some filters/rules. Are there any existing Java frameworks for doing this? If I need to implement it on my own, what strategies can you suggest?
One solution in my mind is maintaining SQL logs and executing the same statements on the other side during synchronization. Do you see any problems with this strategy?
There are a number of Java libraries for data synchronization/replication. Two that I'm aware of are Daffodil and SymmetricDS. In a previous life I foolishly implemented (in Java) my own data replication process. It seems like the sort of thing that should be fairly straightforward, but if the data can be updated in multiple places simultaneously, it's hellishly complicated. I strongly recommend you use one of the aforementioned projects to bypass dealing with this complexity yourself.
The biggest issue with synchronization is when the user edits something offline and it is edited online at the same time. You need to merge the two changed pieces of data, or provide a UI that lets the user say which version is correct. If you eliminate the possibility of both being edited at the same time, then you don't have to solve this sticky problem.
The method is usually to add a 'modified' field to all tables, and compare the client's modified field for a given record against the server's modified date. If they don't match, you replace the server's data.
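A sketch of that comparison, extended with the conflicting case from the previous answer where both sides changed since the last sync (field and method names are illustrative):

```java
public class SyncMerger {
    enum Outcome { KEEP_SERVER, TAKE_CLIENT, CONFLICT }

    // lastSync is when the two sides last agreed; the modified timestamps
    // are per record. Any record changed on both sides since then needs
    // a human (or an explicit merge rule) to resolve it.
    static Outcome merge(long clientModified, long serverModified, long lastSync) {
        boolean clientChanged = clientModified > lastSync;
        boolean serverChanged = serverModified > lastSync;
        if (clientChanged && serverChanged) return Outcome.CONFLICT;
        if (clientChanged) return Outcome.TAKE_CLIENT;
        return Outcome.KEEP_SERVER;
    }
}
```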
Be careful with autogenerated keys - you need to make sure your data integrity is maintained when you copy from the client to the server. Strictly running the SQL statements again on the server could put you in a situation where the autogenerated key has changed, and suddenly your foreign keys are pointing to different records than you intended.
Often when importing data from another source, you keep track of the primary key from the foreign source as well as your own personal primary key. This makes determining the changes and differences between the data sets easier for difficult synchronization situations.
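A sketch of that key-mapping bookkeeping: replaying a client insert lets the server generate its own key and record the mapping, and every foreign key in later replayed statements is rewritten through the map (all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class KeyRemapper {
    private final AtomicLong serverSequence = new AtomicLong(1000); // server's own key generator
    private final Map<Long, Long> clientToServer = new HashMap<>(); // foreign-source key -> our key

    // Replaying a client INSERT on the server: the server assigns a
    // fresh autogenerated key and records the client -> server mapping.
    public long insertFromClient(long clientId) {
        long serverId = serverSequence.incrementAndGet();
        clientToServer.put(clientId, serverId);
        return serverId;
    }

    // Any foreign key in a subsequent replayed statement must be
    // rewritten through the map, or it will point at the wrong record.
    public long remapForeignKey(long clientId) {
        Long serverId = clientToServer.get(clientId);
        if (serverId == null) {
            throw new IllegalStateException("No mapping for client key " + clientId);
        }
        return serverId;
    }
}
```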
Your synchronizer needs to identify when data can just be updated and when a human being needs to mediate a potential conflict. I have written a paper that explains how to do this using logging and algebraic laws.
What is best suited as the client-side data store in your application? You can choose from an embedded database like SQLite, a message queue, some object store, or (if none of these can be used, since it is a web application) data saved on the client through HTML5 storage APIs such as Web SQL, IndexedDB, or localStorage.
Check the paper Gold Rush: Mobile Transaction Middleware with Java-Object Replication. Microsoft's documentation of occasionally connected systems describes two approaches: service-oriented (or message-oriented) and data-oriented. Gold Rush takes the former approach; the latter approach uses database merge replication.