I need to run a method in Java every time a specific table of an Oracle DB is updated (any sort of update, including record additions, deletions and modifications).
What is the most efficient way to "poll" a table for changes from Java, one that performs well and does not put too much pressure on the DB?
Unfortunately I have many constraints:
I can't create additional tables, triggers, stored procedures etc., because I have no control over the DB administration / design.
I'd rather avoid Oracle Change Notification, as proposed in that post, as it seems to involve C/JNI.
Just counting the records is not good enough as I might miss modifications and simultaneous additions/deletions.
a delay of up to 30/60s between the actual change and the notification is acceptable.
The tables I want to monitor generally have 100k+ records (some 1m+), so I don't think pulling whole tables is an option (from a DB load / performance perspective).
Starting from Oracle 11g you can use Oracle Database Change Notification with the plain JDBC driver: Link
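For reference, a rough sketch of what the registration looks like with the 11g ojdbc driver; the table name is hypothetical, and the connecting user needs the CHANGE NOTIFICATION privilege:

```java
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class TableWatcher {

    public static void watch(OracleConnection conn) throws Exception {
        Properties props = new Properties();
        props.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");

        DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotification(props);
        dcr.addListener(new DatabaseChangeListener() {
            @Override
            public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                // Called by the driver whenever the registered table changes.
                System.out.println("Change detected: " + event);
            }
        });

        // Associate the registration with a query on the table to monitor.
        Statement stmt = conn.createStatement();
        ((OracleStatement) stmt).setDatabaseChangeRegistration(dcr);
        ResultSet rs = stmt.executeQuery("SELECT id FROM my_table"); // hypothetical table
        while (rs.next()) {
            // Drain the result set so the registration covers the table.
        }
        rs.close();
        stmt.close();
        // Later: conn.unregisterDatabaseChangeNotification(dcr) when you no longer need it.
    }
}
```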
I have a unique problem to solve.
I have a legacy Java application which connects to an Oracle RDBMS. There are all sorts of queries and DML statements scattered throughout the application - inserts, updates, deletes and of course selects. It uses JDBC (PreparedStatement), though one recently added module uses JPA.
I have a requirement to add a protection layer / logic to the application / database so that if any user (could even be a DBA or an OS root user) tries to modify the data (updates, inserts or deletes) bypassing the app, we are able to identify the operation as part of an audit.
An audit trail seemed to be the go-to thing here, except that we cannot even trust the OS root user, and thus someone having both DBA and root access can easily modify the data and remove the trace of it from the audit trail.
I was thinking of implementing a chained crypto / hash scheme on the sensitive tables, so that every DML executed by the application introduces a hash that builds on the previous one, and any change is easily caught by running an audit through the application.
In theory it seems feasible, except that it might get tricky: after every DML we would potentially need to recalculate the hash / checksum of a number of subsequent records, and this might overburden the application / database.
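For illustration, a minimal sketch of the chained-hash idea in Java; the way the previous hash and the row data are combined here is entirely hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class RowHashChain {

    /**
     * Chains each row's hash to the previous row's hash, so tampering with any
     * row invalidates every hash that follows it. Column layout and the
     * separator are placeholders.
     */
    static String nextHash(String previousHash, String rowData) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] bytes = digest.digest(
                (previousHash + "|" + rowData).getBytes(StandardCharsets.UTF_8));

        // Hex-encode the digest for storage in a VARCHAR2 column.
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```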
Is this a feasible solution?
You are right that computing a hash of every updated row of data will impose a burden on the system. Are you going to also validate that hash before changes are submitted to the database, to ensure nothing has been changed outside the application? That's even more overhead, and a lot more custom code for your application. It also wouldn't help you identify who modified the data, or when, only that it had been updated outside of the app.
Using a database trigger wouldn't work either: triggers are easily disabled and aren't capable of modifying the same table that calls them (you'd need a separate hash table with an entry for every row of data in every table you wanted to monitor).
Auditing is still your best way to go, as it wouldn't require any modification to your app or your data schemas.
You have a couple of options with regard to auditing, depending on the version of Oracle you're using. If you're on 12c or later, you can use Unified Auditing, which has its own set of permissions and roles to allow separation of duties (i.e. the normal DBA role from a security admin role). Even in older versions you can put an update/delete audit on the audit trail table itself, so that any attempt to modify the audit data will itself leave a fingerprint.
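If you go the Unified Auditing route, the setup is only a couple of DDL statements; a minimal sketch run over JDBC, assuming a 12c database and a user granted the AUDIT_ADMIN role (the policy and table names are hypothetical):

```java
import java.sql.Connection;
import java.sql.Statement;

public class AuditPolicySetup {

    /** Run once by a security admin; protects DML on one sensitive table. */
    static void createPolicy(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            // Define which actions on which objects are audited.
            stmt.execute("CREATE AUDIT POLICY sensitive_dml "
                       + "ACTIONS INSERT ON app.sensitive_table, "
                       + "UPDATE ON app.sensitive_table, "
                       + "DELETE ON app.sensitive_table");
            // Enable the policy for all users.
            stmt.execute("AUDIT POLICY sensitive_dml");
        }
    }
}
```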
Lastly, you can use a tool like Splunk, Elasticsearch, syslog, Oracle's Database Audit Vault, or some other file monitoring solution to centralize your audit records on another system as they are created by the database, making them inaccessible to the DBA or local sysadmin. This will take some work by your DBA and/or sysadmin to configure in the first place, but can go a long way towards securing your audit data.
All that said, sooner or later you're going to have to trust two people: the sys admin and the DBA. If you can't trust them then you are in deep, deep trouble.
Oracle 20c has blockchain tables. Version 20c is currently only available in Oracle's cloud, but it will probably be available on-premises in a few months.
I'm investigating the possibility of using Neo4j to handle some of the queries of our Java web application that simply take too long to run on MSSQL, as they require so many joins on large tables, even with indexes in place.
I am, however, concerned about the time the ETL might take to complete, ultimately affecting how outdated the information may be when queried.
Can someone advise on either a production strategy or a toolkit / library that can assist in reading a production SQL Server database (using deltas if possible, to optimise) and updating a running instance of a Neo4j database? I imagine there will have to be some kind of mapping configuration, but the idea is to have this run in an automated manner, updating the Neo4j database with the contents of one or more SQL Server tables or views.
The direct way to connect a MS SQL database to a Neo4j database would be using the apoc.load.jdbc procedure.
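A minimal sketch of driving that from Java with the Neo4j Java driver (4.x assumed); the bolt URL, credentials, JDBC URL, labels and columns are all placeholders:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class SqlServerToNeo4j {

    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // apoc.load.jdbc pulls rows straight from SQL Server and yields them
            // to the rest of the Cypher statement, which upserts the nodes.
            String cypher =
                "CALL apoc.load.jdbc('jdbc:sqlserver://dbhost;databaseName=sales', " +
                "'SELECT id, name FROM dbo.Customer') YIELD row " +
                "MERGE (c:Customer {id: row.id}) SET c.name = row.name";
            session.run(cypher);
        }
    }
}
```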
For an initial load you can use Neo4j ETL (https://neo4j.com/blog/rdbms-neo4j-etl-tool/).
There is, however, no way around the fact that some planning and work will be involved if you want to keep two databases in sync continuously (and if the logic involved goes beyond a few simple queries). You might want to offload a delta every so often (monthly, daily, hourly, ...) into CSV files and load those with LOAD CSV, with the Cypher statement determining what needs to be added, removed, changed or connected.
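A sketch of applying such a delta file through the same Java driver session; the file name, label and columns are hypothetical, and deletions would need their own statement:

```java
import org.neo4j.driver.Session;

public class DeltaLoader {

    /** Upserts the rows of one delta CSV that has been copied into Neo4j's import directory. */
    static void applyDelta(Session session) {
        session.run(
            "LOAD CSV WITH HEADERS FROM 'file:///customer_delta.csv' AS line " +
            "MERGE (c:Customer {id: line.id}) " +
            "SET c.name = line.name, c.updatedAt = line.updated_at");
    }
}
```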
Sadly enough there's no such thing as a free lunch.
Hope this helps,
Tom
I am using Spring 2.5 and the Hibernate that goes with it. I'm running against an Oracle 11g database.
I have created my DAOs, which extend HibernateTemplate. Now I want to write a loader that inserts 5 million rows into my person table. I have written this in a simple-minded fashion: read a row from a CSV file, turn it into a Person, save it into the table, and keep doing this until the CSV file is empty.
The problem is that I run out of heap space at about 450,000 rows. So I doubled the heap from 1024m to 2048m, and now I run out of memory after about 900,000 rows.
Hmmmmm....
So I've read some things about turning off the query cache for Hibernate, but I'm not using an L2 cache, so I don't think this is the issue.
I've read some things about JDBC2 batching, but I don't think that applies to Hibernate.
So, I'm wondering if maybe there's a fundamental thing about Hibernate that I'm missing.
To be honest, I wouldn't be using Hibernate for that. ORMs are not designed to load millions of rows into DBs. Not saying that you can't, but it's a bit like digging a swimming pool with an electric drill; you'd use an excavator for that, not a drill.
In your case, I'd load the CSV directly into the DB with a loader utility that comes with the database. If you don't want to do that, then yes, batch inserts will be way more efficient. I don't think Hibernate lets you do that easily, though. If I were you I'd just use plain JDBC, or at most Spring JDBC.
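A minimal sketch of the plain-JDBC batch approach, with a hypothetical table layout and batch size:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class PersonBatchLoader {

    private static final int BATCH_SIZE = 1000;

    /** Inserts CSV rows in JDBC batches; table and column names are placeholders. */
    static void load(Connection conn, Iterable<String[]> csvRows) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO person (first_name, last_name) VALUES (?, ?)")) {
            int count = 0;
            for (String[] row : csvRows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();   // send the accumulated inserts to the database
                    conn.commit();       // keep the transaction (and undo) small
                }
            }
            ps.executeBatch();           // flush the final partial batch
            conn.commit();
        }
    }
}
```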
If you have complicated business logic in the entities and absolutely have to use Hibernate, you could flush every N records as Richard suggests. However, I'd consider that a pretty bad hack.
In my experience with EclipseLink, holding a single transaction open while inserting/updating many records results in the symptoms you've experienced.
You are working with an EntityManager of some sort (JPA or Hibernate specific - it's still managing entities). It's trying to keep the working set in memory for the life of the transaction.
A general solution was to commit and then restart the transaction after every N inserts; a typical N for me was 1000.
As a footnote, with some version (undefined, it's been a few years) of EclipseLink, a session flush/clear didn't solve the problem.
It sounds like you are running out of space due to your first-level cache (the Hibernate session). You can flush the Hibernate session periodically to keep memory usage down, and break up the work into chunks by committing every few thousand rows, keeping the database's transaction log from getting too big.
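A sketch of that chunked flush/clear/commit pattern with the Hibernate Session, using the question's Person entity and a placeholder chunk size:

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class PersonLoader {

    private static final int CHUNK_SIZE = 1000;

    /** Saves entities in chunks so neither the session nor the transaction grows unbounded. */
    static void load(SessionFactory sessionFactory, Iterable<Person> people) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        int count = 0;
        for (Person p : people) {
            session.save(p);
            if (++count % CHUNK_SIZE == 0) {
                session.flush();   // push the pending inserts down to JDBC
                session.clear();   // detach the entities so the first-level cache stays small
                tx.commit();       // keep the database transaction log small
                tx = session.beginTransaction();
            }
        }
        tx.commit();
        session.close();
    }
}
```

Pairing this with the hibernate.jdbc.batch_size property lets Hibernate send those inserts to the driver as JDBC batches rather than one statement at a time.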
But using Hibernate for a load task like that will be slow, because JDBC is slow. If you have a good idea of what the environment will be like, you have a cap on the amount of data, and you have a big enough processing window, then you can manage; but in a situation where you want it to work at multiple different client sites and you want to minimize the time spent figuring out why some client site's load job isn't working, you should go with the database's bulk-copy tool.
The bulk-copy approach means the database suspends all constraint checking, index building and transaction logging, and instead concentrates on slurping the data in as fast as possible. Because JDBC doesn't get anything like this level of cooperation from the database, it can't compete. At a previous job we replaced a JDBC loader task that took over 8 hours to run with a SQL*Loader task that took 20 minutes.
You do sacrifice database independence, but all databases have a bulk-copy tool (because DBAs rely on them), so you will have a very similar process for each database; only the executable you invoke and the way the file format is specified should change. And this way you make the best use of your processing window.
I've got an Oracle database that has two schemas in it which are identical. One is essentially the "on" schema, and the other is the "off" schema. We update data in the off schema and then switch the schemas behind an alias which our production servers use. Not a great solution, but it's what I've been given to work with.
My problem is that there is a separate application that will now be streaming data to the database (also handed to me) which is currently only updating the alias, which means it is only updating the "on" schema at any given time. That means that when the schemas get switched, all the data from this separate application vanishes from production (the schema it is in is now the "off" schema).
This application is using Hibernate 3.3.2 to update the database. There's Spring 3.0.6 in the mix as well, but not for the database updates. Finally, we're running on Java 1.6.
Can anyone point me in a direction for updating both the "on" and "off" schemas simultaneously that does not involve rewriting the whole DAO layer with Spring JDBC to manage two separate connection pools? I have not been able to find anything about getting Hibernate to do this. Thanks in advance!
You shouldn't be updating two separate databases this way, especially from the application's point of view. All it should know/care about is whether or not the data is there, not having to mess with two separate databases.
Frankly, this sounds like you may need to purchase an ETL tool. Even if you can't get it to update the 'on' schema from the 'off' one (fast enough to be practical), you will likely be able to use it to keep the two in sync (mirror changes from 'on' to 'off').
HA-JDBC is a replicating JDBC driver we investigated for a short while. It will automatically replicate all inserts and updates, and distribute all selects. There are other database-specific master-slave solutions as well.
On the other hand, I wouldn't recommend doing this for 4-8 hour procedures. Better to lock the database beforehand, update one database, then backup-restore a copy, and unlock again.
I have a Java app using a MySQL database through Hibernate. The database is really used as a persistence layer: it is read at the initial load of the program, and the records are then maintained in memory.
However, we are adding extra complexity where another process may change the database as well, and it would be nice for the changes to be reflected in the Java app. Yet I don't particularly like polling mechanisms that query the database every few seconds, especially since the database is rarely updated.
Is there a way to have a callback to listen to database changes? Would triggers help?
Or change both applications so the Java app is truly the owner of the MySQL database and exposes it as a service. You're coupling the two apps at the database level by doing what you're proposing.
If you have one owner of the data you can hide schema changes and such behind the service interface. You can also make it possible to have a publish/subscribe mechanism to alert interested parties about database changes. If those things are important to you, I'd reconsider letting another application access MySQL directly.
Is there a way to have a callback to listen to database changes? Would triggers help?
To my knowledge, such a thing doesn't exist and I don't think a trigger would help. You might want to check this similar question here on SO.
So, I'd expose a hook at the Java application level (it could be a simple servlet in the case of a webapp) to notify it after an update of the database and have it invalidate its cache.
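As a sketch, such a hook could be as small as a servlet that the other process calls right after it writes to MySQL; the callback is a placeholder for however the app actually holds its in-memory copy:

```java
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Hypothetical endpoint the other process POSTs to after it updates the database. */
public class CacheInvalidationServlet extends HttpServlet {

    /** Registered by the app at startup; reloads or drops the stale in-memory records. */
    public static volatile Runnable invalidateCallback = () -> { };

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        invalidateCallback.run();
        resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
    }
}
```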
Another way would be to use a self-compiled MySQL server with the patches from this project:
Project page: External Language Stored Procedures
Check this blog post for a more detailed introduction
Calling Java code in MySQL
One option would be to tail the binary logs (or set up a replication slave) and look for changes relevant to your application. This is likely to be quite an involved solution.
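One way to tail the binlog from Java is the mysql-binlog-connector-java library, sketched below; the host, credentials and event handling are placeholders, and the MySQL user needs replication privileges:

```java
import com.github.shyiko.mysql.binlog.BinaryLogClient;
import com.github.shyiko.mysql.binlog.event.EventType;

public class BinlogWatcher {

    public static void main(String[] args) throws Exception {
        BinaryLogClient client = new BinaryLogClient("localhost", 3306, "repl_user", "secret");
        client.registerEventListener(event -> {
            EventType type = event.getHeader().getEventType();
            if (type == EventType.EXT_WRITE_ROWS
                    || type == EventType.EXT_UPDATE_ROWS
                    || type == EventType.EXT_DELETE_ROWS) {
                // React to the row change, e.g. refresh the in-memory copy.
                System.out.println("Row change event: " + event);
            }
        });
        client.connect(); // blocks and streams binlog events as they are written
    }
}
```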
Another would be to add a "last_updated" indexed column to the relevant tables (you can even have MySQL update this automatically) and poll for changes since the last time you checked. The queries should be very cheap.
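A minimal polling sketch over JDBC, with hypothetical table and column names; in MySQL the column can be declared TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP so it maintains itself:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class ChangePoller {

    private Timestamp lastSeen = new Timestamp(0);

    /** Fetches rows modified since the previous check; run this every 30-60 seconds. */
    void poll(Connection conn) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, last_updated FROM my_table "
              + "WHERE last_updated > ? ORDER BY last_updated")) {
            ps.setTimestamp(1, lastSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastSeen = rs.getTimestamp("last_updated");
                    // Refresh the in-memory copy of this row here.
                }
            }
        }
    }
}
```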
Instead of caching the database contents within the memory space of the Java app, you could use an external cache like memcached or Ehcache. When either process updates (or reads) from the database, have it update memcached as well.
This way whenever either process updates the DB, its updates will be in the cache that the other process reads from.
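For example, with the spymemcached client (one of several client options); the server address, key and expiry are placeholders:

```java
import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class SharedCacheExample {

    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // Whichever process writes to MySQL also refreshes the shared cache entry.
        cache.set("record:42", 3600, "freshly updated row");

        // The other process reads the cache first and falls back to the DB on a miss.
        Object value = cache.get("record:42");
        if (value == null) {
            // Load the row from MySQL here and cache.set(...) it for next time.
        }
        System.out.println(value);

        cache.shutdown();
    }
}
```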