Postgres trigger to update Java cache

I have a Java web app (WAR deployed to Tomcat) that keeps a cache (Map<Long,Widget>) in memory. I have a Postgres database that contains a widgets table:
widget_id (INT) | widget_name (VARCHAR(50)) | widget_value (INT)
To O/R map between Widget POJOs and widgets table records, I am using MyBatis. I would like to implement a solution whereby the Java cache (the Map) is updated in real-time whenever a value in the widgets table changes. I could have a polling component that checks the table every, say, 30 seconds, but polling just doesn't feel like the right solution here. So here's what I'm proposing:
Write a Postgres trigger that calls a stored procedure (run_cache_updater())
The procedure in turns runs a shell script (run_cache_updater.sh)
The script base-64 encodes the changed widgets record and then cURLs the encoded record to an HTTP URL
The Java WAR has a servlet listening on the cURLed URL and handles any HttpServletRequests sent to it. It base-64 decodes the record and somehow transforms it into a Widget POJO.
The cache (Map<Long,Widget>) is updated with the correct key/value.
This solution feels awkward, and so I am first wondering how any Java/Postgres gurus out there would handle such a situation. Is polling the better/simpler choice here (am I just being stubborn?) Is there another/better/more standard solution I am overlooking?
If not, and this solution is the standard way of pushing changed records from Postgres to the application layer, then I'm choking on how to write the trigger, stored procedure, and shell script so that the entire widgets record gets passed into the cURL statement. Thanks in advance for any help here.

I can't speak to MyBatis, but I can tell you that PostgreSQL has a publish/subscribe system baked in, which would let you do this with much less hackery.
First, set up a trigger on widgets that runs on every insert, update, and delete operation. Have it extract the primary key and issue NOTIFY widgets_changed with the id as the payload. (Well, from PL/pgSQL, you'd probably want PERFORM pg_notify(...).) PostgreSQL will broadcast your notification if and when that transaction commits, making both the notification and the corresponding data changes visible to other connections.
In the client, you'd want to run a thread dedicated to keeping this map up-to-date. It would connect to PostgreSQL, LISTEN widgets_changed to start queueing notifications, SELECT * FROM widgets to populate the map, and wait for notifications to arrive. (Checking for notifications apparently involves polling the JDBC driver, which sucks, but not as bad as you might think. See PgNotificationPoller for a concrete implementation.) Once you see a notification, look up the indicated record and update your map. Note that it's important to LISTEN before the initial SELECT *, since records could be changed between SELECT * and LISTEN.
This approach doesn't require PostgreSQL to know anything about your application. All it has to do is send notifications; your application does the rest. There are no shell scripts, no HTTP, and no callbacks, letting you reconfigure/redeploy your application without also having to reconfigure the database. It's just a database, and it can be backed up, restored, replicated, etc. with no extra complications. Similarly, your application has no extra complexity: all it needs is a connection to PostgreSQL, which you already have.
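For completeness, here is a rough sketch of what the trigger and the listening thread could look like. The widgets table and widgets_changed channel come from the question; the connection details, the Widget constructor, and the driver version (pgjdbc 42.x for the blocking getNotifications(timeout)) are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class WidgetCacheListener implements Runnable {

    /*
     * Assumed trigger on the database side (sketch):
     *   CREATE OR REPLACE FUNCTION widgets_notify() RETURNS trigger AS $$
     *   BEGIN
     *     PERFORM pg_notify('widgets_changed',
     *                       COALESCE(NEW.widget_id, OLD.widget_id)::text);
     *     RETURN NULL;
     *   END $$ LANGUAGE plpgsql;
     *   CREATE TRIGGER widgets_changed AFTER INSERT OR UPDATE OR DELETE
     *     ON widgets FOR EACH ROW EXECUTE PROCEDURE widgets_notify();
     */

    private final Map<Long, Widget> cache = new ConcurrentHashMap<>();

    @Override
    public void run() {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "app", "secret")) {
            // LISTEN before the initial SELECT so no change slips between the two
            try (Statement st = conn.createStatement()) {
                st.execute("LISTEN widgets_changed");
            }
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT widget_id, widget_name, widget_value FROM widgets")) {
                while (rs.next()) {
                    cache.put(rs.getLong(1),
                              new Widget(rs.getLong(1), rs.getString(2), rs.getInt(3)));
                }
            }
            PGConnection pg = conn.unwrap(PGConnection.class);
            while (!Thread.currentThread().isInterrupted()) {
                PGNotification[] notes = pg.getNotifications(10_000); // block up to 10 s
                if (notes == null) {
                    continue;
                }
                for (PGNotification n : notes) {
                    long id = Long.parseLong(n.getParameter()); // payload is the primary key
                    Widget w = loadWidget(conn, id);
                    if (w == null) {
                        cache.remove(id);  // row was deleted
                    } else {
                        cache.put(id, w);  // row was inserted or updated
                    }
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    private Widget loadWidget(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT widget_id, widget_name, widget_value FROM widgets WHERE widget_id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                        ? new Widget(rs.getLong(1), rs.getString(2), rs.getInt(3))
                        : null;
            }
        }
    }
}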

Related

How to know if there is a change in the table?

We have a PLM system where users go and create/update objects (e.g. Products, Colorways, etc.). These objects eventually get stored in a SQL Server database. The tables have a modifyTimeStamp column, which holds the timestamp of the last time a user updated an object.
We are integrating this tool with another application. This other application needs to know when someone creates/updates objects in our PLM system.
What's the best way to achieve this? Writing some kind of listener which keeps listening and, if there is a change in the table, sends a notification?
The other approach could be a trigger. But then how will my code be called by that trigger, since triggers only operate within the scope of that table?
I think there are many ways to go about solving this problem. I will try to describe a few.
Creating a scheduler on the listening application. I suggest implementing a scheduler that runs at a given interval, fetches the latest data according to the modify timestamp, and processes it.
Creating a new API on the listening application and calling it from the creating/updating application.
Using a microservice architecture, such as messaging services between the applications, to inform one another of creation/update events.
I hope it will help you and good luck!
SQL Server has a feature called "Change Tracking". It must first be activated for a database. If it is activated, you can issue special queries that return information about data changes in a specific table.
According to the example in the docs, the query
DECLARE @last_sync_version bigint;
SET @last_sync_version = <value obtained from query>;
SELECT [Emp ID], SSN,
    SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION,
    SYS_CHANGE_COLUMNS, SYS_CHANGE_CONTEXT
FROM CHANGETABLE (CHANGES Employees, @last_sync_version) AS C;
would return the data changes in the Employees table since @last_sync_version.
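If the listening application is in Java, the same query can be driven from JDBC. A rough sketch, assuming the Employees table from the example above, a placeholder connection string, and the Microsoft SQL Server JDBC driver on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ChangeTrackingPoller {

    public static void main(String[] args) throws SQLException, InterruptedException {
        String url = "jdbc:sqlserver://localhost;databaseName=Plm;user=app;password=secret";
        long lastSyncVersion = 0L; // persist this between runs in a real application
        try (Connection conn = DriverManager.getConnection(url)) {
            while (true) {
                long currentVersion;
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SELECT CHANGE_TRACKING_CURRENT_VERSION()")) {
                    rs.next();
                    currentVersion = rs.getLong(1);
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "DECLARE @v bigint = ?; " +
                        "SELECT [Emp ID], SYS_CHANGE_OPERATION, SYS_CHANGE_VERSION " +
                        "FROM CHANGETABLE(CHANGES Employees, @v) AS C")) {
                    ps.setLong(1, lastSyncVersion);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // SYS_CHANGE_OPERATION is I, U or D
                            System.out.printf("Employee %s changed (%s)%n",
                                    rs.getString(1), rs.getString(2));
                        }
                    }
                }
                lastSyncVersion = currentVersion;
                Thread.sleep(30_000); // re-check every 30 seconds
            }
        }
    }
}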

Strategies to build real-time data grid using Oracle, Java & AngularJS

This is my first Stack Overflow post, so please forgive me if I am not being specific enough. I'm sure I will learn the process gradually.
I have built a JSON to be displayed in an Angular data grid. This JSON comes from a complex query over a materialized view. My thinking on how to refresh the JSON as the underlying data changes is as follows:
a) Register query for Oracle CQRN (Oracle Continuous Query Result Change Notification) at application startup
b) When the underlying data changes, the Oracle database change listener on the Java side gets invoked; I re-query the data (with the change) and push it to a socket end-point. That way the JSON gets refreshed with the latest data.
This works fine with simple query.
Issues are:
a) In my case the query is very complex and involves multiple materialized views with UNION ALL and complex JOINS. CQRN does not support materialized view registration for query result change.
b) The query I am registering at start-up, for query result change notification, is pretty static. It does not meet the requirement of various different parameterized queries behind the data-grid.
Can anyone suggest any other alternative, for example caching the grid data in the middle tier and refreshing the cache with updated data whenever the underlying grid data changes? I should be notified when the underlying grid data changes so I can re-query and send the updated data to the socket end-point, which will refresh the grid.
I have to show the grid-data changes in real-time, so I have used Java WebSocket (JSR 356)
Technology stack:
UI: JavaScript/AngularJS
Middle-tier: Java 1.7
Server: Jetty 9.2
Database: Oracle 11g R2
Build Platform: Maven 3.3
Suggestion for any other suitable approach also will be much appreciated.
Thanks & regards,
- Joy
While not directly answering your question, we just implemented a real-time data grid involving multiple data sources and CQRN. This built-in mechanism is driven by a table changing. Our technique was:
add an on-insert trigger (our data feed was real time, no deletes, no updates) to the base tables
call a stored procedure to manipulate the data. You would use the logic from your materialized view here. The procedure inserts data into a destination table, which has a trigger that fires the CQRN notification.
often with real-time data you need to delete old data so everything stays fast

Synchronizing table data across databases

I have one table that records its row insert/update timestamps on a field.
I want to synchronize data in this table with another table on another db server. The two db servers are not connected and synchronization is one-way (master/slave). Using table triggers is not suitable.
My workflow:
I use a global last_sync_date parameter and query table Master for the changed/inserted records
Output the resulting rows to xml
Parse the xml and update table Slave using updates and inserts
The complexity of the problem rises when dealing with deleted records of the Master table. To catch the deleted records I think I have to maintain a log table for the previously inserted records and use SQL "NOT IN". This becomes a performance problem when dealing with large datasets.
What would be an alternative workflow dealing with this scenario?
It sounds like you need a transactional message queue.
How this works is simple. When you update the master db you can send a message to the message broker (describing whatever the update was), which can go to any number of queues. Each slave db can have its own queue, and because queues preserve order the process should eventually synchronize correctly (ironically this is sort of how most RDBMSs do replication internally).
Think of the message queue as a sort of SCM change-list or patch-list database. That is, for the most part the same (or roughly the same) SQL statements sent to the master should eventually be replicated to the other databases. Don't worry about losing messages, as most message queues support durability and transactions.
I recommend you look at spring-amqp and/or spring-integration especially since you tagged this question with spring-batch.
Based on your comments:
See Spring Integration: http://static.springsource.org/spring-integration/reference/htmlsingle/ .
Google SEDA. Whether you go this route or not you should know about Message queues as it goes hand-in-hand with batch processing.
RabbitMQ has a good picture diagram of how messaging works
The contents of your message might be the entire row and whether it is a CREATE, UPDATE, or DELETE. You can use whatever format you like (e.g. JSON; see Spring Integration for recommendations).
You could even send the direct SQL statements as a message!
BTW, your concern about NOT IN being a performance problem is not a very good one, as there are a plethora of workarounds; but given that you don't want to do DB-specific things (like triggers and replication), I still feel a message queue is your best option.
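If you do go the message-queue route, publishing a change message is only a few lines. A sketch with the plain RabbitMQ Java client 5.x (spring-amqp wraps the same thing); the queue name, host, and JSON payload are made-up examples:

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class ChangePublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker location
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Durable queue; the slave's consumer drains it in order
            channel.queueDeclare("slave-sync", true, false, false, null);
            String change = "{\"op\":\"UPDATE\",\"table\":\"orders\",\"id\":42}";
            channel.basicPublish("", "slave-sync",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    change.getBytes(StandardCharsets.UTF_8));
        }
    }
}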
EDIT - Non MQ route
Since I gave you a tough time about asking this question, I will continue to try to help.
Besides the message queue you can do some sort of XML file like you were trying before. THE CRITICAL FEATURE you need in the schema is a CREATE TIMESTAMP column on your master database so that you can do the batch processing while the system is up and running (otherwise you will have to stop the system). Now if you go this route you will want to SELECT * WHERE CREATE_TIME < ?, where ? is the current time. Basically you're only getting the rows at a snapshot.
Now on your other database, for the deletes you're going to remove rows by joining on an ID table (that is, you can use JOINs instead of the slow NOT IN). Luckily you only need all the ids for the delete and not the other columns. The other columns you can handle with a delta based on the update timestamp column (for update, and create aka insert).
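A bare-bones JDBC sketch of that flow (the table and column names here are invented; the two pieces to note are the snapshot predicate on the timestamp column and the anti-join used instead of NOT IN for deletes):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;

public class SnapshotSync {

    /** On the master: pull rows changed since the last sync but before the snapshot time. */
    static void exportDelta(Connection master, Timestamp lastSync, Timestamp snapshot)
            throws SQLException {
        try (PreparedStatement ps = master.prepareStatement(
                "SELECT id, payload, update_time FROM items " +
                "WHERE update_time >= ? AND update_time < ?")) {
            ps.setTimestamp(1, lastSync);
            ps.setTimestamp(2, snapshot);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // write each row to the XML export / upsert it into the slave
                }
            }
        }
    }

    /** On the slave: remove rows whose ids are no longer present in the imported id list. */
    static void deleteMissing(Connection slave) throws SQLException {
        try (Statement st = slave.createStatement()) {
            // master_ids is a staging table loaded with all current master ids;
            // the anti-join (NOT EXISTS) replaces the slow NOT IN
            st.executeUpdate(
                "DELETE FROM items WHERE NOT EXISTS " +
                "(SELECT 1 FROM master_ids m WHERE m.id = items.id)");
        }
    }
}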
I am not sure about the solution. But I hope these links may help you.
http://knowledgebase.apexsql.com/2007/09/how-to-synchronize-data-between.htm
http://www.codeproject.com/Tips/348386/Copy-Synchronize-Table-Data-between-databases
Have a look at Oracle GoldenGate:
Oracle GoldenGate is a comprehensive software package for enabling the replication of data in heterogeneous data environments. The product set enables high availability solutions, real-time data integration, transactional change data capture, data replication, transformations, and verification between operational and analytical enterprise systems.
SymmetricDS:
SymmetricDS is open source software for multi-master database replication, filtered synchronization, or transformation across the network in a heterogeneous environment. It supports multiple subscribers with one direction or bi-directional asynchronous data replication.
Daffodil Replicator:
Daffodil Replicator is a Java tool for data synchronization, data migration, and data backup between various database servers.
Why don't you just add a TIMESTAMP column that indicates the last update/insert/delete time? Then add a deleted column -- ie. mark the row as deleted instead of actually deleting it immediately. Delete it after having exported the delete action.
In case you cannot alter schema usage in an existing app:
Can't you use triggers at all? How about a second ("hidden") table that gets populated with every insert/update/delete and which would constitute the content of the next XML export file to be generated? That is a common concept: a history (or "log") table. It would have its own progressing id column which can be used as an export marker.
Very interesting question.
In my case I had enough RAM to load all ids from the master and slave tables and diff them.
If the ids in the master table are sequential, you may try to maintain a set of fully-filled ranges in the master table (ranges in which every id is used, without gaps, like 100, 101, 102, 103).
To find removed ids without loading all of them into memory, you may execute an SQL query that counts the number of records with id >= full_region.start and id <= full_region.end for each fully-filled region. If the result of the query == (full_region.end - full_region.start) + 1, it means no record in the region was deleted. Otherwise, split the region into 2 parts and do the same check for both of them (in a lot of cases only one side contains removed records).
Below some range length (about 5000, I think) it becomes faster to load all present ids and check for absent ones using a Set.
It also makes sense to load all ids into memory for a batch of small (10-20 record) regions.
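A sketch of that recursive range check in Java (the table/column names and the 5000 threshold follow the description above; everything else is assumed):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeletedIdFinder {
    private static final long SMALL_RANGE = 5_000; // below this, just load the ids

    /** Collects ids that existed in [start, end] on the master but are gone now. */
    static void findDeleted(Connection conn, long start, long end, List<Long> deleted)
            throws SQLException {
        long expected = end - start + 1; // count if nothing in the range was deleted
        if (countInRange(conn, start, end) == expected) {
            return; // nothing deleted in this range
        }
        if (expected <= SMALL_RANGE) {
            Set<Long> present = loadIds(conn, start, end);
            for (long id = start; id <= end; id++) {
                if (!present.contains(id)) deleted.add(id);
            }
            return;
        }
        long mid = start + (end - start) / 2; // split and recurse; often only
        findDeleted(conn, start, mid, deleted);   // one half contains deletions
        findDeleted(conn, mid + 1, end, deleted);
    }

    private static long countInRange(Connection conn, long start, long end) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT COUNT(*) FROM master_table WHERE id BETWEEN ? AND ?")) {
            ps.setLong(1, start);
            ps.setLong(2, end);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }

    private static Set<Long> loadIds(Connection conn, long start, long end) throws SQLException {
        Set<Long> ids = new HashSet<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM master_table WHERE id BETWEEN ? AND ?")) {
            ps.setLong(1, start);
            ps.setLong(2, end);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) ids.add(rs.getLong(1));
            }
        }
        return ids;
    }
}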
Make a history table for the table that needs to be synchronized (basically a duplicate of that table, with a few extra fields perhaps) and insert the entire row every time something is inserted/updated/deleted in the active table.
Write a Spring Batch job to sync the data to the slave machine based on the history table's extra fields.
hope this helps..
A potential option for allowing deletes within your current workflow:
In the case that the trigger restriction is limited to triggers with references across databases, a possible solution within your current workflow would be to create a helper table in your Master database to store only the unique identifiers of the deleted rows (or whatever unique key would enable you to most efficiently delete your deleted rows).
Those ids would need to be inserted by a trigger on your master table on delete.
Using the same mechanism as your insert/updates, create a task following your inserts and updates. You could export your helper table to xml, as you noted in your current workflow.
This task would simply delete the rows out of the slave table, then delete all data from your helper table following completion of the task. Log any errors from the task so that you can troubleshoot this since there is no audit trail.
If your database has a transaction dump log, just ship that one.
It is possible with MySQL and should be possible with PostgreSQL.
I would agree with another comment - this requires the use of triggers. I think another table should hold the history of your SQL statements. See this answer about using 2008 extended events... Then, you can get the entire SQL and store the resulting query in the history table. It's up to you whether you want to store it as a MySQL query or an MSSQL query.
Here's my take. Do you really need to deal with this? I assume that the slave is for reporting purposes. So the question I would ask is how up to date should it be? Is it ok if the data is one day old? Do you plan a nightly refresh?
If so, forget about this online sync process, download the full tables; ship it to the mysql and batch load it. Processing time might be a lot quicker than you think.

Sync data between two JPA applications

I wrote an application that uses JPA (and hibernate as persistence provider).
It works on a database with several tables.
I need to create an "offline mode", where a copy of the program, which acts as a client, provides the same functionality while keeping its data synchronized with the server when it is reachable.
The aim is to get a client that you can "detach" from the server, make changes on the data and then merge changes back. A bit like a revision control system.
It is not important to manage conflicts automatically; in that case the user will decide which version to keep.
My idea, though it can't quite work, was to assign to each row in the database a last-edit timestamp. The client initially downloads a copy of the entire database and also records a second timestamp whenever it modifies a row while not connected to the server. In this way, it knows what data has changed and the last timestamp at which it was synchronized with the server. When it reconnects to the server, it asks for the data that have changed since the last synchronization and sends the data it has changed. (A bit simplified, but the management of conflicts should not be a big problem.)
This, of course, does not work in the case of deleting a row. If either the server or the client deletes a row, the other will never know about it.
The solution would be to maintain a table with the list of deleted rows, but it seems too expensive.
Does anyone know a method that works? Is there already something similar?
Versioning:
If you would like a simple solution, you can create version fields that act like your "timestamp".
Audit (Envers):
If you would like a more complex, powerful solution, you should use the Hibernate Envers plugin.
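For illustration, a minimal entity showing both options side by side (the Document class and its fields are invented; @Version is standard JPA, @Audited comes from the hibernate-envers module):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;
import org.hibernate.envers.Audited;

@Entity
@Audited            // Envers keeps a full revision history of this entity
public class Document {

    @Id
    private Long id;

    private String title;

    @Version        // incremented by JPA on every update; usable as a cheap "timestamp"
    private long version;

    // getters/setters omitted
}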

Continuously fetch data from database using Java

I have a scenario where my Java program has to continuously communicate with the database table, for example my Java program has to get the data of my table when new rows are added to it at runtime. There should be continuous communication between my program and database.
If the table has 10 rows initially and 2 rows are added by the user, it must detect this and return the rows.
My program shouldn't use AJAX and timers.
If the database you are using is Oracle, consider using triggers that call a Java stored procedure, which notifies your client of changes in the DB (using JMS, RMI or whatever you want).
Without Ajax and timers, it does not seem possible to do this task.
I have also faced the same issue, where I needed to push some data from the server to the client when it changed.
For this, you can use server push, AKA "Comet" programming.
In Comet:
we make a channel between the client and the server, where the client subscribes to a particular channel.
The server puts its data in the channel when it has some.
When the client reads the channel, it gets all the data in the channel and the channel is emptied.
So every time the client reads from the channel, it will get new data only.
Also, to monitor DB changes, you can have two things:
some trigger/timer (check out Quartz Scheduler)
an event-based mechanism, which pushes data into the channel on particular events.
Basically, the client can't know anything happening on the server side, so you must push some data or an event to tell the client "I have some new data, please call some method." It's a kind of notification. So please look into Comet/server push with event notification.
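To make the channel idea concrete, here is a rough long-poll sketch using the Servlet 3.0 async API; the URL, timeout, and the publish() hook are illustrative assumptions, not part of the original answer:

import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/updates", asyncSupported = true)
public class UpdatesServlet extends HttpServlet {
    // Clients currently parked on a long-poll request
    private final Queue<AsyncContext> waiting = new ConcurrentLinkedQueue<>();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        ctx.setTimeout(30_000);   // give up after 30 s; the client simply re-polls
        waiting.add(ctx);
    }

    /** Called by whatever detects a DB change (trigger notification, scheduler, ...). */
    public void publish(String json) {
        AsyncContext ctx;
        while ((ctx = waiting.poll()) != null) {
            try {
                ctx.getResponse().setContentType("application/json");
                ctx.getResponse().getWriter().write(json);
            } catch (IOException ignored) {
                // client went away; nothing to do
            } finally {
                ctx.complete();
            }
        }
    }
}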
hope this helps.
thanks.
Not the simplest problem, really.
Let's divide it into 2 smaller problems:
1) how to enable reloading without timers and ajax
2) how to implement server side
There is no way to notify clients from the server. So you need to use Flash, Silverlight, JavaFX, or applets to create a thick client. If the problem with Ajax is that you don't know how to use it for this problem, then you can investigate some ready-to-use libraries of JSP tags or JSF components with Ajax support.
If you have only 1 server then just add a cache. If there are several servers then consider using distributed caches.
If you have a low-traffic database you could implement a thread that rapidly checks for updates to the DB (polling).
If you have a high-traffic DB I wouldn't recommend that, because polling creates a lot of additional traffic.
Having the server notify clients is not a good idea (consider a scenario with 1000 clients). Do you use some persistence layer, or do you have to stick to pure JDBC?
If you have binary logs turned on in MySQL, you can see all of the transactions that occur in the database.
A portable way to do this is adding a timestamp column (create date) which indicates when the row was added to the table. After the initial load of the content you simply poll for new content with a WHERE clause on create_date (e.g. create_date greater than the time of the last poll). In case rows could have identical timestamps, you need to filter duplicates before adding them.
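A small sketch of that polling approach in Java (the events table, its columns, and the 5-second interval are examples; the key point is remembering the high-water mark between polls):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NewRowPoller {
    private Timestamp lastSeen = new Timestamp(0); // high-water mark of create_date

    public void start(Connection conn) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> poll(conn), 0, 5, TimeUnit.SECONDS);
    }

    private void poll(Connection conn) {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, payload, create_date FROM events " +
                "WHERE create_date > ? ORDER BY create_date")) {
            ps.setTimestamp(1, lastSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // with a strict '>' you may miss rows sharing the last timestamp;
                    // use '>=' and de-duplicate by id if that matters
                    System.out.println("new row: " + rs.getLong("id"));
                    lastSeen = rs.getTimestamp("create_date");
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}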
