Implementing Database Listener - java

I'm developing a Java desktop application with Swing as the GUI. The app contains services that query the database every second to keep the interface in sync with the database. We all know that with this approach, performance is the enemy.
What I want to achieve is that whenever a change is made in the database (altered through psql, the Postgres command line, for example), my app is notified so it can update the UI. That way, performance can be improved.
Thanks!

As @a_horse_with_no_name points out, PostgreSQL supports asynchronous notification channels for just this purpose.
You should create a trigger, probably in plpgsql, on the table(s) you wish to monitor. This trigger fires a NOTIFY when the table changes, optionally including the changed data itself.
The application LISTENs on the notification channel(s) and processes any asynchronous notifications it receives.
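The PostgreSQL JDBC driver picks up pending notifications when it talks to the server, so the application polls by sending a query. Note that it's valid to send an empty query for that, and you should do that instead of SELECT 1, e.g.: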
stmt.execute("");
IIRC even that's optional if you're not using SSL; there's a way to poll purely client-side. I don't remember what it is, though, so that's not much help.
You can determine exactly what changed by using a trigger-maintained change list table, or by using payloads on your notifications. Or you can simply re-read the whole table if it's small.
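To illustrate the payload option, here is a minimal sketch of decoding a notification payload on the Java side. The `table,operation,id` payload format and the `ChangeEvent` class are assumptions made up for this example; the trigger would have to build its NOTIFY payload in that shape, and the string would come from whatever notification object your driver hands you.

```java
import java.util.Locale;

/** One row-level change decoded from a payload in the hypothetical "table,operation,id" format. */
final class ChangeEvent {
    final String table;
    final String operation;
    final long rowId;

    private ChangeEvent(String table, String operation, long rowId) {
        this.table = table;
        this.operation = operation;
        this.rowId = rowId;
    }

    /** Parses a payload such as "orders,UPDATE,42"; rejects anything that does not have three fields. */
    static ChangeEvent parse(String payload) {
        String[] parts = payload.split(",", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected payload: " + payload);
        }
        return new ChangeEvent(
                parts[0].trim(),
                parts[1].trim().toUpperCase(Locale.ROOT),
                Long.parseLong(parts[2].trim()));
    }

    /** True when a view that displays the given table should refresh. */
    boolean affects(String shownTable) {
        return table.equalsIgnoreCase(shownTable);
    }
}
```

A Swing view could then call `ChangeEvent.parse(payload)` and reload only the affected row when `affects(...)` returns true, instead of re-reading the whole table.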

Related

Implement a callback mechanism that notifies for INSERTS/UPDATES

I have a SQL Server instance and I would like to get a callback each time data is added to or updated in certain tables.
My initial thoughts are:
I would like to avoid solutions like log shipping, linked servers, transactional replication, change tracking, change data capture, or Integration Services, because I want to be as non-invasive to the server as possible.
Create a trigger that somehow sends callback requests to my Java service.
Can I achieve such a thing using a simple trigger like this:
CREATE TRIGGER my_trigger ON my_table
AFTER INSERT, UPDATE
AS
BEGIN
-- call something that notifies my Java service
END

2 way syncing with Google Calendar/Outlook

I am using FullCalendar in my application to display events created via our own application.
I have an add/edit form for creating/updating events. These events are stored in the db used by application.
I need to go further and sync Google and Outlook calendars with my calendar. This should be two-way syncing, i.e.:
If I create/edit/delete an event in my calendar, it should be created/edited/deleted in the Google/Outlook calendars.
It should work the other way too.
If I make a change in the Google/Outlook calendars, it should be visible in my calendar.
I would like your thoughts on implementing this:
Should I fetch all the events from Google/Outlook, import them into my db, and then display them in my calendar view? Is this even technically possible, i.e., importing the entire set of events of a channel into my db?
Or should I just do a GET via the Google/Outlook API to fetch the events for the view I am currently showing in my calendar (I will have the start date and end date of my calendar view) and just show them in my calendar (i.e., without storing those external events in my db)? If a user wants to change an event, the change would go directly to the Google/Outlook calendars via their create/update/delete API calls.
What should be the best approach?
Edit:
I went to https://calendar.sunrise.am/ (a calendar sync web app) and noticed that they allow calendars/tasks from many different applications to be synced into their calendar.
Seeing all that, I feel that storing all the events of all those applications in our own application's db is not feasible. If any change is made to those events via my application, I should call the API of the owning application (Google Calendar, Outlook, etc.) to make that change there.
What are your thoughts?
To be able to create a reliable sync solution you need several things. Most important is that the other party (Google Calendar and Outlook in this case) cooperates with you and provides an API for incremental synchronization. I didn't look at Outlook, but the Google Calendar API provides all you need.
First, to answer your question: yes, you need to fetch all events (you may skip events in the past, though) and store them in your own database. Always querying all external sources (plus your own database) is slow, makes synchronization much harder, and limits you quite a lot, because you cannot, for example, easily filter or search events across multiple sources. Below I will assume we are working with Google Calendar only; hopefully Outlook is similar (but I did not check).
So, a checklist of what you need:
Your own database with events, where the event table has several important metadata columns: Created/Updated (time when the event record was created or last updated, not related to the date of the event itself), Source (where this event came from, like Google Calendar, Outlook or your own app), UpdatedAtSource (the source where this event was last modified), EventID (unique identifier of the event, which is important to prevent duplicates in certain cases).
Initially, fetch all events from the target provider and store them in your database. The Google Calendar API reference describes the structure of an event, and you can see that all required metadata fields (created, updated, id) are present there.
Now you need to watch for new events coming from the provider. You can do this either by polling (periodically checking whether there are new events) or by making the provider push events to you. Google Calendar supports both options. Its documentation describes how to implement push notifications and how to get only new events, that is, events your application hasn't seen before. Note that you don't need to fetch the whole list every time, nor do you need to provide filter options (like "give me all events created after 2016-06-21"). All of that would be unreliable, but the Google Calendar developers know how to make a good sync API, so they took care of it for you. Just grab and store the provided nextSyncToken and use it to make future requests. If you use push notifications, always also poll for events periodically, but not often (like once every several hours). Push notifications are not 100% reliable and some can be missed; you need to catch those using the nextSyncToken API.
Push changes made by your own application to the target providers. But do not do this immediately when the change itself is made. Instead, use a background process which pushes changes for each user + provider pair one by one. There will be network failures and there will be conflicts, so you have to push changes sequentially, not in parallel (again, sequentially for every user + provider pair, not globally). Store the timestamp of the last successfully pushed change (again, for every user + provider pair) so that if the process is interrupted, you know where to start over.
I will not cover it much here, but you will have conflicts, that is, cases where the user modified the same event in multiple sources. However, if you use push notifications, conflicts will be very rare. Still, you have to plan for them, at least in the user interface. If you detect an unresolvable conflict, pause the synchronization process and ask the user how to resolve it.
So you see that there is some work to do, but in the end you will make a small number of requests and fetch a small amount of data with each request to the provider, and your users will be happy to see new events from their Google Calendar/Outlook in your application immediately (and vice versa).
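The pull side of the checklist above can be sketched roughly as follows. This is not the Google Calendar client library: `FakeProvider`, `SyncPage` and `CalendarSync` are hypothetical in-memory stand-ins, and the token is simply a position in a change log. It only shows the shape of the loop: ask for changes since the stored token, apply them keyed by event id, then store the new token.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One incremental response: the changes plus the token to use next time (hypothetical type). */
final class SyncPage {
    final List<String[]> changes; // each entry: {eventId, title}
    final String nextSyncToken;

    SyncPage(List<String[]> changes, String nextSyncToken) {
        this.changes = changes;
        this.nextSyncToken = nextSyncToken;
    }
}

/** In-memory stand-in for a provider that supports incremental sync (an assumption, not a real API). */
final class FakeProvider {
    private final List<String[]> changeLog = new ArrayList<>();

    void upsert(String eventId, String title) {
        changeLog.add(new String[] {eventId, title});
    }

    /** Returns everything recorded after the position encoded in syncToken (null means full sync). */
    SyncPage changesSince(String syncToken) {
        int from = syncToken == null ? 0 : Integer.parseInt(syncToken);
        List<String[]> changes = new ArrayList<>(changeLog.subList(from, changeLog.size()));
        return new SyncPage(changes, Integer.toString(changeLog.size()));
    }
}

final class CalendarSync {
    final Map<String, String> localEvents = new LinkedHashMap<>(); // eventId -> title
    private String syncToken; // a real system would persist this per user + provider pair

    /** Pulls only what changed since the last run and returns how many changes were applied. */
    int pull(FakeProvider provider) {
        SyncPage page = provider.changesSince(syncToken);
        for (String[] change : page.changes) {
            localEvents.put(change[0], change[1]); // keyed by event id, so repeats overwrite instead of duplicating
        }
        syncToken = page.nextSyncToken; // store only after the changes were applied
        return page.changes.size();
    }
}
```

Calling `pull` from both the push-notification handler and a slow periodic timer covers the case where a push notification is missed, because the stored token makes the second call cheap when nothing changed.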

Listen for Changes In Cassandra Datastore?

I wonder if it is possible to add a listener to Cassandra getting the table and the primary key for changed entries? It would be great to have such a mechanism.
Checking Cassandra documentation I only find adding StateListener(s) to the Cluster instance.
Does anyone know how to do this without hacking Cassandra's data store or wrapping the driver and doing it on my own?
Check out this JIRA ticket for a future feature:
https://issues.apache.org/jira/browse/CASSANDRA-8844
If you like it vote for it : )
CDC
"In databases, change data capture (CDC) is a set of software design
patterns used to determine (and track) the data that has changed so
that action can be taken using the changed data. Also, Change data
capture (CDC) is an approach to data integration that is based on the
identification, capture and delivery of the changes made to enterprise
data sources."
-Wikipedia
As Cassandra is increasingly being used as the Source of Record (SoR)
for mission critical data in large enterprises, it is increasingly
being called upon to act as the central hub of traffic and data flow
to other systems. In order to try to address the general need, we,
propose implementing a simple data logging mechanism to enable
per-table CDC patterns.
If clients need to know about changes, the world has mostly moved to the message broker model: a middleman which connects producers and consumers of arbitrary data. You can read about Kafka, RabbitMQ, and NATS here. There is an older DZone article here. In your case, the client writing to the database would also send out a change message. What's nice about this model is that you can then pull whatever you need from the database.
Kafka is interesting because it can also store data. In some cases, you might be able to dispose of the database altogether.
Are you looking for something like triggers?
https://github.com/apache/cassandra/tree/trunk/examples/triggers
A database trigger is procedural code that is automatically executed
in response to certain events on a particular table or view in a
database. The trigger is mostly used for maintaining the integrity of
the information on the database. For example, when a new record
(representing a new worker) is added to the employees table, new
records should also be created in the tables of the taxes, vacations
and salaries.

Denormalization in Google App Engine?

Background:
I'm working with Google App Engine (GAE) for Java. I'm struggling to design a data model that plays to Bigtable's strengths and weaknesses. These are two previous related posts:
Database design - google app engine
Appointments and Line Items
I've tentatively decided on a fully normalized backbone with denormalized properties added into entities so that most client requests can be serviced with only one query.
I reason that a fully normalized backbone will:
Help maintain data integrity if I code a mistake in the denormalization
Enable writes in one operation from a client's perspective
Allow for any type of unanticipated query on the data (provided one is willing to wait)
While the denormalized data will:
Enable most client requests to be serviced very fast
Basic denormalization technique:
I watched an App Engine video describing a technique referred to as "fan-out." The idea is to make quick writes to normalized data and then use the task queue to finish up the denormalization behind the scenes without the client having to wait. I've included the video here for reference, but it's an hour long and there's no need to watch it in order to understand this question:
http://code.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html
If I use this "fan-out" technique, every time the client modifies some data, the application would update the normalized model in one quick write and then fire off the denormalization instructions to the task queue so the client does not have to wait for them to complete as well.
Problem:
The problem with using the task queue to update the denormalized version of the data is that the client could make a read request on data that they just modified before the task queue has completed the denormalization of that data. This would provide the client with stale data that is inconsistent with their recent request, confusing the client and making the application appear buggy.
As a remedy, I propose fanning out denormalization operations in parallel via asynchronous calls to other URLs in the application via URLFetch: http://code.google.com/appengine/docs/java/urlfetch/ The application would wait until all of the asynchronous calls had completed before responding to the client request.
For example, say I have an "Appointment" entity and a "Customer" entity. Each appointment would include a denormalized copy of the information for the customer it is scheduled for. If a customer with 30 appointments changed their first name, the application would make 30 asynchronous calls, one to each affected appointment resource, in order to change the copy of the customer's first name in each one.
In theory, this could all be done in parallel. All of this information could be updated in roughly the time it takes to make 1 or 2 writes to the datastore. A timely response could be made to the client after the denormalization was completed, eliminating the possibility of the client being exposed to inconsistent data.
The biggest potential problem I see with this is that the application cannot have more than 10 asynchronous request calls going at any one time (documented here: http://code.google.com/appengine/docs/java/urlfetch/overview.html).
Proposed denormalization technique (recursive asynchronous fan-out):
My proposed remedy is to send denormalization instructions to another resource that recursively splits the instructions into equal-sized smaller chunks, calling itself with the smaller chunks as parameters until the number of instructions in each chunk is small enough to be executed outright. For example, if a customer with 30 associated appointments changed the spelling of their first name, I'd call the denormalization resource with instructions to update all 30 appointments. It would then split those instructions up into 10 sets of 3 instructions and make 10 asynchronous requests to its own URL with each set of 3 instructions. Once the instruction set was less than 10, the resource would then make asynchronous requests outright as per each instruction.
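The splitting step alone can be sketched as below. This is only the chunking logic, run locally and sequentially; it makes no URLFetch calls, and `FanOutSplitter` and its `limit` parameter are names invented for the sketch. One simplification: it treats a chunk of at most `limit` instructions as small enough to execute outright.

```java
import java.util.ArrayList;
import java.util.List;

final class FanOutSplitter {
    /** Splits instructions into batches of at most `limit` items, creating at most `limit` sub-chunks per level. */
    static List<List<String>> split(List<String> instructions, int limit) {
        if (limit < 2) {
            throw new IllegalArgumentException("limit must be at least 2");
        }
        List<List<String>> batches = new ArrayList<>();
        splitInto(instructions, limit, batches);
        return batches;
    }

    private static void splitInto(List<String> instructions, int limit, List<List<String>> batches) {
        if (instructions.size() <= limit) {
            // Small enough: this batch would be executed outright, one request per instruction.
            if (!instructions.isEmpty()) {
                batches.add(new ArrayList<>(instructions));
            }
            return;
        }
        // Ceiling division so that no more than `limit` chunks are produced at this level.
        int chunkSize = (instructions.size() + limit - 1) / limit;
        for (int start = 0; start < instructions.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, instructions.size());
            splitInto(instructions.subList(start, end), limit, batches); // the recursive "call itself" step
        }
    }
}
```

With 30 instructions and a limit of 10 this yields 10 batches of 3, matching the example in the paragraph above.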
My concerns with this approach are:
It could be interpreted as an attempt to circumvent App Engine's rules, which would cause problems. (It's not even allowed for a URL to call itself, so I'd in fact need two URL resources that handle the recursion by calling each other.)
It is complex with multiple points of potential failure.
I'd really appreciate some input on this approach.
This sounds awfully complicated, and the more complicated the design the more difficult it is to code and maintain.
Assuming you need to denormalize your data, I'd suggest just using the basic denormalization technique, but keep track of which objects are being updated. If a client requests an object which is being updated, you know you need to query the database to get the updated data; if not, you can rely on the denormalized data. Once the task queue finishes, it can remove the object from the "being updated" list, and everything can rely on the denormalized data.
A sophisticated version could even track when each object was edited, so a given object would know if it had already been updated by the task queue.
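A minimal sketch of that "being updated" list, assuming plain in-memory maps in place of the datastore and the task queue. `DenormalizedReader` and its method names are hypothetical, and `finishDenormalization` stands for whatever the queued task would do.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class DenormalizedReader {
    final Map<String, String> normalized = new ConcurrentHashMap<>();   // source of truth
    final Map<String, String> denormalized = new ConcurrentHashMap<>(); // fast copy that may lag behind
    private final Set<String> beingUpdated = ConcurrentHashMap.newKeySet();

    /** Client write: mark the key as pending, then update the source of truth. */
    void write(String key, String value) {
        beingUpdated.add(key);
        normalized.put(key, value);
    }

    /** Background task: refresh the denormalized copy, then clear the pending mark. */
    void finishDenormalization(String key) {
        String latest = normalized.get(key);
        if (latest != null) {
            denormalized.put(key, latest);
        }
        beingUpdated.remove(key);
    }

    /** Reads fall back to the normalized data only while the key is still pending. */
    String read(String key) {
        return beingUpdated.contains(key) ? normalized.get(key) : denormalized.get(key);
    }

    boolean isPending(String key) {
        return beingUpdated.contains(key);
    }
}
```

This simple form can clear the mark too early if a second write arrives while the task is running; tracking an edit time or version per object, as suggested above, is one way to close that gap.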
It sounds like you are re-implementing materialized views: http://en.wikipedia.org/wiki/Materialized_view
I suggest the easy solution with Memcache. Upon an update from your client, you could save an entry in Memcache storing the Key of the updated Entity with the status 'updating'. When your task finishes, it deletes the Memcache status. You would then check the status before a read, allowing the user to be correctly informed if the Entity is still 'locked'.

JMS message. Model to include data or pointers to data?

I am trying to resolve a design difference of opinion where neither of us has experience with JMS.
We want to use JMS to communicate between a j2ee application and the stand-alone application when a new event occurs. We would be using a single point-to-point queue. Both sides are Java-based. The question is whether to send the event data itself in the JMS message body or to send a pointer to the data so that the stand-alone program can retrieve it. Details below.
I have a j2ee application that supports data entry of new and updated persons and related events. The person records and associated events are written to an Oracle database. There are also stand-alone, separate programs that contribute new person and event records to the database. When a new event occurs through any of 5-10 different application functions, I need to notify remote systems through an outbound interface using an industry-specific standard messaging protocol. The outbound interface has been designed as a stand-alone application to support scalability through asynchronous operation and by moving it to a separate server.
The j2ee application currently has most of the data in memory at the time the event is entered. The data would consist of approximately 6 different objects (a person object and several others, some with multiple instances), for an average size in the range of 3,000 to 20,000 bytes. Some special cases could be many times this amount.
From a performance and reliability perspective, should I model the JMS message to pass all the data needed to create the interface message, or model the JMS message to contain record keys for the data and have the stand-alone Java application retrieve the data to create the interface message?
I wouldn't just focus on performance for the decision, but also on other non-functional considerations.
I've been working on a system where we decided to not send the data in the message, but rather the PK of the data in database. Our approach was closer to the command message pattern. Our choice was motivated by the following reasons:
Data size: we would store the data in a BLOB because it could be huge. In your case, the data probably fits in a message anyway.
Message loss: we planned for the worst. If the messages were lost, we could recover the data and we had a recovery procedure to resubmit the messages. It may look paranoid, but here are two scenarios that could lead to messages being lost: (1) the queue is purged by mistake; (2) an error occurs and messages can't be delivered for a long time. They go to the dead message queue (DMQ), which eventually reaches its limit and starts discarding messages if it is not configured correctly.
Monitoring: different messages/commands could update the same row in the database. That was easy to monitor and troubleshoot.
Using JMS plus a database did, however, complicate the design a bit:
Distributed transactions: this adds some complexity, and sometimes some problems. Distributed transactions have subtle differences from "regular" transactions, such as the distributed timeout.
Persistence: the code is less intuitive. Data must first be persisted to obtain the PK, which leads to some complexity in the code if an ORM is used.
I guess both approaches can work. I've described what led us to not send the data in the message, but your system and requirements might be different, so it might still be easier to send the data in the message in your case. I can not provide a definitive answer, but I hope it helps you make your decision.
Send the data, not the pointer. I wouldn't consider your messages to be an extraordinary size that can't be handled.
It will be no problem for the queue to handle the data; the messages in the queue are persisted anyway (memory, file or database persistence, whatever fits the size of your queue better).
If you just put a handle to the data in the queue, the application that processes the queue will do unnecessary work to get data that the sender already has.
Based on your question alone, I cannot say what's best in your case. Sure, there are performance implications because of the message size and so on, but first you need to know which information your message consumer needs to send to the remote system, especially in a system which may have concurrent updates to the same data.
What matters is whether you need to keep the information stored in the remote system in sync with the version of the record just stored in your database, and whether you want to propagate a complete history to the remote system that the message receiver updates. A lot of time may pass between the message being sent and its processing at the other end of the queue.
Assume (for some reason) there are a whole lot of messages in the queue, and within a few seconds or minutes three or four update notifications for the same object hit the queue. Assume the first message is processed after the fourth update to the record has finished and its update notification has been put in the queue. When you only pass along the ID of the record, all four messages would perform exactly the same operation on the remote system, which for one thing is entirely superfluous. In addition, the remote system sees four updates, all the same, but has no information about the three intermediate states of the object; thus the history, if relevant, is lost for this system.
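The scenario above can be reproduced with a small simulation. No JMS is involved: the queue is an `ArrayDeque`, the database is a `HashMap`, and the record key `person:7` and the class `MessageModelDemo` are made up for the sketch. All four updates are enqueued before the consumer starts, as in the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class MessageModelDemo {
    /** Applies four updates to one record, then drains the queue; returns the states the consumer saw. */
    static List<String> consume(boolean sendFullData) {
        Map<String, String> database = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        String recordId = "person:7"; // hypothetical record key

        // Producer side: four quick updates, each followed by a notification message.
        for (String version : new String[] {"v1", "v2", "v3", "v4"}) {
            database.put(recordId, version);
            queue.add(sendFullData ? version : recordId);
        }

        // Consumer side: starts only after all four updates have been written.
        List<String> seen = new ArrayList<>();
        String message;
        while ((message = queue.poll()) != null) {
            // With ID-only messages the consumer has to look up the current row.
            seen.add(sendFullData ? message : database.get(message));
        }
        return seen;
    }
}
```

With ID-only messages the consumer sees `v4` four times; with the data in the message it sees `v1` through `v4` in order, which is the history the paragraph above says would otherwise be lost.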
Beside these semantic implications, technical reasons for passing the id or the whole data are whether it's cheaper to unwrap the updated information from the message body or to load them from the database. This depends on how you want to serialize/deserialize the contents. The message sizes you provided should be no problem for decent JMS implementation when you want to send the data along.
When serializing Java objects into messages, you need to keep the class format in sync between sender and consumer, and you have to empty the queue before you can update to a newer version of the class on the consuming side. Of course, the same applies to database updates when you just pass along the ID.
When you just send the ID to the consumer, you will have additional database connections. This might also be relevant, depending on the load on the database and how complex the queries are that you need to execute to get the objects.
