2-way syncing with Google Calendar/Outlook - Java

I am using FullCalendar in my application to display events created via our own application.
I have an add/edit form for creating/updating events. These events are stored in the db used by the application.
I now need to go a step further and sync Google and Outlook calendars into my calendar. This should be 2-way syncing, i.e.
If I create/edit/delete an event in my calendar it should be created/edited/deleted in the Google/Outlook calendars.
It should work vice versa too:
If I make a change in the Google/Outlook calendars it should be visible in my calendar.
I would like your thoughts on implementing this:
Should I fetch all the events from Google/Outlook, import them into my db and then display them in my calendar view? Is this even technically possible, i.e. importing the entire set of events of a calendar into my db?
Or should I just do a GET via the Google/Outlook API to fetch the events for the particular view I am currently on in my calendar (I will have the start date and end date of my calendar view) and simply show them in my calendar, without storing those external events in my db? If a user then wants to change one of those events, the change would be made directly in Google/Outlook via their create/update/delete API calls.
What would be the best approach?
Edit:
I went to https://calendar.sunrise.am/ (one of the calendar sync web apps) and noticed
that they allow calendars/tasks from many different applications to be synced into their calendar.
Seeing all that, I feel that storing all the events of all those applications in our own application's db is not feasible. If any change is made to those events via my application, I should call the API of the relevant application to make that change in their application (Google Calendar, Outlook etc.).
What are your thoughts?

To build a reliable sync solution you need several things. The most important is that the other party (Google Calendar and Outlook in this case) cooperates with you and provides an API for incremental synchronization. I didn't look at Outlook, but the Google Calendar API provides everything you need.
First, to answer your question - yes, you need to fetch all events (you may skip events in the past, though) and store them in your own database. Always querying all external sources (plus your own database) is slow, makes synchronization much harder and limits you quite a lot, because you cannot, for example, easily filter or search events across multiple sources. Below I will assume we are working with Google Calendar only; hopefully Outlook is similar (but I did not check).
So here is a checklist of what you need:
Your own database with events, where the event table has several important metadata columns: Created/Updated (time when the event was last created or updated, not related to the date of the event itself), Source (where this event came from, e.g. Google Calendar, Outlook or your own app), UpdatedAtSource (where this event was last modified), EventID (unique identifier of the event - important to have this to prevent duplicates in certain cases).
Initially fetch all events from the target provider and store them in your database. Here is a reference to the structure of a Google Calendar event, and you can see that all the required metadata fields (created, updated, id) are present there.
Now you need to watch for new events coming from the provider. You can either do this by polling (periodically checking whether there are new events) or by having the provider push events to you. Google Calendar supports both options. Here is a link describing how to implement push notifications and here is the link describing how to get only new events, that is, events your application hasn't seen before. Note that you don't need to fetch the whole list every time, nor do you need to pass filter options (like "give me all events created after 2016-06-21") - all of that would be unreliable. The Google Calendar developers know how to make a good sync API, so they took care of it for you: just grab and store the provided nextSyncToken and use it for future requests (see the sketch right after this checklist). If you use push notifications, still poll for events periodically, but not often (say, once every few hours). Push notifications are not 100% reliable and some can be missed - you handle those misses with the nextSyncToken API.
Push changes made by your own application to the target providers. But do not do this immediately when the change is made. Instead, use a background process that pushes changes for each user+provider pair one by one. There will be network failures and there will be conflicts, so you have to push changes sequentially, not in parallel (again, sequentially per user+provider pair, not globally). Store the timestamp of the last successfully pushed change (again, per user+provider), so if the process is interrupted you know where to start over (a minimal worker sketch follows at the end of this answer).
I will not cover this in much detail here, but you will have conflicts - that is, a user modifying the same event in multiple sources. However, if you use push notifications, conflicts will be very rare. Still, you have to plan for them, at least in the user interface. If you detect an unresolvable conflict, pause the synchronization process and ask the user how to resolve it.
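To make the nextSyncToken flow concrete, here is a minimal sketch using the Google Calendar v3 Java client. It assumes an already-authorized Calendar client; EventStore is a hypothetical abstraction over your own event table, invented for this example:
// Minimal sketch of incremental sync against Google Calendar (v3 Java client).
// Assumes an already-authorized com.google.api.services.calendar.Calendar client;
// EventStore is a hypothetical abstraction over your own event table.
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.calendar.Calendar;
import com.google.api.services.calendar.model.Event;
import com.google.api.services.calendar.model.Events;

public class GoogleCalendarSync {

    public interface EventStore {
        void upsert(String source, String eventId, Event event);   // keyed by (Source, EventID)
        void deleteBySourceId(String source, String eventId);
        void clearSource(String source);
    }

    // Returns the nextSyncToken to persist and pass into the next run.
    public String sync(Calendar service, EventStore store, String savedSyncToken) throws Exception {
        String pageToken = null;
        Events page;
        do {
            Calendar.Events.List request = service.events().list("primary");
            if (savedSyncToken != null) {
                request.setSyncToken(savedSyncToken);   // incremental: only changes since last sync
            }
            request.setPageToken(pageToken);
            try {
                page = request.execute();
            } catch (GoogleJsonResponseException e) {
                if (e.getStatusCode() == 410) {          // sync token expired: wipe and resync fully
                    store.clearSource("google");
                    return sync(service, store, null);
                }
                throw e;
            }
            if (page.getItems() != null) {
                for (Event event : page.getItems()) {
                    if ("cancelled".equals(event.getStatus())) {
                        store.deleteBySourceId("google", event.getId());
                    } else {
                        store.upsert("google", event.getId(), event);
                    }
                }
            }
            pageToken = page.getNextPageToken();
        } while (pageToken != null);
        return page.getNextSyncToken();    // only present on the last page
    }
}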
So you see there is some work to do, but in the end you will make a small number of requests and fetch a small amount of data with each request to the provider, and your users will be happy to see new events from their Google Calendar/Outlook in your application immediately (and vice versa).
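And here is a minimal sketch of the sequential push worker described above. ChangeLog, LocalChange and ProviderClient are hypothetical application types standing in for your own change log and per-provider API wrapper:
// Sketch of a background worker that pushes local changes to one external provider,
// strictly one change at a time per user+provider pair. All nested types are
// hypothetical stand-ins for your own change log and provider API wrapper.
import java.util.List;

public class PushWorker implements Runnable {

    public enum ChangeType { CREATED, UPDATED, DELETED }

    public interface LocalChange {
        ChangeType type();
        String eventId();
        Object eventData();
        long timestamp();
    }

    public interface ChangeLog {
        long lastPushedTimestamp(String userId, String provider);
        List<LocalChange> changesAfter(String userId, String provider, long timestamp);
        void markPushed(String userId, String provider, long timestamp);
    }

    public interface ProviderClient {   // wraps the provider's create/update/delete calls
        void createEvent(String eventId, Object eventData) throws Exception;
        void updateEvent(String eventId, Object eventData) throws Exception;
        void deleteEvent(String eventId) throws Exception;
    }

    private final String userId;
    private final String provider;       // e.g. "google" or "outlook"
    private final ChangeLog changeLog;
    private final ProviderClient client;

    public PushWorker(String userId, String provider, ChangeLog changeLog, ProviderClient client) {
        this.userId = userId;
        this.provider = provider;
        this.changeLog = changeLog;
        this.client = client;
    }

    @Override
    public void run() {
        // Resume from the last successfully pushed change for this user+provider pair.
        long lastPushed = changeLog.lastPushedTimestamp(userId, provider);
        for (LocalChange change : changeLog.changesAfter(userId, provider, lastPushed)) {
            try {
                switch (change.type()) {
                    case CREATED: client.createEvent(change.eventId(), change.eventData()); break;
                    case UPDATED: client.updateEvent(change.eventId(), change.eventData()); break;
                    case DELETED: client.deleteEvent(change.eventId()); break;
                }
                // Only advance the checkpoint once the provider has confirmed the change.
                changeLog.markPushed(userId, provider, change.timestamp());
            } catch (Exception e) {
                // Network failure or conflict: stop and retry later from the stored checkpoint.
                break;
            }
        }
    }
}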

Related

Axon Framework vs GDPR (physically delete personal data from the Domain Event Entry table)

Let's have a look at this very interesting present-day topic, the General Data Protection Regulation (GDPR), and make clear what Axon's best answer to the problem below is. (I am using Axon 4.1 with Spring Boot.)
Let me introduce my problem:
A user comes along and, for example, wants to book an appointment, for which he must enter his email, phone number etc. - a lot of personal data. Before the user clicks Enter he has to accept a privacy statement which states how long we store his personal information. When the user clicks Enter, the backend event-sources all the information the user entered. All his private data is stored in Axon's Domain Event Entry table, in the events created for the aggregate. The user's personal data can be found in the payload.
So when the storage time expires I have to remove all the personal data from all my tables, including Axon's Domain Event Entry table.
My question is how to physically remove an aggregate from the Domain Event Entry table.
I tried this solution:
@EventSourcingHandler
public void on(CampaignDeletedEvent event) {
    // marks the aggregate as deleted; the events themselves stay in the event store
    markDeleted();
}
But it does not seem to do anything. The API says: "Marks this aggregate as deleted, instructing a repository to remove that aggregate at an appropriate time." - Should it remove it physically? It isn't doing so; I waited for 30 minutes and the aggregate is still in the table. What does "appropriate time" mean?
After my failed attempt to delete, I read this Stack Overflow question (Axon Framework: Delete Aggregate Root) where Allard said this in the comment section: "That's correct. With Event Sourcing, "delete" doesn't really exist. It's just a state like any other, except that on a "deleted" state, all commands are rejected."
OK. So this means my aggregate is dead, but the user's personal data is still there in the payload field of the Domain Event Entry table for that aggregate?
So do I have to somehow create a repository and delete it, or do it with an SQL script? How are you doing this? I might be wrong and outdated about new features, but if the authority comes, the fine is $$$$$$$$$$$$$$$$
Thanks,
Máté
Event Sourcing mandates that the state change of the application isn't explicitly stored in the database as the new state (overwriting the previous state) but as an immutable series of events. You should not delete these events or change their content. This way you don't lose any data/information: everything that happened in the system is stored. Information is far more valuable than the price of storage these days, so don't throw it away ;)
But some attributes of an event should not be readable by all consumers, and we should be able to delete them without touching the event store (the series of events). One common solution to this problem is to encrypt the sensitive attributes with a different encryption key for each resource, and only give the key to consumers that require it. When the sensitive information needs to be erased, delete the encryption key instead, ensuring the information can never be accessed again. This effectively makes all copies and backups of the sensitive data unusable. This pattern is known as Crypto-Shredding. The Crypto-Shredding pattern is of course only as good as your encryption and key management practices, and in my opinion it is a better option than just running a delete on the SQL table (do you really delete all the data that way - what about logs?).
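As an illustration only (not the Axon Data Protection module mentioned below, just a hand-rolled sketch of the idea): encrypt sensitive attributes with a per-aggregate key kept outside the event store, and "forget" the data by deleting that key. KeyRepository is a hypothetical key store:
// Hand-rolled crypto-shredding sketch (not the Axon Data Protection module).
// Each aggregate gets its own AES key stored outside the event store; deleting the key
// makes the persisted ciphertext permanently unreadable. KeyRepository is hypothetical.
// A production version should use an authenticated mode such as AES/GCM with a random IV.
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class CryptoShredder {

    public interface KeyRepository {    // e.g. a separate table: aggregateId -> key material
        SecretKey findKey(String aggregateId);
        void saveKey(String aggregateId, SecretKey key);
        void deleteKey(String aggregateId);
    }

    private final KeyRepository keys;

    public CryptoShredder(KeyRepository keys) {
        this.keys = keys;
    }

    public String encryptForAggregate(String aggregateId, String sensitiveValue) throws Exception {
        SecretKey key = keys.findKey(aggregateId);
        if (key == null) {
            key = KeyGenerator.getInstance("AES").generateKey();
            keys.saveKey(aggregateId, key);
        }
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] cipherText = cipher.doFinal(sensitiveValue.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipherText);
    }

    public String decryptForAggregate(String aggregateId, String cipherText) throws Exception {
        SecretKey key = keys.findKey(aggregateId);
        if (key == null) {
            return null;   // the key was shredded: the personal data is gone for good
        }
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return new String(cipher.doFinal(Base64.getDecoder().decode(cipherText)), StandardCharsets.UTF_8);
    }

    public void shred(String aggregateId) {
        keys.deleteKey(aggregateId);   // GDPR erasure: the events stay, the data becomes unreadable
    }
}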
Axon provides a commercial module, the Axon Data Protection module (https://axoniq.io/product-overview/axon-data-protection), for this purpose.

Axon recreating aggregate state not clear

I am working on my first Axon application and I can't figure out the use of the aggregates. I understand that every time a command handler is called, the aggregate is recreated from all its events, but I don't understand what other use recreating the aggregate could have.
For example, when should I manually recreate an aggregate?
What is the benefit of the aggregate being recreated every time I send a command?
The way I set up my application, I use an aggregate view to persist the data I need into the database. So now I feel like the events are just stored in the event store and are only used to recreate the aggregate after I send a command. Is there nothing else I should do with the stored events and the recreation of the aggregate? Shouldn't I, for example, recreate the entire aggregate instead of fetching the aggregate view out of my database by ID to update it?
The idea behind event sourcing your Aggregate is that these events are the source for any model within your system.
Thus, if you create a dedicated Command Model handling the commands as you describe, then this model (which from Axon's perspective is the @Aggregate(Root) annotated class) will be sourced from the events it has published.
Additionally, you can introduce any type of Query Model you want: an RDBMS view, a text-based search solution (e.g. Elastic), a time series database, you name it. All of these Query Models are, however, still part of the same root application your Aggregate resides in. As you have the events as the means to notify others of decisions being made, it comes naturally to (re)use those to update all your Query Models as well.
Now, it is perfectly true that you are not required to use Event Sourcing for your Aggregates in Axon; an aggregate stored as plain state is, from the framework's perspective, called a State-Stored Aggregate. If you do this, however, you'll be back to having distinct models in distinct storage mechanisms, without a single source of truth.
So, to circle back to your question with this added knowledge, I'd state the following:
For example, when should I manually recreate an aggregate?
You never need to recreate the Aggregate as the Command Model yourself; the framework does this for you. If you have a mirrored Query Model of the Aggregate, you would recreate it whenever you have added/removed/changed fields within the model, or if you have introduced entirely new models.
What is the benefit of the aggregate being recreated every time I send a command?
The benefit of recreating it every time is the assurance that you will always be using the latest state, even if between releases of your application you have added/changed/removed fields. The @EventSourcingHandler annotated methods simply fill them in, without the need for you to, for example, write a database script to adjust things directly at the database level.
To conclude, the reason for this approach lies entirely within the architectural concepts supported through Axon. You can read up on them on AxonIQ's Architectural Concepts page if you want; I am sure it will clarify things even further.
Hope this helps you out, @Gisrou8! If not, please come back with more questions; I'd gladly explain things further.
Update: Further Command Model explanation
In the comment Gisrou8 placed under my response, it becomes apparent that the unease with this approach mainly resides in the state of the Aggregate.
As shared in my earlier response, the Aggregate as modeled with Axon Framework should, in an event-sourced set-up, be regarded as the Command Model in a CQRS system.
One of the main pillars of the Command Model is that the only state it contains is the state required for decision-making logic. To be more specific, the only state stored in your Aggregate is the state used to decide whether a Command Handler should accept the incoming command and publish an event as a result.
Thus, the sole fields you would introduce in your Aggregate alongside the Aggregate Identifier are the fields you need to drive these decisions.
This is what the Command Model is intended for, so do not worry about this point.
To answer any queries within your application, you'd introduce a dedicated Query Model which is updated as a result of the events published by the Command Handlers within the Aggregate. It is exactly this segregation which is the strong suit of this model, as it allows for better scaling, performance improvements or required team separation, among other non-functional requirements.
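To make this Command Model / Query Model split a bit more tangible, here is a minimal projection sketch in Axon terms. The nested event records and the in-memory "view" are hypothetical examples invented for illustration, not something from the question:
// Sketch of a dedicated Query Model ("projection") in Axon Framework. It listens to the
// events published by the Aggregate and keeps a simple view up to date. The nested event
// records are hypothetical; the "view" is an in-memory map to keep things short, where a
// real application would update a database table instead.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.axonframework.eventhandling.EventHandler;

public class AppointmentProjection {

    // Hypothetical events, normally published by the Aggregate's command handlers.
    public record AppointmentScheduledEvent(String appointmentId, String customerName, String date) {}
    public record AppointmentRescheduledEvent(String appointmentId, String newDate) {}

    // appointmentId -> human-readable summary of the appointment.
    private final Map<String, String> view = new ConcurrentHashMap<>();

    @EventHandler
    public void on(AppointmentScheduledEvent event) {
        view.put(event.appointmentId(), event.customerName() + " on " + event.date());
    }

    @EventHandler
    public void on(AppointmentRescheduledEvent event) {
        // The projection is rebuilt purely from events; the Aggregate is never loaded here.
        view.computeIfPresent(event.appointmentId(),
                (id, summary) -> summary.replaceFirst(" on .*$", " on " + event.newDate()));
    }

    // Queries are answered from this model, never from the Command Model.
    public String findAppointment(String appointmentId) {
        return view.get(appointmentId);
    }
}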

Listen for Changes In Cassandra Datastore?

I wonder if it is possible to add a listener to Cassandra that reports the table and the primary key of changed entries? It would be great to have such a mechanism.
Checking the Cassandra documentation, I only find the option of adding StateListener(s) to the Cluster instance.
Does anyone know how to do this without hacking Cassandra's data store or wrapping the driver and doing something on my own?
Check out this upcoming JIRA ticket:
https://issues.apache.org/jira/browse/CASSANDRA-8844
If you like it, vote for it :)
CDC
"In databases, change data capture (CDC) is a set of software design
patterns used to determine (and track) the data that has changed so
that action can be taken using the changed data. Also, Change data
capture (CDC) is an approach to data integration that is based on the
identification, capture and delivery of the changes made to enterprise
data sources."
-Wikipedia
As Cassandra is increasingly being used as the Source of Record (SoR)
for mission critical data in large enterprises, it is increasingly
being called upon to act as the central hub of traffic and data flow
to other systems. In order to try to address the general need, we,
propose implementing a simple data logging mechanism to enable
per-table CDC patterns.
If clients need to know about changes, the world has mostly gone to the message broker model: a middleman which connects producers and consumers of arbitrary data. You can read about Kafka, RabbitMQ, and NATS here. There is an older DZone article here. In your case, the client writing to the database would also send out a change message. What's nice about this model is that you can then pull whatever you need from the database.
Kafka is interesting because it can also store data. In some cases, you might be able to dispose of the database altogether.
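To illustrate the "client writing to the database also sends out a change message" idea, here is a minimal sketch with Kafka's Java producer; the topic name and the payload layout are arbitrary choices for the example:
// Sketch: after writing to Cassandra, the client publishes a change message so
// interested consumers learn which table/primary key was touched.
// The topic name and message layout are arbitrary choices for the example.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ChangePublisher {

    private final KafkaProducer<String, String> producer;

    public ChangePublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Call this right after the successful Cassandra write.
    public void publishChange(String table, String primaryKey) {
        String payload = table + ":" + primaryKey;   // consumers re-read the row from Cassandra
        producer.send(new ProducerRecord<>("cassandra-changes", primaryKey, payload));
    }

    public void close() {
        producer.close();
    }
}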
Are you looking for something like triggers?
https://github.com/apache/cassandra/tree/trunk/examples/triggers
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record (representing a new worker) is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries.

Implementing Database Listener

I'm developing a Java desktop application with a Swing GUI. The app contains services that query the database every second to keep the interface in sync with the database. We all know that with this approach, performance is the enemy.
What I want to achieve is that for every change made to the database, for example altered through psql (the Postgres command line), my app should be notified so it can update the UI. That way performance can be optimized.
Thanks!
As @a_horse_with_no_name points out, PostgreSQL supports asynchronous notification channels for just this purpose.
You should create a trigger, probably in plpgsql, on the table(s) you wish to monitor. This trigger fires a NOTIFY when the table changes, optionally including the changed data itself.
The application LISTENs on the notification channel(s) and processes any asynchronous notifications it receives.
Note that it's valid to send an empty query, and you should do that instead of SELECT 1. e.g.:
stmt.execute("");
IIRC even that's optional if you're not using SSL; there's a way to poll purely client-side. I don't remember what it is, though, so that's not really helpful.
You can determine exactly what changed by using a trigger-maintained change list table, or by using payloads on your notifications. Or you can simply re-read the whole table if it's small.
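Putting the pieces together on the Java side, a minimal sketch of the LISTEN loop with the PostgreSQL JDBC driver could look like this; the channel name table_changed is illustrative, and the plpgsql trigger that issues the matching NOTIFY is assumed to exist as described above:
// Sketch of a LISTEN loop with the PostgreSQL JDBC driver (pgjdbc).
// Assumes a plpgsql trigger on the watched table that issues
//   NOTIFY table_changed, '<some payload>';
// "table_changed" is an illustrative channel name.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class TableChangeListener {

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb", "user", "secret");
        PGConnection pgConn = conn.unwrap(PGConnection.class);

        try (Statement stmt = conn.createStatement()) {
            stmt.execute("LISTEN table_changed");
        }

        while (true) {
            // The empty query makes the driver pick up pending notifications.
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("");
            }
            PGNotification[] notifications = pgConn.getNotifications();
            if (notifications != null) {
                for (PGNotification n : notifications) {
                    // n.getParameter() carries the optional NOTIFY payload.
                    System.out.println("Change on channel " + n.getName() + ": " + n.getParameter());
                }
            }
            Thread.sleep(500);   // simple polling interval for the Swing app's background thread
        }
    }
}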

Denormalization in Google App Engine?

Background:::
I'm working with Google App Engine (GAE) for Java. I'm struggling to design a data model that plays to BigTable's strengths and works around its weaknesses; these are two previous related posts:
Database design - google app engine
Appointments and Line Items
I've tentatively decided on a fully normalized backbone with denormalized properties added into entities so that most client requests can be serviced with only one query.
I reason that a fully normalized backbone will:
Help maintain data integrity if I make a mistake in the denormalization code
Enable writes in one operation from a client's perspective
Allow for any type of unanticipated query on the data (provided one is willing to wait)
While the denormalized data will:
Enable most client requests to be serviced very fast
Basic denormalization technique:::
I watched an App Engine video describing a technique referred to as "fan-out." The idea is to make quick writes to the normalized data and then use the task queue to finish the denormalization behind the scenes without the client having to wait. I've included the video here for reference, but it's an hour long and there's no need to watch it in order to understand this question:
http://code.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html
If I use this "fan-out" technique, every time the client modifies some data, the application would update the normalized model in one quick write and then fire off the denormalization instructions to the task queue so the client does not have to wait for them to complete as well.
Problem:::
The problem with using the task queue to update the denormalized version of the data is that the client could make a read request on data they just modified before the task queue has completed the denormalization of that data. This would give the client stale data that is incongruent with their recent request, confusing the client and making the application appear buggy.
As a remedy, I propose fanning out the denormalization operations in parallel via asynchronous calls to other URLs in the application via URLFetch: http://code.google.com/appengine/docs/java/urlfetch/ The application would wait until all of the asynchronous calls had completed before responding to the client request.
For example, take an "Appointment" entity and a "Customer" entity. Each appointment would include a denormalized copy of the customer information for whoever it is scheduled for. If a customer changed their first name, the application would make 30 asynchronous calls, one to each affected appointment resource, in order to change the copy of the customer's first name in each one.
In theory, this could all be done in parallel. All of this information could be updated in roughly the time it takes to make 1 or 2 writes to the datastore. A timely response could be made to the client after the denormalization was completed eliminating the possibility of the client being exposed to incongruent data.
The biggest potential problem I see with this is that the application cannot have more than 10 asynchronous request calls in flight at any one time (documented here: http://code.google.com/appengine/docs/java/urlfetch/overview.html).
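For what it's worth, a sketch of that parallel fan-out with the asynchronous URLFetch API might look like the following. The /denormalize-appointment handler is a made-up path, and for more than 10 appointments the list would have to be chunked to respect the limit just mentioned, which is exactly what motivates the recursive scheme below:
// Sketch of the parallel fan-out via asynchronous URLFetch calls.
// "/denormalize-appointment" is a hypothetical handler in the app that updates the
// denormalized customer copy on a single appointment. In practice the list would have
// to be processed in batches of at most 10 because of the async URLFetch limit.
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;

public class DenormalizationFanOut {

    public void updateAppointments(String customerId, String newFirstName, List<String> appointmentIds) throws Exception {
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        List<Future<HTTPResponse>> inFlight = new ArrayList<>();

        for (String appointmentId : appointmentIds) {
            URL url = new URL("http://myapp.appspot.com/denormalize-appointment"
                    + "?appointment=" + appointmentId
                    + "&customer=" + customerId
                    + "&firstName=" + newFirstName);
            inFlight.add(fetcher.fetchAsync(url));
        }

        // Block until every denormalization call has finished before answering the client,
        // so the client never reads a half-updated set of appointments.
        for (Future<HTTPResponse> future : inFlight) {
            HTTPResponse response = future.get();
            if (response.getResponseCode() != 200) {
                throw new IllegalStateException("Denormalization call failed: " + response.getResponseCode());
            }
        }
    }
}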
Proposed denormalization technique (recursive asynchronous fan-out):::
My proposed remedy is to send the denormalization instructions to another resource that recursively splits the instructions into equal-sized smaller chunks, calling itself with the smaller chunks as parameters until the number of instructions in each chunk is small enough to be executed outright. For example, if a customer with 30 associated appointments changed the spelling of their first name, I'd call the denormalization resource with instructions to update all 30 appointments. It would then split those instructions into 10 sets of 3 and make 10 asynchronous requests to its own URL, one per set. Once an instruction set contained fewer than 10 instructions, the resource would execute the requests outright, one per instruction.
My concerns with this approach are:
It could be interpreted as an attempt to circumvent App Engine's rules, which would cause problems. (It's not even allowed for a URL to call itself, so I'd in fact need two URL resources that handle the recursion by calling each other.)
It is complex with multiple points of potential failure.
I'd really appreciate some input on this approach.
This sounds awfully complicated, and the more complicated the design the more difficult it is to code and maintain.
Assuming you need to denormalize your data, I'd suggest just using the basic denormalization technique, but keep track of which objects are being updated. If a client requests an object which is being updated, you know you need to query the database to get the updated data; if not, you can rely on the denormalized data. Once the task queue finishes, it can remove the object from the "being updated" list, and everything can rely on the denormalized data.
A sophisticated version could even track when each object was edited, so a given object would know if it had already been updated by the task queue.
It sounds like you are re-implementing Materialized Views: http://en.wikipedia.org/wiki/Materialized_view
I suggest an easy solution with Memcache. Upon an update from your client, you could save an entry in Memcache storing the key of the updated entity with the status 'updating'. When your task finishes, it deletes the Memcache status. You would then check the status before a read, allowing the user to be correctly informed if the entity is still 'locked'.
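A small sketch of that Memcache flag using the GAE Java Memcache API; the key prefix and the five-minute expiry are arbitrary choices for the example:
// Sketch of the Memcache "updating" flag suggested above (GAE Java API).
// The key prefix and the expiry are arbitrary choices for the example.
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class UpdateStatusTracker {

    private static final String PREFIX = "updating:";
    private final MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

    // Called right after the client's write, before the task queue runs.
    public void markUpdating(String entityKey) {
        // Expire after 5 minutes in case the task queue item is lost.
        memcache.put(PREFIX + entityKey, "updating", Expiration.byDeltaSeconds(300));
    }

    // Called by the task queue handler once denormalization is done.
    public void markDone(String entityKey) {
        memcache.delete(PREFIX + entityKey);
    }

    // Checked before serving a read from the denormalized data.
    public boolean isUpdating(String entityKey) {
        return memcache.get(PREFIX + entityKey) != null;
    }
}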
