Proper way to handle schema changes in MongoDB with the Java driver

I have an application which stores data in a cloud instance of MongoDB. To explain the requirement further: I currently have data organized at the collection level, like below.
collection_1 : [{doc_1}, {doc_2}, ... , {doc_n}]
collection_2 : [{doc_1}, {doc_2}, ... , {doc_n}]
...
...
collection_n : [{doc_1}, {doc_2}, ... , {doc_n}]
Note: each collection name is a unique ID representing that collection; in this explanation I'm using collection_1, collection_2, ... to represent those IDs.
I want to change this data model to a single-collection model, as below. The collection ID will be embedded in each document to uniquely identify the data.
global_collection: [{doc_x, collection_id : collection_1}, {doc_y, collection_id : collection_1}, ...]
The data access layer (insert, delete, update, and create operations) for this application is written in a Java backend.
Additionally, the entire application is deployed on a k8s cluster.
My requirement is to do this migration (the data access layer change and the existing data migration) with zero downtime and without impacting any operation of the application. Assume that my application is heavily used, with high concurrent traffic.
What is the proper way to handle this?
For example, considering the backend (data access layer) change, I might use temporary code in Java to support both models and do the migration using an external client. If so, what is the proper way to implement the change? Are there any specific design patterns for this?
A complete explanation would be highly appreciated.

I think you have honestly already hinted at the simplest answer.
First, update your data access layer to handle both the new and the old schema: inserts and updates should write to both in order to keep them in sync, while queries should only look at the old schema, as it is the source of record at this point.
Then copy all data from the old to the new schema.
Then update the data access layer to query the new data. This keeps the old data updated, but allows full testing of the new data before making any changes that would result in the two sets of data being out of sync. It also facilitates rolling updates (i.e., application instances running both the new and the old data access code can function at the same time).
Finally, update the data access layer to only access the new schema and then delete the old data.
Except for this final stage, you can always roll back to the previous version should you encounter problems.
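To make the dual-write phase concrete, here is a minimal sketch of what the data access layer could look like with the MongoDB Java driver. It assumes the collection and field names from the question (global_collection, collection_id); the class and method names are illustrative, not a prescribed implementation:

    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;

    import static com.mongodb.client.model.Filters.eq;

    public class DualWriteDao {
        private final MongoDatabase db;
        private final MongoCollection<Document> global;

        public DualWriteDao(MongoDatabase db) {
            this.db = db;
            this.global = db.getCollection("global_collection");
        }

        public void insert(String collectionId, Document doc) {
            // Old schema: one physical collection per logical collection ID
            db.getCollection(collectionId).insertOne(doc);
            // New schema: the same document, tagged with its logical collection ID
            global.insertOne(new Document(doc).append("collection_id", collectionId));
        }

        public Document findById(String collectionId, Object id) {
            // Reads stay on the old schema until the backfill is verified
            return db.getCollection(collectionId).find(eq("_id", id)).first();
        }
    }

Switching reads later is then a one-line change in findById (query global with the collection_id tag plus the _id filter), which is what makes the rollback story straightforward.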

Related

Efficient database synchronization between client and server in 2019

I need to keep a client in sync with a PostgreSQL database (only the data loaded from the database, not the entire database; 50+ tables and a lot of collections inside entities). Since I recently added a Spring REST API server to my application, I could perhaps manage those changes differently/more efficiently, in a way that would require less work. Until now, my approach has been to add a PostgreSQL notification trigger:
CREATE TRIGGER extChangesOccured
AFTER INSERT OR UPDATE OR DELETE ON xxx_table
FOR EACH ROW EXECUTE PROCEDURE notifyUsers();
The client then receives the JSON, built as:
json_build_object(
    'table',   TG_TABLE_NAME,
    'action',  TG_OP,
    'id',      data,
    'session', session_app_name);
It compares whether the change was made by this client or by another, and fetches the new data from the database.
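For reference, a minimal sketch of how a Java client might consume these notifications through the PostgreSQL JDBC driver. The connection details and channel name are assumptions (the actual channel is whatever notifyUsers() passes to pg_notify), and getNotifications(timeout) requires a reasonably recent pgjdbc version:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.postgresql.PGConnection;
    import org.postgresql.PGNotification;

    public class ChangeListener {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/app", "app", "secret");
            try (Statement st = conn.createStatement()) {
                st.execute("LISTEN ext_changes"); // channel used by notifyUsers()
            }
            PGConnection pg = conn.unwrap(PGConnection.class);
            while (true) {
                PGNotification[] msgs = pg.getNotifications(5000); // block up to 5 s
                if (msgs == null) continue;
                for (PGNotification n : msgs) {
                    String json = n.getParameter(); // the json_build_object payload
                    // Parse the JSON; skip it if 'session' matches our own,
                    // otherwise refetch the changed entity by 'table' and 'id'.
                    System.out.println("change: " + json);
                }
            }
        }
    }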
Then, on the client side, the new object is manually "rewritten" with something like a copyFromObject(new_entity) method, and its variables are overridden (including collections, avoiding transient fields, etc.).
This approach requires keeping a copyFromObject method for each entity (which could still be optimized with reflection).
The problems with my approach are:
it requires some work when modifying variables (can be optimized using reflection)
the entire entity is reloaded whenever it is changed by some client
I am curious about your solutions for keeping clients in sync with the database. In general, I have a desktop client here that loads a lot of data from the database, and that data must stay in sync; loading can take up to a minute at app start, depending on the chosen data period to fetch.
The perfect solution would be an engine that fetches/overrides only those variables in entities that actually changed, and does it automatically.
A simple solution is to implement optimistic locking. It will prevent a user from persisting data if the entity was changed after the user fetched it.
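With JPA this can be as small as adding a version field; a minimal sketch (the entity and its fields are illustrative):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Item {
        @Id @GeneratedValue
        private Long id;

        @Version              // JPA increments this on every successful update;
        private long version; // a stale write fails with OptimisticLockException

        private String name;
        // getters/setters omitted
    }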
Or
You can use 3rd-party apps for DB synchronization. Some time ago I played with Pusher, and you can find an extensive tutorial about client synchronization here: React client synchronization
Of course, Pusher is not the only solution, and I'm not related to that app's dev team at all.
For my purposes, I implemented an AVL-tree-based engine for synchronizing loaded entities with the database: it creates repositories based on the entities loaded from Hibernate, asynchronously searches through all the fields in the entities, and rewrites/merges matching fields (so if some field's primary key identifies the same entity as one in the repository, it replaces it).
This way, synchronization with the database comes down to finding the externally changed entity in the repository (basically an AVL tree lookup, which is O(log n)) and rewriting its fields.

Schema-less in All layers

I have a use case where the schema of my entities keeps changing again and again.
Based on that, I have to change or add new business rules, which is not scalable for me.
I am trying to go schema-less by using JSON documents in my data, service, and UI layers.
I want to avoid creating DAOs (data access objects), DTOs (data transfer objects, for sending objects), and view model objects, which are the major problem for me when changing the schema and redeploying.
Are there any good frameworks that can help with this?
Have you tried the Java API for JSON Processing?
http://www.oracle.com/technetwork/articles/java/json-1973242.html
https://jcp.org/en/jsr/detail?id=353
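As a quick illustration, here is a minimal JSON-P (JSR 353) sketch that reads and rebuilds a document without any mapped DAO/DTO class (the field names are made up for the example):

    import java.io.StringReader;
    import javax.json.Json;
    import javax.json.JsonObject;

    public class JsonPDemo {
        public static void main(String[] args) {
            // Parse an arbitrary document; no schema or mapped class needed
            String raw = "{\"name\":\"order-1\",\"qty\":3}";
            JsonObject obj = Json.createReader(new StringReader(raw)).readObject();

            // Build a modified copy, again without any generated types
            JsonObject updated = Json.createObjectBuilder()
                    .add("name", obj.getString("name"))
                    .add("qty", obj.getInt("qty") + 1)
                    .build();
            System.out.println(updated); // {"name":"order-1","qty":4}
        }
    }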

Best way to share a data structure between different instances of the same java application?

Currently, I'm developing an application based on Spring Boot. One of the requirements is that the application should be real-time, and I need a somewhat unique data structure based on an InvertedRadixTree (not exactly this, but the data structure uses the tree to answer queries). I developed an admin UI for CRUD operations. The number of CRUD operations is small, and they will basically be done by ops employees. The data structure I developed is thread-safe and is synchronized with the database (which is MongoDB), and since this is the only app using this database, I'm not worried about other apps messing with the data.
The only problem I have is that if we have multiple instances of this app and one of them does some CRUD operations on MongoDB, the data structure of that instance will be updated, but the other instances will not be. I created a scheduler to rebuild the data structure from the database every 12 hours, but I'm looking for another solution, like sharing the data structure between all the instances. I'd really appreciate any suggestions.
EDIT: After searching around, I found that updating the whole data structure doesn't take too long. I wrote some test cases, put around a million records of my class into MongoDB, and fetched the whole collection. Fetching plus data structure creation took less than a second, so I ended up using this method instead of a more sophisticated scheme for synchronizing memory and the database.
One suggestion is to use a shared database. Every time there is an update by any of the app instances, it should be written to the database, and every time you need the data, you load fresh data from the database. This is the easiest way, as far as I can tell.
I would use something like Redis pub/sub (http://redis.io/topics/pubsub): listen for an event fired by the instance that makes the change, and use a local cache on every instance if the data is not frequently updated.
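A minimal sketch of that idea with the Jedis client (the host, channel name, and message format are assumptions for the example):

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPubSub;

    public class TreeSyncListener {
        public static void main(String[] args) {
            // Each instance runs a subscriber; subscribe() blocks this thread,
            // so in a real app it would live on its own executor.
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        // The message could carry the changed record's id:
                        // reload just that entry from MongoDB into the tree here.
                        System.out.println("invalidate: " + message);
                    }
                }, "tree-updates");
            }
        }
    }

The instance performing the CRUD operation would then call jedis.publish("tree-updates", changedId) after writing to MongoDB, so the other instances refresh immediately instead of waiting for the 12-hour scheduler.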

Data consistency in a desktop application

I am trying to create a desktop application using Eclipse RCP. In that application, I use an ORM framework to load objects from the database and JFace data binding to bind these objects to the user interface, so that users can modify the data these objects contain.
Since the objects were loaded, other users or other clients may have worked with the same data. So when a user wants to save the objects back into the database, the data these objects contain may differ from the data in the database; the difference may be caused by my application or by others.
Should I check against the real data in the database when I need to save an object that may no longer be fresh?
Maybe this is a common problem with ORMs, but this is the first time I've needed to deal with one.
Yes, it's not a bad idea to check against the "real" data before saving. You may use a special field: a last-update timestamp or an increment counter.
Such an approach is called optimistic locking and, since it is very typical, most ORMs support it.
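If the ORM doesn't handle it for you, the version check can also be done by hand in the UPDATE itself; a sketch over plain JDBC (table and column names are made up for the example):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OptimisticSave {
        // The UPDATE only succeeds if the row still carries the version
        // the client read when it loaded the object.
        static void save(Connection conn, long id, String status, long loadedVersion)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE orders SET status = ?, version = version + 1 "
                    + "WHERE id = ? AND version = ?")) {
                ps.setString(1, status);
                ps.setLong(2, id);
                ps.setLong(3, loadedVersion);
                if (ps.executeUpdate() == 0) {
                    throw new SQLException("Stale data: reload and retry");
                }
            }
        }
    }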

XML vs. object trees

In my current project (an order management system built from scratch), we handle orders in the form of XML objects which are saved in a relational database.
I would outline the requirements like this:
Selecting various details from anywhere in the order
Updating / enriching data (e.g. from the CRM system)
Keeping a record of the changes (invalidating old data, inserting new values)
Details of orders should be easily selectable by SQL queries (for 2nd level support)
What we did:
The serialization is done with proprietary code, disassembling the order into tables like customer, address, phone_number, order_position etc.
Whenever an order is processed a bit further (e.g., due to an incoming event), it is read completely from the database and assembled back into an XML document.
Selection of data is done with XPath (scattered across the code).
Most updates are done directly in the database (the order will then be reloaded for the next step).
The problems we face:
The order structure (XSD) evolves with every release. Therefore the XPaths and the custom persistence often break and produce bugs.
We ended up with a mixture of working with the document and with the database (because the persistence layer cannot persist changes made in the document).
Performance is not really an issue (yet), since it is an offline system and orders are often intentionally delayed by days.
I do not expect free consultancy here, but I am a little confused about how the approach could be improved (next time, basically).
What would you think is a good solution for handling these requirements?
Would working with an object graph (something like JXPath or OGNL) and an O/R mapper be a better approach? Or using the XML support of, e.g., the Oracle database?
If your schema changes often, I would advise against using any kind of object-mapping. You'd keep changing boilerplate code just for the heck of it.
Instead, use the declarative schema definition to validate data changes and access.
Consider an order as a single datum, expressed as an XML document.
Use a document-oriented store like MongoDB, Cassandra or one of the many XML databases to manipulate the document directly. Don't bother with cutting it into pieces to store it in a relational db.
Making the data accessible via reporting tools in a relational database might be considered secondary. A simple map-reduce job on a MongoDB, for example, could populate the required order details into a relational database whenever required, separating the two use cases quite naturally.
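As a sketch of what "an order as a single datum" looks like with the MongoDB Java driver (the database, collection, and field names are invented for the example):

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import java.util.List;

    import static com.mongodb.client.model.Filters.eq;

    public class OrderStoreDemo {
        public static void main(String[] args) {
            MongoCollection<Document> orders = MongoClients.create()
                    .getDatabase("oms").getCollection("orders");

            // The whole order is one document; no disassembly into tables
            Document order = new Document("orderId", 42)
                    .append("customer", new Document("name", "ACME").append("city", "Berlin"))
                    .append("positions", List.of(
                            new Document("sku", "A-1").append("qty", 3)));
            orders.insertOne(order);

            // "Selecting details from anywhere in the order" becomes a dotted-path query
            Document hit = orders.find(eq("customer.city", "Berlin")).first();
            System.out.println(hit.toJson());
        }
    }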
The standard Java EE approach is to represent your data as POJOs and use JPA for the database access and JAXB to convert the objects to/from XML.
JPA
Object-to-relational standard
Supported by all the application server vendors
Multiple implementations available: EclipseLink, Hibernate, etc.
Powerful query language, JPQL (very similar to SQL)
Handles query optimization for you
JAXB
Object-to-XML standard
Supported by all the application server vendors
Multiple implementations available: EclipseLink MOXy, Metro, Apache JaxMe, etc.
Example
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html
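For a feel of how the two standards meet on one class, here is a minimal sketch of a POJO annotated for both (the entity and its fields are illustrative):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.annotation.XmlAccessType;
    import javax.xml.bind.annotation.XmlAccessorType;
    import javax.xml.bind.annotation.XmlRootElement;

    @Entity                               // JPA: persisted to a relational table
    @XmlRootElement                       // JAXB: (un)marshalled to/from XML
    @XmlAccessorType(XmlAccessType.FIELD)
    public class Order {
        @Id @GeneratedValue
        private Long id;
        private String customer;

        public static void main(String[] args) throws Exception {
            Order o = new Order();
            o.customer = "ACME";
            Marshaller m = JAXBContext.newInstance(Order.class).createMarshaller();
            m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            m.marshal(o, System.out);     // the same object, as an XML document
        }
    }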
