I have a giant monolith with around a million entities. I want to sync data to a microservice so that it always has the same replica of those entities, with a subset of fields, as the monolith. There are 2 ways to do so:
1. Write an API for the microservice and fetch data through REST calls in batches.
2. Write an ETL service that connects directly to the database of the monolith and the database of the microservice to load the data.
The drawback of the first approach is that it involves a large number of REST calls and would be slow, since I could have a million records. The second approach breaks the microservices principle (correct me if it is not the principle), since the ETL service, and not only the microservice itself, would be accessing the microservice's database.
Note: I only want to sync some fields from each record, not all of them. Say a record has 200 fields and my service uses only 3 of them; then I need all records with only those 3 fields. The set of fields being used can also change dynamically: if after some time the service uses a 4th field instead of 3, then I need to bring that 4th field into the DB of my microservice.
So can anyone suggest which approach is better?
The first approach is better in terms of low coupling and high cohesion, since you have a clear interface (the REST API) between what you expose from the monolith and the data inside the monolith. In the long run, it makes both the microservice and the monolith easier to maintain.
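For illustration, here is a minimal sketch of the batched REST sync on the microservice side using Java's built-in HTTP client; the endpoint, paging parameters and fields filter are assumptions, not an existing API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BatchSync {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int page = 0, pageSize = 1000;
        while (true) {
            // Hypothetical endpoint exposing only the fields this microservice needs.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://monolith.example.com/api/entities?page="
                            + page + "&size=" + pageSize + "&fields=id,name,status"))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.body().equals("[]")) break; // no more records (assuming a JSON array response)
            // ... parse the JSON batch and upsert it into the local database ...
            page++;
        }
    }
}
```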
But there's a third approach that's especially suitable for data synchronisation: asynchronous integration. Basically, your monolith would need to send out a stream of change-data messages, e.g. to a message queue or something like Kafka (a sketch follows the list below). These messages are the interface, so you get the same low-coupling advantage as with the REST API. But you also get additional advantages:
- You don't have the overhead of REST calls, just an asynchronous message listener.
- If the monolith is down or slow to respond, your microservice is not affected.
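As a minimal sketch of the monolith side, assuming the plain Kafka producer API, an invented entity-changes topic, and JSON string payloads:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ChangeEventPublisher {
    private final KafkaProducer<String, String> producer;

    public ChangeEventPublisher() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    // Called by the monolith whenever an entity is created, updated or deleted.
    public void publishChange(String entityId, String changeJson) {
        // Keying by entity id keeps all changes to one entity in order.
        producer.send(new ProducerRecord<>("entity-changes", entityId, changeJson));
    }
}
```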
There is, however, a bootstrapping problem: do you need to retroactively generate events for everything that happened in the past, or can you start from some point in time and keep everything in sync from that point onwards?
What is your end goal here?
1. Slowly migrate from the monolith to microservices by distributing traffic between the two systems, or
2. On a fine day, completely cut over to the new microservices?
If it's the second, I would do an ETL for the data migration.
If it's the first:
1. Implement CDC (or just change hooks in the monolith) to publish the persistence operations to a messaging system (Kafka, RabbitMQ); a subscriber sketch follows this list.
2. Implement the subscriber on the microservices side and update its DB.
3. Once confident in the pub/sub implementation, redirect all reads to the microservices system.
4. Then slowly divert some percentage of the write calls to the microservices, which will make a REST call to the old system to update the old DB.
5. Once you are confident in the new services, the data quality and other requirements (performance), completely cut over to the new microservices.
** You need to do a historic sync before starting the async messaging process.
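A minimal sketch of the subscriber side (step 2 above), assuming the plain Kafka consumer API and the same invented entity-changes topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ChangeEventSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "entity-sync");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("entity-changes"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Parse record.value() and upsert only the fields this
                    // microservice cares about into its local DB.
                }
            }
        }
    }
}
```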
This is one way to smoothly cut over from old systems.
Why do you want to synchronise data between the monolith and the microservice?
Are you rewriting the monolith as microservices? If so, I would prefer using an ETL service for data synchronisation, as it is more standardised for that purpose than REST calls.
We have a 13-year-old monolithic Java application using
- Struts 2 for handling UI calls
- JDBC/Spring JDBC Template for DB calls
- Spring DI
- Tiles/JSP/jQuery for the UI
Two deployables are created out of this single source code:
- a WAR for the online application
- a JAR for running back-end jobs
The current UI is pretty old. Our goal is to redesign the application using microservices. We have identified modules which can run as separate microservice.
We have the following questions in mind:
Which UI framework should we go for (Angular/React or a home-grown one)? Angular seems to be very slow, and we need better performance as far as page loading is concerned.
Should the UI/JavaScript make calls to the backend web services directly, or should there be a Spring controller proxy in the deployed WAR which forwards UI calls to the APIs? This would also help if a single UI call requires getting/updating data from different microservices.
How should we cover the microservice security aspect?
Which load balancer should we go for if we want to have multiple instances of the same microservice?
Since it's a banking application, our organization does not allow using Elasticsearch/Lucene for searching. So we need suggestions for reporting using Oracle alone.
How should we run backend jobs?
There will also be a main payment microservice which will create payments. Since the payment volume is huge, it will require multiple instances. How will we manage the user's logged-in session? Should we go for an in-memory distributed session store (maybe Memcached)?
This is a very broad question. You need to get a consultant architect to understand your application in depth, because it is unlikely you will get meaningful in-depth answers here.
However as a rough guideline here are some brief answers:
Which UI framework should we go for (Angular/React or a home-grown one)? Angular seems to be very slow, and we need better performance as far as page loading is concerned.
That depends on what the application actually needs to do. Angular is one of the leading frameworks and is usually not slow at all. You might be doing something wrong (are you making too many granular calls? is your backend slow?). React is also a strong contender, though it seems to be losing popularity; that is just a subjective opinion and could be wrong. Angular is a more feature-complete framework, while React is more of a combination of tools. It would be unwise to attempt a home-grown one and expect to bring it to the same maturity as these ready-made tools.
Should the UI/JavaScript make calls to the backend web services directly, or should there be a Spring controller proxy in the deployed WAR which forwards UI calls to the APIs? This would also help if a single UI call requires getting/updating data from different microservices.
Larger microservice architectures often involve an API gateway. Then again, it depends on your use case. You might also have an issue with CORS, so centralising calls through a proxy / API gateway, even if it is a simple reverse proxy (you don't need to develop it yourself), might be a good idea.
How should we cover the microservice security aspect?
Again, no idea what your setup looks like. JWT is a common approach. I presume the authentication process itself uses some centralised LDAP / Exchange or similar process. Once you authenticate, you can sign a token which you give to the client, which is then passed to the respective microservices in the HTTP Authorization header.
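For illustration, a minimal sketch of issuing and verifying such a token with the auth0 java-jwt library; the secret, subject and 15-minute lifetime are placeholders, not something from the question:

```java
import java.util.Date;
import com.auth0.jwt.JWT;
import com.auth0.jwt.JWTVerifier;
import com.auth0.jwt.algorithms.Algorithm;

public class TokenService {
    private final Algorithm algorithm = Algorithm.HMAC256("change-this-secret");

    // Called once after the user authenticates against LDAP (or similar).
    public String issueToken(String username) {
        return JWT.create()
                .withSubject(username)
                .withExpiresAt(new Date(System.currentTimeMillis() + 15 * 60 * 1000))
                .sign(algorithm);
    }

    // Called by each microservice on the "Authorization: Bearer <token>" header.
    public String verify(String token) {
        JWTVerifier verifier = JWT.require(algorithm).build();
        return verifier.verify(token).getSubject(); // throws if invalid or expired
    }
}
```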
Which load balancer should we go for if we want to have multiple
instance of same microservice.
Depends on what you want. Are you deploying on a cloud-based solution like AWS (in which case load balancing is provided by the infrastructure)? Are you going to deploy on a Kubernetes setup, where load balancing and scaling are handled as part of its deployment fabric? Do you want client-side load balancing (which comes as part of Spring Cloud)?
Since it's a banking application, our organization does not allow using Elasticsearch/Lucene for searching. So we need suggestions for reporting using Oracle alone.
Without knowing what the data in Oracle looks like and what the reporting requirements are, all solutions are possible.
How should we run backend jobs?
Depends on the infrastructure you choose. Everything is possible, from simple cron jobs to cloud scheduling services or integrated Java scheduling mechanisms like Quartz.
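To make the Quartz option concrete, here is a minimal sketch; the job class and the 2 a.m. cron schedule are invented for illustration:

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class NightlyJob implements Job {
    @Override
    public void execute(JobExecutionContext context) {
        // ... the work the old JAR deployable used to do ...
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(NightlyJob.class)
                .withIdentity("nightlyJob").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?")) // 2 a.m. daily
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```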
There will also be a main payment microservice which will create payments. Since the payment volume is huge, it will require multiple instances. How will we manage the user's logged-in session? Should we go for an in-memory distributed session store (maybe Memcached)?
Not really; that would defeat the whole purpose of microservices. JWT tokens are managed by the client's browser and expire automatically. You don't need to manage logged-in user sessions in such architectures.
As you have mentioned, it's a banking site, so security will be the first priority. Here I have a few suggestions for the FE and BE.
FE: You'd be better off with Preact; it's a React-like library but much lighter and faster compared to React. For the UI you can go with styled-components instead of some heavy third-party lib. This will also enhance performance, and obviously use CDNs for images and big files.
BE: Depending on your needs, you may be better off with a hybrid solution; Node could be a good option, e.g. for sessions.
Set up an auth server and have your services validate users against it; it can also be used in the future for any kind of service, e.g. if you expose some kind of client APIs.
Use case for auth: you can use Redis for session info. Get the user validated by the auth server and add the info to Redis; later, check from Redis whether the user is logged in. This reduces the load on the auth server. (I have used the same strategy for a crypto exchange and it went pretty well.)
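A rough sketch of that pattern with the Jedis client; the key prefix and 30-minute TTL are assumptions:

```java
import redis.clients.jedis.Jedis;

public class SessionStore {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // After the auth server validates the user, cache the session in Redis.
    public void storeSession(String sessionId, String userInfoJson) {
        jedis.setex("session:" + sessionId, 1800, userInfoJson); // 30-minute TTL
    }

    // Subsequent requests check Redis first, avoiding a round trip to the auth server.
    public boolean isLoggedIn(String sessionId) {
        return jedis.exists("session:" + sessionId);
    }
}
```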
Load balancer: I don't have much familiarity with Java, but for Node.js, PM2 will do that for you. It's not a big deal: one command and it will start multiple instances and balance between them on its own.
In case you have enormous traffic, you'd be better off with a messaging service like RabbitMQ; this will reduce server costs by saving you from scaling your servers.
BE jobs: I have done that with Node for extensive tasks and it went quite well. There you can use forking or spawning: this starts a new instance for a particular job, which is killed after completing it, and you can easily generate logs along the way.
For further clarification I'm here :)
I am planning the development of a microservice-based application, and I decided to use Kafka for the internal communication. While reading the book Microservice Architecture by Ronnie Mitra, Matt McLarty, Mike Amundsen and Irakli Nadareishvili, I came across this:
letting microservices directly interact with message brokers (such as
RabbitMQ, etc.) is rarely a good idea. If two microservices are
directly communicating via a message-queue channel, they are sharing a
data space (the channel) and we have already talked, at length, about
the evils of two microservices sharing a data space. Instead, what we
can do is encapsulate message-passing behind an independent
microservice that can provide message-passing capability, in a loosely
coupled way, to all interested microservices.
I am using Netflix Eureka for Service registration and discovery, Zuul as edge server and Hystrix.
That said, in practice, how can I implement that kind of microservice? How can I make my microservices independent of the communication channel (in this case Kafka)?
Actually I'm directly interacting with the channel, so I don't have an extra layer between my publishers/subscribers and Kafka.
UPDATE 06/02/2018
To be more precise, we have a couple of microservices: one publishes news on a topic (ActiveMQ, Kafka...) and the other is subscribed to that topic and performs some operations on the messages coming through. So these services are coupled to the message broker (to the channel): we have the message broker's APIs embedded in our code, and, for example, if we want to change the message broker we have to change all the microservices that use its API. So they are suggesting to use a microservice (in the picture I assume it is the Events Hub) that acts as the dispatcher of the various messages. In this way it is the only component that interacts with the channel.
A general foreword: don't do it if you don't need it. Introducing a queue system can be a big improvement if you are dealing with a high number of events, events backing up, and so on. But if you don't face any such issues, you are probably better off with the lower complexity of direct service communication.
Back to your question: it sounds like you want to abstract your communication with the queue because you are worried about the effort of replacing the queue with a different system. Is that correct?
In this case you can either do what you proposed and develop a new service in the middle. This comes with all the baggage of a physical service (deployment, scaling, etc.).
Or, as a second alternative, you can write a client library that abstracts the queue the way you want and can be reused in all services that need to participate in the queue. This way you don't have to physically deploy another service, but you are still in full control of what your interface to the queue looks like, and you have a single piece of code in which to incorporate changes (at least towards the queue). This works given that you are sure the app-facing side of the library can be kept stable.
But, again, don't do any of those in the first iteration when you are not sure you need all the complexity. (Over-engineering is a dangerous thing)
You could create an interface, let's say "Queue", which provides all the functionality you want from Kafka or RabbitMQ, then create different implementations like KafkaQueue and RabbitMQQueue of the Queue interface, and inject the right implementation you want to use in your system; a sketch follows below.
This way, if a new queue system is adopted, your existing code will not have to change.
Creating another microservice is extra overhead in this case.
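A minimal sketch of that idea; the interface shape and the String payload are illustrative choices, not the only option:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Broker-agnostic interface: application code depends only on this.
interface Queue {
    void publish(String topic, String message);
}

// Kafka-specific implementation; a RabbitMQQueue could implement the same interface.
class KafkaQueue implements Queue {
    private final Producer<String, String> producer;

    KafkaQueue(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    @Override
    public void publish(String topic, String message) {
        producer.send(new ProducerRecord<>(topic, message));
    }
}
```

Swapping brokers then means writing a new implementation and changing only the injection wiring, not the services themselves.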
In a service architecture, the proper way to make your code independent of the constraints of the communication channel is to have properly modelled, self-sufficient messages. Historic examples would be WSDL in document mode, EDIFACT, HATEOAS, etc. From this point of view, microservices with Spring Boot and Kafka are just a different implementation of the same old thing done since mainframes ruled the world.
Essentially, if you take a view of your app as a black-box asynchronous server, everything the app does is receive events and produce new ones. It should not matter how events are raised within the app. HTTP requests, XML within JMS messages, JSON in Kafka, whatever: all those things are just ways to pass events, and the business layer of the application should respond only to the content of the events.
So the business layer is usually structured around some custom model/domain which is delivered as the payload. The business layer is invoked/triggered by a listener/producer layer which talks to the communication channel (Kafka listener, HTTP listener, etc.). Aside from logging and enforcing security, you should not have communication-channel logic in the app. I have seen unfortunate examples of business logic driven by the originating JMS connection or by parsing the URL of a request. If you ever have this in your code, you have failed to structure it properly.
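A small sketch of that layering, here using Spring Kafka as the listener layer; the topic, group id and class names are invented:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;

// Business layer: pure domain logic, knows nothing about Kafka, JMS or HTTP.
@Service
class OrderService {
    void handle(String orderEventJson) {
        // ... domain logic driven only by the event's content ...
    }
}

// Listener layer: the only place that touches the communication channel.
@Component
class OrderEventListener {
    private final OrderService orderService;

    OrderEventListener(OrderService orderService) {
        this.orderService = orderService;
    }

    @KafkaListener(topics = "orders", groupId = "order-service")
    void onMessage(String payload) {
        orderService.handle(payload); // delegate; no channel logic leaks further
    }
}
```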
However that is easier to say than to implement. Some people are good at this level of modeling, and some never learn.
And there is no other way to learn but to try and fail.
I've got a legacy monolith application that I have to redesign. Unfortunately, the mandate is to redesign it in bite-sized steps.
At the moment, I need to implement a new public-facing web service for a particular module. I'm taking advantage of the situation to split it off into its own service. However, I am stuck on an issue: the server-generated GUI (the Struts monolith) needs some data from my microservice. The obvious answer is to provide a REST interface (JSON/Protocol Buffers, etc.) in the web service for my monolith to consume in order to provide results to the user. But is this my only choice?
Ideally, I would like to move to asynchronous communication between the monolith and the service, but what do I do when a user needs specific data from the microservice? How can one do that in an asynchronous process? With the user waiting on the GUI (a synchronous event), how can the communication protocol between the GUI and the web service not be synchronous as well?
In this particular instance, the user will use the microservice to upload data in bulk (csv file). The microservice will process the data and provide options to the user to manipulate the data once ready (all via API calls). The business wants an option in the GUI to view/download the uploaded file. Essentially, the monolith (responsible for the GUI) needs to request the file from the service and return it to the user.
Is there any way to structure this design without using a synchronous IPC between the monolith and the microservice?
I started working with REST services recently. I have several tools joined into a framework of integrated tools. The tools communicate over a common component (CC) which handles their requests (using REST services) and is effectively an interface between all the tools. For every POST request a new resource is created and stored in memory. Every time the CC goes down, all the data is lost. For that case, I created an Apache Derby database to store all the resources: with every resource creation, an entry is created in the database. Every time the CC comes up it fetches all the data from the database, and the data is regularly synced. The problem is that multiple tools can POST at almost the same time. How does REST handle these requests? I hoped that it would manage the requests in a queue-like way, but from what I see, it handles them concurrently in a thread-like way. My database goes down instantly. Am I on the right track, or could something else be wrong?
Is there a possible architecture that a developer can think of when it comes to extending a web application by introducing web services to the existing architecture, or vice versa? The main concerns in this context are data integrity and security.
The following images suggest two approaches that a developer can think of.
The first architecture indicates that all requests should be handled by a single service layer. Therefore, only the service layer can communicate with the database and satisfy requests for both the web application and the gateway.
The second approach shows the web application communicating directly with the DB, for example an admin portal. Meanwhile, there can also be external web services communicating with the DB. This approach may lead to data-integrity-violating scenarios. However, introducing external web services might be easier than refactoring an existing web application to call a web service. Hence, can we still accept the long-term consequences of having an external web service and a separate web application, instead of both the web application and the gateway being served by a single web service layer? Any reasonable comment on this would be appreciated.
You could build an API that has access to everything; in other words, the web application could work through a REST/RPC API using Ajax/WebSockets.
Since everything goes through the API, data integrity can be enforced in one place, at all times. You also get a clear separation between client, API and database.
This will allow you to replace the database with anything else without breaking other parts of your system.
I would personally advise having at least a shared data access layer which handles data validation and persistence.
The best way, however, would be as defined in your first diagram, with a shared service layer to factor out transaction management, which should be defined at this level. You could thus take advantage of custom transaction isolation and/or locking policies in order to ensure data integrity. Public service-layer methods could in this case be directly exposed as REST services and consumed by both the mobile and web apps (a gateway / API component is not necessary). A sketch follows below.
All of this will depend on the estimated time to refactor the legacy app's architecture on the one hand, and to duplicate the data access logic (and the business logic?) on the other. Of course, the duplication will decrease maintainability and extensibility.
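As an illustration of such a shared service layer, a minimal Spring sketch; the service name, method and the SERIALIZABLE isolation level are assumptions:

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AccountService {

    // Transaction management lives in the shared service layer, so both the
    // web app and the exposed REST services get the same integrity guarantees.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void transfer(long fromId, long toId, long amountCents) {
        // ... validate, debit fromId, credit toId via the data access layer ...
    }
}
```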
I've had to answer this question on several projects; there's a third option which you don't mention, and it is my favourite.
The problem with option 1 ("web services as persistence and business logic layer") is that it introduces a lot of additional moving parts into the design. Those moving parts are expensive: you have to write, test and maintain a lot more code, and very often the services you would want your web services to provide in order to run your own application are not the ones that would make sense to a third-party developer.
You are also introducing a potentially significant performance and scalability risk - calling a web service which calls a database is measurably slower than just calling the database.
The second option - duplicating business logic across web app and service layer - has all the problems of duplication.
The option I prefer is to develop the business logic layer as a separate component, and have it used by both the web app and the web services; each application uses the component as a library. This means you have to have separate teams - the "library" team and the "app" teams - but you avoid duplication, and you avoid invoking a bunch of web services every time you have to render a web page.
The business logic layer is responsible for persistence - including making sure that database consistency is honoured, and managing transactions. As the business logic layer is shared between both the web app and the web services, this logic is concentrated into a single code base, and - ideally - made entirely transparent to the apps.
The web services now do far less. Their job is to handle incoming requests, translate them into method invocations on the business logic component, and return any response data in the appropriate format (XML, JSON). They may offer "coarse-grained" service requests and map them onto several more granular business logic methods. They may deal with authentication, authorization, request throttling, etc. They just don't deal with the actual business logic.
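A minimal sketch of this third option; all the names are invented, and Spring is only one possible choice for the thin web service layer:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Business logic component: a plain library, shared as a JAR by the web app
// and the web services. Owns persistence and transaction handling.
class CustomerComponent {
    Customer findCustomer(long id) {
        // ... load from the database, enforce consistency rules ...
        return new Customer(id, "placeholder-name");
    }
}

record Customer(long id, String name) {}

// Thin web service layer: translates HTTP into method calls, nothing more.
@RestController
class CustomerEndpoint {
    private final CustomerComponent customers = new CustomerComponent();

    @GetMapping("/customers/{id}")
    Customer get(@PathVariable long id) {
        return customers.findCustomer(id); // serialized to JSON by Spring
    }
}
```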