I've spent some time today on stackoverflow.com and haven't found an answer for my challenge.
The challenge
I would like to create a Spring Boot based microservice that I can scale easily. That microservice will be writing some entries (e.g. Product) to a database and reading them. The service will be deployed as a k8s Deployment and saves its data in an AWS RDS MySQL instance. I will scale the microservice with the built-in k8s Deployment mechanisms.
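For context, here is a minimal sketch of the kind of entity being written, with a JPA @Version field, which is a common way to detect conflicting writes coming from different threads or instances (the Product fields are assumptions; the jakarta imports assume Spring Boot 3, use javax.persistence on Boot 2):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Version;

@Entity
public class Product {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    private int stock;

    // Incremented by Hibernate on every update; a stale concurrent update from
    // another thread or instance fails with an optimistic locking exception
    // instead of silently overwriting the row.
    @Version
    private long version;

    // getters and setters omitted
}
```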
Questions
How can I create a Spring Boot based app that handles saving data safely across many threads and across many instances of the app?
I've read some posts saying that in this case there should be a queue, and that only one instance with one thread should read from that queue and save the data, but that seems cumbersome. I expect traffic to keep growing and, as a consequence, more and more messages piling up in the queue to process.
Can you recommend some books, ideally about this problem (in my words, the "multithreading and multi-instance write synchronization" problem)?
Thank you for any help.
Related
This problem has been in my head for a while and I can't seem to follow the logic of the solutions I find.
Here's the deal:
I'm currently working on a simple application that has been split up into 2 microservices.
The application is very similar to the task-management software Trello.
There's a microservice solely used for storing User information (User service)
Another microservice is responsible for the Boards (boards hold lists and lists hold tasks, but that's not relevant to my question). Each microservice uses its own database.
All of this will be hosted on AWS; the code is written in Java and I use Hibernate to generate the database schemas.
My problem:
How do I make use of the User service and have a Board be used by multiple User entities?
I understand the practice of using a many-to-many table that stores the BoardIds together with the UserIds, but what would happen if I removed a user that's connected to a board? There's no logical connection between the User in the user database and the userId stored by the board.
(A user can be assigned to multiple boards and vice versa.)
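To make this concrete, here is a minimal sketch of one way the Boards service could model the membership; the names (Board, memberUserIds) are assumptions, and the user IDs are stored as plain values precisely because the User rows live in another service's database:

```java
import jakarta.persistence.ElementCollection;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import java.util.HashSet;
import java.util.Set;

@Entity
public class Board {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String title;

    // Hibernate generates a collection table (board id + user id) inside the
    // Boards database; the values are just IDs owned by the User service, so
    // nothing enforces that those users still exist over there.
    @ElementCollection
    private Set<Long> memberUserIds = new HashSet<>();

    public Set<Long> getMemberUserIds() {
        return memberUserIds;
    }
}
```

That last comment is exactly the gap the questions below are about: the database cannot enforce the link, so it has to be handled at the application or integration level.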
My questions in short: How does this look in the database?
How does the Boards service access the User service and save Users to Boards?
What happens when I delete a User on the User microservice?
This seems to be a typical microservice communication problem. In my knowledge and experience, the solution is to introduce a message broker into the architecture and act on events: for example, a USER_DELETED event can be passed to the Boards microservice, which then handles the board-related DB changes (a sketch of this follows below).
Implementing this can feel intimidating at first, but nowadays it isn't that difficult, with very supportive internet communities helping out and coming up with new solutions every day.
If you are writing your microservices in Java with Spring Boot, there are solutions like Apache Kafka and Hazelcast (quite cost effective).
If you are using NodeJS for microservices, NATS is one of the libraries I have used recently.
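As a rough illustration of the USER_DELETED idea in a Spring Boot / Spring for Apache Kafka setup (the topic name, group id and BoardMembershipService are hypothetical, not something from the question):

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class UserDeletedListener {

    // Hypothetical service inside the Boards microservice that knows how to
    // remove a user id from every board it appears on.
    private final BoardMembershipService memberships;

    public UserDeletedListener(BoardMembershipService memberships) {
        this.memberships = memberships;
    }

    // The User service publishes the deleted user's id to this topic; the
    // Boards service reacts by cleaning up its own database, so the two
    // services stay consistent without sharing a database.
    @KafkaListener(topics = "user-deleted", groupId = "boards-service")
    public void onUserDeleted(String userId) {
        memberships.removeUserEverywhere(Long.valueOf(userId));
    }
}
```

The same pattern works with NATS or Hazelcast topics; only the listener annotation and client library change.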
I have a service deployed as a microservice, and a MongoDB with documents that have a few states, for example: READY, RUNNING, COMPLETED. I need to pick the documents in the READY state and process them. But with multiple instances running there is a high possibility of processing duplicates. I have seen the thread below, but it is only concerned with a single instance picking up tasks.
Spring boot Webservice / Microservices and scheduling
The above talks about a solution using Hazelcast and MongoDB. What I am looking for is that all instances wait for the lock, get their own (non-duplicate) documents and process them. I have checked various documents and unfortunately I am not able to find any solution.
One of the options I thought of is to introduce Kafka, where we can "assign" specific tasks to specific consumers. But before opting for that I would like to see if there are any solutions that can be implemented with simple methods such as database locks. Any pointers towards this are highly appreciated.
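For reference, the "database lock" idea is often implemented as an atomic state transition rather than an explicit lock: findAndModify flips READY to RUNNING in a single operation, so only the instance that wins the update receives that document. A hedged sketch with Spring Data MongoDB (the Task class and the state/owner field names are assumptions):

```java
import org.springframework.data.mongodb.core.FindAndModifyOptions;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;
import org.springframework.stereotype.Component;

@Component
public class TaskClaimer {

    private final MongoTemplate mongoTemplate;

    public TaskClaimer(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    // Atomically claims one READY document, or returns null if none is left.
    // Each instance calls this in a loop; no two instances can get the same
    // document because the state flip and the read happen in one operation.
    public Task claimNext(String instanceId) {
        Query query = Query.query(Criteria.where("state").is("READY"));
        Update update = Update.update("state", "RUNNING").set("owner", instanceId);
        return mongoTemplate.findAndModify(
                query, update, FindAndModifyOptions.options().returnNew(true), Task.class);
    }
}
```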
I have a giant monolith which has around a million entities. I want to sync data to the microservice so that it always has a replica of those entities, with some of the fields from the monolithic system. There are two ways to do so:
Write an API for the microservice and fetch the data through REST calls in batches.
Write an ETL service that connects directly to the database of the monolith and the database of the microservice to load the data.
The drawback of the first approach is that it involves a large number of REST calls and would be slow, as I could have a million records. The second approach breaks the microservices principle (correct me if it is not a principle), as the ETL service would be accessing the database in addition to the microservice.
Note: I only want to sync some fields from each record, not all of them. Say a record has 200 fields and my service only uses 3 of them, then I need all records with only those 3 fields. And the set of fields being used can change dynamically: say after some time the service uses 4 fields instead of 3, then I need to bring that 4th field into the DB of my microservice as well.
So can anyone suggest which approach is better?
The first approach is better in terms of low coupling and high cohesion, since you have a clear interface (the REST API) between what you expose from the monolith and the data inside the monolith. In the long run, it makes both the microservice and the monolith easier to maintain.
But there's a third approach that's especially suitable for data synchronisation: asynchronous integration. Basically, your monolith would need to send out a stream of change-data messages, e.g. to a message queue or something like Kafka. These messages are the interface, so you get the same low-coupling advantage as with the REST API. But you also get additional advantages:
You don't have the overhead of REST calls, just an asynchronous message listener.
If the monolith is down or responding slowly, your microservice is not affected.
There is, however, a bootstrapping problem: do you need to retroactively generate events for everything that happened in the past, or can you start from some point in time and keep everything in sync from then on?
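As a rough sketch of what such a change-data message could look like on the monolith side with Spring Kafka (the topic name, the ProductChanged payload and its fields are assumptions; Java 17 and a recent Jackson are assumed, and only the handful of fields the microservice cares about are published):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class ProductChangePublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public ProductChangePublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Called by the monolith's service layer right after an entity is saved.
    public void publishChange(long id, String name, String status) throws Exception {
        String payload = objectMapper.writeValueAsString(new ProductChanged(id, name, status));
        // Keying by id keeps all changes for one entity on the same partition, in order.
        kafkaTemplate.send("product-changes", String.valueOf(id), payload);
    }

    // Minimal message shape; add fields here when the microservice starts needing them.
    record ProductChanged(long id, String name, String status) {}
}
```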
What is your end goal here?
Is it to slowly migrate from the monolith to microservices by distributing traffic between the two systems,
or
to completely cut over to the new microservices on a given day?
If it's the second approach, I would do an ETL for the data migration.
If it's the first approach:
Implement CDC (or just hooks in the monolithic service) to publish the persistence operations to a messaging system (Kafka/RabbitMQ).
Implement the subscriber on the microservice side and update its DB (see the sketch after these steps).
Once you are confident in the pub/sub implementation, redirect all reads to the microservice system.
Then slowly divert some percentage of the write calls to the microservices, which will make a REST call to the old system to update the old DB.
Once you are confident in the new services, data quality and other requirements (performance), completely cut over to the new microservices.
** You need to do a historic sync before starting the async messaging process.
This is one way to smoothly cut over from the old system.
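A hedged sketch of the subscriber step above, assuming the same hypothetical product-changes topic and a ProductReplica entity/repository inside the microservice:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class ProductChangeSubscriber {

    // Hypothetical Spring Data repository for the microservice's own copy of the data.
    private final ProductReplicaRepository repository;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public ProductChangeSubscriber(ProductReplicaRepository repository) {
        this.repository = repository;
    }

    // Upserting by id keeps the listener idempotent, so message replays and the
    // one-off historic sync mentioned above do not create duplicates.
    @KafkaListener(topics = "product-changes", groupId = "product-replica")
    public void onChange(String payload) throws Exception {
        JsonNode json = objectMapper.readTree(payload);
        ProductReplica replica = repository.findById(json.get("id").asLong())
                .orElseGet(ProductReplica::new);
        replica.setId(json.get("id").asLong());
        replica.setName(json.get("name").asText());
        replica.setStatus(json.get("status").asText());
        repository.save(replica);
    }
}
```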
Why do you want to synchronise data between the monolith and the microservice?
Are you rewriting the monolith as microservices? If so, I would prefer an ETL service for data synchronisation, as it is more standard for that purpose than REST calls.
I have a requirement to create around 10 Spring Batch jobs, each consisting of a reader and a writer. All readers read data from an Oracle DB and write into a different Oracle DB (the source and destination servers are different). The jobs are implemented using Spring Boot, and all 10+ jobs are packaged into a single jar file. So far so good.
Now the client also wants a UI to monitor the job statuses and act as a job organizer. I went through the Spring Cloud Data Flow Server documentation for the UI requirement, but I'm not sure whether it will serve the purpose, or whether there is another option for monitoring job status and stopping and starting jobs from the UI whenever required.
Also, how could I separate the 10+ jobs inside a single jar in Spring Cloud Data Flow Server, if that is the only option for a UI?
Thanks in advance.
I don't have enough reputation to add a comment, so I am posting an answer here, although I know this is not the ideal way to share a reference link as an answer.
This might help you:
spring-batch-job-monitoring-with-angular-front-end-real-time-progress-bar
Observability of Spring Batch jobs comes from the data the framework persists in a relational database: instances, executions, timestamps, read counts, write counts, and so on.
You have different ways to exploit this data: a SQL client, JMX, the Spring Batch API (JobExplorer, JobOperator), or Spring Batch Admin (deprecated in favour of Spring Cloud Data Flow Server).
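If you go the JobExplorer/JobOperator route, one option is a small REST layer of your own that the client's UI calls to list executions and stop a running one. A hedged sketch, assuming the JobExplorer and JobOperator beans are available in your Boot version (you may have to declare a SimpleJobOperator yourself on older versions); the paths are arbitrary:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JobMonitoringController {

    private final JobExplorer jobExplorer;
    private final JobOperator jobOperator;

    public JobMonitoringController(JobExplorer jobExplorer, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobOperator = jobOperator;
    }

    // Latest executions of one of the 10+ jobs packaged in the jar,
    // read straight from the Spring Batch metadata tables.
    @GetMapping("/jobs/{jobName}/executions")
    public List<String> executions(@PathVariable String jobName) {
        return jobExplorer.getJobInstances(jobName, 0, 10).stream()
                .flatMap(instance -> jobExplorer.getJobExecutions(instance).stream())
                .map(execution -> execution.getId() + " " + execution.getStatus())
                .collect(Collectors.toList());
    }

    // Requests a graceful stop of a running execution.
    @PostMapping("/jobs/executions/{executionId}/stop")
    public void stop(@PathVariable long executionId) throws Exception {
        jobOperator.stop(executionId);
    }
}
```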
Data Flow is an orchestrator that lets you execute data pipelines made of streams and tasks (finite, short-lived, monitored services). For your jobs, you could wrap each job in a task and create a multi-task pipeline. Data Flow gives you the status of each execution.
You can also expose your monitoring data by pushing it as metrics into an InfluxDB, for instance.
We have a web-based Java application which we are planning to migrate to the cloud, with the intention that multiple clients will use it in a SaaS environment. The current architecture of the application is quite asynchronous in nature. There are 4 different modules, each with a database of its own. When data needs to be exchanged between the modules, we push it using Pentaho and use a directory structure to store the interim data file, which is then picked up by the other module to populate its database. Given the nature of our application, this asynchronous communication is very important to us.
Now we are facing a couple of challenges while migrating this application to cloud:
We are planning to use multi-tenancy on our database server, but how do we ensure that the flat files we use for transferring the data between the different modules are also channelled to their respective tenants in the DB?
Since we are planning to host this in the cloud, we would like your views on whether keeping a text file on a cloud server is safe from a data security perspective.
File storage in the cloud is safe, and you can set up IAM roles to control the permissions on a file. Cloud providers like Google (Cloud Storage), Amazon (AWS S3), etc. provide a secure and scalable infrastructure for maintaining files in the cloud.
In a typical setup, cloud storage provides you with buckets that carry a globally unique identifier. For a multi-tenant setup you can create a bucket per tenant and store the necessary data feeds in it. Next, you can have batch or streaming jobs using Kettle (Pentaho) push the data to the right database based on the unique bucket definition.
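A hedged sketch of what the per-tenant upload could look like with the AWS SDK for Java v1 (the bucket naming convention and the tenant id are assumptions, not a prescription):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class TenantFeedUploader {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Each tenant gets its own bucket (or prefix), so an IAM policy can be
    // scoped to exactly that tenant's data and the downstream Kettle job
    // knows which tenant database the feed belongs to.
    public void upload(String tenantId, File feedFile) {
        String bucket = "myapp-feeds-" + tenantId;   // hypothetical naming convention
        String key = "incoming/" + feedFile.getName();
        s3.putObject(bucket, key, feedFile);
    }
}
```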
Alternatively, you can also push (as other answers suggest) to a streaming setup (ActiveMQ, Kafka, etc.) with tenant-specific topics and have a streaming service (in Java or Pentaho) ingest the data into the respective database based on the topic.
Hope this helps :)
I cannot realistically give any specific advice without knowing more about your system. However, based on my experience, I would recommend switching to message queues; something like Kafka would work nicely.
Yes, cloud providers offer enough security for static file storage. You can limit access however you see fit, for example using AWS S3.
1- Multi-tenancy may create a bit of an issue while transferring the files, but from the information you have given, the process of moving flat files across the application will not be impacted. Still, you can think of moving to an MQ-based approach for passing the data across.
2- From a data security point of view, AWS provides a lot of features at the access level, MFA, etc. If it needs to be highly secured, I would recommend AWS private cloud, where nothing is shared with anyone at any level.