Here are the goals I'm trying to achieve:
Take the scheduled jobs out of the microservice, because they can and will hurt timings/performance.
Execute jobs in a separate computation cluster, i.e. workers.
Avoid code duplication: I want to keep all my business logic in one Service and all DB-related operations in one Dao, and not write additional services/DAOs for jobs.
Avoid dependency management problems: different jobs may require different libs/versions/etc. For instance, a job from ServiceA may use javax.annotation-api while a job originating from ServiceB may use jakarta.annotation-api. Making a worker depend on both ServiceA and ServiceB will cause build or runtime problems.
Are there any approaches/libraries/solutions to achieve all the goals at the same time?
UPD:
Both Temporal.io and Quartz are not quite what I need: they both require the worker to depend on the workflow tasks.
I can imagine that I'm approaching the issue I face in the wrong way, so architectural advice is also appreciated.
From an architectural perspective, expose the service (business logic) via an API.
Have the schedulers run on a separate instance, or, if you are using one of the popular cloud providers, have their FaaS (function as a service, in your case a scheduled function) trigger the service API via HTTP (on any or a dedicated instance).
Azure -> use Azure Functions
AWS -> lambda functions
Google Cloud -> Google Cloud Functions
All of the above have comprehensive guides on how to create a scheduled function, a.k.a. trigger.
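For illustration, here is a minimal sketch (not from the original answer) of how such a scheduled trigger could look on AWS: a Lambda handler, fired by an EventBridge schedule, that simply calls the service's job endpoint over HTTP. The URL and endpoint path are made-up placeholders.

```java
// Sketch only: an AWS Lambda handler, fired by an EventBridge schedule,
// that triggers a job by calling the service's HTTP API.
// The URL and endpoint path below are assumptions, not real ones.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScheduledJobTrigger implements RequestHandler<Object, String> {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    @Override
    public String handleRequest(Object scheduledEvent, Context context) {
        try {
            // The business logic stays inside the service; the function only triggers it.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://service-a.example.com/internal/jobs/cleanup"))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<String> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            return "Job triggered, status=" + response.statusCode();
        } catch (Exception e) {
            throw new RuntimeException("Failed to trigger scheduled job", e);
        }
    }
}
```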
Hope this helps and I'm not off topic.
From my perspective you have the option to use one of three possible solutions:
Most straightforward - ensure that the service logic which is required in the jobs also implements a local API (programming API).
As such it can act as, and be imported as, a library and reused in jobs without code duplication.
If you have a larger development organization you also want to make sure that such libraries are correctly version managed and version releases are pre-planned, which allows the teams using the libraries to treat them like they would third party libraries.
Also, there is no magic - you would have to work through any build/dependency problems if there are conflicts. (Since your question sounds like this is a deal breaker, let's take a look at the other solutions.)
The second solution would be to provide a wrapper for each piece of service logic that allows access to the functionality via a CLI. That means you don't have to import the libraries, but rather execute them as JARs/executables through the CLI. This would allow you to use the same code but avoid dependency problems.
(You will still have to deal with version management and version upgrades, etc.)
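As a rough illustration of the CLI-wrapper idea (my sketch, not the answerer's code), each service could ship a thin main class packaged as an executable JAR; the worker then launches jobs as separate processes instead of putting ServiceA and ServiceB on one classpath. ReportService below is a hypothetical stand-in for the real business service.

```java
// Sketch of a thin CLI entry point packaged as an executable JAR per service.
// ReportService is a stand-in for the real business service; in practice you
// would bootstrap the Spring context and look the bean up instead.
public class ServiceAJobCli {

    static class ReportService {
        void runDailyReport() {
            System.out.println("running daily report...");
        }
    }

    public static void main(String[] args) {
        String job = args.length > 0 ? args[0] : "daily-report";
        ReportService service = new ReportService();

        if ("daily-report".equals(job)) {
            service.runDailyReport();   // reuse the existing business logic
        } else {
            System.err.println("Unknown job: " + job);
            System.exit(1);
        }
    }
}
```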
If you use containerized deployments/hosting, you can also consider bundling multiple containers together just for your jobs, where each job gets its own private service container instances for use during the job. Kubernetes and Docker Compose, for example, have options to run such multi-container deployments/jobs.
That solution would allow you to reuse the same services as they run for other purposes, but you have to make sure that they are configurable enough to work in this scenario.
One problem that all of the approaches have is that you have to make sure there are no runtime conflicts between your jobs and the deployed regular services. (For example state conflicts)
In terms of how to execute jobs it will depend on your deployment scenario. Kubernetes has an option to run containers as jobs natively, which makes it easy to bundle multiple jars, etc. But it is always an option to deploy a dedicated scheduler or workflow tool like Apache Airflow to run your jobs.
Related
I just started a new project where I'm going to use Java, Spring cloud functions and AWS Lambda.
It's my first time building a serverless application and I've been looking at different example projects and tutorials on how to get started.
However, the projects I've found have been so small that it's hard to understand how to map it to a real project.
As I understand it you build a jar file and upload it to AWS Lambda where you specify which function to run.
However, as the project grows, more and more functions that aren't even going to run (unreachable code) will make the jar bigger and bigger and cause each Lambda startup to be slower and slower?
I could create separate modules for each Lambda function with its own Application class in order to build separate jars, but it doesn't feel like the intended architecture.
Also, I would like to be able to run all of the functions locally using tomcat in a single application.
I guess I could build a separate module specifically designed to run locally, but again it doesn't feel like the intended architecture.
Any suggestions or references to best practices would be greatly appreciated.
TL;DR:
One JAR per function, not all functions in one JAR.
Use Maven modules. One module per Lambda function.
Don't run the Lambda locally, use unit tests with mocks.
Deploy to AWS to test if the Lambda works as intended.
Reading the question I get the feeling that there are a few misconceptions on how AWS Lambda works, that need to be addressed first.
However, as the project grows, more and more functions that aren't even going to run (unreachable code) will make the jar bigger and bigger [...]
You do not deploy a single JAR that contains all your Lambda functions. Every function is deployed as a single JAR. So if you have 20 Lambda functions, you deploy 20 JAR files.
The size of the JAR file is determined by the individual dependencies of the function. A function might use a specific dependency and another might not, so JAR size will differ depending on your dependencies.
One way to improve this is to split your code from the dependencies by putting the dependencies in Lambda layers. This way, you only deploy a small JAR with your code. The dependency layer only needs to be redeployed when the dependencies have been updated. Unfortunately, this will make deployments more complex, but it is doable.
I could create separate modules for each Lambda function with its own Application class in order to build separate jars, but it doesn't feel like the intended architecture.
That's what I'd recommend, and it is more or less the only way. AWS Lambda has a 1-to-1 relationship between the JAR and the function: one Lambda function per JAR. If you need a second Lambda function, you need to create it and deploy another JAR.
Also, I would like to be able to run all of the functions locally using tomcat in a single application. I guess I could build a separate module specifically designed to run locally, but again it doesn't feel like the intended architecture.
There are tools to run Lambdas locally, like the Serverless Framework. But running all the Lambdas in a Tomcat is probably going to be hard work.
In general, running Lambdas locally is something I'd not recommend. Write unit tests to run the code locally and deploy to AWS to test the Lambda. There is not really any better way I can think of to do testing efficiently.
Most Lambdas communicate with other services, like DynamoDB, S3 or RDS. So how would you run those locally? There are options, but it just makes everything more and more complicated. And what about services that you can't easily emulate locally (EventBridge, IAM, etc.)? That's why in my experience, running serverless applications locally is unrealistic and will not give you confidence that they'll work once deployed. So why not deploy during development and test the "real" thing?
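To make the "unit tests with mocks" suggestion concrete, here is a minimal sketch assuming JUnit 5 and Mockito; the handler and repository types are hypothetical stand-ins for your real Lambda code and its AWS-backed collaborator.

```java
// Sketch: test the Lambda's logic locally by mocking its collaborator
// (here a hypothetical repository) instead of emulating AWS services.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class OrderHandlerTest {

    // Stand-ins for the real handler and its dependency.
    interface OrderRepository {
        String findStatus(String orderId);
    }

    static class OrderHandler {
        private final OrderRepository repository;

        OrderHandler(OrderRepository repository) {
            this.repository = repository;
        }

        String handle(String orderId) {
            return repository.findStatus(orderId);
        }
    }

    @Test
    void returnsStatusFromRepository() {
        OrderRepository repository = mock(OrderRepository.class);
        when(repository.findStatus("42")).thenReturn("SHIPPED");

        OrderHandler handler = new OrderHandler(repository);

        assertEquals("SHIPPED", handler.handle("42"));
    }
}
```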
From my experience, I would recommend using multiple Maven modules, one function per Maven module, plus shared modules for common logic. This approach requires a reasonably smart deployment pipeline that can tell which functions must be redeployed when you change a common library shared between many functions. If you don't have shared modules, a hash on /src might be enough; otherwise you need some metadata that describes the relations between the Maven modules. I haven't investigated it, but it might be possible to get the module relations from Maven and feed them into your CI/CD, so the build tool helps drive continuous deployment.
It's possible to keep all functions within the same JAR and deploy that one JAR multiple times with different entry points. The downside is tight coupling between all functions: changes to one function might have side effects on the others. Coupling all functions within one JAR can also make each function slower, as it creates one Spring context containing all the different beans. The Spring Boot autoconfiguration approach also doesn't help when one function needs a DB connection configured and another needs messaging configured. Of course, you can mitigate some of these downsides, but I think the idea of functions is similar to microservices: a small, well-encapsulated unit of deployment.
Finally, you could create a repository per function. It's the most flexible solution, but it also brings some caveats. Ultimately I could imagine every function using a different version of Spring Boot, some functions written in Java and some in Kotlin, etc. Every function would have a slightly different way of testing and running. All of this would make maintenance very hard for you in the long run. I believe keeping all functions within one repo, with a common set of libraries and configurations, would benefit you in terms of maintenance cost.
Thanks to the Spring Cloud Function abstraction, you can use a standalone web application by importing the required starter: https://docs.spring.io/spring-cloud-function/docs/current/reference/html/spring-cloud-function.html#_standalone_web_applications. This allows you to trigger your function as an HTTP endpoint. Additionally, Spring Cloud Function provides a Maven plugin (function:run) which allows you to run the function locally (GCP only): https://docs.spring.io/spring-cloud-function/docs/current/reference/html/spring-cloud-function.html#_getting_started_3
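For reference, a minimal Spring Cloud Function style application might look like the sketch below; with the function-web starter on the classpath, the bean name maps to an HTTP endpoint (POST /uppercase), and the same bean can be adapted to run on AWS Lambda. The names here are purely illustrative.

```java
// Sketch of a Spring Cloud Function application: with
// spring-cloud-starter-function-web on the classpath, the bean name
// "uppercase" is exposed as an HTTP endpoint (POST /uppercase); the same
// bean can be wired to AWS Lambda via the AWS adapter.
import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class UppercaseFunctionApplication {

    public static void main(String[] args) {
        SpringApplication.run(UppercaseFunctionApplication.class, args);
    }

    @Bean
    public Function<String, String> uppercase() {
        return value -> value.toUpperCase();
    }
}
```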
I have a multi-module application. To be more explicit, these are Maven modules where high-level modules depend on low-level modules.
Below are some of the modules:
user-management
common-services
utils
emails
For example: if the user-management module wants to use any services from the utils module, it can call those services, as the utils dependency is already declared under user-management.
To convert my project to truly follow a microservices architecture, I believe I need to convert each module into an independently deployable service, where each module is a WAR module
and provides its services over HTTP (mainly as RESTful web services). Is that correct, or does anything else need to be taken care of as well?
Probably each module now has to be secured and needs an authentication layer as well?
If that's the crux of microservices, I really do not understand why someone asks whether you have worked on microservices, because to me it's not a tool/platform/framework but a simple
concept: divide your monolithic application into a smaller set of deployable modules whose services are available over HTTP. Isn't it? Maybe it's another buzzword.
Update:
Obviously there are advantages to going the microservices way, like independently unit-testable modules, scalability (each service can be deployed on a separate machine), loose coupling, etc., but I see I also need to handle two complex concerns:
Authentication: for each module I need to ensure it authenticates the request, which is not the case right now
Transactions: I cannot maintain transaction atomicity across different services, which I can do very easily at present
What you have understood is perfectly good, and you have found the right area where microservices get more complex than monoliths (distributed transactions), but let me clear up some points about microservices.
A microservice doesn't mean independent services exposed over HTTP: a microservice can communicate with other services either synchronously or asynchronously. REST is one solution, applicable to synchronous communication, but you can also communicate asynchronously, e.g. message-driven using Kafka or HornetQ. For synchronous communication, an underlying service can also be called over the Thrift protocol.
Microservices follow the SRP: the beauty of microservices is that each service concentrates on only one business domain use case, so it handles only one domain object's functionality. But the utils module contains common methods, so every microservice depends on it. Even a small change in the utils module means rebuilding all the other microservices, which violates microservice principles, so dissolve the utils module and make its code local to each service.
Handling Authentication: To be specific a microservice can be one of three types:
a. Core service: just performs a domain operation (like account creation/update/deletion).
b. Aggregate service: calls one or more core services, gathers the results and performs some operation on them.
c. Edge service: exposed to a client (like mobile/browser etc.). We sometimes call it a gateway service; the crux of this service is to take a user request and, based on the URL, forward it to the actual microservice. So it is the ideal place to put authentication if it is common to all microservices.
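As a hedged sketch of putting authentication at the edge, the gateway could run a servlet filter like the one below, which rejects requests without a valid token before forwarding them downstream; the token check is a placeholder, not a real implementation.

```java
// Sketch: authentication at the edge/gateway, as a servlet filter that
// rejects unauthenticated requests before they are forwarded downstream.
// The token validation is a placeholder only.
import java.io.IOException;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

public class EdgeAuthenticationFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        String token = httpRequest.getHeader("Authorization");

        if (token == null || !isValid(token)) {
            ((HttpServletResponse) response).sendError(HttpServletResponse.SC_UNAUTHORIZED);
            return;
        }
        // Authenticated: let the gateway route the request to the target microservice.
        chain.doFilter(request, response);
    }

    private boolean isValid(String token) {
        // Placeholder: verify the JWT / session token here.
        return token.startsWith("Bearer ");
    }
}
```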
Handling distributed transactions: yes, this is the hardest part of microservices, but you can achieve it through an event-driven/message-driven approach. Every action publishes an event; a subscriber of this event receives it, does some operation and generates another event. In case of failure it generates a reverse event which compensates for the first event.
For example, say from microservice A we debited 100 rupees, so we create an AccountDebited event. Now in microservice B we try to credit the account. If it succeeds, we create AccountCredited, which is received by A and creates another event, AmountTransfered. In case of failure we generate an AccountCreditedFailed event, which is received by A and generates a reverse event, AccountSpecialCredit, which maintains atomicity.
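A small sketch of that compensation flow from microservice A's point of view might look like this; the EventBus interface is hypothetical (in practice it would be Kafka, a JMS broker, etc.) and the event names follow the example above.

```java
// Sketch of the compensation flow from microservice A's point of view.
// EventBus is a hypothetical abstraction over Kafka/HornetQ/etc.
public class DebitSaga {

    interface EventBus {
        void publish(String eventType, String accountId, long amount);
    }

    private final EventBus bus;

    public DebitSaga(EventBus bus) {
        this.bus = bus;
    }

    // A debited the account successfully, so announce it.
    public void onDebit(String accountId, long amount) {
        bus.publish("AccountDebited", accountId, amount);
    }

    // A reacts to the outcome of the credit attempted by microservice B.
    public void onCreditOutcome(String eventType, String accountId, long amount) {
        if ("AccountCredited".equals(eventType)) {
            bus.publish("AmountTransfered", accountId, amount);
        } else if ("AccountCreditedFailed".equals(eventType)) {
            // Compensating event: return the debited amount.
            bus.publish("AccountSpecialCredit", accountId, amount);
        }
    }
}
```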
What you have is mostly correct, but you appear to be considering some things as requirements when they are not, and you are forgetting one very important characteristic that microservices are supposed to have.
The main characteristics of microservices are statelessness and independence. Whether they are "WAR" modules and whether they provide their services over "HTTP" (and certainly whether they are RESTful) are secondary concerns and you may hear arguments to the contrary.
Statelessness means that no individual microservice may contain state. (Except for caches.) Microservices are supposed to always delegate the task of persisting data to some database module so they don't keep any state in memory. The idea is that this way, if one microservice fails, (or if an entire machine containing many microservices fails,) you can just route incoming requests to another instance (or another machine) and everything will continue working.
(Of course, if you want my opinion, it is just a cowardly acknowledgement of the fact that we don't know how to write reliable highly concurrent software, but the database guys are smart and they seem to have figured it all out, so we will just delegate the problem of maintaining our state to the software that they have written.)
In my opinion microservice architecture marries well with DDD
I think you should consider your multi-module project as a "monolith" and do your microservice separation based on domain concepts and not on maven projects.
Ex: do not create a microservice called "utils" but rather a microservice called "accounts" or "user-management" or whatever your domain is. I think without domain-driven design it kind of loses its usefulness.
It is really easy afterwards to work on different aspects of the domain knowing that each is separated from the rest. You should check out hexagonal architecture by Alistair Cockburn.
How you split your application depends on the type of modules you have. If a module contains business logic, then it makes sense to create a new service and communicate via HTTP or messaging. On the other hand, if your module has no business logic but just a set of helper functions, it might be better to extract it into a separate public/private Maven package and just use it as a dependency.
Yes, microservices is a buzzword that only recently became popular, but the concept has been around for a while. It also comes with more than that: while it gives the benefits of scaling and independent service deployments, it comes with the price of the complexity of managing and orchestrating a large number of services.
For example, in a monolithic application, when you call a function from another module you know for sure that it is always available. With microservices, some of the services might go down because of a disruption or a deployment, in which case you have to think about handling these situations gracefully (for example, by applying the circuit breaker pattern).
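As one possible illustration of the circuit breaker pattern mentioned above (the answer names the pattern, not a library), here is a sketch using Resilience4j; the service name and the remote call are placeholders.

```java
// Sketch using Resilience4j: calls through an open circuit breaker fail fast
// instead of hanging on a service that is down for deployment or disruption.
// The service name and the remote call are placeholders.
import java.time.Duration;
import java.util.function.Supplier;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

public class UserServiceClient {

    private final CircuitBreaker circuitBreaker = CircuitBreaker.of(
            "user-service",
            CircuitBreakerConfig.custom()
                    .failureRateThreshold(50)                        // open after 50% failures
                    .waitDurationInOpenState(Duration.ofSeconds(30)) // try again after 30s
                    .build());

    public String fetchUser(String userId) {
        Supplier<String> decorated = CircuitBreaker.decorateSupplier(
                circuitBreaker, () -> callRemoteUserService(userId));
        return decorated.get();
    }

    private String callRemoteUserService(String userId) {
        // Placeholder for the real HTTP call to the user service.
        return "user-" + userId;
    }
}
```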
There are many other things to consider when doing microservices and there are many literature available on this topic. I read Microservices: From Design to Deployment from Nginx. It's short and sweet.
So when people ask you "Have you worked with microservices before?", I guess they want to know if you are familiar with, and have some experience with, all the pitfalls of this concept.
In one way you are correct: from the outside, microservices look like this. When you go inside, as you rightly mention, there are two complex concerns:
Authentication: for each module I need to ensure it authenticates the request, which is not the case right now
Transactions: I cannot maintain transaction atomicity across different services, which I can do very easily at present
Apart from these, there are various things one needs to understand, otherwise developing and deploying microservices will be very tough:
I am mentioning some of them here; the complete list you can see in my post:
What exactly is a microservice? Some say it should not exceed 1,000 lines of code. Some say it should fit one bounded context (if you don't know what a bounded context is, don't bother with it right now; keep reading).
Even before deciding on what the "micro"service will be, what exactly is a service?
Microservices do not allow updating multiple entities at once; how will I maintain consistency between entities?
Should I have a single database cluster for all my microservices?
What is this eventual consistency thing everyone is talking about?
How will I collate data which is composed of multiple entities residing in different services?
What would happen if one service goes down? How would the dependent services behave?
Should I make a sync invocation between microservices to always get consistent data?
How will I manage version upgrades to a few or all microservices? Is it always possible to do it without downtime?
And the last unavoidable question - how do I test the entire application as an integrated application?
How do I do circuit breaking? (If one service is down, it should not impact the others.)
CI/CD pipelines, and so on.
What we understood while starting our journey into microservices is:
The design pattern for breaking a business problem into microservices is Domain-Driven Design.
Use a platform which supports microservices development (we used Lagom for this), which addresses some of the above concerns out of the box.
So, all in all, while moving towards a multi-process architecture communicating via REST or other methods, new considerations need to be taken care of which are not directly visible in a monolith, and people want to know whether you are aware of those considerations or not.
As I'm developing microservices using Dropwizard, I'm trying to find a balance between having many resources in one running instance/application of Dropwizard versus many instances.
For example, I have project-A with 3 resources. In another project, project-B, I would like to use one of the resources in project-A. The resource in common is related to user data.
Now I have options like :
make an HTTP call to the user resource in project-A from project-B. I can use the Dropwizard client approach here
as the user resource is common, I can take it out of project-A into, say, project-C. Then I need to create client code in both project-A and project-B
I can extract a JAR containing the user code and use it in project-B. This will avoid making HTTP calls.
Another point where I would like an expert opinion is how to balance/minimize the network calls associated with communication between different instances of a microservice. In general, should one use HTTP to communicate between different instances? Or can another inter-process communication approach be used for performance, particularly if the different instances are on the same system?
I feel this could be a common problem/confusion for newcomers to the world of microservices, and hence I would like to know any general guidelines or best practices.
many thanks
Pradeep
make an HTTP call to the user resource in project-A from project-B. I can use the Dropwizard client approach here
I would not pursue this option if I were you. It's going to slow down your service unnecessarily, create potential logging headaches, and just feels wrong. The only time this might make sense is when the code is out of your control (but even then there's probably a better solution).
as the user resource is common, I can take it out of project-A into, say, project-C. Then I need to create client code in both project-A and project-B
I can extract a JAR containing the user code and use it in project-B. This will avoid making HTTP calls.
It sounds like project A and project B are logically different units with some common dependencies. It might make sense to consider a multi-module project (or a multi-module Maven project if you're using Maven). You could have a module containing any common code (and resources) that gets referenced by separate project modules. This is where Maven really excels since it can manage all these dependencies for you. It's like a combination of the last two options you listed.
One of the main advantages of microservices is the opportunity to release and deploy each of them separately. Whatever option you choose, make sure you don't lose this property.
Another property of a micro-service should be that it has only one responsibility. So it is all about finding the right boundaries for your services (in DDD-terms 'bounded contexts'), and indeed it is not easy to find the right boundaries. It is a balancing act.
For instance in your theoretical case:
If the communication between A and C will be very chatty, then it is not a great idea to extract C.
If A and C have a different lifecycle (business-wise), then it is a good idea to extract C.
That's essentially a design choice: are you ready to trade the simplicity of each of your small services against the complexity of having to orchestrate them and the resulting overall latency?
If you choose the small-service approach, you could stick to the documentation guidelines at http://dropwizard.io/manual/core.html#organizing-your-project: one project with three modules, for the api (that can be referenced from consumers), the application, and the optional client (also potentially used in consumers).
Other questions you will have to answer:
- each of your services will be hosted in a separate SCM repository... or not
- each of your services could (should?) have its own version
If you feel the user is a bounded context of its own, i.e. user management such as user registration, authentication, etc., it can certainly be a separate microservice. However, you should invoke the user API from a single API gateway, convert the result to a JWT token and pass it on to your other APIs in a header.
In another case, if your business use case requires invoking multiple microservices, that logic (orchestration) should be developed in a composite service layer.
Regarding inter-microservice communication: having services talk to each other through API calls takes you back to point-to-point communication, introducing a lot of complexity that is difficult to manage in a large project.
As per bounded-context theory, no transaction should span more than one microservice. However, in real-world scenarios I think we still have dependencies, at least for validating reference data. For example, an order service needs to validate product IDs. In this case, the best I can think of is to have eventing between the microservices to feed each other the reference data. You can try event sourcing for generating business events and asynchronous I/O for publish/subscribe.
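A minimal sketch of that reference-data eventing idea might look like the class below: the order service keeps a local copy of product IDs, updated from product events, so validation never needs a synchronous call to the product service. Event and method names are hypothetical.

```java
// Sketch: the order service keeps a local copy of product IDs, fed by
// product events, so order validation needs no synchronous remote call.
// Event and method names are hypothetical.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ProductReferenceDataCache {

    private final Set<String> knownProductIds = ConcurrentHashMap.newKeySet();

    // Invoked by the messaging layer (e.g. a Kafka listener) on product events.
    public void onProductCreated(String productId) {
        knownProductIds.add(productId);
    }

    public void onProductDeleted(String productId) {
        knownProductIds.remove(productId);
    }

    // Used by the order service to validate incoming orders locally.
    public boolean isValidProduct(String productId) {
        return knownProductIds.contains(productId);
    }
}
```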
Thanks,
Amit
I'd like to ask if anyone can suggest a proper framework for backend scheduled jobs. Currently the whole backend is based on multiple scheduled jobs. All the jobs are written in Java and deployed on a Linux machine. The jobs are controlled by cron (using crontab) with simple bash scripts as wrappers, so basically I have a couple of JARs (all Spring-based uber-jars [with dependencies]) which are fired periodically. These Java modules do various things like processing CSV/XML files, getting data from web services, calling external APIs (HTTP) and collecting data from FTP.
Is there a framework that would let me have all the modules in one place and simply manage them? I was thinking about Camel (I have used it before), but the must-haves for me are:
the ability to deploy/undeploy a single module without interrupting the rest of the modules.
the ability to reschedule jobs (cron expressions) at runtime.
Camel is almost perfect because it has all the features for external integration (FTP, HTTP, WS) and also easy Quartz integration. I don't know if it's achievable to have multiple modules and deploy/undeploy them at runtime.
Maybe there are other frameworks that will fit my needs. Please suggest.
If you are planning to do this in Java/Scala, try using Quartz.
It (also) offers a cron-like syntax for scheduling jobs.
We have our "modules" deployed as webapps on a simple servlet container (Jetty) and trigger actions on them using a Quartz scheduler (also in a webapp, to expose a simple UI).
For managing the loading and unloading of modules, you might want to look at something based on OSGi, like Apache ServiceMix, which seems especially good at module management in the way you're describing (I admit I don't quite understand your requirement for loading and unloading modules). Add Quartz to ServiceMix for scheduling jobs.
I'm asking for a suitable architecture for the following Java web application:
The goal is to build several web applications which all operate on the same data. Suppose a banking system in which account data can be accessed by different web applications; it can be accessed by customers (online banking), by service personnel (mostly read access) and by the account administration department (admin tool). These applications run as separate web applications on different machines, but they use the same data and a set of common data manipulation and search queries.
A possible approach is to build a core application which fits the common needs of the clients, namely data storage, manipulation and search facilities. The clients can then call this core application to fulfil their requests. The requirement is that the applications are built on top of a Wicket/Spring/Hibernate stack as WARs.
To get a picture, here are some of the possible approaches we thought of:
A The monolithic approach. Build one huge web application that fits all needs (this is not really an option)
B The API approach. Build a core database access API (JAR) for data access/manipulation. Each web application is built as a separate WAR which uses the API to access a database. There is no separate core application.
C RMI approach. The core application runs as a standalone application (possibly a WAR) and offers services via RMI (or HttpInvoker).
D WS approach. Just like C but replace RMI with Web Services
E OSGi approach. Build all the components as OSGi modules which run in an OSGi container. Possibly use SpringSource dm Server or ModuleFusion. This approach was not an option for us for some reasons ...
Hope I could make the problem clear. We are currently going with option B, but I'm not very confident about it. What are your opinions? Any other solutions? What are the drawbacks of each solution?
I think that you have to go in the opposite direction, from the bottom up. Of course, you have to go back and forth to verify that everything fits together, but here is the general direction:
Think about your data: the DB schema, how important transactions are (for example, in banking systems everything is about transactions), etc.
Then define a common access method, from a set of stored procedures to a distributed transaction engine...
The next step is the business logic/presentation: what can be generalized and what is subject to customization.
And the final stage is the interfaces, visualisation and reports.
B, C, and D are all just different ways to accomplish the same thing.
My first thought would be to simply have all consumer code connect to a common database. This is certainly doable and would eliminate the code you don't want to place in the middle. The drawback, of course, is that if the schema changes, all consumers need to be updated.
Another solution you may want to consider is giving each consumer its own database, using some sort of replication to keep them in sync.
It looks like A and E are out of the picture as you have stated in your question for various reasons. Option A would be one huge application which would make maintenance difficult in the future.
B, C and D are essentially the same architecturally, since they involve remote access to common libraries from the various web applications; the only difference is the transport mechanism. I would recommend implementing this in EJB 3 or Spring if possible, instead of with your own RMI libraries, since either of these provides a good framework over RMI/web services.
So I think this problem basically boils down to the following two options:
1) Include the business and DAO layer classes as a common jar included in the deployment of all web applications.
Advantages:
Deployment is easier.
Applications will perform better initially since there is no remote access to other servers.
Disadvantages:
You cannot add more hardware to the middle tier specifically (service and DAO layers) since it is included in each web application.
Other business teams in the organisation will not have access to your business services since there is no remote interface.
2) Deploy the business service and DAO layer classes in a separate application server and expose business methods remotely.
Advantages:
You can scale up the business service and DAO layer as needed depending on load from the various web applications calling it.
Other applications in the organisation can make use of your interfaces if needed.
More scalable
You get all the advantages of Java EE.
Disadvantages:
More complex deployment.
Another server to maintain and monitor.
Could be slower since calls will be made over the network although this shouldn't be too much of a problem.
In both cases if the interfaces change the client code will need to change so this isn't a factor in the decision. Transactions should be handled on the business service method level so this shouldn't be a factor either.
I think it depends on the size of the applications as well and how scalable the solution needs to be to warrant the extra complexity of option 2 above.
I think you need to have a separate application that all the client applications will use as their data layer. The reason for this is that you want to ensure they're all accessing the database in the same way. There are also some race conditions you can get into that database transactions may not be able to prevent. The other reason is that using the database as a form of RPC is a known antipattern. If all your apps access the database directly, you will almost inevitably end up with some "event" table that the various applications poll periodically... don't do that.
Apart from the provided responses, if you are considering having multiple applications working with the database at the same time, consider a distributed cache as part of your solution, as well. The beauty of the distributed cache is that it can be accessed by multiple applications at the same time, apart from being distributed. I am not sure if this holds true for all of the Java variations, such as Ehcache, etc, as I do not come from a Java background.
What we are currently doing is abstracting the data a level further than before. We now have a DAL that can be accessed directly, but we have put a "Model Factory" in front of the DAL. The purpose of the Model Factory is to broker both the cache and the data layer, acting as a passthrough. So, the caller always calls the Model Factory and not the DAL or caching code directly. This abstraction layer will basically retrieve data from the DAL on a cache miss without adding the complexity to the API.
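A bare-bones sketch of that Model Factory passthrough (in Java, since the thread is Java-centric; the cache and DAL types are stand-ins for whatever distributed cache and data layer are actually used) could look like this:

```java
// Sketch of a "Model Factory" passthrough: callers hit the factory, which
// checks the cache and falls back to the DAL on a miss. The cache here is a
// simple in-memory map standing in for the distributed cache; AccountDal and
// Account are stand-ins for the real data layer.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ModelFactory {

    public interface AccountDal {
        Account loadAccount(String accountId);
    }

    public record Account(String id, long balance) { }

    private final Map<String, Account> cache = new ConcurrentHashMap<>();
    private final AccountDal dal;

    public ModelFactory(AccountDal dal) {
        this.dal = dal;
    }

    public Account getAccount(String accountId) {
        // Cache hit returns directly; on a miss the DAL is consulted and cached.
        return cache.computeIfAbsent(accountId, dal::loadAccount);
    }
}
```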