Coordinating dependent microservices in Java

I have to coordinate 5 separate microservices, e.g. A, B, C, D, E.
I need to create a coordinator which might monitor a queue for new jobs for A. If A completes OK, then a REST request should be sent to B; then, if everything is OK (happy path), C is called, all the way down to E.
However, B, C, etc. might fail for one reason or another, e.g. the endpoint is down or credentials are insufficient, causing the flow to fail at a particular stage. I'd like to be able to create something that could check the status of a failed job and rerun it, e.g. let's try B again; OK, now it works, so the flow continues.
Any tips or advice on patterns/frameworks for this? I'd like something fairly simple and not overly complex.
I've already looked briefly at Netflix Conductor / Camunda, but ideally I'd like something a bit less complex.
Thanks
W

Any tips or advice on patterns/frameworks for this? I'd like something fairly simple and not overly complex.
What you describe is the good ol' domain of A, B, C, D and E. Because the dependencies and engagement rules between the letters are complex enough, it's worth creating a dedicated service for this domain. It could be as simple as this overarching service just being triggered by queue events.
The only other alternative is to do more on the client side and organize the service calls from there. But that isn't feasible in every domain, for security or other reasons.
And since it sounds like you've already got an event queue going, I won't recommend one (Kafka).

One way, apart from Camunda/Conductor, is to have Service A send an event to a message queue (e.g. Kafka) which provides at-least-once delivery semantics.
Then write a consumer which receives the event and does the orchestration part (talking to services B, C, D, E).
All of these operations need to be idempotent. Before starting orchestration, create a RequestAgg. (request aggregate) for the event from A and keep updating its state to record where you have got to in your orchestration journey.
Then, even if the other services are down or your node goes down, the flow should either reach the end, or you should write rollback functions as well.
And to check the states and debug, you can look at the read model of the RequestAgg.
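To make the state-tracking idea concrete, here is a small framework-free sketch. All names (RequestAgg, Step, etc.) are invented for illustration; a real implementation would persist the aggregate between runs and consume the triggering events from Kafka:

```java
import java.util.List;
import java.util.function.Supplier;

// A "RequestAgg" records how far the orchestration got, so a failed
// flow can be retried later from the failing step. Every name here is
// invented for illustration; a real system would persist the aggregate
// and consume events from Kafka.
public class Orchestrator {

    public enum State { NEW, RUNNING, FAILED, COMPLETED }

    // The request aggregate: one per incoming event from service A.
    public static class RequestAgg {
        public State state = State.NEW;
        public int nextStep = 0;     // index of the next step to call
        public String lastError;
    }

    // A step is one call to a downstream service (B, C, D, E).
    // It returns true on success; steps must be idempotent, because
    // a retry may repeat a call that already half-succeeded.
    public record Step(String name, Supplier<Boolean> call) {}

    private final List<Step> steps;

    public Orchestrator(List<Step> steps) { this.steps = steps; }

    // Runs from wherever the aggregate last stopped, so calling this
    // again after a failure resumes at the failed step ("retry B").
    public void run(RequestAgg agg) {
        agg.state = State.RUNNING;
        while (agg.nextStep < steps.size()) {
            Step step = steps.get(agg.nextStep);
            boolean ok;
            try {
                ok = step.call().get();
            } catch (RuntimeException e) {
                ok = false;
                agg.lastError = e.getMessage();
            }
            if (!ok) {
                agg.state = State.FAILED;  // persist, inspect, retry later
                return;
            }
            agg.nextStep++;                // persist progress here
        }
        agg.state = State.COMPLETED;
    }
}
```

Calling run() again on the same aggregate after fixing the broken service gives you exactly the "let's try B again, OK now it works, the flow continues" behaviour from the question.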

simplest application possible that needs multiple (two) JVMs

I have an actor system "Main" running potentially forever. This main actor understands "Snapshot" or "Stop" messages (defined by me).
I would like to create a bash script that, while Main actor is running, launches a second (short lived) system, or actor, or whatever and sends a snapshot or stop message to the Main actor.
With akka classic that was very easy with actorSelection from a secondary actor
ActorRef mainActorRef = Await.result(
    system.actorSelection("akka.main.actor.path").resolveOne(timeout), timeout);
mainActorRef.tell(new StopMessage(), ActorRef.noSender()); // or new SnapshotMessage()
What is the analogous and hopefully equally easy solution in akka typed?
Ok, let's try to sort this mess a bit... First of all, your question is highly unclear:
In the title, you ask for something based on two JVMs, but in the text you ask for a "second (short lived) system, or actor, or whatever". No clue if multiple JVMs are a requirement or just an idea to solve this. Additionally, your example code is something that - disregarding clustering - works in one JVM and you also only mention a second "actor" there.
So, if the requirement is using two JVMs, then I would suggest making it more clear in what way, why, etc. Then people can also actually provide help for that part.
For now, let me assume you want to simply have...
A (typed) actor system
...that can somehow process StopMessage/SnapshotMessage...
...both of which can be triggered from the outside
The way you can do this very simply is the usual typed way:
Define a RootGuardian actor that accepts those two messages (that actor is basically what the implicit /user actor was in classic) - you have to do that for your Typed actor system anyway (because to set up a typed actor system, you supply the behavior of the RootGuardian).
Let it create the needed child actors to process those messages (either at start or when needed). Of course, in your simple example, the root guardian can also process these messages itself, but an actor system with only one actor is not a very typical use case.
Let it delegate the messages to the appropriate child actor(s)
Add a simple API endpoint to call system.tell(...) to send the message into the system, where your RootGuardian actor will delegate it correctly.
Use curl to call your api endpoint (or use any other way to communicate with your application, there are dozens, but most of them are outside the scope of akka itself)
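For illustration only, here is a framework-free sketch of the delegation idea from the steps above. In real Akka Typed you would express the guardian with Behaviors.receive(Command.class) and spawn real child actors via the ActorContext; all class names here are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Framework-free sketch of the RootGuardian idea: a guardian that
// accepts a closed set of Command messages and delegates each one to
// the appropriate "child". Everything here is invented for
// illustration; real Akka Typed uses Behaviors and ActorRef<Command>.
public class Guardian {

    public interface Command {}
    public record StopMessage() implements Command {}
    public record SnapshotMessage() implements Command {}

    public final List<String> log = new ArrayList<>();
    private boolean running = true;

    // Stand-ins for child actors: one handler per accepted message type.
    private final Consumer<SnapshotMessage> snapshotChild =
            msg -> log.add("snapshot taken");
    private final Consumer<StopMessage> stopChild =
            msg -> { log.add("stopping"); running = false; };

    // The guardian only delegates; it holds no business logic itself.
    public void tell(Command msg) {
        if (!running) return;
        if (msg instanceof SnapshotMessage s) snapshotChild.accept(s);
        else if (msg instanceof StopMessage s) stopChild.accept(s);
    }

    public boolean isRunning() { return running; }
}
```

The point of the sketch is the contract: outside code can only send Command messages through tell, which is exactly the restriction Akka Typed enforces with ActorSystem&lt;Command&gt;.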
As a general idea, Akka Typed tends to be much more strict about who can send what messages where. In Akka classic, it was easy to basically send everything everywhere and find and access any actor from everywhere, including outside the system. Unfortunately, this "freedom" leads to a huge amount of problems and was thus severely limited in Typed, which makes for clearer contracts and better defined message flows.
Of course, in a highly complex system, you might, for example, want to use a Receptionist instead to find the target actor for your specific message, but since the question was for a simple application, I would skip that for now.
You can, of course, also add ways to get your ActorRefs outside the system, for example by using the Ask Pattern to implement something like an actor discovery in the RootGuardian, but there is simply no need to try to circumvent the concepts of Akka Typed by re-implementing ActorSelection.
Obviously you also could use clustering, start up a 2nd JVM, connect it to the cluster, send the message and shut it down again, but we can assume that this would be overkill and very, very slow (waiting for long seconds while starting up the app, connecting to the cluster, etc. just to then use a few milliseconds to send the message).
If you absolutely want a 2nd JVM there, you can, of course, create a simple REST client that sends the message and start that, but... curl exists, so... what for?
So, tl;dr: The "analogous and hopefully equally easy solution" is system.tell(new StopMessage());, which is basically the same in Akka Typed as the code for Akka classic you provided. Obviously, implementing the actor system in a way that this code works is the (slightly) trickier part.

Spring integration finally type handler

I have an application which reads from a Kafka queue and goes on like this:
validate->convert->enrich->persist->notify
In the flow, I'm gathering some performance and other data points into a ThreadLocal container.
In the happy scenario I'm sending this information to a service to be used later in reporting. But the pipeline can stop at any step if one of the steps fails due to a known error (e.g., conversion failed, so the flow should stop there). I don't want each of these processors to contain code that sends the information in the ThreadLocal to the reporting service if the execution resulted in an error, as that would couple those services to information not related to their task.
It would be nice to have a way to execute a service at the end of the flow to send this information out, no matter at which step the pipeline stops moving forward. There could also be scenarios where some code threw an unexpected exception or some other issue broke the flow.
Is there a way for a final operation to be executed no matter the result of the pipeline, so that it can be used to send this information, similar to a finally block in Java?
An integration flow is not like a simple Java try...catch...finally. It is really more about distributed computation and the loose-coupling principle between components. So, even though you tie endpoints together with channels in between, they really know nothing about the previous and next steps: everything is done in the current endpoint with its input and output channels. Therefore your request for something like finally in the flow does not fit the EIP concepts and cannot be implemented as a primitive in the framework.
You are just lucky in your use case that you can rely on the ThreadLocal for your flow logic, but you should keep in mind that it is not a way to deal with messaging. Messaging really has to be stateless, with state scoped only to the message traveling from one endpoint to another. Therefore it might be better to revise your logic in favor of storing such tracing information in the headers of the message at each step. This way you can make the flow fully async, or even distributed across the network, in the future.
This is just my concern about the design you have so far.
For the current error-handling problem, consider putting that "final" step behind some well-known channel, so you are free to send a message to that endpoint from wherever you need. For example, you can wrap problematic endpoints in an ExpressionEvaluatingRequestHandlerAdvice, handle the error over there, and send it to the mentioned channel. This way your business method will be free from error handling. See more in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#expression-advice
If your flow starts from a gateway or inbound channel adapter, you can configure an errorChannel there to catch all downstream errors in one central place. And again: send the handling result to the mentioned channel.
But no: there is no finally in the framework at the moment, and I doubt it would even be suitable in the future, for the messaging and async reasons I explained before.
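To illustrate the "single well-known place" idea without any Spring Integration specifics, here is a plain-Java sketch (all names invented). The point is that the one finally lives in the wrapper that drives the flow and hands the trace to one reporting sink, not in each processor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Framework-free sketch: every step runs inside one wrapper, and
// whether a step fails or the pipeline completes, the collected trace
// goes to a single reporting sink. In Spring Integration that sink
// would sit behind a well-known message channel; names are invented.
public class Pipeline {

    // Stand-in for the ThreadLocal trace container from the question.
    private static final ThreadLocal<List<String>> TRACE =
            ThreadLocal.withInitial(ArrayList::new);

    public record Step(String name, UnaryOperator<String> work) {}

    public static List<String> run(String payload, List<Step> steps,
                                   List<String> reportingSink) {
        try {
            for (Step step : steps) {
                TRACE.get().add("entering " + step.name());
                payload = step.work().apply(payload); // may throw
            }
            return List.of(payload);
        } catch (RuntimeException e) {
            TRACE.get().add("failed: " + e.getMessage());
            return List.of();
        } finally {
            // The one "finally" lives here, not in each processor.
            reportingSink.addAll(TRACE.get());
            TRACE.remove();
        }
    }
}
```

Note the TRACE.remove() in the finally block: with pooled consumer threads, forgetting to clear the ThreadLocal would leak trace entries from one message into the next.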

Axon Framework: Saga project with compensation events between two or three microservices

I have a question about Axon Sagas. I have a project with three microservices, each with its own database, but the two "slave" microservices have to share their data with the "master" microservice, and for that I want to use an Axon Saga. I already asked a question about compensation: when something goes wrong, I have to deal with the compensation myself, which is OK, but not ideal. Currently I am using the DistributedCommandBus to communicate between the microservices; is that good for this? I am using the choreography Saga model, so here is what it looks like now:
Master -> Send command -> Slave1 -> Handles event
Slave1 -> Send back command -> Master -> Handles event
Master -> Send command -> Slave2 -> Handles event
Slave2 -> Send back command -> Master -> Handles event
If something goes wrong, the compensating commands/events run backwards.
My question is: has anybody done something like this with Axon, with compensation? What are the best practices for that? How can I retry the Saga process? With the RetryScheduler? Add a GitHub repo if you can.
Thanks, Máté
First and foremost, let me answer your main question:
My question is has anybody did something like this with Axon?
Shortly, yes, as this is one of the main use cases of Sagas.
As a rule of thumb, I'd like to state a Saga can be used to coordinate a complex business transaction between:
Several distinct Aggregate Instances
Several Bounded Contexts
On face value, it seems you've landed in option two of delegating a complex business transaction.
It is important to note that when you are using Sagas, you should very consciously deal with any exceptions and/or command dispatching results.
Thus, if you dispatch a command from the "Master" to "Slave 1" and the latter fails the operation, this result will come back into the Saga.
This thus gives you the first option to retry an operation, which I would suggest doing with a compensating action.
Lastly, with a compensating action, I am talking about dispatching a command to trigger it.
If you can not rely on the direct response from dispatching the command, retrying/rescheduling a message within the Saga would be a reasonable second option.
To that end, Axon has the EventScheduler and DeadlineManager.
Note that the former of the two publishes an event for everyone to see.
The latter schedules a DeadlineMessage within the context of that single Saga instance, thus limiting the scope of who can see a retry is occurring.
Typically, the DeadlineManager would be my preferred mode of operation for this, unless you require this 'rescheduling action' to be seen by everybody.
FYI, check this page for EventScheduler information and this page for DeadlineManager info.
Sample Update
Here's a bit of pseudo-code to get a feel what a compensating action in a Saga Event Handler would look like:
class SomeSaga {

    private CommandGateway commandGateway;

    @SagaEventHandler(associationValue = "some-key")
    public void on(SomeEvent event) {
        // perform some checks, validation and state setting, if necessary
        commandGateway.send(new MyActionCommand(...))
                      .exceptionally(throwable -> {
                          commandGateway.send(new CompensatingAction(...));
                          return null;
                      });
    }
}
I don't know your exact use case, but from this and your previous question I get the impression you want to roll back, or in this case undo, the event if one of the event handlers cannot process it.
In general, there are some things you are able to do. You can see whether the aggregate that applied the event in the first place has, or can have, the information to check whether the 'slave' microservice will be able to handle the event before you apply it. If this isn't practical, the slave microservice can also publish a 'failure' event directly on the event bus to inform the rest of the system that a failure state has occurred and needs to be handled:
https://docs.axoniq.io/reference-guide/implementing-domain-logic/event-handling/dispatching-events#dispatching-events-from-a-non-aggregate

Strategies to call other bounded context

I'm currently on a study project involving Domain-driven design (DDD) and integration scenarios of multiple domains.
I have a use case in one of my bounded contexts where I need to contact another BC to validate an aggregate. In fact, there could be several BCs to ask for validation data in the future (but not for now).
Right now, I'm suffering from a DDD obsessive-compulsive-disorder nervous breakdown where I cannot find a way to apply the patterns correctly (lol). I would really appreciate some feedback from people about it.
About the 2 bounded contexts:
- The first one (BC_A), where the use case takes place, contains a list of elements that are related to the user.
- The external one (BC_B) has some knowledge about those elements.
So, a validation request from BC_A to BC_B would ask for a review of all elements of the aggregate from BC_A, and would return a report containing some specifications about what to do with those elements (whether we should keep them or not, and why).
The state of the aggregate would pass through (let's say) "draft", then "validating" after a request, and then, depending on the report sent back, it would be "valid" or "has_error" in case there is an error. If the user later chooses not to follow the spec, the state of the aggregate could pass to "controlled", meaning there are errors but we are not taking care of them.
The command is ValidateMyAggregateCommand
The use case is:
get the target aggregate by id
change its state to "validating"
persist the aggregate
make validation call (to another BC)
persist the validation report
acknowledge the validation report with the target aggregate (which will change its state again depending on the result, should be "OK" or "HAS_ERROR")
persist the aggregate again
generate a domain event depending on the validation result
It contains 8 steps, spanning possibly from 1 to 3 transactions or more.
I need to persist the validation report locally (to access it in the UI) and I think I could do it:
after the validation call independently (the report being its own aggregate)
when I persist the target aggregate (it would be inside it)
I prefer the first option (step 5) because it is more decoupled - even if we could argue that there is an invariant here (???) - and so there is a consistency delay between the persistence of the report and the acknowledgement by the aggregate.
I'm actually struggling with the call itself (step 4).
I think I could do it in several ways:
A. synchronous RPC call with REST implementation
B. call without a response (void) (fire-and-forget), leaving several implementation options on the table (sync/async)
C. domain event translated into a technical event to reach the other BC
A. Synchronous RPC call
// code_fragment_a
// = ValidateMyAggregateCommandHandler
// ---
myAggregate = myAggregateRepository.find(command.myAggregateId()); // #1
myAggregate.changeStateTo(VALIDATING); // #2
myAggregateRepository.save(myAggregate); // #3
ValidationReport report = validationService.validate(myAggregate); // #4
validationReportRepository.save(report); // #5
myAggregate.acknowledge(report); // #6
myAggregateRepository.save(myAggregate); // #7
// ---
The validationService is a domain service implemented in the infrastructure layer with a REST service bean (it could be local validation as well, but not in my scenario).
The call needs a response immediately, and the caller (the command handler) is blocked until the response is returned. So it introduces high temporal coupling.
In case the validation call fails for technical reasons, we get an exception and have to roll back everything. The command would have to be replayed later.
B. Call without response (sync or async)
In this version, the command handler would persist the "validating" state of the aggregate, and would fire (and forget) the validation request.
// code_fragment_b0
// = ValidateMyAggregateCommandHandler
// ---
myAggregate = myAggregateRepository.find(command.myAggregateId()); // #1
myAggregate.changeStateTo(VALIDATING); // #2
myAggregateRepository.save(myAggregate); // #3
validationRequestService.requestValidation(myAggregate); // #4
// ---
Here, the acknowledgement of the report could happen in a sync or async manner, inside or outside the initial transaction.
Having the code above in a dedicated transaction allows failures in the validation call to be harmless (if we have a retry mechanism in the implementation).
This solution would allow starting with a sync communication quickly and easily, and switching to an async one later. So it is flexible.
B.1. Synchronous impl
In this case, the implementation of the validationRequestService (in the infrastructure layer) does a direct request/response.
// code_fragment_b1_a
// = SynchronousValidationRequestService
// ---
private ValidationCaller validationCaller;
public void requestValidation(MyAggregate myAggregate) {
ValidationReport report = validationCaller.validate(myAggregate);
validationReportRepository.save(report);
DomainEventPublisher.publish(new ValidationReportReceived(report));
}
// ---
The report is persisted in a dedicated transaction, and the publishing of an event activates a third code fragment (in the application layer) that does the actual acknowledgement work on the aggregate.
// code_fragment_b1_b
// = ValidationReportReceivedEventHandler
// ---
public void when(ValidationReportReceived event) {
MyAggregate myAggregate = myAggregateRepository.find(event.targetAggregateId());
ValidationReport report = validationReportRepository.find(event.reportId());
myAggregate.acknowledge(report);
myAggregateRepository.save(myAggregate);
}
// ---
So here, we have an event from infra layer to the app layer.
B.2. Asynchronous
The asynchronous version would change the previous solution in the ValidationRequestService implementation (code_fragment_b1_a). Using a JMS/AMQP bean would allow sending a message first and receiving the response later, independently.
I guess the messaging listener would fire the same ValidationReportReceived event, and the rest of the code would be the same as code_fragment_b1_b.
As I write this post, I realize this solution (B2) presents a nicer symmetry in the exchange and better technical properties, because it is more decoupled and more reliable regarding network communication. At this point it is not introducing that much complexity.
C. Domain events and bus between BCs
Last implementation: instead of using a domain service to request a validation from the other BC, I would raise a domain event like MyAggregateValidationRequested. I realize it is a "forced" domain event - OK, the user requested it, but it never really emerges in conversation - but still, it is a domain event.
The thing is, I don't know yet how and where to put the event handlers. Should the infrastructure handlers take it directly?
Should I translate the domain event into a technical event before sending it to its destination?
(A technical event being some kind of DTO, as it is a data structure.)
I guess all the code related to messaging belongs to the infrastructure layer (port/adapter slot) because it is used only to communicate between systems.
And the technical events that are transferred inside those pipes, with their raising/handling code, should belong to the application layer because, like commands, they end up in a mutation of the system state. They coordinate the domain, and are fired by the infra (like controllers firing application services).
I read some solutions about translating events in commands but I think it makes the system more complex for no benefits.
So my application facade would expose 3 types of interacion:
- commands
- queries
- events
With this separation, I think we can isolate commands from UI and events from other BCs more clearly.
Ok, I realize this post is pretty long and maybe a little bit messy, but this is where I'm stuck, so thank you in advance if you can say something that could help me.
So my problem is that I'm struggling with the integration of the 2 BCs.
Different solutions: the RPC service (#A) is simple but limits scale; the service with messaging (#B) seems right, but I still need feedback; and with domain events (#C) I don't really know how to cross boundaries.
Thank you again!
I have a use case in one of my bounded context where I need to contact another BC to validate an aggregate.
That's a really weird problem to have. Typically, aggregates are valid or not valid entirely based on their own internal state -- that would be why they are aggregates, and not merely entities in some larger web.
In other words, you may be having trouble applying the DDD patterns because your understanding of the real problem you are trying to solve is incomplete.
As an aside: when asking for help in ddd, you should adhere as closely as you can to your actual problem, rather than trying to make it abstract.
That said, there are some patterns that can help you out. Udi Dahan walks through them in detail in his talk on reliable messaging, but I'll cover the high points here.
When you run a command against an aggregate, there are two different aspects to be considered
Persisting the change of state
Scheduling side effects
"Side effects" can include commands to be run against other aggregates.
In your example, we would see three distinct transactions in the happy path.
The first transaction would update the state of your aggregate to Validating, and schedule the task to fetch the validation report.
That task runs asynchronously, querying the remote domain context, then starts transaction #2 in this BC, which persists the validation report and schedules a second task.
The second task - built from the data copied into the validation report - starts transaction #3, running a command against your aggregate to update its state. When this command is finished, there are no more commands to schedule, and everything gets quiet.
This works, but it perhaps couples your aggregates too tightly to your process. Furthermore, your process is disjoint - scattered about in your aggregate code, not really recognized as a first-class citizen.
So you are more likely to see this implemented with two additional ideas. First, the introduction of a domain event. Domain events are descriptions of changes of state that are of special significance. So the aggregate describes the change (ValidationExpired?) along with the local state needed to make sense of it, and the event is published asynchronously. (In other words, instead of asynchronously running an arbitrary task, we asynchronously schedule a PublishEvent task with a domain event as the payload.)
Second, the introduction of a "process manager". The process manager subscribes to the events, updates its internal state machine, and schedules (asynchronous) tasks to run. (These tasks are the same tasks the aggregate was scheduling before.) Note that the process manager doesn't have any business rules; those belong in the aggregates. But it knows how to match commands with the domain events they generate (see the messaging chapter in Enterprise Integration Patterns, by Gregor Hohpe), to schedule timeout tasks that help detect which scheduled tasks haven't completed within their SLA, and so on.
Fundamentally, process managers are analogous to aggregates; they themselves are part of the domain model, but access to them is provided by the application component. With aggregates, the command handler is part of the application; when the command has been processed by the aggregate, it's the application that schedules the asynchronous tasks. The domain events are published to the event bus (infrastructure), and the application's event handlers subscribe to that bus, loading the process managers via persistence, passing in the domain event to be processed, using the persistence component again to save the updated process manager, and then scheduling the pending tasks.
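For a feel of the shape of such a process manager, here is a minimal framework-free sketch. The event and command names are invented for illustration, and real commands would of course be messages rather than strings:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a process manager: it subscribes to domain events, updates
// its own little state machine, and schedules follow-up commands for
// the application to run asynchronously. No business rules live here;
// it only routes. All names are invented for illustration.
public class ValidationProcess {

    public enum State { STARTED, AWAITING_REPORT, DONE }

    public sealed interface Event permits ValidationRequested, ReportReceived {}
    public record ValidationRequested(String aggregateId) implements Event {}
    public record ReportReceived(String aggregateId, boolean valid) implements Event {}

    private State state = State.STARTED;
    // Commands scheduled for asynchronous execution by the application.
    public final List<String> scheduledCommands = new ArrayList<>();

    public void on(Event event) {
        if (event instanceof ValidationRequested e && state == State.STARTED) {
            scheduledCommands.add("FetchValidationReport:" + e.aggregateId());
            state = State.AWAITING_REPORT;
        } else if (event instanceof ReportReceived e && state == State.AWAITING_REPORT) {
            scheduledCommands.add((e.valid() ? "MarkValid:" : "MarkHasError:") + e.aggregateId());
            state = State.DONE;
        }
    }

    public State state() { return state; }
}
```

A timeout task (not shown) would simply be one more scheduled command that fires if the process is still in AWAITING_REPORT after its SLA.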
I realize it is a "forced" domain event, ok the user requested it but it never really emerge in conversation but still it is a domain event.
I wouldn't describe it as forced; if the requirement for this validation process really comes from the business, then the domain event is a thing that belongs in the ubiquitous language.
Should I translate the domain event into a technical event before sending it to its destination
I have no idea what you think that means. An event is a message describing something that happened. "Domain event" means the something happened within the domain. It's still a message to be published.

How can I separate business logic and email sending functionality?

I have a requirement in my Java web application where I need to send email alerts under certain conditions. For this I have used the javax.mail API, and sending email works just fine. But the problem is that program execution waits until the methods for sending the email have executed. As there are hundreds of emails to be sent at various points... this reduces performance significantly.
I am using Spring and have also used Spring AOP. Can anyone suggest how I can separate my business logic from the email-sending functionality? It should be like this:
Sending emails is my advice, which gets executed when the xyz method is called - so the main execution should not wait for the advice to finish; rather, it should return and continue with further business logic, with email sending executed separately.
Here creating new threads seems the obvious choice. But I think there could be a better way - is there? Thanks.
You can make the mail-sending method @Async. This way Spring will execute it in a separate thread. Read this blog post about it: Creating Asynchronous Methods
What you describe is asynchronous execution, and the natural way to do async execution in Java is to use threads.
You can introduce an Executor, e.g. Executors.newFixedThreadPool(), and use it to offload the mailing task onto separate threads.
The aspect itself is an unsuitable place for this, since it would introduce state into the aspect; for example, you may want to check whether the mail task was successful by using the returned Future:
class Mailer {

    private final ExecutorService executor = Executors.newFixedThreadPool(maxMailingThreads);
    // ...

    public Future<MailTaskResult> doMail(MailTask anEmail) {
        // submit returns immediately; the mail is sent on a pool thread,
        // and the caller can inspect the Future later (calling get() here
        // would block and defeat the purpose)
        return executor.submit(anEmail); // MailTask implements Callable<MailTaskResult>
    }
}
Better to move this logic into a separate class and call it from the aspect somehow.
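For completeness, here is a runnable plain-Java sketch of the executor-based offload described above (no Spring; sendEmail is a fake stand-in for the real JavaMail call):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of offloading mail sending to a thread pool. sendEmail is a
// stand-in for the JavaMail call; everything else is plain Java.
public class AsyncMailer implements AutoCloseable {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Pretend to talk to an SMTP server.
    private boolean sendEmail(String to, String body) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    // Returns immediately; the caller can attach a callback or inspect
    // the future later, instead of blocking the business logic.
    public CompletableFuture<Boolean> send(String to, String body) {
        return CompletableFuture.supplyAsync(() -> sendEmail(to, body), pool);
    }

    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a pool of 4 threads, eight 10 ms "sends" finish in roughly two rounds instead of eighty milliseconds of sequential blocking, which is the whole point of the offload.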
Treat the email sending functionality like an IO device. Make it a plugin to your business logic. Do not allow any knowledge of the fact that you're even talking to the email code into your business logic. Make the email logic depend on the business logic. Never the other way around.
Here's a very good talk about this kind of architecture:
https://vimeo.com/97530863
Here's a series debating it:
https://www.youtube.com/watch?v=z9quxZsLcfo
Here's a ruby master demonstrating it with real code. We miss him.
https://www.youtube.com/watch?v=tg5RFeSfBM4
If your business rules are interesting enough to be worth respecting, then this is the way to make them the masters of your application. Express them only using Java. Don't accept any help. No Spring, no weird annotations, just business rules. Push all that "help" out to the mail code.
Do this and your app will scale well. I think this is the best way to put it:
That's from a hexagonal architecture post. But the idea of giving your business rules a safe place to live removed from implementation detail shows up in many architectures. This answer rounds them up nicely.
Use a localhost MTA (like OpenSMTPD) and then relay to your real SMTP server, like Amazon SES ("Satellite" mode). It won't block.
I did a test, and sent 1000 emails in 2.8 seconds this way.
It's simpler than doing async in java, and is useful across multiple applications.
As for separating the logic, raise a Spring application event when needed, make another class listen for it, and send your email from there. Or consider something like Guava's EventBus.
Consider creating a separate thread to send emails within your application. This will allow parallel execution (application + email sending).
If you want another approach, you can create a separate back-end application that only sends emails, although you will need to submit the email messages to it. An asynchronous way to do this is to send a JMS message to the email application.
