CQRS/ES - handling projection errors - Java

I'm working on a CQRS+ES system, mainly using the Axon Framework, but this question really applies to any implementation. I have a command handler and one or more event handlers, running on different JVMs, containers, etc., and at some point one of these handlers encounters an error.
We have two cases: an 'expected' business error and an 'unexpected' system error. As I understand it, we are now in an asynchronous handler and the event is already a fact, so in reality we cannot directly roll back the command in either case (as it could entail rolling it back in numerous other projections and would break CQRS).
So my question is, should such an error be 'resolved' in an accounts ledger sort of way, i.e. by sending a new 'reversal' command that is then propagated to the projections in such a way that the event that failed is now resolved?
As an example, let's say we have a command that updates a customer's credit. The event is published, one projection updates its "total credits" statistic, another one publishes the update to some websocket for the UI and, finally, another one maintains the credit state - and this last handler fails. Should we send a command for rolling back the business transaction, and again deduct the credit, update the websocket again, etc.? And in the case of Axon, is there some way in which this is captured as a transaction?

I'd argue that the decision whether taking an action (thus, handling a command) is okay should always lie with the Command Model/Aggregate. The aggregate being in an incorrect state to handle an action will typically lead to a 'business exception/error'.
If you make decisions upon event handling failing, however, you're adding decision-making logic to an event handling service which in most cases doesn't care about it. If such an event handling service updates views/query models but fails to do so, I'd argue that's not a valid reason to publish a 'compensating command' to your aggregate to 'roll back/undo the event'.
In your example you have a 'credit-state-maintainer', which I'd guess updates a query model. As such, I'd deem the problem of dealing with the exception to lie within that service itself, not in performing a compensating action.
From an Axon Framework perspective, you could run your CreditStateEventHandler in a TrackingEventProcessor and trigger a reset on that event processor by calling the TrackingEventProcessor#resetTokens() function. This takes the stance that the exception your CreditStateEventHandler runs into is due to faulty coding, of course; otherwise a replay would result in the exact same exception.
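For illustration, a minimal sketch of triggering such a replay (assuming Axon 4's configuration API; the processor name "credit-state" is hypothetical):

    public void replayCreditState(EventProcessingConfiguration config) {
        config.eventProcessor("credit-state", TrackingEventProcessor.class)
              .ifPresent(processor -> {
                  processor.shutDown();    // a reset is only allowed while the processor is stopped
                  processor.resetTokens(); // rewind the tracking token to the start of the stream
                  processor.start();       // restart, replaying all events into the projection
              });
    }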

Related

Axon Framework: Saga project with compensation events between two or three microservices

I have a question about Axon Sagas. I have a project with three microservices; each microservice has its own database, but the two "Slave" microservices have to share their data with the "Master" microservice, and for that I want to use an Axon Saga. I already asked a question about compensation: when something goes wrong I have to deal with the compensation myself, which is OK but not ideal. Currently I am using the DistributedCommandBus to communicate between the microservices; is it a good fit for that? I am using the Choreography Saga model, so here is what it looks like now:
Master -> Send command -> Slave1 -> Handles event
Slave1 -> Send back command -> Master -> Handles event
Master -> Send command -> Slave2 -> Handles event
Slave2 -> Send back command -> Master -> Handles event
If something goes wrong, compensating Commands/Events flow backwards.
My question is: has anybody done something like this with Axon, with compensation? What are the best practices for that? How can I retry the Saga process? With the RetryScheduler? Add a GitHub repo if you can.
Thanks, Máté
First and foremost, let me answer your main question:
My question is: has anybody done something like this with Axon?
Shortly: yes, as this is one of the main use cases for Sagas.
As a rule of thumb, I'd state that a Saga can be used to coordinate a complex business transaction between:
Several distinct Aggregate Instances
Several Bounded Contexts
On face value, it seems you've landed on option two for delegating a complex business transaction.
It is important to note that when you are using Sagas, you should very consciously deal with any exceptions and/or command dispatching results.
Thus, if you dispatch a command from the "Master" to "Slave 1" and the latter fails the operation, this result will come back into the Saga.
This gives you the first option to retry an operation, which I would suggest doing with a compensating action.
Lastly, with a compensating action, I am talking about dispatching a command to trigger it.
If you can not rely on the direct response from dispatching the command, retrying/rescheduling a message within the Saga would be a reasonable second option.
To that end, Axon has the EventScheduler and DeadlineManager.
Note that the former of the two publishes an event for everyone to see.
The latter schedules a DeadlineMessage within the context of that single Saga instance, thus limiting the scope of who can see a retry is occurring.
Typically, the DeadlineManager would be my preferred mode of operation for this, unless you require this 'rescheduling action' to be seen by everybody.
FYI, check the Axon Reference Guide for EventScheduler information and DeadlineManager info.
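To give a feel for the deadline approach, here is a minimal sketch (assuming Axon 4's DeadlineManager; the event, command and deadline names are hypothetical):

class RetryingSaga {

    private transient CommandGateway commandGateway;
    private transient DeadlineManager deadlineManager;

    @SagaEventHandler(associationProperty = "orderId")
    public void on(SlaveCommandDispatchedEvent event) {
        // schedule a retry deadline, visible only to this saga instance
        deadlineManager.schedule(Duration.ofMinutes(5), "retry-slave-command");
    }

    @DeadlineHandler(deadlineName = "retry-slave-command")
    public void onRetryDeadline() {
        // the slave did not answer within the SLA: retry, or dispatch a compensating action
        commandGateway.send(new RetrySlaveCommand(...));
    }
}

When the slave's answer does arrive in time, the saga would cancel the schedule again, e.g. through DeadlineManager#cancelAll("retry-slave-command").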
Sample Update
Here's a bit of pseudo-code to get a feel for what a compensating action in a Saga Event Handler would look like:
class SomeSaga {

    private transient CommandGateway commandGateway;

    @SagaEventHandler(associationProperty = "some-key")
    public void on(SomeEvent event) {
        // perform some checks, validation and state setting, if necessary
        commandGateway.send(new MyActionCommand(...))
                      .exceptionally(throwable -> {
                          // dispatching failed: send the compensating action instead
                          commandGateway.send(new CompensatingAction(...));
                          return null;
                      });
    }
}
I don't know your exact use case, but from this and your previous question I get the impression you want to roll back, or in this case undo, the event if one of the event handlers cannot process it.
In general, there are some things you can do. You can check whether the aggregate that applied the event in the first place has, or can have, the information to determine whether the 'slave' microservice will be able to handle the event before you apply it. If this isn't practical, the slave microservice can also apply a 'failure' event directly on the event bus to inform the rest of the system that a failure state has occurred and needs to be handled (a sketch follows the link):
https://docs.axoniq.io/reference-guide/implementing-domain-logic/event-handling/dispatching-events#dispatching-events-from-a-non-aggregate
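As a sketch of that last suggestion (the projection and event names are hypothetical; Axon's EventGateway is one way to publish events from a non-aggregate):

@Service
public class CreditStateProjection {

    private final EventGateway eventGateway;

    public CreditStateProjection(EventGateway eventGateway) {
        this.eventGateway = eventGateway;
    }

    @EventHandler
    public void on(CreditUpdatedEvent event) {
        try {
            updateCreditState(event);
        } catch (Exception e) {
            // don't try to undo the original event; tell the rest of the system a failure state exists
            eventGateway.publish(new CreditStateUpdateFailed(event.getCustomerId(), e.getMessage()));
        }
    }

    private void updateCreditState(CreditUpdatedEvent event) {
        // update the query model here
    }
}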

To publish a custom event and asynchronously handle it in a Spring MVC based REST application

I have written a REST application based on Spring MVC wherein I am required to do some validations; some of the validations are hard rules and some of them are soft rules. If soft rules fail they generate a warning, but if hard rules fail they generate an error.
First I check the hard rules; if any fail, then at that time only I return the response, but I let the process continue with the subsequent soft rules.
Herein I would like to know: how do I create two parallel threads in Spring to do this?
OR: how do I publish a custom event and asynchronously handle it in another thread, letting the original thread continue its work in Spring?
I know about @Async and Spring's TaskExecutor, but not how best to use them here.
I am seeking design and architectural guidelines and ideas to handle this task in the best possible way.
As mentioned, since a soft rule validation failure only generates a warning, it can be handled in a separate background process. This way the main thread can focus solely on hard rules without bothering itself about soft rules.
For the above behavior, the points below need to be implemented (a sketch follows the list):
For every request, persist the relevant data for soft rule processing with a flag processed=false and preferably timestamps (for insertion and processing).
After persisting the data, let the main thread continue with hard rule processing.
Introduce a scheduled service (via @Scheduled) which periodically fetches the unprocessed data and marks it as processed=true after soft rule processing, along with the relevant processed timestamp. (This acts as the background process which periodically polls for unprocessed rules.)
Do ensure that the respective transactions, viz. soft rule data insertion and processing, are well isolated. Also, error handling should be robust against system failures while rule processing is in progress.
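A minimal sketch of that polling service (the entity, repository and helper names are hypothetical):

@Service
public class SoftRuleProcessor {

    private final SoftRuleDataRepository repository;

    public SoftRuleProcessor(SoftRuleDataRepository repository) {
        this.repository = repository;
    }

    @Scheduled(fixedDelay = 30_000) // poll every 30 seconds for unprocessed entries
    @Transactional
    public void processPendingSoftRules() {
        for (SoftRuleData data : repository.findByProcessedFalse()) {
            validateSoftRules(data);           // only produces warnings, never blocks the original request
            data.markProcessed(Instant.now()); // sets processed=true plus the processed timestamp
        }
    }

    private void validateSoftRules(SoftRuleData data) {
        // evaluate the soft rules and record any warnings on the data record
    }
}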
Let me know in the comments if more information is required.

Strategies to call other bounded context

I'm currently on a study project involving Domain-driven design (DDD) and integration scenarios of multiple domains.
I have a use case in one of my bounded contexts where I need to contact another BC to validate an aggregate. In fact, there could be several BCs to ask for validation data in the future (but not for now).
Right now, I'm suffering from a DDD obsessive-compulsive-disorder nervous breakdown where I cannot find a way to apply the patterns correctly (lol). I would really appreciate some feedback from people about it.
About the 2 bounded contexts:
- The first one (BC_A), where the use case takes place, contains a list of elements that are related to the user.
- The external one (BC_B) has some knowledge about those elements.
So, a validation request from BC_A to BC_B would ask for a review of all elements of the aggregate from BC_A, and would return a report containing some specifications about what to do with those elements (whether we should keep them or not, and why).
The state of the aggregate would pass through (let's say) "draft", then "validating" after a request, and then, depending on the report sent back, "valid" or "has_error" in case there is one. If the user later chooses not to follow the spec, they could change the state of the aggregate to "controlled", meaning there are some errors but we are not taking care of them.
The command is ValidateMyAggregateCommand
The use case is:
1. get the target aggregate by id
2. change its state to "validating"
3. persist the aggregate
4. make the validation call (to another BC)
5. persist the validation report
6. acknowledge the validation report with the target aggregate (which will change its state again depending on the result; it should be "OK" or "HAS_ERROR")
7. persist the aggregate again
8. generate a domain event depending on the validation result
It contains 8 steps, possibly spanning 1 to 3 transactions or more.
I need to persist the validation report locally (to access it in the UI) and I think I could do it:
- after the validation call, independently (the report being its own aggregate)
- when I persist the target aggregate (it would live inside it)
I prefer the first option (step 5) because it is more decoupled (even if we could argue that there is an invariant here (???)), and so there is a consistency delay between the persistence of the report and the acknowledgement by the aggregate.
I'm actually struggling with the call itself (step 4).
I think I could do it in several ways:
A. a synchronous RPC call with a REST implementation
B. a call without a response (void, fire-and-forget), leaving several implementation options on the table (sync/async)
C. a domain event translated into a technical event to reach the other BC
A. Synchronous RPC call
// code_fragment_a
// = ValidateMyAggregateCommandHandler
// ---
myAggregate = myAggregateRepository.find(command.myAggregateId()); // #1
myAggregate.changeStateTo(VALIDATING); // #2
myAggregateRepository.save(myAggregate); // #3
ValidationReport report = validationService.validate(myAggregate); // #4
validationReportRepository.save(report); // #5
myAggregate.acknowledge(report); // #6
myAggregateRepository.save(myAggregate); // #7
// ---
The validationService is a domain service implemented in the infrastructure layer with a REST service bean (could be local validation as well but not in my scenario).
The call needs a response immediately and the caller (the command handler) is blocked until the response is returned. So it introduces a high temporal coupling.
In case the validation call fails for technical reasons, we get an exception and we have to roll back everything. The command would have to be replayed later.
B. Call without response (sync or async)
In this version, the command handler would persist the "validating" state of the aggregate, and would fire (and forget) the validation request.
// code_fragment_b0
// = ValidateMyAggregateCommandHandler
// ---
myAggregate = myAggregateRepository.find(command.myAggregateId()); // #1
myAggregate.changeStateTo(VALIDATING); // #2
myAggregateRepository.save(myAggregate); // #3
validationRequestService.requestValidation(myAggregate); // #4
// ---
Here, the acknowledgement of the report could happen in a sync or async manner, inside or outside the initial transaction.
Having the code above in a dedicated transaction makes failures in the validation call harmless (provided we have a retry mechanism in the implementation).
This solution would allow starting with synchronous communication quickly and easily, and switching to an asynchronous one later. So it is flexible.
B.1. Synchronous impl
In this case, the implementation of the validationRequestService (in the infrastructure layer) does a direct request/response.
// code_fragment_b1_a
// = SynchronousValidationRequestService
// ---
private ValidationCaller validationCaller;
private ValidationReportRepository validationReportRepository;

public void requestValidation(MyAggregate myAggregate) {
    ValidationReport report = validationCaller.validate(myAggregate);
    validationReportRepository.save(report);
    DomainEventPublisher.publish(new ValidationReportReceived(report));
}
// ---
The report is persisted in a dedicated transaction, and the publishing of the event activates a third code fragment (in the application layer) that does the actual acknowledgement work on the aggregate.
// code_fragment_b1_b
// = ValidationReportReceivedEventHandler
// ---
public void when(ValidationReportReceived event) {
    MyAggregate myAggregate = myAggregateRepository.find(event.targetAggregateId());
    ValidationReport report = validationReportRepository.find(event.reportId());
    myAggregate.acknowledge(report);
    myAggregateRepository.save(myAggregate);
}
// ---
So here, we have an event going from the infra layer to the app layer.
B.2. Asynchronous
The asynchronous version would change the previous solution in the ValidationRequestService impl (code_fragment_b1_a). Using a JMS/AMQP bean would allow sending a message first and receiving the response later, independently.
I guess the messaging listener would fire the same ValidationReportReceived event, and the rest of the code would stay the same as code_fragment_b1_b.
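A rough sketch of that asynchronous variant (using Spring JMS here as one possible implementation; queue names and message types are hypothetical, and DomainEventPublisher comes from the fragments above):

// = AsynchronousValidationRequestService (replaces code_fragment_b1_a)
private JmsTemplate jmsTemplate;

public void requestValidation(MyAggregate myAggregate) {
    // fire and forget: the report will arrive on another queue, on another thread
    jmsTemplate.convertAndSend("validation.requests", new ValidationRequestMessage(myAggregate.id()));
}

// = ValidationReportListener
@JmsListener(destination = "validation.reports")
public void onValidationReport(ValidationReport report) {
    validationReportRepository.save(report);
    DomainEventPublisher.publish(new ValidationReportReceived(report));
}

With this in place, code_fragment_b1_b stays exactly the same.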
As I write this post, I realize this solution (B2) presents a nicer symmetry in the exchange and better technical properties, because it is more decoupled and more reliable regarding network communications. At this point it does not introduce that much complexity.
C. Domain events and bus between BCs
For the last implementation, instead of using a domain service to request a validation from the other BC, I would raise a domain event like MyAggregateValidationRequested. I realize it is a 'forced' domain event (OK, the user requested it, but it never really emerges in conversation), and still it is a domain event.
The thing is, I don't know yet how and where to put the event handlers. Should the infrastructure handlers take it directly?
Should I translate the domain event into a technical event before sending it to its destination?
(A technical event being some kind of DTO, if it were a data structure.)
I guess all the code related to messaging belongs to the infrastructure layer (port/adapter slot) because it is used to communicate between systems only.
And the technical events that are transferred inside those pipes, with their raising/handling code, should belong to the application layer because, like commands, they end up mutating the system state. They coordinate the domain, and are fired by the infra (like controllers firing application services).
I read some solutions about translating events into commands, but I think that makes the system more complex for no benefit.
So my application facade would expose 3 types of interaction:
- commands
- queries
- events
With this separation, I think we can isolate commands from the UI and events from other BCs more clearly.
OK, I realize this post is pretty long and maybe a little bit messy, but this is where I'm stuck, so I thank you in advance if you can say something that could help me.
So my problem is that I'm struggling with the integration of the 2 BCs.
Different solutions: the RPC service (#A) is simple but limits scaling; the service with messaging (#B) seems right but I still need feedback; and with the domain events (#C) I don't really know how to cross boundaries.
Thank you again!
I have a use case in one of my bounded context where I need to contact another BC to validate an aggregate.
That's a really weird problem to have. Typically, aggregates are valid, or not valid, entirely dependent on their own internal state -- that would be why they are aggregates, and not merely entities in some larger web.
In other words, you may be having trouble applying the DDD patterns because your understanding of the real problem you are trying to solve is incomplete.
As an aside: when asking for help in ddd, you should adhere as closely as you can to your actual problem, rather than trying to make it abstract.
That said, there are some patterns that can help you out. Udi Dahan walks through them in detail in his talk on reliable messaging, but I'll cover the high points here.
When you run a command against an aggregate, there are two different aspects to be considered
Persisting the change of state
Scheduling side effects
"Side effects" can include commands to be run against other aggregates.
In your example, we would see three distinct transactions in the happy path.
The first transaction would update the state of your aggregate to Validating, and schedule the task to fetch the validation report.
That task runs asynchronously, querying the remote domain context, then starts transaction #2 in this BC, which persists the validation report and schedules a second task.
The second task - built from the data copied into the validation report - starts transaction #3, running a command against your aggregate to update its state. When this command is finished, there are no more commands to schedule, and everything gets quiet.
This works, but it couples your aggregates perhaps too tightly to your process. Furthermore, your process is disjoint - scattered about in your aggregate code, not really recognized as being a first class citizen.
So you are more likely to see this implemented with two additional ideas. First, the introduction of a domain event. Domain events are descriptions of changes of state that are of special significance. So the aggregate describes the change (ValidationExpired?) along with the local state needed to make sense of it, publishing the event asynchronously. (In other words, instead of asynchronously running an arbitrary task, we asynchronously schedule a PublishEvent task with an arbitrary domain event as the payload.)
Second, the introduction of a "process manager". The process manager subscribes to the events, updates its internal state machine, and schedules (asynchronous) tasks to run. (These tasks are the same tasks that the aggregate was scheduling before). Note that the process manager doesn't have any business rules; those belong in the aggregates. But they know how to match commands with the domain events they generate (see the messaging chapter in Enterprise Integration Patterns, by Gregor Hohpe), to schedule timeout tasks that help detect which scheduled tasks haven't completed within their SLA and so on.
Fundamentally, process managers are analogous to aggregates; they themselves are part of the domain model, but access to them is provided by the application component. With aggregates, the command handler is part of the application; when the command has been processed by the aggregate, it's the application that schedules the asynchronous tasks. The domain events are published to the event bus (infrastructure), and the application's event handlers subscribe to that bus, loading the process managers via the persistence component, passing in the domain event to be processed, using the persistence component again to save the updated process manager, and then scheduling the pending tasks.
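As a rough sketch of such a process manager (hypothetical names, no particular framework; the task scheduler is a port provided by the application component):

public class ValidationProcessManager {

    private enum State { NOT_STARTED, VALIDATING, COMPLETED }

    private State state = State.NOT_STARTED;
    private final TaskPort taskScheduler;

    public ValidationProcessManager(TaskPort taskScheduler) {
        this.taskScheduler = taskScheduler;
    }

    public void when(ValidationRequested event) {
        state = State.VALIDATING;
        // schedule the task that queries the remote BC and persists the report
        taskScheduler.schedule(new FetchValidationReportTask(event.aggregateId()));
    }

    public void when(ValidationReportReceived event) {
        state = State.COMPLETED;
        // schedule the command that lets the aggregate acknowledge the report
        taskScheduler.schedule(new AcknowledgeReportTask(event.aggregateId(), event.reportId()));
    }
}

Note that the process manager only decides what to schedule next; the business rules stay in the aggregates, and the actual execution of the tasks is the application's job.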
I realize it is a "forced" domain event, ok the user requested it but it never really emerge in conversation but still it is a domain event.
I wouldn't describe it as forced; if the requirement for this validation process really comes from the business, then the domain event is a thing that belong in the ubiquitous language.
Should I translate the domain event into a technical event before sending it to its destination
I have no idea what you think that means. An event is a message describing something that happened. "Domain event" means that the something happened within the domain. It's still a message to be published.

One upstream event with multiple items: how to set up a Spring Integration pipeline transaction-wise

Let's imagine a situation where you have an incoming upstream message which contains multiple items. Each item contains information which participates in the business logic implemented as part of the pipeline.
Difficulties I can see:
The message has to be split and converted into multiple internal events; those are processed further, and if one of them fails, then all internal events should be rolled back.
If we had one upstream message = 1 item, it would be much easier.
How should one cater for such a situation from an architecture point of view?
What is the best pattern to employ here?
How should one set up transactions?
Thanks!
It looks like your question isn't clear, and that the word "transaction" is used for different subjects...
Anyway, let me guess what you want.
If you are going to (and can) roll back part of the business request, you should just ensure a global XA transaction for all of them and do all the split sub-tasks in the same thread, because only this lets you keep track of the transaction and roll it back afterwards, if needed.
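For illustration, a minimal sketch of that single-thread variant (assuming the Spring Integration Java DSL; channel and bean names are hypothetical). Since DirectChannels dispatch on the caller's thread, the whole flow runs inside the transaction started by a @Transactional gateway in front of upstreamChannel:

@Bean
public IntegrationFlow upstreamMessageFlow() {
    return IntegrationFlows.from("upstreamChannel")
            .split()                          // one internal message per item
            .handle("itemHandler", "process") // same thread, same transaction
            .aggregate()                      // gather the results before the transaction commits
            .get();
}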
If you can't deal with XA and a single thread, then you should take a look at solutions like compensating transactions or acknowledgement with claim checks.
But that is already outside of Spring Integration scope.

Session management using Hibernate in a *multi-threaded* Swing application

I'm currently working on a (rather large) pet project of mine, a Swing application that by its very nature needs to be multi-threaded. Almost all user interactions might fetch data from some remote servers over the internet; since I neither control these servers nor the internet itself, long response times are inevitable. A Swing UI obviously cannot repaint itself while the EDT is busy, so all remote server calls need to be executed by background thread(s).
My problem:
Data fetched by the background threads gets 'enriched' with data from a local (in-memory) database (the remote server returns IDs/references to data in the local database). This data eventually gets passed to the EDT where it becomes part of the view model. Some entities are not completely initialized at this point (lazy fetching enabled), so the user might trigger lazy fetching by e.g. scrolling in a JTable. Since the Hibernate session is already closed, this triggers a LazyInitializationException. I can't know when lazy fetching might be triggered by the user, so creating a session on demand/attaching the detached object will not work here.
I 'solved' this problem by:
using a single (synchronized, since Session instances are not thread-safe) Session for the whole application
disabling lazy-fetching completely
While this works, the application's performance has suffered greatly (sometimes being close to unusable). The slowdown is mainly caused by the large number of objects that are now fetched by each query.
I'm currently thinking about changing the application's design to 'Session-per-thread' and migrating all entities fetched by non-EDT threads to the EDT thread's Session (similar to this posting on the Hibernate forums).
Side-note: Any problems related to database updates do not apply since all database entities are read-only (reference data).
Any other ideas on how to use Hibernate with lazy-loading in this scenario ?
Don't expose the Session itself in your data API. You can still do it lazily; just make sure that the hydration is done from the 'data' thread each time. You could use a block (a Runnable, or some kind of command class, is probably the best Java can do for you here, unfortunately) that's wrapped by code that performs the load asynchronously on the 'data' thread. When you're in UI code (on the UI thread, of course), field some kind of 'data is ready' event that is posted by the data service. You can then take the data from the event and use it in the UI.
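A small sketch of that shape (the repository and callback names are hypothetical):

private final ExecutorService dataThread = Executors.newSingleThreadExecutor();

void loadCustomer(long id, Consumer<Customer> uiCallback) {
    dataThread.submit(() -> {
        // hydrate fully on the data thread, while its Session is open
        Customer customer = customerRepository.fetchFullyInitialized(id);
        // post the 'data is ready' event back to the EDT
        SwingUtilities.invokeLater(() -> uiCallback.accept(customer));
    });
}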
You could have a look at Ebean ORM. It is session-less and lazy loading just works. This doesn't answer your question but rather proposes an alternative.
I know Ebean has built-in support for asynchronous query execution, which may also be interesting for your scenario.
Maybe worth a look.
Rob.
There are two distinct problems that should be resolved separately:
Handling of Hibernate Sessions in Swing applications. Let me recommend my own article regarding this problem: http://blog.schauderhaft.de/2008/09/28/hibernate-sessions-in-two-tier-rich-client-applications/
The basic idea is to have a session for every frame, excluding modal frames, which use the session of the spawning frame. It is not easy, but it works; meaning, you won't get any LazyInitializationExceptions anymore.
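A minimal sketch of the session-per-frame idea (the frame class is hypothetical):

public class EntityFrame extends JFrame {

    private final Session session;

    public EntityFrame(SessionFactory sessionFactory) {
        this.session = sessionFactory.openSession(); // one Session per frame
        setDefaultCloseOperation(DISPOSE_ON_CLOSE);
        addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosed(WindowEvent e) {
                session.close(); // the Session lives exactly as long as the frame
            }
        });
    }
}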
How to get your GUI thread separated from the back end.
I recommend keeping the Hibernate objects strictly on the back-end thread they originate from. Only give wrapper objects to the EDT. If these wrapper objects are asked for a value, they create a request which gets passed to the back-end thread, which will eventually return the value.
I'd envision three kinds of wrapper implementations:
Async: requests the value and gets notified when the value is available. It returns immediately with some dummy value. On notification it fires a PropertyChange event in order to inform the GUI about the 'changed' value (changed from unknown to a real value).
Sync: requests the value and waits for it to be available.
Timed: a mixture of the two, waiting a short time (say 0.01 seconds) before returning. This would avoid plenty of change events compared to the async version.
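A sketch of the async variant (the wrapper type is hypothetical; PropertyChangeSupport and SwingUtilities are the standard JDK classes):

public class AsyncValue<T> {

    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private volatile T value; // dummy/null until the real value arrives

    public AsyncValue(Supplier<T> loader, Executor backendThread) {
        backendThread.execute(() -> {
            T loaded = loader.get(); // runs on the back-end thread owning the Session
            SwingUtilities.invokeLater(() -> {
                T old = value;
                value = loaded;
                // tell the GUI the value 'changed' from unknown to real
                changes.firePropertyChange("value", old, loaded);
            });
        });
    }

    public T getValue() { return value; }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }
}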
As a basis for these wrappers I recommend the ValueModel of the JGoodies Binding library: http://www.jgoodies.com/downloads/libraries.html
Obviously you need to take care that any action is only performed on actually loaded values, but since you don't plan on doing updates this shouldn't be too much of an issue.
Let me end with a warning: I have thought about it a lot, but never actually tried it, so move with care.
