Axon recreating aggregate state not clear - java

I am working on my first Axon application and I can't figure out the use of the aggregates. I understand that every time a command handler is called, the aggregate is recreated from all its events, but I don't understand what other use recreating the aggregate could have.
When should I manually recreate an aggregate?
What is the benefit of the aggregate being recreated every time I call a command?
The way I set up my application, I use an aggregate view to persist the data I need into the database. So now I feel like the events are just stored in the event store and are only used to recreate the aggregate after I call a command. Is there nothing else I should do with the stored events and the recreation of the aggregate? Shouldn't I, for example, recreate the entire aggregate instead of fetching the aggregate view out of my database by ID to update it?

The idea behind Event Sourcing your Aggregate, is that these events are the source for any model within your system.
Thus, if you create a dedicated Command Model handling the commands like you describe, then this model (which from Axon's perspective is the @Aggregate(Root) annotated class) will be sourced from the events it has published.
Additionally, you can introduce any type of Query Model you want: an RDBMS view, a text-based search solution (e.g. Elastic), a time series database, you name it. Each of these Query Models is, however, still part of the same root application your Aggregate resides in. As you have the events as the means to notify others of decisions being made, it comes naturally to (re)use those to update all your Query Models as well.
Now, it is perfectly true that you are not required to use Event Sourcing for your Aggregates in Axon; from the framework's perspective this is called a State-Stored Aggregate. If you do this, however, you'll be back to having distinct models in distinct storage mechanisms, without a single source of truth.
So, to circle back to your question with this added knowledge, I'd state the following:
When should I manually recreate an aggregate?
You are never required to recreate the Aggregate as the Command Model yourself, as the framework does this for you. If you have a mirrored Query Model Aggregate, then you would recreate it whenever you have added/removed/changed fields within the model, or whenever you have introduced entirely new models.
What is the benefit of the aggregate being recreated every time I call a command?
The benefit of recreating it every time is the assurance that you will always be using the latest state, even if, between releases of your application, you have added/changed/removed fields. The @EventSourcingHandler annotated methods simply fill them in, without you needing to, for example, write a database script to adjust them directly at the database level.
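To make this tangible, here is a minimal sketch of what such an Event Sourced Aggregate can look like in Axon (Axon 4 style APIs). The GiftCard domain and all command/event classes are purely illustrative, not something from your application:

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.modelling.command.TargetAggregateIdentifier;
import org.axonframework.spring.stereotype.Aggregate;

// Hypothetical messages; Axon imposes no base type on commands or events
class IssueCardCommand  { @TargetAggregateIdentifier String cardId; int amount; }
class RedeemCardCommand { @TargetAggregateIdentifier String cardId; int amount; }
class CardIssuedEvent   { String cardId; int amount; }
class CardRedeemedEvent { String cardId; int amount; }

@Aggregate
public class GiftCard {

    @AggregateIdentifier
    private String cardId;
    private int remainingValue; // only the state needed for decision making

    protected GiftCard() {
        // Axon requires a no-arg constructor: it creates an empty instance
        // and then sources it from its past events
    }

    @CommandHandler
    public GiftCard(IssueCardCommand cmd) {
        // Decide, then publish; the state change itself happens in the
        // @EventSourcingHandler below
        CardIssuedEvent event = new CardIssuedEvent();
        event.cardId = cmd.cardId;
        event.amount = cmd.amount;
        AggregateLifecycle.apply(event);
    }

    @CommandHandler
    public void handle(RedeemCardCommand cmd) {
        if (cmd.amount > remainingValue) {
            throw new IllegalStateException("Insufficient value remaining on this card");
        }
        CardRedeemedEvent event = new CardRedeemedEvent();
        event.cardId = cardId;
        event.amount = cmd.amount;
        AggregateLifecycle.apply(event);
    }

    @EventSourcingHandler
    public void on(CardIssuedEvent event) {
        // Runs when the event is first applied and again on every replay;
        // a field added in a later release is filled in on the next replay
        this.cardId = event.cardId;
        this.remainingValue = event.amount;
    }

    @EventSourcingHandler
    public void on(CardRedeemedEvent event) {
        this.remainingValue -= event.amount;
    }
}
```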
Concluding, the reason for this approach lies entirely within the architectural concepts supported through Axon. You can read up on them on AxonIQ's Architectural Concepts page if you want; I am sure it will clarify things even further.
Hope this helps you out @Gisrou8! If not, please come back with more questions; I'd gladly explain things further.
Update: Further Command Model explanation
In the comment Gisrou8 placed under my response, it becomes apparent that "the unease" with this approach mainly resides in the state of the Aggregate.
As shared in my earlier response, the Aggregate as modeled with Axon Framework should, in an Event Sourced setup, be regarded as the Command Model in a CQRS system.
One of the main pillars of the Command Model is that the only state it contains is the state required for decision-making logic. To be more specific: the only state stored in your Aggregate is the state used to decide whether a Command Handler should accept the incoming command and publish an event as a result.
Thus, the sole fields you would introduce in your Aggregate alongside the Aggregate Identifier are the fields you need to drive these decisions.
This is what the Command Model is intended for, so do not worry about this point.
To answer any queries within your application, you'd introduce a dedicated Query Model which is updated as a result of the events published by the Command Handlers within the Aggregate. It is this exact segregation which is the strong suit of this model, as it allows for better scaling, performance improvements, or required team separations, among other non-functional requirements.
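As a sketch of that segregation, a Query Model fed by the (hypothetical) events from the aggregate sketch above could look like this; the in-memory store stands in for whatever storage suits the query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.axonframework.eventhandling.EventHandler;
import org.axonframework.queryhandling.QueryHandler;

// Hypothetical read-side classes
class CardSummary { String cardId; int remainingValue; }
class FindCardSummaryQuery { String cardId; }

public class CardSummaryProjection {

    // In-memory store for brevity; an RDBMS, Elastic, etc. would work the same way
    private final Map<String, CardSummary> store = new ConcurrentHashMap<>();

    @EventHandler
    public void on(CardIssuedEvent event) {
        // The Query Model is updated purely as a consequence of events
        // published by the Command Model
        CardSummary summary = new CardSummary();
        summary.cardId = event.cardId;
        summary.remainingValue = event.amount;
        store.put(event.cardId, summary);
    }

    @EventHandler
    public void on(CardRedeemedEvent event) {
        store.get(event.cardId).remainingValue -= event.amount;
    }

    @QueryHandler
    public CardSummary handle(FindCardSummaryQuery query) {
        // Queries only ever touch the Query Model, never the Aggregate
        return store.get(query.cardId);
    }
}
```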

Related

Axon commands on read model

We are currently using Axon Framework with HikariCP as our connection pooling system. We are facing pool exhaustion issues from time to time and we have a theory:
To update our read models we use the command bus to send UpdateEntityViewCommand.
As the command bus starts a transaction using the primary transaction manager (the write one) it acquires a connection from the write pool
On the handler, we open an inner transaction using a connection from the read pool, thus blocking the outer one.
This seems to exhaust the pools under some conditions. The question is: should we stop using the command bus to update our read models? Is it appropriate to have two buses (one for write and one for read)?
Thanks in advance
Axon's intent with the separation of Commands, Events, and Queries, is in general to support CQRS within your system. This means you would introduce dedicated Command Models and Query Models, which respectively only receive Command and Query messages.
The Query Models are typically also referred to as Projections or Read Models.
It is this combination of ideas that Axon is normally used for, which makes your question a bit vague. Or, perchance, you are using it differently than normally intended.
That Axon supports CQRS through this approach does not, by the way, necessitate that you do CQRS within your setup as well.
At any rate, it would be good to (as also stated in the comments):
Include code snippets of the message handlers where these prove useful
Give some deeper insight into your configuration
State whether you aim to use the CQRS pattern in your application
Completely separate from this, I can give some guidance on how Axon deals with transactions. Every message inside an Axon application will start a so-called "Unit of Work". It is the UnitOfWork (UoW for short) that starts a transaction. This means that no matter whether you use commands, events, or queries, Axon will already have started that transaction for you.
Taking this a step further, this also means that whatever you do inside a message handling function (thus the @CommandHandler, @EventHandler, and @QueryHandler annotated methods) will always have an active transaction running. This means, for example, that you do not have to include your own transaction management inside a message handling function, e.g. with the @Transactional annotation.
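As a small illustration of that point, assuming the UpdateEntityViewCommand from the question, a handler needs no transaction management of its own:

```java
import org.axonframework.commandhandling.CommandHandler;

public class EntityViewHandler {

    @CommandHandler
    public void handle(UpdateEntityViewCommand command) {
        // No @Transactional here: the UnitOfWork that dispatched this command
        // has already opened a transaction, and any persistence work done in
        // this method simply participates in it.
    }
}
```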
Concluding, I am guessing you might be mixing some concepts. This can obviously happen, so no worries there; that's what SO is for. Next to this, though: you do not have to start your own transaction inside any message handling function, as you already have an active transaction at that point in time.

Axon - Easiest way to make projection at query time

I will usually have 5-6 events per aggregate and would like not to store projections in the database. What would be the easiest way to always build the view projection at query time?
The short answer to this is that there is no easy or quick way to do it.
However, it is most certainly doable to implement a 'replay given events at request time' setup.
What I would suggest you do exists in several steps (a rough sketch follows the list):
Create the query model you would like to return, which can handle events (use @EventHandler annotated methods on the model)
Create a component which can handle the query that'll return the query model from step one (use a @QueryHandler annotated method for this)
The query-handling component should be able to retrieve a stream of events from the EventStore. If this is based on an aggregate identifier, use the EventStore#readEvents(String) method. If you need the entire event stream, you need to use the StreamableMessageSource#openStream(TrackingToken) method (note: the EventStore interface extends StreamableMessageSource)
Upon query handling, create an AnnotationEventHandlerAdapter, giving it a fresh instance of your Query Model
For every event in the event stream you've retrieved in step 3, call the AnnotationEventHandlerAdapter#handle(EventMessage) method. This method will invoke the @EventHandler annotated methods on your Query Model object
Once the stream is depleted, you are ensured that all necessary events for your Query Model have been dealt with, and you can return the Query Model
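Putting the steps together, a rough sketch could look as follows; all query-side classes (TotalQuery, TotalModel, SomethingHappenedEvent) are hypothetical stand-ins:

```java
import org.axonframework.eventhandling.AnnotationEventHandlerAdapter;
import org.axonframework.eventhandling.EventHandler;
import org.axonframework.eventsourcing.eventstore.DomainEventStream;
import org.axonframework.eventsourcing.eventstore.EventStore;
import org.axonframework.queryhandling.QueryHandler;

// Hypothetical classes for the sketch
class TotalQuery { String aggregateId; }
class SomethingHappenedEvent { int delta; }
class TotalModel {
    int total;

    @EventHandler
    void on(SomethingHappenedEvent event) { // step 1: the model handles events itself
        total += event.delta;
    }
}

public class OnDemandProjectionQueryHandler {

    private final EventStore eventStore;

    public OnDemandProjectionQueryHandler(EventStore eventStore) {
        this.eventStore = eventStore;
    }

    @QueryHandler // step 2: a component handling the query
    public TotalModel handle(TotalQuery query) throws Exception {
        TotalModel model = new TotalModel();
        AnnotationEventHandlerAdapter adapter =
                new AnnotationEventHandlerAdapter(model);      // step 4: adapter around a fresh model
        DomainEventStream events =
                eventStore.readEvents(query.aggregateId);      // step 3: per-aggregate stream
        while (events.hasNext()) {
            adapter.handle(events.next());                     // step 5: invoke @EventHandler methods
        }
        return model;                                          // step 6: stream depleted, model complete
    }
}
```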
So, again, I don't think this is overly trivial, easy or quick to set up.
Additionally, step 3 has quite a caveat in there. Retrieving the stream of a given Aggregate based on the Aggregate Identifier is pretty fast/concise, as an Aggregate in general doesn't have a lot of events.
However, retrieving the Event Stream based on a TrackingToken, which you'd need if your Query Model spans several Aggregates, can mean you pull in the entire event store to instantiate your models on the fly. Granted, you can fine-tune the point in time from which the Event Stream should return events, as you're dealing with a TrackingToken, but the chances are pretty high that your result will be incomplete and the query relatively slow.
However, you stated you want to retrieve events for a given Aggregate Identifier.
I'd thus think this should be a workable solution in your scenario.
Hope this helps!

Create aggregate root in the context of another aggregate root

I'm currently struggling with the creation of instances in the DDD context.
I have read and searched a lot, and sometimes thought I had found the answer, only to realize that it doesn't feel right when programming it.
This is my situation:
I have two aggregate roots, Scenario and Step. I made those ARs because they encapsulate related elements of the domain, and each AR should be in a consistent state.
Multiple Steps can exist in the context of a Scenario. They cannot exist on their own.
The "name/natural id" of each Step in the context of its Scenario has to be unique. Changes in a Scenario do not automatically influence its Steps and vice versa (e.g. a Step doesn't care if its Scenario changes some descriptions or images).
Different Steps of a Scenario can be used, edited, etc. at the same time.
At the moment, each Step holds a reference to its Scenario by the corresponding natural identifier. The Scenario class doesn't know anything about its Steps, so it does not hold a collection of Step references.
How would I create a new Step for a given Scenario?
Should I load the Scenario and call something like createNewStep(...) on it? That would not enforce the uniqueness constraint (which is in fact a business constraint and not a technical one), because Scenario doesn't know about its Steps. I would probably have to go with some kind of "disconnected domain model" then, or pass a repository or service to the method to perform the checks.
Should I use a domain service that enforces the constraint, queries the repository, and finally creates and returns the Step?
Should Scenario simply know about its Steps? I think I would like to avoid this one, since it would create an ugly-to-maintain bidirectional relationship.
One could imagine other use cases, like a Step being classified by options that are provided by the specific Scenario. In this case, and if there were no constraints regarding the "collection" of Steps, I would probably go with the first "solution". Then again: if the classification is changed afterwards, access to the Scenario would be necessary to check for the allowed classifications. That brings me to a possible 4th solution:
Using some kind of "combination" of the possible solutions. Would it be a good idea to create the domain service (accessing everything needed) and pass it as an argument to the method that needs it? The method would then call the service where needed, and the "domain logic" stays in the entity/model.
Thank you in advance!
I'll just edit instead of copy paste answering ;)
Thank you all for your responses! :)
Pushing the Steps back into the Scenario would lead to some pretty big objects, which I'm trying to avoid (the current running application really suffers from this). It seems pretty much like the Scrum example in Vaughn Vernon's "Effective Aggregate Design", where he uses domain services to get smaller aggregates (I really don't know why I'm so uncertain about using domain services). Looks like I'll have to use domain services or split the aggregates up into "StepName" and "StepDetails" as suggested.
For background, you should read what Greg Young has to say about set validation (via WaybackMachine). In particular, you really need to evaluate, in the context of your solution, what is the business impact of having a failure?
Accept the failure and escalate is by far your easiest option. In what follows, I assume that the business impact of the failure is large, so we need to prevent it from happening.
The "name/natural id" of each Step in the context of its Scenario has to be unique
That's a classic set validation concern.
The first thing to do is to challenge the assumptions in your model:
Is your model the book of record for "name"? If your model isn't the authority, you have to be very cautious about introducing constraints. Understanding the boundaries of your model's authority is really important.
Is there an invariant that couples the name of a step to any other part of its state? Aggregate design discipline says that two pieces of state coupled by an invariant need to be in the same aggregate, but it's silent about properties that don't participate in an invariant.
Is it reasonable to reject a name change while accepting other changes to a step? This is really a variation of the previous -- can tasks be split into two different commands (one involving name, one not) that can succeed or fail independently?
In short, the invariant may be telling you that "step name", as a piece of state, belongs in the scenario aggregate rather than in the step aggregate.
If you think about the problem from the perspective of a relational model, we're looking at a tuple (scenarioId, name, stepId), and the constraint says that (scenarioId, name) forms a unique key. That's a hint that the step name belongs to the scenario. In code, that signature looks like a scenario data structure that includes a Map<StepName, StepId>.
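A minimal sketch of that idea, with plain Strings standing in for proper identifier types:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class Scenario {

    private final String scenarioId;
    // The scenario owns the (step name -> step id) mapping, so the
    // uniqueness invariant lives entirely inside this one aggregate
    private final Map<String, String> stepIdsByName = new HashMap<>();

    public Scenario(String scenarioId) {
        this.scenarioId = scenarioId;
    }

    public String addStep(String stepName) {
        if (stepIdsByName.containsKey(stepName)) {
            throw new IllegalArgumentException(
                    "A step named '" + stepName + "' already exists in scenario " + scenarioId);
        }
        String stepId = UUID.randomUUID().toString();
        stepIdsByName.put(stepName, stepId);
        return stepId;
    }
}
```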
That won't necessarily solve all of your problems of course, but it is a step toward aligning the model with your actual business.
When that doesn't work...
The "real" answer is to move the step entity back into the scenario aggregate. One way to think about it is this -- all of the entities taken together form "the model" that we are keeping consistent. The aggregates aren't part of the business, per se; they are artificial, independent subdivisions within the model -- we identify and isolate aggregates as a performance optimization; we can perform concurrent edits, and evaluate the validity of a command while loading a much smaller data set.
If the failures make the performance optimization too expensive, you take it out. So you can see that we have an estimate, of sorts, for what it means that the business impact is "large"; it needs to be bigger than the savings we get from using aggregates on the happy path.
Another possibility is to shift where you enforce the invariant. Relational databases are really really good at set validation. So maybe the right answer is to split the enforcement concern: put the invariant into your schema as a constraint, and ignore that constraint in code.
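For illustration, a JPA-flavoured sketch of such a schema-level constraint (table and column names are assumptions):

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

// The (scenario_id, name) pair is declared unique at the schema level,
// so the database rejects a duplicate name regardless of what the code does
@Entity
@Table(name = "step",
       uniqueConstraints = @UniqueConstraint(columnNames = {"scenario_id", "name"}))
public class StepRecord {

    @Id
    private String id;

    @Column(name = "scenario_id", nullable = false)
    private String scenarioId;

    @Column(name = "name", nullable = false)
    private String name;
}
```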
This isn't ideal for a number of reasons -- you've effectively "hidden" the constraint, you've introduced a constraint on the kind of data store that you use for your aggregates, you've introduced a constraint that requires that you store your step aggregates in the same database as the scenario they belong to, and so on. If you squint, you'll see that this is really just the "make the step entities part of the scenario" solution, but in disguise.
But keep in mind: part of the point of domain-driven design is that we can push back on the business when the code is telling us that the business model itself is wrong. Where's the cost-benefit analysis?
Here's the thing about uniqueness constraints: the model enforces uniqueness, not correctness. Imagine a data race, two different commands that each claim the same "name" for a different step in the scenario -- perhaps caused by a data entry error. The model, presumably, can't tell which command is "right", so it's going to make some arbitrary guess (most likely, first command wins). If the model guesses wrong, it has effectively blocked the client that provided correct data!
In cases where the model is the authority, uniqueness constraints can make sense -- the SeatMap aggregate can enforce the constraint that only one ticket can be assigned to a seat at any given time, because it is the authority for assignment.

DDD structure example

I am trying to structure an application using DDD and onion/hexagonal/clean architecture (using Java and Spring). I find it easier to find guidance on the concepts themselves than on how to actually implement them. With DDD in particular it seems tricky to find instructive examples, because each problem is unique. I have seen numerous examples on SO that have been helpful, but I still have questions. I wonder whether going through my example would help me and anyone else.
I hope you can forgive me asking more than one question here. The example seems too big for it to make sense me repeating it in multiple questions.
Context:
We have an application that should display information about soccer stats and has the following concepts (for simplicity I have not included all attributes):
Team, which has many Players.
Player.
Fixture, which has 2 Teams and 2 Halves.
Half, which has 2 FormationsPlayed and many Combinations.
FormationPlayed, which has many PositionsPlayed.
PositionPlayed, which has 1 Player and a position value object.
Combination, which can be of 2 types, and has many Moves.
Move, which can be of 2 types, has 1 Player and an event value object.
As you can imagine, trying to work out which things are aggregate roots here is tricky.
Team can exist independently so is an AR.
Player can exist independently so is an AR.
Fixture, when deleted, must also delete its Halves, so is an AR.
Half must be an entity in Fixture.
FormationPlayed must be deleted when a Half is deleted, so perhaps this should be an entity in Half.
PositionPlayed must be deleted when a FormationPlayed is deleted, so I believe this should be an entity in FormationPlayed.
Combination can, in a sense, exist independently, though it is tied to a particular game Half. Perhaps this could be an AR tied by eventual consistency.
Move must be deleted when a Combination is deleted, so I believe this should be an entity in Combination.
Questions:
Do you see any errors in the above design? If so what would you change?
The Fixture - Half - FormationPlayed - PositionPlayed aggregate seems too large, so I wonder whether you would agree that it could be split into Fixture - Half and FormationPlayed - PositionPlayed using eventual consistency. The thing I can't find an example of is how this would be implemented in Java. If a Fixture were deleted, would you fire a FixtureDeleted event that causes its corresponding FormationPlayed entities to also be deleted?
I want to construct a domain model that has no understanding of the way it will be persisted (as per onion architecture). My understanding is that domain entities here should not have surrogate keys, because those relate to persistence. I also believe that entities should only reference entities in other aggregates by id. How then, for example, would PositionPlayed reference Player in the domain model?
Initially, the aim is only to allow the client to get the data and display it. Ultimately, I want clients to be able to perform CRUD themselves, and I want all invariants to be held together by the domain model when this happens. Would it simplify things (and can you show me, or point me to an example explaining, how) to have two domain models: one simple, for data retrieval, and one rich, for the operations to be performed later? Two BCs, as it were. The reason I ask is that a rich domain model seems rather time-consuming to come up with when initially we only want to display stats from the database, but I also don't want to create trouble for myself down the line if it is better to create one rich domain model now, in view of the use cases envisioned later. I wonder, if I were to create a simpler model for data retrieval only, which concepts in DDD could be ignored (would I still need to break up large aggregates, for example)?
I hope this all makes sense. Obviously happy to explain further if needed. Realise I'm asking a lot here and I may have confused some ideas. Any answers and wisdom you can give to this would be greatly appreciated !
Do you see any errors in the above design? If so what would you change?
There might be a big one: is your system the book of record, or is it just keeping track of events that happen in the "real world"? In a sense, the point of aggregates is to ensure that the book of record is internally consistent, but if you aren't the book of record....
For an example of what I mean
http://www.soccerstats.com/ -- the book of record is the real world.
https://www.easports.com/fifa -- the games are played in the computer
If Fixture were deleted, would you fire a FixtureDeleted event that causes its corresponding FormationPlayed entities to also be deleted?
Udi Dahan wrote: Don't Delete, Just Don't. If an entity has a lifecycle, and that lifecycle has an end, then you mark it, but you don't remove the entity.
I want to construct a domain model that has no understanding of the way that it will be persisted (as per onion architecture)
Great! Be warned, a lot of the examples that you will find online don't get this part right -- for historical reasons, many demonstrations of model are tightly coupled to the side effects that they have on persistence.
My understanding is that domain entities here should not have surrogate keys because this relates to persistence. I also believe that entities should only reference entities in other aggregates by ids. How then, for example, would PositionPlayed reference Player in the domain model?
Ah -- OK, this one is fun. Don't confuse surrogate keys used in the persistence layer with identifiers in the domain model. For instance, when I look at my purchasing history on Amazon, each of my orders (presumably an aggregate) has an ORDER # associated with it. That would imply that the domain level knows about OrderNumber as a value type. The persistence solution in the back end might introduce surrogate keys when storing that data, but those keys are not used by the model.
Note that I've chosen an example where the aggregate is clearly the authority -- the order only really exists within the model. When the real world is the book of record, you often don't have a unique identifier available (what is Lionel Messi's PlayerId?).
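For illustration, a domain-level identifier as a value type might look like this minimal sketch (the validation rule is an assumption):

```java
import java.util.Objects;

// A domain-level identifier as a value type; any surrogate key the
// persistence layer adds stays invisible to the model
public final class OrderNumber {

    private final String value;

    public OrderNumber(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("An order number must not be empty");
        }
        this.value = value;
    }

    public String asString() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof OrderNumber
                && Objects.equals(((OrderNumber) other).value, value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```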
The reason I ask is that a rich domain model seems rather time consuming to come up with when initially we only want to display stats in the database
A couple of thoughts on this -- DDD is usually saved for more complicated use cases (Greg Young: "is this where you get a competitive advantage?"). Most of the power of aggregates comes from the fact that they ensure the consistency of changes of state. When your real problem is data entry and reporting, it tends to be overkill.
Detection and remediation of inconsistencies is often easier/cheaper than trying to get prevention right; and may be satisfactory to the business, given the costs. Something to keep in mind.
The application is keeping track of events in the real world. At the moment, they are recorded manually in a database. Can you be explicit about why you believe the distinction is important?
Very roughly -- events indicate things that have already happened. It's too late for the domain to veto them; the real world is outside of the domain's control.
Furthermore, we have to keep in mind that, since the real world is the book of record, things may have happened in the real world that our domain model doesn't know about yet (the reporting of events may be delayed, lost, reordered, and so on).
Aggregates are supposed to be a source of truth, which means that they can only govern entities in the digital world.
One kind of information resource that you could create is a report of Messi's goals in a season. So every time a goal is reported, you run a command to update the report aggregate. That's not anemic -- not exactly -- but it's not very interesting. It's really just a view (in CQRS terms, it's a read model) that you can recreate from the history of events. It doesn't have any intelligence in it.
The interesting aggregates are those that make decisions for themselves, based on the information they are given.
A contrived example of an aggregate would be one that, if a player scores more than 10 goals in a season, orders that player's jersey for you. Notice that while "goals" are something already present in your event stream, the business rule isn't. That's purely a domain model thing.
So the way this would work is that each time a goal event appeared, you would load the JerseyPurchasing aggregate and tell it about the goal. That aggregate would make sure this was a new goal (not one that had previously been reported), determine whether the number of goals calls for ordering a shirt, and check whether the order for the shirt had already been placed.
Key idea here -- the goals are something that the aggregate is told about. The decision to purchase a jersey is made by the aggregate, and shared with the world.
Later, you realize that sometimes a player gets traded, and then scores a 10th goal. And you have to determine as a business whether that means you get one jersey (for which team?), one jersey for each team, or maybe you only order jerseys if he scored 10 goals for a specific team in a season. All of this logic goes into the aggregate.
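A deliberately contrived sketch of that aggregate; the goal-id type, the threshold, and the method names are all illustrative:

```java
import java.util.HashSet;
import java.util.Set;

public class JerseyPurchasing {

    private static final int GOALS_FOR_A_JERSEY = 10;

    private final Set<String> reportedGoalIds = new HashSet<>();
    private boolean jerseyOrdered = false;

    public void recordGoal(String goalId) {
        // The aggregate is *told* about goals; dedupe, because real-world
        // reports may arrive more than once or out of order
        if (!reportedGoalIds.add(goalId)) {
            return;
        }
        if (reportedGoalIds.size() >= GOALS_FOR_A_JERSEY && !jerseyOrdered) {
            // The decision is made here and shared with the world,
            // e.g. by publishing a JerseyOrderPlaced event
            jerseyOrdered = true;
        }
    }
}
```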
A domain model as per onion architecture -- can you point me to any good examples?
Best place to look, as weird as it sounds, is among the functional programming types. Mark Seemann's blog includes a lot of important ideas that will help here.
The main idea to keep in mind that the model sits at the bottom. The app passes state to the model, and gets state back (in CQS terminology, you query the model). The app is responsible for sharing the results obtained from the model with the persistence component.
Do you believe the accepted view would be that an anaemic model should be adopted for a domain this size?
In the case where you are just re-organizing information from the real world for easier consumption? Yeah - load document, update document, store document makes a lot more sense to me than going overboard with a bunch of aggregate modeling. But don't read too much into that -- I don't know more about your model than what you have written here. If there's real business complexity in how you evaluate the information from the real world, then the answer would be different.

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel, and I'm unsure how to go about solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrite the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue in all cases due to eventual consistency (or, quite possibly, my understanding of Datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity, and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously, JDO comes with optimistic concurrency checking for transactions (should the user enable it), which would prevent or at least reduce the chance of such things. Optimistic concurrency is equally applicable to relational datastores, so you likely know what it does.
Google's JDO plugin obviously uses the low-level API setProperty() method. The log even tells you which low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e., look up the keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
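A sketch of that catch-and-retry loop against the low-level Datastore API (the retry limit and the Runnable shape of the unit of work are assumptions):

```java
import java.util.ConcurrentModificationException;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Transaction;

public class RetryingTransactionRunner {

    private static final int MAX_RETRIES = 5; // illustrative limit

    public void runInTransaction(Runnable idempotentUnitOfWork) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            Transaction txn = datastore.beginTransaction();
            try {
                // Datastore calls made on this thread without an explicit
                // Transaction argument join the current transaction
                idempotentUnitOfWork.run();
                txn.commit();
                return;
            } catch (ConcurrentModificationException e) {
                // A parallel write conflicted with ours; loop and retry
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
        throw new ConcurrentModificationException(
                "Gave up after " + MAX_RETRIES + " conflicted attempts");
    }
}
```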
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will get a throughput of at least one completed transaction per second (maybe a bit less if you have XG transactions).
Objectify 4 handles the retries for you: just define your unit of work as a run() method and execute it with ofy().transact(). Just make sure your work is idempotent.
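A sketch of that idiom; CalculatedEntity mirrors the (value1..value3, calculated) shape from the question, and recalculate() is a stand-in for the real computation:

```java
import com.googlecode.objectify.VoidWork;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

import static com.googlecode.objectify.ObjectifyService.ofy;

@Entity
class CalculatedEntity {
    @Id Long id;
    long value1, value2, value3;
    long calculated;

    void recalculate() {
        calculated = value1 + value2 + value3; // stand-in calculation
    }
}

public class RecalculateTask {

    public void run(final long entityId) {
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                // Objectify retries this whole block on a collision,
                // which is why it must be idempotent
                CalculatedEntity entity =
                        ofy().load().type(CalculatedEntity.class).id(entityId).now();
                entity.recalculate();
                ofy().save().entity(entity);
            }
        });
    }
}
```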
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched, or you can embed the object's values within the task request so that the second calculation task will restore the object state with consistent value and calculated members.
