I am new to Optaplanner. I thought I had understood what planning entities are, as well as planning variables, genuine or some inverse-kind shadow ones. I have started studying the documentation, the examples and old StackOverflow's questions, but some doubts remain.
While trying to make my score calculator incremental, I found some unexpected methods in the IncrementalScoreCalculator interface. Alongside beforeVariableChanged and afterVariableChanged, I find *EntityAdded and *EntityRemoved, which makes me suspect that entity objects may be added and removed. Moreover, these methods are implemented in the documented NQueens example, but in the kind of examples I am looking at (distributing shifts, resources, time slots, etc.), the domain is designed in such a way that planning entities are expected to be modified, but not added or removed.
I don't know if the addition/removal of entity objects is used somewhere, for example in route planning problems, which I haven't looked into, or whether these additions and removals are explicit or implicit. So, might planning entities be added or removed by OptaPlanner without being asked to?
No, OptaPlanner out-of-the-box will not add or remove planning entity instances, because the default move selectors only modify planning entities; they don't create or destroy them.
OptaPlanner doesn't have any generic move selectors yet that can do that (and once we do, they won't be on by default).
If you write a custom move (see MoveListFactory and MoveIteratorFactory in the docs), then you could choose to add/remove entities in moves, which is why those methods exist, but very few users do that.
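To illustrate why an incremental calculator needs entity-level callbacks as well as variable-level ones, here is a minimal sketch. It is a simplified stand-in and not the OptaPlanner interface: the method names mirror the ones mentioned in the question, but the signatures, the Queen class and the row-only scoring are assumptions made for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical planning entity: the column is fixed, the row is the planning variable.
final class Queen {
    final int column;
    Integer row; // null while unassigned

    Queen(int column, Integer row) {
        this.column = column;
        this.row = row;
    }
}

// Simplified stand-in for an incremental score calculator (NOT the real OptaPlanner API).
// Only row conflicts are scored: -1 for every pair of queens sharing a row.
final class RowConflictCalculator {
    private final Map<Integer, Integer> queensPerRow = new HashMap<>();
    private int score = 0;

    // Entity-level callbacks: only triggered if a (custom) move adds or removes entities.
    void afterEntityAdded(Queen queen) { insert(queen); }
    void beforeEntityRemoved(Queen queen) { retract(queen); }

    // Variable-level callbacks: triggered by ordinary moves that change the row.
    void beforeVariableChanged(Queen queen) { retract(queen); }
    void afterVariableChanged(Queen queen) { insert(queen); }

    private void insert(Queen queen) {
        if (queen.row == null) return;
        int alreadyInRow = queensPerRow.getOrDefault(queen.row, 0);
        score -= alreadyInRow; // one new conflict per queen already in that row
        queensPerRow.put(queen.row, alreadyInRow + 1);
    }

    // Assumes the queen was inserted earlier with its current row.
    private void retract(Queen queen) {
        if (queen.row == null) return;
        int remaining = queensPerRow.get(queen.row) - 1;
        score += remaining; // conflicts with the remaining queens disappear
        queensPerRow.put(queen.row, remaining);
    }

    int getScore() { return score; }
}
```

With only the default move selectors, the variable-level pair is what gets exercised during solving; the entity-level pair matters when a custom move adds or removes queens.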
I am trying to structure an application using DDD and onion/hexagonal/clean architecture (using Java and Spring). I find it easier to find guidance on the concepts themselves than actually how to implement them. DDD in particular seems rather tricky to find examples that are instructive because each problem is unique. I have seen numerous examples on SO that have been helpful but I still have questions. I wonder whether going through my example would help me and anyone else.
I hope you can forgive me for asking more than one question here. The example seems too big for it to make sense to repeat it across multiple questions.
Context:
We have an application that should display information about soccer stats and has the following concepts (for simplicity I have not included all attributes):
Team, which has many Players.
Player.
Fixture, which has 2 Teams and 2 Halves.
Half, which has 2 FormationsPlayed and many Combinations.
FormationPlayed, which has many PositionsPlayed.
PositionPlayed, which has 1 Player and a position value object.
Combination, which can be of 2 types, and has many Moves.
Move, which can be of 2 types, has 1 Player and an event value object.
As you can imagine, trying to work out which things are aggregate roots here is tricky.
Team can exist independently so is an AR.
Player can exist independently so is an AR.
Fixture, when deleted, must also delete its Halves, so is an AR.
Half must be an entity in Fixture.
FormationPlayed must be deleted when a half is deleted, so perhaps this should be an entity in Half.
PositionPlayed must be deleted when a Formation is deleted, so believe this should be an entity in FormationPlayed.
Combination in a sense can exist independently, though is tied to a particular game half. Perhaps this could be an AR tied by eventual consistency.
Move must be deleted when a Combination is deleted, so believe this should be an entity in Combination.
Questions:
Do you see any errors in the above design? If so what would you change?
The Fixture - Half - FormationPlayed - PositionPlayed aggregate seems too large, so I wonder whether you would agree that this could be split into Fixture - Half and FormationPlayed - PositionPlayed using eventual consistency. The thing I can't find an example of is how this is implemented in Java? If Fixture were deleted, would you fire a FixtureDeleted event that causes its corresponding FormationPlayed entities to also be deleted?
I want to construct a domain model that has no understanding of the way that it will be persisted (as per onion architecture). My understanding is that domain entities here should not have surrogate keys because this relates to persistence. I also believe that entities should only reference entities in other aggregates by ids. How then, for example, would PositionPlayed reference Player in the domain model?
Initially the aim is only to allow the client to get the data and display it. Ultimately I want clients to be able to perform CRUD themselves, and I want all invariants to be held together by the domain model when this happens. Would it simplify things (and can you show me or point me to an example explaining how) to have two domain models, one simple for data retrieval and one rich for the operations to be performed later? Two BCs, as it were. The reason I ask is that a rich domain model seems rather time consuming to come up with when initially we only want to display stats in the database, but I also don't want to create trouble for myself down the line if it is better to create one rich domain model now in view of the use cases envisioned later. I wonder, if I were to create a simpler model for data retrieval only, which concepts in DDD could be ignored (would I still need to break up large aggregates, for example)?
I hope this all makes sense. Obviously happy to explain further if needed. I realise I'm asking a lot here and I may have confused some ideas. Any answers and wisdom you can give would be greatly appreciated!
Do you see any errors in the above design? If so what would you change?
There might be a big one: is your system the book of record, or is it just keeping track of events that happen in the "real world"? In a sense, the point of aggregates is to ensure that the book of record is internally consistent, but if you aren't the book of record....
For an example of what I mean
http://www.soccerstats.com/ -- the book of record is the real world.
https://www.easports.com/fifa -- the games are played in the computer
If Fixture were deleted, would you fire a FixtureDeleted event that causes its corresponding FormationPlayed entities to also be deleted?
Udi Dahan wrote: Don't Delete, Just Don't. If an entity has a lifecycle, and that lifecycle has an end, then you mark it, but you don't remove the entity.
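As a rough illustration of "mark it, don't remove it", a Fixture could carry its own lifecycle state. This is only a sketch: the status values, the cancel method and the timestamp field are hypothetical names, not something taken from the question.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical Fixture aggregate root; names are illustrative only.
final class Fixture {
    enum Status { ACTIVE, CANCELLED }

    private final String fixtureId;
    private Status status = Status.ACTIVE;
    private Instant cancelledAt; // null while the fixture is active

    Fixture(String fixtureId) {
        this.fixtureId = fixtureId;
    }

    // Ends the lifecycle by marking the fixture instead of removing it.
    void cancel(Instant when) {
        if (status == Status.CANCELLED) {
            return; // idempotent: the first cancellation timestamp is kept
        }
        status = Status.CANCELLED;
        cancelledAt = when;
    }

    boolean isActive() { return status == Status.ACTIVE; }

    Optional<Instant> cancelledAt() { return Optional.ofNullable(cancelledAt); }

    String fixtureId() { return fixtureId; }
}
```

Halves, formations and positions then stay in place, and queries simply filter on isActive() rather than relying on cascading deletes.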
I want to construct a domain model that has no understanding of the way that it will be persisted (as per onion architecture)
Great! Be warned, a lot of the examples that you will find online don't get this part right -- for historical reasons, many demonstrations of a domain model are tightly coupled to the side effects that they have on persistence.
My understanding is that domain entities here should not have surrogate keys because this relates to persistence. I also believe that entities should only reference entities in other aggregates by ids. How then, for example, would PositionPlayed reference Player in the domain model?
Ah -- OK, this one is fun. Don't confuse surrogate keys used in the persistence layer with identifiers in the domain model. For instance, when I look at my purchasing history on Amazon, each of my orders (presumably an aggregate) has an ORDER # associated with it. That would imply that the domain level knows about OrderNumber as a value type. The persistence solution in the back end might introduce surrogate keys when storing that data, but those keys are not used by the model.
Note that I've chosen an example where the aggregate is clearly the authority -- the order only really exists within the model. When the real world is the book of record, you often don't have a unique identifier available (what is Lionel Messi's PlayerId?)
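A minimal sketch of a domain-level identifier, assuming the model itself assigns player identifiers: PlayerId is a value type, and PositionPlayed refers to the Player aggregate only through it. The Position enum values and the validation rule are assumptions for illustration; any surrogate key stays in the persistence layer and does not appear here.

```java
import java.util.Objects;

// Domain identifier as a value type; how the value is assigned is up to the model (assumption).
record PlayerId(String value) {
    PlayerId {
        Objects.requireNonNull(value, "value");
        if (value.isBlank()) {
            throw new IllegalArgumentException("PlayerId must not be blank");
        }
    }
}

// Hypothetical position value object.
enum Position { GOALKEEPER, DEFENDER, MIDFIELDER, FORWARD }

// Entity inside the FormationPlayed aggregate: it holds the id of the Player
// aggregate rather than a direct object reference to it.
final class PositionPlayed {
    private final PlayerId playerId;
    private final Position position;

    PositionPlayed(PlayerId playerId, Position position) {
        this.playerId = Objects.requireNonNull(playerId, "playerId");
        this.position = Objects.requireNonNull(position, "position");
    }

    PlayerId playerId() { return playerId; }

    Position position() { return position; }
}
```

When the Player itself is needed, the application layer would look it up through a repository using the PlayerId.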
The reason I ask is that a rich domain model seems rather time consuming to come up with when initially we only want to display stats in the database
A couple of thoughts on this -- DDD is usually saved for more complicated use cases (Greg Young: "is this where you get a competitive advantage?"). Most of the power of aggregates comes from the fact that they ensure the consistency of changes of state. When your real problem is data entry and reporting, it tends to be overkill.
Detection and remediation of inconsistencies is often easier/cheaper than trying to get prevention right; and may be satisfactory to the business, given the costs. Something to keep in mind.
The application is keeping track of events in the real world. At the moment, they are recorded manually in a database. Can you be explicit why you believe the distinction is important?
Very roughly -- events indicate things that have already happened. It's too late for the domain to veto them; the real world is outside of the domain's control.
Furthermore, we have to keep in mind that, since the real world is the book of record, things may have happened in the real world that our domain model doesn't know about yet (the reporting of events may be delayed, lost, reordered, and so on).
Aggregates are supposed to be a source of truth. Which means that they can only govern entities in the digital world.
One kind of information resource that you could create is a report of Messi's goals in a season. So every time a goal is reported, you run a command to update the report aggregate. That's not anemic -- not exactly -- but it's not very interesting. It's really just a view (in CQRS terms, it's a read model) that you can recreate from the history of events. It doesn't have any intelligence in it.
The interesting aggregates are those that make decisions for themselves, based on the information that they are given.
A contrived example of an aggregate would be one that, if a player scores more than 10 goals in a season, orders that player's jersey for you. Notice that while "goals" are something already present in your event stream, the business rule isn't. That's purely a domain model thing.
So the way that this would work is that each time a goal event appeared, you would load the JerseyPurchasing aggregate, and tell it about the goal. That aggregate would make sure that this was a new goal (not one that had previously been reported), determine whether the number of goals called for ordering a shirt, and check whether the order for the shirt had already been placed.
Key idea here -- the goals are something that the aggregate is told about. The decision to purchase a jersey is made by the aggregate, and shared with the world.
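A minimal sketch of that aggregate, under stated assumptions: goals are identified by a string id supplied by whoever reports them, the decision is returned as a plain string, and persistence is left out entirely. The method and field names are hypothetical.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Sketch of the jersey purchasing aggregate; all names are illustrative.
final class JerseyPurchasing {
    // "More than 10 goals": the order is triggered by the 11th distinct goal.
    static final int GOAL_THRESHOLD = 10;

    private final String playerName;
    private final Set<String> seenGoalIds = new HashSet<>();
    private boolean orderPlaced = false;

    JerseyPurchasing(String playerName) {
        this.playerName = playerName;
    }

    // The aggregate is told about a goal; it decides by itself whether to order.
    Optional<String> recordGoal(String goalId) {
        if (!seenGoalIds.add(goalId)) {
            return Optional.empty(); // already reported, ignore the duplicate
        }
        if (orderPlaced || seenGoalIds.size() <= GOAL_THRESHOLD) {
            return Optional.empty(); // nothing to decide yet, or already ordered
        }
        orderPlaced = true;
        return Optional.of("ORDER_JERSEY:" + playerName);
    }
}
```

The application layer would load the aggregate, call recordGoal for each incoming goal event, and publish any returned decision.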
Later, you realize that sometimes a player gets traded, and then scores a 10th goal. And you have to determine as a business whether that means you get one shirt (which?) or one shirt for each jersey, or maybe you only order jerseys if he scored 10 goals for a specific team in a season. All of this logic goes into the aggregate.
a domain model as per onion architecture that, can you point me to any good examples?
Best place to look, as weird as it sounds, is among the functional programming types. Mark Seemann's blog includes a lot of important ideas that will help here.
The main idea to keep in mind is that the model sits at the bottom. The app passes state to the model, and gets state back (in CQS terminology, you query the model). The app is responsible for sharing the results obtained from the model with the persistence component.
do you believe the accepted view would be that an anaemic model should be adopted for a domain this size
In the case where you are just re-organizing information from the real world for easier consumption? Yeah - load document, update document, store document makes a lot more sense to me than going overboard with a bunch of aggregate modeling. But don't read too much into that -- I don't know more about your model than what you have written here. If there's real business complexity in how you evaluate the information from the real world, then the answer would be different.
As far as I know, once a NodeEntity in Spring Data Neo4j is loaded, the default behaviour is to lazily load its relations by fetching only ids of related nodes.
While that seems fine in most situations, I have doubts about it in the case of so-called "supernodes" - nodes that have numerous relations to other nodes. Such nodes, even if small by themselves, will hold a huge collection of ids, using more memory than we would like, and possibly not being "lazily loaded enough" in effect...
So my question is - how shall I deal with that kind of supernode?
My first idea is to simply remove all @RelatedTo/@RelatedToVia mappings (or at least the ones with relation types that are "numerous") from that kind of node, bypass SDN when operations on those relations are needed, and use SDN in all other cases.
Does this make sense? Do you have other suggestions or experience with this kind of situation?
I have not worked with SDN, but I would try the metanode approach. With this approach you build a structure that splits the total number of relations across a number of metanodes (if a node has 1000 connections and you use 10 metanodes, each metanode will have 100 connections while the supernode has just 10). You can see a graphic representation in the following image: http://i.stack.imgur.com/DMQGs.png.
In this way you have good control over how many relations a node can have, and therefore over the maximum number of nodes that SDN will load.
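The arithmetic behind the metanode idea can be sketched in plain Java, independent of SDN or Neo4j. The class and method names below are hypothetical; the sketch only shows how related items would be spread evenly over a fixed number of metanodes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: distributes the items related to a supernode over metanodes.
final class MetanodePartitioner {
    private MetanodePartitioner() {}

    static <T> List<List<T>> partition(List<T> related, int metanodeCount) {
        if (metanodeCount <= 0) {
            throw new IllegalArgumentException("metanodeCount must be positive");
        }
        List<List<T>> metanodes = new ArrayList<>();
        for (int i = 0; i < metanodeCount; i++) {
            metanodes.add(new ArrayList<>());
        }
        // Round-robin assignment keeps the metanodes evenly filled.
        for (int i = 0; i < related.size(); i++) {
            metanodes.get(i % metanodeCount).add(related.get(i));
        }
        return metanodes;
    }
}
```

With 1000 related items and 10 metanodes, each inner list ends up with 100 items, and the supernode only needs one relationship per metanode.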
You can read more about it on http://neo4j.com/book-learning-neo4j/ and also in this similar post Neo4j how to avoid supernodes
For supernodes I'd just not specify the relationship on the supernode entity, but only on the related nodes.
If you're interested in the relationship, you look up the related node and follow it to the supernode.
Or, if you really need to load the millions of relationships, use a Cypher statement.
You can also put the many relationships on a separate node for that purpose or add a tree-like-substructure which also allows to deal with subselections.
First, can you provide the version of SDN you are using so we can target the question to the right maintainers of the library.
Secondly, while I don't really know the internals of SDN, I have worked heavily with other OGMs, and my understanding of lazy loading is quite different from the one you describe, for the simple reason that lazily loading the ids can be very harmful: you can end up with corrupted data if another process deletes one of the nodes having one of these ids.
Generally, and this is quite common in other OGMs, when an object has no annotations representing relationships, you would just recreate the object from its metadata and the loaded node.
However if it has relationships, you would then create a proxy of that object that will extend the entity itself.
The entity values on the proxy will not be instantiated in the first instance, you would then override all getters and add in the proxy the methods for retrieving the related nodes (so the Entity manager would be injected in the proxy).
So basically, a proxy will be empty until you call one of the getters on it.
You can also "fine-grain" this behavior by creating Custom repositories that extend the default one, in the sense you can choose to only LAZY_LOAD one type of relationships and EAGER_LOAD the others.
The method described by albert makes a lot of sense in some cases; however, it is hard to accomplish on the basic OGM side. You would be better off having a BehaviorComponent that handles this for you during lifecycle events, or adding some kind of pagination to the getter method, which I think is not part of the OGM right now.
So the question - I have a lot of tables in a database and almost all of them have on delete cascade. What is the best way to inform the user what will be deleted across the entire database if he deletes one certain row? What algorithms/patterns should I read about? An implementation in Java would be desirable. Thank you.
First off, this seems like a wrong design when you consider the fact that the cascade paths through the object graph are known at compile time and you are asking how to construct them, on demand, at runtime. You could build them and store them once.
That said, there probably isn't much reason for a design pattern. Mostly you are going to need Reflection, including the ability to find annotations on either properties or methods.
Then as you navigate the graph, you will look for the target annotations and either add or not add, and of course, if you don't find a cascade, you can stop going down that branch of the graph.
If there were some reason to handle types differently, Visitor would apply, but there isn't. The annotation processing tool from Sun used visitor, but that was for compile time processing.
You probably don't have the ability to do this, but it would be interesting to do it in Java 8, because you could more cleanly separate the navigation code from the test code by defining a Predicate (as a lambda) and then having it evaluated at each node. Your predicate would simply check for the presence of the Cascade annotation. It sounds like you may not be using an ORM, so you might not have annotations in your code for the cascades. That is all the more reason to have a separate predicate: you could have a metadata version that actually looks at the specific database (Postgres), and if you later wanted to use it with an ORM, you'd literally be changing a few lines of code.
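A minimal sketch of that separation, working at the table level and assuming the foreign keys have already been read from somewhere (database metadata or ORM annotations). ForeignKey and CascadePreview are hypothetical names; the traversal takes the "does this edge cascade?" test as a Predicate, so the navigation code does not care where the information comes from. It reports affected tables only; finding the concrete rows would still require queries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical description of one foreign key: childTable references parentTable.
record ForeignKey(String childTable, String parentTable, boolean cascadeOnDelete) {}

final class CascadePreview {
    private CascadePreview() {}

    // Returns the tables that a delete in startTable can reach through edges accepted by follows.
    static Set<String> affectedTables(String startTable,
                                      List<ForeignKey> foreignKeys,
                                      Predicate<ForeignKey> follows) {
        Map<String, List<ForeignKey>> byParent = new HashMap<>();
        for (ForeignKey fk : foreignKeys) {
            byParent.computeIfAbsent(fk.parentTable(), k -> new ArrayList<>()).add(fk);
        }
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(startTable);
        while (!toVisit.isEmpty()) {
            String table = toVisit.poll();
            for (ForeignKey fk : byParent.getOrDefault(table, List.of())) {
                // No cascade on this edge: stop going down this branch.
                if (follows.test(fk) && affected.add(fk.childTable())) {
                    toVisit.add(fk.childTable());
                }
            }
        }
        return affected;
    }
}
```

A call such as affectedTables("customers", foreignKeys, ForeignKey::cascadeOnDelete) uses the metadata flag; swapping in an annotation-based predicate would leave the traversal untouched.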
Usually with Java EE when we create Model, we define the fields and types of fields through XML or annotation before compilation time. Is there a way to change those in runtime? Or better, is it possible to create a new Model based on the user's input during the runtime? Such that the number of columns and types of fields are dynamic (determined at runtime)?
Help is much appreciated. Thank you.
I felt the need to clarify myself.
Yes, I meant database modeling, when talking about Model.
As for the use cases, I want to provide a means for users to define and create their own tables. Infinite flexibility is not required. However some degree of freedom has to be there: e.g. the users can define what fields are needed to describe their product.
You sound like you want to be able to change both objects and schema according to user input at runtime. This sounds like a chaotic recipe for disaster to me. I've never seen it done.
I have seen general schemas that incorporate foreign key relationships to generic tables of name/value pairs, but these tend to become infinitely flexible abstractions that can neither be easily understood nor get out of their own way when it comes to performance.
I'm betting that your users really don't want infinite flexibility. I'd caution you against taking this direction. Better to get your real use cases straight.
Anything is possible, of course. My direct experience tells me that it's a bad idea that your users will hate if you can pull it off. Best of luck.
I worked on a system where we had such facilities. To stay efficient, we would generate/alter the table dynamically for the customer schema. We also needed to embed a meta-model (the model of the model) to process information in the entities dynamically.
Option 1: With custom tables, you have full flexibility, but it also increases the complexity significantly, notably the update/migration of existing data. Here is a list of things you will need to consider:
What if the type of a column changes?
What if a column is added? Is there a default value?
What if a column is removed? Can I discard the existing information?
How to manage renaming of a column?
How to make things portable across databases?
How to make it efficient at database-level (e.g. indexes) ?
How to manage a human error (e.g. user removes a column then changes its mind)?
How to manage migration (script, deployment, etc.) when new version of the system is installed at customer site?
How to have this while using an ORM?
Option 2: A lightweight alternative is to add a few "spare" columns of different types to the business tables (e.g. "USER_DATE_1", "USER_DATE_2", etc.). I've seen that a few times. It will make your DBA scream and is not really considered good practice, but at least it can facilitate a few things (e.g. migration scripts, ORM integration).
Option 3: Another option is to store everything in a table with a structure property/data. But then it's really a disaster for database performance. Anything that is not completely trivial will require many joins. And the DBA will scream even more.
Option 4: It is a mix of options 2 and 3. Core tables are fixed, but a table with property/data can be used to somehow extend them.
In summary: think twice before you go this way. It can be done, but has a significant impact on the design and maintenance of the application.
This is somehow possible using meta-modeling techniques:
tables for table / column / types at the database level
key/value structures at the Java level
But this has obvious limitations (lack of strong typed objects) and can IMHO get quickly very complicated (not even sure how to deal with relations). I wouldn't use this approach to define domain objects entirely, but only to extend existing ones (products, articles, etc).
If I remember well, this is what some e-commerce solutions (e.g. BroadVision) were doing.
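As a rough sketch of the "extend an existing object with key/value data" idea at the Java level: the core field stays strongly typed, and user-defined fields are validated against a small meta-model before being stored in a map. FieldType, ProductSchema and Product are hypothetical names, and relations, persistence and migration are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Meta-model: the types a user-defined field may have (illustrative subset).
enum FieldType {
    TEXT(String.class),
    NUMBER(Number.class);

    private final Class<?> javaType;

    FieldType(Class<?> javaType) {
        this.javaType = javaType;
    }

    boolean accepts(Object value) {
        return javaType.isInstance(value);
    }
}

// User-defined field definitions, e.g. loaded from table/column metadata tables.
final class ProductSchema {
    private final Map<String, FieldType> fields = new LinkedHashMap<>();

    ProductSchema define(String fieldName, FieldType type) {
        fields.put(fieldName, type);
        return this;
    }

    void check(String fieldName, Object value) {
        FieldType type = fields.get(fieldName);
        if (type == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        if (!type.accepts(value)) {
            throw new IllegalArgumentException(fieldName + " expects " + type);
        }
    }
}

// Fixed core (name) plus key/value extension validated against the schema.
final class Product {
    private final String name;
    private final ProductSchema schema;
    private final Map<String, Object> customValues = new LinkedHashMap<>();

    Product(String name, ProductSchema schema) {
        this.name = name;
        this.schema = schema;
    }

    void set(String fieldName, Object value) {
        schema.check(fieldName, value);
        customValues.put(fieldName, value);
    }

    Object get(String fieldName) { return customValues.get(fieldName); }

    String name() { return name; }
}
```

The loss of strong typing is visible in get, which returns Object; that is the limitation mentioned above.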
I think I have found a good answer myself. The new NoSQL databases (HBase, Cassandra) seem to be exactly what I was looking for. Thanks everyone for your answers.
I am looking for what most people use as their collection type when making one-to-many associations in Hibernate. The legacy application I am maintaining uses bags exclusively, but keeps them as lists in code. The tables associated have an id field, so an idbag seems more appropriate, but documentation recommends a Set.
EDIT: I mistakenly stated that the documentation recommends a Set. In reality, the official documentation is equally vague on all collection types. What I find is that some websites seem to imply that Set is the most common, and the Hibernate book I am reading explicitly says this about sets:
This is the most common persistent collection in a typical Hibernate application. (see: page 242 of 'Java Persistence with Hibernate' by Christian Bauer and Gavin King)
I guess that is what threw me and made me seek out what others are using.
EDIT2: note that Gavin King is the creator of Hibernate
Based on the experience of using both I would recommend using a List. If you are getting data out of the database and displaying / manipulating it then it nearly always needs to be kept in a consistent order. You can use SortedSet but that can add a whole world of pain (overriding equals, hashcode etc. and sorting in different ways) compared to just adding an order by and storing it in a List. Lists are easier to manipulate - if a user deletes line 3 on the page, then just remove item 3 in the List. Working with a Set seems to involve lots of unnecessary code and messing about with iterators.
When I have used Sets with Hibernate I have frequently found myself ripping all the Sets out after a few weeks and replacing with Lists because Sets are giving me too many limitations.
The Hibernate documentation and third party tools seem to use Sets by default but from hard experience I have found it much more productive to use Lists.
Ok, after quite some time I have found a reason NOT to use a Set as a collection type. Due to problems with the hashcode/equals overrides and the way hibernate persists, using any java API functionality that calls hashcode/equals is a bad idea. There is no good way to consistently compare objects pre- and post-persistence. Stick with collections that do not rely on equals/hashcode like bag.
More info here:
http://community.jboss.org/wiki/EqualsandHashCode (this link makes it sound like a business key is the way to go, but read the next link fully to see why that is not always a good idea)
https://forum.hibernate.org/viewtopic.php?f=1&t=928172 (read the whole discussion to make your head spin)
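The pre- and post-persistence problem can be reproduced without Hibernate. In the sketch below, LineItem is a hypothetical entity whose equals/hashCode are based on a generated id, and assigning the id by hand stands in for what the persistence layer would do on save.

```java
import java.util.Objects;

// Hypothetical entity with id-based equality; the id is null until "persisted".
final class LineItem {
    Long id;
    final String description;

    LineItem(String description) {
        this.description = description;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LineItem)) return false;
        return Objects.equals(id, ((LineItem) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // changes when the id is assigned
    }
}
```

If such an item is added to a HashSet while its id is still null and the id is assigned afterwards, the set still holds the object, but contains(item) returns false because the lookup uses the new hash code. A business-key-based equals avoids this particular trap, with the caveats discussed in the links above.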
I'm guessing people use all kinds of things :-) - different collection types serve different purposes so the "best" one depends on what you need it for.
That said, using List in code is usually more convenient than using Set even though said List is unordered. If nothing else, '.get(0)' is easier on the eyes than .iterator().next() :-) Hibernate bag support is definitely adequate for this purpose plus you can even add an order-by declaration (if applicable) and have your list sorted.
idbag is a whole different animal used for many-to-many associations; you can't really compare it to regular Set or List.
I would recommend using a Set, because a set is defined as a collection of unique items and that's normally what you deal with.
Note that neither accessor is safe when there is no element in your collection: .iterator().next() throws a NoSuchElementException,
and .get(0) throws an IndexOutOfBoundsException if you access an empty list.
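For reference, in the standard Java collections both accessors throw on an empty collection: .iterator().next() raises NoSuchElementException and .get(0) raises IndexOutOfBoundsException. A small helper that works for any Collection, Set or List, could look like the sketch below; FirstElement is a hypothetical name, and it assumes the collection contains no null elements.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.Optional;

// Illustrative helper: returns the first element in iteration order, if any.
final class FirstElement {
    private FirstElement() {}

    static <T> Optional<T> first(Collection<T> items) {
        Iterator<T> iterator = items.iterator();
        // hasNext() guards the call to next(), so nothing is thrown when empty.
        return iterator.hasNext() ? Optional.of(iterator.next()) : Optional.empty();
    }
}
```

For a HashSet, "first" only means first in iteration order, which is not guaranteed to be stable.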