Drools validation: unique and dependent - java

I have a collection of POJOs in memory, and these POJOs come from another system. I have the following two problems with them:
1. I want to know which POJOs are duplicates in terms of property values.
2. I also need to validate them against another collection. For example, I have 200 shops in a city with shop IDs from 1 to 200. I received data from a shop that submitted 500 as its shop ID. I want to verify whether the data is correct according to my collection of data.
I am currently stuck and don't know how to perform these operations.
I am collecting data on market trends. Shops from all over the city are registered with us, and we assigned an ID to each store. Shopkeepers send us their selling details as plain files. My task is to store the correct data in the DB. If a shop or goods ID doesn't match my collection, that record is incorrect and I notify the shopkeeper that the record is invalid. If a file contains the same row two or more times, I also notify them that it is a duplicate.
Thanks,

I think you need to implement equals() (and a matching hashCode()) for these POJOs to define when one object is a duplicate of another. Then you can insert the POJOs you receive into a java.util.Set, and every time you receive a new one, check whether it has already been received using set.contains().
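A minimal sketch of the equals()/hashCode() plus Set idea, assuming a hypothetical SaleRecord row type with shopId, goodsId and quantity fields. It uses the return value of Set.add, which is false when an equal element is already present, so the contains-then-add check happens in a single call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical row type: one line of a shop's sales file.
final class SaleRecord {
    final int shopId;
    final int goodsId;
    final int quantity;

    SaleRecord(int shopId, int goodsId, int quantity) {
        this.shopId = shopId;
        this.goodsId = goodsId;
        this.quantity = quantity;
    }

    // Two records are duplicates when all property values match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SaleRecord)) return false;
        SaleRecord other = (SaleRecord) o;
        return shopId == other.shopId && goodsId == other.goodsId && quantity == other.quantity;
    }

    // Must be consistent with equals(), otherwise a HashSet cannot detect duplicates.
    @Override
    public int hashCode() {
        return Objects.hash(shopId, goodsId, quantity);
    }
}

final class DuplicateFinder {
    // Returns every record that is equal to one appearing earlier in the input.
    static List<SaleRecord> findDuplicates(List<SaleRecord> records) {
        Set<SaleRecord> seen = new HashSet<>();
        List<SaleRecord> duplicates = new ArrayList<>();
        for (SaleRecord r : records) {
            // Set.add returns false when an equal element is already present.
            if (!seen.add(r)) {
                duplicates.add(r);
            }
        }
        return duplicates;
    }
}
```

With Java 16+ a record would generate equals() and hashCode() automatically; they are written out here to show what the HashSet relies on.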
For #2, you can maintain a Map from ID to object for that other collection and check whether the ID of a newly arrived POJO is present in that map.
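A sketch of the lookup for #2, assuming a hypothetical Shop record and ShopRegistry class (records need Java 16+). The reference shops are loaded into a map keyed by ID, and each incoming ID is checked with containsKey, which is a constant-time hash lookup on average.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical reference data: a shop registered with us.
record Shop(int id, String name) {}

final class ShopRegistry {
    private final Map<Integer, Shop> shopsById = new HashMap<>();

    void register(Shop shop) {
        shopsById.put(shop.id(), shop);
    }

    // A submitted shop ID is valid only if it is present in the reference map.
    boolean isValidShopId(int shopId) {
        return shopsById.containsKey(shopId);
    }

    Optional<Shop> find(int shopId) {
        return Optional.ofNullable(shopsById.get(shopId));
    }
}
```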
AFAIK, Drools requires you to provide a canonical representation of your objects in order to run rules on their state. The two validations above require you to maintain those data structures whether you use Drools or any other rule engine.

Related

What is the DDD way to make sure that only one object is created per combination of two attributes?

I'm pretty new to the whole DDD concept and I have the following question:
Let's say I have a UI where users can save cars by entering an id and a name. What is the DDD way to make sure that every unique id and name combination is only created once? The cars are all entities and will be stored in a database. Usually I would just put a primary and a foreign key in the DB, check whether the combination is already there, and create/store the object only if it is not.
Now I'm wondering whether this is domain logic or just simple CRUD. If it is domain logic, and if I understood correctly, I should make my car object decide whether it is valid or not. If that's the case, how would I do that?
Thanks in advance!
Edit:
Another thing: what if every created object should be deleted after 10 days? That would be a concept in the domain and hence also part of the domain logic. But how should the object know when to delete itself, and how should it do it? Would that be a domain service that checks the creation date of the objects and, if one is older than 10 days, performs a delete operation in the DB?
I would go with a UNIQUE constraint on the two fields if you don't care about the validity of the values entered. That way, even if someone for some reason inserts or updates the records directly in the DB, the DB will prevent duplicates.
If you do care about the validity of the combined values, you will have to add some logic in your code, on top of the constraint, before saving to the DB.
For your deletion mechanism, you can have a scheduler that checks every day for data older than 10 days, using a previously filled DB column (e.g. CREATED_ON), and deletes it.
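A minimal sketch of the daily purge, assuming an in-memory map as a stand-in for a table with a CREATED_ON column; CarStore and PurgeJob are hypothetical names. purgeOlderThan takes the current time as a parameter so it can be tested with a fixed clock, and the scheduling uses the JDK's ScheduledExecutorService.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for a table with a CREATED_ON column.
final class CarStore {
    private final Map<String, Instant> createdOnById = new ConcurrentHashMap<>();

    void save(String id, Instant createdOn) {
        createdOnById.put(id, createdOn);
    }

    boolean contains(String id) {
        return createdOnById.containsKey(id);
    }

    // Roughly: DELETE FROM car WHERE created_on < :cutoff
    void purgeOlderThan(Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        createdOnById.values().removeIf(createdOn -> createdOn.isBefore(cutoff));
    }
}

final class PurgeJob {
    // Runs the purge once a day, starting immediately; the caller should shut the executor down.
    static ScheduledExecutorService start(CarStore store) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> store.purgeOlderThan(Instant.now(), Duration.ofDays(10)),
                0, 1, TimeUnit.DAYS);
        return scheduler;
    }
}
```

Against a real database the body of purgeOlderThan would be a single DELETE filtered on CREATED_ON; in a Spring application the @Scheduled annotation is a common alternative to managing the executor yourself.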
"It depends".
If id and name are immutable properties that are assigned at the beginning of the object's lifetime, then the straightforward thing to do is incorporate them into the key that you use to look up the aggregate.
car = Garage.get(id, name)
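The Garage.get(id, name) lookup can be sketched in Java by making the pair a value object and using it as the map key. Garage, Car and CarKey are hypothetical names, the map stands in for persistent storage, and records need Java 16+.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical value object: the (id, name) pair is the identity used for lookup.
record CarKey(String id, String name) {}

record Car(CarKey key, String color) {}

// In-memory sketch of a Garage repository; a real one would wrap a database.
final class Garage {
    private final Map<CarKey, Car> cars = new HashMap<>();

    Optional<Car> get(String id, String name) {
        return Optional.ofNullable(cars.get(new CarKey(id, name)));
    }

    // Returns false instead of overwriting when the (id, name) combination already exists.
    boolean add(Car car) {
        return cars.putIfAbsent(car.key(), car) == null;
    }
}
```

Because CarKey is a record, two keys with the same id and name are equal, so putIfAbsent refuses a second car with the same combination.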
If instead what you have is a relation that changes over time (for instance, if you have to worry about name being corrupted by a data entry error) then things become more complicated.
The general term for the problem you are describing is set-validation. And the riddle is this: in order to reliably verify that a set has some property, you need to know that the property doesn't change between when you check it and when you commit your own change. In other words, you need to be able to lock the entire set.
Expressed more generally, the set is a collection of associated objects that we treat as a unit for the purpose of data changes. And we have a name for that pattern: aggregate.
So "the registry of names" becomes an aggregate in its own right - something that you can load, modify, store, and so on.
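A sketch of "the registry of names" as its own aggregate, under the assumption that the whole set is loaded, modified and saved as one unit. RegistrationRegistry, Registration and the version counter are hypothetical; the version suggests how a repository could detect that the set changed between load and save (optimistic locking), which is one way to meet the "lock the entire set" requirement described above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value object for one (id, name) combination.
record Registration(String id, String name) {}

// Hypothetical aggregate: the uniqueness check and the change happen inside one consistency boundary.
final class RegistrationRegistry {
    private final Set<Registration> registrations = new HashSet<>();
    // Incremented on every change; a repository could compare it on save and
    // reject the write if another transaction changed the set in the meantime.
    private long version;

    // Enforces the invariant: each (id, name) combination may appear only once.
    void register(String id, String name) {
        if (!registrations.add(new Registration(id, name))) {
            throw new IllegalStateException("Already registered: " + id + "/" + name);
        }
        version++;
    }

    boolean isRegistered(String id, String name) {
        return registrations.contains(new Registration(id, name));
    }

    long version() {
        return version;
    }
}
```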
In some cases, it can make sense to partition that into smaller aggregates ("the set of things named Bob") - that reduces the amount of data you need to load/store when managing the aggregate itself, but adds some complexity to the use case when you change a name.
Is this "better" than the answer of just using database constraints? It depends on which side of the trade-off you value more: enforcing part of the domain invariant in the domain model and part of it in the data store adds complexity. Also, when you start leaning on the data store to enforce part of the invariant, you begin to limit your choices of which data store to use.

Access to one data collection from multiple logged-in users

I am solving the following problem and would be grateful for some advice so that I can move my project ahead; I hope this question doesn't break any rules.
In my app I have two REST controllers:
one for storing data
one for fetching data
It should work something like this: a user sends data to the database through the first REST controller. Each data object also has a recipient property, so I need to put this recipient into a collection that is available to every user. When the data is stored, the user gets a 200 response.
When another user calls the data-fetching controller, it checks whether this collection contains his id. If it does, the controller loads the data from the database and returns it in the response. Otherwise it waits, checking the collection in a loop until it contains his id or the waiting time expires. If it finds his id while checking, it removes the id, fetches the data and returns it; otherwise it returns an empty object.
Can you tell me whether Spring has a feature for this? If not, how could it be done in pure Java? Thanks in advance.
As per my understanding, you need to preserve some value after each POST/PUT so that you can use this value before GET.
I would suggest some in-memory cache (something like Guava, as it provides many methods for operating on a cache).
Or check if you can achieve this with Spring Cache.
If you are using MongoDB, you can use cacheable collection.
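For the pure-Java part of the question, one possible sketch (not taken from the answer above) replaces the polling loop with a blocking queue per recipient: the storing side offers the new data id, and the fetching side waits with poll(timeout, unit), which also removes the id. RecipientNotifier and its methods are hypothetical names, not a Spring API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper shared by both controllers.
final class RecipientNotifier {
    // One queue of stored-data ids per recipient.
    private final ConcurrentMap<String, BlockingQueue<Long>> pending = new ConcurrentHashMap<>();

    private BlockingQueue<Long> queueFor(String recipientId) {
        return pending.computeIfAbsent(recipientId, k -> new LinkedBlockingQueue<>());
    }

    // Called by the storing controller after the data has been saved.
    void notifyStored(String recipientId, long dataId) {
        queueFor(recipientId).offer(dataId);
    }

    // Called by the fetching controller: blocks until an id arrives or the timeout expires.
    // poll() removes the id, so each notification is consumed once. Returns null on timeout.
    Long awaitData(String recipientId, long timeout, TimeUnit unit) {
        try {
            return queueFor(recipientId).poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

This only works inside a single JVM, and queues for recipients are never removed, so a real version needs cleanup; with several application instances a shared store such as the cache suggested above is needed. In Spring MVC, returning a DeferredResult from the controller lets the response be completed later without holding a request thread while waiting.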

Mixed list of new/updated objects: how to efficiently store them to the DB?

OK, so let's say I have a list that contains the following types of objects:
Objects that are already stored in the database (have the same PK) and are the same as in the database, i.e. not modified
Objects that are already stored in the database (have the same PK), and are modified in regards to the stored ones, so they need to be updated
Objects that don't yet exist in the database, and are about to be saved
Such a list of objects is sent as JSON to the web service, and the web service now has to communicate with the database and decide which objects to store, update or ignore.
My question is how to do this effectively?
One idea is to iterate over the list, query the database for every object's PK, check whether the stored object is non-existent, the same, or modified, and then choose the action based on that information.
What bothers me about that approach is the large number of database queries just to save some objects. What if only 1 of 100 actually needs to be saved? That seems very inefficient.
Is there any better way to do that?
You can send the whole list to the DB (MySQL) and do an upsert:
INSERT ... ON DUPLICATE KEY UPDATE
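A sketch of the upsert sent as a single JDBC batch, assuming a hypothetical MySQL table item(id PRIMARY KEY, name, price_cents) and a matching Item record (Java 16+). Every object is sent, and the database decides per row whether to insert or update, so no per-object SELECT is needed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical row type matching the hypothetical item table.
record Item(long id, String name, int priceCents) {}

final class ItemUpsert {
    // Inserts new rows; rows with an existing primary key are updated instead.
    static final String SQL =
            "INSERT INTO item (id, name, price_cents) VALUES (?, ?, ?) "
            + "ON DUPLICATE KEY UPDATE name = VALUES(name), price_cents = VALUES(price_cents)";

    // Sends the whole list as one JDBC batch instead of querying per object.
    static int[] upsertAll(Connection conn, List<Item> items) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (Item item : items) {
                ps.setLong(1, item.id());
                ps.setString(2, item.name());
                ps.setInt(3, item.priceCents());
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

When an existing row already has the same values, MySQL leaves it unchanged, so the "ignore" case needs no special handling. With MySQL Connector/J, the rewriteBatchedStatements=true connection property lets the driver send the batch as multi-row INSERT statements. Since MySQL 8.0.20, using VALUES() inside ON DUPLICATE KEY UPDATE is deprecated in favor of a row alias.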

Java - Google App Engine - modelling graph structures in Google Datastore

Google App Engine offers the Google Datastore as its only NoSQL database (I think it is based on Bigtable).
In my application I have a social-like data structure and I want to model it as I would do in a graph database. My application must save heterogeneous objects (users,files,...) and relationships among them (such as user1 OWNS file2, user2 FOLLOWS user3, and so on).
I'm looking for a good way to model this typical situation, and I thought of two families of solutions:
List-based solutions: Any object contains a list of other related objects and the object presence in the list is itself the relationship (as Google said in the JDO part https://developers.google.com/appengine/docs/java/datastore/jdo/relationships).
Graph-based solution: Both nodes and relationships are objects. The objects exist independently of the relationships, while each relationship contains a reference to the two (or more) connected objects.
What are strong and weak points of these two approaches?
About approach 1: This is the simplest approach one can think of, and it is also presented in the official documentation, but:
Each directed relationship makes the object record grow: are there any limits on the number of possible relationships, for instance due to the entity size limit?
Is that a JDO feature, or does the datastore structure also allow that approach to be implemented naturally?
The relationship search time will increase with the list; is this solution suitable for large numbers (millions) of relationships?
About approach 2: Each relationship can be characterized further (it is an object and can have properties). And I think memory size is not a Google problem, but:
Each relationship requires its own record, so the search time for each related pair will increase as the total number of relationships increases. Is this suitable for large numbers of relationships (millions, billions)? I.e., does Google have good tricks for searching among records if they are well structured? Or will I soon be in a situation where searching for a friend of User1 called User4 means waiting seconds?
On the other side each object doesn't increase in dimension as new relationships are added.
Could you help me find other important points about the two approaches so that I can choose the best model?
First, the search time in the Datastore does not depend on the number of entities that you store, only on the number of entities that you retrieve. Therefore, if you need to find one relationship object out of a billion, it will take the same time as if you had just one object.
Second, the list approach has a serious limitation called "exploding indexes". You will have to index the property that contains a list to make it searchable. If you ever use a query that references more than just this property, you will run into this issue - google it to understand the implications.
Third, the list approach is much more expensive. Every time you add a new relationship, you will rewrite the entire entity at considerable writing cost. The reading costs will be higher too if you cannot use keys-only queries. With the object approach you can use keys-only queries to find relationships, and such queries are now free.
UPDATE:
If your relationships are directed, you may consider making Relationship entities children of User entities, and using an Object id as an id for a Relationship entity as well. Then your Relationship entity will have no properties at all, which is probably the most cost-efficient solution. You will be able to retrieve all objects owned by a user using keys-only ancestor queries.
I have an AppEngine application and I use both approaches. Which is better depends on two things: the practical limits of how many relationships there can be and how often the relationships change.
NOTE 1: My answer is based on experience with Objectify and heavy use of caching. Mileage may vary with other approaches.
NOTE 2: I've used the term 'id' instead of the proper Datastore term 'name' here. 'Name' would have been confusing, and 'id' matches Objectify's terms better.
Consider users linked to the schools they've attended and vice versa. In this case, you would do both. Link the users to schools with a variation of the 'List' method. Store the list of school ids the user attended as a UserSchoolLinks entity with a different type/kind but with the same id as the user. For example, if the user's id = '6h30n' store a UserSchoolLinks object with id '6h30n'. Load this single entity by key lookup any time you need to get the list of schools for a user.
However, do not do the reverse for the users that attended a school. For that relationship, insert a link entity. Use a combination of the school's id and the user's id for the id of the link entity. Store both id's in the entity as separate properties. For example, the SchoolUserLink for user '6h30n' attending school 'g3g0a3' gets id 'g3g0a3~6h30n' and contains the fields: school=g3g0a3 and user=6h30n. Use a query on the school property to get all the SchoolUserLinks for a school.
Here's why:
Users will see their schools frequently but change them rarely. Using this approach, the user's schools will be cached and won't have to be fetched every time they hit their profile.
Since you will be getting the user's schools via a key lookup, you won't be using a query. Therefore, you won't have to deal with eventual consistency for the user's schools.
Schools may have many users that attended them. By storing this relationship as link entities, we avoid creating a huge single object.
The users that attended a school will change a lot. This way we don't have to write a single, large entity frequently.
By using the id of the User entity as the id for the UserSchoolLinks entity we can fetch the links knowing just the id of the user.
By combining the school id and the user id as the id for the SchoolUserLink, we can do a key lookup to see if a user and school are linked. Once again, no need to worry about eventual consistency for that.
By including the user id as a property of the SchoolUserLink we don't need to parse the SchoolUserLink object to get the id of the user. We can also use this field to check consistency between both directions and have a fallback in case somehow people are attending hundreds of schools.
Downsides:
1. This approach violates the DRY principle, but it seems like the least of evils here.
2. We still have to use a query to get the users who attended a school. That means dealing with eventual consistency.
Don't forget to update the UserSchoolLinks entity and add/remove the SchoolUserLink entity in a transaction.
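The two link layouts described in this answer can be sketched with plain Java maps standing in for the two Datastore kinds; this is not the Datastore or Objectify API, and SchoolLinks and its methods are hypothetical names. The per-user entity is found by the user's id, the per-pair link by the composite id school~user, and only the "users of a school" lookup needs a scan, which stands in for the query.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory stand-in for the UserSchoolLinks and SchoolUserLink kinds.
final class SchoolLinks {
    // SchoolUserLink: id is "schoolId~userId", and both ids are also stored as properties.
    record SchoolUserLink(String school, String user) {
        String id() { return school + "~" + user; }
    }

    // UserSchoolLinks: same id as the user, holding the set of school ids.
    private final Map<String, Set<String>> userSchoolLinks = new HashMap<>();
    // SchoolUserLink entities keyed by their composite id.
    private final Map<String, SchoolUserLink> schoolUserLinks = new HashMap<>();

    // In Datastore, both writes belong in one transaction.
    void link(String user, String school) {
        userSchoolLinks.computeIfAbsent(user, k -> new HashSet<>()).add(school);
        SchoolUserLink link = new SchoolUserLink(school, user);
        schoolUserLinks.put(link.id(), link);
    }

    // Key lookup by user id: no query needed.
    Set<String> schoolsOf(String user) {
        return userSchoolLinks.getOrDefault(user, Set.of());
    }

    // Key lookup by composite id: is this user linked to this school?
    boolean isLinked(String user, String school) {
        return schoolUserLinks.containsKey(new SchoolUserLink(school, user).id());
    }

    // Stand-in for a query on the 'school' property (eventually consistent in Datastore).
    List<String> usersOf(String school) {
        return schoolUserLinks.values().stream()
                .filter(l -> l.school().equals(school))
                .map(SchoolUserLink::user)
                .sorted()
                .collect(Collectors.toList());
    }
}
```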
Your question is complex, but I'll try to explain the best solution (I will answer in Python, but the same can be done in Java).
from google.appengine.ext import db

class User(db.Model):
    followers = db.StringListProperty()
Adding a follower is simple:
user = User.get(key)
user.followers.append(str(followerKey))
user.put()
This allows fast queries in both directions: a user's followers are in the list itself, and the users someone follows can be found with a query:
User.all().filter('followers =', str(followerKey))  # -> users followed by followerKey
This query is costly in I/O, so you can make it faster, at the price of more complexity and more write I/O:
class User(db.Model):
    followers = db.StringListProperty()
    follows = db.StringListProperty()
However, this is more complicated to maintain: deleting a user requires updating the follows lists too, so every change needs two writes.
You can also store relationships as separate entities, but that is the worst scenario, since it is more complex than the second example with followers and follows. Keep in mind that an entity can be at most 1 MB; that is usually not a limit, but it can become one.
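A Java analogue of the second model (both a followers list and a follows list per user), using in-memory maps instead of Datastore entities; FollowGraph is a hypothetical name. It shows why every change costs two writes: each follow or unfollow touches both users, and deleting a user touches every user that references it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory analogue of User entities holding both 'followers' and 'follows' lists.
final class FollowGraph {
    private final Map<String, Set<String>> followers = new HashMap<>();
    private final Map<String, Set<String>> follows = new HashMap<>();

    private static Set<String> listOf(Map<String, Set<String>> m, String user) {
        return m.computeIfAbsent(user, k -> new HashSet<>());
    }

    // Two writes per change: the followed user's followers and the follower's follows.
    void follow(String follower, String followed) {
        listOf(followers, followed).add(follower);
        listOf(follows, follower).add(followed);
    }

    void unfollow(String follower, String followed) {
        listOf(followers, followed).remove(follower);
        listOf(follows, follower).remove(followed);
    }

    // Deleting a user means updating every other user that references it.
    void deleteUser(String user) {
        for (String f : listOf(followers, user)) listOf(follows, f).remove(user);
        for (String f : listOf(follows, user)) listOf(followers, f).remove(user);
        followers.remove(user);
        follows.remove(user);
    }

    Set<String> followersOf(String user) { return followers.getOrDefault(user, Set.of()); }

    Set<String> followedBy(String user) { return follows.getOrDefault(user, Set.of()); }
}
```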

Is it advisable to store some information (metadata) about content in the id (or key) of that content?

In other words, I am using time-based UUIDs as the ids (or keys) for some content stored in the database. My application first fetches the list of all such ids from the database and then fetches the corresponding content. My idea is to store some extra information about the content in the ids themselves, so that my software can access this metadata without fetching the entire content from the database again.
My application context is a website using Java technology and Cassandra database.
So my questions are:
Should I do so? I am concerned that a lot of processing may be required (when presenting data to the user) to retrieve the metadata from the content ids, so it may be better to retrieve it from the database than to extract it by processing the id.
If it is advisable, how should I implement it efficiently? I was thinking of the following:
Content id = 'Timebased UUID' + 'UserId'
where 'Timebased UUID' is the id generated from the timestamp when the content was added by a user, and 'UserId' is the id of the user who added that content.
So an example id would look something like this: e4c0b9c0-a633-15a0-ac78-001b38952a49 (TimeUUID) -- ff7405dacd2b (UserId)
How should I extract the userId from such a content id most efficiently?
Is there a better approach to storing meta information in the ids?
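For reference, extracting the user id from the proposed format is cheap: it is one scan of a short string. A sketch, assuming the hypothetical format <timeUUID>--<userId> and a hypothetical ContentId record (Java 16+):

```java
import java.util.UUID;

// Hypothetical composite id in the format "<timeUUID>--<userId>".
record ContentId(UUID timeUuid, String userId) {
    private static final String SEPARATOR = "--";

    // Splits at the separator; a UUID string only contains single hyphens.
    static ContentId parse(String raw) {
        int sep = raw.lastIndexOf(SEPARATOR);
        if (sep < 0) {
            throw new IllegalArgumentException("No user id in: " + raw);
        }
        return new ContentId(UUID.fromString(raw.substring(0, sep)),
                raw.substring(sep + SEPARATOR.length()));
    }

    String format() {
        return timeUuid + SEPARATOR + userId;
    }
}
```

A version-1 (time-based) UUID already carries its timestamp, which java.util.UUID exposes through timestamp(). The answers below explain why putting data into ids may still cause trouble.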
I hate to say it since you seem to have put a lot of thought into this but I would say this is not advisable. Storing data like this sounds like a good idea at first but ends up causing problems because you will have many unexpected issues reading and saving the data. It's best to keep separate data as separate variables and columns.
If you are really interested in accessing meta-content without the main content, I would make two column families. One family holds the meta-content and the other the larger main content, and both share the same ID key. I don't know much about Cassandra, but this seems to be the recommended way to do this sort of thing.
I should note that I don't think all this will be necessary. Unless users are storing very large amounts of information, the content size should be trivial and your retrievals should remain quick.
I agree with AmaDaden. Mixing IDs and data is the first step on a path that leads to a world of suffering. In particular, you will eventually find a situation where the business logic requires the data part to change and the database logic requires the ID not to change. Off the cuff, in your example, there might suddenly be a requirement for a user to be able to merge two accounts to a single user id. If user id is just data, this should be a trivial update. If it's part of the ID, you need to find and update all references to that id.
