How should one go about filtering a series of domain objects according to user-defined criteria? Should the filtering methods be in the model or should they be in the DAO?
If you want to use your model objects only (or mainly) as data containers, you should put the filtering into the DAOs you are already using. It is good practice to keep the user-defined criteria database-independent, so pass your own filter object instead of, say, a plain Hibernate Criteria (query by example may work well, too).
Then your DAO methods can look like this:
public interface BeanDao
{
    List<Bean> findAllByFilter(BeanFilter filter);
}
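A minimal sketch of such a database-independent filter object (the fields here are illustrative, not a fixed API):

// Sketch of a database-independent filter object; the fields are illustrative.
public class BeanFilter {
    private String nameContains; // match beans whose name contains this substring
    private Integer minValue;    // lower bound; null means "no bound"

    public String getNameContains() { return nameContains; }
    public void setNameContains(String nameContains) { this.nameContains = nameContains; }

    public Integer getMinValue() { return minValue; }
    public void setMinValue(Integer minValue) { this.minValue = minValue; }
}

The DAO implementation translates this object into whatever query API it uses (Hibernate Criteria, JPQL, plain SQL) without leaking that API to its callers.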
The choice of whether to retrieve a greater number of objects and then filter or simply to retrieve the correct objects in the first place depends on the underlying data. Most applications will use a combination of the two.
The things I would consider are:
Network bandwidth & Memory requirements
If filtering leaves only a small number of results out of a significantly larger unfiltered set, it wastes bandwidth and memory to pull everything back and filter in code.
Query speed
Filtering the results in the database can be more expensive than doing the logic in code (disk versus memory), and indexes are required to make it worthwhile.
Maintainability
Creating new queries can be time-consuming. It will definitely mean writing some SQL and revisiting various phases of testing, and it may require modifying the DB schema, such as adding an index to speed up the query.
When solving this problem in Java it might be worth considering the visitor pattern. I often use two interfaces, SingleMatcher and MultipleMatcher, to filter a collection of objects. Implementations of these can be combined to create new user-defined criteria. Another advantage is that once you have the matchers users are likely to want, you won't have to go back through testing to create new criteria.
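A minimal sketch of what those two interfaces and a combining matcher could look like (the exact contracts here are my assumption, not a standard API):

import java.util.Collection;
import java.util.List;

// Sketch only: the contracts of these interfaces are assumptions.
interface SingleMatcher<T> {
    boolean matches(T candidate); // does one element satisfy the criterion?
}

interface MultipleMatcher<T> {
    List<T> matchAll(Collection<T> candidates); // select every element that satisfies it
}

class AndMatcher<T> implements SingleMatcher<T> {
    private final List<SingleMatcher<T>> parts;

    AndMatcher(List<SingleMatcher<T>> parts) {
        this.parts = parts;
    }

    // User-defined criteria are built by combining existing matchers.
    public boolean matches(T candidate) {
        for (SingleMatcher<T> part : parts) {
            if (!part.matches(candidate)) {
                return false;
            }
        }
        return true;
    }
}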
I am wondering which approach is better: should we use fine-grained entities on the grid and later construct functionally rich domain objects out of those fine-grained entities?
Or should we construct coarse-grained domain objects and store them directly on the grid, using the entities only for persistence?
Edit: I don't think this question is answered completely yet. So far we have comments on Hazelcast, GemFire and Ignite; we are still missing Infinispan, Coherence, etc. That is for completeness' sake :)
I agree with Valentin, it mainly depends on the system you want to use. Normally I would consider storing enriched domain objects directly. However, if you have only very few objects but their size is massive, you end up with poor distribution and unequal memory usage across the nodes. If your domain objects are of "normal" size and you have plenty of them, you shouldn't worry.
In Hazelcast it is better to store those objects directly, but make sure you use a good serialization mechanism, as plain Java serialization is slow. If you want to query on properties inside your domain objects, you should also consider adding indexes.
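For example, with the Hazelcast 3.x API (the addIndex signature changed in later versions, and the Customer type and attribute are illustrative):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class IndexSetup {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<Long, Customer> customers = hz.getMap("customers");
        // An ordered index ('true') also speeds up range queries on this attribute.
        customers.addIndex("lastName", true);
    }
}

class Customer implements java.io.Serializable {
    private String lastName; // the attribute being indexed above
    public String getLastName() { return lastName; }
}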
I believe it can differ from one data grid to another. I'm more familiar with Apache Ignite, and in that case the fine-grained approach works much better, because it's more flexible and in many cases gives better data distribution and therefore better scalability. Ignite also provides rich SQL capabilities [1] that allow you to join different entities and run indexed searches, so you don't lose performance with a fine-grained model.
[1] https://apacheignite.readme.io/docs/sql-queries
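A sketch of what such a query looks like with Ignite's SqlFieldsQuery (the Person/Company tables and fields are illustrative, and I'm assuming SQL is enabled for those caches):

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class JoinQueryDemo {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Long, Object> cache = ignite.cache("Person");
        // Join two fine-grained entities with an indexed SQL query.
        SqlFieldsQuery qry = new SqlFieldsQuery(
                "select p.name, c.name from Person p, Company c where p.companyId = c.id");
        for (List<?> row : cache.query(qry)) {
            System.out.println(row);
        }
    }
}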
One advantage of a coarse-grained object is data consistency. Everything in that object gets saved atomically. But if you split that object up into 4 small objects, you run the risk that 3 objects save and 1 fails (for whatever reason).
We use GemFire, and tend to favor coarse-grained objects...up to a point. For example our Customer object contains a list of Addresses. An alternative design would be to create one GemFire region for "Customer" and a separate GemFire region for "CustomerAddresses" and then hope you can keep those regions in sync.
The downside is that every time someone updates an Address, we re-write the entire Customer object. That's not very efficient, but our traffic patterns show that address changes are very rare (compared to all the other activity), so this works out fine.
One experience we've had, though, is the downside of using Java serialization for long-term data storage. We avoid it now because of all the problems caused by object compatibility as classes change over time, not to mention the headaches it creates for .NET clients reading the objects. :)
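For illustration, the coarse-grained shape described above might look roughly like this (a sketch, not the actual classes):

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Sketch of a coarse-grained Customer; fields are illustrative.
public class Customer implements Serializable {
    private Long id;
    private String name;
    private final List<Address> addresses = new ArrayList<Address>(); // owned by Customer

    // Changing one address still means re-writing the whole Customer entry in the region.
    public void replaceAddress(int index, Address newAddress) {
        addresses.set(index, newAddress);
    }
}

class Address implements Serializable {
    private String street;
    private String city;
}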
I am in the process of designing a new Java application which has very strict requirements for auditing. Here is a brief context:
I have a complex entity with multiple nested one-to-many relationships. If any field changes, I need to treat it as a new version of the object, and all of this needs to be audited as well. Now I have two options:
1.) Do no update operations at all; just insert a new entity whenever anything changes. This requires me to create all the related objects (even if they have not changed), as I do not want to hold references to any previous-version objects. My data tables then double as my audit tables.
OR
2.) Always do an update operation and maintain the auditing information in separate tables. That adds some more implementation complexity.
I would like to know if either of these two approaches is considered good or bad practice.
Thanks,
-csn
What should define your choice is your insert/update/read patterns for both the "live" data and the audits.
Most commonly these patterns are very different for the two kinds.
- Concerning "live" data, it depends a lot on your application, but I imagine you have significant inserts, significant updates, and lots of reads. Live data also requires transactionality and has many relationships between tables for which you need to keep consistency. It may require fast and complex search, with indexes on many columns.
- Audits have lots of inserts, almost no updates, and few reads. Reads and searches don't need to be complex (e.g. you only consult audits and sort them by date), so you don't need indexes on many columns.
So with increased load and data size you will probably need to split the data and optimize each set of tables for its own use cases.
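For what option 1 could look like in practice, here is a sketch of an insert-only, versioned entity with JPA (the entity and field names are illustrative):

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

// Sketch of option 1 (insert-only versioning); all names are illustrative.
@Entity
public class ContractVersion {
    @Id
    @GeneratedValue
    private Long id;              // fresh row (and id) for every change

    private Long contractKey;     // stable business key shared by all versions
    private int versionNo;        // incremented on each change

    @Temporal(TemporalType.TIMESTAMP)
    private Date createdAt;       // when this version was written

    // ...a full copy of the business fields for every version
}

Note that with this shape, queries for the live data have to select the highest versionNo per contractKey, which is exactly the read overhead the separate-audit-table approach avoids.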
I want to do CQRS. Where should I put queries? Currently I can think of two options:
1) Each query is an independent class that contains just a string, and such an object is passed to a simple/dumb repository.
With this approach we potentially end up with thousands of small query classes. We also have problems with complex queries (e.g. in Oracle we can't have an empty "where in (...)" part), so there is no good place to check whether that part is empty and simply return an empty collection without even touching the database.
It is also a bit hard to use different queries when working with different databases.
2) Create one method per query in the repository object.
Is that still CQRS? Don't we lose the ability to easily select and pass a query? Or maybe that's not really needed?
I think you may be mixing concepts here. CQRS only states that there are separate models for queries and commands, which is really broad.
For instance, one possible implementation is having two separate generic repositories, one for queries and the other for commands. Query repository implementations may use one database while command repository implementations use a different one. Or not.
Passing query classes to your repository versus having your repository implement many different methods is just a matter of organizing your (query) repository, not a command-query segregation concern.
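As an illustration of that organizational freedom, the two repositories could be as simple as this (all names are illustrative):

import java.util.List;

// Sketch: separate read and write repositories; all names are illustrative.
interface OrderQueryRepository {
    // Read side: may be backed by a denormalized, read-optimized store.
    List<OrderSummary> findByCustomer(long customerId);
}

interface OrderCommandRepository {
    // Write side: only state-changing operations, no queries.
    void save(Order order);
    void delete(long orderId);
}

class Order { /* write model */ }
class OrderSummary { /* read model / projection */ }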
I want to change a flag on a series of objects. What is the standard DAO practice for:
Changing a property of all objects represented in a table?
Merging a list of objects?
You've stumbled upon one of the areas where the classical DAO approach can lead to bad performance. Depending on your persistence engine, it can be extremely tricky to turn this into ONE efficient UPDATE statement instead of having to update hundreds of objects individually.
I would look at my business objects, estimate how many objects can change at the same time, and measure the impact of having a 'pure' OO domain model (which usually boils down to iterating through those objects and changing them one at a time) versus adding a custom method that performs a batch update for just this situation.
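With JPA, for example, such a custom method could issue one bulk statement instead of touching each object (a sketch; the Item entity and its flag are illustrative):

import javax.persistence.EntityManager;

// Sketch of a batch-update DAO method using a JPQL bulk statement;
// the Item entity and the 'processed' flag are illustrative.
public class ItemDao {
    private final EntityManager em;

    public ItemDao(EntityManager em) {
        this.em = em;
    }

    public int markAllProcessed() {
        // One UPDATE statement instead of loading and saving N objects.
        return em.createQuery(
                "update Item i set i.processed = true where i.processed = false")
                .executeUpdate();
    }
}

Keep in mind that JPA bulk updates bypass the persistence context, so any Item instances already loaded in memory will not see the change.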
This task occurs from time to time in my projects. I need to handle a collection of complex elements that have different attributes, such as login, password_hash, role, etc., and I need to be able to query that collection just like I query a table in the database, having only partial data. For example: get all users with role "user", or check whether there is a user with login "root" and role "superuser". Removing items based on the same kind of data is also needed.
The first thing I tried was Google Collections, Apache Commons Collections, and lambdaj. All of them have a very similar predicate mechanism, but with a big disadvantage: it is based on iterating over the items one by one, which is not good for frequently used collections containing large amounts of data. Could you please suggest a solution? Thanks.
UPDATE:
Currently I've solved this problem by implementing my own collection with multiple indexes, so that I can perform direct queries: http://code.google.com/p/tablej/
The database isn't automagically efficient either; you actually need to configure a database (by putting indices on relevant columns) for it to be able to do searches efficiently.
In a similar way you can optimize your code for speed. If you need to do a lot of lookups based on a few criteria you could make dedicated collections for that. Similar to an index on a database.
You'd do that by not only inserting your User into a Users list, but also by putting that same user into, for instance, a Map keyed on role:
public void addUser(User user) {
    users.add(user);
    // maintain your index: group users by role
    if (!usersByRole.containsKey(user.getRole())) {
        usersByRole.put(user.getRole(), new ArrayList<User>());
    }
    usersByRole.get(user.getRole()).add(user);
}

public List<User> findByRole(String role) {
    if (!usersByRole.containsKey(role)) {
        return Collections.emptyList();
    }
    return Collections.unmodifiableList(usersByRole.get(role));
}
Java can't do LINQ. It doesn't even come close, due to the lack of lambdas, yield, extension methods and expression trees. The Quaere project offers a poor man's substitute.
I don't think this will satisfy your algorithmic efficiency requirements, however. These can only be done in one of two ways, AFAIK:
Hand-coded data structures, optimised for the questions you want to ask.
In-memory HSQLDB.
The former is hard, but likely to yield the best performance. The latter won't be as fast, but it will be OK, especially if properly tuned with indexes, and it is much simpler to work with.
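Getting an in-memory HSQLDB running takes only a few lines of JDBC (a sketch; the table and data are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class InMemoryDbDemo {
    public static void main(String[] args) throws SQLException {
        // The 'mem:' URL keeps the database entirely in memory.
        // (Very old drivers may need Class.forName("org.hsqldb.jdbcDriver") first.)
        try (Connection c = DriverManager.getConnection("jdbc:hsqldb:mem:users", "SA", "")) {
            try (Statement st = c.createStatement()) {
                st.execute("create table users (login varchar(50), role varchar(20))");
                st.execute("create index idx_role on users (role)"); // tune with indexes
                st.execute("insert into users values ('root', 'superuser')");
            }
            try (PreparedStatement ps =
                     c.prepareStatement("select login from users where role = ?")) {
                ps.setString(1, "superuser");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("login"));
                    }
                }
            }
        }
    }
}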
You could embed a database into your app.
Here is a discussion on selecting the best embedded Java database.
The ones below are probably the top three (they are all free):
HSQLDB has a good reputation.
Java DB is included in the Java SE Development Kit and is the developer database for the Sun GlassFish Enterprise Server. It also has a good reputation.
Berkeley DB is a mature product, also with a high reputation.