Handling Aggregations in a Service or DAO

Handling Aggregations in a Service or DAO - java

The ever so popular discussion on designing proper DAOs always concludes with something along the lines of "DAOs should only perform simple CRUD operations".
So what's the best place to perform things like aggregations and such? And should DAOs return complex object graphs resembling your data source's schema?
Assume I have the following DAO interface:
public interface UserDao {
public User getByName(String name);
}
And here are the Objects it returns:
public class Transaction {
public int amount;
public Date transactionDate;
}
public class User {
public String name;
public Transaction[] transactions;
}
First of all, I consider the DAO to be returning a standard Value Object if all it does is CRUD operations.
So now I have modeled by DAO to return something based on a data store relationship. Is this correct? What if I have a more complex object graph?
Update: I guess what I am asking in this part is, should the return value of a DAO, be it VO, DTO, or whatever you want to call it, be modeled after the data store's representation of the data? Or should I, say introduce a new DAO to get a user's transactions and for each user pulled by the UserDAO, invoke a call to the TransactionDAO to get them?
Secondly, let's say I want to perform an aggregation for all of a user's transactions. Using this DAO, I can simply get a user, and in my Service loop though the transactions array and perform the aggregation myself. After all, it's perfectly reasonable to say that such an aggregation is a business rule that belong in the Service.
But what if a user's transactions number in the tens of thousands. That would have a negative impact on application performance. Would it be incorrect to introduce a new method on the DAO that does said aggregation?
Of course this might be making an assumption that the DAO is backed up by a database where I can write a simple SELECT SUM() query. And if the DAO implementation changes to say a flat file or something, I would need to do the aggregation in memory anyway.
So what's the best practice here?

I use the DAO as the translation layer: read the db objects, create the java side business objects and vice versa. Sometimes a couple of calls might be used to store or create a business object. For the provided example, I would make two calls: one for the user info, one for the list of the user's transactions. The cost is an extra database call. I'm not afraid to make an extra call if I'm using connection pooling and I'm not repeating calculations. Separate calls are simpler to use (unpacking an array of composite types from a jdbc call is not simple and typically requires the proprietary connection object) and provide resusable components. Let's say you wanted the user object for a login screen: you can use the same user dao and not have to pull in the transaction stuff.
If you didn't actually want the transaction details but were just interested in the aggregate, I would do the aggregate work on the database side and expose it via a view or a stored procedure. Relational databases are built for and excel at these kinds of set operations. You are unlikely to perform the operations better. Also, there is no point sending all the data over the wire if the result will do. So sure, add another dao for the aggregate if there are times you are only interested in that.
Is it safe to assume the dao maps to a relational db? If that is how you are starting, I would wager that the backing datastore will remain a relational db. Sometimes there is a lot of fuss and worry to keep it generic, and if you can, great. But it seems to me just changing the type of relational db in the back is further than most apps would go (let alone changing to a non-relational store like a flat file).

Related

should dao (or perhaps repository) take id's or entities as arguments

In our code base we make extensive use of DAOs. In essence a layer that exposes a low level read/write api and where each DAO maps to a table in the database.
My question is should the dao's update methods take entity id's or entity references as arguments if we have different kinds of updates on an entity.
For example, say we have customers and adressess. We could have
customer.address = newAddress;
customerDao.updateCustomerAddress(customer);
or we could have
customerDao.updateCustomerAddress(customer.getId(), newAddress);
Which approach would you say is better?
The latter is more convenient since if we have the entity we always have the id, so it will always work. The converse is not always the case though, but would have to be preceded with getting the entity before performing the update.

In DDD we have Aggregates and Repositories. Aggregates ensure that the business invariants hold and Repositories handle the persistence.
I recommend that Aggregates should be pure, with no dependencies to any infrastructure code; that is, Aggregates should not know anything about persistence.
Also, you should use the Ubiquitous language in your domain code. That being said, your code should look like this (in the application layer):
customer = customerRepository.loadById(customerId);
customer.changeAddress(address);
customerRepository.save(customer);

I assume your question is
Which approach of the two is better?
I would prefer the second approach. It states clearly what will be done. The update object will be freshly loaded and it is absolutely clear that only the address will be updated. The first approach leaves room for doubt. What happens if customer.name has a new value aswell? Will it also be update?

What can I use instead of an entity bean?

I would like to write a Java EE app for online learning. I'm thinking of this data representation:
#Entity
public class Course {
private String name;
private String[] instructors;
private String[] teachingAssistants;
private GradeBook gradeBook;
// plenty of methods
}
#Entity
public class GradeBook {
private GradeBookColumn[];
// some methods
}
#Entity
public abstract class GradeBookColumn {
private String name;
private GradeBookColumnType type;
// more methods
}
I would have quite a lot more than just that, but you get the point.
Now, in the EJB 3.2 spec, entity beans were removed and replaced with JPA entities. My question is how to cope with this tragic loss. There are three reasons why serialized JPA entities won't work for me:
Performance. I will need to push the whole entity and all of its data through the net. There is quite a lot of that data, and it could take an unacceptably long time to get it all through.
Security. If all the data in the entity is transferred over the net, then all of the data is accessible to the program that downloaded it. However, I want certain data to only be accessible if the user has sufficient permissions.
Write Access. Once the copy of the data has been downloaded, the client should be able to make changes to it. However, if the changes are made, they won't be persisted to the server. Of course, I could always send the entity back to the server for persisting, but I would have to send all the data through an even slower upstream connection.
So, how do I design this system to meet these requirements without entity beans?

I'm not sure that the loss of entity beans is really tragic, but that's a matter of opinion :)
You seem that have a rich client on the desktop that connects to a remote server. You have two options:
A. You exchange "detached" object graphs between the client and server. The client receives some data, modifies it, then sends it back. The server then "merges" the data it receives. There is one transaction on the server when you load the data, and one when you merge back. To ensure you don't have conflict, you can version the data.
B. You use an "extended persistence context". In that case, the client receives entites that are still "attached" to a session. Modification to the entities on the client side are cached, and will be synchronized when you call a method on the server.
So, regarding the three design issue you face, here is my take on it:
Performance. JPA and other modern ORM rely on laziness to avoid unnecessary data transfer: data is loaded on demand. You can choose which part of the graph can be loaded eagerly or lazily. With option A, you need to make sure that you load all the necessary data before you send them to the client; if the client attempts to access data that aren't loaded, it gets an exception since it's outside of a transaction. With option B, I guess the client can lazy load data anytime (it would be worth double checking that).
Security. The JPA entities should be business objects, not data object. They can encapsulate business methods that do the necessary checks and preserve the desired invariants. In other words: security is not handled at the data level but at the business logic level. This applies for both options A and B.
Write Access. With option A, you need to send back the whole graph and merge it. With option B, the framework should merge the changes that have been cached in a more optimized way.
Conclusions:
Extended persistence contexts have been designed for GUI applications with long units of work. They should in theory solve your problems. In practice, extended persistence contexts have their share of complexitiy though (e.g. needs to use stateful session beans).
The approach to detach and merge the graph is simpler, but raises the issues that you mention in terms of performance.
The third options is to go back to traditional data transfer object (DTO) to optimize performance. In that case the JPA entites stay exclusively on the server side. Instead of transferring JPA entites, you transfer only the subset of the data really needed into DTOs. The drawback is that DTOs will proliferate, and you will have boilerplate code to create DTOs from JPA entites, and update the JPA enties from DTOs.

java classes and database queries

Can someone please explain the best way to solve this problem.
Suppose I have three classes
1.Person
2.Venue
3.Vehicle
I have a DAO method that needs to return some or all of these attributes from each of the classes after doing a query.
How do I accomplish this ? It seems very wrong to make a class PersonVenueVehicle and return that as an object to get the instance field, values.
I was taught that the database entities must be reflected by classes, if this is case how is it implemented in such a situation

Try the Spring-ish solution. Besides your three classes, you can have 3 DAO classes, one for each. But you have a task to perform; I don't know what it is; I'm just going to guess.
Suppose you are running a taxi service; Persons schedule through your company taxis to pick them up at a Venue, and you send them a Vehicle. Call this combination a Trip, and now you want a class that manages Trips in the database. Create a class called TripService. This should use your PersonDao, your VenueDao, and your VehicleDao to create if necessary person and venue records in the database, and should do the calculations needed to choose which Vehicle to use. When it does, it should use a new TripDao to persist a new Trip object. But, as the organizer, it should create and vend the database connection to all the DAOs, and should do the commit or rollback itself.
If you're using Hibernate or JPA, your classes could be modified. But the principle is the same. Even if I have your motivation wrong, you can write a service that coordinates the three DAOs and vends the connection. It can, if it has to, use the connection itself to do a SELECT on the three tables JOINed together.
You lose much of the benefits of a database if the only statements you write are simple SELECTs and UPDATEs and INSERTs

controllers, entity classes or dao - what goes where?

With the introduction of Hibernate in my project, my code started getting really coupled, and boilerplate in many places (and it should be the other way round, right?)
I got pretty confused by a particular example. I've always considered DAO objects to be pretty generic in their nature (mostly encapsulating the basic CRUD oeprations as well as the backend storage implementation)
Unfortunately, as my entity classes started to get more complicated, I started offloading more and more logic to the DAO objects. I have a particular example:
my entity class User should have a relation called friends, which is essentially a collection of users. However, I have to map my class to a collection of UserFriendship objects instead, each of which contains a ref to the friend object, but also other specific friendship data (the date when the friendship occurred)
Now, it is easy to introduce a custom getter in the entity class, which will take the collection of UserFriendship objects and turn it into a collection of User objects instead. However, what if I need only a subset of my friends collection, say, like in paging. I cannot really do that in the entity object, because it doesn't have access to the session, right? This also applies to when I need to make a parametrized query on the relationship. The one that has the access to the session is the UserDAO. So I ended up with this
UserDAO
=> normal CRUD methods
=> getFriends(Integer offset, Integer limit);
=> a bunch of similar getters and setters responsible for managing the relationships within the User instance.
This is insane. But I cannot really do anything else. I am not aware if it is possible to declare computed properties within the entity classes, which could also be parametrized.
I could technically also wrap the DAO within the entity, and put the helper getters and setters back into the entity class, where they should be, but I am not sure whether if that is a good practice as well.
I know that the DAO should only be accessed by the controller object, and it should provide a more or less complete entity object or a set of entity objects.
I am deeply confused. More or less all of my DAO objects now couple logic that should be either in the Entity objects or in the controllers.
I am sorry if my question is a bit confusing. It is a bit hard to formulate it.

My general rules are:
in the entity classes, respect the law of Demeter: don't talk to strangers
the entity classes must not use the session
the controller/service classes must not use the session. They may navigate in the graph of entities and call DAO methods
DAO methods should be the ones using the session. Their work consists in getting, saving, merging entities and executing queries. If several queries or persistence-related actions should be executed for a single use-case, the controller/service should coordinate them, not the DAO.
This way, I can test the business logic relatively easily by mocking the DAOs, and I can test the DAOs relatively easily because they don't contain much logic. Most of the tests verify that the queries find what they're supposed to find, return them in the appropriate order, and initialize the associations that must be initialized (to avoid lazy loading exceptions in the presentation layer, where I'm using detached objects)

How can I resolve the conflict between loose coupling/dependency injection and a rich domain model?

Edit: This is not a conflict on the theoretical level but a conflict on an implementation level.
Another Edit:
The problem is not having domain models as data-only/DTOs versus richer, more complex object map where Order has OrderItems and some calculateTotal logic. The specific problem is when, for example, that Order needs to grab the latest wholesale prices of the OrderItem from some web service in China (for example). So you have some Spring Service running that allows calls to this PriceQuery service in China. Order has calculateTotal which iterates over every OrderItem, gets the latest price, and adds it to the total.
So how would you ensure that every Order has a reference to this PriceQuery service? How would you restore it upon de-serializations, loading from DBs, and fresh instantiations? This is my exact question.
The easy way would be to pass a reference to the calculateTotal method, but what if your Object uses this service internally throughout its lifetime? What if it's used in 10 methods? It gets messy to pass references around every time.
Another way would be to move calculateTotal out of the Order and into the OrderService, but that breaks OO design and we move towards the old "Transaction Script" way of things.
Original post:
Short version:
Rich domain objects require references to many components, but these objects get persisted or serialized, so any references they hold to outside components (Spring beans in this case: services, repositories, anything) are transient and get wiped out. They need to be re-injected when the object is de-serialized or loaded from the DB, but this is extremely ugly and I can't see an elegant way to do it.
Longer version:
For a while now I've practiced loose coupling and DI with the help of Spring. It's helped me a lot in keeping things manageable and testable. A while ago, however, I read Domain-Driven Design and some Martin Fowler. As a result, I've been trying to convert my domain models from simple DTOs (usually simple representations of a table row, just data no logic) into a more rich domain model.
As my domain grows and takes on new responsibilities, my domain objects are starting to require some of the beans (services, repositories, components) that I have in my Spring context. This has quickly become a nightmare and one of the most difficult parts of converting to a rich domain design.
Basically there are points where I am manually injecting a reference to the application context into my domain:
when object is loaded from Repository or other responsible Entity since the component references are transient and obviously don't get persisted
when object is created from Factory since a newly created object lacks the component references
when object is de-serialized in a Quartz job or some other place since the transient component references get wiped
First, it's ugly because I'm passing the object an application context reference and expecting it to pull out by name references to the components it needs. This isn't injection, it's direct pulling.
Second, it's ugly code because in all of those mentioned places I need logic for injecting an appContext
Third, it's error prone because I have to remember to inject in all those places for all those objects, which is harder than it sounds.
There has got to be a better way and I'm hoping you can shed some light on it.

I would venture to say that there are many shades of gray between having an "anemic domain model" and cramming all of your services into your domain objects. And quite often, at least in business domains and in my experience, an object might actually be nothing more than just the data; for example, whenever the operations that can be performed on that particular object depend on multitude of other objects and some localized context, say an address for example.
In my review of the domain-driven literature on the net, I have found a lot of vague ideas and writings, but I was not unable to find a proper, non-trivial example of where the boundaries between methods and operations should lie, and, what's more, how to implement that with current technology stack. So for the purpose of this answer, I will make up a small example to illustrate my points:
Consider the age-old example of Orders and OrderItems. An "anemic" domain model would look something like:
class Order {
Long orderId;
Date orderDate;
Long receivedById; // user which received the order
}
class OrderItem {
Long orderId; // order to which this item belongs
Long productId; // product id
BigDecimal amount;
BigDecimal price;
}
In my opinion, the point of the domain-driven design is to use classes to better model the relationships between entities. So, an non-anemic model would look something like:
class Order {
Long orderId;
Date orderDate;
User receivedBy;
Set<OrderItem> items;
}
class OrderItem {
Order order;
Product product;
BigDecimal amount;
BigDecimal price;
}
Supposedly, you would be using an ORM solution to do the mapping here. In this model, you would be able to write a method such as Order.calculateTotal(), that would sum up all the amount*price for each order item.
So, the model would be rich, in a sense that operations that make sense from a business perspective, like calculateTotal, would be placed in an Order domain object. But, at least in my view, domain-driven design does not mean that the Order should know about your persistence services. That should be done in a separate and independent layer. Persistence operations are not part of the business domain, they are the part of the implementation.
And even in this simple example, there are many pitfalls to consider. Should the entire Product be loaded with each OrderItem? If there is a huge number of order items, and you need a summary report for a huge number of orders, would you be using Java, loading objects in memory and invoking calculateTotal() on each order? Or is an SQL query a much better solution, from every aspect. That is why a decent ORM solution like Hibernate, offers mechanisms for solving precisely these kind of practical problems: lazy-loading with proxies for the former and HQL for the latter. What good would be a theoretically sound model be, if report generation takes ages?
Of course, the entire issue is quite complex, much more that I'm able to write or consider in one sitting. And I'm not speaking from a position of authority, but simple, everyday practice in deploying business apps. Hopefully, you'll get something out of this answer. Feel free to provide some additional details and examples of what you're dealing with...
Edit: Regarding the PriceQuery service, and the example of sending an email after the total has been calculated, I would make a distinction between:
the fact that an email should be sent after price calculation
what part of an order should be sent? (this could also include, say, email templates)
the actual method of sending an email
Furthermore, one has to wonder, is sending of an email an inherent ability of an Order, or yet another thing that can be done with it, like persisting it, serialization to different formats (XML, CSV, Excel) etc.
What I would do, and what I consider a good OOP approach is the following. Define an interface encapsulating operations of preparing and sending an email:
interface EmailSender {
public void setSubject(String subject);
public void addRecipient(String address, RecipientType type);
public void setMessageBody(String body);
public void send();
}
Now, inside Order class, define an operation by which an order "knows" how to send itself as an email, using an email sender:
class Order {
...
public void sendTotalEmail(EmailSender sender) {
sender.setSubject("Order " + this.orderId);
sender.addRecipient(receivedBy.getEmailAddress(), RecipientType.TO);
sender.addRecipient(receivedBy.getSupervisor().getEmailAddress(), RecipientType.BCC);
sender.setMessageBody("Order total is: " + calculateTotal());
sender.send();
}
Finally, you should have a facade towards your application operations, a point where the actual response to user action happens. In my opinion, this is where you should obtain (by Spring DI) the actual implementations of services. This can, for example, be the Spring MVC Controller class:
public class OrderEmailController extends BaseFormController {
// injected by Spring
private OrderManager orderManager; // persistence
private EmailSender emailSender; // actual sending of email
public ModelAndView processFormSubmission(HttpServletRequest request,
HttpServletResponse response, ...) {
String id = request.getParameter("id");
Order order = orderManager.getOrder(id);
order.sendTotalEmail(emailSender);
return new ModelAndView(...);
}
Here's what you get with this approach:
domain objects don't contain services, they use them
domain objects are decoupled from actual service implementation (e.g. SMTP, sending in separate thread etc.), by the nature of the interface mechanism
services interfaces are generic, reusable, but don't know about any actual domain objects. For example, if order gets an extra field, you need change only the Order class.
you can mock services easily, and test domain objects easily
you can test actual services implementations easily
I don't know if this is by standards of certain gurus, but it a down-to-earth approach that works reasonably well in practice.

Regardinig
What if your Order needs to send out
an e-mail every time the total is
calculated?
I would employ events.
If it has some meaning for you when an order computes its total, let it raise an event as eventDispatcher.raiseEvent(new ComputedTotalEvent(this)).
Then you listen for this type of events, and callback your order as said before to let it format an email template, and you send it.
Your domain objects remains lean, with no knowledge about this your requirement.
In short, split your problem into 2 requirements:
- I want to know when an order computes its total;
- I want to send an email when an order has a (new and different) total;

I've found the answer, at least for those using Spring:
6.8.1. Using AspectJ to dependency inject domain objects with Spring

The simplest approach that I can think is to add some logic into your data access layer that will inject a domain object with its dependencies before returning it to a higher layer (usually called the service layer). You could annotate each class's properties to indicate what needs to get wired up. If you're not on Java 5+, you could implement an interface for each component that needs to be injected, or even declare this all in XML and feed that data to the context that will do the wiring. If you wanted to get fancy, you could pull this out into an aspect and apply it globally across your data access layer so all methods that pull out domain objects will wire up them up just after they are returned.

Perhaps what you want is a kind on reference object, that would serialize as a global reference (an URI for instance) and that would be able to resurrect as a proxy when de-serialized elsewhere.

The Identity Map pattern may help with your scenario. Check the article Patterns In Practice written by Jeremy Miller where he discuss about this pattern.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.