Difference between Repository and DAO design patterns [duplicate]

Difference between Repository and DAO design patterns [duplicate] - java

What is the difference between Data Access Objects (DAO) and Repository patterns? I am developing an application using Enterprise Java Beans (EJB3), Hibernate ORM as infrastructure, and Domain-Driven Design (DDD) and Test-Driven Development (TDD) as design techniques.

DAO is an abstraction of data persistence.
Repository is an abstraction of a collection of objects.
DAO would be considered closer to the database, often table-centric.
Repository would be considered closer to the Domain, dealing only in Aggregate Roots.
Repository could be implemented using DAO's, but you wouldn't do the opposite.
Also, a Repository is generally a narrower interface. It should be simply a collection of objects, with a Get(id), Find(ISpecification), Add(Entity).
A method like Update is appropriate on a DAO, but not a Repository - when using a Repository, changes to entities would usually be tracked by separate UnitOfWork.
It does seem common to see implementations called a Repository that is really more of a DAO, and hence I think there is some confusion about the difference between them.

OK, think I can explain better what I've put in comments :).
So, basically, you can see both those as the same, though DAO is a more flexible pattern than Repository. If you want to use both, you would use the Repository in your DAO-s. I'll explain each of them below:
REPOSITORY:
It's a repository of a specific type of objects - it allows you to search for a specific type of objects as well as store them. Usually it will ONLY handle one type of objects. E.g. AppleRepository would allow you to do AppleRepository.findAll(criteria) or AppleRepository.save(juicyApple).
Note that the Repository is using Domain Model terms (not DB terms - nothing related to how data is persisted anywhere).
A repository will most likely store all data in the same table, whereas the pattern doesn't require that. The fact that it only handles one type of data though, makes it logically connected to one main table (if used for DB persistence).
DAO - data access object (in other words - object used to access data)
A DAO is a class that locates data for you (it is mostly a finder, but it's commonly used to also store the data). The pattern doesn't restrict you to store data of the same type, thus you can easily have a DAO that locates/stores related objects.
E.g. you can easily have UserDao that exposes methods like
Collection<Permission> findPermissionsForUser(String userId)
User findUser(String userId)
Collection<User> findUsersForPermission(Permission permission)
All those are related to User (and security) and can be specified under then same DAO. This is not the case for Repository.
Finally
Note that both patterns really mean the same (they store data and they abstract the access to it and they are both expressed closer to the domain model and hardly contain any DB reference), but the way they are used can be slightly different, DAO being a bit more flexible/generic, while Repository is a bit more specific and restrictive to a type only.

DAO and Repository pattern are ways of implementing Data Access Layer (DAL). So, let's start with DAL, first.
Object-oriented applications that access a database, must have some logic to handle database access. In order to keep the code clean and modular, it is recommended that database access logic should be isolated into a separate module. In layered architecture, this module is DAL.
So far, we haven't talked about any particular implementation: only a general principle that putting database access logic in a separate module.
Now, how we can implement this principle? Well, one know way of implementing this, in particular with frameworks like Hibernate, is the DAO pattern.
DAO pattern is a way of generating DAL, where typically, each domain entity has its own DAO. For example, User and UserDao, Appointment and AppointmentDao, etc. An example of DAO with Hibernate: http://gochev.blogspot.ca/2009/08/hibernate-generic-dao.html.
Then what is Repository pattern? Like DAO, Repository pattern is also a way achieving DAL. The main point in Repository pattern is that, from the client/user perspective, it should look or behave as a collection. What is meant by behaving like a collection is not that it has to be instantiated like Collection collection = new SomeCollection(). Instead, it means that it should support operations such as add, remove, contains, etc. This is the essence of Repository pattern.
In practice, for example in the case of using Hibernate, Repository pattern is realized with DAO. That is an instance of DAL can be both at the same an instance of DAO pattern and Repository pattern.
Repository pattern is not necessarily something that one builds on top of DAO (as some may suggest). If DAOs are designed with an interface that supports the above-mentioned operations, then it is an instance of Repository pattern. Think about it, If DAOs already provide a collection-like set of operations, then what is the need for an extra layer on top of it?

Frankly, this looks like a semantic distinction, not a technical distinction. The phrase Data Access Object doesn't refer to a "database" at all. And, although you could design it to be database-centric, I think most people would consider doing so a design flaw.
The purpose of the DAO is to hide the implementation details of the data access mechanism. How is the Repository pattern different? As far as I can tell, it isn't. Saying a Repository is different to a DAO because you're dealing with/return a collection of objects can't be right; DAOs can also return collections of objects.
Everything I've read about the repository pattern seems rely on this distinction: bad DAO design vs good DAO design (aka repository design pattern).

Repository is more abstract domain oriented term that is part of Domain Driven Design, it is part of your domain design and a common language, DAO is a technical abstraction for data access technology, repository is concerns only with managing existing data and factories for creation of data.
check these links:
http://warren.mayocchi.com/2006/07/27/repository-or-dao/
http://fabiomaulo.blogspot.com/2009/09/repository-or-dao-repository.html

A DAO allows for a simpler way to get data from storage, hiding the ugly queries.
Repository deals with data too and hides queries and all that but, a repository deals with business/domain objects.
A repository will use a DAO to get the data from the storage and uses that data to restore a business object.
For example, A DAO can contain some methods like that -
public abstract class MangoDAO{
abstract List<Mango>> getAllMangoes();
abstract Mango getMangoByID(long mangoID);
}
And a Repository can contain some method like that -
public abstract class MangoRepository{
MangoDao mangoDao = new MangDao;
Mango getExportQualityMango(){
for(Mango mango:mangoDao.getAllMangoes()){
/*Here some business logics are being applied.*/
if(mango.isSkinFresh()&&mangoIsLarge(){
mango.setDetails("It is an export quality mango");
return mango;
}
}
}
}
This tutorial helped me to get the main concept easily.

The key difference is that a repository handles the access to the aggregate roots in a an aggregate, while DAO handles the access to entities. Therefore, it's common that a repository delegates the actual persistence of the aggregate roots to a DAO. Additionally, as the aggregate root must handle the access of the other entities, then it may need to delegate this access to other DAOs.

Repository are nothing but well-designed DAO.
ORM are table centric but not DAO.
There's no need to use several DAO in repository since DAO itself can do exactly the same with ORM repositories/entities or any DAL provider, no matter where and how a car is persisted 1 table, 2 tables, n tables, half a table, a web service, a table and a web service etc.
Services uses several DAO/repositories.
My own DAO, let's say CarDao only deal with Car DTO,I mean, only take Car DTO in input and only return car DTO or car DTO collections in output.
So just like Repository, DAO actually is an IoC, for the business logic, allowing persitence interfaces not be be intimidated by persitence strategies or legacies.
DAO both encapsulates the persistence strategy and does provide the domaine-related persitence interface.
Repository is just an another word for those who had not understood what a well-defined DAO actualy was.

DAO provides abstraction on database/data files or any other persistence mechanism so that, persistence layer could be manipulated without knowing its implementation details.
Whereas in Repository classes, multiple DAO classes can be used inside a single Repository method to get an operation done from "app perspective". So, instead of using multiple DAO at Domain layer, use repository to get it done.
Repository is a layer which may contain some application logic like: If data is available in in-memory cache then fetch it from cache otherwise, fetch data from network and store it in in-memory cache for next time retrieval.

Try to find out if DAO or the Repository pattern is most applicable to the following situation :
Imagine you would like to provide a uniform data access API for a persistent mechanism to various types of data sources such as RDBMS, LDAP, OODB, XML repositories and flat files.
Also refer to the following links as well, if interested:
http://www.codeinsanity.com/2008/08/repository-pattern.html
http://blog.fedecarg.com/2009/03/15/domain-driven-design-the-repository/
http://devlicio.us/blogs/casey/archive/2009/02/20/ddd-the-repository-pattern.aspx
http://en.wikipedia.org/wiki/Domain-driven_design
http://msdn.microsoft.com/en-us/magazine/dd419654.aspx

Per Spring documentation there is no clear difference:
The #Repository annotation is a marker for any class that fulfills the
role or stereotype of a repository (also known as Data Access Object
or DAO).

DAOs might not always explicitly be related to Only DataBase,
It can be just an Interface to access Data, Data in this context might be accessed from DB/Cache or even REST (not so common these days, Since we can easily separate them in their respective Rest/IPC Clients),
Repo here in this approach be implemented by any of the ORM Solutions, If the underlying Cache/Repo Changes it'll not be propagated/impacts Service/Business Layers.
DAOs can accept/return Domain Types. Consider for a Student Domain, the associated DAO class will be StudentDao
StudentDao {
StudentRepository,
StudentCache,
Optional<Student> getStudent(Id){
// Use StudentRepository/StudentCache to Talk to DD & Cache
// Cache Type can be the same as Domain Type, DB Type(Entities) should be a Same/Different Type.
}
Student updateStudent(Student){
// Use StudentRepository/StudentCache to Talk to DD & Cache
// Cache Type can be the same as Domain Type, DB Type(Entities) should be a Same/Different Type.
}
}
DAOs can accept/return SubDomain Types. Consider a Student Domain, that has Sub Domain, say Attendance / Subject that will have a DAO class StudentDao,
StudentDao {
StudentRepository, SubjectRepository, AttendanceRepository
StudentCache, SubjectCache, AttendanceCache
Set<Subject> getStudentSubject(Id){
// Use SubjectRepository/SubjectCache to Talk to DD & Cache
}
Student addNewSubjectToStudent(ID, Subject){
// Use SubjectRepository/SubjectCache to Talk to DD & Cache
}
}

If we consider the original definitions of both design patterns DAO and Repository appear very similar. The main difference is the dictionary and the source from which they came (Oracle vs. Fowler).
Quote:
DAO - "separates a data resource's client interface from its data access mechanisms" and "The DAO implements the access mechanism required to work with the data source. The data source could be a persistent store like an RDBMS, an external service like a B2B exchange, a repository like an LDAP database, or a business service accessed via CORBA Internet Inter-ORB Protocol (IIOP) or low-level sockets."
Repository - "Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects." and "Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers."
Based on these citations, both design patterns mediate communication between the domain layer and the data layer. Moreover, Repository is associated with ORM and on the contrary DAO is a more generic interface for accessing data from anywhere.

in a very simple sentence: The significant difference being
that Repositories represent collections, whilst DAOs are closer to the database, often being far more
table-centric.

Related

How to write DAO for JPA having One entity with One to many relationship

I am writing DAO for eg I have Employee and Address entity
Where One employee can have multiple address.
Now my EmployeeDao have CRUD operation, I have a getEmployeeById method which takes String as input and returns EmployeeDto ( note Dto ), now in some case I need only EmployeeDto, and in some I need EmployeeDto with AddressDto as well.
My question is should I write AddreddEntity to AddressDto conversion in EmployeeDao's getEmployeeById method, is this the right way ? is there better approach to solve such kind of real world problem?
I am using JPA

I would say there should be service layer which will process your JPA entity and and provide you the Dto.
A DAO should provide access to a single related source of data and its methods should reflect the database somewhat closely.
A Service can provide a higher level interface to process your business objects, it can interact with different databases (and different DAO's). It may have certain business logic that converts several data objects into a single, robust, business object.

DDD When should I create a domain object and a persistence object instead of using a persistence object as a domain object?

As I work with my understanding of Domain Driven Design I find I have a rule that seems to work, though I would like to see if it is overkill and also would like to see other perspectives of the same situation.
My question is this: "When should the domain model and persistence model be contained in separate objects?"
My language of choice is Java at the moment and I am using Spring Data's repository model.
I see three main answers to my question.
Always use separate domain objects from persistence objects.
Use separate domain objects only when putting domain methods (behaviors) on persistence objects is not practical.
Use persistence objects as domain objects in all cases.
In order to ask questions about DDD I find that I have to use an example bounded context since I don't yet know enough about DDD to ask in a more abstract way.
Here is my illustrative bounded context: say I have a law codification system with the following business rules:
Each law on the books must be classified.
Each law has an identifier with two parts, a codification number prefix and a codification coassign suffix. (Example: 100-0100, 599-2030).
There are multiple legal jurisdictions that are using the law codification system and they should be able to make their own coassigns but the codification prefixes are global and must be the same across all jurisdictions to facilitate general comparability.
the codification number prefixes are grouped into broad codification categories. Codification categories have a number range, such as 100-199, 200-299, 700-799, etc.
To express this bounded context as a persistence model I have the following:
table: codification
fields: chart_code, prefix, coassign, codification_category
table: codification_chart
fields: chart_code, jurisdiction_description
table: codification_category
fields: category, low_category_number, high_category_number, description
table: global_codification
fields: prefix, coassign, codification_category
I know, I should be starting from the domain model first. I have a persistence model and a domain model
In my domain model I have three domain objects
public Codification {
private String prefix, coassign;
codificationCategory codificationCaegory; // an enum type
public Codification(...) { // set private vars }
// getters for private variables
}
public CodificationChart {
private List<Codification> chartCodifications = new ArrayList<>();
private String chartCode;
// public constructor to initialize private variables
// getters for private variables
public Codification addCodificationToChart(Codification){...}
public void removeCodificationFromChart(Codification){...}
public boolean checkCodificationInChart(Codification){...}
}
public enum CodificationCategory {
CIVIL, CRIMINAL, PROPERTY, CORPORATE, FAMILY, CONSUMER, ETHICS, BANKRUPTCY;
}
ORM Objects:
JPA Mappings of the tables mentioned earlier with the "Entity" suffix added to their table names.
They are omitted for brevity.
Each one contains getters and setters like JPA Pojos do.
If someone asks for the Persistence objects code I will post it.
The only point at which my domain objects know about the persistence model is in my factory object CodificationChartFactory, which has the repository interfaces I am using to interact with the ORM objects mentioned earlier. This factory is the only part of the domain that uses the persistence repositories, thus the only part that interacts with the persistence layer.
Is creating a separate domain model here wasteful effort?
I can see how it is possible for me to put my CodificationChart behaviors on a Persistence object. It just somehow feels wrong to put those behaviors on a persistence object who's only job is to retrieve a record from the database.
I definitely stand to be corrected.

Both approaches are correct and are a matter of taste from a design point of view. Some people don't want their domain object to have absolutely anything to do with persistence and do create an extra layer of Entity objects... some people don't think this is a major problem and are happy to go ahead and use the domain objects as the persistence objects.
Personally (and subjectively), I think that using JPA and have an extra layer of Entity objects is the wrong approach. The aim of ORMs like Hibernate is to be a bridge between Object and Relational models (I know it's in the name :). I think a way better approach, in the case one wants to keep things separated, is to use something like mybatis or plain SQL, but definitely not JPA... otherwise it's just adding complexity for the sake of complexity (JPA is not the easiest framework to learn)
I'm happy to live with the mix and annotate my domain objects. As I know it makes the persistence easier to manage... but at the same time, I feel very comfortable with Hibernate/JPA and been using it for 10 years :).
I had a very similar question 3 years ago, which I posted on programmers site - Do ORMs enable the creation of rich domain models?

Naming convention when using hibernate

Me and my team are building java EE app as a school project and we've decided to use hibernate. We also want to make the whole project as nice and clean as possible, so we're trying to follow recommended conventions. Nevertheless I wasn't able to find out, what are the conventions for hibernate files.
I.E. I've got a folder /cz/fit/cvut/nameofmyproject/ and there I've got packages controllers, models, utils. In controllers I've got Spring controllers, in models I want to have models for my entities and in utils I've got SessionFactory for hibernate. And now my question:
How shall I name classes in model package? Should it be MyEntityNameDTO, or did I misunderstand the meaning of the DTO and should I just name them MyEntityNameModel? And what should be the proper name for the folder for my DAO classes? Will this simple division be enough for a middle-size project (max ~20 classes/folder) or would it be too confusing? Thanks for any tips from praxis :)

DTO stands for Data Transfer Object. A DTO is a class which is more a data structure than a real class, usually, and which is created to transfer information from one layer to another, often across the network. It's not a model entity.
A DTO is often used
when serializing real model objects is not paractical (because the structure doesn't fit, or because the receiver doesn't have access to Hibernate classes, or because lazy-loaded entities are a problem)
when you want to transfer information that is an aggregation, or a complex view, over your model objects (like data of a statistical report for example)
So naming your entities DTO is not a good idea. DTOs and entities are different things. The Model suffix is also cumbersome. Entities are usually named after what they represent: Customer, Company, Player, Order, etc.
Segregating classes based on their technical role is an often used solution. But it tends not to scale when the application grows. I usually have a first level of segregation based on a functional aspect (like customer management, order management, security, etc.), and then a second level based on technical aspects (service, dao, model, etc.)

UserDAO - interface
UserDAOImpl - implements UserDAO
That is generally what I use. Sometimes the Default prefix like DefaultUserDAO might make more sense if you're creating an interface that you expect others to implement but you're providing the reference implementation.
Most of the time I feel those two can be used interchangeably but in some situations one provides a little more clarity than the other.
There are two conventions that I've seen:
FooDao for the interface and FooDaoImpl for the implementation
IFooDao for the interface and FooDao for the implementation
The former has its roots in CORBA; the latter is a Microsoft COM/.NET convention. (Thanks to Pascal for the correction.)
Hibernate provides the Naming Strategy interface to be implemented by the implementation.
I am listing here few methods.
String classToTableName(String className) – should return the table name for an entity class.
String columnName(String columnName) – handle to alter the column name specified in the mapping document.
String tableName(String tableName) – handle to alter the column name specified in the mapping document.
String propertyToColumnName(String propertyName) – handle to map property name to column name.

How do you keep clean layers separation with Hibernate/ORM?

How is it possible to keep clean layers with Hibernate/ORM (or other ORMs...)?
What I mean by clean layer separation is for exemple to keep all of the Hibernate stuff in the DAO layer.
For example, when creating a big CSV export stream, we should often do some Hibernate operations like evict to avoid OutOfMemory... The filling of the outputstream belong to the view, but the evict belongs to the DAO.
What I mean is that we are not supposed to put evict operations in the frontend / service, and neither we are supposed to put business logic in the DAO... Thus what can we do in such situations?
There are many cases where you have to do some stuff like evict, flush, clear, refresh, particularly when you play a bit with transactions, large data or things like that...
So how do you do to keep clear layers separation with an ORM tool like Hibernate?
Edit: something I don't like either at work is that we have a custom abstract DAO that permits a service to give an Hibernate criterion as an argument. This is practical, but for me in theory a service that calls this DAO shouldn't be aware of a criterion. I mean, we shouldn't have in any way to import Hibernate stuff into the business / view logic.
Is there an answer, simple or otherwise?

If by "clean" you mean that upper layers don't know about implementations of the lower layers, you can usually apply the
Tell, don't ask principle. For your CSV streaming example, it would be something like, say:
// This is a "global" API (meaning it is visible to all layers). This is ok as
// it is a specification and not an implementation.
public interface FooWriter {
void write(Foo foo);
}
// DAO layer
public class FooDaoImpl {
...
public void streamBigQueryTo(FooWriter fooWriter, ...) {
...
for (Foo foo: executeQueryThatReturnsLotsOfFoos(...)) {
fooWriter.write(foo);
evict(foo);
}
}
...
}
// UI layer
public class FooUI {
...
public void dumpCsv(...) {
...
fooBusiness.streamBigQueryTo(new CsvFooWriter(request.getOutputStream()), ...);
...
}
}
// Business layer
public class FooBusinessImpl {
...
public void streamBigQueryTo(FooWriter fooWriter, ...) {
...
if (user.canQueryFoos()) {
beginTransaction();
fooDao.streamBigQueryTo(fooWriter, ...);
auditAccess(...);
endTransaction();
}
...
}
}
In this way you can deal with your specific ORM with freedom. The downside of this "callback" approach: if your layers are on different JVMs then it might not be very workable (in the example you would need to be able to serialize CsvFooWriter).
About generic DAOs: I have never felt the need, most object access patterns I have found are different enough to make an specific implementation desirable. But certainly doing layer separation and forcing the business layer to create Hibernate criteria are contradictory paths. I would specify a different query method in the DAO layer for each different query, and then I would let the DAO implementation get the results in whatever way it might choose (criteria, query language, raw SQL, ...). So instead of:
public class FooDaoImpl extends AbstractDao<Foo> {
...
public Collection<Foo> getByCriteria(Criteria criteria) {
...
}
}
public class FooBusinessImpl {
...
public void doSomethingWithFoosBetween(Date from, Date to) {
...
Criteria criteria = ...;
// Build your criteria to get only foos between from and to
Collection<Foo> foos = fooDaoImpl.getByCriteria(criteria);
...
}
public void doSomethingWithActiveFoos() {
...
Criteria criteria = ...;
// Build your criteria to filter out passive foos
Collection<Foo> foos = fooDaoImpl.getByCriteria(criteria);
...
}
...
}
I would do:
public class FooDaoImpl {
...
public Collection<Foo> getFoosBetween(Date from ,Date to) {
// build and execute query according to from and to
}
public Collection<Foo> getActiveFoos() {
// build and execute query to get active foos
}
}
public class FooBusinessImpl {
...
public void doSomethingWithFoosBetween(Date from, Date to) {
...
Collection<Foo> foos = fooDaoImpl.getFoosBetween(from, to);
...
}
public void doSomethingWithActiveFoos() {
...
Collection<Foo> foos = fooDaoImpl.getActiveFoos();
...
}
...
}
Though someone could think that I'm pushing some business logic down to the DAO layer, it seems a better approach to me: changing the ORM implementation to an alternative one would be easier this way. Imagine, for example that for performance reasons you need to read Foos using raw JDBC to access some vendor-specific extension: with the generic DAO approach you would need to change both the business and DAO layers. With this approach you would just reimplement the DAO layer.

Well, you can always tell your DAO layer to do what it needs to do when you want to. Having a method like cleanUpDatasourceCache in your DAO layer, or something similar (or even a set of these methods for different objects), is not bad practice to me.
And your service layer is then able to call that method without any assumption on what is done by the DAO under the hood. A specific implementation which uses direct JDBC calls would do nothing in that method.

Usually a DAO layer to wrap the data access logic is necessary. Other times is just the EntityManager what you want to use for CRUD operations, for those cases, I wouldn't use a DAO as it would add unnecessary complexity to the code.
How should EntityManager be used in a nicely decoupled service layer and data access layer?

If you don't want to tie your code to Hibernate you can use Hibernate through JPA instead and not bother too much about abstracting everything within your DAOs. You are less likely to switch from JPA to something else than replacing Hibernate.

my 2 cents: i think the layer separation pattern is great as a starting point for most cases, but there is a point where we have to analyze each specific application case by case and design a more flexible solution. what i mean is, ask yourself for example:
is your DAO expected to be reused in another context other than
exporting csv data?
does it make sense to have another implementation of the same DAO
interface without hibernate ?
if both answers were no, maybe a little bit of coupling between persistence and data presentation is ok. i like the callback solution proposed above.
IMHO sometimes strict implementation of a pattern has a higher cost in readability, mantainability, etc. which are the very issues we were trying to fix by adopting a pattern in the first place

you can achieve layer separation by implementing DAO pattern and and doing all hibernate/JDBC/JPA related stuff in Dao itself
for eg:
you can specify a Generic Dao interface as
public interface GenericDao <T, PK extends Serializable> {
/** Persist the newInstance object into database */
PK create(T newInstance);
/** Retrieve an object that was previously persisted to the database using
* the indicated id as primary key
*/
T read(PK id);
/** Save changes made to a persistent object. */
void update(T transientObject);
/** Remove an object from persistent storage in the database */
void delete(T persistentObject);
}
and its implementaion as
public class GenericDaoHibernateImpl <T, PK extends Serializable>
implements GenericDao<T, PK>, FinderExecutor {
private Class<T> type;
public GenericDaoHibernateImpl(Class<T> type) {
this.type = type;
}
public PK create(T o) {
return (PK) getSession().save(o);
}
public T read(PK id) {
return (T) getSession().get(type, id);
}
public void update(T o) {
getSession().update(o);
}
public void delete(T o) {
getSession().delete(o);
}
}
so whenever service classes calls any method on any Dao without any assumption of the internal implementation of the method
have a look at the GenericDao link

Hibernate (either as a SessionManager or a JPA EntityManager) is the DAO. The Repository pattern is, as far as I have seen, the best starting place. There is a great image over at the DDD Sample Website which I think speaks volumes about how you keep things things separate.
My application layer has interfaces that are explicit business actions or values. The business rules are in the domain model and things like Hibernate live in the infrastructure. Services are defined at the domain layer as interfaces, and implemented in the infrastructure in my case. This means that for a given Foo domain object (an aggregate root in the DDD terminology) I usually get the Foo from a FooService and the FooService talks to a FooRepository which allows one to find a Foo based on some criteria. That criteria is expressed via method parameters (possibly complex object types) which at the implementation side, for example in a HibernateFooRepository, would be translated in to HQL or Hibernate criterion.
If you need batch processing, it should exist at the application level and use domain services to facilitate this. StartBatchTransaction/EndBatchTransaction. Hibernate may listen to start/end events in order to coordinate purging, loading, whatever.
In the specific case of serializing domain entities, though, I see nothing wrong with taking a set of criteria and iterating over them one at a time (from root entities).
I find that often, in the pursuit of separation, we often try to make things completely general. They are not one in the same - your application has to do something, and that something can and should be expressed rather explicitly.
If you can substitute an InMemoryFooRepository where a HibernateFooRepository was previously being used, you're on the right path. The natural flow through unit and integration testing your objects encourages this when you adhere or at least try to respect the layering outlined in the image I linked above.

You got some good answers here, I would like to add my thoughts on this (by the way, this is something to take care of in our code as well) I would also like to focus on the issue of having Hibernate annotations/JPA annotations on entities that you might need to use outside of your DAL (i.e - at business logic, or even send to your client side) -
A. If you use the GenericDAO pattern for a given entity, you may find your entity being annotated with Hibernate (or maybe JPA annotation) such as #Table, #ManyToOne and so on - this means that you client code may contain Hibernate/JPA annotations and you would require an appropriate jar to get it compiled, or have some other support at your client code this is for example if you use GWT as your client (which can have support for JPA annotations in order to get entities compiled), and share the entities between the server and the client code, or if you write a Java client that performs a bean lookup using InitialContext against a Java application server (in this case you will need a JAR
B. Another approach that you can have is work with Hibernate/JPA annotated code at server side, and expose Web Services (let's say RESTFul web service or SOAP) - this way, the client works with an "interface" that does not expose knowledge on Hibernate/JPA (for example - a WSDL in case of SOAP defines the contract between the client of the service and the service itself). By breaking the architecture to service oriented one, you get all kinds of benefits such as loose coupling, ease of replacement of pieces of code, and you can concentrate all the DAL logic in one service that serves the rest of your services, and later own replace the DAL if needed by another service.
C. You can use an "object to object" mapping framework such as dozer to map objects of classes with Hibernate/JPA annotations to what I call "true" POJOs - i.e - java beans with no annotations whatsoever on them.
D. Finally regarding annotations - why use annotations at all? Hibernate uses hbm xml files an alternative for doing the "ORM magic" - this way your classes can remain without annotations.
E. One last point - I would like to suggest you look at the stuff we did at Ovirt - you can dowload the code by git clone our repo. You will find there under engine/backend/manager/modules/bll - a maven project holding our bll logic, and under engine/backend/manager/moduled/dal - our DAL layer (although currently implemented with Spring-JDBC, and some hibernate experiments, you will get some good ideas on how to use it in your code. I would like to add that if you go for a similar solution, I suggest that you inject the DAOs in your code, and not hold them in a Singletone like we did with getXXXDao methods (this is legacy code we should strive to remove and move to injections).

I would recommend you let the database handle the export to CSV operation rather than building it yourself in Java, it isn't as efficient. ORM shouldn't really be used for those large scale batch operations, because ORM should only be used to manipulate transactional data.
Large scale Java batch operations should really be done by JDBC directly with transactional support turned off.
However, if you do this regularly, I recommend setting up a reporting database which is a delayed replica of the database that is not used by the application and utilizes database specific replication tools that may come with your database.
Your solution architect should be able to work with the other groups to help set this up for you.
If you really have to do it in the application tier, then using raw JDBC calls may be the better option. With raw JDBC you can perform a query to assemble the data that you require on the database side and fetch the data one row at a time then write to your output stream.
To answer your layers question. Though I don't like using the word layers because it usually implies one thing on top of another. I would rather use the word "components" and I have the following component groups.
application
domain - just annotated JPA classes, no persistence logic, usually a plain JAR file, but I recommend just plop it as a package in the EJB rather than having to deal with class path issues
contracts - WSDL and XSD files that define an interface between different components be it web services or just UI.
transaction scripts - Stateless EJBs that would have a transaction and persistence units injected into them and do the manipulation and persistence of the domain objects. These may implement the interfaces generated by the contracts.
UI - a separate WAR project with EJBs injected into them.
database
O/R diagram - this is the contract that is agreed upon by application and data team to ensure THE MINIMUM that the database will provide. It does not have to show everything.
DDLs - this is the database side implementation of the O/R diagram which will contain everything, but generally no one should care because it implementation details.
batch - batch operations such as export or replicate
reporting - provides queries to get business value reports from the system.
legacy
messaging contracts - these are contracts used by messaging systems such as JMS or WS-Notifications or standard web services.
their implementation
transformation scripts - used to transform one contract to another.

It seems to me we need to take another look at the layers.
(I hope someone corrects me if I get this wrong.)
Front End/UI
Business
Service/DAO
So for the case of Generating a Report, THe layers break down like so.
Front End/UI
will have a UI with a button "Get Some Report"
the button will then call the Business layer that knows what the report is about.
The data returned by the report generator is given any final formatting before being returned to the user.
Business
MyReportGenerator.GenerateReportData() or similar will be called
Service/DAO
inside of the report generator DAOs will be used. DAOLocator.GetDAO(Entity.class); or similar factory type methods would be used to get the DAOs. the returned DAOs will extend a Common DAO interface

Well, to get a clean separation of concern or you can say clean layer separation you can add Service layer to your application, which lies between you FrontEnd and DaoLayer.
You can put your business logic in Service layer and database related things in Dao layer using Hibernate.
So if you need to change something in your business logic, you can edit your service layer without changing the DAO and if you want to change the Dao layer, you can do without changing actual business logic i.e. Service Layer.

Is a DAO Only Meant to Access Databases?

I have been brushing up on my design patterns and came across a thought that I could not find a good answer for anywhere. So maybe someone with more experience can help me out.
Is the DAO pattern only meant to be used to access data in a database?
Most the answers I found imply yes; in fact most that talk or write on the DAO pattern tend to automatically assume that you are working with some kind of database.
I disagree though. I could have a DAO like follows:
public interface CountryData {
public List<Country> getByCriteria(Criteria criteria);
}
public final class SQLCountryData implements CountryData {
public List<Country> getByCriteria(Criteria criteria) {
// Get From SQL Database.
}
}
public final class GraphCountryData implements CountryData {
public List<Country> getByCriteria(Criteria criteria) {
// Get From an Injected In-Memory Graph Data Structure.
}
}
Here I have a DAO interface and 2 implementations, one that works with an SQL database and one that works with say an in-memory graph data structure. Is this correct? Or is the graph implementation meant to be created in some other kind of layer?
And if it is correct, what is the best way to abstract implementation specific details that are required by each DAO implementation?
For example, take the Criteria Class I reference above. Suppose it is like this:
public final class Criteria {
private String countryName;
public String getCountryName() {
return this.countryName;
}
public void setCountryName(String countryName) {
this.countryName = countryName;
}
}
For the SQLCountryData, it needs to somehow map the countryName property to an SQL identifier so that it can generate the proper SQL. For the GraphCountryData, perhaps some sort of Predicate Object against the countryName property needs to be created to filter out vertices from the graph that fail.
What's the best way to abstract details like this without coupling client code working against the abstract CountryData with implementation specific details like this?
Any thoughts?
EDIT:
The example I included of the Criteria Class is simple enough, but consider if I want to allow the client to construct complex criterias, where they should not only specify the property to filter on, but also the equality operator, logical operators for compound criterias, and the value.

DAO's are part of the DAL (Data Access Layer) and you can have data backed by any kind of implementation (XML, RDBMS etc.). You just need to ensure that the project instance is injected/used at runtime. DI frameworks like Spring/Guice shine in this case. Also, your Criteria interface/implementation should be generic enough so that only business details are captured (i.e country name criteria) and the actual mapping is again handled by the implementation class.
For SQL, in your case, either you can hand generate SQL, generate it using a helper library like Spring or use a full fledged framework like MyBatis. In our project, Spring XML configuration files were used to decouple the client and the implementation; it might vary in your case.
EDIT: I see that you have raised a similar concern in the previous question. The answer still remains the same. You can add as much flexibility as you want in your interface; you just need to ensure that the implementation is smart enough to make sense of all the arguments it receives and maps them appropriately to the underlying source. In our case, we retrieved the value object from the business layer and converted it to a map in the SQL implementation layer which can be used by MyBatis. Again, this process was pretty much transparent and the only way for the service layer to communicate with DAO was via the interface defined value objects.

No, I don't believe it's tied to only databases. The acronym is for Data Access Object, not "Database Access Object" so it can be usable with any type of data source.
The whole point of it is to separate the application from the backing data store so that the store can be modified at will, provided it still follows the same rules.
That doesn't just mean turfing Oracle and putting in DB2. It could also mean switching to a totally non-DBMS-based solution.

ok this is a bit philosophical question, so I'll tell what I'm thinking about it.
DAO usually stands for Data Access Object. Here the source of data is not always Data Base, although in real world, implementations are usually come to this.
It can be XML, text file, some remote system, or, like you stated in-memory graph of objects.
From what I've seen in real-world project, yes, you right, you should provide different DAO implementations for accessing the data in different ways.
In this case one dao goes to DB, and another dao implementation goes to object graph.
The interface of DAO has to be designed very carefully. Your 'Criteria' has to be generic enough to encapsulate the way you're going to get the data from.
How to achieve this level of decoupling? The answer can vary depending on your system, by in general, I would say, the answer would be "as usual, by adding an another level of indirection" :)
You can also think about your criteria object as a data object where you supply only the data needed for the query. In this case you won't even need to support different Criteria.
Each particular implementation of DAO will take this data and treat it in its own different way: one will construct query for the graph, another will bind this to your SQL.
To minimize hassling with maintenance I would suggest you to use Dependency Management frameworks (like Spring, for example). Usually these frameworks are suited well to instantiate your DAO objects and play good together.
Good Luck!

No, DAO for databases only is a common misconception.
DAO is a "Data Access Object", not a "Database Access Object". Hence anywhere you need to CRUD data to/from ( e.g. file, memory, database, etc.. ), you can use DAO.
In Domain Driven Design there is a Repository pattern. While Repository as a word is far better than three random letters (DAO), the concept is the same.
The purpose of the DAO/Repository pattern is to abstract a backing data store, which can be anything that can hold a state.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.