I have an application that has the following java files:
Services:
AccountService.java
UserService.java
MessageService.java
DAOs:
AccountDAO.java
UserDAO.java
MessageDAO.java
Tables:
ACCOUNTS
USERS
MESSAGES
In MessageService.java, I have a function newMessage() that has to query data from all three tables.
(1) According to Spring's decoupling standards, this is how the calls should be made:
                      AccountDAO.java -- ACCOUNTS
                     /
MessageService.java --- MessageDAO.java -- MESSAGES
                     \
                      UserDAO.java -- USERS
But the problem is, this approach makes 3 DB calls.
(2) For better performance, I would do:
MessageService.java -- MessageDAO.java -- Join ACCOUNTS, MESSAGES and USERS
But this way, it's tightly coupled: if there's a change in the USERS table, I'll have to change MessageDAO.java (and every other DAO that uses the USERS table) too. That is really bad, since in the real (non-hypothetical) application we have a LOT of DAOs.
Which approach is considered a better practice? Or is there another approach that I'm missing?
According to Spring's decoupling standards, this is how the calls should be made
This is false. There are no "decoupling standards" with Spring. Please find me a reference in the Spring documentation that tells you how you must structure your persistence layer code.
Typically, you would have one DAO for each "entity" that your application wants to operate on, but it would be foolish to take this pattern to the extreme of deconstructing a query that joins multiple tables together into three distinct queries.
If you need to have a newMessage() method that joins some tables together in a query, choose the DAO in which that makes the most sense - probably the MessageDAO - and write the query/method in the way that makes sense.
But there is no rule saying that you must have distinct queries for each entity and that one DAO class is not allowed to make queries that touch the tables of other entities. This is too extreme and has no benefit.
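For illustration, here is a minimal sketch of such a join query living in the MessageDAO, using Spring's JdbcTemplate (the MessageView DTO and all column names are hypothetical, not from the question):

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class MessageDAO {

    private final JdbcTemplate jdbcTemplate;

    public MessageDAO(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // One DB call: this DAO owns the query, even though it touches three tables.
    public List<MessageView> findMessagesWithSenderInfo() {
        String sql = "SELECT m.ID, m.TEXT, u.NAME, a.STATUS "
                + "FROM MESSAGES m "
                + "JOIN USERS u ON u.ID = m.USER_ID "
                + "JOIN ACCOUNTS a ON a.ID = u.ACCOUNT_ID";
        return jdbcTemplate.query(sql, (rs, rowNum) -> new MessageView(
                rs.getLong("ID"),
                rs.getString("TEXT"),
                rs.getString("NAME"),
                rs.getString("STATUS")));
    }
}

// Simple read model for this use case (also hypothetical).
record MessageView(long id, String text, String userName, String accountStatus) {}

A change to the USERS table now affects this one query, not the structure of your layers.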
If on the other hand you are worried about the maintainability of having multiple data-layer classes that are aware of all of your tables, then look into an ORM solution, as parsifal mentioned, to alleviate some of this work.
The alternative is to use an ORM such as Hibernate, mapping each of your tables to an entity. Then define the logical relationships between those entities. For example, a 1:M relationship between users and messages. As your tables change, your mappings will need to change, but your SQL won't (because Hibernate will generate it).
For most relationships, Hibernate is very good at creating joins to retrieve related entities in one query. You can control whether this happens; I recommend using lazy loads as a default for most relationships, and switching to eager loads as-needed.
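A rough sketch of such a mapping with JPA annotations (entity and column names are illustrative, not from the question):

import java.util.List;
import javax.persistence.*;

@Entity
@Table(name = "USERS")
public class User {

    @Id
    private Long id;

    // Lazy by default: MESSAGES is only queried when the collection is accessed.
    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
    private List<Message> messages;
}

@Entity
@Table(name = "MESSAGES")
class Message {

    @Id
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "USER_ID")
    private User user;
}

For a use case that does need everything at once, a JPQL query such as "select u from User u join fetch u.messages" switches to an eager load for that query only, without changing the mapping.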
Doing this as 3 separate queries might impact correctness if the data might change between one query and the next. Don't let (your idea of) Spring's guidelines make you write code that gets the wrong results.
It sounds like a join in SQL is the right approach.
Whichever method you follow, it's always about finding a sweet spot between decoupling and performance. The same holds even for the choice of the number of layers, etc.
So I guess, as @mattb recommended, it's completely fine to join tables in a DAO if it makes sense in the particular context.
Related
In our code base we make extensive use of DAOs - in essence a layer that exposes a low-level read/write API, where each DAO maps to a table in the database.
My question is: should the DAO's update methods take entity ids or entity references as arguments, if we have different kinds of updates on an entity?
For example, say we have customers and addresses. We could have
customer.address = newAddress;
customerDao.updateCustomerAddress(customer);
or we could have
customerDao.updateCustomerAddress(customer.getId(), newAddress);
Which approach would you say is better?
The latter is more convenient: if we have the entity, we always have the id, so it will always work. The converse is not true, though; with only an id, the first approach would have to be preceded by fetching the entity before performing the update.
In DDD we have Aggregates and Repositories. Aggregates ensure that the business invariants hold and Repositories handle the persistence.
I recommend that Aggregates should be pure, with no dependencies to any infrastructure code; that is, Aggregates should not know anything about persistence.
Also, you should use the Ubiquitous language in your domain code. That being said, your code should look like this (in the application layer):
customer = customerRepository.loadById(customerId);
customer.changeAddress(address);
customerRepository.save(customer);
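A minimal sketch of that split in Java (Customer, Address, CustomerId and the repository are illustrative names, not a prescribed API):

// Customer.java - the aggregate enforces its invariants and knows nothing about persistence.
public class Customer {

    private final CustomerId id;
    private Address address;

    public Customer(CustomerId id, Address address) {
        this.id = id;
        this.address = address;
    }

    public void changeAddress(Address newAddress) {
        if (newAddress == null) {
            throw new IllegalArgumentException("address must not be null");
        }
        this.address = newAddress;
    }
}

// CustomerRepository.java - the only place that talks to the persistence infrastructure.
public interface CustomerRepository {
    Customer loadById(CustomerId id);
    void save(Customer customer);
}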
I assume your question is
Which approach of the two is better?
I would prefer the second approach. It states clearly what will be done: the object to update will be freshly loaded, and it is absolutely clear that only the address will be updated. The first approach leaves room for doubt. What happens if customer.name has a new value as well? Will it also be updated?
I've been trying to improve the separation of concerns when it comes to applications that access a database (via Hibernate).
On one of the applications I've been using the following approach:
Create services with business logic that have no connection/awareness of the database. They only communicate with GeneralDAO (and with other services);
A GeneralDAO responsible for CRUD/find operations, and with methods that involve more complex database queries.
The problems I see with this approach are:
GeneralDAO slowly becomes a God Object when your application grows and requires lots of specific database queries.
Sometimes the more specific Services become only proxies to the GeneralDAO, since the method is simple and only requires a database query. See example 1.
Example 1: Service is just a proxy
BookService manages things related to books in the Library application. Let's consider 2 methods:
archiveBook(Book)
findByIsbn(String isbn)
In archiveBook(Book) there might be considerable business logic involved - we might imagine this involves calls to:
distributionService.unbox(Book);
archivalService.archive(Book);
librarianService.informNewBook(Book);
But findByIsbn(String isbn) is a lot more simple: it just needs to execute an SQL call to the database. So in this case I see two options:
Redirect the call to an object that can speak to the database to execute the query. For example generalDAO.findByIsbn(String isbn), which uses a DB communication layer (in Hibernate, a sessionFactory or EntityManager) to execute the query.
Make that database layer available to the BookService, so that it executes the query itself
Questions/opinions (first number identifies the option above):
1.1. Isn't it strange to have 2 methods with the exact same signature, even if this is done to keep the BookService independent of the database layer (and ORM)?
1.2. How do you suggest avoiding The God anti-pattern? Would you suggest breaking the GeneralDAO into several DAOs depending on what the methods do? In this case, won't we risk needing to inject lots of DAOs into some Services, leading to a Service having too many objects injected into it?
2.1 What do you think of this alternative? Doesn't it break the "separation of concerns" by having the BookService be aware of objects at two different levels of abstraction (the DAO and the sessionFactory/EntityManager)?
3.1. Would you suggest any other approach/pattern/best practise?
Thanks!
1.2. How do you suggest avoiding the God anti-pattern? Would you suggest breaking the GeneralDAO into several DAOs depending on what the methods do? In this case, won't we risk needing to inject lots of DAOs into some Services, leading to a Service having too many objects injected into it?
Generally, a DAO class should handle a specific entity.
If one of your entities requires many kinds of queries, you could divide its DAO again into two or more DAOs, grouping the methods by common concern (for example: reading, writing, selecting on aggregates, etc.), as you said.
If you have too many queries and too many DAOs, you should check whether you are writing almost the same queries in several methods. If that is the case, use the Specification pattern or the Criteria API to let the client customize queries with parameters (see the sketch below). If the queries are really different, you have genuinely different processing, so using multiple DAOs seems a suitable solution: it avoids increasing complexity and the rise of god objects.
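As a sketch of the Criteria API route (the Book entity and its field names are made up), a single finder can replace several near-duplicate query methods:

import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.*;

public class BookQueryDAO {

    private final EntityManager em;

    public BookQueryDAO(EntityManager em) {
        this.em = em;
    }

    // Clients pass only the parameters they care about; null means "no filter".
    public List<Book> findBooks(String isbn, String author) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Book> query = cb.createQuery(Book.class);
        Root<Book> book = query.from(Book.class);

        List<Predicate> predicates = new ArrayList<>();
        if (isbn != null) {
            predicates.add(cb.equal(book.get("isbn"), isbn));
        }
        if (author != null) {
            predicates.add(cb.equal(book.get("author"), author));
        }
        query.select(book).where(predicates.toArray(new Predicate[0]));
        return em.createQuery(query).getResultList();
    }
}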
1.1. Isn't it strange to have 2 methods with the exact same signature, even if this is done to keep the BookService independent of the database layer (and ORM)?
When you divide your app into logical layers, as you noticed, some operations make a layer perform nothing but delegation calls to the layer below. In these cases it is rather common to have method names that are the same. I would go further: it is good practice to keep the same name if it is just a delegation call. Why create a variation in the conveyed behavior if both methods address the same need?
2.1 What do you think of this alternative? Doesn't it break the "separation of concerns" by having the BookService be aware of objects at two different levels of abstraction (the DAO and the sessionFactory/EntityManager)?
BookService depends on DAOs but should not depend on the sessionFactory/EntityManager, which is part of the DAO implementation.
BookService calls DAO which uses a sessionFactory/EntityManager.
If necessary, BookService may specify transactional details on itself or on its methods with #Transactional annotation.
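A sketch of that layering (names are illustrative): the service sees only the DAO interface, and the EntityManager stays inside the DAO implementation.

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

public interface BookDAO {
    Book findByIsbn(String isbn);
}

@Repository
class JpaBookDAO implements BookDAO {

    @PersistenceContext
    private EntityManager entityManager;

    @Override
    public Book findByIsbn(String isbn) {
        return entityManager
                .createQuery("select b from Book b where b.isbn = :isbn", Book.class)
                .setParameter("isbn", isbn)
                .getSingleResult();
    }
}

@Service
class BookService {

    private final BookDAO bookDAO;

    BookService(BookDAO bookDAO) {
        this.bookDAO = bookDAO;
    }

    // Pure delegation, same name; transactional details live here, not in the DAO.
    @Transactional(readOnly = true)
    public Book findByIsbn(String isbn) {
        return bookDAO.findByIsbn(isbn);
    }
}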
3.1. Would you suggest any other approach/pattern/best practice?
As you use Spring, try to rely on Spring Data JPA repositories (less boilerplate to handle for common cases, and the classes are extensible).
Use the Specification or Criteria pattern when you have several variants of some queries.
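With Spring Data JPA, the whole DAO above can shrink to an interface (assuming a Book entity with an isbn field):

import org.springframework.data.jpa.repository.JpaRepository;

// save, findById, delete, paging, ... come from JpaRepository;
// findByIsbn is a derived query generated from the method name.
public interface BookRepository extends JpaRepository<Book, Long> {
    Book findByIsbn(String isbn);
}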
I have an odd business requirement.
We have multiple, unrelated entity types that will need to be displayed in a unified list, with some basic information from the entity, sorted by the only field they are all guaranteed to have, DATE. These entities may or may not even be in the same database. The result set needs to be pageable.
Is there any feasible way of achieving this through Criteria, HQL or some sane means?
Normally you would let all these classes extend a common base class and use a polymorphic Hibernate query. From your description this doesn't seem to be feasible.
Of course, if you want to go the Hibernate way, you would have to first fetch the size of each unrelated table, determine in which table (or tables) the records of the requested page lie, and manually fetch the proper page. This is really cumbersome and definitely should be hidden under some deep DAO.
Looks like the only sane solution is good old SQL with UNION, mapping the native query to your domain objects. Hibernate supports native queries quite well.
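A hedged sketch of that route, assuming the tables live in the same database (table and column names are invented; the date is the only shared column):

import java.util.List;
import javax.persistence.EntityManager;

public class UnifiedListDAO {

    private final EntityManager em;

    public UnifiedListDAO(EntityManager em) {
        this.em = em;
    }

    // UNION ALL over the unrelated tables, sorted by the common date column,
    // with paging pushed down to the database.
    @SuppressWarnings("unchecked")
    public List<Object[]> findPage(int pageNumber, int pageSize) {
        String sql = "SELECT id, title, created_date FROM invoices "
                + "UNION ALL "
                + "SELECT id, title, created_date FROM support_tickets "
                + "ORDER BY created_date DESC";
        return em.createNativeQuery(sql)
                .setFirstResult(pageNumber * pageSize)
                .setMaxResults(pageSize)
                .getResultList();
    }
}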
I am a fan of ORM (Object-Relational Mapping) and I have been using it with Rails for the past year and a half. Before that, I used to write raw queries using JDBC and make the database do the heavy lifting via stored procedures. With ORM, I was initially happy to do stuff like coach.manager and manager.coaches, which were very simple and easy to read.
But as time went by, innumerable associations crept up and I ended up doing a.b.c.d, which fired queries in all directions behind the scenes. With Rails and Ruby, the garbage collector went nuts and took an insane time to load a very complex page that involves relatively little data. I had to replace this ORM-style code with a simple stored procedure, and the difference I saw was enormous: a page that took 50 seconds to load now takes only 2 seconds.
With this huge difference, should I continue using ORM? It is very clear it has severe overheads compared to a raw query.
In general, what are the general pitfalls of using an ORM framework like Hibernate, ActiveRecord?
An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scene, and decide if it's appropriate or if another strategy must be adopted.
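For example, a JPQL constructor-expression projection (OrderSummary and the field names are hypothetical) brings back exactly what one page displays, in a single query:

// One query, no entity graph: only the fields the page needs.
List<OrderSummary> rows = entityManager.createQuery(
        "select new com.example.OrderSummary(o.id, o.customer.name, o.total) "
                + "from Order o where o.status = :status", OrderSummary.class)
        .setParameter("status", "OPEN")
        .getResultList();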
I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.
I'd say one should use the appropriate tool for the task at hand.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development, and they will perform well enough. Sometimes you need to make some tweaks to achieve acceptable performance. I'm not sure your task (the one that took 50 seconds with Hibernate) could not be done properly with Hibernate, because you did not provide us with the details.
On the other hand, bulk operations involving hundreds of thousands of records, for example, are not the type of task you'd expect Hibernate to handle without a significant performance penalty.
As was mentioned already, an ORM is only a tool, and you can use it either well or badly.
One of the most typical performance problems with ORMs is the 1+N queries problem: additional objects are loaded for each object in a list, typically because a 1-to-n relation is fetched eagerly for every element. The remedies are using HQL queries, specifying fields in a projection, or marking the 1-to-n relations as lazy.
At any time, you must know exactly what the ORM is doing in order to achieve good performance. Not understanding what operations are done in the background is a road to disaster (slow, buggy, hard-to-analyze code full of unnecessary and wrongly written workarounds).
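To make the 1+N pattern concrete, a small sketch (Author/Book are illustrative names):

// 1 query for the list...
List<Author> authors = em.createQuery("select a from Author a", Author.class)
        .getResultList();
for (Author a : authors) {
    a.getBooks().size();   // ...plus 1 extra query per author (lazy collection)
}

// The fix: fetch the collection in the same query.
List<Author> inOneQuery = em.createQuery(
        "select distinct a from Author a join fetch a.books", Author.class)
        .getResultList();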
I'm with Petar from your comments regarding lazy fetching. Say you have an HTML table filled with fields from object a.b.c.d. You could find your framework round-tripping the database thousands of times (possibly many more). The disadvantage of ORM in this case is that you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching, and many even support adding your own processing logic to bind the data set.
The net result is that almost any ORM is almost undoubtedly better than anything you are going to write yourself. You will find yourself saddled with maintaining huge libraries of boilerplate or, worse, writing the same code over and over again.
We are currently investigating a switch from our own data-store layer, with a clean separation of transfer objects and data access objects, to JPA. We used a generator to create the TOs, the DAOs, and the SQL DDL from documentation in DocBook format. This way, our documentation, the database structure, and the generated Java classes were always in sync, with good documentation of the database itself.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special queries and so on, because they must not be placed in a managed entity; JPA only allows the target class there.
Access to some user session scope is difficult up to impossible. We still have no clue how to get the user's id into the column 'userWhoLastMadeAnUpdate' in some PrePersist method.
Something expected to be quite easy with an ORM, namely "class mapping", does not work at all. We are using HalDateTime (http://sourceforge.net/projects/haldatetime/) internally, especially in the client. Mapping it with JPA directly is not possible, although HalDateTime supports it. Due to JPA restrictions we have to use two fields in the entity.
JPA uses either one XML file to describe the mapping, so you have to look into at least two files to even understand the relationship between the Java class and the database - and the XML file becomes huge for large applications.
Alternatively, ORMs provide annotations in the Java class itself, so it's easier to learn and understand the relationship, but it forces you to see all that database stuff in the client layer (which completely breaks proper layering).
You will have to restrict yourself to stay as close to a clean database structure as possible, otherwise you will for sure end up with a mess of queries and statements generated by the ORM.
Use an ORM which provides a query language that is close to SQL itself (JPA seems quite acceptable here). An ORM-induced language makes supporting a large application really expensive.
I was asked to have a look at a legacy EJB3 application with significant performance problems. The original author is not available anymore, so all I've got is the source code and some user comments regarding the unacceptable performance. My personal EJB3 skills are pretty basic; I can read and understand the annotated code, but that's all so far.
The server has a database, several EJB3 entity beans (JPA) and a few stateless beans just to allow CRUD on 4-5 domain objects for remote clients. The client itself is a Java application. Just a few clients are connected to the server in parallel. From the user comments I learned that:
the client/server app performed well in a LAN
the app was practically unusable on a WAN (1MBit or more) because read and update operations took much too long (up to several minutes)
I've seen one potential problem - on all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations, is it advisable to start tuning with the fetching strategies?
But that would not explain performance issues on update operations, or would it? Update is handled by an EntityManager; the client just passes the domain object to the manager bean, and persisting is done with nothing but manager.persist(obj). Maybe the domain objects that are sent to the server are just too big (maybe a side effect of the EAGER strategy).
So my actual theory is that too many bytes are sent over a rather slow network and I should look at reducing the size of result sets.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
On all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations?
Depending on the relations between classes, you might be fetching much more (the whole database?) than actually wanted when retrieving entities.
is it advisable to start tuning with the fetching strategies?
I can't say that making all relations EAGER is a very standard approach. In my experience, you usually keep them lazy and use "fetch joins" (a type of join that allows eager loading of an association) when you want to eager-load an association for a given use case.
But that would not explain performance issues on update operations, or would it?
It could. I mean, if the app is retrieving a big fat object graph when reading and then sending the same fat object graph back to update just the root entity, there might be a performance penalty. But it's kinda weird that the code is using em.persist(Object) to update entities.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
The obvious ones include:
Retrieving more data than required
N+1 requests problems (bad fetching strategy)
Poorly written JPQL queries
Inappropriate inheritance strategies
Unnecessary database hits (i.e. lack of caching)
I would start with writing some integration tests or functional tests before touching anything to guarantee you won't change the functional behavior. Then, I would activate SQL logging and start to look at the generated SQL for the major use cases and work on the above points.
From a DBA's position:
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
Turn off caching
Enable SQL logging - EJB3/Hibernate generates by default a lot of extremely stupid queries (see the sketch after this list)
Now you see what I mean
Change FetchType.EAGER to FetchType.LAZY
Say "no" to big business logic between em.find and em.persist
Use ehcache http://ehcache.org/
Turn on the entity cache
If you can, make primary keys immutable (@Column(updatable = false, ...))
Turn on the query cache
Never, ever use Hibernate if you want big performance:
http://www.google.com/search?q=hibernate+sucks
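A sketch of how a few of the knobs above (SQL logging, entity and query cache, ehcache) can be set; the property keys are standard Hibernate settings, but the persistence-unit name is made up:

Map<String, String> props = new HashMap<>();
props.put("hibernate.show_sql", "true");                      // SQL logging
props.put("hibernate.format_sql", "true");
props.put("hibernate.cache.use_second_level_cache", "true");  // entity cache
props.put("hibernate.cache.use_query_cache", "true");         // query cache
props.put("hibernate.cache.region.factory_class",
        "org.hibernate.cache.ehcache.EhCacheRegionFactory");  // ehcache backend

EntityManagerFactory emf = Persistence.createEntityManagerFactory("myUnit", props);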
In my case, a similar performance problem wasn't caused by the fetch strategy. Or let's say it was not really possible to change the business logic behind the existing fetch strategies. In my case the solution was simply adding indices.
When your JPA object model has a lot of relationships (OneToOne, OneToMany, ...), you will typically use JPQL statements with a lot of joins, which can result in complex SQL translations. When you take a look at the data model (generated by JPA), you will notice that there are no indices on any of your table columns.
For example, if you have a Customer and an Address object with a OneToOne relationship, everything will work well at first glance. Customer and Address have a foreign key. But if you run selections like this:
Select c from Customer as c where c.address.zip='8888'
you should take care of the column 'zip' in the ADDRESS table. JPA will not create such an index for you during deployment. So in my case I was able to speed up database performance by simply adding indices.
The SQL statement in your database looks like this:
ALTER TABLE `mydatabase`.`ADDRESS` ADD INDEX `zip_index`(`IZIP`);
In the question, and in the other answers, I'm hearing a lot of "might"s and "maybe"s.
First find out what's going on. If you haven't done that, we're all just poking in the dark.
I'm no expert on this kind of system, but this method works on any language or OS.
When you find out what's making it take too long, why don't you summarize it here?
I'm especially interested to know if it was something that might have been guessed.