A project I am working on, we have decided to implement ehcache for our caching purposes in our web based app. For the most part, we would be using it for Hibernate caching (of master tables) and query-caching.
However, we are also considering method caching at the DAO layer. Honestly, I am a bit skeptical about that. It could mean that say, we have a method at the DAO layer which in turn fires a query (which is already cached), would it make any sense to cache that method then? My feeling is that either we should cache the method OR cache the query that method eventually fires.
Please do let me know your inputs!
From my experience it very much depends on you application (and the kind / structure your data). The application I'm working currently on has 3 built-in layers of cache (all backed by Ehcache): one as Hibernate 2nd level cache, one for short termed hot objects at the middle tier and one for longer termed fat value object at the facade layer. The caches are tuned (cache query parameters, lifetimes, sizes, ...) to supplement each other well.
So a priori, I wouldn't say it doesn't work. There is a possiblity you can skip the entire ORM layer with that. If you have some profiling in place (I like perf4j for that) you can at least optimize for "hot" objects that are expensive to get from your DAOs. If you are using Spring Framework you can do this easily by applying e.g. #Cacheable annotations to your methods. Do your performance testing with (near) live data and requests if possible.
In fact, I think using Hibernate 2nd level cache is the easy / lazy thing to do (so to say a good first step), but the performance gains are limited. With some more specific caching you can easily get factors of hundreds or thousands as speedup for parts (hopefully the important ones) of you application, usually at reduced load.
Related
I am currently developing an application using Spring MVC4 and hibernate 4. I have implemented hibernate second level cache for performance improvement. If I use Redis which is an in-memory data structure store, used as a database, cache etc, the performance will increase but will it be a drastic change?
Drastic differences you may expect if you cache what is good to be cached and avoid caching data that should not be cached at all. Like beauty is in the eye of the beholder the same is with the performance. Here are several aspects you should have in mind when using hibernate AS second level cache provider:
No Custom serialization - Memory intensive
If you use second level caching, you would not be able to use fast serialization frameworks such as Kryo and will have to stick to java serializable which sucks.
On top of this for each entity type you will have a separate region and within each region, you will have an entry for each key of each entity.
In terms of memory efficiency, this is inefficient.
Lacks the ability to store and distribute rich objects
Most of the modern caches also present computing grid functionality having your objects fragmented into many small pieces decrease your ability to execute distributed tasks with guaranteed data co-location. That depends a little bit on the Grid provider, but for many would be a limitation.
Sub optimal performance
Depending on how much performance you need and what type of application you are having using hibernate second level cache might be a good or a bad choice. Good in terms that it is plug and play...." kind of..." bad because you will never squeeze the performance you would have gained. Also designing rich models mean more upfront work and more OOP.
Limited querying capabilities ON the Cache itself
That depends on the cache provider, but some of the providers really are not good doing JOINs with Where clause different than the ID. If you try to build and in memory index for a query on Hazelcast, for example, you will see what I mean.
Yes, if you use Redis, it will improve your performance.
No, it will not be a drastic change. :)
https://memorynotfound.com/spring-redis-application-configuration-example/
http://www.baeldung.com/spring-data-redis-tutorial
the above links will help you to find out the way of integration redis with your project.
It depends on the movement.
If You have 1000 or more requests per second and You are low on RAM, then Yes, use redis nodes on other machine to take some usage. It will greatly improve your RAM and request speed.
But If it's otherwise then do not use it.
Remember that You can use this approach later when You will see what is the RAM and database Connection Pool usage.
Your question was already discussed here. Check this link: Application cache v.s. hibernate second level cache, which to use?
This was the most accepted answer, which I agree with:
It really depends on your application querying model and the traffic
demands.
Using Redis/Hazelcast may yield the best performance since there won't
be any round-trip to DB anymore, but you end up having a normalized
data in DB and denormalized copy in your cache which will put pressure
on your cache update policies. So you gain the best performance at the
cost of implementing the cache update whenever the persisted data
changes.
Using 2nd level cache is easier to set up but it only stores
entities by id. There is also a query cache, storing ids returned by a
given query. So the 2nd level cache is a two-step process that you
need to fine tune to get the best performance. When you execute
projection queries the 2nd level object cache won't help you, since it
only operates on entity load. The main advantage of 2nd level cache is
that it's easier to keep it in sync whenever data changes, especially
if all your data is persisted by hibernate.
So, if you need ultimate
performance and you don't mind implementing your cache update logic
that ensures a minimum eventual consistency window, then go with an
external cache.
If you only need to cache entities (that usually don't change that
frequently) and you mostly access those through Hibernate entity
loading, then 2nd level cache can help you.
Hope it helps!
I am a fan of ORM - Object Relational Mapping and I have been using it with Rails for the past year and a half. Prior that, I use to write raw queries using JDBC and make Database do the heavy lifting via Stored Procedures. With ORM, I was initially happy to do stuff like coach.manager and manager.coaches which were very simple and easy to read.
But as time went by there were in-numerous associations creeping up and I ended up doing a.b.c.d which were firing queries in all directions, behind the scenes. With rails and ruby, the garbage collector went nuts and took insane time to load a very complex page which involves relatively lesser data. I had to replace this ORM style code by a simple Stored procedure and the result I saw was enormous. A page that took 50 seconds to load now takes only 2 seconds.
With this huge difference, should I continue using ORM? It is very clear it has severe overheads compared to a raw query.
In general, what are the general pitfalls of using an ORM framework like Hibernate, ActiveRecord?
An ORM is only a tool. If you don't use it correctly, you'll have bad results.
Nothing stops you from using dedicated HQL/criteria queries, with fetch joins or projections, to return the information that your page must display in as few queries as possible. This will take more or less the same time as dedicated SQL queries.
But of course, if you just get everything by ID and navigate through your objects without realizing how many queries it generates, it will lead to long loading times. The key is to know exactly what the ORM does behind the scene, and decide if it's appropriate or if another strategy must be adopted.
I think you've already identified the major tradeoff associated with ORM software. Every time you add a new layer of abstraction that tries to provide a generalized implementation of something that you used to do by hand there is going to be some loss of performance/efficiency.
As you noted, traversing multiple relationships such as a.b.c.d can be inefficient, because most ORM software will be doing an independent database query for each . along the way. But I'm not sure that means you should eliminate ORM altogether. Most ORM solutions (or at least, certainly Hibernate) allow you to specify custom queries where you can bring back exactly what you want in a single database operation. This should be about as fast as your dedicated SQL.
Really the issue is about understanding how the ORM layer is working behind the scenes, and realizing that while something like a.b.c.d is simple to write, what it causes the ORM layer to do as it is evaluated is not. As a general rule I always go with the simplest possible approach to begin, and then write optimized queries in areas where it makes sense/where it is obvious that the simple approach will not scale.
I'd say, one should use the appropriate tool for different tasks.
E.g., for CRUD operations, ORM frameworks like Hibernate can speed up development and it will perform well enough. Sometimes you need to do some necessary tweaks to achieve acceptable performance. I'm not sure, your task (what took 50 sec with Hibernate) could not be done properly with Hibernate, because you did not provide us with the details.
On the other hand, for example bulk operations involving hundreds of thousands of records is not the type of task you'd expect Hibernate will do without significant performance penalty.
As it was mentioned already, ORM is only a tool and you can use it eiter good or bad.
One of the most typical performance problems in ORMs is 1+N queries problem. It is caused by loading additional objects for each of objects from the list. This is caused by eager fetch of 1-to-n-relation entities for each element on list, the dealing is using HQL queries, specifying fields in projection or marking fetching 1-to-n relations to lazy.
Any time, you must exactly know what the ORM is doing in order to achieve good performance. Not understanding what operations are done in background is a way to disaster (slow, buggy and hard to analyze code because of unnecessary and wrongly written work-arounds).
I'm with Petar from your comments regarding the lazy fetching. Say you have an html table filled fields from object a.b.c.d. You could find your framework round-tripping the database thousands of times(possibly many more) . The disadvantage of ORM in this case is you have to read the documentation thoroughly. Most frameworks support disabling lazy fetching and many even support adding your own processing logic to bind the data set.
The net out is that almost any ORM is almost undoubtedly better than anything you are going to write yourself. You will find yourself saddled with maintaining huge libraries of boilerplate or worse writing the same code over and over again.
We are currently investigating to switch from our own data store layer with clean separation of transfer objects and data access objects to JPA. We used a generator to create the TOs, the DAOs and the SQL DDL as well from some documentation in docbook format. By this all of our stuff from documentation, the database structure and the generated Java classes where always in sync with a good documentation of the database itself.
What we discovered so far by using JPA:
Foreign key references cannot be used for imports, some special
queries and so on because they must not be placed in a managed
entity. JPA only allows the target class there.
Access to some user session scope is difficult upto impossible. We
still have no clue how to get the users id into the column
'userWhoLastMadeAnUpdate' in some PrePersist method.
Something expected to be quite easy with an ORM, namely "class
mapping" does not work at all. We are using HalDateTime
(http://sourceforge.net/projects/haldatetime/) internally.
Especially in the client. Mapping it with JPA directly is not
possible although HalDateTime supports it. Due to JPA restrictions
we have to use two fields in the entity.
JPA uses either one XML file to describe the mapping. So you have to
look at least into two files to even understand the relationship
between the Java class and the database. And the XML file becomes
huge for large applications.
Alternatively ORMs provide annotations in the Java class itself. So
its easier to learn and understand the relationship. But it forces
you to see all that database stuff in the client layer (which
completely breaks a proper layering).
You will have to restrict yourself to stay as close to a clean
database structure as anyhow possible. Otherwise you will for sure
end up with a mess of queries and statements by the ORM.
Use an ORM which provides a query language which is close to SQL
itself (JPA seems quite acceptable here). An ORM induced language
makes supporting a large application really expensive.
I tend to use Hibernate in combination with Spring framework and it's declarative transaction demarcation capabilities (e.g., #Transactional).
As we all known, hibernate tries to be as non-invasive and as transparent as possible, however this proves a bit more challenging when employing lazy-loaded relationships.
I see a number of design alternatives with different levels of transparency.
Make relationships not lazy-loaded (e.g., fetchType=FetchType.EAGER)
This vioalites the entire idea of lazy loading ..
Initialize collections using Hibernate.initialize(proxyObj);
This implies relatively high-coupling to the DAO
Although we can define an interface with initialize, other implementations are not guaranteed to provide any equivalent.
Add transaction behaviour to the persistent Model objects themselves (using either dynamic proxy or #Transactional)
I've not tried the dynamic proxy approach, although I never seemed to get #Transactional working on the persistent objects themselves. Probably due to that hibernate is operation on a proxy to bein with.
Loss of control when transactions are actually taking place
Provide both lazy/non-lazy API, e.g, loadData() and loadDataWithDeps()
Forces the application to know when to employ which routine, again tight coupling
Method overflow, loadDataWithA(), ...., loadDataWithX()
Force lookup for dependencies, e.g., by only providing byId() operations
Requires alot of non-object oriented routines, e.g., findZzzById(zid), and then getYyyIds(zid) instead of z.getY()
It can be useful to fetch each object in a collection one-by-one if there's a large processing overhead between the transactions.
Make part of the application #Transactional instead of only the DAO
Possible considerations of nested transactions
Requires routines adapted for transaction management (e.g., suffiently small)
Small programmatic impact, although might result in large transactions
Provide the DAO with dynamic fetch profiles, e.g., loadData(id, fetchProfile);
Applications must know which profile to use when
AoP type of transactions, e.g., intercept operations and perform transactions when necessary
Requires byte-code manipulation or proxy usage
Loss of control when transactions are performed
Black magic, as always :)
Did I miss any option?
Which is your preferred approach when trying to minimize the impact of lazy-loaded relationships in your application design?
(Oh, and sorry for WoT)
As we all known, hibernate tries to be as non-invasive and as transparent as possible
I would say the initial assumption is wrong. Transaparent persistence is a myth, since application always should take care of entity lifecycle and of size of object graph being loaded.
Note that Hibernate can't read thoughts, therefore if you know that you need a particular set of dependencies for a particular operation, you need to express your intentions to Hibernate somehow.
From this point of view, solutions that express these intentions explicitly (namely, 2, 4 and 7) look reasonable and don't suffer from the lack of transparency.
I am not sure which problem (caused by lazyness) you're hinting to, but for me the biggest pain is to avoid losing session context in my own application caches. Typical case:
object foo is loaded and put into a map;
another thread takes this object from the map and calls foo.getBar() (something that was never called before and is lazy evaluated);
boom!
So, to address this we have a number of rules:
wrap sessions as transparently as possible (e.g. OpenSessionInViewFilter for webapps);
have common API for threads/thread pools where db session bind/unbind is done somewhere high in the hierarchy (wrapped in try/finally) so subclasses don't have to think about it;
when passing objects between threads, pass IDs instead of objects themselves. Receiving thread can load object if it needs to;
when caching objects, never cache objects but their ids. Have an abstract method in your DAO or manager class to load the object from 2nd level Hibernate cache when you know the ID. The cost of retrieving objects from 2nd level Hibernate cache is still far cheaper than going to DB.
This, as you can see, is indeed nowhere close to non-invasive and transparent. But the cost is still bearable, to compare with the price I'd have to pay for eager loading. The problem with latter is that sometimes it leads to the butterfly effect when loading single referenced object, let alone a collection of entities. Memory consumption, CPU usage and latency to mention the least are also far worse, so I guess I can live with it.
A very common pattern is to use OpenEntityManagerInViewFilter if you're building a web application.
If you're building a service, I would open the TX on the public method of the service, rather than on the DAOs, as very often a method requires to get or update several entities.
This will solve any "Lazy Load exception". If you need something more advanced for performance tuning, I think fetch profiles is the way to go.
I'm asking this question given my chosen development frameworks of JPA (Hibernate implementation of), Spring, and <insert MVC framework here - Struts 1, Struts 2, Spring MVC, Stripes...>.
I've been thinking a bit about relationships in my entity layer - for example I have an order entity that has many order lines. I've set up my app so that it eagerly loads the order lines for every order. Do you think this is a lazy way to get around the lazy initialization problems that I would come across if I was to set the fetch strategy to false?
The way I see it, I have the following alternatives when retrieving entities and their associations:
Use the Open Session In View pattern to create the session on each request and commit the transaction before returning the response.
Implement a DTO (Data Transfer Object) layer such that every DAO query I execute returns the correctly initialized DTO for my purposes. I don't really like this option much because in my experience I've found that it creates a lot of boilerplate copying code and becomes messy to maintain.
Don't map any associations in JPA so that every query I execute returns only the entities I'm interested in - this will probably require me to have DTOs anyway and will be a pain to maintain and I think defeats the purpose of having an ORM in the first place.
Eagerly fetch all (or most associations) - in the example above, always fetch all order lines when I retrieve an order.
So my question is, when and under what circumstances would you use which of these options? Do you always stick with one way of doing it?
I would ask a colleague but I think that if I even mentioned the term 'Open Session in View' I would be greeted with blank stares :( What I'm really looking for here is some advice from a senior or very experienced developer.
Thanks guys!
Open Session in View has some problems.
For example, if the transaction fails, you might know it too late at commit time, once you are nearly done rendering your page (possibly the response already commited, so you can't change the page !) ... If you had know that error before, you would have followed a different flow and ended up rendering a different page...
Other example, reading data on-demand might turn to many "N+1 select" problems, that kill your performance.
Many projects use the following path:
Maintain transactions at the business layer ; load at that point everything you are supposed to need.
Presentation layer runs the risk of LazyExceptions : each is considered a programming error, caught during tests, and corrected by loading more data in the business layer (you have the opportunity to do it efficiently, avoiding "N+1 select" problems).
To avoid creating extra classes for DTOs, you can load the data inside the entity objects themselves. This is the whole point of the POJO approach (uses by modern data-access layers, and even integration technologies like Spring).
I've successfully solved all my lazy initialization problems with Open Session In View -pattern (ie. the Spring implementation). The technologies I used were the exact same as you have.
Using this pattern allows me to fully map the entity relationships and not worry about fetching child entities in the dao. Mostly. In 90% of the cases the pattern solves the lazy initialization needs in the view. In some cases you'll have to "manually" initialize relationships. These cases were rare and always involved very very complex mappings in my case.
When using Open Entity Manager In View pattern it's important to define the entity relationships and especially propagation and transactional settings correctly. If these are not configured properly, there will be errors related to closed sessions when some entity is lazily initialized in the view and it fails due to the session having been closed already.
I definately would go with option 1. Option 2 might be needed sometimes, but I see absolutely no reason to use option 3. Option 4 is also a no no. Eagerly fetching everything kills the performance of any view that needs to list just a few properties of some parent entities (orders in tis case).
N+1 Selects
During development there will be N+1 selects as a result of initializing some relationships in the view. But this is not a reason to discard the pattern. Just fix these problems as they arise and before delivering the code to production. It's as easy to fix these problems with OEMIV pattern as it's with any other pattern: add the proper dao or service methods, fix the controller to call a different finder method, maybe add a view to the database etc.
I have successfully used the Open-Session-in-View pattern on a project. However, I recently read in "Spring In Practice" of an interesting potential problem with non-repeatable reads if you manage your transactions at a lower layer while keeping the Hibernate session open in the view layer.
We managed most of our transactions in the service layer, but kept the hibernate session open in the view layer. This meant that lazy reads in the view were resulting in separate read transactions.
We managed our transactions in our service layer to minimize transaction duration. For instance, some of our service calls resulted in both a database transaction and a web service call to an external service. We did not want our transaction to be open while waiting for a web service call to respond.
As our system never went into production, I am not sure if there were any real problems with it, but I suspect that there was the potential for the view to attempt to lazily load an object that has been deleted by someone else.
There are some benefits of DTO approach though. You have to think beforehand what information you need. In some cases this will prevent you from generating n+1 select statements. It helps also to see where to use eager fetching and/or optimized views.
I'll also throw my weight behind the Open-Session-in-View pattern, having been in the exact same boat before.
I work with Stripes without spring, and have created a manual filter before that tends to work well. Coding transaction logic on the backend turns messy really quick as you've mentioned. Eagerly fetching everything becomes TERRIBLE as you map more and more objects to each other.
One thing I want to add that you may not have come across is Stripersist and Stripernate - Stripersist being the more JPA flavor - auto-hydration filters that take a lot of the work off your shoulders.
With Stripersist you can say things like /appContextRoot/actions/view/3 and it will auto-hydrate the JPA Entity on the ActionBean with id of 3 before the event is executed.
Stripersist is in the stripes-stuff package on sourceforge. I now use this for all new projects, as it's clean and easily supports multiple datasources if necessary.
Does the Order and Order Lines compose a high volume of data? Do they take part in online processes where real-time response is required? If so, you might consider not using eager fetching - it does make a huge diference in performance. If the amount of data is small, there is no problem in eager fetching.
About using DTOs, it might be a viable implementation.
If your business layer is used internally by your own application (i.e a small web app and its business logic) it'd probably be best to use your own entities in your view with open session in view pattern since it's simpler.
If your entities are used by many applications (i.e a backend application providing a service in your corporation) it'd be interesting to use DTOs since you would not expose your model to your clients. Exposing it could mean you would have a harder time refactoring your model since it could mean breaking contracts with your clients. A DTO would make that easier since you have another layer of
abstraction. This can be a bit strange since EJB3 would theorically eliminate the need of DTOs.
I'm thinking about using the Open Session In View (OSIV) filter or interceptor that comes with Spring, as it seems like a convenient way for me as a developer. If that's what you recommend, do you recommend using a filter or an interceptor and why?
I'm also wondering how it will mix with HibernateTemplate and if I will lose the ability to mark methods as #Transactional(readOnly = true) etc and thus lose the ability to get some more fine grained transaction control?
Is there some kind of best practice for how to integrate this kind of solution with a three tier architecture using Hibernate and Spring (as I suppose my decision to use Wicket for presentation shouldn't matter much)?
If I use OSIV I will at least never run into lazy loading exceptions, on the other hand my transaction will live longer before being able to commit by being uncommitted in the view as well.
It's really a matter of personal taste.
Personally, I like to have transaction boundaries at the service layer. If you start thinking SOA, every call to a service should be independent. If your view layer has to call 2 different services (we could argue that this is already a code smell) then those 2 services should behave independently of each other, could have different transaction configurations, etc... Having no transactions open outside of the services also helps make sure that no modification occurs outside of a service.
OTOH you will have to think a bit more about what you do in your services (lazy loading, grouping functionalities in the same service method if they need a common transactionality, etc ...).
One pattern that can help reduce lazy-loading error is to use Value Object outside of the service layer. The services should always load all the data needed and copy it to VOs. You lose the direct mapping between your persistent objects and your view layer (meaning you have to write more code), but you might find that you gain in clarity ...
Edit: The decision will be based on trade offs, so I still think it is at least partly a matter of personal taste. Transaction at the service layer feels cleaner to me (more SOA-like, the logic is clearly restrained to the service layer, different calls are clearly separated, ...). The problem with that approach is LazyLoadingExceptions, which can be resolved by using VO. If VO are just a copy of your persistent objects, then yes, it is clearly a break of the DRY principle. If you use VO like you would use a database View, then VO are a simplification of you persistent objects. It will still be more code to write, but it will make your design clearer. It becomes especially useful if you need to plug some authorization scheme : if certain fields are visible only to certain roles, you can put the authorization at the service level and never return data that should not be viewed.
If I use OSIV I will at least never
run into lazy loading exceptions
that is not true, in fact its extremely easy to run into the infamous LazyInitializationException, just load an object, and try to read an attribute of it, after the view, depending on your configuration you WILL get the LIE