Hibernate Cache1 OutOfMemory with OpenSessionInView

Hibernate Cache1 OutOfMemory with OpenSessionInView - java

I've got a Vehicle entity, with a Vehicle DTO.
I use OpenSessionInView with Stripes.
In my Stripes action bean, i need to generate a csv containing the data for about 50000 Vehicles.
Thus, as a Stripes developer told me to do, i write the file to the outputstream in the following method:
StreamingResolution() {...}.stream(HttpServletResponse)
I have a service that take some pagination information, load a part of the vehicles and transform them into DTO's.
These dto's are returned to the view and written to the csv.
The pagination system (500 items for each page) was made to avoid having a list of 50000 DTO and have some memory problems.
But it doesn't work perfectly yet. With Jmap i saw that at the end of the csv process, there are more than 40000 vehicules loaded in heap space and not garbage collected.
With Yourkit profiler, it seems to me that these entities are still in the L1 cache of hibernate (referenced in StatefulPersistenceContext), and since i have OpenSessionInView, i guess the problem is that the conversation is a bit long and the cache need to be cleaned...
I just wonder how to do that in an elegant way since my dao methods loading vehicles are used by a lot of services that doesn't necessary need a session clean/flush.
Someone know i could do? I guess i can make a method in the DAO/Service to clear the session but it's not very elegant...
This is a pretty big project and i made a very simple description of it. Please don't tell me to not use opensessioninview or something like that, it's not my decision... ;)

It may not be elegant, but in cases like this evicting entities from the session is the only practical solution.
For example, once you've finished writing an entity's data to the output stream, call session.evict(entity) to remove it from the session cache. Alternatively, call this at the end of each "page".
The combination of the paging mechanism and the eviction should ensure you never have more than 500 entities in the cache at one time.

Evicting objects is elegant and it is the only good solution when doing reports, exports to CSV etc.
The best elegant way is to implement it inside a service, which will open iterator, evict each item after process. The class would be provided for service call, that would be item consumer.
public interface ItemConsumer {
void consume(Item item);
}
public void processAllItems(ItemConsumer consumer) {
.. do your job
}

You probably want to explicitly open a stateless session.
e.g.
StatelessSession session = sessionFactory.openStatelessSession();
From the docs:
A StatelessSession has no persistence
context associated with it and does
not provide many of the higher-level
life cycle semantics. In particular, a
stateless session does not implement a
first-level cache nor interact with
any second-level or query cache. It
does not implement transactional
write-behind or automatic dirty
checking
See here for more details:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html

Related

Struts2 static data storage / access

I am trying to find what is the usual design/approach for "static/global"! data access/storage in a web app, I'm using struts 2. Background, I have a number of tables I want to display in my web app.
Problem 1.
The tables will only change and be updated once a day on the server, I don't want to access a database/or loading a file for every request to view a table.
I would prefer to load the tables to some global memory/cache once (a day), and each request get the table from there, rather than access a database.
I imagine this is a common scenario and there is an established approach? But I cant find it at the moment.
For struts 2, Is the ActionContext the right place for this data.
If so, any link to a tutorial would be really appreciated.
Problem 2.
The tables were stored in a XML file I unmarshalled with JAXB to get the table objects, and so the lists for the tables.
For a small application this was OK, but I think for the web app, its hacky to store the xml as resources and read in the file as servlet context and parse, or is it?
I realise I may be told to store the tables to a database accessing with a dao, and use hibernate to get the objects.
I am just curious as to what is the usual approach with data already stored in XML file? Given I will have new XML files daily.
Apologies if the questions are basic, I have a large amount of books/reference material, but its just taking me time to get the higher level design answers.

Not having really looked at the caching options I would fetch the data from the DB my self but only after an interval has passed.
Usually you work within the Action scope, the next level up is the Session and the most global is the Application. A simple way to test this is to create an Action class which implements ApplicationAware. Then you can get the values put there from any jsp/action... anywhere you can get to the ActionContext (which is most anyplace) see: http://struts.apache.org/2.0.14/docs/what-is-the-actioncontext.html
Anyways, I would implement a basic interceptor which would check if new data should be available and I have not looked it up already, then load the new data (the user triggering this interceptor may not need this new data, so doing this in a new thread would be a good idea).
This method increases the complexity, as you are responsible for managing some data structures and making them co-operate with the ORM.
I've done this to load data from tables which will never need to be loaded again, and that data stands on it's own (I don't need to find relationships between it and other tables). This is quick and dirty, Stevens solution is far more robust and probably would pay you back at a later date when further performance is a requirement.

This isn't really specific to Struts2 at all. You definitely do not want to try storing this information in the ActionContext -- that's a per-request object.
You should look into a caching framework like EHCache or something similar. If you use Hibernate for your persistence, Hibernate has options for caching data so that it does not need to hit the database on every request. (Hibernate can also use EHCache for its second-level cache).

As mentioned earlier, the best approach would be using EHCache or some other trusted cache manager.
Another approach is to use a factory to access the information. For instance, something to the effect of:
public class MyCache {
private static MyCache cache = new MyCache();
public static MyCache getCache() {
return cache;
}
(data members)
private MyCache() {
(update data members)
}
public synchronized getXXX() {
...
}
public synchronized setXXX(SomeType data) {
...
}
}
You need to make sure you synchronize all your reads and writes to make sure you don't have race conditions while updating the cache.
synchronized (MyCache.getCahce()) {
MyCahce.getCache().getXXX();
MyCache.getCache().getTwo();
...
}
etc
Again, better to use EHCache or something else turn-key since this is likely to be fickle without good understanding of the mechanisms. This sort of cache also has performance issues since it only allows ONE thread to read/write to the cache at a time. (Possible ways to speed up are to use thread locals and read/write locks - but that sort of thing is already built into many of the established cache managers)

Hibernate lazy-load application design

I tend to use Hibernate in combination with Spring framework and it's declarative transaction demarcation capabilities (e.g., #Transactional).
As we all known, hibernate tries to be as non-invasive and as transparent as possible, however this proves a bit more challenging when employing lazy-loaded relationships.
I see a number of design alternatives with different levels of transparency.
Make relationships not lazy-loaded (e.g., fetchType=FetchType.EAGER)
This vioalites the entire idea of lazy loading ..
Initialize collections using Hibernate.initialize(proxyObj);
This implies relatively high-coupling to the DAO
Although we can define an interface with initialize, other implementations are not guaranteed to provide any equivalent.
Add transaction behaviour to the persistent Model objects themselves (using either dynamic proxy or #Transactional)
I've not tried the dynamic proxy approach, although I never seemed to get #Transactional working on the persistent objects themselves. Probably due to that hibernate is operation on a proxy to bein with.
Loss of control when transactions are actually taking place
Provide both lazy/non-lazy API, e.g, loadData() and loadDataWithDeps()
Forces the application to know when to employ which routine, again tight coupling
Method overflow, loadDataWithA(), ...., loadDataWithX()
Force lookup for dependencies, e.g., by only providing byId() operations
Requires alot of non-object oriented routines, e.g., findZzzById(zid), and then getYyyIds(zid) instead of z.getY()
It can be useful to fetch each object in a collection one-by-one if there's a large processing overhead between the transactions.
Make part of the application #Transactional instead of only the DAO
Possible considerations of nested transactions
Requires routines adapted for transaction management (e.g., suffiently small)
Small programmatic impact, although might result in large transactions
Provide the DAO with dynamic fetch profiles, e.g., loadData(id, fetchProfile);
Applications must know which profile to use when
AoP type of transactions, e.g., intercept operations and perform transactions when necessary
Requires byte-code manipulation or proxy usage
Loss of control when transactions are performed
Black magic, as always :)
Did I miss any option?
Which is your preferred approach when trying to minimize the impact of lazy-loaded relationships in your application design?
(Oh, and sorry for WoT)

As we all known, hibernate tries to be as non-invasive and as transparent as possible
I would say the initial assumption is wrong. Transaparent persistence is a myth, since application always should take care of entity lifecycle and of size of object graph being loaded.
Note that Hibernate can't read thoughts, therefore if you know that you need a particular set of dependencies for a particular operation, you need to express your intentions to Hibernate somehow.
From this point of view, solutions that express these intentions explicitly (namely, 2, 4 and 7) look reasonable and don't suffer from the lack of transparency.

I am not sure which problem (caused by lazyness) you're hinting to, but for me the biggest pain is to avoid losing session context in my own application caches. Typical case:
object foo is loaded and put into a map;
another thread takes this object from the map and calls foo.getBar() (something that was never called before and is lazy evaluated);
boom!
So, to address this we have a number of rules:
wrap sessions as transparently as possible (e.g. OpenSessionInViewFilter for webapps);
have common API for threads/thread pools where db session bind/unbind is done somewhere high in the hierarchy (wrapped in try/finally) so subclasses don't have to think about it;
when passing objects between threads, pass IDs instead of objects themselves. Receiving thread can load object if it needs to;
when caching objects, never cache objects but their ids. Have an abstract method in your DAO or manager class to load the object from 2nd level Hibernate cache when you know the ID. The cost of retrieving objects from 2nd level Hibernate cache is still far cheaper than going to DB.
This, as you can see, is indeed nowhere close to non-invasive and transparent. But the cost is still bearable, to compare with the price I'd have to pay for eager loading. The problem with latter is that sometimes it leads to the butterfly effect when loading single referenced object, let alone a collection of entities. Memory consumption, CPU usage and latency to mention the least are also far worse, so I guess I can live with it.

A very common pattern is to use OpenEntityManagerInViewFilter if you're building a web application.
If you're building a service, I would open the TX on the public method of the service, rather than on the DAOs, as very often a method requires to get or update several entities.
This will solve any "Lazy Load exception". If you need something more advanced for performance tuning, I think fetch profiles is the way to go.

The best place to store large data retrieved by a java servlet (Tomcat)

I have the java servlet that retrieves data from a mysql database. In order to minimize roundtrips to the database, it is retrieved only once in init() method, and is placed to a HashMap<> (i.e. cached in memory).
For now, this HashMap is a member of the servlet class. I need not only store this data but also update some values (counters in fact) in the cached objects of underlying hashmap value class. And there is a Timer (or Cron task) to schedule dumping these counters to DB.
So, after googling i found 3 options of storing the cached data:
1) as now, as a member of servlet class (but servlets can be taken out of service and put back into service by the container at will. Then the data will be lost)
2) in ServletContext (am i right that it is recommended to store small amounts of data here?)
3) in a JNDI resource.
What is the most preferred way?

Put it in ServletContext But use ConcurrentHashMap to avoid concurrency issues.

From those 3 options, the best is to store it in the application scope. I.e. use ServletContext#setAttribute(). You'd like to use a ServletContextListener for this. In normal servlets you can access the ServletContext by the inherited getServletContext() method. In JSP you can access it by ${attributename}.
If the data is getting excessive large that it eats too much of Java's memory, then you should consider a 4th option: use a cache manager.

The most obvious way would be use something like ehcache and store the data in that. ehcache is a cache manager that works much like a hash map except the cache manager can be tweaked to hold things in memory, move them to disk, flush them, even write them into a database via a plugin etc. Depends if the objects are serializable, and whether your app can cope without data (i.e. make another round trip if necessary) but I would trust a cache manager to do a better job of it than a hand rolled solution.

If your cache can become large enough and you access it often it'll be reasonable to utilize some caching solution. For example ehcache is a good candidate and easily integrated with Spring applications, too. Documentation is here.
Also check this overview of open-source caching solutions for Java.

Open Session In View Pattern

I'm asking this question given my chosen development frameworks of JPA (Hibernate implementation of), Spring, and <insert MVC framework here - Struts 1, Struts 2, Spring MVC, Stripes...>.
I've been thinking a bit about relationships in my entity layer - for example I have an order entity that has many order lines. I've set up my app so that it eagerly loads the order lines for every order. Do you think this is a lazy way to get around the lazy initialization problems that I would come across if I was to set the fetch strategy to false?
The way I see it, I have the following alternatives when retrieving entities and their associations:
Use the Open Session In View pattern to create the session on each request and commit the transaction before returning the response.
Implement a DTO (Data Transfer Object) layer such that every DAO query I execute returns the correctly initialized DTO for my purposes. I don't really like this option much because in my experience I've found that it creates a lot of boilerplate copying code and becomes messy to maintain.
Don't map any associations in JPA so that every query I execute returns only the entities I'm interested in - this will probably require me to have DTOs anyway and will be a pain to maintain and I think defeats the purpose of having an ORM in the first place.
Eagerly fetch all (or most associations) - in the example above, always fetch all order lines when I retrieve an order.
So my question is, when and under what circumstances would you use which of these options? Do you always stick with one way of doing it?
I would ask a colleague but I think that if I even mentioned the term 'Open Session in View' I would be greeted with blank stares :( What I'm really looking for here is some advice from a senior or very experienced developer.
Thanks guys!

Open Session in View has some problems.
For example, if the transaction fails, you might know it too late at commit time, once you are nearly done rendering your page (possibly the response already commited, so you can't change the page !) ... If you had know that error before, you would have followed a different flow and ended up rendering a different page...
Other example, reading data on-demand might turn to many "N+1 select" problems, that kill your performance.
Many projects use the following path:
Maintain transactions at the business layer ; load at that point everything you are supposed to need.
Presentation layer runs the risk of LazyExceptions : each is considered a programming error, caught during tests, and corrected by loading more data in the business layer (you have the opportunity to do it efficiently, avoiding "N+1 select" problems).
To avoid creating extra classes for DTOs, you can load the data inside the entity objects themselves. This is the whole point of the POJO approach (uses by modern data-access layers, and even integration technologies like Spring).

I've successfully solved all my lazy initialization problems with Open Session In View -pattern (ie. the Spring implementation). The technologies I used were the exact same as you have.
Using this pattern allows me to fully map the entity relationships and not worry about fetching child entities in the dao. Mostly. In 90% of the cases the pattern solves the lazy initialization needs in the view. In some cases you'll have to "manually" initialize relationships. These cases were rare and always involved very very complex mappings in my case.
When using Open Entity Manager In View pattern it's important to define the entity relationships and especially propagation and transactional settings correctly. If these are not configured properly, there will be errors related to closed sessions when some entity is lazily initialized in the view and it fails due to the session having been closed already.
I definately would go with option 1. Option 2 might be needed sometimes, but I see absolutely no reason to use option 3. Option 4 is also a no no. Eagerly fetching everything kills the performance of any view that needs to list just a few properties of some parent entities (orders in tis case).
N+1 Selects
During development there will be N+1 selects as a result of initializing some relationships in the view. But this is not a reason to discard the pattern. Just fix these problems as they arise and before delivering the code to production. It's as easy to fix these problems with OEMIV pattern as it's with any other pattern: add the proper dao or service methods, fix the controller to call a different finder method, maybe add a view to the database etc.

I have successfully used the Open-Session-in-View pattern on a project. However, I recently read in "Spring In Practice" of an interesting potential problem with non-repeatable reads if you manage your transactions at a lower layer while keeping the Hibernate session open in the view layer.
We managed most of our transactions in the service layer, but kept the hibernate session open in the view layer. This meant that lazy reads in the view were resulting in separate read transactions.
We managed our transactions in our service layer to minimize transaction duration. For instance, some of our service calls resulted in both a database transaction and a web service call to an external service. We did not want our transaction to be open while waiting for a web service call to respond.
As our system never went into production, I am not sure if there were any real problems with it, but I suspect that there was the potential for the view to attempt to lazily load an object that has been deleted by someone else.

There are some benefits of DTO approach though. You have to think beforehand what information you need. In some cases this will prevent you from generating n+1 select statements. It helps also to see where to use eager fetching and/or optimized views.

I'll also throw my weight behind the Open-Session-in-View pattern, having been in the exact same boat before.
I work with Stripes without spring, and have created a manual filter before that tends to work well. Coding transaction logic on the backend turns messy really quick as you've mentioned. Eagerly fetching everything becomes TERRIBLE as you map more and more objects to each other.
One thing I want to add that you may not have come across is Stripersist and Stripernate - Stripersist being the more JPA flavor - auto-hydration filters that take a lot of the work off your shoulders.
With Stripersist you can say things like /appContextRoot/actions/view/3 and it will auto-hydrate the JPA Entity on the ActionBean with id of 3 before the event is executed.
Stripersist is in the stripes-stuff package on sourceforge. I now use this for all new projects, as it's clean and easily supports multiple datasources if necessary.

Does the Order and Order Lines compose a high volume of data? Do they take part in online processes where real-time response is required? If so, you might consider not using eager fetching - it does make a huge diference in performance. If the amount of data is small, there is no problem in eager fetching.
About using DTOs, it might be a viable implementation.
If your business layer is used internally by your own application (i.e a small web app and its business logic) it'd probably be best to use your own entities in your view with open session in view pattern since it's simpler.
If your entities are used by many applications (i.e a backend application providing a service in your corporation) it'd be interesting to use DTOs since you would not expose your model to your clients. Exposing it could mean you would have a harder time refactoring your model since it could mean breaking contracts with your clients. A DTO would make that easier since you have another layer of
abstraction. This can be a bit strange since EJB3 would theorically eliminate the need of DTOs.

Hibernate transaction problem

We are using Hibernate Spring MVC with OpenSessionInView filter.
Here is a problem we are running into (pseudo code)
transaction 1
load object foo
transaction 1 end
update foo's properties (not calling session.save or session.update but only foo's setters)
validate foo (using hibernate validator)
if validation fails ?
go back to edit screen
transaction 2 (read only)
load form backing objects from db
transaction 2 end
go to view
else
transaction 3
session.update(foo)
transaction 3 end
the problem we have is if the validation fails
foo is marked "dirty" in the hibernate session (since we use OpenSessionInView we only have one session throughout the http request), when we load the form backing objects (like a list of some entities using an HQL query), hibernate before performing the query checks if there are dirty objects in the session, it sees that foo is and flushes it, when transaction 2 is committed the updates are written to the database.
The problem is that even though it is a read only transaction and even though foo wasn't updated in transaction 2 hibernate doesn't have knowledge of which object was updated in which transaction and doesn't flush only objects from that transaction.
Any suggestions? did somebody ran into similar problem before
Update: this post sheds some more light on the problem: http://brian.pontarelli.com/2007/04/03/hibernate-pitfalls-part-2/

You can run a get on foo to put it into the hibernate session, and then replace it with the object you created elsewhere. But for this to work, you have to know all the ids for your objects so that the ids will look correct to Hibernate.

There are a couple of options here. First is that you don't actually need transaction 2 since the session is open you could just load the backing objects from the db, thus avoiding the dirty check on the session. The other option is to evict foo from the session after it is retrieved and later use session.merge() to reattach it when you what your changes to be stored.
With hibernate it is important to understand what exactly is going on under the covers. At every commit boundary it will attempt to flush all changes to objects in the current session regardless of whether or not the changes where made in the current transaction or any transaction at all for that matter. This is way you don't actually need to call session.update() for any object that is already in the session.
Hope this helps

There is a design issue here. Do you think an ORM is a transparent abstraction of your datastore, or do you think it's a set of data manipulation libraries? I would say that Hibernate is the former. Its whole reason for existing is to remove the distinction between your in-memory object state and your database state. It does provide low-level mechanisms to allow you to pry the two apart and deal with them separately, but by doing so you're removing a lot of Hibernate's value.
So very simply - Hibernate = your database. If you don't want something persisted, don't change your persistent objects.
Validate your data before you update your domain objects. By all means validate domain objects as well, but that's a last line of defense. If you do get a validation error on a persistent object, don't swallow the exception. Unless you prevent it, Hibernate will do the right thing, which is to close the session there and then.

What about using Session.clear() and/or Session.evict()?

What about setting singleSession=false on the filter? That might put your operations into separate sessions so you don't have to deal with the 1st level cache issues. Otherwise you will probably want to detach/attach your objects manually as the user above suggests. You could also change the FlushMode on your Session if you don't want things being flushed automatically (FlushMode.MANUAL).

Implement a service layer, take a look at spring's #Transactional annotation, and mark your methods as #Transactional(readOnly=true) where applicable.
Your flush mode is probably set to auto, which means you don't really have control of when a DB commit happens.
You could also set your flush mode to manual, and your services/repos will only try to synchronize the db with your app when you tell them to.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.