Too much information in HttpSession - Java

Hi, what do you think about this problem?
We have too much information in the HttpSession: a lot of it is computed, and a few large object graphs end up having to be stored between requests.
Is it appropriate to use a cache like memcache for this? Or does that amount to the same thing as increasing the JVM's memory?
We are wary of storing it in the DB between requests. What would you use if we are getting an OutOfMemoryError?
Thank you.

I think the real point is the lifespan of your data.
Think about these two characteristics of the HttpSession:
When in a cluster, the container is responsible for replicating the HttpSession. This is good (you don't have to manage this yourself), but can be dangerous in terms of performance if it leads to too many exchanges... If your application is not clustered, forget about this point.
The lifespan of the HttpSession can be a few minutes or a few hours, that is, as long as the user stays active. This is perfect for information that has that lifespan (connection information, preferences, authorizations...). But it is not appropriate for data that is only useful from one screen to the next; let's call it transient data.
Storing in the database gives an even longer lifespan (persistent between sessions, and even between reboots!), so the problem would be even worse, except that you trade a memory problem for a performance problem. If you have clustering needs, the database takes care of replication for you; but beware, you then can't cache anything in memory. I think this is the wrong approach for data whose lifespan is not expected to be persistent...
Transient data
If data is useful for only one request, it is typically stored in the HttpRequest; fine.
But if it is used across a few requests (interactions within one screen, or within a screen sequence such as a wizard...), the HttpRequest is too short-lived to store it, while the HttpSession is too long-lived: such data needs to be cleaned up regularly.
And many memory problems in the HttpSession are related to such transient data that was never cleaned up (forgotten entirely, not cleaned when an Exception occurred, or not cleaned when the user didn't follow the regular flow: hitting Back, using an old bookmark, clicking on a different menu, or whatever).
Caching library to have the correct lifespan
To avoid this cleaning effort altogether (and avoid the risks of OutOfMemory when things go wrong), you can store information in a data structure that has the right lifespan. As the container doesn't provide this (it is application-related anyway), you need to implement this yourself using a cache library (like the ones mentioned; we use EhCache).
The idea is to have technical code (not tied to one functional page, but implemented globally, for example in a ServletFilter...) that ensures the cleaning always happens once the objects are no longer needed.
You can design this cache using one (or several, as needed) of the following cleaning policies. Each policy relates to a functional lifespan:
for data related to only one screen (but several requests: reloading the screen, Ajax requests...), the cache can store data for only one screen at a time (per session); call it the "currentScreenCache". That guarantees that, if the user goes to another screen (even in an unmanaged way), the new screen overrides the "currentScreenCache" information, and the previous information becomes garbage-collectable.
Implementation idea: each request must carry its screenId, and the technical code responsible for clearing the cache detects when, for the current HttpSession id, the current screenId doesn't match the one in the cache. It then cleans or resets that cache entry (see the sketch after this list).
for data only used in a series of connected screens (call it a functional module), the same applies at the level of the module.
Implementation: same as before, except that every request has to carry the module id...
for data that is expensive to recompute, the cache library can be configured to keep only the last X computed values (older ones are considered less likely to be useful in the near future). In typical usage the same values are requested regularly, so you get many cache hits. Under intensive use the X limit is reached and memory doesn't inflate, preventing OutOfMemory errors (at the cost of recomputation the next time).
Implementation: cache libraries natively support this limiting factor, and several more...
for data that is only valid for a few minutes, the cache library can natively be configured to discard it after that delay...
... many more, see the caching library configuration for other ideas.
Note: each cache can be application-wide, or specific to a user, an HttpSession id, a Company id, or some other functional value...
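As an illustration of the first policy, here is a minimal sketch of such a cleaning filter. It is only a sketch: the screenId request parameter and the attribute names are made up for the example, not taken from any library.

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpSession;

    // Hypothetical filter: drops the per-session "currentScreenCache"
    // whenever a request arrives for a different screen.
    public class ScreenCacheCleaningFilter implements Filter {

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpSession session = ((HttpServletRequest) req).getSession();
            String screenId = req.getParameter("screenId"); // every request carries it
            String current = (String) session.getAttribute("currentScreenId");
            if (screenId != null && !screenId.equals(current)) {
                // The user moved to another screen (even in an unmanaged way):
                // the old screen's data becomes eligible for garbage collection.
                session.removeAttribute("currentScreenCache");
                session.setAttribute("currentScreenId", screenId);
            }
            chain.doFilter(req, res);
        }

        public void init(FilterConfig config) { }
        public void destroy() { }
    }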

It's true that the HttpSession doesn't scale well, but that's mainly in relation to clustering. It's a convenience, but at some point, yes, you are better off using something like memcache, Terracotta, or EhCache to persist data between requests (or between users).

Related

Is it worth trying not to hit database as much as possible?

The data in the db changes every 14 seconds, but there may be many calls from clients fetching the same data in that window. So in my servlet I implemented logic like
// lastFetchedData and lastFetchTime are servlet fields
if (System.currentTimeMillis() - lastFetchTime > 3000) {
    lastFetchedData = fetchData();
    lastFetchTime = System.currentTimeMillis();
}
return lastFetchedData;
But when I measured, fetching the data from the db takes only a few milliseconds, so what I did is probably already being done by MySQL's own caching.
Is this an unnecessary optimization? Because with my "optimization", in some rare cases a client may receive data that is 2-3 seconds staler than it should be.
From an architecture point of view, caching is a good idea in your case: you have data that does not change constantly, so it makes no sense to bother the DB for it. The DB is an expensive resource, so caching makes a lot of sense.
This is very important for scalability: with hundreds of requests you are OK, but if your app gets millions of requests, the lack of caching could lead to many problems.
How to implement the cache properly is a good question. The code you provided is the simplest approach; I would probably implement a separate caching mechanism that takes care of updating in a separate thread and is shared at the application level, so the servlets don't need to care about the update period.
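A minimal sketch of that idea, assuming a single JVM (fetchData() stands in for your existing DB query):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Application-level cache: a single background thread refreshes the
    // data every 3 seconds; servlets only ever read the volatile field.
    public class DataCache {

        private static volatile Object lastFetchedData;
        private static final ScheduledExecutorService refresher =
                Executors.newSingleThreadScheduledExecutor();

        public static void start() { // call once, e.g. from a ServletContextListener
            refresher.scheduleAtFixedRate(
                    () -> lastFetchedData = fetchData(), 0, 3, TimeUnit.SECONDS);
        }

        public static Object get() {
            return lastFetchedData; // never blocks on the DB
        }

        public static void stop() { // call on application shutdown
            refresher.shutdown();
        }

        private static Object fetchData() {
            // ... your existing DB query goes here ...
            return new Object();
        }
    }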
Actually, you can determine in your app when the change happens: first compare new and old data with the cache turned off, and once you see a change, cache for the next 13 seconds, then turn the cache off again until you see the data change. That way you wouldn't serve wrong data.

Is it a bad idea to keep an EntityManager open for the duration of the application lifetime?

I am writing an application in Java SE 8 and have recently migrated the database layer from raw JDBC code to JPA. The interface itself is much simpler, but I am running into an issue with the way I have designed my code, which does not work well with JPA, and I am unsure how to proceed.
The primary issue I am having is that I can no longer hold references to my entities in code for any period of time, because they immediately become out of date. I used to have a central persistence context where the one "true" instance of each of my entities was always stored, and changes made to them were reflected everywhere because there were no duplicate instances. I realize this is not smart design in terms of memory efficiency, but it allowed me to, for instance, implement the observer pattern and guarantee that any entity update would be immediately visible in the GUIs.

But now, as soon as I load an entity from the database using JPA and close the EntityManager (as I have read so often that you must do), that instance merely represents a snapshot of the moment it was loaded, and my GUIs end up waiting for updates from a dead object. Loading the same entity elsewhere in the code and changing it does nothing for them, as it is a different instance altogether, with an empty (transient) list of subscribers. There are many more places in my code where I try to hold a reference to an entity for some purpose, and many of them rely on those entities staying up to date.
I know that the EntityManager is intended to be a short-lived object, but now I am thinking that it might not be such a bad idea after all to keep one EntityManager open for the lifetime of my program, to replace the construct I had in my old code. Quite frankly, I don't understand the point of closing the EntityManager so quickly - isn't it beneficial to have your entities managed over a longer period of time? When I first read about how changes to managed entities are detected and persisted automatically, I hoped that would let me completely decouple my business logic from my persistence layer and trust that all my changes were being saved. It was rather disillusioning to discover that, for those entities to be managed in the first place, I would have to leave the EntityManager open for the duration of that business logic. And that would require it to be scoped higher than the method it is created in, so I could close it later. Yet all the literature implores the use of short-lived, narrowly scoped EntityManagers, which seems like a direct contradiction.
I am somewhat at a loss as to how to proceed. I would love to make full use of JPA and all of its extremely useful features, but I feel like I might be missing the point of the EntityManager being short-lived. It seems like it would be so much more convenient long-lived. Can anyone give me some guidance?
Your central 'cache' with a single instance of each piece of data is a common idea, but it is difficult to manage. Some ORM/JPA providers have caching built in and maintain something similar (check out EclipseLink's shared cache), but they use complex mechanisms to limit and manage what could otherwise be an endless amount of data that quickly becomes stale. EclipseLink has tie-ins to the database to get notified when data changes, and it can be configured for cache coordination when run on different servers. Without such capabilities your cache will go stale - and worse, it will have great difficulty maintaining transactional isolation. Any change to those cached objects is immediately visible to all processes, regardless of whether the transaction goes through to the database or rolls back. JPA, by contrast, is meant to guarantee that you only ever see committed data (excluding the changes you've made in the current transaction/unit of work).
To answer your specific question about keeping an EM open, as it applies to JPA providers generally: EntityManagers keep hooks to the entities read in through them so that they can track and manage all changes made to those entities. This can lead to very large amounts of data being held - check the forums for memory-leak questions; keeping EMs open for an extended period is the cause of quite a few. You gain object identity, but it comes at the cost of tracking everything read in through the EM, so you will likely have to clear its memory occasionally (em.clear()) at key points, or find provider-specific mechanics to dereference what it might be holding onto so the GC can do its thing.
Another drawback is that the EntityManager itself then becomes very large and difficult to merge changes into. Depending on how changes enter your app, you'll need a way to get them into your database; having JPA walk through a very large set of entities, built up over time, just to find changes in a small dataset is very inefficient. And you'll still have to find ways to refresh those entities if changes are made through other EntityManagers or applications.
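For contrast, the commonly recommended shape is a short-lived EntityManager per unit of work. A minimal sketch, assuming a Java SE bootstrap and a hypothetical Customer entity (both names are placeholders, not from your code):

    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;

    // The EntityManagerFactory is the long-lived object; each unit of
    // work opens its own EntityManager and closes it when done.
    public class RenameCustomerService {

        private final EntityManagerFactory emf; // created once at startup

        public RenameCustomerService(EntityManagerFactory emf) {
            this.emf = emf;
        }

        public void renameCustomer(long id, String newName) {
            EntityManager em = emf.createEntityManager();
            try {
                em.getTransaction().begin();
                Customer c = em.find(Customer.class, id); // managed within this unit
                c.setName(newName);                       // change is tracked
                em.getTransaction().commit();             // flushed on commit
            } finally {
                em.close(); // after this, c is a detached snapshot
            }
        }
    }

Your GUIs would then refresh from freshly loaded instances (or an event bus) rather than from long-lived managed objects.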

Design ideas about application level context cache (similar to session/application context in a web)

We have a web application which involves creating some heavy Utility Objects (in terms of memory footprint). They can be used on any application tier, and they are specific to a user. So ideally what should happen is: create these objects when the user logs in, cache them 'somewhere', and reuse them wherever needed.
The options available right now are the Session and Application scopes, but these are not available to all the tiers. One way is to pass the objects along to subsequent tiers, but that would violate the Separation of Concerns approach: the other tiers would need to know about the web tier.
Another approach is to use a static utility class to cache these objects. Something like
MyUtilObject myObject = MyUtilCache.getMyUtilObject(userName);
Internally it would be backed by something like a HashMap (possibly holding soft references). These objects would be cleaned out on user logout or session expiry.
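For illustration, a rough sketch of what we have in mind (assuming Java 8's computeIfAbsent is available and that MyUtilObject has a constructor taking the user name; both are only illustrative):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Illustrative only: application-wide cache of per-user utility objects.
    public class MyUtilCache {

        private static final ConcurrentMap<String, MyUtilObject> CACHE =
                new ConcurrentHashMap<>();

        public static MyUtilObject getMyUtilObject(String userName) {
            // build the heavy object once per user, then reuse it on any tier
            return CACHE.computeIfAbsent(userName, MyUtilObject::new);
        }

        public static void remove(String userName) {
            CACHE.remove(userName); // call this on logout or session expiry
        }
    }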
Here is what we are using: JBoss, Struts 1.2, Spring. All the tiers are on the same machine (in a single runtime).
Please share your thoughts/approaches on this.
All you need is an interface that is common to all tiers. The implementation can be backed by the Session and injected wherever it is required.
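A minimal sketch of that shape (the names are illustrative, and the wiring could be done with the Spring you already use):

    import javax.servlet.http.HttpSession;

    // Interface visible to every tier; nothing web-specific leaks through it.
    public interface UserObjectStore {
        Object get(String key);
        void put(String key, Object value);
    }

    // Web-tier implementation backed by the HttpSession; inject it
    // wherever the other tiers need access to the cached objects.
    public class SessionUserObjectStore implements UserObjectStore {

        private final HttpSession session;

        public SessionUserObjectStore(HttpSession session) {
            this.session = session;
        }

        public Object get(String key) {
            return session.getAttribute(key);
        }

        public void put(String key, Object value) {
            session.setAttribute(key, value);
        }
    }

The lower tiers only ever see UserObjectStore, so nothing about the web tier leaks downward.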
creating some heavy Utility Objects
Explain in detail: when the user logs in, what kind of heavy objects are stored, and over what period of time?
It is always good to have less data in the HttpSession. A typical web application session lasts for seconds or minutes per user. Mostly it should contain authorization data, user preferences, connection info, and state; it should not contain data that is only useful from the current page to the next. Of course you can store too much data - for the session, the heap is the limit - but there can also be race conditions if you store things for a long time. If the data is required only for the current request, keep it in the HttpRequest. Otherwise you have second-level caches such as EhCache, memcache, or Terracotta: apply policies such as expiring entries when the session expires, and make the cache available to all pages. Also pass a unique id between pages that identifies the cache entry.
Another approach is to use a static utility class to cache these objects
A static class is common to all users; how can you have a static class specific to a user?
Why not serialize/deserialize the objects as and when required? When the objects are needed, deserialize them, and at session close serialize them again.
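A minimal sketch of that serialize-on-close idea (the file location and naming are made up; real code would also need to handle IO failures and concurrent sessions):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    // Parks a user's heavy object on disk between uses instead of
    // keeping it on the heap for the whole session.
    public class UtilObjectStore {

        public static void save(String userName, Serializable obj) throws IOException {
            try (ObjectOutputStream out =
                         new ObjectOutputStream(new FileOutputStream(fileFor(userName)))) {
                out.writeObject(obj); // e.g. at session close
            }
        }

        public static Object load(String userName)
                throws IOException, ClassNotFoundException {
            try (ObjectInputStream in =
                         new ObjectInputStream(new FileInputStream(fileFor(userName)))) {
                return in.readObject(); // e.g. when the object is needed again
            }
        }

        private static File fileFor(String userName) {
            return new File(System.getProperty("java.io.tmpdir"), userName + ".ser");
        }
    }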

Retrieve all essential data on startup

In a Java web application, I would like to know whether it is a proper (or "standard"?) approach to load all the essential data - such as config data, message data, code maintenance data, dropdown option data, etc. (assuming none of it is updated frequently) - from the database into "static" variables at server startup. Or is it preferable to retrieve the data by querying the db per request?
Thanks for all your advice here.
It is perfectly valid to pull all the data that is not going to be modified during the application life-cycle out of the database and keep it in memory as a singleton or something similar.
This is a good idea because it saves DB hits and retrieval is faster. A lot of environment-specific settings and other data can also be pulled once and kept in an immutable hash map for any future request.
In a common web app you generally do not have so many config data/option objects that they eat up a lot of memory and cause an OOM. But if you have a table with hundreds of thousands of config rows, you are better off pulling the objects as and when requested. And if you do want to keep that much in memory, think of putting it in a key-value store like MemcacheD.
We used the DB to store config values and EhCache to avoid a lot of DB hits. This way you don't need to worry much about memory consumption (the cache works within whatever memory you give it).
EhCache is one of many available DB cache solutions and can be configured on top of JPA etc.
You can configure EhCache (or many other cache providers) to treat the tables as read-only, in which case it only goes to the DB when explicitly told to invalidate the cache. This performs pretty well. The overhead becomes visible when reads occur very frequently (like 100/sec), but usually storing the config value in a local variable, avoiding reads inside loops, and passing it down through the method stack during the invocation mitigates this well enough.
Storing the values in a singleton as Java objects performs best, but if you want to modify them without an application restart, it becomes a little more involved.
Here is a simple way to achieve dynamic configuration with Java objects:
private volatile ImmutableMap<String, Object> param_value;
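Fleshed out a little as a sketch, with the JDK's Collections.unmodifiableMap standing in for Guava's ImmutableMap used above:

    import java.util.Collections;
    import java.util.Map;

    public class DynamicConfig {

        // Readers always see a consistent snapshot: a reload swaps in a
        // whole new immutable map instead of mutating the old one in place.
        private volatile Map<String, Object> param_value = Collections.emptyMap();

        public Object get(String key) {
            return param_value.get(key);
        }

        public void reload(Map<String, Object> freshValues) {
            param_value = Collections.unmodifiableMap(freshValues); // single atomic swap
        }
    }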
Basically you'll have to start thinking about multi-threaded access and memory issues (though it's quite unlikely that you'll run out of memory because of configuration values, unless you have binary data as config values, etc.).
In essence, I'd recommend using the DB plus some cache provider unless that part of the code really needs high performance.

The best place to store large data retrieved by a java servlet (Tomcat)

I have a Java servlet that retrieves data from a MySQL database. To minimize round trips to the database, the data is retrieved only once, in the init() method, and placed into a HashMap<> (i.e. cached in memory).
For now, this HashMap is a member of the servlet class. I need not only to store this data but also to update some values (counters, in fact) in the cached objects of the underlying map's value class. And there is a Timer (or cron task) scheduled to dump these counters to the DB.
So, after googling, I found 3 options for storing the cached data:
1) as now, as a member of the servlet class (but servlets can be taken out of service and put back into service by the container at will, and then the data would be lost);
2) in the ServletContext (am I right that it is recommended to store only small amounts of data here?);
3) in a JNDI resource.
What is the most preferred way?
Put it in the ServletContext, but use a ConcurrentHashMap to avoid concurrency issues.
Of those 3 options, the best is to store it in the application scope, i.e. use ServletContext#setAttribute(). You'd like to use a ServletContextListener for this. In normal servlets you can access the ServletContext via the inherited getServletContext() method. In JSP you can access it by ${attributename}.
If the data grows so excessively large that it eats too much of Java's memory, then you should consider a 4th option: use a cache manager.
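Putting option 2 and the counter dumping together, a minimal sketch using a ServletContextListener (the attribute name, counter map, and dump period are illustrative assumptions, not from your code):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;

    // Loads the data once at startup into application scope and schedules
    // the periodic dump of the counters back to the DB.
    public class CacheInitializer implements ServletContextListener {

        private final ScheduledExecutorService dumper =
                Executors.newSingleThreadScheduledExecutor();

        public void contextInitialized(ServletContextEvent sce) {
            ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
            // ... fill the map from MySQL here, as your init() does now ...
            sce.getServletContext().setAttribute("counters", counters);
            dumper.scheduleAtFixedRate(
                    () -> dumpToDb(counters), 5, 5, TimeUnit.MINUTES);
        }

        public void contextDestroyed(ServletContextEvent sce) {
            dumper.shutdown();
        }

        private void dumpToDb(ConcurrentMap<String, AtomicLong> counters) {
            // ... write the current counter values back to MySQL ...
        }
    }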
The most obvious way would be to use something like EhCache and store the data in that. EhCache is a cache manager that works much like a hash map, except that it can be tweaked to hold things in memory, move them to disk, flush them, even write them to a database via a plugin, etc. It depends on whether the objects are serializable, and whether your app can cope without the data (i.e. make another round trip if necessary), but I would trust a cache manager to do a better job of it than a hand-rolled solution.
If your cache can become large and you access it often, it is reasonable to use some caching solution. For example, EhCache is a good candidate and is easily integrated with Spring applications too. Documentation is here.
Also check this overview of open-source caching solutions for Java.
