Java web - sessions design for horizontal scalability

I'm definitely not an expert Java coder. I need to implement sessions in my Java Servlet-based web application, and as far as I know this is normally done through HttpSession. However, this method stores the session data locally on the server, and I don't want such a constraint on horizontal scalability. Therefore I thought of saving sessions in an external database with which the application communicates through a REST interface.
Basically, in my application there are users performing actions such as searches, so what I'm going to persist in sessions is essentially the login data and the metadata associated with searches.
As the main data storage I'm planning to use a NoSQL graph database. The question is: given that I could also use a database of another kind for sessions, which architecture fits this kind of situation best?
I have currently thought of two possible approaches. The first one uses another database (such as an SQL one) to store session data. This way I would have a more distributed workload, since I'm not also using the main storage for sessions. Moreover, I'd have a more organized environment, with session state variables and persistent ones kept separate.
The second approach instead consists of storing all information related to a session in the "user node" of the main database. The session ID would then be just a "shortcut" for authentication. This way I don't have to rely on a second database; however, I move all the workload to the main database and mix session data with persistent data.
Is there any standard, general architecture to which I can make reference? Am I missing some important point that should constrain my architecture?

Your idea to store sessions in a different location is good. How about using an in-memory cache like Memcached or Redis? Session data is generally not long-lived, so you have options other than a full-blown database. Memcached and Redis can both be clustered and can scale horizontally.
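For example, here is a minimal sketch of Redis-backed session storage using the Jedis client (the key naming scheme, the host, and the 30-minute TTL are assumptions for illustration, not a prescribed design):

    import redis.clients.jedis.Jedis;

    public class RedisSessionStore {
        private static final int TTL_SECONDS = 30 * 60; // assumed session timeout

        // Store one session attribute in a per-session Redis hash
        public void put(String sessionId, String attribute, String value) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.hset("session:" + sessionId, attribute, value);
                jedis.expire("session:" + sessionId, TTL_SECONDS); // refresh TTL on write
            }
        }

        // Read a session attribute; returns null once the session has expired
        public String get(String sessionId, String attribute) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                return jedis.hget("session:" + sessionId, attribute);
            }
        }
    }

Since every application server talks to the same Redis cluster, any server can handle any request, which is exactly the property you need for horizontal scaling.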

Related

Multiple independent H2 databases within one JVM

Is it possible to start up and shut down multiple H2 databases within a JVM?
My goal is to support multi-tenancy by giving each user/account their own database. Each account has very little data. Data between the accounts is never accessed together, compared, or grouped; each account is entirely separate from the others. Each account is only accessed briefly once a day or a few times a month. So there are few upsides to housing the data together in a single database, and some serious downsides.
So my idea is that when a user logs in to a particular account, that account's database is loaded. When that user logs out, or their web app session (Vaadin app) times out, that account's database is closed, its data flushed to storage, and possibly a backup performed. This opening and closing would be happening for any number of databases in parallel.
Benefits include minimizing the amount of memory in use at any one time for caching data and indexes, minimizing locking and other contention, and allowing for smooth scaling.
I'm new to H2, so I'm not sure if its architecture can support this. I'm asking for a denial or confirmation of this capability, along with any tips or caveats.
Yes, it is possible to do so. Each database is its own mini environment, with no possible pollution between databases.
You could, for example, use a JDBC URL based on the user's ID or login:
jdbc:h2:user1 in H2 1.3.x embedded mode
jdbc:h2:./user1 in H2 1.4.x embedded mode
jdbc:h2:tcp://localhost/user1 in tcp mode
You can use any naming convention for the database name, provided your OS allows it: user1, user2, etc., or indeed the login name itself.
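As an illustration, a minimal sketch of opening (or creating on first use) one embedded database per account; the helper name and the sa/empty-password credentials are assumptions, and the URL should match your H2 version and mode from the list above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class AccountDatabases {
        // Opens the embedded H2 database for one account, creating it on first use.
        // IFEXISTS=TRUE must not be set, or first-time creation fails (see the caveats below).
        public static Connection openAccountDb(String login) throws SQLException {
            String url = "jdbc:h2:./" + login; // H2 1.4.x embedded mode
            return DriverManager.getConnection(url, "sa", "");
        }
    }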
Tips:
use the server mode rather than the embedded mode, allowing multiple connections for the same user from multiple sessions/hosts
have a schema migrator (like Flyway) to initialize each newly created database
ensure you manage name collisions at the top level of your app, and possibly store these databases and the corresponding logins in a dedicated database as well
Caveats:
do not use a connection pool as connections will be difficult to reuse
make sure IFEXISTS=TRUE is not used on the server, otherwise new databases cannot be created on first connection
avoid tweaks on the JDBC URL, like setting LOG=0, UNDO_LOG=0, etc.
I do not know whether your OS or the JVM puts a limit on how many database files can be opened like this, nor whether such a limit can be tweaked; I could not find anything about it in the manual.
Please refer to the H2 manual if in doubt about URL parameters.

Any benefit from using Hazelcast instead of MongoDB to store user sessions/keys?

We are running a MongoDB instance to store data in collections. No problems with it, and Mongo is our main data storage.
Today we are going to develop OAuth2 support for the product and have to store user sessions (security key, access token, etc.). The access token has to be re-validated against the authentication server only after a defined timeout, so that not every request waits for validation by the authentication server.
The first request for a secured resource (create) shall always be authenticated against the authentication server. Any subsequent request will be validated internally against the cache and its internal timeout; only if it has expired will another request be issued to the authentication server.
To satisfy these requirements we have to introduce some kind of distributed cache that stores the user sessions with TTL support and expires them based on the TTL, as I wrote above.
Two options here:
store user sessions in Hazelcast and share them across all app servers - a nice choice, persisting all user sessions in an eviction map.
store user sessions in MongoDB - and do the same.
Do you see any benefits of using Hazelcast instead of storing the temporary data inside Mongo? Any significant performance improvements you're aware of?
I'm new to Hazelcast, so I'm not aware of all the killer features.
Disclaimer: I am the founder of Hazelcast...
Hazelcast is much simpler and simplicity matters a lot.
You can embed Hazelcast into your application (if your application is written in Java). No need to deploy and maintain a remote NoSQL cluster.
Hazelcast works directly with your application objects. No JSON or any other format; write and read Java objects.
You can execute Java code on your in-memory data. No need to fetch and process data; send your code over to the data.
You can listen for updates on your data: "Notify me when this map or key is updated".
Hazelcast has a rich set of data structures like queues, topics, semaphores, locks, multimaps, etc. Imagine sharing a queue across multiple nodes and being able to do blocking queue poll/take operations... this is really cool :)
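To make the embedded usage concrete, here is a minimal sketch of a token cache with per-entry TTL, matching the timeout-based revalidation described in the question (the map name, the 10-minute TTL, and the Hazelcast 3.x package names are assumptions for illustration):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import java.util.concurrent.TimeUnit;

    public class TokenCache {
        private final IMap<String, String> sessions;

        public TokenCache() {
            // Starts an embedded member; members on other app servers
            // discover each other and share the same distributed map.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            this.sessions = hz.getMap("oauth-sessions");
        }

        // Cache a validated token; the entry expires automatically after the TTL,
        // forcing the next request to revalidate against the auth server.
        public void cacheToken(String accessToken, String userId) {
            sessions.put(accessToken, userId, 10, TimeUnit.MINUTES);
        }

        public String lookup(String accessToken) {
            return sessions.get(accessToken); // null once the TTL has expired
        }
    }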
Hazelcast is an in-memory data grid, so it should be significantly faster than MongoDB for that kind of usage. They also have pre-made session clustering code for Java servlets if you do not want to create that yourself.
The code for the session clustering is available on GitHub, and also as a Maven artifact.

Collection stored in application session turns null

I built a server-client Android application in which users register with a server; every user profile is saved in the DB and in a collection pool that I'm keeping in the application session.
The reason I'm keeping all the existing users in the session is that users are frequently shown updates about the other users, and reading from the Google App Engine datastore that often would cost me a lot.
This worked OK for the first month, but today, with 1000 profiles in my session, things started to turn bad. All of a sudden the collection of profiles became null, and I had to "reboot" the server for the profiles to be loaded into the session again.
I guess that my design of saving all the profiles in the session is wrong. Am I right?
Or might something else have caused the null collection?
Is there any design pattern I can use to do things right?
I don't know the internals of sessions and how they are handled in GAE well enough, but my first guess is that the session gets too big and fails to serialize properly between requests.
My suggestion is to use the MemCache for storing the list of users. A typical pattern would be to store the tables you need in a database (that would include the collection pool per user you mentioned). Then, on first creation of a session, or whenever you actually need it, load the data from the DB into the MemCache.
Once it's cached you can use it directly without additional database queries. If done right, this is very robust and shouldn't get you into performance trouble either. Just keep in mind that the MemCache can be purged by the system at any time, so you need to ensure you load the data from the DB whenever the MemCache hasn't been initialized or has been reset.
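A minimal sketch of that pattern with the App Engine MemcacheService (the cache key, the one-hour expiry, and the Profile stand-in class are hypothetical; your real profile class just needs to be Serializable):

    import com.google.appengine.api.memcache.Expiration;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    public class ProfileCache {
        // Stand-in for the app's own profile class; it must be Serializable to be cached.
        public static class Profile implements Serializable {}

        private static final String KEY = "all-profiles"; // assumed cache key

        @SuppressWarnings("unchecked")
        public List<Profile> getProfiles() {
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
            List<Profile> profiles = (List<Profile>) cache.get(KEY);
            if (profiles == null) {
                // Cache miss: MemCache was purged or never filled; reload from the datastore.
                profiles = loadProfilesFromDatastore();
                cache.put(KEY, profiles, Expiration.byDeltaSeconds(3600)); // assumed 1h expiry
            }
            return profiles;
        }

        private List<Profile> loadProfilesFromDatastore() {
            return new ArrayList<>(); // placeholder for the real datastore query
        }
    }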

Implementing multi-tenancy for a mature enterprise application

I've been tasked with making an enterprise application multi-tenant. It has a Java/Glassfish BLL using SOAP web services and a PostgreSQL backend. Each tenant has its own database, so (in my case at least) "multi-tenant" means supporting multiple databases per application server.
The current single-tenant appserver initializes a C3P0 connection pool with a connection string that it gets from a config file. My thinking is that now there will need to be one connection pool per client/database serviced by the appserver.
Once a user is logged in, I can map them to the right connection pool by looking up their tenant. My main issue is how to get that far: when a user first logs in, the backend's User table is queried and the corresponding User object is served up. It seems I need to know which database to use with only a username to work with.
My only decent idea is that there will need to be a "config" database - a centralized database for managing tenant information such as connection strings. The BLL can query this database for enough information to initialize the necessary connection pools. But since I only have a username to work with, it seems I would need a centralized username lookup as well, in other words a UserName table with a foreign key to the Tenant table.
This is where my design plan starts to smell, giving me doubts. Now I would have user information in two separate databases, which would need to be maintained synchronously (user additions, updates, and deletions). Additionally, usernames would now have to be globally unique, whereas before they only needed to be unique per tenant.
I strongly suspect I'm reinventing the wheel, or that there is at least a better architecture possible. I have never done this kind of thing before, nor has anyone on my team, hence our ignorance. Unfortunately the application makes little use of existing technologies (the ORM was home-rolled for example), so our path may be a hard one.
I'm asking for the following:
Criticism of my existing design plan, and suggestions for improving or reworking the architecture.
Recommendations of existing technologies that provide a solution to this issue. I'm hoping for something that can be easily plugged in late in the game, though this may be unrealistic. I've read about jspirit, but have found little information on it - any feedback on it or other frameworks will be helpful.
UPDATE: The solution has been successfully implemented and deployed, and has passed initial testing. Thanks to @mikera for his helpful and reassuring answer!
Some quick thoughts:
You will definitely need some form of shared user management index (otherwise you can't associate a client login with the right target database instance). However I would suggest making this very lightweight, and only using it for initial login. Your User object can still be pulled from the client-specific database once you have determined which database this is.
You can make the primary key [clientID, username] so that usernames don't need to be unique across clients.
Apart from this thin user index layer, I would keep the majority of the user information where it is in the client-specific databases. Refactoring this right now will probably be too disruptive, you should get the basic multi-tenant capability working first.
You will need to keep the shared index in sync with the individual client databases, but I don't think that should be too difficult. You can also "test" the synchronisation and correct any errors with a batch job, which can be run overnight or by your DBA on demand if anything ever gets out of sync. I'd treat the client databases as the master, and use them to rebuild the shared user index on demand.
Over time you can refactor towards a fully shared user management layer (and eventually even fully shared client databases if you like), but save this for a future iteration...
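A minimal sketch of the lazily created per-tenant C3P0 pools described in the question (class and method names are assumptions; the JDBC URL would come from the central "config" database):

    import com.mchange.v2.c3p0.ComboPooledDataSource;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.sql.DataSource;

    public class TenantPoolRegistry {
        private final ConcurrentHashMap<String, DataSource> pools = new ConcurrentHashMap<>();

        // Lazily create one connection pool per tenant, keyed by tenant ID.
        public DataSource poolFor(String tenantId, String jdbcUrl) {
            return pools.computeIfAbsent(tenantId, id -> {
                ComboPooledDataSource ds = new ComboPooledDataSource();
                ds.setJdbcUrl(jdbcUrl); // looked up in the shared config database
                return ds;
            });
        }
    }

At login, the thin shared index resolves [clientID, username] to a tenant, and every subsequent query for that user goes through that tenant's pool.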

Session replication across JVMs in WebSphere

We have an infrastructure set up wherein the web servers are clustered and the application servers are not. The web servers route requests to the application servers based on a round-robin policy.
In this scenario, the session data available on one application server is not available on the other. Is there any way the session data from the first application server can be made available on the second one? The two application servers are physically separate boxes in different cells.
One approach could be to use the database - is there any other means of accomplishing this session replication ?
In WebSphere there are essentially two ways to replicate session data:
Persisting to a database
Memory-To-Memory transfers
Which one is appropriate for your needs is highly dependent on your application scenario:
How important is the persistence of your session data, when all your application servers go down?
How many session objects do you have at any one time?
In a DB you can store many sessions without much problem; the other option is always a question of how much memory is available.
I would go with the database if you already have one set up that all application servers use anyway.
Here is the link to the WebSphere Information Center with the necessary details.
One obvious solution is to enable clustering of your application servers. I assume from the way you worded your question that you have rejected this option. Another option is to change the routing used by the web servers to use session affinity (requests for the same session go to the same app server).
Other than that, I'd second the answer by dertoni.
Maybe you can look at Terracotta. It's a caching framework which can cache sessions and runs on a separate server.
There are two options for clustering within WebSphere: session replication or a database. If you have large session objects you are best off using the database, because it allows you to offload stale sessions to disk; if they are then requested again, they can be extracted from the database. If you use memory-to-memory session replication, those sessions need to stay in memory not just on your target server but also on the other servers in the replication group. With large sessions this can lead to an out-of-memory condition.
Database session handling is also very customisable and didn't noticeably hurt performance in the environments where I have used it.
Don't forget Oracle Coherence.
