Preface: I'm not a Java developer.
I have a question about Tomcat / JBoss and other Java application servers. Where are sessions (session data) stored? In PHP, sessions are usually stored in the database, which means you can easily share session data in a load-balanced environment. In Tomcat and other application servers, sessions seem to be stored in memory by default, which would not work in a load-balanced environment. While it is true that PHP stores sessions in files by default, it takes a few lines to hook it up to a DB. Is the same true of application servers?
Basically, what are the pros of storing sessions in memory? Is this still standard practice for application servers? Thanks all!
I have a question about Tomcat / JBoss and other Java application servers. Where are sessions (session data) stored?
By default, I'd say in memory. Details are actually... implementation details which are application server specific.
In PHP, sessions are usually stored in the database, which means you can easily share session data in a load-balanced environment. In Tomcat and other application servers, sessions seem to be stored in memory by default, which would not work in a load-balanced environment.
Well, not exactly. What this means is that a client's requests have to be sent to the same node in a clustered environment (this is referred to as "session stickiness"), and this is not an issue from a load balancing point of view. But it is an issue from a failover point of view: if a node in a cluster fails, the session state managed by that node can be lost. To solve this, almost all application server providers implement session failover (using various mechanisms such as in-memory replication, JDBC-based persistence, etc.). But, again, implementation details are application server specific. See for example how Tomcat or WebLogic deal with that. The Under the Hood of J2EE Clustering article on The Server Side is very interesting reading too.
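To make "session stickiness" concrete, here is a minimal sketch of the idea: every request carrying the same session ID is routed to the same node. The class name `StickyRouter` and the node names are hypothetical, and hashing the session ID is only one way to get affinity; real load balancers often rely on a cookie or a route suffix on the session ID instead.

```java
import java.util.List;

public class StickyRouter {
    private final List<String> nodes;

    public StickyRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    /** Maps a session ID to a node; the same ID always maps to the same node. */
    public String nodeFor(String sessionId) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(sessionId.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical node names, for illustration only.
        StickyRouter router = new StickyRouter(List.of("node-a", "node-b", "node-c"));
        String first = router.nodeFor("ABC123");
        String second = router.nodeFor("ABC123");
        System.out.println("same node for same session: " + first.equals(second));
    }
}
```

Note what the sketch does not cover: if the chosen node dies, its in-memory sessions are gone, which is exactly the failover problem described above.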
While it is true that PHP stores sessions in files by default, it takes a few lines to hook it up to a DB. Is the same true of application servers?
As I said, not all application servers offer JDBC-based persistence. That said, and to answer your question, configuration is generally simple when they do. But using a database is really not the preferred solution (actually, I avoid it at all costs).
Basically, what are the pros of storing sessions in memory? Is this still standard practice for application servers?
Simply: performance! Serializing data, calling a database, writing to disk: all of this has a cost. In-memory replication obviously avoids some of that overhead. But it has some limitations too. For example, it doesn't allow WAN HTTP Session State Replication with WebLogic. But well, only a few people need this :)
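As a rough illustration of the serialization step mentioned above, the sketch below round-trips a map of session attributes through standard Java serialization, which is the kind of work a server has to do before it can write a session to a database, to disk, or to another node. The class name `SessionSerialization` and the attribute names are made up for the example; a real container does this internally and not necessarily this way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

public class SessionSerialization {

    /** Serializes the session attributes to a byte array. */
    static byte[] toBytes(HashMap<String, Serializable> attributes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(attributes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the session attributes from a byte array produced by toBytes. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Serializable> fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (HashMap<String, Serializable>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize session", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Serializable> attributes = new HashMap<>();
        attributes.put("user", "alice");                 // hypothetical attribute
        attributes.put("cartSize", Integer.valueOf(3));  // hypothetical attribute

        byte[] bytes = toBytes(attributes);
        System.out.println("serialized size in bytes: " + bytes.length);
        System.out.println("round trip preserved data: " + fromBytes(bytes).equals(attributes));
    }
}
```

An in-memory session skips this work entirely on a normal request, which is where the performance advantage comes from. It also shows why everything placed in a session that may be persisted or replicated needs to be `Serializable`.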
With Tomcat's provided session managers, the session is always held in memory, but one of the managers can also persist sessions to a JDBC store:
http://tomcat.apache.org/tomcat-5.5-doc/config/manager.html
Unlike in PHP, the session is still accessed from memory and it's only persisted to DB when the memory limit is reached or at server shutdown. So your load-balancer must have sticky routing for this to work.
The benefit of in-memory session is performance because db access doesn't occur for every transaction.
You can write your own session manager to simulate the PHP behavior.
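The PHP-like behavior can be sketched without any servlet classes: each request loads the session from a shared store, changes it, and saves it back, so any node can serve any request. Everything here is hypothetical (`SessionStore`, `InMemoryStore`, `Node`); in particular `InMemoryStore` merely stands in for a database table, and this is not Tomcat's Manager API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StoreBackedSessions {

    /** Hypothetical abstraction over a shared session store (for example a DB table). */
    interface SessionStore {
        Map<String, String> load(String sessionId);
        void save(String sessionId, Map<String, String> attributes);
    }

    /** Stand-in for a database: one "row" of attributes per session ID. */
    static class InMemoryStore implements SessionStore {
        private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();

        @Override
        public Map<String, String> load(String sessionId) {
            // Return a copy so callers must save explicitly, as they would with a DB.
            return new HashMap<>(rows.getOrDefault(sessionId, Map.of()));
        }

        @Override
        public void save(String sessionId, Map<String, String> attributes) {
            rows.put(sessionId, new HashMap<>(attributes));
        }
    }

    /** An application server node that keeps no session state of its own. */
    static class Node {
        private final SessionStore store;

        Node(SessionStore store) {
            this.store = store;
        }

        void handleWrite(String sessionId, String key, String value) {
            Map<String, String> session = store.load(sessionId); // read on every request
            session.put(key, value);
            store.save(sessionId, session);                      // write back on every request
        }

        String handleRead(String sessionId, String key) {
            return store.load(sessionId).get(key);
        }
    }

    public static void main(String[] args) {
        SessionStore shared = new InMemoryStore();
        Node nodeA = new Node(shared);
        Node nodeB = new Node(shared);

        nodeA.handleWrite("ABC123", "user", "alice");
        // A different node sees the same data, so no sticky routing is needed.
        System.out.println(nodeB.handleRead("ABC123", "user"));
    }
}
```

The trade-off is visible in `handleWrite`: every request pays for a load and a save against the store, which is the cost the in-memory approach avoids.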
The Java EE spec does not dictate this; it is up to the individual implementations to decide. For example, the usual way to handle load balancing in Tomcat is to use replicated sessions, where the session data is multicast between the nodes. Storing session data in the database is a huge performance killer, and while Tomcat may support it, I really wouldn't recommend it.
I've heard the term "clustering" used for application servers like GlassFish, as well as with Terracotta; and I'm trying to understand what the word clustering implies when used in conjunction with application servers, and when used in conjunction with Terracotta.
My understanding is:
If a GlassFish server is clustered, then it means we have multiple physical/virtual machines, each with their own JRE/JVM running separate instances of GlassFish. However, since they are clustered, they will all communicate through their admin server ("DAS"), and have the same apps deployed to all of them. They will effectively act (to the end user) as if they are a single app server - but now with load balancing, failover/redundancy and scalability added into the mix.
Terracotta is, essentially, a product that makes multiple JVMs, running on different physical/virtual machines, act as if they are a single JVM.
Thus, if my understanding is correct, the following are implied:
You cluster app servers when you want load balancing and failover tolerance
You use Terracotta when any particular JVM is too small to contain your application and you need more "horsepower"
Thus, technically, if you have a GlassFish cluster of, say, 5 server instances; each of those 5 instances could actually be an array/cluster of Terracotta instances; meaning each GlassFish server instance is actually a GlassFish instance living across the JVMs of multiple machines itself
If any of these assertions/assumptions are untrue, please correct me! If I have gone way off-base and clearly don't understand clustering and/or the very purpose of Terracotta, please point me in the right direction!
Terracotta enables you to have shared state across all your nodes (it's stateful). Basically, it creates a shared memory space between different JVMs. This is useful when nodes in a cluster all need access to the same objects.
If your application is stateless and you just need load balancing and failover, you can use a solution like JGroups. In this scenario each node just handles requests and has little idea about the other nodes. Objects in memory are not shared across nodes, and each JVM runs on its own with no knowledge of the other JVMs. This often works nicely for request/response type applications. A web server serving content (without sessions) does this, for example.
Dealing with a stateless cluster is often simpler than dealing with a stateful cluster. This is because in a stateless cluster nodes know almost nothing about each other, which results in fewer things that can go wrong.
GlassFish sits somewhere in the middle of the above concepts. Objects in memory within GlassFish are visible to all nodes. However, the front end (the HTTP connectors) works statelessly.
So to answer your questions:
1) Yes, those are the two most obvious reasons. However, sometimes people want only failover, sometimes only load balancing, and sometimes both. Not all clustering solutions fix both of these problems.
2) Yes. Although technically speaking Terracotta only solves the shared memory part, not the CPU part. However, by solving the memory part it automatically solves the CPU part, since you can now just add JVMs to the shared memory space.
3) I don't know if that's practically possible, but as a thought experiment: yes.
Clustering can mean one of the following:
Multiple instances can be managed as one. Deploy an application to the cluster, it is deployed to all instances in the cluster. Make a configuration change, and that change will be pushed to all nodes in the cluster. GlassFish supports this out of the box.
Service Availability. If any one instance fails, the application is available on another instance. Without high availability enabled, any instance failure also results in session loss for any session being managed by that instance. GlassFish supports this out of the box.
High availability. If any one instance fails, the application is available on another instance, and there is no session loss because a session replica is also maintained on another instance. GlassFish supports this. You will have to choose either #2 or #3 in any one cluster.
What you are asking about IMHO is really #3, because it is the only real case where Terracotta - in the context of high availability clustering - will offer value w/GlassFish. GlassFish already offers built-in high availability, so there had better be a very good reason to add Terracotta to the solution because it will complicate the deployment architecture.
The primary reason I can think of for adding Terracotta is that you may want to offload session management to a data grid and free up GlassFish to run business logic. This may be because of more frequent garbage collection or because you want to manage more users per GlassFish instance. However, I'm not sure that Terracotta can do this seamlessly. With GlassFish built-in HA clustering, replicating sessions is seamless (no application logic modifications). You may have to write code to put/get data from a Terracotta cache; I'll let you research that :-) Oracle GlassFish Server also integrates (seamlessly) with Coherence to solve this problem. You can separate session management into a Coherence data grid without modifying your application code.
Unless you know for a fact up front that your application must scale to a very large number of concurrent users, start with built-in HA clustering, run tests, and go from there.
Hope this helps.
Are there any differences implementing Flex application security in a clustered Java environment (such as Oracle Application Server/OC4J or a JBoss cluster) vs a single application server environment? (And/or does it depend on the specific environment software?)
What considerations are there in a situation where you need to authenticate with LDAP (AD) and store user access information in a database (ex. USER table containing username + permissions/roles info)?
Are sessions shared across nodes with no issues? Any differences between Blaze DS and Granite DS?
Yes, Blaze DS is a pain when it comes to clustering, full stop. LCDS isn't much better, but it at least has more support for clustering (with the downside of being ridiculously expensive).
The problem is the JSESSIONID which the instance uses to identify the Flex client that's making the call. The associated Flex Session object isn't shared in the cluster by default, and IIRC, BlazeDS doesn't have any option for sharing, while LCDS has limited options... Sticky Sessions or port broadcasting.
I can't speak for any of the Open Source options, but clustering support is usually the purview of paid-for solutions...
I mean if I construct a heavy object [including collections and a bunch of props ] and I need to query that object from time to time during the session life, should I save it with setAttribute or do I need to persist it somewhere? What are best practices here?
If you want it cached, place it in an external cache (2nd level cache, memcached or another cache server), not in the session. The session should be kept as small as possible, as the server may serialise it to disk.
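One way to follow this advice is to keep only a small key in the session and resolve the heavy object through a cache that can rebuild it on a miss. The sketch below uses a bounded, in-process LRU cache purely as a stand-in for an external cache such as memcached; the class name `HeavyObjectCache` and the loader function are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class HeavyObjectCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;

    /**
     * @param maxEntries maximum number of cached objects before the least
     *                   recently used one is evicted
     * @param loader     rebuilds the object (for example from the database) on a miss
     */
    public HeavyObjectCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true makes the map iterate from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached object, loading and caching it first if necessary. */
    public synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    public synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        // The loader here just builds a string; a real one would query the persistence layer.
        HeavyObjectCache<String, String> cache =
                new HeavyObjectCache<>(2, id -> "report for " + id);

        // Only the small key ("user-42") would live in the HTTP session.
        System.out.println(cache.get("user-42"));
        System.out.println("cached entries: " + cache.size());
    }
}
```

With this arrangement the session holds a short identifier, so serializing or replicating it stays cheap, and a lost cache entry costs only a rebuild.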
Let's assume your heavy object can be rebuilt from your persistence layer whenever you need to, and that this is purely an optimization. Then this really turns into an "it depends". If your web application runs on a single server and doesn't get much use, it doesn't much matter either way, so pick whatever solution seems simplest to you.
If you're running the web application on a whole cluster of application servers, with lots of usage, you generally want to avoid session state for scalability. But you can still cache these structures outside the app server cluster for performance (eg, in an HTTP cache or a memcached distributed cache).
Then there's a broad middle ground where the app server cluster can be run with sticky sessions (traffic for each session gets routed to the same app server) or with cluster support for sessions (it maintains the session data, and migrates it to whatever server needs it).
We have an infrastructure in which the web servers are clustered and the application servers are not. The web servers route requests to the application servers using a round-robin policy.
In this scenario, the session data available in one application server is not available in the other application server. Is there any way to make the session data from the first application server available in the second application server? The two application servers are physically separate boxes in different cells.
One approach could be to use the database - is there any other means of accomplishing this session replication ?
In WebSphere there are essentially two ways to replicate session data:
Persisting to a database
Memory-To-Memory transfers
Which one is appropriate for your needs is highly dependent on your application scenario:
How important is the persistence of your session data, when all your application servers go down?
How many session objects do you have at any one time simultaneously?
In a DB you can store many sessions without much trouble; with the other option it is always a question of how much memory is available.
I would go with the database, if you already got one set up, which all application servers use anyway.
Here is the link to the WebSphere Information Center with the necessary details.
One obvious solution is to enable clustering of your application servers. I assume from the way you worded your question you have rejected this option. Another option is to change the routing used by the web servers to use session affinity (requests for the same session go to the same app server).
Other than that, I'd second the answer by dertoni.
Maybe you can look at Terracotta. It's a caching framework that can cache sessions and runs on a separate server.
There are two options for clustering within WebSphere: session replication or a database. If you have large session objects, you are best off using a database, because it allows you to offload stale sessions to disk. If a stale session is later presented again, it can be read back from the database. With session replication, by contrast, those sessions need to stay in memory not just on your target server but also on the other servers in the replication group. With large sessions this can lead to an out-of-memory condition.
Database session handling is also very customisable and hasn't noticeably hurt performance in the environments where I have used it.
Don't forget Oracle Coherence.
Tomcat (version 5 here) stores session information in memory. When clustering, this information is periodically broadcast to the other servers in the cluster to keep things in sync. You can use a database store to make sessions persistent, but this information is only written periodically as well and is only really used for failure recovery rather than actually replacing the in-memory sessions.
If you don't want to use sticky sessions (our configuration doesn't allow it unfortunately) this raises the problem of the sessions getting out of sync.
In other languages, web frameworks tend to allow you to use a database as the primary session store. Whilst this introduces a potential scaling issue it does make session management very straightforward. I'm wondering if there's a way to get tomcat to use a database for sessions in this way (technically this would also remove the need for any clustering configuration in the tomcat server.xml).
There definitely is a way. Though I'd strongly vote for sticky sessions - saves so much load for your servers/database (unless something fails)...
http://tomcat.apache.org/tomcat-5.5-doc/config/manager.html has information about SessionManager configuration and setup for Tomcat. Depending on your exact requirements you might have to implement your own session manager, but this starting point should provide some help.
Take a look at Terracotta, I think it can address your scaling issues without a major application redesign.
I've always been a fan of the Rails sessions technique: store the sessions (zipped+encrypted+signed) in the user's cookie. That way you can do load balancing to your heart's content, and not have to worry about sticky sessions, or hitting the database for your session data, etc. I'm just not sure you could implement that easily in a Java app without some sort of rewriting of your session-access code. Anyway, just a thought.
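For what it's worth, the signing half of that technique is not hard to sketch in plain Java. The example below appends an HMAC-SHA256 signature to the session data and rejects any cookie value whose signature does not match. It is only an assumption of how one might approach it, not how Rails does it: it does not compress or encrypt, so the data stays readable by the client, and the class name `SignedCookie` and the secret are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignedCookie {
    private static final String ALGORITHM = "HmacSHA256";
    private final SecretKeySpec key;

    public SignedCookie(byte[] secret) {
        this.key = new SecretKeySpec(secret, ALGORITHM);
    }

    private byte[] hmac(String payload) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            return mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Produces "base64(data).base64(signature)" suitable for a cookie value. */
    public String encode(String sessionData) {
        Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
        String payload = encoder.encodeToString(sessionData.getBytes(StandardCharsets.UTF_8));
        return payload + "." + encoder.encodeToString(hmac(payload));
    }

    /** Returns the session data if the signature is valid, otherwise empty. */
    public Optional<String> decode(String cookieValue) {
        int dot = cookieValue.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String payload = cookieValue.substring(0, dot);
        try {
            byte[] given = Base64.getUrlDecoder().decode(cookieValue.substring(dot + 1));
            // Constant-time comparison avoids leaking how many bytes matched.
            if (!MessageDigest.isEqual(hmac(payload), given)) {
                return Optional.empty();
            }
            byte[] data = Base64.getUrlDecoder().decode(payload);
            return Optional.of(new String(data, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not valid Base64
        }
    }

    public static void main(String[] args) {
        // Hypothetical secret; a real one must be long, random, and shared by all nodes.
        SignedCookie cookies = new SignedCookie("change-me".getBytes(StandardCharsets.UTF_8));
        String value = cookies.encode("user=alice");
        System.out.println("valid cookie accepted: " + cookies.decode(value).isPresent());
        System.out.println("tampered cookie accepted: " + cookies.decode(value + "x").isPresent());
    }
}
```

Because every node only needs the shared secret to verify the cookie, any node can serve any request. The usual caveats apply: cookies are size-limited, and the session-access code still has to be adapted to read from the cookie instead of the container's session.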
Another alternative would be the memcached-session-manager, a memcached-based session failover and session replication solution for Tomcat 6.x / 7.x. It supports both sticky sessions and non-sticky sessions.
I created this project to get the best of performance and reliability and to be able to scale out by just adding more tomcat and memcached nodes.