I've heard the term "clustering" used for application servers like GlassFish, as well as with Terracotta; and I'm trying to understand what the word clustering implies when used in conjunction with application servers, and when used in conjunction with Terracotta.
My understanding is:
If a GlassFish server is clustered, then it means we have multiple physical/virtual machines, each with their own JRE/JVM running separate instances of GlassFish. However, since they are clustered, they will all communicate through their admin server ("DAS"), and have the same apps deployed to all of them. They will effectively act (to the end user) as if they are a single app server - but now with load balancing, failover/redundancy and scalability added into the mix.
Terracotta is, essentially, a product that makes multiple JVMs, running on different physical/virtual machines, act as if they are a single JVM.
Thus, if my understanding is correct, the following are implied:
You cluster app servers when you want load balancing and failover tolerance
You use Terracotta when any particular JVM is too small to contain your application and you need more "horsepower"
Thus, technically, if you have a GlassFish cluster of, say, 5 server instances; each of those 5 instances could actually be an array/cluster of Terracotta instances; meaning each GlassFish server instance is actually a GlassFish instance living across the JVMs of multiple machines itself
If any of these assertions/assumptions are untrue, please correct me! If I have gone way off-base and clearly don't understand clustering and/or the very purpose of Terracotta, please point me in the right direction!
Terracotta enables you to have shared state across all your nodes (it's stateful). Basically, it creates a shared memory space between different JVMs. This is useful when the nodes in a cluster all need access to the same objects.
If your application is stateless and you just need load balancing and failover, you can use a solution like JGroups. In this scenario each node just handles requests and has little idea about the other nodes. Objects in memory are not shared across nodes; each JVM runs on its own and knows nothing about the other JVMs. This often works nicely for request/response type applications. A web server serving content (without sessions) does this, for example.
Dealing with a stateless cluster is often simpler than dealing with a stateful cluster. In a stateless cluster the nodes know almost nothing about each other, which leaves fewer things that can go wrong.
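To make the stateless case concrete, here is a minimal sketch of a round-robin dispatcher. The class name and node names are hypothetical, and this is not how JGroups or any particular product works; it only illustrates that no state has to be shared between nodes for load balancing to work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical dispatcher for a stateless cluster: any node can serve any
// request, so the only state kept here is a request counter.
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    // Returns the node that should handle the next request.
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

Failover in this model amounts to removing a dead node from the list; nothing has to be copied, because no node holds anything the others lack.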
GlassFish sits somewhere in the middle of the above concepts. Objects in memory within GlassFish are visible to all nodes. However, the front end (the HTTP connectors) works statelessly.
So to answer your questions:
1) Yes, those are the two most obvious reasons. However, sometimes people want only failover, only load balancing, or both. Not all clustering solutions address both of these problems.
2) Yes. Although, technically speaking, Terracotta only solves the shared memory part, not the CPU part. However, by solving the memory part it automatically solves the CPU part, since you can now just add JVMs to the shared memory space.
3) I don't know if that's practically possible, but as a thought experiment, yes.
Clustering can mean one of the following:
1) Multiple instances can be managed as one. Deploy an application to the cluster, and it is deployed to all instances in the cluster. Make a configuration change, and that change will be pushed to all nodes in the cluster. GlassFish supports this out of the box.
2) Service availability. If any one instance fails, the application is available on another instance. Without high availability enabled, any instance failure also results in session loss for any session being managed by that instance. GlassFish supports this out of the box.
3) High availability. If any one instance fails, the application is available on another instance, and there is no session loss because a session replica is also maintained on another instance. GlassFish supports this. You will have to choose either #2 or #3 in any one cluster.
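The difference between service availability and high availability can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, the instance names and the "next instance holds the replica" rule are hypothetical and are not GlassFish's actual replication algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy cluster: each instance keeps its sessions in a local map.
final class SessionClusterSketch {
    private final Map<String, Map<String, String>> instances = new LinkedHashMap<>();
    private final boolean replicate;

    SessionClusterSketch(List<String> names, boolean replicate) {
        for (String name : names) {
            instances.put(name, new HashMap<>());
        }
        this.replicate = replicate;
    }

    // The instance that owns the session (assumption: chosen by hash).
    String primaryFor(String sessionId) {
        List<String> alive = new ArrayList<>(instances.keySet());
        return alive.get(Math.floorMod(sessionId.hashCode(), alive.size()));
    }

    void save(String sessionId, String data) {
        List<String> alive = new ArrayList<>(instances.keySet());
        int primary = alive.indexOf(primaryFor(sessionId));
        instances.get(alive.get(primary)).put(sessionId, data);
        if (replicate && alive.size() > 1) {
            // High availability: keep a replica on the next instance.
            String backup = alive.get((primary + 1) % alive.size());
            instances.get(backup).put(sessionId, data);
        }
    }

    // Simulates an instance failure; its local sessions are gone.
    void fail(String name) {
        instances.remove(name);
    }

    // Returns the session data from any surviving instance, or null if lost.
    String load(String sessionId) {
        for (Map<String, String> store : instances.values()) {
            String data = store.get(sessionId);
            if (data != null) {
                return data;
            }
        }
        return null;
    }
}
```

With `replicate` set to false the application stays reachable after a failure but the session is lost; with it set to true the replica answers instead.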
What you are asking about IMHO is really #3, because it is the only real case where Terracotta - in the context of high availability clustering - will offer value w/GlassFish. GlassFish already offers built-in high availability, so there had better be a very good reason to add Terracotta to the solution because it will complicate the deployment architecture.
The primary reason I can think of for adding Terracotta is that you may want to offload session management to a data grid and free up GlassFish to run business logic. This may be because garbage collection has become too frequent, or because you want to manage more users per GlassFish instance. However, I'm not sure that Terracotta can do this seamlessly. With GlassFish built-in HA clustering, replicating sessions is seamless (no application logic modifications). You may have to write code to put/get data from a Terracotta cache; I'll let you research that :-) Oracle GlassFish Server also integrates (seamlessly) with Coherence to solve this problem. You can separate session management into a Coherence data grid without modifying your application code.
Unless you know for a fact up front that your application must scale to a very large number of concurrent users, start with built-in HA clustering, run tests, and go from there.
Hope this helps.
Related
I'm using Terracotta Enterprise Ehcache with a Java application, but at some points of the day Terracotta starts taking too long to answer put/get requests, sometimes locking client threads and throwing exceptions.
My infrastructure is composed of a cluster of 5 JBoss 6.2.0 servers and another cluster of 4 Terracotta Enterprise Ehcache 3.7.5 servers that stores a large amount of data.
The application does around 10 million accesses to the Terracotta Ehcache per day.
Originally I used criteria, but, when the problems started, I changed everything to use id searches only.
I tried changing the DGC interval, making it run more often or even only once a day, but it didn't get any better.
I started with the persistence mode permanent-store and tried changing to temporary-swap-only, but the problem continues.
I tried changing the Terracotta cluster to work with 2 active machines and 2 passive ones, or with 4 active machines.
I tried configuring my caches as eternal true or false.
All my caches are nonstop, and I tried setting timeoutBehavior to exception or noop.
Basically, none of my efforts seems to produce any significant change, and Terracotta keeps entering this state where it can't answer requests anymore.
Right now the only thing that seems to "solve" the problem is to restart all the clients.
Does anybody have a similar scenario using Terracotta, with this kind of throughput? Any ideas for where to look now?
Yes, I faced a similar thread contention issue on a Terracotta cluster setup. The slaves' get/put requests used to take a long time, and a thread dump showed locking as the main reason. I don't remember the details, as it was more than 4-6 months ago. I had 2 options then:
Create my own cache server: a custom WAR that runs Ehcache underneath and exposes my own put, get, delete, etc. operations as REST endpoints.
Use the cache replication that Ehcache provides.
I first tried replication using RMI and then JGroups. The RMI-based approach worked excellently and was much more stable, so I decided to move to the RMI-based replication that Ehcache provides OOTB. My setup used Ehcache as the cache provider for Hibernate-based JPA, and the RMI-based solution worked very well and effectively. It is intelligent enough to see when other servers in the cluster go down and when they come back up. Replication is async and transparent. Since the second approach worked well, I didn't try the first one.
I have recently started looking into Infinispan as our caching layer, and I have read through its operation modes, listed below.
Embedded mode: This is when you start Infinispan within the same JVM as your applications.
Client-server mode: This is when you start a remote Infinispan instance and connect to it using a variety of different protocols.
Firstly, I am confused about which of the two modes above is best suited to my application.
I have a very simple use case. Our client-side code calls our REST service through the service's main VIP, and the call is load balanced to one of the individual service servers where the service is deployed. That server then queries the Cassandra database to retrieve the data based on the user ID. The picture below should make everything clear.
Suppose for example, if client is looking for some data for userId = 123 then it will call our REST Service using the main VIP and then it will get load balanced to any of our four service server, suppose it gets load balanced to Service1, and then service1 will call Cassandra database to get the record for userId = 123 and then return back to Client.
We are now planning to cache the data using Infinispan to boost our read performance, because compaction is killing it. So I started looking into Infinispan and stumbled upon the two modes I mentioned above. I am not sure which is the best way to use Infinispan in our case.
Secondly, if I go with embedded mode, I expect the architecture to look something like this.
If yes, then how Infinispan cache will interact with each other? It might be possible that at some time, we will be looking for data for those userId's that will be on another Service Instance Infinispan cache? Right? So what will happen in that scenario? Will infinispan take care of those things as well? if yes, then what configuration setup I need to have to make sure this thing is working fine.
Pardon my ignorance if I am missing anything. Any clear information will make things more clear to me to my above two questions.
With regard to your second image: yes, the architecture will look exactly like that.
If yes, then how Infinispan cache will interact with each other?
Please, take a look here: https://docs.jboss.org/author/display/ISPN/Getting+Started+Guide#GettingStartedGuide-UsingInfinispanasanembeddeddatagridinJavaSE
Infinispan manages this using JGroups, which sends messages between the nodes. The nodes discover each other and form a cluster. After that you will see the expected behaviour: entries are replicated across the relevant nodes.
And here we go to your next question:
It might be possible that at some time, we will be looking for data for those userId's that will be on another Service Instance Infinispan cache? Right? So what will happen in that scenario? Will infinispan take care of those things as well?
Infinispan was developed for this scenario, so you don't need to worry about it at all. If you have, for example, 4 nodes and set distribution mode with numOwners=2, each cached entry will live on exactly 2 nodes at any moment. When you issue a GET on a non-owner node, the entry will be fetched from an owner.
You can also set clustering mode to replication, where all nodes contain all entries. Please, read more about modes here: https://docs.jboss.org/author/display/ISPN/Clustering+modes and choose what is the best for your use case.
Additionally, when you add a new node to the cluster, a state transfer takes place and synchronizes/rebalances entries across the cluster. Non-blocking state transfer is already implemented, so your cluster will still be capable of serving responses during the joining phase. See: https://community.jboss.org/wiki/Non-BlockingStateTransferV2
The same applies when nodes are removed from or crash in your cluster. There is an automatic rebalancing process, so, for example, entries (with numOwners=2) that live on only one node after a crash will be replicated to live on 2 nodes again, according to the numOwners property of distribution mode.
To sum up, your cluster stays up to date, and it does not matter which node you ask for a particular entry. If that node does not contain it, the entry will be fetched from an owner.
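The behaviour described above (two owners per entry, fetch from an owner when asked on a non-owner node, rebalance after a crash) can be mimicked with a small toy model. This is a sketch only: the class, its method names and the "consecutive nodes starting at the hash" ownership rule are assumptions made for illustration. It does not use the Infinispan API and is not Infinispan's real consistent-hash implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a distributed cache with a fixed number of owners per key.
final class DistCacheSketch {
    private final List<String> nodes = new ArrayList<>();
    private final Map<String, Map<String, String>> data = new HashMap<>();
    private final int numOwners;

    DistCacheSketch(List<String> nodeNames, int numOwners) {
        for (String name : nodeNames) {
            nodes.add(name);
            data.put(name, new HashMap<>());
        }
        this.numOwners = numOwners;
    }

    // Assumption: owners are consecutive nodes starting at hash(key) mod N.
    List<String> owners(String key) {
        List<String> result = new ArrayList<>();
        int start = Math.floorMod(key.hashCode(), nodes.size());
        for (int i = 0; i < Math.min(numOwners, nodes.size()); i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }

    void put(String key, String value) {
        for (String owner : owners(key)) {
            data.get(owner).put(key, value);
        }
    }

    // A GET on any live node: answer locally if possible, else ask an owner.
    String get(String askedNode, String key) {
        String local = data.get(askedNode).get(key);
        if (local != null) {
            return local;
        }
        for (String owner : owners(key)) {
            String remote = data.get(owner).get(key);
            if (remote != null) {
                return remote;
            }
        }
        return null;
    }

    // Removes a node and re-spreads surviving entries over the new owners.
    void crash(String node) {
        nodes.remove(node);
        data.remove(node);
        Map<String, String> surviving = new HashMap<>();
        for (Map<String, String> store : data.values()) {
            surviving.putAll(store);
            store.clear();
        }
        for (Map.Entry<String, String> entry : surviving.entrySet()) {
            put(entry.getKey(), entry.getValue());
        }
    }

    // Number of nodes currently holding a copy of the key.
    int copies(String key) {
        int count = 0;
        for (Map<String, String> store : data.values()) {
            if (store.containsKey(key)) {
                count++;
            }
        }
        return count;
    }
}
```

In this model a key such as "user:123" stays readable from every surviving node after one of its owners crashes, and the copy count returns to 2 once the rebalance has run.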
if yes, then what configuration setup I need to have to make sure this thing is working fine.
The aforementioned getting started guide is full of examples, and you can find some sample configuration files in the Infinispan distribution: ispn/etc/config-samples/*
I would suggest you to take a look at this source too: http://refcardz.dzone.com/refcardz/getting-started-infinispan where you can find even more basic and very quick configuration examples.
This source also provides information relevant to your first question: "Should I use embedded mode or remote client-server mode?" From my point of view, using a remote cluster is the more enterprise-ready solution (see: http://howtojboss.com/2012/11/07/data-grid-why/). Your caching layer is then easily scalable, highly available and fault tolerant, and it is independent of your database layer and application layer because it simply sits between them.
You may also be interested in this feature: https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores
I think the newest Infinispan release supports a special compatibility mode for users interested in accessing Infinispan in multiple ways.
Follow the link below to configure your cache environment to support both embedded and remote access.
Interoperability between Embedded and Remote Server Endpoints
Is it possible to form a cluster in which there are different types of application servers? For instance, 1 JBoss, 1 GlassFish and 1 WebSphere? Let's assume we are using EJB 3.0.
Stateless session beans should be relatively easy, and simple load balancing among the instances should do the work, but what about SFSBs and session replication? Is it possible to utilize some cache storage like Infinispan for it?
I would appreciate any comments or sharing your experience on this topic.
I assume it may be possible if you use some application server agnostic solution like Hazelcast. According to its documentation it's pretty easy to configure web session replication and the only requirements it has are
Target application or web server should support Java 1.5+
Target application or web server should support Servlet 2.4+ spec
Session objects that need to be clustered have to be Serializable
I've not tried to configure a cluster the way you've described, however I think it may do the trick.
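The third requirement above (clustered session objects must be Serializable) is easy to verify before deploying. The following sketch uses only standard Java serialization; the class and method names are hypothetical, and it does not touch the Hazelcast API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: checks whether a session attribute survives a
// serialize/deserialize round trip, as session replication requires.
final class SessionSerializationCheck {

    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // True if the value (including everything it references) can be serialized.
    static boolean isClusterSafe(Object value) {
        try {
            roundTrip(value);
            return true;
        } catch (IOException | ClassNotFoundException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }
}
```

Running such a check over typical session attributes in a unit test catches non-serializable fields nested deep inside an object graph, which a simple `instanceof Serializable` test would miss.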
The answer is simply no. Clustering is a non-standard feature: it is up to each Java EE implementation to provide clustering while keeping standard behaviour (with very few constraints, as stickiness is expected and session objects are expected to be serializable), and no interoperability is foreseen.
You can of course build the cluster yourself, setting up an external data grid to serve as the session store and managing the cache yourself, but then you will lose any framework functionality related to the session (you will need to do everything yourself), and what is the point of using a full Java EE application server any more? And yes, you would then need to forget about SFSBs.
I am really curious what issue you want to solve with this type of architecture. I don't see any that can outweigh the cost of maintaining 3 different apps (app servers have slight differences on the dev side) and, more importantly, 3 different infrastructure operations stacks (on this side there are a lot of differences, so you need to multiply the operations team's knowledge).
Is there any easy way in a Java EE application (running on Websphere) to share an object in an application-wide scope across the entire cluster? Something maybe similar to Servlet Context parameters, but that is shared across the cluster.
For example, in a cluster of servers "A" and "B", if a value is set on server A (key=value), that value should immediately (or nearly so) be available to requests on server B.
(Note: Would like to avoid distributed caching solutions if possible. This really isn't a caching scenario as the objects being stored are fairly dynamic)
I'm watching this to see if any simple solutions appear, but I don't know of any myself. Last time I asked something like this, the answer was to use a distributed object store.
Our substitute was manual notification over HTTP to a configured list of URLs, one for each Web Container's direct server:port combination. (That is, bypassing any fronting proxy/web server/plugin.)
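A minimal sketch of that manual notification approach follows. The class name, the URLs in the usage note and the `key=value` payload format are hypothetical; the actual HTTP call is passed in as a function so the fan-out logic stays independent of any particular HTTP client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Pushes a changed value to every configured peer, one URL per web
// container, bypassing any fronting proxy.
final class PeerNotifier {
    private final List<String> peerUrls;
    private final BiConsumer<String, String> sender; // (url, payload)

    PeerNotifier(List<String> peerUrls, BiConsumer<String, String> sender) {
        this.peerUrls = new ArrayList<>(peerUrls);
        this.sender = sender;
    }

    // Sends key=value to every peer and returns the URLs that failed,
    // so the caller can retry or log them.
    List<String> broadcast(String key, String value) {
        List<String> failed = new ArrayList<>();
        String payload = key + "=" + value;
        for (String url : peerUrls) {
            try {
                sender.accept(url, payload);
            } catch (RuntimeException e) {
                // One unreachable peer must not block the others.
                failed.add(url);
            }
        }
        return failed;
    }
}
```

In a real deployment the sender would wrap an HTTP POST (for example with `java.net.http.HttpClient` on Java 11+), and the peer list would hold direct addresses such as `http://serverA:9080/app/notify`. Note that a peer that is down while a broadcast happens misses the update, so some resync on startup is still needed.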
Try using the WebSphere workarea
I have heard that this is what JavaRebel does but is there any other good way to deploy a new version of an EAR while allowing users to remain active on the previous version? We use JBoss for the application server...
It's not what JavaRebel does. JavaRebel (according to description) hot-replaces the classes in memory. It's not acceptable in the case of existing connections to the system, since the updated classes may break the client's logic.
Once a company I was working for had a similar problem, and it was solved this way:
a smart router was used as a load-balancer
the new version was deployed to 50% of the nodes of the (new) cluster
new connections were delivered strictly to these updated nodes, old ones were balanced between old nodes
old nodes were taken offline (one by one, to keep the number of clients per node within limits)
at the same time, new version was deployed to off-line "old" nodes and they were brought up as new nodes
due to EJB clustering, the sessions and beans were picked up by other old nodes
eventually (in a few hours), only one old node was left, running a single instance of the old version, and all clients using the old version were connected to it
when the last old client disconnected, that node was brought down too
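The routing rule at the heart of the steps above (existing sessions stay on the version they started on, new connections go to the new version, an old version is retired once its last client leaves) can be sketched as follows. The class and method names are hypothetical; a real setup would implement this rule in the load balancer, not in application code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a version-aware router used during a rolling upgrade.
final class VersionAwareRouter {
    private final Map<String, String> sessionToVersion = new HashMap<>();
    private String currentVersion;

    VersionAwareRouter(String initialVersion) {
        this.currentVersion = initialVersion;
    }

    // From now on, new sessions are sent to nodes running newVersion.
    void startRollout(String newVersion) {
        this.currentVersion = newVersion;
    }

    // Returns the version pool that should serve this session.
    String route(String sessionId) {
        // A known session keeps its version; a new one gets the current one.
        return sessionToVersion.computeIfAbsent(sessionId, id -> currentVersion);
    }

    void disconnect(String sessionId) {
        sessionToVersion.remove(sessionId);
    }

    // An old version can be shut down once no session is pinned to it.
    boolean canRetire(String version) {
        return !version.equals(currentVersion)
            && !sessionToVersion.containsValue(version);
    }
}
```

The last old node from the list corresponds to the moment `canRetire` first returns true for the old version.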
Now, I'm not a networking guy, and cannot give you many details (like what the router hardware was and such). My understanding is that this can be set up pretty easily, except that, if I remember right, we had to set up an additional WebLogic domain to deploy new versions of the application (otherwise it would conflict with the old one on JNDI names).
Hope that helps.
P.S. Ichorus provided a comment saying that the app is deployed on clients' servers, so the router trick may not be feasible. I see only one viable solution right now (it's 21:52 now, I may be overlooking things :) ) --
Develop new version with "versioned" JNDI names; e.g. if Customer bean was under ejb/Customer in version 1, in version 2 it would be under ejb/Customer2
Have a business facade in the application with a stable basic interface (Factory-style) which, when asked for the Customer bean, tries to find the highest-versioned JNDI name (not on every call, of course; it can cache the result for an hour or so). That facade could (and should) be deployed as a separate application, and never or very rarely updated
Now every new client would get access to the latest application deployed, and the applications won't conflict.
This approach takes careful planning and testing, but should work IMHO.
I recently modified a few applications in a similar way to let them coexist in the same domain (before they used the same JNDI name for different data sources).
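The versioned-name facade described above can be sketched as follows. This is an illustration under stated assumptions: the lookup is passed in as a function that returns null for an unbound name (a stand-in for a real JNDI lookup), and the class name, the `maxVersion` probe limit and the explicit `nowMillis` parameter are all hypothetical. The naming convention (`ejb/Customer` for version 1, `ejb/Customer2` for version 2) follows the example in the answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Resolves a base name to its highest-versioned bound name and caches
// the result for a configurable time.
final class VersionedLookupFacade {
    private final Function<String, Object> lookup; // returns null when unbound
    private final int maxVersion;
    private final long ttlMillis;
    private final Map<String, String> cachedNames = new HashMap<>();
    private final Map<String, Long> cachedAt = new HashMap<>();

    VersionedLookupFacade(Function<String, Object> lookup, int maxVersion, long ttlMillis) {
        this.lookup = lookup;
        this.maxVersion = maxVersion;
        this.ttlMillis = ttlMillis;
    }

    String resolveName(String baseName, long nowMillis) {
        Long at = cachedAt.get(baseName);
        if (at != null && nowMillis - at < ttlMillis) {
            return cachedNames.get(baseName); // still fresh, skip probing
        }
        String best = null;
        for (int version = 1; version <= maxVersion; version++) {
            // Version 1 uses the bare name; later versions append the number.
            String candidate = (version == 1) ? baseName : baseName + version;
            if (lookup.apply(candidate) != null) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalStateException("nothing bound for " + baseName);
        }
        cachedNames.put(baseName, best);
        cachedAt.put(baseName, nowMillis);
        return best;
    }
}
```

Because of the cache, a newly deployed version only becomes visible to clients once the cached entry expires, which is the trade-off the "cache for an hour or so" remark implies.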
As I understand it, WebLogic has a feature called parallel deployment to eliminate downtime during an EAR version upgrade. You can deploy the new version without stopping the existing application, and once the new version has deployed successfully you can switch over transparently from the old one to the new one.
I am not sure if other application servers support this.
Ref: http://edocs.bea.com/wls/docs100/deployment/redeploy.html#wp1022490
Vladimir's suggestion around using a load balancer is a pretty sure way of achieving what you want. Keep in mind, it need not necessarily be a high-end hardware load balancer. Rather, if you front your JBoss server with a native web server (Apache or IIS) and mod_jk or mod_proxy, you can maintain one common web facade and implement the applicable loading and routing routines at EAR upgrade time.
//Nicholas
I think you might want to look into Spring using OSGI framework.
http://www.springframework.org/osgi