Does anyone know a simple load-balancing algorithm (formula) that relates users connected, CPU load, network load and memory usage?
This will be used to compare various servers and assign a new user to the best one at the moment.
Thank you.
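For illustration only (this is not from any answer below): one simple approach is to normalize each metric against the server's capacity and combine them into a weighted score, assigning the new user to the server with the lowest score. The weights and names here are assumptions you would tune for your workload.

    import java.util.Comparator;
    import java.util.List;

    public class LeastLoadedPicker {
        // Each metric is assumed to be pre-normalized to 0..1 against the server's capacity.
        record ServerLoad(String name, double users, double cpu, double net, double mem) {
            // Weighted sum of the four metrics; the weights are assumed values to tune.
            double score() {
                return 0.25 * users + 0.35 * cpu + 0.20 * net + 0.20 * mem;
            }
        }

        // Pick the server with the lowest combined load.
        static ServerLoad pickBest(List<ServerLoad> servers) {
            return servers.stream()
                    .min(Comparator.comparingDouble(ServerLoad::score))
                    .orElseThrow();
        }

        public static void main(String[] args) {
            List<ServerLoad> servers = List.of(
                    new ServerLoad("app1", 0.50, 0.70, 0.30, 0.60),
                    new ServerLoad("app2", 0.40, 0.20, 0.50, 0.30));
            System.out.println("assign new user to: " + pickBest(servers).name()); // app2
        }
    }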
If you are using the Apache web server to proxy the application servers, I recommend that you use mod_proxy and mod_proxy_balancer. You can find a quick introduction to mod_proxy here. It is about Jetty, but it is easily applicable to other servers.
The first thing you need to worry about when clustering is the way sessions are handled. You need to be sure that a request belonging to a session is directed to the same server (or that the session is somehow persisted and can always be retrieved). mod_proxy can do this for you.
Regarding the load balancing algorithm, see the documentation of mod_proxy_balancer. According to it, there are three load balancer scheduler algorithms.
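As a sketch (hostnames and routes are placeholders), a sticky-session balancer in httpd.conf looks roughly like this; lbmethod selects one of the three scheduler algorithms (byrequests, bytraffic or bybusyness):

    <Proxy balancer://mycluster>
        # route must match the jvmRoute configured on each backend
        BalancerMember http://app1.example.com:8080 route=jvm1
        BalancerMember http://app2.example.com:8080 route=jvm2
        ProxySet lbmethod=byrequests stickysession=JSESSIONID
    </Proxy>
    ProxyPass /app balancer://mycluster/app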
An older solution to load balancing is mod_jk.
In general, this is not something I would implement myself, even if I had a better algorithm. It is better to use an existing solution.
Have a look at nginx. It is easy to configure, very fast and handles load balancing between servers.
Distributed session handling is still needed; see kgiannakakis's answer for details.
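A minimal nginx reverse-proxy sketch (hosts are placeholders; ip_hash is one simple way to pin a client to a backend if you need stickiness):

    upstream app_servers {
        ip_hash;                        # simple client-IP stickiness
        server app1.example.com:8080;
        server app2.example.com:8080;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://app_servers;
        }
    }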
Have a look at haproxy. It is an extremely stable and fast HTTP/TCP load balancer that is used by some extremely high-traffic websites.
Distributed session handling is still needed; see kgiannakakis's answer for details.
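A minimal haproxy sketch (addresses are placeholders; the cookie lines are one option for session affinity):

    frontend http_in
        bind *:80
        default_backend app_servers

    backend app_servers
        balance roundrobin
        # cookie-based stickiness: haproxy inserts SERVERID into responses
        cookie SERVERID insert indirect nocache
        server app1 10.0.0.1:8080 check cookie app1
        server app2 10.0.0.2:8080 check cookie app2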
I would like to understand whether I need a load balancer as part of an Elasticsearch deployment, or whether having one is considered good practice.
As far as I understand, both the high level REST client and the transport client of Elasticsearch can manage load balancing between the nodes. So the client needs a comma-separated endpoint list and that's it.
Is there any point in also having a load balancer in the middle?
For which cases might it be useful?
Pros and cons of each method?
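For reference, this is what client-side load balancing looks like with the high level REST client; it spreads requests over the hosts you list and skips dead nodes (hostnames here are placeholders):

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    // The client round-robins requests across all listed nodes.
    RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(
                    new HttpHost("es-node1.example.com", 9200, "http"),
                    new HttpHost("es-node2.example.com", 9200, "http"),
                    new HttpHost("es-node3.example.com", 9200, "http")));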
Normally an external load balancer is not common in an ES cluster and is not required, as Elasticsearch already does load balancing; by default, all the data nodes in an ES cluster take on the coordinating role. If you want to improve performance, you can have dedicated coordinating nodes as well.
If your goal is smart load balancing that improves performance, then on ES 6.x or higher you get it out of the box without doing any external configuration (it is turned on by default in 7.x), by using adaptive replica selection.
Having another load balancer means extra configuration and another layer before your request reaches ES, so IMHO it doesn't make any sense to use one.
The answer depends on your architecture and also your requirements. Do you need a load balancer for high availability? Or for performance reasons/scalability? Or both?
Elasticsearch, like many other distributed systems, comes with its own protocols and semantics to distribute load across multiple nodes and to manage fail-overs.
You can use these semantics to configure nodes in such a way that a node can perform just the role of a coordinator -- effectively acting as a load balancer for heavy duty operations like search requests or bulk index requests.
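For example, a coordinating-only node can be declared in elasticsearch.yml by switching the other roles off (pre-7.9 style settings; assumed to match your version):

    # this node neither holds data nor is master-eligible: it only coordinates
    node.master: false
    node.data: false
    node.ingest: false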
Elasticsearch also has its own built-in protocol for electing a new master node in case of failures -- again effectively performing the role of a load balancer.
In general, I would recommend using the native capabilities to achieve your goals instead of adding more complexity by introducing another technology in front of it.
If you want a stable URL for your cluster, then configure your DNS server to reach that goal. A cloud provider managed cluster should already have such a feature; otherwise you can configure it with some effort.
Our application currently uses Spring's HttpInvokerProxyFactoryBean to expose a Java service interface, with POJO requests and responses handled by our single Tomcat server. This solution allows us to have a pure Java client and server, sharing the same Java interface. Due to increased load, we are now looking into the possibility of load balancing across multiple Tomcat instances.
It would be nice if we could make this transition while retaining the same Java interface, as this would minimise the additional development required. Googling suggests that the most common solution for Tomcat load balancing is to use the Apache HTTP server together with mod_jk, but I presume this would mean using some communication mechanism other than Spring's HTTP invoker? Is there a better solution that would allow us to retain more of our current code? If not, what would be involved in transitioning between what we have now and Apache/mod_jk?
Any help would be greatly appreciated as I don't have any experience in this matter.
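For what it's worth, HTTP invoker traffic is plain HTTP, so in principle the client-side proxy only needs its service URL pointed at a balancer instead of a single Tomcat; a rough sketch under that assumption (the URL and interface name are hypothetical):

    import org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean;

    // Same shared Java interface as before; only the target URL changes.
    HttpInvokerProxyFactoryBean proxy = new HttpInvokerProxyFactoryBean();
    proxy.setServiceInterface(MyService.class);
    proxy.setServiceUrl("http://balancer.example.com/remoting/myService");
    proxy.afterPropertiesSet();
    MyService service = (MyService) proxy.getObject();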
I've heard the term "clustering" used for application servers like GlassFish, as well as with Terracotta; and I'm trying to understand what the word clustering implies when used in conjunction with application servers, and when used in conjunction with Terracotta.
My understanding is:
If a GlassFish server is clustered, then it means we have multiple physical/virtual machines, each with their own JRE/JVM running separate instances of GlassFish. However, since they are clustered, they will all communicate through their admin server ("DAS"), and have the same apps deployed to all of them. They will effectively act (to the end user) as if they are a single app server - but now with load balancing, failover/redundancy and scalability added into the mix.
Terracotta is, essentially, a product that makes multiple JVMs, running on different physical/virtual machines, act as if they are a single JVM.
Thus, if my understanding is correct, the following are implied:
You cluster app servers when you want load balancing and failover tolerance
You use Terracotta when any particular JVM is too small to contain your application and you need more "horsepower"
Thus, technically, if you have a GlassFish cluster of, say, 5 server instances, each of those 5 instances could actually be an array/cluster of Terracotta instances, meaning each GlassFish server instance is actually a GlassFish instance living across the JVMs of multiple machines itself.
If any of these assertions/assumptions are untrue, please correct me! If I have gone way off-base and clearly don't understand clustering and/or the very purpose of Terracotta, please point me in the right direction!
Terracotta enables you to have shared state across all your nodes (it is stateful). Basically, it creates a shared memory space between different JVMs. This is useful when the nodes in a cluster all need access to the same objects.
If your application is stateless and you just need load balancing and failover, you can use a solution like JGroups. In this scenario each node just handles requests and has little idea about the other nodes. Objects in memory are not shared across nodes; each JVM runs on its own and has no idea about the other JVMs. This often works nicely for request/response type applications. A web server serving content (without sessions) does this, for example.
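A minimal JGroups sketch (JGroups 4.x-era API; the cluster name is a placeholder): each node joins a named channel and can broadcast messages to its peers without sharing any heap state.

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class Node {
        public static void main(String[] args) throws Exception {
            JChannel channel = new JChannel();            // default protocol stack
            channel.setReceiver(new ReceiverAdapter() {
                @Override
                public void receive(Message msg) {
                    System.out.println("got: " + msg.getObject());
                }
            });
            channel.connect("my-cluster");                // join (or form) the cluster
            channel.send(new Message(null, "hello"));     // null destination = broadcast
        }
    }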
Dealing with a stateless cluster is often simpler than dealing with a stateful cluster. This is because in a stateless cluster the nodes know almost nothing about each other, which results in fewer things that can go wrong.
GlassFish sits a bit in the middle of the above concepts. Objects in memory within GlassFish are visible to all nodes. However, the frontend (the HTTP connectors) works statelessly.
So to answer your questions:
1) Yes, those are the two most obvious reasons. However, sometimes people only want failover, or only load balancing, or both. Not all clustering solutions fix both of these problems.
2) Yes. Although technically speaking Terracotta only solves the shared memory part, not the CPU part. However, by solving the memory part it automatically solves the CPU part, since you can now just add JVMs to the shared memory space.
3) I don't know if that's practically possible, but as a thought experiment: yes.
Clustering can mean one of the following:
1. Multiple instances can be managed as one. Deploy an application to the cluster, and it is deployed to all instances in the cluster. Make a configuration change, and that change will be pushed to all nodes in the cluster. GlassFish supports this out of the box.
2. Service availability. If any one instance fails, the application is available on another instance. Without high availability enabled, any instance failure also results in session loss for any session being managed by that instance. GlassFish supports this out of the box.
3. High availability. If any one instance fails, the application is available on another instance, and there is no session loss because a session replica is also maintained on another instance. GlassFish supports this. You will have to choose either #2 or #3 in any one cluster.
What you are asking about, IMHO, is really #3, because it is the only real case where Terracotta, in the context of high-availability clustering, will offer value with GlassFish. GlassFish already offers built-in high availability, so there had better be a very good reason to add Terracotta to the solution, because it will complicate the deployment architecture.
The primary reason I can think of for adding Terracotta is that you may want to offload session management to a data grid and free up GlassFish to run business logic. This may be due to more frequent garbage collection or wanting to manage more users per GlassFish instance. However, I'm not sure that Terracotta can do this seamlessly. With GlassFish's built-in HA clustering, replicating sessions is seamless (no application logic modifications). You may have to write code to put/get data from a Terracotta cache; I'll let you research that :-) Oracle GlassFish Server also integrates (seamlessly) with Coherence to solve this problem. You can separate session management into a Coherence data grid without modifying your application code.
Unless you know for a fact up front that your application must scale to a very large number of concurrent users, start with built-in HA clustering, run tests, and go from there.
Hope this helps.
I have a problem. I need to host many (tens, hundreds) of small identical Java web applications that have different loads at any one time. I want to use GlassFish v3. Do I need to use a load balancer and clusters, or something else? Please advise where I can find information about similar problems and their solutions...
I need to host many (tens, hundreds) of small identical Java web applications that have different loads at any one time.
For hundreds of webapps, you will very likely need more than one app server instance. But this sounds odd to be honest.
I want to use GlassFish v3. Do I need to use a load balancer and clusters, or something else?
Right now, GlassFish v3 offers only basic clustering support using mod_jk (i.e. no load balancer plugin, no centralized admin, no high availability). If you are interested, have a look at this note that describes the configuration steps for GFv3 and mod_jk.
For centralized admin and clustering, you'll have to wait for GlassFish 3.1 (see the GlassFish Roadmap Community Update slides).
You could check out GigaSpaces. I have seen it used in conjunction with Mule for a somewhat similar project. ESBs tend to be overkill in my opinion, but it sounds like you have quite the task to conquer.
Based on your requirements, you cannot do load balancing, since the load is predetermined by which client the request is for. Each request has to go to the app handling that client, so it cannot be distributed outside the set of apps dedicated to that client.
You can use multi-threading: you could set up the configuration so that different threads handle different clients. However, it might be better to simply have one server that can handle requests from different clients. Based on the client sent with the request, it would be dispatched to a different database, etc. A sketch of that dispatch idea follows below.
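A hypothetical per-request dispatch sketch (the header name and lookup are made up for illustration): one deployment serves many clients and routes each request to that client's own resources.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.servlet.http.HttpServletRequest;
    import javax.sql.DataSource;

    public class TenantRouter {
        // One DataSource per client, registered at startup (hypothetical).
        private final Map<String, DataSource> tenantDataSources = new ConcurrentHashMap<>();

        public DataSource resolve(HttpServletRequest request) {
            String tenant = request.getHeader("X-Tenant-Id"); // hypothetical header
            DataSource ds = tenantDataSources.get(tenant);
            if (ds == null) {
                throw new IllegalArgumentException("unknown tenant: " + tenant);
            }
            return ds;
        }
    }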
Tomcat (version 5 here) stores session information in memory. When clustering, this information is periodically broadcast to the other servers in the cluster to keep things in sync. You can use a database store to make sessions persistent, but this information is only written periodically as well and is only really used for failure recovery rather than actually replacing the in-memory sessions.
If you don't want to use sticky sessions (our configuration doesn't allow it, unfortunately), this raises the problem of the sessions getting out of sync.
In other languages, web frameworks tend to allow you to use a database as the primary session store. Whilst this introduces a potential scaling issue, it does make session management very straightforward. I'm wondering if there's a way to get Tomcat to use a database for sessions in this way (technically this would also remove the need for any clustering configuration in the Tomcat server.xml).
There definitely is a way, though I'd strongly vote for sticky sessions - they save so much load for your servers/database (unless something fails)...
http://tomcat.apache.org/tomcat-5.5-doc/config/manager.html has information about SessionManager configuration and setup for Tomcat. Depending on your exact requirements you might have to implement your own session manager, but this starting point should provide some help.
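As a starting point, a context.xml sketch using the PersistentManager with a JDBCStore might look like the following (the table/column names, idle settings and JDBC URL are assumptions; note that per the docs, writes are still periodic rather than per-request):

    <Manager className="org.apache.catalina.session.PersistentManager"
             maxIdleBackup="0" maxIdleSwap="0" minIdleSwap="-1"
             saveOnRestart="true">
      <Store className="org.apache.catalina.session.JDBCStore"
             driverName="com.mysql.jdbc.Driver"
             connectionURL="jdbc:mysql://dbhost/tomcat?user=app&amp;password=secret"
             sessionTable="tomcat_sessions"
             sessionIdCol="session_id"
             sessionAppCol="app_name"
             sessionDataCol="session_data"
             sessionValidCol="valid_session"
             sessionMaxInactiveCol="max_inactive"
             sessionLastAccessedCol="last_access"/>
    </Manager>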
Take a look at Terracotta, I think it can address your scaling issues without a major application redesign.
I've always been a fan of the Rails sessions technique: store the sessions (zipped + encrypted + signed) in the user's cookie. That way you can do load balancing to your heart's content, and not have to worry about sticky sessions or hitting the database for your session data, etc. I'm just not sure you could implement that easily in a Java app without some sort of rewriting of your session-access code. Anyway, just a thought.
Another alternative would be the memcached-session-manager, a memcached-based session failover and session replication solution for Tomcat 6.x/7.x. It supports both sticky sessions and non-sticky sessions.
I created this project to get the best of performance and reliability and to be able to scale out by just adding more tomcat and memcached nodes.
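Configuration is again a Manager element in context.xml; a sketch for non-sticky mode (node names and hosts are placeholders):

    <Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
             memcachedNodes="n1:memcached1.example.com:11211,n2:memcached2.example.com:11211"
             sticky="false"
             sessionBackupAsync="false"
             requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$"/>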