I am writing a small proxy application which should be redundant, i.e. the primary proxy will run on one server and a redundant one will run on a separate server. Is there a simple high-availability framework I can use to implement this redundancy? For example, this HA framework would send pings between instances and raise some sort of exception or notification on the other instance when the first one goes down.
Building such a system has been my routine job in recent years. I have found JGroups a very usable tool for receiving and handling this kind of group-membership event. That is the route to take if you want to build your own HA infrastructure; in your case, though, maybe a simple reverse proxy such as HAProxy would be enough.
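If you do decide to build it yourself with JGroups, a minimal membership listener looks roughly like this (the cluster name and the take-over logic are placeholders): each proxy joins the same cluster and reacts when the membership view changes.

import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class ProxyClusterMember {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel();              // default protocol stack
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // Called on every membership change, e.g. when the primary dies.
                System.out.println("New cluster view: " + view.getMembers());
                // Here you would check whether you are now the coordinator
                // and, if so, take over the primary role (bind the VIP, etc.).
            }
        });
        channel.connect("proxy-ha-cluster");            // hypothetical cluster name
    }
}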
If you want HA without hassle, just use a load balancer with HA capability, e.g. Ultramonkey, or LVS with keepalived.
In an HA configuration you'd typically want to use a virtual IP, so even if you had this ping/notify functionality as a framework, you'd still have work to do (start responding to requests on the virtual IP once the other instance has failed). So unless you are looking for a learning exercise, I'd advise using middleware instead of coding this yourself with frameworks.
There are a number of health checks you can configure for these middlewares. A simple health check might, for example, fire a GET request at your app periodically and look for a specific string (e.g. "XXX running.") in the response to make sure your app is running fine.
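On the application side, such a check usually only needs a trivial endpoint. A minimal sketch, assuming a servlet container (the response string and any dependency checks are up to you):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical health-check endpoint for the load balancer to poll.
public class HealthCheckServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Optionally verify dependencies (DB, downstream services) before answering.
        resp.setContentType("text/plain");
        resp.getWriter().write("XXX running.");
    }
}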
You don't provide much detail about the work your application does, so depending on how stateful it is, whether it can tolerate minor data loss, whether it is time-critical, and whether you value developer time over machine time, there is a whole spectrum of solutions.
There are some good suggestions above; I'd add: take a look at JMS and persistent messaging. These usually make recovery quite trivial, but at the cost of a latency hit (unless you buy a commercial product and learn it well, or pay the vendor to tune your application). With JMS queues you can implement active-active processing and save yourself the headache of failure detection.
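The idea, roughly (the connection factory and queue name here are placeholders): run the same consumer on both servers against a shared queue. The broker hands each message to exactly one of them, so if one instance dies the other simply keeps draining the queue, with no explicit failure detection.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class ActiveActiveWorker {
    // Run this on both servers; the broker load-balances messages between them.
    public static void drain(ConnectionFactory connectionFactory) throws Exception {
        Connection connection = connectionFactory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("proxy.work");   // placeholder queue name
        MessageConsumer consumer = session.createConsumer(queue);
        connection.start();
        while (true) {
            TextMessage message = (TextMessage) consumer.receive();
            System.out.println("Handling " + message.getText());
        }
    }
}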
Another direction to look at is distributed state management/clustering frameworks like GigaSpaces, Coherence, GemStone, Infinispan, GridGain and Terracotta. These can replicate your data and guarantee varying quality-of-service levels. Most of them come with some type of failure detection and distributed management mechanism.
Hadoop is a good place to start.
I'm interested in using JMX to monitor/configure a simple Java Client/Server application. For example, we would capture any network exceptions that occur in a Java program.
Can MBeans be extended in this way? Or are they limited to more concrete get & set functions?
So far, I've looked into Notifications and Monitor MBeans.
Thanks
Well, I would say it's definitely doable. I was using JMX with custom MBeans in an Apache Wicket application earlier. Anyway, an MBean is just a wrapper around some logic in your server application, so you can take the data directly from your application.
If you want to see how this is done in a working application, you might want to check out this:
https://github.com/apache/wicket/blob/master/wicket-jmx/src/main/java/org/apache/wicket/jmx/wrapper/MarkupSettings.java
The class basically holds a reference to the application and asks for data directly from the server app.
When the server starts up, it registers all the MBeans through an initializer class:
https://github.com/apache/wicket/blob/master/wicket-jmx/src/main/java/org/apache/wicket/jmx/Initializer.java
Then every time you look at your MBean server you will see the latest up-to-date information coming directly from the app.
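Stripped of the Wicket specifics, the pattern is roughly this (the names below are made up for illustration): a standard MBean interface, an implementation that reads application state, and a one-time registration against the platform MBean server.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical management interface; the *MBean suffix is required for standard MBeans.
interface NetworkStatsMBean {
    int getExceptionCount();
}

class NetworkStats implements NetworkStatsMBean {
    private volatile int exceptionCount;

    public void recordException() { exceptionCount++; }

    @Override
    public int getExceptionCount() { return exceptionCount; }
}

public class JmxBootstrap {
    public static NetworkStats register() throws Exception {
        NetworkStats stats = new NetworkStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The object name is a placeholder; use a domain that matches your app.
        server.registerMBean(stats, new ObjectName("com.example:type=NetworkStats"));
        return stats;
    }
}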
There are some caveats though. One is that Java in general doesn't provide any good abstraction for capturing all exceptions of a given type coming from anywhere in the application. You can register a catch-all exception handler, but as far as I can remember it doesn't work perfectly.
When I had to do something like this, I used AspectJ to register a catch-all place to handle exceptions. I used compile-time weaving to reduce the performance impact, but I am not sure how much it affects overall performance (if it affects it at all).
¯\_(ツ)_/¯
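For what it's worth, a sketch of such a catch-all aspect in annotation-style AspectJ (the pointcut and the reporting hook are assumptions; in practice you'd narrow the pointcut and forward the exception to whatever your MBean exposes):

import org.aspectj.lang.annotation.AfterThrowing;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class ExceptionMonitorAspect {

    // Fires whenever any method under com.example throws, after compile-time weaving.
    @AfterThrowing(pointcut = "execution(* com.example..*(..))", throwing = "ex")
    public void report(Throwable ex) {
        // e.g. increment a counter on an MBean or emit a JMX Notification
        System.err.println("Caught by monitoring aspect: " + ex);
    }
}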
The other caveat is that JMX connections are usually difficult to set up in an enterprise environment. If you have to log in through two hops just to reach the production servers because there are firewalls everywhere, then your monitoring connection will definitely fail, and you'll need to keep buying beer for your sysadmin and convincing your manager that this doesn't impose any security risk. :)
There is one thing though. You say
to monitor/configure a simple Java Client/Server application
You want to configure / monitor the clients as well? I've never done that. I am not sure that's even possible.
In grid computing, what is the de facto software practice used by a server to discover clients and get information about them? For example, the name of the client, how much memory is available, is the client currently performing a task (and how much has it completed), etc. Or is it the other way around? Do the clients occasionally report that information to the server?
Would this be done via RPC? Or a messaging protocol (AMQP, STOMP)?
I'm also wondering if the same method is used to send clients various jobs/tasks to complete.
I'm looking to find a Java friendly solution, if possible.
Thanks!
There is no actual de facto standard for server/node/client discovery in grid computing, at least none that is universally used. Many implementations use ad hoc discovery based on UDP multicasting; others use registry-based discovery as in SOA architectures. There are plenty of solutions but no universal standard.
Some Java-friendly implementations you might want to look at: Unicore, JPPF, HTCondor, GridGain, Hadoop, Globus, Hazelcast.
ZooKeeper is something to consider, perhaps combined with JMS messaging if your resources are distributed far and wide. I use ZooKeeper with a SystemInfo service running on each node. The service registers the system's information (memory, number of CPUs, disk space and such) as a znode under /Resources in ZooKeeper.
Then whatever service needs a resource can query /Resources, pick a candidate, and check its specifications before allocating work to it.
The Java API for ZooKeeper is pretty good; I find it easy to work with.
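A rough sketch of the registration side with the plain ZooKeeper API (the connection string, node name and info format are placeholders, and the /Resources parent is assumed to exist already):

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SystemInfoRegistrar {
    public static void register() throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });
        // Whatever your SystemInfo service gathers, serialized however you like.
        String info = "cpus=8;memMb=16384;diskGb=500";
        // EPHEMERAL: the znode disappears automatically if this node dies,
        // so /Resources always reflects the machines that are actually alive.
        zk.create("/Resources/node-01",
                  info.getBytes(StandardCharsets.UTF_8),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.EPHEMERAL);
    }
}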
Background of the web application:
I am using Java/Spring MVC/Tomcat to provide my web service as well as to expose my RESTful API to mobile clients. I am happy with everything on the web side right now. The problem is that my application has a really heavy computing process at its core, which invokes a separate Java program to process images and return the computed data back to the web service.
It sometimes eats up a lot of my EC2 instance's memory, or causes an exception that shuts down my Tomcat 7 server.
Question:
Right now everything is running under the same Tomcat 7 container, and I am seeking a solution to decouple the two so that I can install them on different servers, perhaps a high-memory server for the computing program alone.
What are the options out there that allow me to decouple them and improve scalability and stability?
Update:
I can invoke the computing engine programmatically or from the command line.
Update2:
I have done some research based on the answers. While reading another post, What exactly is Apache Camel?, I felt I should probably learn a little more about EIP patterns. Hopefully it is not overkill.
Solution based on suggestion
After reading through the EIP concepts, Camel in Action and ActiveMQ, I finally came up with a solution. It might not be elegant, but it's working. Suggestions and comments would be appreciated!
I wrote a queue router based on Apache Camel, connecting to an ActiveMQ broker and running as a standalone program on one server. The computing engine runs in its own standalone container, and the router is responsible for serving the JMS requests coming from the Spring container in my web server. Later on I just need to configure load balancing for the computing engine in Camel if more intensive computing is needed.
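For reference, the router boils down to something like the following (a Camel 2.x-style sketch; the broker URL, queue name and ComputeEngine bean are placeholders for my actual setup):

import org.apache.activemq.camel.component.ActiveMQComponent;
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class ComputeRouter {

    // Placeholder for the actual image-processing engine invocation.
    public static class ComputeEngine {
        public String process(String request) {
            return "processed:" + request;
        }
    }

    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addComponent("activemq",
                ActiveMQComponent.activeMQComponent("tcp://broker-host:61616"));
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // The web tier sends an InOut JMS request; the reply goes back
                // automatically to its JMSReplyTo destination.
                from("activemq:queue:compute.requests")
                    .bean(new ComputeEngine(), "process");
            }
        });
        context.start();
        Thread.sleep(Long.MAX_VALUE);   // keep the standalone router running
    }
}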
The direction you are pointing in right now is adding more hardware. You need to think through whether this actually solves your problem. E.g. if you are using a 32-bit JVM there are limits on how large a heap you can specify; if you are lucky enough to have a 64-bit JVM, you have more room for memory. But there is always the possibility of using so much CPU that your application becomes unresponsive.
I prefer breaking the compute-intensive tasks into jobs and working them off in a separate JVM. Persist your jobs in a datastore/JMS so that they don't get lost. Be careful with DB updates from those jobs to avoid any locking.
If I understand correctly, it seems you need a load balancer.
Have a load balancer route to one of multiple instances of your web service/compute engine. You can achieve this using an ESB, a routing engine, clustering, a master-slave setup, a distributed cache, etc.; most of these are interrelated.
You can also spin up additional nodes on EC2 in real time based on load.
Alternatively, if the task can be broken up, delegate it to multiple nodes/services. You will need some orchestration mechanism for that.
There are open source solutions that can address both of these approaches.
Does the backend work synchronously? I mean, when the mobile clients request something, do they have to wait for the backend to do a lot of processing?
If yes, you can grow horizontally by putting up more worker nodes (backend webapps) behind a front Nginx or any other balancer. It's the fastest way.
Do you have reusable data? If yes, you can use something like memcached.
Hope this helps; if you give us more information, I'm pretty sure we can provide better advice.
Hi guys: I've seen "simplistic" workflow-management tricks (like rotating file queues, controller threads, etc.) work in a wide variety of producer/consumer contexts, where files are simply renamed, deleted, and created in a systematic manner, or where a "main" thread calls and coordinates workers.
In contrast, I've also "played" with JMS in some toy applications, and I can see how it might be used to coordinate a complex application workflow.
I was wondering: What do messaging services like JMS offer over standard producer/consumer workflows (of course, if I'm missing something here, or have the wrong idea of when/why JMS is used, feel free to correct me)?
In particular, what type of applications require enterprise-grade messaging frameworks?
What do messaging services like JMS offer over standard producer/consumer workflows?
Scalability, availability, transparency, manageability. In point-to-point communication the sender is bound to the receiver and vice versa. You, as the application developer, are responsible for thinking about what to do when traffic increases and for implementing the necessary changes. Your application must be aware of the environment in which it works and must be changed every time the environment changes. You are forced to reinvent the wheel while solving typical messaging problems, for example temporary congestion (what do you do when the consumer can't keep pace with the producer for a while?). You have to provide your own means of monitoring the situation when something doesn't work as expected. The list goes on...
Now imagine you have to wire 10 different systems this way. Obviously, you'll need to come up with a fairly universal solution so that you don't implement each connection's logic from scratch; that would be terribly expensive to build, not to mention maintain. A JMS message broker is one such general solution.
In particular, what type of applications require enterprise-grade messaging frameworks?
Complicated ones, in short. I work for a company that has a network of about 70 systems, some of them 30 years old. New systems are added to the network as time passes, and the old systems don't need to be changed, nor do the new systems have to be aware of ancient data-exchange protocols: a centralized cluster of message brokers can translate a JMS message into some mainframe message format I have no idea about, and the same way back with the answer.
We have a web application that does various things and sometimes emails users depending on a given action. I want to decouple the HTTP request threads from actually sending the email in case there is some trouble with the SMTP server or a backlog. In the past I've used JMS for this and had no problem with it. However, for this web app JMS just feels like a bit of overkill right now (in terms of setup etc.), and I was wondering what other alternatives are out there.
Ideally I'd just like something I can run in-process (JVM/Tomcat), but where any pending items in the queue get swapped to disk/DB when the servlet context is unloaded. I could of course just code something together involving an in-memory queue, but I'm looking to gain the benefit of open source projects, so I'm wondering what's out there, if anything.
If JMS really is the answer, does anyone know of something that would fit our simple requirements?
thanks
I'm using JMS for something similar. Our reasons for using JMS:
We already had a JMS server for something else (so it was just adding a new queue)
We wanted our application to be decoupled from the processing side, so errors on either side would stay on their side
The app could drop the message in a queue, commit, and go on. No need to worry about how to persist the messages, how to start over after a crash, etc. JMS does all that for you.
I would think Spring Integration would work in this case as well.
http://www.springsource.org/spring-integration
Wow, this issue comes up a lot. CommonJ WorkManager is what you are looking for. A Tomcat implementation can be found here. It allows you to safely create threads in a Java EE environment but is much lighter weight than using JMS (which will obviously work as well).
Beyond JMS, for short messages you could also use Amazon Simple Queue Service (SQS).
While you might think it overkill too, consider that it requires minimal maintenance, scales nicely, has ultra-high availability, and doesn't cost all that much.
There is no cost for creating new queues or for having an account; as far as I recall, pricing is purely based on the number of operations you perform (sending, polling/retrieving messages).
The main limitation really is the message size (there are others, like ordering not being guaranteed due to its distributed nature), but that might work as-is. Or, for larger messages, use the related AWS service, S3, to store the actual body and just pass headers through SQS.
You could use a scheduler. Have a look at Quartz.
The idea is that you schedule a job to run at regular intervals. All requests need to be persisted somewhere; the scheduled job reads them and processes them. You need to define the interval between two subsequent runs to fit your needs.
This is the recommended way of doing things. Full-fledged application servers offer Java EE timers for this, but these aren't available in Tomcat. Quartz is fine though, and it lets you avoid starting your own threads, which can cause a mess in some situations (e.g. during application updates).
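A sketch with the Quartz 2.x API, assuming the pending emails are persisted somewhere the job can read them (the job class, job name and one-minute interval are just examples):

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class MailSchedulerBootstrap {

    // Hypothetical job: drain pending mail requests from the datastore and send them.
    public static class EmailDrainJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // read persisted mail requests and hand them to the SMTP code
        }
    }

    public static void start() throws Exception {
        JobDetail job = JobBuilder.newJob(EmailDrainJob.class)
                .withIdentity("emailDrain")
                .build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInSeconds(60)
                        .repeatForever())
                .build();
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}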
I agree that JMS is overkill for this.
You can just send the e-mail in a separate thread (i.e. separate from the request handling thread). The only thing to be careful about is that if your app gets any kind of traffic at all, you may want to use a thread pool to avoid resource depletion issues. The java.util.concurrent package has some nice stuff for thread pools.
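A minimal sketch of that approach; the pool size is arbitrary and sendEmail() stands in for whatever mail code you already have:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MailDispatcher {
    // Small fixed pool so a slow SMTP server can't tie up request threads
    // or spawn an unbounded number of threads.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public void queue(final String recipient, final String body) {
        pool.submit(new Runnable() {
            @Override
            public void run() {
                sendEmail(recipient, body);
            }
        });
    }

    private void sendEmail(String recipient, String body) {
        // JavaMail / SMTP call goes here
    }
}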
Since you say the app "sometimes" emails users it doesn't sound like you're talking about a high volume of mail. A quick and dirty solution would be to just Runtime.getRuntime().exec():
sendmail recipient@domain.com
and dump the message into the resulting Process's getOutputStream(). After that it's sendmail's problem.
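A quick-and-dirty sketch of that idea (the address and headers are examples only, and it assumes the sendmail binary is on the PATH):

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SendmailHack {
    public static void send(String recipient, String subject, String body) throws Exception {
        // Hand the message to the local MTA and let it worry about queueing and retries.
        Process process = Runtime.getRuntime().exec(new String[] {"sendmail", recipient});
        try (OutputStream out = process.getOutputStream()) {
            String message = "To: " + recipient + "\r\n"
                    + "Subject: " + subject + "\r\n"
                    + "\r\n"
                    + body + "\r\n";
            out.write(message.getBytes(StandardCharsets.UTF_8));
        }
        process.waitFor();
    }
}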
Figure a minute to see if you have sendmail available on the server, about fifteen minutes to throw together a test if you do, and nothing to install assuming you found sendmail. A few more minutes to construct the email headers properly (easy - here are some examples) and you're done.
Hope this helps...