I am trying to build a very lightweight zero-downtime deployment solution for Java apps. For the sake of simplicity, let's assume we have two servers. My solution is to use:
On the "front": a software load balancer (I am thinking about HAProxy here).
On the "back": two servers, both running Tomcat with the application deployed.
When we are about to deploy a new release:
We disable one of the servers in HAProxy, so only one server (let's call it server A, which is running the old release) will be available.
Deploy the new release on the other server (let's call it server B), run production unit tests (in case we have them :-)), and enable server B in HAProxy, disabling server A at the same time.
Now we again have only one server active (server B, with the new release). Deploy the new release on server A, and re-enable it.
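The procedure above maps fairly directly onto HAProxy's runtime API (the admin socket), which has `disable server <backend>/<server>` and `enable server <backend>/<server>` commands. The sketch below only builds the ordered list of steps; the backend name `app`, the server names, and the `DEPLOY`/`SMOKE-TEST` placeholders are hypothetical, and actually sending the commands (for example with `socat` to a stats socket configured with `level admin`) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: the two-server switch above as an ordered list of steps. */
class TwoServerSwitch {
    // HAProxy runtime API syntax: "disable server <backend>/<server>".
    // "DEPLOY" and "SMOKE-TEST" are hypothetical placeholders for your own scripts.
    static List<String> plan(String backend, String serverA, String serverB) {
        List<String> steps = new ArrayList<>();
        steps.add("disable server " + backend + "/" + serverB); // only A (old release) serves
        steps.add("DEPLOY " + serverB);
        steps.add("SMOKE-TEST " + serverB);
        steps.add("enable server " + backend + "/" + serverB);  // B (new release) starts serving...
        steps.add("disable server " + backend + "/" + serverA); // ...and A is taken out right after
        steps.add("DEPLOY " + serverA);
        steps.add("SMOKE-TEST " + serverA);
        steps.add("enable server " + backend + "/" + serverA);  // both serve the new release
        return steps;
    }

    public static void main(String[] args) {
        plan("app", "a", "b").forEach(System.out::println);
    }
}
```

Between enabling B and disabling A both releases are briefly live; issuing those two commands back to back keeps that window short, but the application still has to tolerate it.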
Any advice on how to improve this? How can it be automated?
Are there any ready-made solutions, or will I have to end up writing my own custom scripts?
Thanks!
I found some interesting solutions regarding zero downtime in this article. I would like to highlight only a few of them.
1. A/B switch (rolling upgrade + fallback mechanism):
We should have a set of nodes in standby mode. We deploy the new version to those nodes and switch the traffic to them instantly. If we keep the old nodes in their original state, we can do an instant rollback as well. A load balancer fronts the application and is responsible for this switch upon request.
Cons: if you need X servers to run your application, you need 2X servers with this approach.
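The A/B switch boils down to a single pointer swap in the load balancer. A minimal in-process sketch of that idea (node names are hypothetical; a real balancer such as HAProxy would hold the two pools as backends):

```java
import java.util.List;

/** Sketch: A/B (blue/green) switch between two node pools, with instant rollback. */
class BlueGreenSwitch {
    private volatile List<String> active; // pool receiving traffic
    private List<String> standby;         // pool kept in its previous state

    BlueGreenSwitch(List<String> live, List<String> idle) {
        this.active = List.copyOf(live);
        this.standby = List.copyOf(idle);
    }

    /** Swap pools in one step; the old pool is untouched, so it can be switched back. */
    synchronized void switchOver() {
        List<String> previous = active;
        active = standby;
        standby = previous;
    }

    /** Rolling back is the same swap in the other direction. */
    void rollback() {
        switchOver();
    }

    /** Pick a node for a request, round-robin over whichever pool is active. */
    String route(int requestNumber) {
        List<String> pool = active; // read the volatile once so size() and get() agree
        return pool.get(Math.floorMod(requestNumber, pool.size()));
    }

    List<String> activeNodes() {
        return active;
    }
}
```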
2. Zero downtime
With this approach, we don’t keep a set of machines; rather, we delay the port binding. Shared resource acquisition is delayed until the application starts up. The ports are switched after the application starts, and the old version is also kept running (without an access point) to roll back instantly if needed.
3. Parallel deployment in Apache Tomcat (for web applications only):
Apache Tomcat added the parallel deployment feature in version 7. It lets two versions of the application run at the same time and uses the latest version as the default.
4. Delayed port binding:
What we propose here is the ability to start the server without binding the port, essentially without starting the connector. Later, a separate command starts the connector and binds the port. Version 2 of the software can be deployed while version 1 is running and already bound. When version 2 is started later, we can unbind version 1 and bind version 2. With this approach, the node is effectively offline for only a few seconds.
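In plain Java the same idea is available at the socket level: `ServerSocketChannel.open()` returns a channel that is not yet bound, so the expensive startup work can happen before the port is taken. This is a minimal sketch of the concept, not how Tomcat or any other server implements it; `warmUp()` is a hypothetical stand-in for real initialization.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

/** Sketch of late binding: warm up first, acquire the port last. */
class LateBindingServer implements AutoCloseable {
    private final ServerSocketChannel channel;
    private boolean warmedUp;

    LateBindingServer() throws IOException {
        channel = ServerSocketChannel.open(); // created but NOT bound: no port held yet
    }

    void warmUp() {
        // Hypothetical placeholder for slow startup work (caches, pools, JIT warm-up).
        warmedUp = true;
    }

    /** Bind only once startup is complete; until then the old version keeps the port. */
    int bind(int port) throws IOException {
        if (!warmedUp) {
            throw new IllegalStateException("refusing to bind before warm-up");
        }
        channel.bind(new InetSocketAddress("127.0.0.1", port));
        return ((InetSocketAddress) channel.getLocalAddress()).getPort();
    }

    /** getLocalAddress() returns null while the channel is unbound. */
    boolean isBound() throws IOException {
        return channel.getLocalAddress() != null;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```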
5. Advanced port binding:
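By breaking the "Address already in use" myth, both the old process and the new process bind to the same port: enabling the SO_REUSEPORT socket option lets two (or more) processes bind to the same port, as long as each of them sets the option before binding. Once the new process has bound to the port, kill the old process.
The SO_REUSEPORT option addresses two issues:
The small glitch when switching between application versions: the node can serve traffic the whole time, effectively giving us zero downtime.
Improved scheduling: on Linux, the kernel spreads incoming connections across all the processes listening on the port, instead of leaving one of them to accept everything.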
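Java exposes this option as `StandardSocketOptions.SO_REUSEPORT` (since Java 9, on platforms that support it, such as Linux). A minimal sketch; in a real switch-over the two listeners would live in the old and the new JVM, while here both are opened in one process just to show that the second bind succeeds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

/** Sketch: two listeners sharing one port via SO_REUSEPORT (Java 9+, platform permitting). */
class ReusePortListener {
    static boolean supported() throws IOException {
        try (ServerSocketChannel probe = ServerSocketChannel.open()) {
            return probe.supportedOptions().contains(StandardSocketOptions.SO_REUSEPORT);
        }
    }

    /** Every process sharing the port must set the option, and must do so before bind(). */
    static ServerSocketChannel listen(int port) throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        try {
            ch.setOption(StandardSocketOptions.SO_REUSEPORT, true);
            ch.bind(new InetSocketAddress("127.0.0.1", port));
            return ch;
        } catch (IOException | RuntimeException e) {
            ch.close(); // don't leak the channel if the option or the bind fails
            throw e;
        }
    }

    static int port(ServerSocketChannel ch) throws IOException {
        return ((InetSocketAddress) ch.getLocalAddress()).getPort();
    }
}
```

The old version would call `listen(8080)`, the new version would call `listen(8080)` once it is ready, and then the old process is stopped.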
In summary:
By combining both late binding and port reuse, we can effectively achieve zero downtime. And if we keep the standby process around, we will be able to do an instant rollback as well.
Rolling upgrade is indeed a good solution, provided your load balancer supports it (starving a server of new requests so that it can be taken down).
Another solution is to use OSGi-enabled application servers to hot-replace parts or all of your application.
I would recommend the first one. SpringSource's AMS supervision console can take down a cluster of tc Server instances (a customized Tomcat on steroids) and, IIRC, do the rolling upgrade automatically (but check the docs).
Have a look at OSGi if you can accommodate an OSGi container, since it provides good isolation and hot deployment of OSGi bundles. If you are using the Spring Framework, you could use Spring OSGi.
LiveRebel provides rolling restarts, a CLI API, and a Hudson/Jenkins plugin for automation.
There is easy-deploy, which does exactly that with Docker containers.
Deploy version 1
easy-deploy -p 80:80 -v some/path:other/path my-image:1
To deploy a new version just run the command with the updated tag name
easy-deploy -p 80:80 -v some/path:other/path my-image:2
Disclosure: I built this tool. I built it exactly because I couldn't find a simple solution for this problem.
We develop an application that is normally deployed on a single web server. Now we are checking how it runs in a clustered environment, as some customers use clusters.
The problem is that the app creates a local configuration (in the registry or a file), which does not make any sense in a cluster. The config is also changed by the application.
Is there a generic way (like an interface) to set up a central configuration, so that the config (file) itself is not duplicated on each node when the app is deployed in a cluster? Are there any other recommended options (doing it manually with the config on a network share, in a database, or in some MBean)?
Why generic? It must run on different application servers (like Tomcat, JBoss, WebSphere, WebLogic, ...), so we cannot use any server-specific feature.
Thanks.
The easiest way to centralize configuration is to put it on a shared file system. That way you can mount the file system in your OS and make it available to your app server, no matter what the brand or version.
We do this for some of our applications, for the shared libraries and/or properties files that we care about. We set up either JVM parameters or JNDI environment entries (we are trying to move toward the latter) so that we can look up the path to the mounted drive at runtime and load the data from the files.
Works pretty slick for us.
If you are writing information, though, that's a different story, because then you have to worry about how you are running your cluster (is it only highly available, or also load-balanced?). Is the app running on both cluster nodes as if it were one app, or is it running independently on each cluster node? If so, you might have to worry about concurrent writes. It is probably better to go with a database or one of the other solutions mentioned above.
But if all you are doing is reading configuration, then I would opt for the mounted file system as it is simplest.
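For the read-only case, the lookup described above might look like this minimal sketch. The system property name `app.config.dir` and the file name `app.properties` are hypothetical; in a container you could get the same path from a JNDI environment entry instead, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Sketch: read shared settings from a directory on a mounted file system. */
class SharedConfig {
    // Hypothetical names, e.g. the JVM is started with -Dapp.config.dir=/mnt/shared/config
    static final String DIR_PROPERTY = "app.config.dir";
    static final String FILE_NAME = "app.properties";

    static Properties load() {
        String dir = System.getProperty(DIR_PROPERTY);
        if (dir == null) {
            throw new IllegalStateException("JVM parameter -D" + DIR_PROPERTY + " is not set");
        }
        return load(Paths.get(dir).resolve(FILE_NAME));
    }

    static Properties load(Path file) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read shared config " + file, e);
        }
        return props;
    }
}
```

Because only a JVM parameter and the standard library are involved, the same code works unchanged on Tomcat, JBoss, WebSphere, or WebLogic.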
You could use a library like Apache Commons Configuration and choose a cluster-friendly implementation such as JDBC or JNDI.
I would consider JDBC and JNDI first. However, if you want your servers to be able to run independently, I would suggest a file distribution system like Subversion, Git, or Mercurial: if your central configuration server is down or unavailable, you don't want production to stop.
A version control system provides a history of who changed what and when, as well as controlled releases (and rollback of releases).
One way to avoid the central server adding another point of failure is to use a database server you already depend on (assuming you have one), on the basis that if it's not running, you won't be working anyway.
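The concern about production stopping when the central configuration source is unavailable can also be handled in code, by keeping a node-local copy of the last good configuration. A minimal sketch, assuming the central copy is reachable as a file (for example on a mounted share or in a checked-out repository); the paths are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Sketch: prefer the central copy, fall back to the last copy cached on this node. */
class FallbackConfig {
    static Properties load(Path central, Path localCache) throws IOException {
        Properties props;
        try {
            props = read(central);
        } catch (IOException centralUnavailable) {
            return read(localCache); // still throws if there is no cached copy either
        }
        // Refresh the local cache so the next outage starts from the latest settings.
        Path dir = localCache.toAbsolutePath().getParent();
        if (dir != null) {
            Files.createDirectories(dir);
        }
        try (OutputStream out = Files.newOutputStream(localCache)) {
            props.store(out, "cached copy of " + central);
        }
        return props;
    }

    private static Properties read(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }
}
```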
When deploying a large Java webapp (>100 MB .war), I currently use the following deployment process:
The application .war file is expanded locally on the development machine.
The expanded application is rsynced from the development machine to the live environment.
The app server in the live environment is restarted after the rsync. This step is not strictly needed, but I've found that restarting the application server on deployment avoids "java.lang.OutOfMemoryError: PermGen space" due to frequent class loading.
Good things about this approach:
The rsync minimizes the amount of data sent from the development machine to the live environment. Uploading the entire .war file takes over ten minutes, whereas an rsync takes a couple of seconds.
Bad things about this approach:
While the rsync is running, the application context is restarted because the files are being updated. Ideally the restart should happen after the rsync is complete, not while it is still running.
The app server restart causes roughly two minutes of downtime.
I'd like to find a deployment process with the following properties:
Minimal downtime during deployment process.
Minimal time spent uploading the data.
If the deployment process is app server specific, then the app server must be open-source.
Question:
Given the stated requirements, what is the optimal deployment process?
Update:
Since this answer was first written, a better way to deploy WAR files to Tomcat with zero downtime has emerged. In recent versions of Tomcat you can include version numbers in your WAR filenames. For example, you can deploy the files ROOT##001.war and ROOT##002.war to the same context simultaneously. Everything after the ## is interpreted by Tomcat as a version number, not as part of the context path. Tomcat keeps all versions of your app running and serves new requests and sessions to the newest version that is fully up, while gracefully completing old requests and sessions on the version they started with. Version numbers can also be specified via the Tomcat manager and even the Catalina Ant tasks. More info here.
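As I read the Tomcat documentation on context naming, everything after `##` is the version, a single `#` in the name becomes `/` in the context path, and `ROOT` maps to the root context. The docs also say versions are compared as plain strings, so `##10` sorts before `##9`; zero-padding (as in the `##001` example) avoids surprises. A small sketch of that naming rule (the class name is made up):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch of Tomcat's naming rule for versioned WARs, e.g. "ROOT##002.war". */
class ParallelWarName {
    final String contextPath; // "" for the root context
    final String version;     // "" when the name has no ##version part

    private ParallelWarName(String contextPath, String version) {
        this.contextPath = contextPath;
        this.version = version;
    }

    static ParallelWarName parse(String fileName) {
        String base = fileName.endsWith(".war")
                ? fileName.substring(0, fileName.length() - ".war".length())
                : fileName;
        int sep = base.indexOf("##");
        String name = sep >= 0 ? base.substring(0, sep) : base;
        String version = sep >= 0 ? base.substring(sep + 2) : "";
        // ROOT is the root context; a single '#' stands for '/' in nested context paths.
        String path = name.equals("ROOT") ? "" : "/" + name.replace('#', '/');
        return new ParallelWarName(path, version);
    }

    /** Newest version for a context, using plain String order as Tomcat does. */
    static Optional<String> newest(List<String> fileNames, String contextPath) {
        return fileNames.stream()
                .map(ParallelWarName::parse)
                .filter(w -> w.contextPath.equals(contextPath))
                .map(w -> w.version)
                .max(Comparator.naturalOrder());
    }
}
```

With `app##9.war` and `app##10.war` deployed side by side, `newest` picks `9`, which is exactly the pitfall that zero-padded version numbers avoid.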
Original Answer:
Rsync tends to be ineffective on compressed files, since its delta-transfer algorithm looks for changes in files, and a small change in an uncompressed file can drastically alter the resulting compressed version. For this reason, it might make good sense to rsync an uncompressed WAR file rather than a compressed one, if network bandwidth proves to be a bottleneck.
What's wrong with using the Tomcat manager application to do your deployments? If you don't want to upload the entire WAR file directly to the Tomcat manager app from a remote location, you could rsync it (uncompressed, for the reasons mentioned above) to a placeholder location on the production box, repackage it into a WAR, and then hand it to the manager locally. Tomcat ships with a nice set of Ant tasks that let you script deployments using the Tomcat manager app.
There is an additional flaw in your approach that you haven't mentioned: While your application is partially deployed (during an rsync operation), your application could be in an inconsistent state where changed interfaces may be out of sync, new/updated dependencies may be unavailable, etc. Also, depending on how long your rsync job takes, your application may actually restart multiple times. Are you aware that you can and should turn off the listening-for-changed-files-and-restarting behavior in Tomcat? It is actually not recommended for production systems. You can always do a manual or ant scripted restart of your application using the Tomcat manager app.
Your application will be unavailable to users during a restart, of course. But if you're so concerned about availability, you surely have redundant web servers behind a load balancer. When deploying an updated war file, you could temporarily have the load balancer send all requests to other web servers until the deployment is over. Rinse and repeat for your other web servers.
It has been noted that rsync does not work well when pushing changes to a WAR file. The reason for this is that WAR files are essentially ZIP files, and by default are created with compressed member files. Small changes to the member files (before compression) result in large scale differences in the ZIP file, rendering rsync's delta-transfer algorithm ineffective.
One possible solution is to use jar -0 ... to create the original WAR file. The -0 option tells the jar command to not compress the member files when creating the WAR file. Then, when rsync compares the old and new versions of the WAR file, the delta-transfer algorithm should be able to create small diffs. Then arrange that rsync sends the diffs (or original files) in compressed form; e.g. use rsync -z ... or a compressed data stream / transport underneath.
EDIT: Depending on how the WAR file is structured, it may also be necessary to use jar -0 ... to create component JAR files. This would apply to JAR files that are frequently subject to change (or that are simply rebuilt), rather than to stable 3rd party JAR files.
In theory, this procedure should give a significant improvement over sending regular WAR files. In practice I have not tried this, so I cannot promise that it will work.
The downside is that the deployed WAR file will be significantly bigger. This may result in longer webapp startup times, though I suspect that the effect would be marginal.
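If the WAR is built from Java code rather than with the `jar` tool, `java.util.zip` can produce the same kind of uncompressed archive as `jar -0`. A sketch under a few assumptions: files are added in sorted order so the entry order is stable between builds (which also helps rsync), each file is read fully into memory (so it is not meant for huge files), and unlike `jar` it does not add a META-INF/MANIFEST.MF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Sketch: pack an exploded webapp without compression, like `jar -0`. */
class StoredWar {
    static void pack(Path webappDir, Path warFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(webappDir)) {
            // Sorted so the entry order is the same on every build.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(warFile))) {
            for (Path file : files) {
                byte[] data = Files.readAllBytes(file); // fine for a sketch, not for huge files
                CRC32 crc = new CRC32();
                crc.update(data);
                // ZIP entry names always use '/', whatever the host OS.
                ZipEntry entry = new ZipEntry(webappDir.relativize(file).toString().replace('\\', '/'));
                entry.setMethod(ZipEntry.STORED);   // no deflate: unchanged files stay byte-identical
                entry.setSize(data.length);         // STORED entries need size and CRC up front
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());
                entry.setTime(Files.getLastModifiedTime(file).toMillis()); // keep headers stable too
                zip.putNextEntry(entry);
                zip.write(data);
                zip.closeEntry();
            }
        }
    }
}
```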
A different approach entirely would be to look at your WAR file to see whether you can identify library JARs that are likely to (almost) never change. Take these JARs out of the WAR file and deploy them separately into the Tomcat server's shared library directory (lib in Tomcat 6 and later, common/lib in Tomcat 5.5), e.g. using rsync.
In any environment where downtime is a consideration, you are surely running some sort of cluster of servers to increase reliability via redundancy. I'd take a host out of the cluster, update it, and then throw it back into the cluster. If you have an update that cannot run in a mixed environment (incompatible schema change required on the db, for example), you are going to have to take the whole site down, at least for a moment. The trick is to bring up replacement processes before dropping the originals.
Using Tomcat as an example: you can use CATALINA_BASE to define a directory where all of Tomcat's working directories will be found, separate from the executable code. Every time I deploy software, I deploy to a new base directory so that I can have the new code resident on disk next to the old code. I can then start up another instance of Tomcat that points to the new base directory, get everything started up and running, and then swap the old process (port number) for the new one in the load balancer.
If I am concerned about preserving session data across the switch, I can set up my system such that every host has a partner to which it replicates session data. I can drop one of those hosts, update it, bring it back up so that it picks the session data back up, and then switch the two hosts. If I've got multiple pairs in the cluster, I can drop half of all pairs, then do a mass switch, or I can do them a pair at a time, depending upon the requirements of the release, requirements of the enterprise, etc. Personally, however, I prefer to just allow end-users to suffer the very occasional loss of an active session rather than deal with trying to upgrade with sessions intact.
It's all a tradeoff between IT infrastructure, release process complexity, and developer effort. If your cluster is big enough and your desire strong enough, it is easy enough to design a system that can be swapped out with no downtime at all for most updates. Large schema changes often force actual downtime, since updated software usually cannot accommodate the old schema, and you probably cannot get away with copying the data to a new db instance, doing the schema update, and then switching the servers to the new db, since you will have missed any data written to the old after the new db was cloned from it. Of course, if you have resources, you can task developers with modifying the new app to use new table names for all tables that are updated, and you can put triggers in place on the live db which will correctly update the new tables with data as it is written to the old tables by the prior version (or maybe use views to emulate one schema from the other). Bring up your new app servers and swap them into the cluster. There are a ton of games you can play in order to minimize downtime if you have the development resources to build them.
Perhaps the most useful mechanism for reducing downtime during software upgrades is to make sure that your app can function in a read-only mode. That will deliver some necessary functionality to your users but leave you with the ability to make system-wide changes that require database modifications and such. Place your app into read-only mode, then clone the data, update schema, bring up new app servers against new db, then switch the load balancer to use the new app servers. Your only downtime is the time required to switch into read-only mode and the time required to modify the config of your load balancer (most of which can handle it without any downtime whatsoever).
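At the application level, a read-only mode can be as simple as one flag that every write path checks. A minimal sketch; how the flag is flipped (an admin page, a JMX MBean, a setting in the shared database) is left open, and in a cluster it has to be flipped on every node.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: one switch that every write path consults before touching the database. */
class ReadOnlyGuard {
    private final AtomicBoolean readOnly = new AtomicBoolean(false);

    void enterReadOnly() { readOnly.set(true); }

    void leaveReadOnly() { readOnly.set(false); }

    boolean isReadOnly() { return readOnly.get(); }

    /** Call at the start of each write operation; reads never call this. */
    void checkWritable(String operation) {
        if (readOnly.get()) {
            throw new IllegalStateException(
                    operation + " is unavailable: maintenance in progress (read-only mode)");
        }
    }
}
```

In a servlet application the check would typically sit in a filter that answers non-GET requests with 503 while the flag is set; that part depends on the servlet API and is left out here.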
My advice is to use rsync with the exploded version but deploy a WAR file.
Create a temporary folder in the live environment where you'll keep the exploded version of the webapp.
Rsync the exploded version.
After a successful rsync, create a WAR file in the temporary folder on the live environment machine.
Replace the old WAR in the server's deploy directory with the new one from the temporary folder.
Replacing the old WAR with the new one is recommended in the JBoss container (whose web layer is based on Tomcat) because it is an atomic and fast operation, and it ensures that when the deployer starts, the entire application is already in place.
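The replacement step only stays atomic if the temporary folder is on the same file system as the deploy directory, because then it is a rename rather than a copy. A Java sketch of that step (paths hypothetical); with `ATOMIC_MOVE`, `Files.move` throws `AtomicMoveNotSupportedException` instead of silently falling back to a copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch: install a fully built WAR with a single rename. */
class AtomicWarSwap {
    static Path install(Path stagedWar, Path deployDir) throws IOException {
        Path target = deployDir.resolve(stagedWar.getFileName());
        // ATOMIC_MOVE fails rather than copying across file systems. Whether an existing
        // target is replaced is platform-specific; on Linux/POSIX rename(2) replaces it.
        return Files.move(stagedWar, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```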
Can't you make a local copy of the current web application on the web server, rsync to that directory, and then, perhaps using symbolic links, point Tomcat to the new deployment in one "go" without much downtime?
Your approach of rsyncing the extracted WAR is pretty good, and so is the restart, since I believe a production server should not have hot deployment enabled. So the only downside is the downtime when you need to restart the server, right?
I assume all the state of your application is held in the database, so you have no problem with some users working on one app server instance while other users are on another app server instance. If so:
Run two app servers: Start up the second app server (which listens on other TCP ports) and deploy your application there. After deployment, update the Apache httpd's configuration (mod_jk or mod_proxy) to point to the second app server.
Gracefully restart the Apache httpd process. This way you will have no downtime, and new users and requests are automatically redirected to the new app server.
If you can make use of the app server's clustering and session replication support, it will be even smoother for users who are currently logged in, as the second app server will resync as soon as it starts. Then, when there are no more requests to the first server, shut it down.
This is dependent on your application architecture.
One of my applications sits behind a load-balancing proxy, where I perform a staggered deployment - effectively eradicating downtime.
Hot Deploy a Java EAR to Minimize or Eliminate Downtime of an Application on a Server or How to “hot” deploy war dependency in Jboss using Jboss Tools Eclipse plugin might have some options for you.
Deploying to a cluster with no downtime is interesting too.
JavaRebel has hot code deployment too.
If static files are a big part of your big WAR (100 MB is pretty big), then putting them outside the WAR and deploying them on a web server (e.g. Apache) in front of your application server might speed things up. On top of that, Apache usually does a better job of serving static files than a servlet engine does (even if most servlet engines have made significant progress in that area).
So, instead of producing a big fat WAR, put it on a diet and produce:
a big fat ZIP with static files for Apache
a less fat WAR for the servlet engine.
Optionally, go further in the process of making the WAR thinner: if possible, deploy Grails and other JARs that don't change frequently (which is likely the case of most of them) at the application server level.
If you succeed in producing a lighter WAR, I wouldn't bother rsyncing directories rather than archives.
Strengths of this approach:
The static files can be hot "deployed" on Apache (e.g. use a symbolic link pointing to the current directory, unzip the new files, update the symlink and voilà).
The WAR will be thinner and it will take less time to deploy it.
Weakness of this approach:
There is one more server (the web server), so this adds (a bit) more complexity.
You'll need to change the build scripts (not a big deal IMO).
You'll need to change the rsync logic.
I'm not sure if this answers your question, but I'll share the deployment process I have used or encountered in the few projects I have worked on.
Similar to you, I don't recall ever doing a full WAR redeployment or update. Most of the time, my updates are restricted to a few JSP files, maybe a library, some class files. I am able to manage and determine which artifacts are affected, and we usually package those updates in a zip file, along with an update script. I run the update script, which does the following:
Back up the files that will be overwritten, maybe to a folder named with today's date and time
Unpack my files
Stop the application server
Move the files over
Start the application server
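Backing up and moving files are the steps where mistakes hurt most, so here is a hedged Java sketch of just those two; stopping and starting the server is whatever command your environment uses, and the directory layout (an unpacked update mirroring the webapp's structure) is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

/** Sketch: back up what will be overwritten, then move the new files in. */
class UpdateWithBackup {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /**
     * @param webappDir     the deployed application
     * @param updateDir     the unpacked update, same layout as webappDir
     * @param relativePaths files contained in the update
     * @return the backup folder, e.g. backupRoot/20240102-030405
     */
    static Path apply(Path webappDir, Path updateDir, List<String> relativePaths,
                      Path backupRoot, LocalDateTime now) throws IOException {
        Path backupDir = backupRoot.resolve(now.format(STAMP));
        for (String rel : relativePaths) {
            Path current = webappDir.resolve(rel);
            if (Files.exists(current)) { // files new in this update have nothing to back up
                Path saved = backupDir.resolve(rel);
                Files.createDirectories(saved.getParent());
                Files.copy(current, saved, StandardCopyOption.COPY_ATTRIBUTES);
            }
        }
        // The server would be stopped here and started again after the loop below.
        for (String rel : relativePaths) {
            Path target = webappDir.resolve(rel);
            Files.createDirectories(target.getParent());
            Files.copy(updateDir.resolve(rel), target, StandardCopyOption.REPLACE_EXISTING);
        }
        return backupDir;
    }
}
```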
If downtime is a concern (and it usually is), my projects are usually HA, even when they do not share state but instead use a router that provides sticky-session routing.
Another thing I am curious about: why the need to rsync? You should be able to know what the required changes are by determining them in your staging/development environment, not by performing delta checks against live. In most cases, you would have to tune your rsync to ignore some files anyway, like property files that define the resources a production server uses (database connection, SMTP server, etc.).
I hope this is helpful.
What is your PermGen space set to? I would expect it to grow as well, but it should go down after the old classes are collected (or does the old ClassLoader still sit around?).
Thinking out loud, you could rsync to a separate version- or date-named directory. If the container supports symbolic links, could you SIGSTOP the root process, switch over the context's filesystem root via a symbolic link, and then SIGCONT?
As for the early context restarts: all containers have configuration options to disable auto-redeploy on class file or static resource changes. You probably can't disable auto-redeploy on web.xml changes, so make this file the last one you update. If you disable auto-redeploy and update web.xml last, you'll see the context restart only after the whole update.
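The "web.xml last" ordering is easy to get wrong with a plain recursive copy. A sketch that copies an update tree and holds WEB-INF/web.xml back until everything else is in place; it assumes, as Tomcat's default context configuration does, that web.xml is the watched resource that triggers the reload. With rsync, the same effect would come from a first pass with `--exclude WEB-INF/web.xml` and a second pass for that file alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: copy an update over a webapp, leaving WEB-INF/web.xml for the very end. */
class WebXmlLastCopier {
    static List<String> copy(Path src, Path dst) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(src)) {
            files = walk.filter(Files::isRegularFile).sorted()
                    .collect(Collectors.toCollection(ArrayList::new));
        }
        Path webXml = src.resolve("WEB-INF").resolve("web.xml");
        if (files.remove(webXml)) {
            files.add(webXml); // the watched file goes last, so the reload sees a complete app
        }
        List<String> order = new ArrayList<>();
        for (Path file : files) {
            Path rel = src.relativize(file);
            Path target = dst.resolve(rel);
            Files.createDirectories(target.getParent());
            Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
            order.add(rel.toString().replace('\\', '/'));
        }
        return order; // relative paths in the order they were copied
    }
}
```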
We upload the new version of the webapp to a separate directory, then either move it to swap it with the running one, or use symlinks. For example, we have a symlink in the Tomcat webapps directory named "myapp", which points to the current webapp named "myapp-1.23". We upload the new webapp to "myapp-1.24". When all is ready, we stop the server, remove the symlink, make a new one pointing to the new version, and start the server again.
We disable auto-reload on production servers for performance, but even so, having files within the webapp changing in a non-atomic manner can cause issues, as static files or even JSP pages could change in ways that cause broken links or worse.
In practice, the webapps are actually located on a shared storage device, so clustered, load-balanced, and failover servers all have the same code available.
The main drawback for your situation is that the upload will take longer, since your method allows rsync to only transfer modified or added files. You could copy the old webapp folder to the new one first, and rsync to that, if it makes a significant difference, and if it's really an issue.
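The symlink flip itself can be made atomic, so there is never a moment without a link, by creating the new link under a temporary name and renaming it over the old one. A Java sketch of that trick, assuming a POSIX file system; the paths are hypothetical, and this is an alternative to the remove-then-create sequence above rather than what that setup does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch: repoint webapps/myapp from myapp-1.23 to myapp-1.24 without a gap. */
class SymlinkSwitch {
    static void repoint(Path link, Path newTarget) throws IOException {
        // Build the new link next to the old one under a temporary name...
        Path tmp = link.resolveSibling(link.getFileName() + ".tmp");
        Files.deleteIfExists(tmp); // leftover from an interrupted run
        Files.createSymbolicLink(tmp, newTarget);
        // ...then rename it over the old link. On Linux/POSIX this is rename(2), which
        // replaces the old link atomically; Java leaves replacement platform-specific.
        Files.move(tmp, link, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Since the server is stopped during the swap in the setup described above, atomicity is mostly a nicety there; it matters more for the Apache static-files variant, where the link is read while the server keeps running.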
Tomcat 7 has a nice feature called "parallel deployment" that is designed for this use case.
The gist is that you expand the .war into a directory, either directly under webapps/ or symlinked. Successive versions of the application are in directories named app##version, for example myapp##001 and myapp##002. Tomcat will handle existing sessions going to the old version, and new sessions going to the new version.
The catch is that you have to be very careful with PermGen leaks. This is especially true with Grails that uses a lot of PermGen. VisualVM is your friend.
Just use two or more Tomcat servers with a proxy in front of them. The proxy can be Apache, nginx, or HAProxy.
In each proxy server, "in" and "out" URLs with ports are configured.
First, copy your WAR into Tomcat without stopping the service. Once the WAR is in place, the Tomcat engine automatically unpacks and deploys it.
Note: cross-check that unpackWARs="true" and autoDeploy="true" are set on the "Host" node inside server.xml.
It looks like this:
<Host name="localhost" appBase="webapps"
      unpackWARs="true" autoDeploy="true"
      xmlValidation="false" xmlNamespaceAware="false">
  <!-- Valves, Contexts, etc. -->
</Host>
Now check the Tomcat logs. If there are no errors, the application is up.
Now hit all the APIs to test it.
Now go to your proxy server.
Simply change the backend URL mapping to point to the new WAR's name. Since re-registering with a proxy server like Apache, nginx, or HAProxy takes very little time, you will see minimal downtime.
Refer to https://developers.google.com/speed/pagespeed/module/domains for mapping URLs.
If you're using Resin, it has built-in support for web app versioning.
http://www.caucho.com/resin-4.0/admin/deploy.xtp#VersioningandGracefulUpgrades
Update: its watchdog process can help with PermGen space issues too.
Not a "best practice" but something I just thought of.
How about deploying the webapp through a DVCS such as git?
This way you can let git figure out which files to transfer to the server. You also have a nice way to back out of it if it turns out to be busted, just do a revert!
I wrote a bash script that takes a few parameters and rsyncs the file between servers. It speeds up rsync transfers a lot for larger archives:
https://gist.github.com/3985742