Best practices for deploying Java webapps with minimal downtime?

When deploying a large Java webapp (>100 MB .war) I currently use the following deployment process:
The application .war file is expanded locally on the development machine.
The expanded application is rsync:ed from the development machine to the live environment.
The app server in the live environment is restarted after the rsync. This step is not strictly needed, but I've found that restarting the application server on deployment avoids "java.lang.OutOfMemoryError: PermGen space" due to frequent class loading.
Good things about this approach:
The rsync minimizes the amount of data sent from the development machine to the live environment. Uploading the entire .war file takes over ten minutes, whereas an rsync takes a couple of seconds.
Bad things about this approach:
While the rsync is running the application context is restarted since the files are updated. Ideally the restart should happen after the rsync is complete, not when it is still running.
The app server restart causes roughly two minutes of downtime.
I'd like to find a deployment process with the following properties:
Minimal downtime during deployment process.
Minimal time spent uploading the data.
If the deployment process is app server specific, then the app server must be open-source.
Question:
Given the stated requirements, what is the optimal deployment process?

Update:
Since this answer was first written, a better way to deploy war files to Tomcat with zero downtime has emerged. In recent versions of Tomcat you can include version numbers in your war filenames. So for example, you can deploy the files ROOT##001.war and ROOT##002.war to the same context simultaneously. Everything after the ## is interpreted as a version number by Tomcat and not part of the context path. Tomcat will keep all versions of your app running and serve new requests and sessions to the newest version that is fully up while gracefully completing old requests and sessions on the version they started with. Specifying version numbers can also be done via the Tomcat manager and even the Catalina Ant tasks.
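If you script this, the next versioned file name can be derived from the current one. Here is a minimal sketch, assuming zero-padded numeric versions as in the example above; the class and method names are made up for illustration. Tomcat orders versions by string comparison, so keeping the padding width constant keeps string order and numeric order in agreement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WarVersionNames {
    // Matches names such as "ROOT##001.war": base name, "##", numeric version, ".war".
    private static final Pattern VERSIONED = Pattern.compile("^(.+)##(\\d+)\\.war$");

    /** Returns the file name for the next version, keeping the zero-padding width. */
    static String next(String currentWarName) {
        Matcher m = VERSIONED.matcher(currentWarName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a versioned WAR name: " + currentWarName);
        }
        String digits = m.group(2);
        long nextVersion = Long.parseLong(digits) + 1;
        return String.format("%s##%0" + digits.length() + "d.war", m.group(1), nextVersion);
    }

    public static void main(String[] args) {
        System.out.println(next("ROOT##001.war")); // prints ROOT##002.war
    }
}
```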
Original Answer:
Rsync tends to be ineffective on compressed files since its delta-transfer algorithm looks for changes in files, and a small change in an uncompressed file can drastically alter the resultant compressed version. For this reason, it might make good sense to rsync an uncompressed war file rather than a compressed version, if network bandwidth proves to be a bottleneck.
What's wrong with using the Tomcat manager application to do your deployments? If you don't want to upload the entire war file directly to the Tomcat manager app from a remote location, you could rsync it (uncompressed for reasons mentioned above) to a placeholder location on the production box, repackage it to a war, and then hand it to the manager locally. There exists a nice ant task that ships with Tomcat allowing you to script deployments using the Tomcat manager app.
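To make the "hand it to the manager locally" step concrete, here is a rough sketch that only builds the request URL. It assumes the manager's text interface under /manager/text (Tomcat 7 and later; older versions used a different prefix). The host, context path, and file locations are placeholders, and the class name is made up. The request itself still has to be sent with the credentials of a user allowed to use the manager's script interface.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ManagerDeployUrl {
    /**
     * Builds a manager "deploy" URL that points Tomcat at a WAR already on the server.
     * Assumption: the text interface lives under <managerBase>/text.
     */
    static String forLocalWar(String managerBase, String contextPath, String warPathOnServer) {
        return managerBase + "/text/deploy"
                + "?path=" + encode(contextPath)
                + "&war=" + encode("file:" + warPathOnServer)
                + "&update=true"; // undeploy any existing app at this path first
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the question.
        System.out.println(forLocalWar("http://localhost:8080/manager", "/myapp", "/opt/staging/myapp.war"));
    }
}
```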
There is an additional flaw in your approach that you haven't mentioned: While your application is partially deployed (during an rsync operation), your application could be in an inconsistent state where changed interfaces may be out of sync, new/updated dependencies may be unavailable, etc. Also, depending on how long your rsync job takes, your application may actually restart multiple times. Are you aware that you can and should turn off the listening-for-changed-files-and-restarting behavior in Tomcat? It is actually not recommended for production systems. You can always do a manual or ant scripted restart of your application using the Tomcat manager app.
Your application will be unavailable to users during a restart, of course. But if you're so concerned about availability, you surely have redundant web servers behind a load balancer. When deploying an updated war file, you could temporarily have the load balancer send all requests to other web servers until the deployment is over. Rinse and repeat for your other web servers.
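The "rinse and repeat" loop could look roughly like the sketch below. The Node interface is hypothetical: its methods stand in for whatever your load balancer and app server actually expose, so treat this as an outline of the control flow rather than a working deployer.

```java
import java.util.List;

final class RollingDeploy {
    /** Hypothetical hooks; real implementations would call the balancer and the app server. */
    interface Node {
        void removeFromBalancer();
        void deploy(String warName);
        boolean isHealthy();
        void addToBalancer();
    }

    /** Updates one node at a time so the remaining nodes keep serving traffic. */
    static void run(List<? extends Node> nodes, String warName) {
        for (Node node : nodes) {
            node.removeFromBalancer();
            node.deploy(warName);
            if (!node.isHealthy()) {
                // Stop here: the broken node stays out of rotation and later nodes are untouched.
                throw new IllegalStateException("Deployment failed health check on " + node);
            }
            node.addToBalancer();
        }
    }
}
```

Stopping at the first unhealthy node is a design choice in this sketch: the old version keeps running on every node that has not been touched yet.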

It has been noted that rsync does not work well when pushing changes to a WAR file. The reason for this is that WAR files are essentially ZIP files, and by default are created with compressed member files. Small changes to the member files (before compression) result in large scale differences in the ZIP file, rendering rsync's delta-transfer algorithm ineffective.
One possible solution is to use jar -0 ... to create the original WAR file. The -0 option tells the jar command to not compress the member files when creating the WAR file. Then, when rsync compares the old and new versions of the WAR file, the delta-transfer algorithm should be able to create small diffs. Then arrange that rsync sends the diffs (or original files) in compressed form; e.g. use rsync -z ... or a compressed data stream / transport underneath.
EDIT: Depending on how the WAR file is structured, it may also be necessary to use jar -0 ... to create component JAR files. This would apply to JAR files that are frequently subject to change (or that are simply rebuilt), rather than to stable 3rd party JAR files.
In theory, this procedure should give a significant improvement over sending regular WAR files. In practice I have not tried this, so I cannot promise that it will work.
The downside is that the deployed WAR file will be significantly bigger. This may result in longer webapp startup times, though I suspect that the effect would be marginal.
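If you would rather build the uncompressed archive from code than call jar -0, the same effect can be approximated with java.util.zip. This is a sketch under assumptions: the class name is made up, it packs a directory as-is, and unlike the jar tool it does not generate a manifest.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class StoredWarWriter {
    /** Packs every regular file under sourceDir into targetWar without compression. */
    static void write(Path sourceDir, Path targetWar) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            // Sorting gives a stable entry order, which keeps successive archives similar.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(targetWar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.setMethod(ZipOutputStream.STORED);
            for (Path file : files) {
                byte[] data = Files.readAllBytes(file);
                CRC32 crc = new CRC32();
                crc.update(data);
                // ZIP entry names always use forward slashes.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                ZipEntry entry = new ZipEntry(name);
                // STORED entries need size and CRC set before the data is written.
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());
                zip.putNextEntry(entry);
                zip.write(data);
                zip.closeEntry();
            }
        }
    }
}
```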
A different approach entirely would be to look at your WAR file to see if you can identify library JARs that are likely to (almost) never change. Take these JARs out of the WAR file, and deploy them separately into the Tomcat server's common/lib directory; e.g. using rsync.

In any environment where downtime is a consideration, you are surely running some sort of cluster of servers to increase reliability via redundancy. I'd take a host out of the cluster, update it, and then throw it back into the cluster. If you have an update that cannot run in a mixed environment (incompatible schema change required on the db, for example), you are going to have to take the whole site down, at least for a moment. The trick is to bring up replacement processes before dropping the originals.
Using tomcat as an example - you can use CATALINA_BASE to define a directory where all of tomcat's working directories will be found, separate from the executable code. Every time I deploy software, I deploy to a new base directory so that I can have new code resident on disk next to old code. I can then start up another instance of tomcat which points to the new base directory, get everything started up and running, then swap the old process (port number) with the new one in the load balancer.
If I am concerned about preserving session data across the switch, I can set up my system such that every host has a partner to which it replicates session data. I can drop one of those hosts, update it, bring it back up so that it picks the session data back up, and then switch the two hosts. If I've got multiple pairs in the cluster, I can drop half of all pairs, then do a mass switch, or I can do them a pair at a time, depending upon the requirements of the release, requirements of the enterprise, etc. Personally, however, I prefer to just allow end-users to suffer the very occasional loss of an active session rather than deal with trying to upgrade with sessions intact.
It's all a tradeoff between IT infrastructure, release process complexity, and developer effort. If your cluster is big enough and your desire strong enough, it is easy enough to design a system that can be swapped out with no downtime at all for most updates. Large schema changes often force actual downtime, since updated software usually cannot accommodate the old schema, and you probably cannot get away with copying the data to a new db instance, doing the schema update, and then switching the servers to the new db, since you will have missed any data written to the old after the new db was cloned from it. Of course, if you have resources, you can task developers with modifying the new app to use new table names for all tables that are updated, and you can put triggers in place on the live db which will correctly update the new tables with data as it is written to the old tables by the prior version (or maybe use views to emulate one schema from the other). Bring up your new app servers and swap them into the cluster. There are a ton of games you can play in order to minimize downtime if you have the development resources to build them.
Perhaps the most useful mechanism for reducing downtime during software upgrades is to make sure that your app can function in a read-only mode. That will deliver some necessary functionality to your users but leave you with the ability to make system-wide changes that require database modifications and such. Place your app into read-only mode, then clone the data, update schema, bring up new app servers against new db, then switch the load balancer to use the new app servers. Your only downtime is the time required to switch into read-only mode and the time required to modify the config of your load balancer (most of which can handle it without any downtime whatsoever).
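A read-only mode can be as small as one switch that the request-handling layer consults. The sketch below is only an illustration: the class name is made up, and treating GET, HEAD and OPTIONS as the "read" requests is an assumption that holds only if your app never modifies data on those methods.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

final class ReadOnlySwitch {
    // Assumption: these HTTP methods never modify data in this application.
    private static final Set<String> READ_METHODS = Set.of("GET", "HEAD", "OPTIONS");

    private final AtomicBoolean readOnly = new AtomicBoolean(false);

    void enterReadOnlyMode() { readOnly.set(true); }

    void leaveReadOnlyMode() { readOnly.set(false); }

    /** Returns true if a request with this HTTP method may proceed right now. */
    boolean allows(String httpMethod) {
        if (!readOnly.get()) {
            return true;
        }
        return READ_METHODS.contains(httpMethod.toUpperCase(Locale.ROOT));
    }
}
```

In a webapp this check would typically sit in a servlet filter that answers rejected requests with a "temporarily unavailable" response.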

My advice is to use rsync with exploded versions but deploy a war file.
Create temporary folder in the live environment where you'll have exploded version of webapp.
Rsync exploded versions.
After a successful rsync, create a war file in the temporary folder on the live environment machine.
Replace the old war in the server deploy directory with the new one from the temporary folder.
Replacing the old war with the new one is recommended in the JBoss container (which is based on Tomcat) because it's an atomic and fast operation, and it ensures that when the deployer starts, the entire application is in a deployed state.
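The replace step can be done with a single rename. A minimal sketch, assuming the temporary folder and the deploy directory are on the same file system (the class name is made up); whether an existing target is replaced by an atomic move is platform-specific, and it is on typical Linux file systems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicWarSwap {
    /**
     * Moves a fully built WAR from the staging location over the deployed one.
     * If the two paths are on different file systems, ATOMIC_MOVE fails with
     * AtomicMoveNotSupportedException instead of silently falling back to a copy.
     */
    static void replace(Path stagedWar, Path deployedWar) throws IOException {
        Files.move(stagedWar, deployedWar, StandardCopyOption.ATOMIC_MOVE);
    }
}
```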

Can't you make a local copy of the current web application on the web server, rsync to that directory and then perhaps even using symbolic links, in one "go", point Tomcat to a new deployment without much downtime?

Your approach to rsync the extracted war is pretty good, also the restart since I believe that a production server should not have hot-deployment enabled. So, the only downside is the downtime when you need to restart the server, right?
I assume all state of your application is held in the database, so you have no problem with some users working on one app server instance while other users are on another app server instance. If so,
Run two app servers: Start up the second app server (which listens on other TCP ports) and deploy your application there. After deployment, update the Apache httpd's configuration (mod_jk or mod_proxy) to point to the second app server.
Gracefully restart the Apache httpd process. This way you will have no downtime, and new users and requests are automatically redirected to the new app server.
If you can make use of the app server's clustering and session replication support, it will be even smoother for users who are currently logged in, as the second app server will resync as soon as it starts. Then, when there are no accesses to the first server, shut it down.

This is dependent on your application architecture.
One of my applications sits behind a load-balancing proxy, where I perform a staggered deployment - effectively eradicating downtime.

Hot Deploy a Java EAR to Minimize or Eliminate Downtime of an Application on a Server or How to “hot” deploy war dependency in Jboss using Jboss Tools Eclipse plugin might have some options for you.
Deploying to a cluster with no downtime is interesting too.
JavaRebel has hot-code deployment too.

If static files are a big part of your big WAR (100 MB is pretty big), then putting them outside the WAR and deploying them on a web server (e.g. Apache) in front of your application server might speed things up. On top of that, Apache usually does a better job at serving static files than a servlet engine does (even if most of them have made significant progress in that area).
So, instead of producing a big fat WAR, put it on diet and produce:
a big fat ZIP with static files for Apache
a less fat WAR for the servlet engine.
Optionally, go further in the process of making the WAR thinner: if possible, deploy Grails and other JARs that don't change frequently (which is likely the case of most of them) at the application server level.
If you succeed in producing a lighter WAR, I wouldn't bother rsyncing directories rather than archives.
Strengths of this approach:
The static files can be hot "deployed" on Apache (e.g. use a symbolic link pointing on the current directory, unzip the new files, update the symlink and voilà).
The WAR will be thinner and it will take less time to deploy it.
Weakness of this approach:
There is one more server (the web server), so this adds (a bit) more complexity.
You'll need to change the build scripts (not a big deal IMO).
You'll need to change the rsync logic.

I'm not sure if this answers your question, but I'll just share the deployment process I have used or encountered in the few projects I did.
Similar to you, I do not ever recall making a full war redeployment or update. Most of the time, my updates are restricted to a few jsp files, maybe a library, some class files. I am able to manage and determine which artifacts are affected, and usually we package those updates in a zip file, along with an update script. I then run the update script, which does the following:
Backup the files that will be overwritten, maybe to a folder with today's date and time.
Unpackage my files
Stop the application server
Move the files over
Start the application server
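The backup and move steps of such a script might be sketched as follows. It assumes the update has already been unpacked into a directory that mirrors the webapp layout; the class name and the timestamp format are illustrative, and stopping and starting the app server is left to the surrounding script.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PatchApplier {
    /** Backs up every file the update would overwrite, then copies the update in. */
    static Path apply(Path updateDir, Path appDir, Path backupRoot) throws IOException {
        String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        Path backupDir = backupRoot.resolve(stamp);
        List<Path> relativePaths;
        try (Stream<Path> walk = Files.walk(updateDir)) {
            relativePaths = walk.filter(Files::isRegularFile)
                    .map(updateDir::relativize)
                    .collect(Collectors.toList());
        }
        // Pass 1: back up the files that are about to be overwritten.
        for (Path relative : relativePaths) {
            Path live = appDir.resolve(relative);
            if (Files.exists(live)) {
                Path backup = backupDir.resolve(relative);
                Files.createDirectories(backup.getParent());
                Files.copy(live, backup, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Pass 2: move the new files over (the app server should be stopped by now).
        for (Path relative : relativePaths) {
            Path live = appDir.resolve(relative);
            Files.createDirectories(live.getParent());
            Files.copy(updateDir.resolve(relative), live, StandardCopyOption.REPLACE_EXISTING);
        }
        return backupDir;
    }
}
```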
If downtime is a concern, and it usually is, my projects are usually HA, even if they are not sharing state but are using a router that provides sticky session routing.
Another thing I am curious about: why the need to rsync? You should be able to know what the required changes are by determining them in your staging/development environment, not by performing delta checks against live. In most cases, you would have to tune your rsync to ignore files anyway, such as property files that define resources a production server uses, like the database connection, SMTP server, etc.
I hope this is helpful.

What is your PermGen space set to? I would expect to see this grow as well, but shouldn't it go down after collection of the old classes? (Or does the ClassLoader still sit around?)
Thinking out loud, you could rsync to a separate version- or date-named directory. If the container supports symbolic links, could you SIGSTOP the root process, switch over the context's filesystem root via symbolic link, and then SIGCONT?

As for the early context restarts: all containers have configuration options to disable auto-redeploy on class file or static resource changes. You probably can't disable auto-redeploy on web.xml changes, so this file should be the last one to update. So if you disable auto-redeploy and update web.xml last, you'll see the context restart only after the whole update.
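If the upload is scripted, the ordering rule is easy to enforce. A small sketch, assuming paths relative to the webapp root with forward slashes (the class name is made up):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class WebXmlLast {
    private static final String WEB_XML = "WEB-INF/web.xml";

    /** Returns the paths in alphabetical order, except that WEB-INF/web.xml always comes last. */
    static List<String> order(Collection<String> relativePaths) {
        List<String> ordered = new ArrayList<>(relativePaths);
        // false sorts before true, so every path that is not web.xml goes first.
        ordered.sort(Comparator.comparing((String p) -> p.equals(WEB_XML))
                .thenComparing(Comparator.<String>naturalOrder()));
        return ordered;
    }
}
```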

We upload the new version of the webapp to a separate directory, then either move to swap it out with the running one, or use symlinks. For example, we have a symlink in the tomcat webapps directory named "myapp", which points to the current webapp named "myapp-1.23". We upload the new webapp to "myapp-1.24". When all is ready, stop the server, remove the symlink and make a new one pointing to the new version, then start the server again.
We disable auto-reload on production servers for performance, but even so, having files within the webapp changing in a non-atomic manner can cause issues, as static files or even JSP pages could change in ways that cause broken links or worse.
In practice, the webapps are actually located on a shared storage device, so clustered, load-balanced, and failover servers all have the same code available.
The main drawback for your situation is that the upload will take longer, since your method allows rsync to only transfer modified or added files. You could copy the old webapp folder to the new one first, and rsync to that, if it makes a significant difference, and if it's really an issue.
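The symlink switch itself can be scripted. The sketch below is a variation on the remove-and-recreate step described above: it creates the new link under a temporary name and renames it over the old one, so the link never disappears in between. It assumes a POSIX-style file system where that rename replaces the old link; the class name and the ".tmp" suffix are arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SymlinkSwitch {
    /** Repoints link (e.g. webapps/myapp) at newTarget (e.g. myapp-1.24). */
    static void repoint(Path link, Path newTarget) throws IOException {
        Path tmp = link.resolveSibling(link.getFileName() + ".tmp");
        Files.deleteIfExists(tmp);
        Files.createSymbolicLink(tmp, newTarget);
        // Renaming the temporary link over the old one swaps the target in one step.
        Files.move(tmp, link, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Stopping and starting the server around the switch stays as described above; the sketch only covers the link itself.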

Tomcat 7 has a nice feature called "parallel deployment" that is designed for this use case.
The gist is that you expand the .war into a directory, either directly under webapps/ or symlinked. Successive versions of the application are in directories named app##version, for example myapp##001 and myapp##002. Tomcat will handle existing sessions going to the old version, and new sessions going to the new version.
The catch is that you have to be very careful with PermGen leaks. This is especially true with Grails that uses a lot of PermGen. VisualVM is your friend.

Just use two or more Tomcat servers with a proxy in front of them. That proxy can be Apache, nginx, or HAProxy.
In each proxy server, an "in" and an "out" URL with ports are configured.
First, copy your war into Tomcat without stopping the service. Once the war is deployed, it is automatically opened by the Tomcat engine.
Note: cross-check unpackWARs="true" and autoDeploy="true" in the "Host" node inside server.xml.
It looks like this:
<Host name="localhost" appBase="webapps"
unpackWARs="true" autoDeploy="true"
xmlValidation="false" xmlNamespaceAware="false">
Now check the Tomcat logs. If there are no errors, the app is up successfully.
Now hit all the APIs for testing.
Now go to your proxy server.
Simply change the backend URL mapping to the new war's name. Since registering with proxy servers like Apache, nginx, or HAProxy takes very little time, you will see minimal downtime.
Refer -- https://developers.google.com/speed/pagespeed/module/domains for mapping urls

You're using Resin, and Resin has built-in support for web app versioning.
http://www.caucho.com/resin-4.0/admin/deploy.xtp#VersioningandGracefulUpgrades
Update: Its watchdog process can help with PermGen space issues too.

Not a "best practice" but something I just thought of.
How about deploying the webapp through a DVCS such as git?
This way you can let git figure out which files to transfer to the server. You also have a nice way to back out of it if it turns out to be busted, just do a revert!

I wrote a bash script that takes a few parameters and rsyncs the file between servers. Speeds up rsync transfer a lot for larger archives:
https://gist.github.com/3985742

Related

How can I set my weblogic's deployment mode to nostage?

My problem is that after every code change I have to build and deploy my Java web application (or at least some parts of it), which takes too much time.
JRebel would do the trick, but my company doesn't have a license for it.
I heard that weblogic's nostage mode can save some time, but how can I configure it?
I've changed my Managed Server's staging mode in the Admin Console, but how can I provide the path to my .wars? Or how can I get this thing work?
Sorry for my lack of knowledge, but I'm pretty new to this topic.

You have now configured the default staging mode for new deployments; it would probably be easier to just change this during the individual deployments. If you are using the admin console to deploy, it is the section called "Source accessibility".
Basically, in nostage / "I will make the deployment accessible" you tell WebLogic where to find your deployment by passing it a file location - which should be accessible for every targeted server. In the default staging mode (aptly called "stage"), you tell the admin server where to find the files and the admin server copies your files to the managed servers.
Unless your limits are in your bandwidth, I don't think this will save you any time during deployments.

How to sync configuration files in Tomcat cluster?

I have a Tomcat cluster (Apache httpd front-end, proxying a Tomcat cluster), with 2 nodes in the backend, everything on Windows server 2008. The Tomcat nodes serve a webapp, which has some configuration files in their respective instance directories. The configuration files can get written run-time by an administrator. Upon the next restart, the changes are picked up by the webapp.
I wish to synchronize the configuration files in real time, without delays, and the solution should handle a possible split-brain, like the drbd tool does on Linux.
The above described setup is relatively small with only 3 physical servers (Apache load balancer and backend nodes), and using anything like a separate database, hadoop etc. is not very economical. Also the configuration files are relatively small.
From some search, came across many standalone utilities - FreeFileSync, SyncToy, Synkron etc. None of them really fits my criteria.
Programmatic syncing is not very ideal, a split brain scenario can make things messy.
Unfortunately there are no drbd alternatives for Windows, and so here is my question :
What is the easiest/safest and open-sourced way to sync files in real-time in a Windows environment.
Are there any built-in solutions for file syncing for a Tomcat cluster (I couldn't find much from the documentation).
Any other possibilities I can sync the configurations across in a Tomcat cluster ?

If you're not going to access them too often, I would use a shared network folder or a common storage area. Basically, you have to configure each webapp to look up its configuration files from that network path instead of its instance directory.
You could also use the Dropbox/Drive APIs and read/store the conf there. (You'll also get versioning features as a bonus.)
Both options are slower than local access, but usually performance isn't an issue when updating conf files.

How to manage deployments to multiple non-clustered tomcat instances

What is the best way to manage deployments of a single web-app to multiple non-clustered tomcat instances.
My ideal solution will support:
A simple API - invoked with groovy/ant/Rest or similar
Success/failure notification for all nodes
Atomic deployments - if the deployment fails on any node it is rolled back.

We had over 100 clients each running on a dedicated tomcat instance across 5 servers where most would be updated to the latest release at the same time. In our case we used mapped network drives and some tricks with the CATALINA_BASE, but personally I think it may be easier to use WAR deployment via an ANT script to the Tomcat manager if you can get away with it.
For yours you could (at minimum) have a tomcat directory for each instance and each can use the same webapps dir via a network share. Upgrading would still require stopping each instance, updating the single shared dir then starting all instances.
You could also use the Tomcat management console (via Ant or another automated process) to manage a scripted local installation or start/stop, but this would not be atomic.
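If you script it yourself, the closest you can get to "atomic" is compensation: deploy node by node and undo everything as soon as one node fails. The sketch below shows only that control flow. The Instance interface is hypothetical and stands in for whatever mechanism (manager calls, Ant, file copies) you use per node. It is best-effort rollback, not a real transaction, since a rollback can itself fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class AllOrNothingDeploy {
    /** Hypothetical per-instance operations; implement them with your own tooling. */
    interface Instance {
        void deploy(String version) throws Exception;
        void rollback() throws Exception;
    }

    /** Returns true if every instance took the new version; otherwise rolls back and returns false. */
    static boolean run(List<? extends Instance> instances, String version) {
        Deque<Instance> touched = new ArrayDeque<>();
        for (Instance instance : instances) {
            // Record the instance before deploying so a half-finished deploy is rolled back too.
            touched.push(instance);
            try {
                instance.deploy(version);
            } catch (Exception deployFailure) {
                while (!touched.isEmpty()) {
                    try {
                        touched.pop().rollback(); // most recently touched instance first
                    } catch (Exception rollbackFailure) {
                        // A real script would report this; the sketch keeps going.
                    }
                }
                return false;
            }
        }
        return true;
    }
}
```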

You might want to look at using Tomcat manager from Ant. It might not have all you are asking for, but I guess you could script what you need.

J2EE Cluster: Is there a generic way to handle central configuration?

We develop an application which is normally deployed on a single webserver. Now we check how it runs in a clustered environment, as some customers are using clusters.
The problem is the app creates a local configuration (in registry/file) which does not make any sense in a cluster. The config is changed by the application.
Is there a generic way (like an interface) to make a central configuration, so the config(-file) itself is not duplicated on each node when the app in deployed in a cluster? Any other recommended options? (doing it manually with config on network-share/in database/some MBean?)
Why generic? It must run on different application servers (like Tomcat, JBoss, WebSphere, WebLogic ...), so we cannot use any server-specific feature.
Thanks.

Easiest way for central configuration is to put it on the file system. This way you can mount the file system to your OS and make it available to your app server no matter what the brand or version.
We do this for some of our applications. Shared libraries and/or properties files that we care about (in our case). We set up either JVM parms or JNDI environment variables (trying to move toward those) so we can look up the path to the mounted drive at runtime and load the data from the files.
Works pretty slick for us.
Now if you are writing information, that's a different story. As then you have to worry about how you are running your cluster (is it highly available only? load-balanced?). Is the app running in both clusters as if it was one app? Or is it running independently on each cluster node? If so, then you might have to worry about concurrent writes. Probably better to go with a database or one of the other solutions as mentioned above.
But if all you are doing is reading configuration, then I would opt for the mounted file system as it is simplest.
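Looking up the mounted path at runtime can be as simple as the sketch below. The system property name and the file name app.properties are placeholders chosen for illustration, not something from our setup; a JNDI environment entry would work the same way with a different lookup.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class SharedConfig {
    /** Loads app.properties from the directory named by the given JVM system property. */
    static Properties load(String systemPropertyName) throws IOException {
        String dir = System.getProperty(systemPropertyName);
        if (dir == null) {
            throw new IllegalStateException("Missing JVM parameter -D" + systemPropertyName + "=<shared dir>");
        }
        Path file = Paths.get(dir, "app.properties");
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }
}
```

Each app server would then be started with something like -Dmyapp.config.dir=/mnt/shared/conf, whatever the brand.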

You may use a library like Commons Configuration and choose an implementation which is cluster-friendly like JDBC or JNDI.

I would consider JDBC and JNDI first. However, if you want your servers to be able to run independently, I would suggest a file distribution system like subversion/git/mercurial; i.e. if your central configuration server is down or unavailable, you don't want production to stop.
A version-controlled system provides a history of who made what changes when, plus controlled releases (and rollback of releases).
One way to avoid the issue of the central server adding another point of failure is to use a database server which you already depend on (assuming you have one), on the basis that if it's not running, you won't be working anyway.

Best practices in terms of replacing a web service?

So we have a busy legacy web service that needs to be replaced by a new one. The legacy web service was deployed using a WAR file on an Apache Tomcat server. That is, it was copied over into the webapps folder under Tomcat and all went well. I have been given the task of replacing it and would like to do so while ensuring:
I have a back up of the old service
the service gets replaced by another WAR file with no down time
Again I know I am being overly cautious however it is production level and I would like everything to go smooth. Step by step instructions would help.

Make a test server
Read tutorials and play around with the test server until it goes smoothly
Replicate what you did on the test server on the prod server.
If this really is a "busy prod server" with "no down time", then you will have some kind of test server that you can get the configuration right on.

... with no down time
If you literally mean zero downtime, then you will need to replicate your webserver and implement some kind of front-end that can transparently switch request streams to different servers. You will also need to deal with session migration.
If you mean with minimal downtime, then most web containers support hot redeployment of webapps. However, this typically entails an automatic shutdown and restart of the webapp, which may take seconds or minutes, depending on the webapp. Furthermore there is a risk of significant memory leakage; e.g. of permgen space.
The fallback is a complete shutdown / restart of the web container.
And it goes without saying that you need:
A test server that replicates your production environment.
A rigorous procedure for checking that deployments to your test environment result in a fully functioning system.
A preplanned, tested and hopefully bomb-proof procedure for rolling back your production system in the event of a failed deployment.
All of this (especially rollback) gets a lot more complicated when your system includes other stuff apart from the webapp; e.g. databases.
