CloudBees Service Level Agreement(s) and Capabilities Service


I have been comparing Java PaaSes carefully and am really starting to like CloudBees. I only have one big concern with them, and that is their SLA/uptime.
After scouring through all of their documentation, I can only find one paper they offer on SLAs which states:
If you are using the CloudBees PaaS without taking advantage of high availability options, then CloudBees can only offer uptime that approaches the base uptime SLA of
the infrastructure cloud provider.
As the same paper also mentions, Amazon seems to offer a 99.95% uptime, and I know that CloudBees runs - largely - on AWS/EC2 instances itself.
So this spawns a number of closely-related SLA questions:
If I don't take advantage of "high availability" options, then can I assume that CloudBees doesn't even guarantee 99.95%? Or is there documentation elsewhere that does state what their uptime is, and remedies for failing to meet that uptime?
What High Availability options are they talking about here? I just read their entire developer docs and never saw anything about HA.
What are my remedies if a partner service (like SendGrid for mail, or MemCachier for caching) goes down? One thing I do like about GAE is its CapabilitiesService where, before you go to use their Email API, or Caching API, you first check with the master CapabilitiesService to make sure those services are operating. I'd like to do the same with CloudBees, but seems like I'd need to build it myself. That's fine, but not sure if CloudBees even offers a mechanism (API call, etc.) to determine if a particular service partner is on or offline.
Thanks in advance!

CloudBees does not offer an SLA on availability, nor remedies in the form of credits if a particular level of uptime is not met in a month. This is AFAIK common for other offerings on AWS (e.g., Heroku). CloudBees does offer standard response-time based SLAs via a support agreement. As discussed in the white paper you reference, we also employ practices for our own usage of AWS and external providers that have helped to isolate our users from some specific Amazon issues.
The availability features you can make use of include:
Using multiple instances (and potentially auto-scale). App instances are spread by CloudBees across different EC2 instances, so you can avoid downtime in the event of an EC2 instance failure.
Using the session store. You can share session state in a separate tier from your app instance using our offering or a partner offering like Memcachier.
Using dedicated servers that CloudBees sets up in multiple AWS availability zones.
Ensuring the database used with your app is set up in a highly available configuration. For example, RDS is simple to use with CloudBees and supports standbys and read replicas in multiple AZs.
Using app monitoring solutions from partners like New Relic and AppDynamics to alert you of any issues.
The main point of the comment about using "high availability options" was to warn people that simply deploying an app on CloudBees does not make it highly available. If an EC2 instance fails underneath your single-instance deployment, your users will experience downtime while our internal machinery redeploys to a working instance, whereas a multi-instance deployment will likely only experience slower responses until a new instance is deployed. Similarly with single-instance databases without standbys or replicas across AZs. While this is just stating the blindingly obvious for a lot of people, you might be surprised how many people just assume some magic is happening.
Good point on the CapabilitiesService! We have some ideas kicking around in this area, but you would have to do something like this on your own for now.
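A minimal sketch of what such a home-grown check could look like in Java. Everything here is hypothetical: `CapabilityRegistry`, the service names and the probes are invented for illustration and are not a CloudBees or GAE API. Each partner service gets a probe the application supplies (for example a cheap ping to the cache), and callers ask the registry before using the service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical do-it-yourself capability registry; not a CloudBees API. */
public class CapabilityRegistry {

    // One probe per partner service, keyed by a name the application chooses.
    private final Map<String, BooleanSupplier> probes = new ConcurrentHashMap<>();

    /** Registers a probe that returns true when the service looks usable. */
    public void register(String service, BooleanSupplier probe) {
        probes.put(service, probe);
    }

    /** Returns false for unknown services and for probes that throw. */
    public boolean isAvailable(String service) {
        BooleanSupplier probe = probes.get(service);
        if (probe == null) {
            return false;
        }
        try {
            return probe.getAsBoolean();
        } catch (RuntimeException e) {
            return false; // a failing probe is treated as "service down"
        }
    }
}
```

Application code would then guard calls with something like `if (registry.isAvailable("mail")) { ... }` and fall back (queue the mail, skip the cache) otherwise.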

Related

Google Cloud Platform: are my architectural solutions correct?

I'm trying to make simple application and deploy it on Google Cloud Platform Flexible App Engine, which will contain two main parts:
Front end application (simple Web UI based on Java 8 (Spring + Thymeleaf) with OAuth authorization from different external sites)
Back end application (monitoring several resources in separate threads, based on logged in users and reacting to their input in a certain way (behavioral changes))
Initially I was planning to make them as one app, but I think that potentially heavy background processing may cause failures in my front end application part. Also, the App Engine docs say that deployed services behave similarly to a microservice architecture.
My questions are:
Do I really need to separate front end from back end, if I need to react to user input as fast as possible? (but delays up to 2 seconds aren't that critical)
If I do need to separate them (and I strongly believe that I do) - how do I set up interaction between the applications?
Each resource must be tracked by exactly one thread on the back end - what are the best practices for this? I thought about having a SQL table with a list of acquired resources, but the flaw I see there is that if an instance fails I will need to clean up that table and redetermine which resources are actually acquired.

Separating the two into different services, as you propose, sounds like the right approach for the following reasons:
You can deploy code for each separately, roll back versions separately, and split traffic separately for experiments or phased rollouts.
You can adjust machine types and memory allocations for each service to better suit its needs. If you're doing work that is memory intensive on the backend, you can adjust that service's settings to allocate more memory per instance.
Each type of service can scale independently based on demand, which should result in better utilization of the services and less waste. This should also cost less overall than a one-size-fits-all approach in a single monolithic service.
You can mix different runtime environments across services. For example, you can mix language runtimes within a project OR you could even mix between standard and flexible environments. Say your front-end code is more cost efficient in standard: designate that service as a standard environment service and your backend as a flexible environment service. Or say you need a custom Dockerfile with Perl in it: you could do that as a flexible environment custom runtime and have your front-end in Java 8.
You can still share common services like Cloud SQL, PubSub, Cloud Tasks (currently in alpha) or Redis for in-memory caching. Your workers don't need to reside in App Engine; they could reside in a different product if that better suits your needs.
Overall, you get much better control over your application to split it apart. The biggest benefit likely comes down to optimizing your application for spending only on what you need.

I think that you are likely to be able to deploy everything as an appengine app except if you use some exotic Java libraries that are not whitelisted. It could still be good to deploy it with compute engine for increased configurability and versatility.
You can create one front-end instance and one back-end instance in compute engine and divide the resources between them like that. Google's documentation has an example where you can do that.

Monitor Web application

I built a web-based application in Java, and I would like to monitor its performance periodically (e.g. response time). I also want to display this information on the homepage of my application. Is that possible? Any ideas about how this could be done?
Thanks.

You can take a look at stagemonitor. It is an open source Java web application performance monitor. It captures response time metrics, JVM metrics, request details (including a call stack captured by the request profiler) and more. The overhead is very low.
Optionally, you can use the great timeseries database graphite with it to store a long history of datapoints that you can look at with fancy dashboards.
Take a look at the github page to see screenshots, feature descriptions and documentation.
Note: I am the developer of stagemonitor

Depending on your environment, I would use a cron job or task that measures the response time to request your app using something like HttpClient. Then drop that information into a database table accessible by your app.
The answer here is the simplest way you can measure the time: How do I time a method's execution in Java?
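As a rough sketch of the measuring part, assuming the JDK's built-in `HttpURLConnection` instead of Apache HttpClient: the class name `ResponseTimer` and the five-second timeouts are illustrative choices, not something from the linked answer. A scheduled task would call `timeGetMillis` and write the result to the database table.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.TimeUnit;

public class ResponseTimer {

    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Times a single GET against the given URL (supplied by the caller). */
    public static long timeGetMillis(String url) {
        return timeMillis(() -> {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) URI.create(url).toURL().openConnection();
                conn.setConnectTimeout(5000); // illustrative timeouts
                conn.setReadTimeout(5000);
                conn.getResponseCode();       // blocks until the status line has arrived
                conn.disconnect();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Note that this measures time to the first response line only; reading the full body would give a number closer to what a browser sees.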

Why not check out Munin monitoring? The website says:
Munin the monitoring tool surveys all your computers and remembers
what it saw. It presents all the information in graphs through a web
interface. Its emphasis is on plug and play capabilities. After
completing a installation a high number of monitoring plugins will be
playing with no more effort.
SLAC at Stanford University also keeps a large, quite well sorted list of solutions for network monitoring, among other things. See SLAC's list of Network Monitoring Tools, for instance the section "Public domain or free network monitoring tools".

You can also consider creating your own custom web application monitor. To do so, use the Proxy pattern and create a concrete monitor. By using the Spring framework you can easily switch the monitor on and off at runtime without redeploying or restarting the web application. Furthermore, you can create many different specific monitors yourself and control exactly what is being monitored. This gives you maximum flexibility, but requires a bit of work.
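A minimal sketch of that idea using a plain JDK dynamic proxy rather than Spring. The class `TimingMonitor` and its methods are made up for illustration; an `AtomicBoolean` stands in for whatever runtime switch the Spring configuration would provide.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class TimingMonitor {

    private final AtomicBoolean enabled = new AtomicBoolean(true);
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Switches measurement on or off at runtime, without redeploying. */
    public void setEnabled(boolean on) { enabled.set(on); }

    public long callCount() { return calls.get(); }

    public long totalNanos() { return totalNanos.get(); }

    /** Wraps target in a proxy that implements iface and times each call while enabled. */
    @SuppressWarnings("unchecked")
    public <T> T monitor(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                (proxy, method, args) -> {
                    boolean measure = enabled.get();
                    long start = measure ? System.nanoTime() : 0L;
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the target's own exception
                    } finally {
                        if (measure) {
                            calls.incrementAndGet();
                            totalNanos.addAndGet(System.nanoTime() - start);
                        }
                    }
                });
    }
}
```

Any service exposed through an interface can be wrapped this way, and the homepage can read `callCount()` and `totalNanos()` to show an average response time.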

It is possible.
The clearest way to go about it, and the one that gives true numbers, is to simulate a client performing some sort of activity that mimics real usage. Then have that client periodically use the website.
This presupposes that your website has a means to accept inputs that do not impact the real back end business. Crafting such interfaces requires some thought, but is not beyond the ability of a person who could put together the web site in the first place. The key is to emulate real use of the website as closely as possible while guarding against real business impact. Basically it is designing for a special user (the tester).
So you might have a special user whose purchases, when logged in, are all bound to a special account that is filtered out so that no payment is demanded and no goods are shipped. Provided the systems you integrate with all share an understanding of this live testing account, you can test alongside real production traffic after deployment.
Such a structure provides a huge benefit. You get performance of the real, live running system. Performance tends to change over time, and is subject to the environment. By fetching your performance numbers on the live system, in the same environment, you get a much better view of what real users might be encountering. Also, you can differentiate and track performance for different activities.
Yes, it is a lot more to design and set up; however, if you are in it for the long run, the benefits are huge.

I think JavaMelody is the most appropriate solution for you. It is embedded in the Java application itself, so it monitors what happens inside the app and can report much more specific metrics for your Java app than external monitoring can. It also lets you display some statistics on your app's homepage, and you can embed JavaMelody's graphs in the app, which makes monitoring considerably easier.
Take a look at the detailed overview of JavaMelody: http://cases.azoft.com/enterprise-system-monitoring-solutions-business-apps/

Java Frameworks in the Cloud

So I'm trying to finally grasp how cloud-based, enterprise applications work, and what their architectures typically look like. Say I use a cloud provider like Amazon. I assume (please correct me if I'm wrong) that I would be paying for 1+ virtual machines that would house a stack of software per my application's needs.
I'm confused about how frameworks like jclouds or Terracotta fit into the picture. jclouds advertises itself as "an open source library that helps you get started in the cloud", and lists off a number of huge features that don't mean much to me without meaningful examples. Terracotta bills itself as a high-scaling clustering framework. Why would I need to use something like jclouds? What specific, concrete scenarios would I use it for?
Again, if I'm using Amazon as my cloud provider, wouldn't they already be highly-scaled? Why would I need Terracotta in the cloud?

Taking an app "into the cloud" has at least two aspects.
Firstly you have to manage the nodes: deploy your app on all nodes, monitor them, start new nodes to actually scale, detect and replace failed nodes, realize some update scenario for new app versions, and so on. Usually this can't be done reasonably without tools. JClouds fits in here, since it covers some of these points.
Secondly your app itself must be "cloud ready". You can't take an arbitrary app, put it on multiple nodes and expect it to scale well. The main point here is to define how to scale the access to the shared data between all nodes (SQL database, NoSQL datastore, potentially session replication, ...). Usually you use some existent framework/appserver/datastore to manage your shared state. Terracotta is one of them, basically it provides an efficient way to share memory between JVM instances on multiple nodes.

So you have your Linux machine (virtual instance) and it is working OK. But suddenly you need to scale - that is, you need to fire up more instances as demand goes up and shut them down as it goes down. So what you can do is basically use Amazon's API to start EC2 instances and provision them with everything you can do from the administrative console (and even more). But using Amazon's APIs basically ties your hands to Amazon. With frameworks such as jclouds what you do is something like (this is pseudo code, not the real jclouds API):
CloudProvider provider = CloudProvider.getProvider("Amazon");
provider.authenticate("username", "password");
provider.startInstance("some option", numOfInstances);
So say you have to scale and you are deployed on Amazon using jclouds - you are going to use something like the above. BUT suddenly you decide to move from Amazon to Rackspace. Instead of re-engineering all the logic in your app that deals with provisioning instances and working with them, you can just change the
CloudProvider provider = CloudProvider.getProvider("Amazon");
to something like
CloudProvider provider = CloudProvider.getProvider("RackSpace");
and continue using the authenticate and startInstance methods, and the library takes care of how to actually "translate" each library method to the specific method which the given cloud provider supports. Basically it is a way of abstracting the code which has to deal with the underlying cloud provider - you shouldn't care who it is as long as it is providing the service, right?
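To make the abstraction concrete, here is a self-contained sketch in the spirit of the pseudo code. It is not the jclouds API: `CloudProvider`, `FakeProvider`, `getProvider` and `scaleOut` are all hypothetical names, and the two providers are in-memory stand-ins that only generate instance ids. A real implementation would call each vendor's API behind the same interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class ProviderDemo {

    /** Hypothetical provider-neutral interface; not the jclouds API. */
    public interface CloudProvider {
        void authenticate(String identity, String credential);
        List<String> startInstances(String option, int count);
    }

    /** In-memory stand-in; a real implementation would call the vendor's API. */
    static class FakeProvider implements CloudProvider {
        private final String prefix;
        private boolean authenticated;
        private int nextId = 1;

        FakeProvider(String prefix) { this.prefix = prefix; }

        @Override
        public void authenticate(String identity, String credential) {
            authenticated = true;
        }

        @Override
        public List<String> startInstances(String option, int count) {
            if (!authenticated) {
                throw new IllegalStateException("authenticate first");
            }
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                ids.add(prefix + "-" + nextId++);
            }
            return ids;
        }
    }

    // Provider lookup by name: this is the only place that knows the concrete classes.
    private static final Map<String, Supplier<CloudProvider>> PROVIDERS = Map.of(
            "Amazon", () -> new FakeProvider("ec2"),
            "RackSpace", () -> new FakeProvider("rax"));

    public static CloudProvider getProvider(String name) {
        Supplier<CloudProvider> factory = PROVIDERS.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown provider: " + name);
        }
        return factory.get();
    }

    /** Scaling logic that sees only the interface, so switching vendors changes one string. */
    public static List<String> scaleOut(String providerName, int count) {
        CloudProvider provider = getProvider(providerName);
        provider.authenticate("username", "password");
        return provider.startInstances("some option", count);
    }
}
```

Calling `scaleOut("Amazon", 2)` and `scaleOut("RackSpace", 2)` runs the same provisioning logic against different back ends.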

Java HA framework

I am writing a small proxy application which should be redundant, e.g. primary proxy will be running on one server and the redundant one will run on a separate server. Is there a simple high-availability framework which I can use to implement this redundancy? For example, this HA framework would send pings between instances and raise some sort of exception or notification on the other instance when the first one goes down.

Building such systems has been my routine job in recent years. I have found JGroups a very usable tool for receiving and handling this kind of group event. That is the way to go if you want to build your own HA infrastructure. I don't know your case, but maybe a simple reverse proxy such as HAProxy would be enough.
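For a sense of what the ping-and-notify logic from the question boils down to if you do build it yourself, here is a plain-Java sketch. It does not use JGroups; `HeartbeatMonitor` and its methods are hypothetical, and the network transport for the pings is left out. The clock is injected so the timeout logic can be exercised without real waiting.

```java
import java.util.function.LongSupplier;

public class HeartbeatMonitor {

    private final long timeoutMillis;
    private final LongSupplier clock;   // injected so the logic can be tested without sleeping
    private final Runnable onPeerDown;  // notification raised on the surviving instance
    private long lastBeat;
    private boolean peerUp = true;

    public HeartbeatMonitor(long timeoutMillis, LongSupplier clock, Runnable onPeerDown) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.onPeerDown = onPeerDown;
        this.lastBeat = clock.getAsLong();
    }

    /** Call whenever a ping from the peer arrives. */
    public synchronized void heartbeatReceived() {
        lastBeat = clock.getAsLong();
        peerUp = true;
    }

    /** Call periodically; fires the callback once per outage. */
    public synchronized void check() {
        if (peerUp && clock.getAsLong() - lastBeat > timeoutMillis) {
            peerUp = false;
            onPeerDown.run();
        }
    }

    public synchronized boolean isPeerUp() {
        return peerUp;
    }
}
```

In production the clock would be `System::currentTimeMillis` and `check()` would run from a scheduled task. A group communication library adds what this sketch lacks: membership agreement, so that two instances do not both conclude the other is dead after a network split.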

If you want HA without hassle, just use some load balancer with HA capability e.g. Ultramonkey, LVS with keepalived etc.
In an HA configuration, you'd typically want to use a virtual IP, so even if you had this ping/notify functionality as a framework, you'd still have work to do (start responding to requests to the virtual IP once the other instance has failed). So unless you are looking for a learning opportunity, I'd advise using middleware instead of coding this yourself using frameworks.
There are a number of health checks that you can configure for this kind of middleware. A simple health check might, for example, fire a GET request at your app periodically and look for a specific string (e.g. "XXX running.") in the response to make sure your app is running fine.
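The string-matching health check described above can be sketched in a few lines of Java. This only illustrates the logic a load balancer performs for you; `StringHealthCheck` and the two-second timeouts are assumptions made for the example, not part of any of the tools named.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;

public class StringHealthCheck {

    private final Callable<String> fetcher; // how to obtain the response body
    private final String expected;          // marker string, e.g. "XXX running."

    public StringHealthCheck(Callable<String> fetcher, String expected) {
        this.fetcher = fetcher;
        this.expected = expected;
    }

    /** Healthy only if the body was fetched and contains the marker. */
    public boolean isHealthy() {
        try {
            String body = fetcher.call();
            return body != null && body.contains(expected);
        } catch (Exception e) {
            return false; // connection refused, timeout, HTTP error: all count as down
        }
    }

    /** Fetcher that performs a GET with short timeouts (values are illustrative). */
    public static Callable<String> httpGet(String url) {
        return () -> {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        };
    }
}
```

Usage would look like `new StringHealthCheck(StringHealthCheck.httpGet(statusUrl), "XXX running.").isHealthy()`, where `statusUrl` is whatever status page your app exposes.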

You don't provide many details about the work your application does. Depending on how stateful it is, whether it can tolerate minor data loss, whether it is time-critical, and whether you value developer time over machine time, there is a whole spectrum of solutions.
There are some good suggestions above. I'd add: take a look at JMS and persistent messaging. Usually these make recovery quite trivial, but at the cost of a latency hit (unless you buy a commercial product and learn it well, or pay the vendor to tune your application). With JMS queues you can implement active-active processing and save yourself the headache of failure detection.
Another direction to look at is distributed state management/clustering frameworks like Gigaspaces, Coherence, Gemstone, Infinispan, Gridgain and Terracotta. These can replicate your data and guarantee varying quality-of-service levels. Most of them come with some type of failure detection and distributed management mechanism.

Hadoop is a good place to start.

performance monitoring tools for multi-tenant web application

We need to monitor the performance of our Java web app, and we are looking for tools that can help us with this task. The major difficulty is that we are a SaaS provider with a multi-tenant server architecture, with hundreds of customers running on the same hardware. We have tried commercial products like DynaTrace and Coradiant, but so far they haven't gotten the job done. What we need is a simple report which would tell us if we had performance problems on each customer site in a specified period of time. Mostly it will be response time per customer, but we will also need some more specifics based on the URLs.
Please let me know if anyone has experience with setting up such monitoring.
Thanks!

Take a look at stagemonitor. It is an open source Java web application performance monitoring library capable of multi-tenancy. It captures response time metrics, JVM metrics, request details and more. The overhead is very low. It uses the great timeseries database graphite, which automatically downsamples historical datapoints, leading to a low storage overhead.
You can find screenshots and more on the project site.
Note: I am the developer of stagemonitor

HypericHQ is nice for this because, being written in Java itself, it integrates quite nicely with all the MBean properties already exposed on your app server. You can set up administrator alerts/charts based on JVM properties/app server MBean properties that most non-Java tools can't get at.
On the downside, it does like to run a relatively heavy (as these things go) agent on your server.
-I am not in any way affiliated with Hyperic Inc ;)
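Whichever tool collects the numbers, the report asked for in the question comes down to aggregating response times per customer and per URL. A minimal in-memory sketch of that aggregation follows. `TenantResponseStats` is a hypothetical class, and how a request is mapped to a tenant (host name, login, URL prefix) is left to the application. One instance would cover one reporting period.

```java
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class TenantResponseStats {

    // tenant -> URL -> running count/min/avg/max of response times in milliseconds
    private final Map<String, Map<String, LongSummaryStatistics>> stats =
            new ConcurrentHashMap<>();

    /** Records one request; would typically be called from a servlet filter. */
    public void record(String tenant, String url, long millis) {
        LongSummaryStatistics s = stats
                .computeIfAbsent(tenant, t -> new ConcurrentHashMap<>())
                .computeIfAbsent(url, u -> new LongSummaryStatistics());
        synchronized (s) { // LongSummaryStatistics itself is not thread-safe
            s.accept(millis);
        }
    }

    /** Tenants having at least one URL whose average exceeds the threshold, with that worst average. */
    public Map<String, Double> slowTenants(double thresholdMillis) {
        Map<String, Double> result = new TreeMap<>();
        stats.forEach((tenant, byUrl) -> {
            double worst = 0;
            for (LongSummaryStatistics s : byUrl.values()) {
                synchronized (s) {
                    worst = Math.max(worst, s.getAverage());
                }
            }
            if (worst > thresholdMillis) {
                result.put(tenant, worst);
            }
        });
        return result;
    }
}
```

Averages hide outliers, so a real report would probably add percentiles; the structure of the map stays the same.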
