Increase/Decrease the number of Worker Role instances in Azure - java

I can increase the number of Worker Role (WR) instances directly from Java using the ServiceManagementRest class in the Azure4Java package. See the tutorial Azure Management through Java.
My question is: when I decrease the number of WR instances, can I decide which instances shut down? For the cloud-elasticity idea, I would rather stop instances that are in an IDLE state and not those in an EXECUTING state.
Regards,
Fabrizio

You can't choose which instance(s) to shut down; you simply change the instance count, and the fabric controller takes care of shutting instances down. One reason is fault domains and the SLA: if you had, say, 4 instances across 2 fault domains and shut down the two instances in fault domain 0, you'd be left with 2 instances in fault domain 1. Those two instances might now share the same rack, and if that rack goes offline you have zero instances running for a period of time.
Dealing with instance shutdown is a common scenario, and the typical pattern for working around it is to use queues to buffer your workload and have worker role instances consume work items from those queues. If an instance is shut down before its work is finished, the item eventually reappears on the queue and another instance can do the work.
This pattern requires idempotency, which is sometimes a challenge. With a recent update to Windows Azure queues, you can now modify queue messages, which makes this a bit easier - you can add information to your queue message as you complete various stages of your work item processing. Then, if your instance is shut down before work is completed, the next worker to pick it up can resume from a point other than "start."
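In rough Java terms the pattern might look something like the sketch below; WorkQueue and WorkMessage are made-up stand-ins for whatever queue client you use (an Azure storage queue, for instance), and the point is the checkpoint-on-the-message idea rather than any particular API.

    // Sketch only: WorkQueue and WorkMessage are hypothetical stand-ins for a real
    // queue client. Each completed stage is recorded back onto the message, so a
    // later worker can resume part-way through instead of starting from scratch.
    public class QueueWorker implements Runnable {
        public interface WorkMessage {
            java.util.List<Runnable> remainingStages(); // stages not yet completed
            void markStageDone(Runnable stage);         // record progress on the message
        }
        public interface WorkQueue {
            WorkMessage poll();          // lease a message (invisible to other workers)
            void update(WorkMessage m);  // write the checkpoint back to the queue
            void delete(WorkMessage m);  // remove only once fully processed
        }

        private final WorkQueue queue;
        private volatile boolean stopping = false; // set when the instance is told to stop

        public QueueWorker(WorkQueue queue) { this.queue = queue; }

        public void requestStop() { stopping = true; }

        public void run() {
            while (!stopping) {
                WorkMessage msg = queue.poll();
                if (msg == null) continue;
                for (Runnable stage : msg.remainingStages()) {
                    stage.run();               // each stage must be idempotent
                    msg.markStageDone(stage);
                    queue.update(msg);         // checkpoint, so another worker can resume here
                }
                queue.delete(msg);             // if we never get here, the message reappears
            }
        }
    }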
One more detail: you should be able to handle the Stopping event, and tell the "instance being stopped" to stop reading from the queue (maybe set a flag). Then, override OnStop(), and wait for in-process operations to complete before returning. If the still-in-process operations will take more than 5 minutes, you might have to get creative...
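A minimal sketch of that "set a flag, then drain" idea in Java; the JVM shutdown hook here merely stands in for the role's Stopping/OnStop callbacks, and the four-minute budget is an assumption chosen to stay inside the roughly five minutes a stopping role gets.

    // Illustration only: stop accepting new messages, then wait for in-flight work.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class GracefulStop {
        private final ExecutorService inFlight = Executors.newFixedThreadPool(4);
        private volatile boolean acceptingWork = true;

        // the queue-polling loop checks this before taking another message
        public boolean acceptingWork() { return acceptingWork; }

        public void installStopHandler() {
            Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
                public void run() {
                    acceptingWork = false;   // stop pulling new messages off the queue
                    inFlight.shutdown();     // no new tasks; let current ones finish
                    try {
                        // stay well inside the ~5 minute stop window
                        inFlight.awaitTermination(4, TimeUnit.MINUTES);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }));
        }
    }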

You cannot control which instance will shut down; however, it is almost always (as far as I've seen) the instance with the highest number suffix. For example, if you have IN_0, IN_1 and IN_2 and you remove an instance, it will most likely be IN_2 that shuts down. Maybe you can use this tendency to your advantage?
The least disruptive option is probably to wait for a time of day when the worker roles are less busy before reducing the instance count.

I think it's wise never to assume what will happen next based on an instance id. I tend to spread roles over services (scale units), starting and stopping them when required - a pattern used to control 4000 nodes.

Related

How to serialize a multi-threaded program

I have many threads performing different operations on an object, and when nearly 50% of the task is finished I want to serialize everything (perhaps because I want to shut down my machine).
When I come back, I want to start from the point where I left off.
How can we achieve this?
This is like saving the state of a game's objects while playing.
Normally we save the state of an object and retrieve it back, but here we are also storing the progress (count/state) of the processes working on it.
For example:
I have a thread that is creating a salary Excel sheet for 50 thousand employees.
Another thread is creating appraisal letters for the same 50 thousand employees.
Yet another thread is writing a "Happy New Year" e-mail to those 50 thousand employees.
So imagine multiple operations like these.
Now I want to shut down when roughly 50% of the task is finished - say the salary Excel sheet has been written for 25-30 thousand employees, appraisal letters are done for 25-30 thousand, and so on.
When I come back the next day, I want the process to start from where it left off.
This is like a resume.
I'm not sure if this might help, but you can achieve this if the threads communicate via in-memory queues.
To serialize the whole application, what you need to do is disable consumption of the queues; when all the threads are idle you'll reach a "safe point" where you can serialize the whole state. You'll need to keep track of all the threads you spawn, so you know when they are all idle.
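A rough sketch of that safe-point idea, with invented names (Worker here is just whatever abstraction lets you wait for a thread to finish its current item):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class SafePoint {
        // hypothetical view of a worker thread: it can tell us when it has gone idle
        public interface Worker { void awaitIdle() throws InterruptedException; }

        private final AtomicBoolean consumptionEnabled = new AtomicBoolean(true);

        // every consumer loop checks this flag before taking the next queue item
        public boolean mayConsume() { return consumptionEnabled.get(); }

        public void serializeAll(List<Worker> workers, Serializable state, File out)
                throws IOException, InterruptedException {
            consumptionEnabled.set(false);              // 1. stop taking new work
            for (Worker w : workers) {
                w.awaitIdle();                          // 2. wait for in-flight items to finish
            }
            ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(out));
            try {
                oos.writeObject(state);                 // 3. snapshot queues + progress
            } finally {
                oos.close();
            }
        }
    }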
You might be able to do this with another technology (maybe a java agent?) that freezes the JVM and allows you to dump the whole state, but I don't know if this exists.
Well, it's not much different from saving the state of an object.
Just maintain separate queues for the different kinds of input, and on every launch (first launch or relaunch) check those queues; if they are not empty, resume your "stopped process" by starting a new process with the remaining data.
Say, for example, an app is sending messages and you quit it with 10 messages remaining. Have a global queue that the app's senderMethod checks on every launch; in this case the pending queue will contain those 10 messages, so it will continue sending the remaining ones.
Edit:
Basically, for all resumable processes, say pr1, pr2 ... prN, maintain queues of inputs, say q1, q2 ... qN. Each queue should remove processed elements so it contains only pending inputs. As soon as you suspend the system, store these queues, and on relaunch restore them. Have a common routine, say resumeOperation, which calls all the resumable processes (pr1, pr2 ... prN); it triggers execution only for those with non-empty queues, which in turn replicates resuming behaviour.
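A small sketch of that idea with invented names (ResumableProcess, Handler); the queue only ever holds inputs that haven't been processed yet, so restoring it on launch is all that "resume" has to do:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.ArrayDeque;
    import java.util.Deque;

    public class ResumableProcess<T extends Serializable> {
        public interface Handler<T> { void process(T input); }

        private final Deque<T> pending = new ArrayDeque<T>();
        private final File store;

        public ResumableProcess(File store) { this.store = store; }

        public void submit(T input) { pending.add(input); }

        // drain whatever is still pending, removing each item only after it completes
        public void resumeOperation(Handler<T> handler) {
            while (!pending.isEmpty()) {
                T next = pending.peek();
                handler.process(next);   // e.g. send one message, write one row
                pending.poll();          // remove only once it really finished
            }
        }

        @SuppressWarnings("unchecked")
        public void load() throws IOException, ClassNotFoundException {
            if (!store.exists()) return;
            ObjectInputStream in = new ObjectInputStream(new FileInputStream(store));
            try { pending.addAll((Deque<T>) in.readObject()); } finally { in.close(); }
        }

        public void save() throws IOException {
            ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(store));
            try { out.writeObject(new ArrayDeque<T>(pending)); } finally { out.close(); }
        }
    }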
Java provides the java.io.Serializable interface to indicate serialization support in classes.
You don't provide much information about the task, so it's difficult to give an answer.
One way to think about a task is as a general algorithm that can be split into several steps. Each of these steps is in turn a task itself, so you should see a pattern here.
By cutting each algorithm into small pieces until you cannot divide it any further, you get a pretty good idea of where your task can be interrupted and recovered later.
The result of a task can be:
a success: the task returns a value of the expected type
a failure: somehow, something didn't turn right while doing computation
an interrupted computation: the work wasn't finished, but it may be resumed later, and the return value is the state of the task
(Note that the latter case could be considered a subcase of a failure; it's up to you to organize your protocol as you see fit.)
Depending on how you generate the interruption event (will it be a message passed from the main thread to the worker threads? will it be an exception?), that event will have to bubble up within the task tree and trigger each task to evaluate whether its work can be resumed or not, and then provide a serialized version of itself to the larger task containing it.
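One way to model those three outcomes in Java might be something like this (the names are invented for illustration; the "interrupted" case carries whatever state the task needs to resume):

    import java.io.Serializable;

    // Illustrative only: a task finishes with exactly one of these three results.
    public abstract class TaskResult<V> {

        // the task returned a value of the expected type
        public static final class Success<V> extends TaskResult<V> {
            public final V value;
            public Success(V value) { this.value = value; }
        }

        // something didn't turn out right while computing
        public static final class Failure<V> extends TaskResult<V> {
            public final Exception cause;
            public Failure(Exception cause) { this.cause = cause; }
        }

        // not finished, but resumable later: the payload is the serialized task state
        public static final class Interrupted<V> extends TaskResult<V> {
            public final Serializable resumeState;
            public Interrupted(Serializable resumeState) { this.resumeState = resumeState; }
        }
    }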
I don't think serialization is the correct approach to this problem. What you want are persistent queues, from which you remove an item only once you've processed it. Every time you start the program you just start processing the queue from the beginning. There are numerous ways of implementing a persistent queue, but a database comes to mind given the scale of your operations.
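For instance, a very small JDBC-backed queue could look roughly like this (the table name and schema are assumptions; add a status or lease column if several workers share the table):

    // Sketch of a database-backed work queue. Assumed schema:
    //   CREATE TABLE work_items (id BIGINT PRIMARY KEY, payload VARCHAR(4000));
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class DbQueue {
        private final Connection conn;

        public DbQueue(Connection conn) { this.conn = conn; }

        public void enqueue(long id, String payload) throws SQLException {
            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO work_items (id, payload) VALUES (?, ?)");
            try {
                ps.setLong(1, id);
                ps.setString(2, payload);
                ps.executeUpdate();
            } finally {
                ps.close();
            }
        }

        // delete the row only after the item has actually been processed,
        // so a restart simply resumes with whatever rows are still present
        public void markDone(long id) throws SQLException {
            PreparedStatement ps =
                conn.prepareStatement("DELETE FROM work_items WHERE id = ?");
            try {
                ps.setLong(1, id);
                ps.executeUpdate();
            } finally {
                ps.close();
            }
        }
    }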

Why does GAE spawn new instances even though I have set min idle=1 and pending latency=max?

I have a low/sporadic-load application and the latency caused by starting new instances (around 10s) far exceeds the time needed to process my requests, which typically complete in less than 500ms.
So in order to avoid the latency spikes caused by the spawning of new instances ("loading requests"), I made the following two settings:
set min idle instances = max idle instances = 1, to ensure that there is always one instance running (one instance is enough to handle my traffic); and
set the pending latency to 15s, so that GAE waits for up to 15s for the one resident instance to become free rather than start a new one.
Billing is activated. However, GAE still starts new instances, resulting in unacceptable latency. Why is that?
In the logs I can see that my requests always return in less than 500ms; there is no way that a request would get queued up to 15s.
What can I do about this? Any help much appreciated.
Update: my solution was to set up a cron job which issues a request every 5 minutes, to always have a dynamic instance running. As it turned out (see answer below), idle instances are reserved for crazy load spikes, not the low-load scenario that I'm in 99% of the time.
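The handler behind such a cron job can be trivial; a sketch is below, assuming a /keepalive URL mapped in web.xml and a cron.xml entry that hits it "every 5 minutes".

    // Minimal keep-alive endpoint for the cron job to hit (the URL mapping is an assumption).
    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class KeepAliveServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // returning quickly is enough to keep a dynamic instance warm
            resp.setStatus(HttpServletResponse.SC_OK);
        }
    }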
As koma says, App Engine will create a dynamic instance to keep the number of idle instances constant, but not only will it create a new one, it will also use it immediately instead of using the idle one. If you have a bunch of idle instances, App Engine will in fact still prefer spinning up dynamic ones even when a single request comes in, and will "save" the idle ones in case of crazy traffic spikes.
This is highly counter-intuitive, because you'd expect it to use the instances that are already idling around to serve requests and spin up dynamic ones for future requests, but that's not how it works.
If you set min idle instances = 1, it will definitely spawn another instance at the first request... because at that point there is no longer an idle instance (it is busy processing the first request!).
And since a new instance has been started, it might as well process some requests and no longer be idle?
see also Google App Engine Instances keep quickly shutting down

Appengine Backend Task Dispatcher: A More Economical Version

I'm working on a job dispatcher for App Engine, and the default scheduler always winds up firing up 3-4 instances that do all the work, plus some overflow instances that might take thousands of tasks, or only a couple, and then sit there burning CPU doing nothing.
My task involves processing jobs for many different-sized domains; sometimes there's huge throughput, and other times it's one user with 10,000 models to update. If I turn the normal App Engine task scheduler loose, it fails in two ways: 1) backends never shut down, and when memory hits the cap, Java GC makes an instance thrash and act almost like a zombie, yet it never shuts down {and still takes/holds jobs}; and 2) many domains have a single user who takes far longer than all the others to process, which keeps a backend alive long after the rest of the domain has finished.
These tasks must run throughout the day, and it takes multiple backends to handle the fan-out, so I can't just dump them all on a B8 and call it a day; we need a dispatcher to manage how tasks get allocated to backends.
Now, I don't want to pay datastore ops on every task just to save a few minutes of CPU time, so my plan of attack {please critique} is to use a static ConcurrentHashMap in RAM: start each run() in a try, have every deferred task put its [hashcode, startTime] in at startup and remove(hashcode) in a finally. There is one such map per backend instance that's running jobs, wrapped in a method, BackendCounter.addToLiveMap(this); its .size() serves as a running total of how many jobs are alive on that backend {with timestamps to detect zombie jobs that run for more than 10 minutes}. The job dispatcher can fire off a worker thread per instance to monitor how many jobs, excluding itself, are running in that instance, and keep a ranked list in memcache of which instances have how many tasks alive. If one instance drops below a threshold of X live tasks, pick an overflow instance to defer to, then have the method BackendCounter.addToLiveMap(this) throw an exception I can catch to tell jobs to just schedule themselves on a new instance {ChangeInstanceException#getNewTarget()}. This way I can prevent barely-used instances from getting new jobs so they have a chance to shut down, paying only for some memcache ops, while fan-out only pays a write and a delete to a static map.
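A sketch of how that BackendCounter bookkeeping might look; the threshold/overflow logic is stubbed out, and ChangeInstanceException is only a placeholder for the exception described above.

    // Sketch only: the redirect decision and overflow-instance selection are stubs.
    import java.util.concurrent.ConcurrentHashMap;

    public final class BackendCounter {

        /** Placeholder for the exception type described in the question. */
        public static class ChangeInstanceException extends Exception {
            private final String newTarget;
            public ChangeInstanceException(String newTarget) { this.newTarget = newTarget; }
            public String getNewTarget() { return newTarget; }
        }

        // hashcode of the live task -> time it started, one map per backend instance
        private static final ConcurrentHashMap<Integer, Long> LIVE =
                new ConcurrentHashMap<Integer, Long>();

        // called at the top of each deferred task's run(), inside its try block
        public static void addToLiveMap(Object task) throws ChangeInstanceException {
            if (shouldRedirect()) {
                throw new ChangeInstanceException(pickOverflowInstance());
            }
            LIVE.put(task.hashCode(), System.currentTimeMillis());
        }

        // called from the finally block so even failed tasks clean up after themselves
        public static void removeFromLiveMap(Object task) {
            LIVE.remove(task.hashCode());
        }

        // running total of live jobs here; the timestamps let a monitor thread
        // flag zombie jobs that have been running for more than ten minutes
        public static int liveCount() {
            return LIVE.size();
        }

        private static boolean shouldRedirect() { return false; }           // threshold/memory check
        private static String pickOverflowInstance() { return "overflow"; } // from memcache ranking
    }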
That takes care of problem two, the instance-hour killer. As for problem one - how to prevent one instance {usually instances 0 and 1} from hitting peak memory and starting to turn toward the dark side - I am torn between two options.
On the one hand, I can rely on the fact that BackendCounter.addToLiveMap(this) is already expected to throw ChangeInstanceException, and simply check memory there:
    // if this instance's heap is roughly 90% used, redirect new work to an overflow instance
    if ((float) (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory())
            / Runtime.getRuntime().totalMemory() > 0.9f)
        throw new ChangeInstanceException(getOverflowInstance());
This naive approach will simply tell any instance approaching its memory limit to send all new work elsewhere.
On the other hand, I could keep instances 0 and 1 for handling overflow {and toggle which of the two gets new jobs, to give each a chance to shut down}, then send the fan-out to instances 2+, which would only run until they drop to, say, 10 or 15 jobs in parallel. The fan-out is pretty consistent and only takes a couple of minutes, so instances 2, 3 and, at most, 4 would need to turn on, and be given time to turn off while a different instance gets hit with more load.
The only thing I'm afraid of is jobs starting to bounce from one instance to another, which can probably be overcome with a redirect limit (carried in a header) beyond which ChangeInstanceException is simply not thrown.
Any thoughts or advice are greatly appreciated.

Greedy threads are grabbing too many JMS messages under WebLogic

We encountered a problem under WebLogic 8.1 that we lived with but could never fix. We often queue up a hundred or more JMS messages, each of which represents a unit of work. Despite the fact that each message is of the same size and looks the same, one may take only seconds to complete while the next one represents 20 minutes of solid crunching.
Our problem is that each of the message driven beans we have doing the work of these messages ends up on a thread that seems to grab ten messages at a time (we think it is being done as a WebLogic optimization to keep from having to hit the queue over and over again for small messages). Then, as one thread after another finishes all of its small jobs and no new ones come in, we end up with a single thread log jammed on a long running piece of work with up to nine other items sitting waiting on it to finish, despite the fact that other threads are free and could start on those units of work.
Now we are at a point where we are converting to WebLogic 10 so it is a natural point to return to this problem and find out if there is any solution that we could implement so that either: a) each thread only grabs one JMS message at a time to process and leaves all the others waiting in the incoming queue, or b) it would automatically redistribute waiting messages (even ones already assigned to a particular thread) out to free threads. Any ideas?
Enable the Forward Delay and provide an appropriate value. This will cause the JMS queue to redistribute messages to its peers if they have not been processed within the configured time.
Taking a single message off the queue every time might be overkill - it's all a balance between the number of messages you are processing and what you gauge to be an issue.
There are also multiple issues with JMS on WebLogic 10 depending on your setup. You can save yourself a lot of time and trouble by using the latest MP right from the start.
Threads are in "starvation" when they cannot get hold of the resources (here, messages) they need in order to execute; the threads that grab and hold those resources are the ones referred to as "greedy" threads.

Multiple SingleThreadExecutors for a given application...a good idea?

This question is about the fallouts of using SingleThreadExecutor (JDK 1.6). Related questions have been asked and answered in this forum before, but I believe the situation I am facing, is a bit different.
Various components of the application (let's call the components C1, C2, C3 etc.) generate (outbound) messages, mostly in response to (inbound) messages that they receive from other components. These outbound messages are kept in queues, which are usually ArrayBlockingQueue instances - fairly standard practice perhaps. However, the outbound messages must be processed in the order they are added. I guess use of a SingleThreadExecutor is the obvious answer here. We end up having a 1:1 situation - one SingleThreadExecutor for one queue (which is dedicated to messages emanating from one component).
Now, the number of components (C1,C2,C3...) is unknown at a given moment. They will come into existence depending on the need of the users (and will be eventually disposed of too). We are talking about 200-300 such components at the peak load. Following the 1:1 design principle stated above, we are going to arrange for 200 SingleThreadExecutors. This is the source of my query here.
I am uncomfortable with the thought of having to create so many SingleThreadExecutors. I would rather try and use a pool of SingleThreadExecutors, if that makes sense and is plausible (any ready-made, seen-before classes/patterns?). I have read many posts on recommended use of SingleThreadExecutor here, but what about a pool of the same?
What do learned women and men here think? I would like to be directed, corrected or simply, admonished :-).
If your requirement is that the messages be processed in the order that they're posted, then you want one and only one SingleThreadExecutor. If you have multiple executors, then messages will be processed out-of-order across the set of executors.
If messages need only be processed in the order that they're received for a single producer, then it makes sense to have one executor per producer. If you try pooling executors, then you're going to have to put a lot of work into ensuring affinity between producer and executor.
Since you indicate that your producers will have defined lifetimes, one thing that you have to ensure is that you properly shut down your executors when they're done.
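A rough sketch of the one-executor-per-producer arrangement with explicit lifecycle handling (names invented); dispose() is what prevents 200-300 dead executors from piling up:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Illustration only: one single-threaded executor per live producer component.
    public class PerProducerExecutors {
        private final ConcurrentHashMap<String, ExecutorService> executors =
                new ConcurrentHashMap<String, ExecutorService>();

        // all messages from the same producer go to the same single thread,
        // so they are processed in the order they were submitted
        public void submit(String producerId, Runnable message) {
            ExecutorService ex = executors.get(producerId);
            if (ex == null) {
                ExecutorService fresh = Executors.newSingleThreadExecutor();
                ex = executors.putIfAbsent(producerId, fresh);
                if (ex == null) {
                    ex = fresh;          // we won the race; keep our executor
                } else {
                    fresh.shutdown();    // someone else registered first; discard ours
                }
            }
            ex.submit(message);
        }

        // call when the component is disposed of, so its thread can exit
        public void dispose(String producerId) {
            ExecutorService ex = executors.remove(producerId);
            if (ex != null) {
                ex.shutdown();           // finishes queued messages, then stops the thread
            }
        }
    }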
Messaging and batch jobs is something that has been solved time and time again. I suggest not attempting to solve it again. Instead, look into Quartz, which maintains thread pools, persisting tasks in a database etc. Or, maybe even better look into JMS/ActiveMQ. But, at the very least look into Quartz, if you have not already. Oh, and Spring makes working with Quartz so much easier...
I don't see any problem there. Essentially you have independent queues and each has to be drained sequentially; one thread for each is a natural design. Anything else you come up with is essentially the same. As an example, when Java NIO first came out, frameworks were written trying to take advantage of it and get away from the thread-per-request model. In the end some authors admitted that to provide a good programming model they were just reimplementing threading all over again.
It's impossible to say whether 300 or even 3000 threads will cause any issues without knowing more about your application. I strongly recommend that you profile your application before adding more complexity.
The first thing you should check is that the number of concurrently running threads is not much higher than the number of cores available to run them. The more active threads you have, the more time is wasted managing those threads (context switches are expensive) and the less work gets done.
The easiest way to limit the number of running threads is to use a semaphore: acquire it before starting work and release it after the work is done.
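A minimal sketch of that approach; the permit count (one per core) is just an assumption to illustrate the idea:

    import java.util.concurrent.Semaphore;

    // Caps how many units of work actually execute at once, however many threads exist.
    public class ThrottledWorker {
        private static final Semaphore RUNNING =
                new Semaphore(Runtime.getRuntime().availableProcessors());

        public void doWork(Runnable unitOfWork) throws InterruptedException {
            RUNNING.acquire();        // block until a slot is free
            try {
                unitOfWork.run();
            } finally {
                RUNNING.release();    // always hand the slot back
            }
        }
    }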
Unfortunately, limiting the number of running threads may not be enough. While it may help, the overhead may still be too great if the time spent per context switch is a major part of the total cost of one unit of work. In this scenario, the most efficient approach is often to have a fixed number of queues: each component gets a queue from a global pool when it initializes, using an algorithm such as round-robin for queue selection.
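A short sketch of a fixed pool of single-threaded queues handed out round-robin at component initialization (the pool size is arbitrary here):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicInteger;

    // Illustration only: a fixed set of queues shared by many producers.
    public class QueuePool {
        private final ExecutorService[] queues;
        private final AtomicInteger next = new AtomicInteger();

        public QueuePool(int size) {
            queues = new ExecutorService[size];
            for (int i = 0; i < size; i++) {
                queues[i] = Executors.newSingleThreadExecutor();
            }
        }

        // each component calls this once at initialization and keeps the returned
        // queue, so its own messages still get processed in order
        public ExecutorService assignQueue() {
            int slot = (next.getAndIncrement() & 0x7fffffff) % queues.length;
            return queues[slot];
        }
    }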
If you are in one of those unfortunate cases where the obvious solutions do not work, I would start with something relatively simple: one thread pool, one concurrent queue, a lock, a list of queues, and a temporary queue for each thread in the pool.
Posting work to the queue is simple: add the payload and the identity of the producer.
Processing is relatively straightforward as well. First you get the next item from the queue, then you acquire the lock. While you hold the lock, you check whether any other thread is running a task for the same producer. If not, you register this thread by adding a temporary queue to the list of queues; otherwise you add the task to the existing temporary queue. Then you release the lock. Now you either run the task, or poll for the next one and start over, depending on whether the current thread was registered to run tasks. After running the task, you take the lock again and check whether there is more work to be done in the temporary queue: if not, remove the queue from the list; otherwise take the next task. Finally you release the lock, and again either run the task or start over.
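A condensed sketch of that scheme (it hands work to the pool directly instead of having workers poll a separate global queue, but the lock and per-producer temporary-queue bookkeeping are the same):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Tasks for the same producer never run concurrently or out of order;
    // tasks for different producers share the fixed-size pool.
    public class KeyedSerialDispatcher {
        private final ExecutorService pool = Executors.newFixedThreadPool(8); // size is arbitrary
        private final Object lock = new Object();
        // producer id -> tasks parked because that producer is already being processed
        private final Map<String, Deque<Runnable>> inProgress =
                new HashMap<String, Deque<Runnable>>();

        public void post(final String producerId, final Runnable task) {
            synchronized (lock) {
                Deque<Runnable> pending = inProgress.get(producerId);
                if (pending != null) {
                    pending.add(task);   // someone is already draining this producer
                    return;
                }
                inProgress.put(producerId, new ArrayDeque<Runnable>()); // register
            }
            pool.execute(new Runnable() {
                public void run() { drain(producerId, task); }
            });
        }

        // run the first task, then keep draining that producer's parked tasks
        private void drain(String producerId, Runnable first) {
            Runnable current = first;
            while (current != null) {
                current.run();
                synchronized (lock) {
                    Deque<Runnable> pending = inProgress.get(producerId);
                    current = pending.poll();
                    if (current == null) {
                        inProgress.remove(producerId); // nothing left; unregister
                    }
                }
            }
        }
    }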
