Appengine Backend Task Dispatcher: A More Economical Version - java

I'm working on a job dispatcher for App Engine, and the default scheduler always winds up firing up 3-4 instances that do all the work, plus some overflow instances that might take thousands of tasks, or only a couple, and then sit there burning CPU doing nothing.
My task involves processing jobs for many differently sized domains; sometimes there's huge throughput, and other times it's one user with 10,000 models to update. If I turn the normal App Engine task scheduler loose, it fails in two ways: 1) backends never shut down, and when memory hits the cap, Java GC makes an instance thrash and act like it's almost a zombie, yet it never shuts down {and still takes/holds jobs}, and 2) many domains have a single user who takes far longer than all the others to process, and this keeps a backend alive long after the rest of the domain has finished.
These tasks must run throughout the day, and it takes multiple backends to handle the fanout, so I can't just dump them all on a B8 and call it a day; we need a dispatcher to manage how tasks get allocated to backends.
Now, I don't want to pay datastore ops on every task just to save a few minutes of CPU time, so my plan of attack {please critique} is to use a static ConcurrentHashMap in RAM, start each run() in a try, have every deferred task put its [hashcode, startTime] in at startup and remove(hashcode) in a finally. There will be one such map per backend instance that's running jobs, wrapped in a method, BackendCounter.addToLiveMap(this); its .size() serves as a running total of how many jobs are alive on that backend {with the timestamp used to detect zombie jobs that run >10 minutes}. The job dispatcher can fire off a worker thread per instance to monitor how many jobs, excluding itself, are running on that instance, and keep a ranked list in memcache of which instances have how many tasks alive. If one instance drops below a threshold of X live tasks, pick an overflow instance to defer to, then have BackendCounter.addToLiveMap(this) throw an exception I can catch to tell jobs to reschedule themselves onto a new instance {ChangeInstanceException#getNewTarget()}. This way I can prevent barely-used instances from getting new jobs so they have a chance to shut down, paying only for some memcache ops, while the fanout only pays for a write and a delete to the static map.
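Roughly, the counter I have in mind would look something like the sketch below; addToLiveMap and the 10-minute zombie cutoff come from the plan above, everything else (field names, the helper methods) is just a placeholder:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BackendCounter {

    // One map per backend instance (static, so per JVM): task hashcode -> start time.
    private static final Map<Integer, Long> liveTasks = new ConcurrentHashMap<Integer, Long>();

    /** Called at the top of each deferred task's run(). */
    public static void addToLiveMap(Object task) {
        liveTasks.put(task.hashCode(), System.currentTimeMillis());
    }

    /** Called in the task's finally block. */
    public static void removeFromLiveMap(Object task) {
        liveTasks.remove(task.hashCode());
    }

    /** Running total of live jobs on this backend; what the monitor thread publishes to memcache. */
    public static int liveTaskCount() {
        return liveTasks.size();
    }

    /** Jobs that started more than 10 minutes ago count as zombies. */
    public static int zombieTaskCount() {
        long cutoff = System.currentTimeMillis() - 10L * 60L * 1000L;
        int zombies = 0;
        for (long startTime : liveTasks.values()) {
            if (startTime < cutoff) {
                zombies++;
            }
        }
        return zombies;
    }
}

// and in each deferred task:
    public void run() {
        try {
            BackendCounter.addToLiveMap(this);
            // ... actual work ...
        } finally {
            BackendCounter.removeFromLiveMap(this);
        }
    }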
That takes care of problem two, which is the instance-hour killer. As for problem one, which is how to prevent one instance {usually instance 0 or 1} from hitting peak memory and starting to turn toward the dark side, I am torn between two options.
On the one hand, I can use the fact that BackendCounter.addToLiveMap(this) is already expected to throw ChangeInstanceException, and simply check memory:
// redirect new work once less than 10% of the heap is free
if (((float) Runtime.getRuntime().freeMemory() / Runtime.getRuntime().totalMemory()) < 0.1f) throw new ChangeInstanceException(getOverflowInstance());
This naive approach simply tells any instance approaching its memory limit to send all new work elsewhere.
On the other hand, I could keep instances 0 and 1 for handling overflow {and toggle which of the two gets new jobs, to give them chances to shut down}, then send the fanout to instances 2+, which will only run until they drop to, say, 10 or 15 jobs in parallel. The fanout is pretty consistent and only takes a couple of minutes, so instances 2, 3 and, at most, 4 will need to turn on, and be given time to turn off while a different instance gets hit with more load.
The only thing I'm afraid of is jobs starting to bounce from one instance to another, which can probably be overcome with a redirect-count limit that skips throwing ChangeInstanceException.
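A rough sketch of the exception and the redirect cap I'm picturing; getNewTarget() is from the plan above, while MAX_REDIRECTS, the live-task threshold and the overflow-picking helper are placeholders:

public class ChangeInstanceException extends Exception {

    private final String newTarget;

    public ChangeInstanceException(String newTarget) {
        super("redirect work to backend instance " + newTarget);
        this.newTarget = newTarget;
    }

    /** The backend instance the job should re-schedule itself onto. */
    public String getNewTarget() {
        return newTarget;
    }
}

// added to BackendCounter: only redirect while the job still has hops left
    private static final int MAX_REDIRECTS = 3;      // cap to stop jobs bouncing between instances
    private static final int MIN_LIVE_TASKS = 10;    // the "threshold of X live tasks" from above

    public static void checkForRedirect(int redirectsSoFar) throws ChangeInstanceException {
        if (redirectsSoFar >= MAX_REDIRECTS) {
            return;                                  // give up redirecting; just run here
        }
        if (liveTaskCount() < MIN_LIVE_TASKS) {      // barely-used instance: let it drain and shut down
            throw new ChangeInstanceException(pickOverflowInstance());
        }
    }

    private static String pickOverflowInstance() {
        return "overflow-backend-0";                 // placeholder; would consult the memcache ranking
    }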
Any thoughts or advice are greatly appreciated.

Related

Prevent a slow job from taking over a thread pool

I have a system where currently every job has its own Runnable class, and I pre-defined a fixed number of threads for every job.
My understanding is that it is a wrong practice, because:
You have to tailor the number of threads with respect to the machine running the process.
Each thread can only take one type of job.
Would you agree on that? (current solution is wrong)
So, I'd like to use something like Java's ThreadPool instead. I was confronted with an argument claiming that by doing so, slow jobs will take over most of the thread pool, leaving no room for the other jobs. Whereas with the current solution, a fixed number of threads is assigned to the slow worker, and it won't hurt the others.
(Notice that you can't know a-priori if a job will be "slow")
How can a system be adaptive in the number of threads it uses, but at the same time not be held hostage by the slowest jobs?
You could try measuring the time it takes for each job to complete (with a hand-made Timer class of sorts). Then you normalize this value by dividing it by the maximum time any given job has taken. Finally, you multiply this number by a fixed factor that depends on how many threads you want running per job. The result is the number of threads this job should be using; you can adjust it accordingly.
Edit: You can set minimum and maximum values that regulate how many threads a job is entitled to. You could alternatively reclaim threads from a job that is hogging many of them when another job enters the system.
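Something like this rough sketch of the heuristic; all names and constants here are assumed, nothing comes from a library:

public class JobThreadBudget {

    private static final int THREAD_BUDGET = 16; // total threads you are willing to hand out
    private static final int MIN_THREADS = 1;    // floor per job
    private static final int MAX_THREADS = 8;    // ceiling per job

    /**
     * @param jobMillis    measured completion time of this job (from your hand-made Timer)
     * @param maxJobMillis the maximum time any job has taken so far
     * @return how many threads this job should be granted
     */
    public static int threadsFor(long jobMillis, long maxJobMillis) {
        double normalized = (double) jobMillis / (double) maxJobMillis;   // 0..1
        int requested = (int) Math.round(normalized * THREAD_BUDGET);     // scale by the fixed factor
        return Math.max(MIN_THREADS, Math.min(MAX_THREADS, requested));   // clamp to the min/max from the edit
    }
}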
Hope that helps!
It's more of a business problem. Let's say I am a telecom operator. I bar my subscribers from making outgoing calls when they don't clear their dues. When they make a payment, I clear a flag, and within a second the subscriber can make calls. But a lot of other activities go on in my system, like usage processing, billing, bill formatting, etc.
Now let's assume I have a system-wide common pool of threads and I start the billing of 50K subscribers. All my threads are now processing the relatively long-running billing jobs, and a huge queue is building up.
A poor customer now makes a payment and wants to make an urgent call. But I have no thread left in my pool to clear the flag. The customer has to wait an hour before he can make the call. That's an SLA breach.
What I should have done is create separate thread pools. If the call-unblocking jobs are infrequent and short, I can create a separate pool for them with a core size of maybe 5. For billing jobs I'd rather create a pool with a core size of 25 and a max size of 30.
So my system limits won't be exceeded anyway, because I know that even in the worst situation I won't have more than 30 threads.
This will also make it easier to debug. If I have a different thread-name pattern for each pool and my system has some issues, I can easily take a thread dump and understand whether the billing or the payment stuff is the culprit.
So, I think the existing design is based on some business use case which you need to thoroughly understand before proposing a solution.
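A minimal sketch of that separation, using the sizes mentioned above; the pool names, queue bound and keep-alive are illustrative, and the named thread factory is what makes the thread dumps easy to read:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Pools {

    // Threads named "<prefix>-1", "<prefix>-2", ... so thread dumps show which pool is busy.
    private static ThreadFactory named(final String prefix) {
        final AtomicInteger n = new AtomicInteger();
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, prefix + "-" + n.incrementAndGet());
            }
        };
    }

    // Small pool for the rare, short "clear the barring flag" jobs.
    public static final ExecutorService UNBAR_POOL =
            Executors.newFixedThreadPool(5, named("unbar"));

    // Core 25, max 30, bounded queue for the long-running billing jobs.
    public static final ExecutorService BILLING_POOL =
            new ThreadPoolExecutor(25, 30, 60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>(10000), named("billing"));
}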

How to decide on the ThreadPoolTaskExecutor pools and queue sizes?

This may be a more general question on how to decide on the thread pool size, but let's use the Spring ThreadPoolTaskExecutor for this case. I have the following configuration for the pool core and max size and the queue capacity. I've already read about what all these configurations mean - there is a good answer here.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@SpringBootApplication
@EnableAsync
public class MySpringBootApp {

    public static void main(String[] args) {
        ApplicationContext ctx = SpringApplication.run(MySpringBootApp.class, args);
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        executor.setQueueCapacity(25);
        return executor;
    }
}
The above numbers look random to me and I want to understand how to set them up correctly based on my environment. I will outline the following constraints that I have:
the application will be running on a two-core CPU box
the executor will work on a task which usually takes about 1-2 seconds to finish.
Usually I expect about 800 tasks/min to be submitted to my executor, spiking at 2500/min
The task will construct some objects and make an HTTP call to Google pubsub.
Ideally I'd like to understand what other constraints I need to consider and based on them what will be a reasonable configuration for my pools and queue sizes.
Update : This answer got a few votes over the years so I'm adding a shortened version for people who don't have the time to read my weird metaphor :
TL;DR answer :
The actual constraint is that a (logical) CPU core can only run a single thread at the same time. Thus :
Number of threads = number of logical cores of your CPU * 1 / (ratio of time your thread is runnable while doing your task)
So, if you have 8 logical cores on your machine, you can safely put 8 threads in your thread pool (well, remember to exclude the other threads that may be used). Then you need to ask yourself if you can put more: you need to benchmark the kind of task you intend to run on your thread pool. If you notice the threads are, on average, running only 50% of the time, that means your CPU is free to work on another thread 50% of the time, and you can add more threads.
Queue size : as many as you can wait on.
The queue size is the number of items your thread pool will accept before rejecting them. It is business logic. It depends on what behavior you expect: is there a point in accepting a billion tasks? When do you throw in the towel?
If one task takes one second to complete, and you have 10 threads, that means that the 10,000th task in queue will hopefully be done in 1000 seconds. Is that acceptable ?
The worst thing that can happen is having clients time out and re-submit the same tasks before you could complete the first ones.
Original ELI12 answer :
It may not be the most accurate answer, but I'll try :
A simple approach is to be aware that your 2-core CPU will only work on two threads at the same time.
If you have a relatively modern Intel CPU, and you have Hyper-Threading (aka HT(TM), HTT(TM), SMT) turned on (via a setting in the BIOS), your operating system will see the number of available cores as double the number of physical cores within your CPU.
Either way, to detect from Java how many cores (or rather, how many threads can run simultaneously without preempting each other) you can work with, just call int cores = Runtime.getRuntime().availableProcessors();
If you try to see your application as a Workshop (an actual one) :
A processor would be represented by an employee. It is the physical unit that will add value to a product.
A task would be a lump of raw material (plus some instructions list)
Your thread is a desk on which the employee can put the task on and work.
The queue size is the length of the conveyor belt that brings the raw materials to the desk.
Thus, your question becomes "How can I choose how many desks and how long can my conveyor belt be inside my factory, given an unchanging number of employees ?".
For the how many desks (Threads) part :
An employee can only work at one desk at a time, and you can only have a single employee per desk. Thus, the basic setup would be to have at least as many desks as you have employees (to avoid having any employee (processor) left without any possibility to work).
But, depending on your activity, you may afford more desks per employee :
If your employees are expected to put mail inside envelopes constantly, an operation that requires their full attention (in programming: sorting collections, creating objects, incrementing counters), having more desks wouldn't help, and may even be detrimental, because your employee would sometimes have to change desks (context switching, which takes some time), leaving the one they were working on to make progress on the other.
But, if your task is making pottery, and relies on your employee waiting for the clay to cook in an oven (read: waiting to get access to an external resource, such as a file system, a web service, etc.), your employee can afford to go model clay at another desk and come back to the first one later.
Thus, you can afford more desks per employee as long as your tasks spend a large enough share of their time waiting compared to actively running, the number of extra desks being how many tasks your employee can make progress on during the waiting time.
For the conveyor belt (queue) size part :
The queue size represents how many items you allow to be queued before starting to reject any more tasks (by throwing an exception); it is the threshold at which you start to say "OK, I'm already overbooked and won't ever be able to comply".
First, I'd say your conveyor belt needs to fit inside the workshop. Meaning that the collection should be small enough to prevent out-of-memory errors (obviously).
After that, it is based on your company policy. Let's assume a task is added to the belt every time a client makes an order (another service calls your API). If the caller doesn't care how much time you take to comply and trusts you enough with the execution, there's no point in limiting the size of the belt.
But if you can expect that your client gets annoyed after waiting for their pottery for a month, and leaves you for a competitor or reorders another pottery, assuming the first order was lost, without ever bothering to check whether the first order was completed... that first order was done for nothing, you won't get paid, and if your client places another order whenever you're too slow to comply, you'll enter a feedback loop, because every new order slows down the whole process.
Thus, in that case, you should put up a sign telling your client "sorry, we're overbooked, you shouldn't make any new order now, as we won't be able to comply within an acceptable time range".
Then, the queue size would be : acceptable time range / time to complete a task.
Concrete Example : if your client service expects that the task it submits would have to be completed in less than 100 seconds, and knowing that every task takes 1-2 seconds, you should limit the queue to 50-100 tasks because once you have 100 tasks waiting in the queue, you're pretty sure that the next one won't be completed in less than 100 seconds, thus rejecting the task to prevent the service from waiting for nothing.
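Applied to the constraints in the question (a two-core box and 1-2 second tasks that mostly wait on an HTTP call to Pub/Sub), a sketch of how those two rules of thumb could translate into the ThreadPoolTaskExecutor configuration might look like this; the 10% on-CPU ratio and the 60-second latency budget are assumptions to replace with your own measurements:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorSizingConfig {

    @Bean
    public TaskExecutor taskExecutor() {
        // "Number of threads = logical cores * 1 / (ratio of time the thread is runnable)"
        int cores = Runtime.getRuntime().availableProcessors(); // 2 on the box described above
        double runnableRatio = 0.10;                            // assumed: tasks spend ~90% of their time waiting on HTTP
        int poolSize = (int) Math.ceil(cores / runnableRatio);  // 2 / 0.10 = 20 threads

        // "Queue size = acceptable time range / time to complete a task"
        long taskSeconds = 2;                                   // worst-case task duration from the question
        long latencyBudgetSeconds = 60;                         // assumed: how long a submitted task may wait
        int queueCapacity = (int) (latencyBudgetSeconds / taskSeconds); // 30; reject anything beyond that

        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(poolSize);
        executor.setMaxPoolSize(poolSize);        // max only kicks in once the queue is full anyway
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix("pubsub-task-");
        return executor;
    }
}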

How to serialize a multi-threaded program

I have many threads performing different operations on objects, and when nearly 50% of the tasks have finished I want to serialize everything (perhaps because I want to shut down my machine).
When I come back, I want to start from the point where I left off.
How can I achieve this?
This is like saving the state of a game's objects while playing.
Normally we save the state of an object and retrieve it later. But here we are also storing how far its processing has progressed.
For example:
I have a thread which is creating a salary Excel sheet for 50 thousand employees.
Another thread is creating appraisal letters for the same 50 thousand employees.
Another thread is writing "Happy New Year" e-mails to 50 thousand employees.
So imagine multiple operations.
Now I want to shut down when about 50% of the tasks have finished, say salary Excel sheets have been written for 25-30 thousand employees, appraisal letters are done for 25-30 thousand, and so on.
When I come back the next day, I want to start the process from where I left off.
This is like resume.
I'm not sure if this might help, but you can achieve this if the threads communicate via in-memory queues.
To serialize the whole application, what you need to do is disable consumption of the queues, and when all the threads are idle you'll reach a "safe point" where you can serialize the whole state. You'll need to keep track of all the threads you spawn, to know whether they are idle.
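A rough sketch of that safe-point idea, assuming the workers all consume from a BlockingQueue; every name here is illustrative, and a production version would need to close the small window between taking an item and marking the worker busy:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PausableWorker implements Runnable {

    private static volatile boolean paused = false;
    private static final AtomicInteger busyWorkers = new AtomicInteger();

    private final BlockingQueue<Runnable> queue;

    public PausableWorker(BlockingQueue<Runnable> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            if (paused) {                          // consumption disabled: idle at the safe point
                sleepQuietly();
                continue;
            }
            Runnable job = pollQuietly();
            if (job == null) continue;
            busyWorkers.incrementAndGet();
            try {
                job.run();
            } finally {
                busyWorkers.decrementAndGet();
            }
        }
    }

    /** Called by the coordinator: pause consumption and wait until every worker is idle. */
    public static void reachSafePoint() throws InterruptedException {
        paused = true;
        while (busyWorkers.get() > 0) {
            Thread.sleep(100);
        }
        // Safe point reached: serialize the queues and any shared state here.
    }

    private Runnable pollQuietly() {
        try {
            return queue.poll(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    private static void sleepQuietly() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}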
You might be able to do this with another technology (maybe a java agent?) that freezes the JVM and allows you to dump the whole state, but I don't know if this exists.
Well, it's not much different from saving the state of an object.
Just maintain separate queues for the different kinds of inputs, and on every launch (first launch or relaunch) check those queues; if they are not empty, resume your 'stopped process' by starting a new process with the remaining data.
Say, for example, an app is sending messages, and you quit the app with 10 messages remaining. Have a global queue, which the app's senderMethod will check on every launch. In this case it will have 10 messages in the pending queue, so it will continue sending the remaining messages.
Edit:
Basically, for all resumable processes, say pr1, pr2, ..., prN, maintain queues of inputs, say q1, q2, ..., qN. A queue should remove processed elements, so that it contains only pending inputs. As soon as you suspend the system, store these queues, and on relaunch restore them. Have a common routine, say resumeOperation, which calls all resumable processes (pr1, pr2, ..., prN). It will trigger the execution of the processes with non-empty queues, which in turn replicates the resuming behavior.
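A small illustrative sketch of such a pending-inputs queue (one of the q1..qN above) that persists itself on suspend and restores itself on relaunch; the file-based storage and class names are assumptions:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.Queue;

public class ResumableQueue<T extends Serializable> {

    private final File store;
    private final Queue<T> pending = new ArrayDeque<T>();

    @SuppressWarnings("unchecked")
    public ResumableQueue(File store) throws IOException, ClassNotFoundException {
        this.store = store;
        if (store.exists()) {                    // relaunch: restore the pending inputs
            ObjectInputStream in = new ObjectInputStream(new FileInputStream(store));
            try {
                int n = in.readInt();
                for (int i = 0; i < n; i++) {
                    pending.add((T) in.readObject());
                }
            } finally {
                in.close();
            }
        }
    }

    public void add(T item)  { pending.add(item); }
    public T next()          { return pending.poll(); }  // removed once handed out for processing
    public boolean isEmpty() { return pending.isEmpty(); }

    /** Call on suspend: persist only the inputs that are still pending. */
    public void save() throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(store));
        try {
            out.writeInt(pending.size());
            for (T item : pending) {
                out.writeObject(item);
            }
        } finally {
            out.close();
        }
    }
}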
Java provides the java.io.Serializable interface to indicate serialization support in classes.
You don't provide much information about the task, so it's difficult to give an answer.
One way to think about a task is in terms of a general algorithm which can be split into several steps. Each of these steps is in turn a task itself, so you should see a pattern here.
By cutting each algorithm into small pieces until you cannot divide it further, you get a pretty good idea of where your task can be interrupted and recovered later.
The result of a task can be:
a success: the task returns a value of the expected type
a failure: somehow, something didn't turn out right during the computation
an interrupted computation: the work wasn't finished, but it may be resumed later, and the return value is the state of the task
(Note that the latter case could be considered a subcase of a failure; it's up to you to organize your protocol as you see fit.)
Depending on how you generate the interruption event (will it be a message passed from the main thread to the worker threads? an exception?), that event will have to bubble through the task tree and trigger each task to evaluate whether its work can be resumed or not, and then provide a serialized version of itself to the larger task containing it.
I don't think serialization is the correct approach to this problem. What you want is persistent queues, which you remove an item from when you've processed it. Every time you start the program you just start processing the queue from the beginning. There are numerous ways of implementing a persistent queue, but a database comes to mind given the scale of your operations.

Why does GAE spawn new instances even though I have set min idle=1 and pending latency=max?

I have a low/sporadic-load application and the latency caused by starting new instances (around 10s) far exceeds the time needed to process my requests, which typically complete in less than 500ms.
So in order to avoid the latency spikes caused by the spawning of new instances ("loading requests"), I made the following two settings:
set min idle instances = max idle instances = 1, to ensure that there is always one instance running (one instance is enough to handle my traffic); and
set the pending latency to 15s, so that GAE waits for up to 15s for the one resident instance to become free rather than start a new one.
Billing is activated. However, GAE still starts new instances, resulting in unacceptable latency. Why is that?
In the logs I can see that my requests always return in less than 500ms; there is no way a request would queue up for 15s.
What can I do about this? Any help much appreciated.
Update: my solution was to set up a cron job which issues a request every 5 minutes, so that there is always a dynamic instance running. As it turned out (see the answer below), idle instances are reserved for crazy load spikes, not the low-load scenario that I'm in 99% of the time.
As @koma says, App Engine will create a dynamic instance to keep the number of idle instances constant, but not only will it create a new one, it will also use it immediately instead of using the idle one, on average. If you have a bunch of idle instances, App Engine will in fact still prefer spinning up dynamic ones even when a single request comes in, and will "save" the idle ones in case of crazy traffic spikes.
This is highly counter-intuitive, because you'd expect it to use the instances that are already idling around to serve requests and spin up dynamic ones for future requests, but that's not how it works.
If you set min idle instances = 1, it will definitely spawn another instance at the first request... because there is no longer any idle instance (it is busy processing the first request!).
And since a new instance has been started, it might as well process some requests and no longer be idle.
see also Google App Engine Instances keep quickly shutting down

Increase/Decrease the number of Worker Role instances in Azure

I can increase the number of Worker Role (WR) instances directly from Java using the ServiceManagementRest class in the Azure4Java package. See the tutorial Azure Management through Java.
My question is: when I decrease the number of WR instances, can I decide which WR instances shut down? Because, for the cloud-elasticity idea, I would want to stop the instances in an IDLE status and not the instances in EXECUTING status.
Regards,
Fabrizio
You can't choose which instance(s) to shut down; you simply change the instance count, and the fabric controller takes care of shutting instances down. One reason is due to fault domains and SLA: if you had, say, 4 instances in 2 fault domains, and shut down two instances in fault domain 0, you'd now have 2 instances in fault domain 1. So now you have two instances in the same rack, perhaps, and that rack goes offline. Now you have zero instances running for a period of time.
Dealing with instance shutdown is a common scenario, and the typical pattern for working around this is by taking advantage of queues to buffer your workload, then have worker role instances consume work items from these queues. If you shut down an instance prior to its work being finished, the item eventually reappears on the queue and another instance can do the work.
This pattern requires idempotency, which is sometimes a challenge. With a recent update to Windows Azure queues, you can now modify queue messages, which makes this a bit easier - you can add information to your queue message as you complete various stages of your work item processing. Then, if your instance is shut down before work is completed, the next worker to pick it up can resume from a point other than "start."
One more detail: you should be able to handle the Stopping event, and tell the "instance being stopped" to stop reading from the queue (maybe set a flag). Then, override OnStop(), and wait for in-process operations to complete before returning. If the still-in-process operations will take more than 5 minutes, you might have to get creative...
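Worker roles are .NET, so the following is only a language-agnostic illustration (written in Java to match the rest of this page) of the stop pattern just described: set a flag when stopping, stop reading from the queue, and have the stop handler wait for in-flight work to finish; all names here are assumed:

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public abstract class GracefulQueueWorker {

    private final AtomicBoolean stopping = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Main loop: keep consuming work items until the stopping signal arrives. */
    public void run() {
        while (!stopping.get()) {
            Object workItem = dequeueOrNull();        // your queue read, ideally with a timeout
            if (workItem == null) continue;
            inFlight.incrementAndGet();
            try {
                process(workItem);                    // idempotent work, as discussed above
            } finally {
                inFlight.decrementAndGet();
            }
        }
    }

    /** Equivalent of handling the Stopping event: stop taking new work. */
    public void onStopping() {
        stopping.set(true);
    }

    /** Equivalent of OnStop(): block until in-process operations complete before returning. */
    public void onStop() throws InterruptedException {
        onStopping();
        while (inFlight.get() > 0) {
            Thread.sleep(200);
        }
    }

    protected abstract Object dequeueOrNull();
    protected abstract void process(Object workItem);
}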
You cannot control which instance will shut down; however, it is almost always (as far as I've seen) the instance with the highest number suffix. I.e. if you have IN_0, IN_1 and IN_2 and you close an instance, it will most likely be IN_2 that shuts down. Maybe you can use this trend to your advantage?
What would probably be least obstructive is to wait for a time of day when the worker roles are less busy before reducing instances.
I think it's wise never to assume what will happen next based on an instance id. I tend to spread roles over services (scale units), starting and stopping them when required - a pattern used to control 4,000 nodes.
