GAE Java automatic scaling min-idle-instances not working?

I have a Java GAE app with modules. The default front-end module is configured with
<automatic-scaling>
<min-idle-instances>1</min-idle-instances>
</automatic-scaling>
However, when I check the instance chart for the last 24 hours, I see a period where no instance was running. I'd expect min-idle-instances to set the minimum number of running instances.
Is min-idle-instances not working? Or is the instance chart not working? (By instance chart I mean the chart accessible from the Dashboard.) Or am I misunderstanding the concept of min-idle-instances?
The current GAE version is 1.9.11.
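For reference, a fuller sketch of how that setting sits in the module's appengine-web.xml (the application, module, and version names here are placeholders, and the other values just show the related knobs, not recommendations):

<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <application>myapp</application>
  <module>default</module>
  <version>1</version>
  <threadsafe>true</threadsafe>
  <automatic-scaling>
    <min-idle-instances>1</min-idle-instances>
    <max-idle-instances>automatic</max-idle-instances>
    <min-pending-latency>automatic</min-pending-latency>
    <max-pending-latency>30ms</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>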

I actually ran into similar questions regarding min-idle-instances. It turns out min-idle-instances isn't EXACTLY what it sounds like.
I don't know about your project, but for us, the minimum actually meant that App Engine wouldn't STOP instances to go below the minimum. It can still have fewer instances running than the minimum if they were never started.
Example (with min idle instances = 5):
0 running instances -> enough requests arrive to boot up 3 instances -> requests finish, still at 3 instances -> more requests arrive that now need 6 instances -> requests finish, one instance spins down, and we are now at our minimum of 5 and won't go lower.
Is that what you are experiencing, or is your instance actually spinning down?
[EDIT] The actual problem in this case turned out to be the maximum daily budget. When the budget was hit, the instance was shut down to save costs.


Problems maintaining trivial RPS numbers with JMeter for API testing

I am doing API load testing with JMeter. I have a MacBook Air (client) connected via Ethernet to the machine being tested under load (server).
I wanted to run a simple test: hit the server with 5 requests per second (RPS). I create a Concurrency Thread Group with 60 threads and a Throughput Shaping Timer set to 5 RPS for one minute, add my HTTP request, hit the play button, and run the test.
I expect to see my Hits per Second listener showing a flat line of 5 hits per second; instead I see a variable rate, starting at 5, then dropping to 2, and later to 4... Sometimes there is even more than the specified 5 RPS (e.g. 6 RPS). The point is that it's not a constant 5; the rate is all over the place. And I don't get any errors.
My server takes between 500 ms and 3 s to return an answer, depending on how much load is present; this is what I am testing. What I want to verify with this test is that the server returns a response in roughly 500 ms under load, and I am not getting that. I have to start wondering whether it's JMeter's fault in some way, but that's a topic for another day.
When I replace my HTTP request sampler with a Dummy Sampler, I get the RPS I desire.
I thought I had a problem with JMeter resources, so I changed the heap size to 1 GB, used the -XX:+DisableExplicitGC and -d64 flags, and ran in CLI mode. I never got any errors, neither before setting the flags nor after. Besides, I believe 5 RPS is a small number, so I don't expect resources to be the problem.
Something worth noting: sometimes the threads only start executing towards the end of the test rather than at the start, which I find very odd.
What's next? Is it time to move to a new tool?
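One way to rule JMeter out: a minimal cross-check in plain Java (the endpoint URL is a placeholder), firing one request every 200 ms (= 5 RPS) on a scheduler so the send rate is independent of response time:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRateProbe {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://server:8080/api")) // placeholder endpoint
                .GET().build();
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        // One request every 200 ms; sendAsync never blocks the scheduler thread,
        // so slow responses cannot drag the request rate down.
        scheduler.scheduleAtFixedRate(() -> {
            long start = System.nanoTime();
            client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                  .thenAccept(r -> System.out.printf("%d %d ms%n",
                          r.statusCode(), (System.nanoTime() - start) / 1_000_000));
        }, 0, 200, TimeUnit.MILLISECONDS);
        scheduler.schedule(scheduler::shutdown, 60, TimeUnit.SECONDS); // one-minute run
    }
}

If this shows a flat 5 RPS while JMeter does not, the bottleneck is likely in the test setup (e.g. thread starvation in the Concurrency Thread Group) rather than in the server.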

Queueing tasks via JMS

I would like to put a question to the community and get as much feedback as possible about a strategy I have been considering to resolve some performance issues in my project.
The context:
We have an important process that performs 4 steps.
1. An entity status change and its persistence.
2. If step 1 ends OK, the entity is exported to a CSV file.
3. If step 2 ends OK, the entity is exported to another CSV, this one with far more information.
4. If step 3 ends OK, the last CSV is sent by mail.
Steps 1 and 2 are linked and they are critical.
Steps 3 and 4 are not critical; we don't even care whether they end successfully.
The performance of steps 1-2 is fine, but steps 3-4 are insanely slow in some scenarios, mostly because of step 3.
If we execute all the steps as a sequence, step 3 sometimes causes a timeout: the client gets no response about steps 1 and 2 (the important ones), and the user doesn't know what's going on.
This made me think of JMS queues as a way to delegate the last 2 steps to another app/process and decouple the notification from the business logic. The second export and the mailing would be processed when possible, probably in parallel. I could also split the work across 2 queues: exports and mail notifications.
Our webapp runs on a WebLogic 11 cluster, so I could use its JMS implementation.
What do you think about the strategy? Is the WebLogic JMS implementation any good? Should I check another implementation: ActiveMQ, RabbitMQ, ...?
I have also been thinking about a ticketing-system implementation with spring-tasks.
At this point I have to mention spring-batch. Its usefulness here is limited: we already have many jobs focused on important data-consolidation processes, the time window for scheduling more jobs is limited, and there is the added impact of trying to process all items massively at once.
Maybe we could, if we found a way to use spring-batch's multithreading, but we haven't yet found a way to fit our requirements into such a strategy.
Thank you in advance, and excuse my English. I promise to keep working hard on it :-).
One problem to consider is data integrity. If step n fails, does step n-1 need to be reversed? Are there any ordering dependencies you need to be aware of? And are you writing to the same CSV or to different ones? If the same, you might have contention issues.
Now, back to the original problem. I would consider Java executors: use 4 fixed-size pools and move each task through the pools as successes occur (a sketch follows below):
Submit step 1 to pool 1, getting a Future back, which will be used to check for completion.
When step 1 completes, submit step 2 to pool 2.
When step 2 completes, you can return a result to the caller. The call has been waiting up to this point (likely with a timeout so it doesn't hang around forever), but now the critical tasks are done.
After returning to the client, submit step 3 to pool 3.
When step 3 completes, submit step 4 to pool 4.
The pools themselves, while fixed-size, could be larger for pools 1/2 to get maximum throughput (and to get back to your client as quickly as possible), while pools 3/4 could be smaller but still large enough to get the work done.
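A minimal sketch of that pipeline, assuming placeholder step implementations (step1..step4 stand in for the real persistence/export/mail logic):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class StepPipeline {
    // Four fixed-size pools; the sizes are illustrative only.
    private final ExecutorService pool1 = Executors.newFixedThreadPool(8);
    private final ExecutorService pool2 = Executors.newFixedThreadPool(8);
    private final ExecutorService pool3 = Executors.newFixedThreadPool(2);
    private final ExecutorService pool4 = Executors.newFixedThreadPool(2);

    public String process(String entity) throws Exception {
        // Critical path: steps 1 and 2, each on its own pool; the caller waits here.
        Future<String> f1 = pool1.submit(() -> step1(entity));
        String persisted = f1.get(10, TimeUnit.SECONDS); // timeout so the call never hangs forever
        Future<String> f2 = pool2.submit(() -> step2(persisted));
        String result = f2.get(10, TimeUnit.SECONDS);

        // Non-critical path: fire and forget; failures here never reach the caller.
        pool3.submit(() -> {
            String csv = step3(result);
            pool4.submit(() -> step4(csv));
        });
        return result; // the client gets its answer as soon as steps 1-2 are done
    }

    // Hypothetical stand-ins for the real step logic.
    private String step1(String e) { return e + ":persisted"; }
    private String step2(String e) { return e + ":csv1"; }
    private String step3(String e) { return e + ":csv2"; }
    private String step4(String e) { return e + ":mailed"; }
}

On timeout the critical path fails fast and the caller gets an error instead of waiting for the slow export and mail steps.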
You could do something similar with JMS, but the issues are similar: you need multiple listeners, or multiple threads per listener, so that you can process at an appropriate speed. You could do steps 1/2 synchronously without a pool, but then you don't get some of the thread management that executors give you. You would still need to "schedule" steps 3/4 by putting them on the JMS queue, and still need listeners to process them.
The ability to recover from a server going down is key here, and Executors/ExecutorService has no persistence, so in that case I'd definitely be looking at JMS (and then I'd queue absolutely everything up, even the first 2 steps), though depending on your use case it might be overkill.
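For the JMS variant, a hedged sketch using the plain JMS 1.1 API (the JNDI names are assumptions; in WebLogic they are whatever you configure for the connection factory and queue):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class ExportQueueClient {
    // JNDI names are assumptions; configure them on the server.
    private static final String FACTORY = "jms/ConnectionFactory";
    private static final String QUEUE = "jms/exportQueue";

    // Called after steps 1-2 succeed: enqueue the non-critical work and return immediately.
    public void enqueueExport(String entityId) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup(FACTORY);
        Queue queue = (Queue) ctx.lookup(QUEUE);
        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(entityId));
        } finally {
            conn.close();
        }
    }
}

A listener (an MDB, or a consumer with setMessageListener) on jms/exportQueue would then run step 3 and enqueue the result on a second queue for the mailing step, giving you the persistence the executor version lacks.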
Yes, an event-driven approach where a message bus handles the integration sounds good. It is asynchronous, so you will not have timeouts. Of course, you will need to use a Topic. WLS has some memory issues when you have too many messages in the server, so maybe a separate server would work better for separation of concerns and resources.

Java - How can I ensure I run a single instance of a process in a clustered environment

I have a JVM process that wakes a thread every X minutes.
If a condition is true, it starts a job (JobA).
Another JVM process does almost the same, but if the condition is true, it sends a message to a message broker, which triggers the job on another server (JobB).
Now, to avoid a SPOF, I want to add another instance of this machine in my cloud.
But then I want to ensure that only a single instance of JobA runs each time.
What are my options?
There are a number of patterns that solve this common problem. You need to choose based on your exact situation, depending on which factors carry more weight in your case (performance, correctness, fault tolerance, whether misfires are allowed, etc.). The two solution groups are:
The "Quartz" way: you can use a JDBCStore from the Quartz library which (partially) was designed for this very reason. It allows multiple nodes to communicate, and share state and workload between each other. This solution gives you a probably perfect solution at the cost of some extra coding and setting up a shared DB (9 tables I think) between the nodes.
Alternatively, your nodes can take care of the distribution themselves: locking on a shared resource (a single record in a DB, for example) can be enough to decide who is in charge of that iteration of the execution. Sharing previous state, however, will require a bit more work.
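A minimal sketch of the second approach, assuming a hypothetical one-row lock table (job_lock with columns id and last_run, pre-seeded with a row id = 'JobA') and a database that supports SELECT ... FOR UPDATE:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import javax.sql.DataSource;

public class SingleRunGuard {
    private final DataSource ds;
    private final long intervalMillis; // how recent a run counts as "this iteration"

    public SingleRunGuard(DataSource ds, long intervalMillis) {
        this.ds = ds;
        this.intervalMillis = intervalMillis;
    }

    // Returns true if this node won the lock for the current iteration and ran the job.
    public boolean runExclusively(Runnable job) throws SQLException {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            try {
                // Row lock: only one node at a time gets past this point.
                PreparedStatement sel = conn.prepareStatement(
                        "SELECT last_run FROM job_lock WHERE id = 'JobA' FOR UPDATE");
                ResultSet rs = sel.executeQuery();
                if (!rs.next()) { conn.rollback(); return false; } // lock row missing
                Timestamp lastRun = rs.getTimestamp(1);
                long now = System.currentTimeMillis();
                if (lastRun != null && now - lastRun.getTime() < intervalMillis) {
                    conn.rollback(); // another node already handled this iteration
                    return false;
                }
                PreparedStatement upd = conn.prepareStatement(
                        "UPDATE job_lock SET last_run = ? WHERE id = 'JobA'");
                upd.setTimestamp(1, new Timestamp(now));
                upd.executeUpdate();
                job.run();       // still holding the row lock
                conn.commit();   // releases the lock
                return true;
            } catch (SQLException | RuntimeException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}

Committing only after the job means a crashed node is not recorded as a successful run; if your jobs run long, you may prefer to commit the timestamp first and accept occasional misfires.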

Why does GAE spawn new instances even though I have set min idle=1 and pending latency=max?

I have a low/sporadic-load application and the latency caused by starting new instances (around 10s) far exceeds the time needed to process my requests, which typically complete in less than 500ms.
So in order to avoid the latency spikes caused by the spawning of new instances ("loading requests"), I made the following two settings:
set min idle instances = max idle instances = 1, to ensure that there is always one instance running (one instance is enough to handle my traffic); and
set the pending latency to 15s, so that GAE waits for up to 15s for the one resident instance to become free rather than start a new one.
Billing is activated. However, GAE still starts new instances, resulting in unacceptable latency. Why is that?
In the logs I can see that my requests always return in less than 500 ms; there is no way a request would get queued for up to 15 s.
What can I do about this? Any help much appreciated.
Update: my solution was to set up a cron job that issues a request every 5 minutes, so there is always a dynamic instance running. As it turned out (see the answer below), idle instances are reserved for crazy load spikes, not for the low-load scenario I'm in 99% of the time.
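A hedged sketch of that workaround in GAE's cron.xml (the /keepalive handler is a hypothetical servlet that just returns 200):

<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/keepalive</url>
    <description>Ping every 5 minutes to keep a dynamic instance warm</description>
    <schedule>every 5 minutes</schedule>
  </cron>
</cronentries>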
As #koma says, App Engine will create a dynamic instance to keep the number of idle instances constant, but not only will it create a new one, it will, on average, also use the new one immediately instead of using the idle one. If you have a bunch of idle instances, App Engine will in fact still prefer spinning up dynamic ones even when a single request comes in, and will "save" the idle ones for crazy traffic spikes.
This is highly counter-intuitive, because you'd expect it to use the instances that are already idling around to serve requests and spin up dynamic ones for future requests, but that's not how it works.
If you set min idle instances = 1, it will definitely spawn another instance at the first request... because there is no longer any idle instance (it is busy processing the first request!).
And since a new instance has been started, it might as well process some requests and no longer be idle?
See also: Google App Engine Instances keep quickly shutting down

Increase/Decrease the number of Worker Role instances in Azure

I can increase the number of Worker Role (WR) instances directly from Java using the ServiceManagementRest class in the Azure4Java package. See the tutorial Azure Management through Java.
My question is: when I decrease the number of WR instances, can I decide which WR instances shut down? For the cloud-elasticity idea, I would want to stop instances in an IDLE state rather than instances in an EXECUTING state.
Regards,
Fabrizio
You can't choose which instance(s) to shut down; you simply change the instance count, and the fabric controller takes care of shutting instances down. One reason is fault domains and the SLA: if you had, say, 4 instances across 2 fault domains and shut down the two instances in fault domain 0, you'd now have 2 instances in fault domain 1. You might then have two instances in the same rack, and if that rack goes offline, you have zero instances running for a period of time.
Dealing with instance shutdown is a common scenario, and the typical pattern for working around this is by taking advantage of queues to buffer your workload, then have worker role instances consume work items from these queues. If you shut down an instance prior to its work being finished, the item eventually reappears on the queue and another instance can do the work.
This pattern requires idempotency, which is sometimes a challenge. With a recent update to Windows Azure queues, you can now modify queue messages, which makes this a bit easier - you can add information to your queue message as you complete various stages of your work item processing. Then, if your instance is shut down before work is completed, the next worker to pick it up can resume from a point other than "start."
One more detail: you should be able to handle the Stopping event, and tell the "instance being stopped" to stop reading from the queue (maybe set a flag). Then, override OnStop(), and wait for in-process operations to complete before returning. If the still-in-process operations will take more than 5 minutes, you might have to get creative...
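A minimal Java sketch of that stop-and-drain pattern, with WorkQueue and WorkItem as hypothetical stand-ins for the real queue client (the actual Azure storage API differs):

import java.util.concurrent.atomic.AtomicBoolean;

public class QueueWorker implements Runnable {
    // Hypothetical queue abstraction; the real Azure queue client has its own API.
    interface WorkItem { void process(); }
    interface WorkQueue {
        WorkItem poll();            // null when the queue is empty
        void delete(WorkItem item); // remove only after the work is done
    }

    private final WorkQueue queue;
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    QueueWorker(WorkQueue queue) { this.queue = queue; }

    // Called from the role's Stopping event: take no new work.
    public void beginShutdown() { stopping.set(true); }

    @Override
    public void run() {
        while (!stopping.get()) {
            WorkItem item = queue.poll();
            if (item == null) continue; // a real worker would back off here
            item.process();             // idempotent, so a re-delivered item is safe
            queue.delete(item);         // delete only on success; otherwise it reappears
        }
        // By the time OnStop() returns, no new items are being read and the
        // in-progress work has completed inside the loop above.
    }
}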
You cannot control which instance will shut down; however, it is almost always (as far as I've seen) the instance with the highest number suffix. I.e., if you have IN_0, IN_1 and IN_2 and you remove an instance, it will most likely be IN_2 that shuts down. Maybe you can use this trend to your advantage?
What is probably least disruptive is to wait for a time of day when the worker roles are less busy before reducing the instance count.
I think it's wise never to assume that what happens next is based on an instance ID. I tend to spread roles over services (scale units), starting and stopping them as required; this pattern has been used to control 4,000 nodes.
