I am currently working on building a load testing platform with Google App Engine. My eventual goal is to simulate 1 million users sending data to another GAE server application every 10 minutes.
My current implementation uses task queues, where each task represents a user or a handful of users (roughly as sketched below). My problem is that GAE is throttling my task queues with an enforced rate well below my maximum/desired rate. I have tried simply throwing instances at the problem, and while this helps I still end up with an enforced rate well below the desired one.
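For illustration, the kind of fan-out I'm describing looks roughly like this, using the App Engine Task Queue Java API; the queue name, worker URL, parameters and batch size are placeholders, not my actual code:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class LoadFanout {
    // Illustrative sketch only: enqueue one task per batch of simulated users rather than one per
    // user, so far fewer task dispatches per second are needed. All names below are placeholders.
    public static void enqueueSimulatedUsers(int totalUsers, int usersPerTask) {
        Queue queue = QueueFactory.getQueue("load-test-queue");
        for (int start = 0; start < totalUsers; start += usersPerTask) {
            int end = Math.min(start + usersPerTask, totalUsers);
            queue.add(TaskOptions.Builder
                    .withUrl("/tasks/simulate")                  // worker that fires the batch's requests
                    .param("firstUser", Integer.toString(start))
                    .param("lastUser", Integer.toString(end)));
        }
    }
}

(The rate and bucket-size configured for the queue in queue.xml also cap the dispatch rate, separately from the enforced rate the scheduler applies.)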
However, I know that my application is capable of running tasks faster than the enforced rate. I have witnessed my application successfully running 250+ tasks per second for a period of time, only to have the task queue throttled to 60 or 30 tasks per second a minute later.
I am using basic scaling with a cap of 10 instances for now, and I would like to understand this problem better before increasing the instance count, as costs start running up quite quickly with a high instance count.
Does anyone have more information on why I am being throttled like this, and how to get around this throttling? The only documentation/information/answers I can find to this question simply quote the insufficient documentation, which says something like:
"The enforced rate may be decreased when your application returns a 503 HTTP response code, or if there are no instances able to execute a request for an extended period of time."
I am happy to clarify any questions. Thank you in advance for your help; I have been wrestling with this problem for about a month.
Related
I have an EC2 instance doing a long-running job. The job should take about a week, but after a week it is only at 31%. I believe this is due to the low average CPU (less than 1%) and because it very rarely receives a GET request (just me checking the status).
Reason for the low CPU:
This Java service performs many GET requests and then processes a batch of pages once it has a few hundred (this is not arbitrary; there is a reason they are all required first). But to avoid getting HTTP 429 (Too Many Requests) I must space my GET requests apart using Thread.sleep(x) and synchronization, which results in a very low CPU that spikes every so often. The pacing amounts to something like the sketch below.
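For illustration only (the interval and fetch logic are placeholders, not my actual code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PacedFetcher {
    // Illustrative sketch: issue one GET every few seconds so the remote site never returns 429,
    // which leaves the CPU almost idle between requests.
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(PacedFetcher::fetchNextPage, 0, 3, TimeUnit.SECONDS);
    }

    private static void fetchNextPage() {
        // placeholder: perform one GET, buffer the page, and trigger batch processing
        // once a few hundred pages have been collected
    }
}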
I think Amazon's preemptive systems think that my service is arbitrarily waiting, when in actual fact it needs to wake up at a specific moment. I have also noticed that if I check the status more often, the job goes quicker.
How do I stop Amazon's preemptive system from thinking my service isn't doing anything?
I have thought of two solutions; however, neither seems intuitive:
Have another process running to keep the CPU at ~25%, which would only really consist of:
while (true) {
    Thread.sleep(300);
    // busy-wait for ~100 ms to generate some CPU load
    LocalDateTime until = LocalDateTime.now().plusNanos(100_000_000);
    while (LocalDateTime.now().isBefore(until)) {
        // empty loop
    }
}
However, this just seems like an unnecessary use of resources.
Have a process on my laptop perform a GET request to the AWS service every 10 minutes (sketched below). But one of the reasons I put the job on AWS was to free up my laptop, although this would use orders of magnitude less of my laptop's resources than running the whole service locally.
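For illustration only (the URL is a placeholder):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class KeepAlivePinger {
    // Illustrative sketch: a tiny local process that GETs the EC2 service every 10 minutes
    // so the instance never looks completely idle.
    public static void main(String[] args) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL("http://my-ec2-host/status").openConnection();
                conn.setRequestMethod("GET");
                System.out.println("status check returned " + conn.getResponseCode());
                conn.disconnect();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 10, TimeUnit.MINUTES);
    }
}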
Is one of these solutions more desirable than the other? Is there another solution which would be more appropriate?
Many Thanks,
edit: note, I use the free-tier services only.
So I am wondering if someone can help me, please. I am trying to load test a Java REST application with thousands of requests per minute, but something is amiss and I can't quite figure out what's happening.
I am executing the load test from my laptop (via terminal) and it's hitting a number of servers in AWS.
I am using the infinite loop setting to fire a lot of requests repeatedly. The three thread groups have the same config.
The problem I am having is that the CPU is rising very high and the numbers do not match what I see in production in the same environment with regards to CPU etc.; the JMeter load test seems to be making the CPU work harder on my test environment.
Question 1 - Is my load test too fast for the server?
Question 2 - Is there a way to space out the load test so that I can hit, say, 10k rpm exactly?
Question 1 - Is my load test too fast for the server? - We don't know; if you see high CPU usage with only 60 threads and the application responds slowly due to that high CPU usage, it looks like a bottleneck. Another question is the size of the machine: the number of processors and their frequency. You need to find out which function is consuming the CPU cycles, using a profiler tool, and look for a way to optimize that function.
Question 2 - Is there a way to space out the load test so that I can hit, say, 10k rpm exactly? - There is; check out the Constant Throughput Timer, but be aware of the following two facts:
The Constant Throughput Timer can only pause threads to limit JMeter's request execution speed to the specified number of requests per minute, so you need to make sure you create a sufficient number of threads in your Thread Group(s) to produce the desired load; 60 might not be enough.
The application needs to be able to respond fast enough: 10,000 requests per minute is approximately 166 requests per second, and with 60 threads that means each thread needs to execute about 2.8 requests per second, which in turn means the response time needs to be around 360 ms or less. The arithmetic is sketched below.
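As a quick sketch of that arithmetic (the 10,000 rpm target and 60 threads are taken from the answer above; everything else is illustrative):

public class ThroughputCheck {
    public static void main(String[] args) {
        double targetRpm = 10_000;                     // desired requests per minute
        int threads = 60;                              // threads in the Thread Group(s)
        double targetRps = targetRpm / 60;             // ~166.7 requests per second overall
        double rpsPerThread = targetRps / threads;     // ~2.8 requests per second per thread
        double maxResponseMs = 1000.0 / rpsPerThread;  // ~360 ms per request
        System.out.printf("Each thread must do %.1f req/s, so responses must stay under %.0f ms%n",
                rpsPerThread, maxResponseMs);
    }
}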
There are different aspects to consider before we go for 10k requests.
Configure the tests for one user (thread) and execute them. Check that we get a proper response for every request.
Incrementally increase the number of threads: 1 user, 5 users, 10 users, 20 users, 50 users, etc.
Try different duration scenarios like 10 mins, 20 mins, 30 mins, 1 hour, etc.
Collect metrics like error %, response time, number of requests, etc.
You can check for probable breakpoints like:
CPU utilisation getting high (100%) on the machine from which you are executing the tests. In this case, you can set up a few machines in a master-slave configuration.
Error % getting high. The server may not be able to respond; it might even have crashed.
Response time getting high. The server may be getting busy due to the load.
Also, make sure you have reliable connectivity and bandwidth. Just imagine you want to generate a huge load but the connection you have is only a few kbps; your tests will fail because of this.
Question: How do I create a lightweight on-demand instance, preconfigured with Java 8 and my code, that pulls a task from a task queue, executes the memory-intensive task, and shuts itself down? (On-demand, high-memory, medium-CPU, single-task executors.)
History: I was successfully using Google App Engine Task Queue in Java for "bursty" processing of relatively rare events - maybe once a week someone would submit a form, the form creates ~10 tasks, the system would chew up some memory and CPU cycles thinking about the tasks for a few minutes, save the results, and the webpage would be polling the backend for completion. It worked great within Google App Engine - Auto scaling would remove all idle instances, Task Queues would handle getting the processing done, I'd make sure not to overload things by setting the max-concurrent-requests=1, and life was good!
But then my tasks got too memory intensive for instance-class: F4_1G 😢 I'd love to pick something with more memory, but that isn't an option. So I need to figure something out.
I think my best bet is to spin up a generic instance using the API com.google.api.services.compute.model.Instance, but I get stuck there. I'm so spoiled by how easy the Task Queue was to build that I'd hate to get lost in the weeds just to get a higher-memory instance - I don't need a cluster, and I don't need any sort of reliability!
Is this a docker container thing?
Is it going to be hard auth-wise to pull from the Pull Queue outside of GAE?
Is it crazy to spin up/down an instance (container?) for each task if a task is ~10 minutes?
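For reference, the lease/delete loop such a worker would run looks roughly like this with the App Engine Task Queue Java API (the queue name and processing are placeholders); my understanding is that the same lease/delete operations are exposed through the Task Queue REST API for consumers running outside GAE, authenticated with a service account:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskHandle;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class PullWorker {
    // Rough sketch only; "memory-heavy-queue" and process() are placeholders.
    public static void drainOneTask() {
        Queue queue = QueueFactory.getQueue("memory-heavy-queue");
        List<TaskHandle> tasks = queue.leaseTasks(10, TimeUnit.MINUTES, 1); // lease one task for up to 10 minutes
        for (TaskHandle task : tasks) {
            byte[] payload = task.getPayload();  // whatever the form handler enqueued
            process(payload);                    // the memory-intensive work
            queue.deleteTask(task);              // acknowledge so the task is not re-leased
        }
    }

    private static void process(byte[] payload) {
        // placeholder for the actual memory-intensive processing
    }
}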
I found some similar questions, but no answers that quite fit:
How to Consume App Engine Task Queue Via Compute Engine Instance
How do I integrate google app engine, taks queue and google compute engine?
I would have a read about GAE modules. These can be set to use basic scaling, so an instance gets created on demand and then expires some time later, as configured by you in your appengine-web.xml using something such as:
<basic-scaling>
<max-instances>2</max-instances>
<idle-timeout>5m</idle-timeout>
</basic-scaling>
If the module processes requests from a task queue then it has 10 minutes to get its job done, which is probably ample for many tasks.
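To illustrate, the handler in such a module is just an ordinary servlet that receives the task's HTTP request and must finish within that 10-minute window; the URL mapping and parameter below are placeholders:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch of a task-queue worker servlet; map it in web.xml to the URL the tasks target (e.g. /worker).
public class WorkerServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String payload = req.getParameter("payload");  // placeholder parameter set when the task was enqueued
        // ... do the heavy work here, within the 10-minute task deadline ...
        resp.setStatus(HttpServletResponse.SC_OK);     // a 2xx response tells the task queue the task succeeded
    }
}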
We have a Google App Engine Java app with 50 - 120 req/s depending on the hour of the day.
Our frontend appengine-web.xml looks like this:
<instance-class>F1</instance-class>
<automatic-scaling>
<min-idle-instances>3</min-idle-instances>
<max-idle-instances>3</max-idle-instances>
<min-pending-latency>300ms</min-pending-latency>
<max-pending-latency>1.0s</max-pending-latency>
<max-concurrent-requests>100</max-concurrent-requests>
</automatic-scaling>
Usually 1 frontend instance manages to handle around 20 req/s. Start up time is around 15s.
I have a few questions :
When I change the frontend Default version, I get thousands of Error 500 - Request was aborted after waiting too long to attempt to service your request.
So, to avoid that, I switch from one version to the other using the traffic splitting feature by IP address, going from 1% to 100% in steps of 5%; it takes around 5 minutes to do it properly and avoid massive numbers of 500 errors. Moreover, that feature seems to be available only for the default frontend module.
-> Is there a better way to switch versions ?
To avoid thousands of Error 500 - Request was aborted after waiting too long to attempt to service your request., we must use at least 3 resident (min-idle) instances. And as our traffic grows, even with 3, we sometimes still get massive numbers of Error 500s. Am I supposed to go to 4 residents? I thought App Engine was nice because you only pay for the instances you use, so if we need at least half of our running instances in idle mode in order to work properly, that's not great, is it? It's not really cost effective: when the load is low, still having 4 idle instances is a big waste :( What's weird is that they seem to wait only 10s before responding 500: pending_ms=10248
-> Do you have advices to avoid that ?
Quite often, we also get thousands of Error 500 - A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104). I don't understand, there aren't any exceptions, and we get hundreds of them for a few seconds.
-> Do you have advices to avoid that ?
Thanks a lot in advance for your help ! ;)
Those error messages are mostly related to loading requests that take too long to complete and therefore end in something similar to a DeadlineExceededException, which dramatically affects performance and user experience, as you probably already know.
This is a very frequent issue, especially when using DI frameworks with Google App Engine, and so far it's an unavoidable and serious unfixed issue when using automatic scaling, which is the scaling policy App Engine has provided for handling public requests since its inception.
Try changing the frontend instance class to F2, especially if your memory consumption is higher than 128 MB per instance, and set the min/max pending latency to 15s so your requests get more chances to be processed by a resident instance. However, you will still get long response times for some requests, since Google App Engine may not issue a warmup request every time your application needs a new instance, and I understand that F4 would break the bank.
I have given some thought to how to calculate how many users I can handle with Heroku and one dyno.
But to figure it out I need some input.
And I must say that the official documentation isn't easy to navigate and interpret, so I haven't read it all. My complaint is that it doesn't describe things very well: sometimes it describes old stacks, sometimes it's Ruby-specific, sometimes things aren't described at all, and so on.
So I need some input on how Heroku's Cedar stack works with regard to requests, so I can make my calculations.
You are more than welcome to correct me on my assumptions as I am relatively new to dyno theory.
Let's say I have a controller that takes a request and calculates a JSON response in 10 ms locally; will I be able to serve 100 requests a second?
As I understand it, the Cedar stack doesn't have a fronting caching solution, so many questions arise.
Do static content requests take up dyno time?
Does transfer time count towards request time?
Can a one-dyno setup transfer many responses to requests at the same time if the requests require little CPU utilization?
Some of the questions are intertwined, so a combined answer or other thoughts are welcome.
An example:
Static HTML page.
<HTML>...<img><css><script>...
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
...</HTML>
Can I serve (1000 ms / (10 ms × 4)) = 25 HTML pages a second?
This assumes that static content isn't provided by a dyno.
This assumes that transfer time isn't blamed on the dyno.
If this isn't the case, it would be a catastrophe. Let's say a mobile phone in Africa makes 10 requests with a 10-second transfer time each; then my app would be unavailable for over 1½ minutes.
I can only really answer the first question: static assets most certainly do take up dyno time. In fact, I think it's best to keep all static assets, including stylesheets and JS, on an asset server when using Heroku's free package. (If everyone did that, Heroku would benefit and so would you.) I recommend using the asset_sync gem to handle that. The README does explain that there are one or two easily resolved current issues.
Regarding your last point, sorry if I'm misinterpreting here, but a user in South Africa might take 10 seconds to have their request routed to Heroku, yet most of that time is probably spent trafficking around the maze of telephone exchanges between SA and the USA. Your dyno is only tied up for the portion of the request that takes place inside Heroku's servers, not the 9.9 seconds your request spent getting there. So effectively Heroku is oblivious to whether your request is coming from South Africa or Sweden.
There are all sorts of things you can do to speed your app up: caching, more dynos, Unicorn with several workers.
You're making two wrong assumptions. The good news is that your problem becomes much simpler once you think about things differently.
First off, remember that a dyno is a single process, not a single thread. If you're using Java then you'll be utilizing many request threads, so you don't have to worry about your application being unavailable while a request is being processed; you'll be able to process requests in parallel. A minimal illustration of this is sketched below.
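Purely as an illustration (this is a generic JDK HTTP server, not anything Heroku-specific; the port and pool size are arbitrary), one process can serve many requests concurrently by handing them to a pool of threads:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class SingleProcessServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = "{\"ok\":true}".getBytes();       // pretend this took ~10 ms to compute
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        // One process, many threads: concurrent requests do not block each other.
        server.setExecutor(Executors.newFixedThreadPool(16));
        server.start();
    }
}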
Also, when talking about dyno time, that refers to the amount of time your process is running, not just request processing time. So a web process that is waiting for a request still consumes dyno time, since the process is up while it waits. This is why you get 750 free dyno hours a month: you'll be able to run a single dyno for the entire month (720 hours).
As far as computing how many requests your application can serve per second, the best way is to test it. You can use New Relic to monitor your application while you load test it with JMeter or whatever your favorite load testing program is: http://devcenter.heroku.com/articles/newrelic