I have given some thought on how to calculate how many users I can handle with Heroku and one dyno.
But to figure it out I need some input.
And I must say that the official documentation isn't easy to navigate and interpret, so I haven't read it all. My complaint is that it doesn't describe things very well: sometimes it describes old stacks, sometimes it's Ruby-specific, sometimes things aren't described at all, and so on.
So I need some input on how Heroku, Cedar stack, works regarding requests to make my calculations.
You are more than welcome to correct me on my assumptions as I am relatively new to dyno theory.
Let's say I have a controller that takes a request and calculates a JSON response in 10 ms locally. Will I be able to serve 100 requests a second?
As I understand it, the Cedar stack doesn't have a fronting caching solution, so many questions arise.
Do static content requests take up dyno time?
Does transfer time count toward request time?
Can a single dyno send many responses at the same time if the requests require little CPU?
Some of these questions are intertwined, so a combined answer or other thoughts are welcome.
An example:
Static HTML page.
<HTML>...<img><css><script>...
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
AjaxCall //dyno processing time 10ms
...</HTML>
Can I serve (1000 ms / (10 ms × 4)) = 25 HTML pages a second?
This assumes that static content isn't provided by a dyno.
This assumes that transfer time isn't blamed on the dyno.
If this isn't the case, it would be a catastrophe. Let's say a mobile phone in Africa makes 10 requests with a 10-second transfer time each; then my app would be unavailable for over 1½ minutes.
I can only really answer the first question: static assets most certainly do take up dyno time. In fact, I think it's best to keep all static assets, including stylesheets and JS, on an asset server when using Heroku's free package. (If everyone did that, Heroku would benefit and so would you.) I recommend using the asset_sync gem to handle that. The README does explain that there are one or two easily resolved current issues.
Regarding your last point, sorry if I'm misinterpreting here, but a user in South Africa might take 10 seconds to have their request routed to Heroku; most of that time is probably spent trafficking around the maze of telephone exchanges between SA and the USA. Your dyno is only tied up for the portion of the request that takes place inside Heroku's servers, not the 9.9 seconds your request spent getting there. So effectively Heroku is oblivious to whether your request is coming from South Africa or Sweden.
There are all sorts of things you can do to speed your app up: caching, more dynos, or Unicorn with several workers.
You're making two wrong assumptions. The good news is that your problem becomes much simpler once you think about things differently.
First off, remember that a dyno is a single process, not a single thread. If you're using Java, you'll be utilizing many request threads, so you don't have to worry about your application being unavailable while a request is being processed. You'll be able to process requests in parallel.
Also, dyno time refers to the amount of time your process is running, not just request processing time. So a web process that is waiting for a request still consumes dyno time, since the process is up while it waits. This is why you get 750 free dyno hours a month: you'll be able to run a single dyno for the entire month (720 hours).
As for computing how many requests your application can serve per second, the best way is to test it. You can use New Relic to monitor your application while you load test it with JMeter or whatever your favorite load testing tool is: http://devcenter.heroku.com/articles/newrelic
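If you'd like a rough sanity check before setting up JMeter, a tiny self-written load driver is enough to get a ballpark number. The following is only a sketch; the target URL, thread count and request count are placeholders you'd adjust for your own app:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SimpleLoadTest {
    public static void main(String[] args) throws Exception {
        final String target = "https://your-app.herokuapp.com/api/endpoint"; // placeholder URL
        final int threads = 20;            // concurrent clients
        final int requestsPerThread = 50;  // requests sent by each client
        final AtomicInteger ok = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    try {
                        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
                        conn.setRequestMethod("GET");
                        if (conn.getResponseCode() == 200) {
                            ok.incrementAndGet();
                        }
                        conn.disconnect();
                    } catch (Exception e) {
                        // failed requests simply don't count towards "ok"
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d OK responses, %.1f attempted requests/s%n",
                ok.get(), (threads * requestsPerThread) / seconds);
    }
}

Watching the New Relic dashboard while this runs shows how response times degrade as concurrency rises, which gives a much more honest requests-per-second figure than back-of-the-envelope math.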
Related
I have an EC2 instance doing a long-running job. The job should take about a week, but after a week it is only at 31%. I believe this is due to the low average CPU (less than 1%) and because it very rarely receives a GET request (just me checking the status).
Reason for the low CPU:
This Java service performs many GET requests, then processes a batch of pages once it has a few hundred (non-arbitrary; there is a reason they are all required first). But to avoid getting HTTP 429 (too many requests), I must space my GET requests apart using Thread.sleep(x) and synchronization. This results in a very low CPU usage which spikes every so often.
I think Amazon's preemptive systems think that my service is arbitrarily waiting, when in actual fact it needs to wake up at a specific moment. I also notice that if I check the status more often, it goes quicker.
How do I stop Amazon's preemptive system from thinking my service isn't doing anything?
I have thought of 2 solutions, however neither seems intuitive:
Have another process running to keep the CPU at ~25%, which would only really consist of:
// requires: import java.time.LocalDateTime; import java.time.temporal.ChronoUnit;
while (true) {
    Thread.sleep(300);
    // busy-wait for ~100 ms so the instance shows some CPU activity
    LocalDateTime until = LocalDateTime.now().plus(100, ChronoUnit.MILLIS);
    while (LocalDateTime.now().isBefore(until)) {
        // empty loop
    }
}
However, this just seems like an unnecessary use of resources.
Have a process on my laptop perform a GET request to the AWS service every 10 minutes (a sketch of this follows below). But one of the reasons I put it on AWS was to free up my laptop, although this would use orders of magnitude fewer of my laptop's resources than running the service locally.
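A minimal sketch of what that laptop-side pinger could look like (the status URL is just a placeholder for my instance's endpoint):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepAlivePinger {
    public static void main(String[] args) {
        final String statusUrl = "http://my-ec2-host.example.com/status"; // placeholder endpoint
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // fire a GET every 10 minutes; the non-daemon scheduler thread keeps the JVM alive
        scheduler.scheduleAtFixedRate(() -> {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(statusUrl).openConnection();
                conn.setRequestMethod("GET");
                System.out.println("status check returned " + conn.getResponseCode());
                conn.disconnect();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 10, TimeUnit.MINUTES);
    }
}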
Is one of these solutions more desirable than the other? Is there another solution which would be more appropriate?
Many Thanks,
Edit: note that I use the free-tier services only.
I'm trying to better understand the way Google's Cloud Console Stackdriver Trace shows call details and to debug some performance issues for my app.
Most requests work heavily with memcache set/get operations and I'm having some issues here, but what I don't understand is why there's a long time gap between calls. I have uploaded 2 screenshots.
So, as you can see, the call #1025ms took 2ms, but there's more than 4 seconds between it and the urlfetch call #5235ms.
First of all, my code is not intensive at that point (and the full request shows about 9000 ms of untraced time), and second, most similar requests that run the same code do not have these gaps (i.e. repeating the request doesn't show the same behavior). But I also see this issue on other requests as well, and I cannot reproduce them.
Please advise!
EDIT:
I have uploaded another screenshot from Appstats. It is a "normal" request that usually takes a few hundred ms to run (max 1 s), also on localhost (development). I cannot find anything to take the debugging further. I feel like I am missing something simple, something at a basic level, regarding the DOs and DON'Ts of App Engine.
I'm aware of the following common causes of such gaps ("untraced time"):
The request is actually CPU-bound during these gaps.
To check for this issue, go to the logs viewer and view the details of the affected incoming HTTP request. Note that there's also a convenient direct link from the trace details to the log entry. In the request log entry, look for the cpu_ms field, which states the CPU milliseconds required to fulfill the request. This is the number of milliseconds spent by the CPU actually executing your application code, expressed in terms of a baseline 1.2 GHz Intel x86 CPU. If the CPU actually used is faster than the baseline, the CPU milliseconds can be larger than the actual clock time [..]. (doc)
This metric is also available in protoPayload.megaCycles.
Here's an example log entry of a slow request with substantial untraced time:
2001:... - - [02/Mar/2017:19:20:22 +0100] "GET / HTTP/1.1" 200 660 - "Mozilla/5.0 ..." "example.com" ms=4966 cpu_ms=11927 cpm_usd=7.376e-8 loading_request=1 instance=... app_engine_release=1.9.48 trace_id=...
The cpu_ms field is unusually high (11927) for this example request and indicates that most of the untraced time was spent in the application itself (or the runtime).
Why is the request handler using that much CPU? Typically, it's next to impossible to tell exactly where the CPU time was spent, but if you know what is supposed to happen in a given request, you can narrow it down more easily. Two common causes are:
It's the very first request to a newly started App Engine instance. The JVM needs to load classes and JIT-compile hot methods - this is expected to significantly impact the first request (and potentially a few more). Look for loading_request=1 in the request log entry to check if your request was slow because of this. Consider Configuring Warmup Requests to Improve Performance.
Pro tip: if you want to focus your investigation and filter out such loading requests in the logs viewer, apply this advanced filter:
protoPayload.megaCycles > 10000 and protoPayload.wasLoadingRequest=false
Some parts of the application code are massively slowed down by excessive use of reflection. This is specific to the App Engine Standard Environment, where a security manager restricts the usage of reflection. The only mitigation is to use less reflection. Note that the App Engine serving infrastructure is constantly evolving, so this hint will hopefully be outdated sooner rather than later.
If the issue is reproducible locally in the dev appserver, you can use a profiler (or maybe just jstack) to narrow it down. In some other cases, I literally had to incrementally bisect the application code, add more log statements, redeploy, etc., until the offending code was identified.
There are actually untraced calls to backends that are not covered out of the box by Stackdriver Trace in the App Engine Standard Environment. The only example I'm aware of so far is Cloud SQL. Consider using Google Cloud Trace for JDBC to get interactions with Cloud SQL traced, too.
The application is multithreaded (great!) and experiences some self-inflicted synchronisation issues. Examples I've seen in the wild:
Application-specific synchronization forces all requests to the storage backend to be serialized (for a given App Engine instance). Nothing sticks out in the traces, except those mysterious gaps... (a sketch of this pattern follows after this list).
The application uses a database connection pool. The number of parallel requests exceeds the capacity of the pool (for a given App Engine instance), some requests have to wait until a connection becomes available. This is a more sophisticated variation of the previous item.
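To make the first example above concrete, a (purely hypothetical) helper like the one below will quietly serialize every request on an instance. None of the waiting shows up as a traced span, only as an unexplained gap:

// Hypothetical application code: one shared lock around every storage call.
// Every request thread on the instance queues up behind the same monitor,
// so concurrent requests spend most of their time waiting.
public class StorageGateway {
    private static final Object LOCK = new Object();

    public static byte[] fetch(String key) {
        synchronized (LOCK) {              // all request threads serialize here
            return slowBackendCall(key);
        }
    }

    private static byte[] slowBackendCall(String key) {
        // stand-in for the real backend interaction
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new byte[0];
    }
}

The connection-pool variant looks the same from the outside: swap the single monitor for a pool of N connections, and the waiting starts as soon as the N+1st parallel request arrives.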
Given that this is occurring infrequently and that the actual processing times (indicated by the span lengths) are short, my suspicion is that some kind of App Engine scaling action is occurring in the background. For example, the slowdown may be caused by a new instance being added to your application. You can dig into this further by looking at the activity graph on the App Engine dashboard or by using AppStats (see this SO post).
Showing App Engine events in the trace timeline view is something that we've been wanting to do for a while, as it would dramatically shorten the analysis process for situations like this.
I am currently working on building a load testing platform with Google App Engine. My eventual goal is to simulate 1 million users sending data to another GAE server application every 10 minutes.
My current implementation is using task queues, where each task represents a user or a handful of users. My problem is that GAE is throttling my task queues with an enforced rate well below my maximum/desired rate. I have tried simply throwing instances at the problem, and while this helps, I still end up with an enforced rate well below the desired one.
However, I know that my application is capable of running tasks faster than the enforced rate. I have witnessed my application successfully running 250+ tasks per second for a period of time, only to have the task queue throttled to 60 or 30 tasks per second a minute later.
I am using basic scaling with a cap of 10 instances for now, and I would like to understand this problem more before increasing the instance count, as costs start running up quite quickly with a high instance count.
Does anyone have more information on why I am being throttled like this, and how to get around this throttling? The only documentation/information/answers I can find to this question simply quote the insufficient documentation, which says something like:
"The enforced rate may be decreased when your application returns a 503 HTTP response code, or if there are no instances able to execute a request for an extended period of time."
I am happy to clarify any questions. Thank you in advance for your help; I have been wrestling with this problem for about a month.
We have a Google App Engine Java app with 50 - 120 req/s depending on the hour of the day.
Our frontend appengine-web.xml looks like this:
<instance-class>F1</instance-class>
<automatic-scaling>
<min-idle-instances>3</min-idle-instances>
<max-idle-instances>3</max-idle-instances>
<min-pending-latency>300ms</min-pending-latency>
<max-pending-latency>1.0s</max-pending-latency>
<max-concurrent-requests>100</max-concurrent-requests>
</automatic-scaling>
Usually one frontend instance manages to handle around 20 req/s. Startup time is around 15 s.
I have a few questions:
When I change the frontend default version, I get thousands of Error 500 - "Request was aborted after waiting too long to attempt to service your request".
So, to avoid that, I switch from one version to the other using the traffic splitting feature by IP address, going from 1% to 100% in steps of 5%; it takes around 5 minutes to do it properly and avoid massive 500 errors. Moreover, that feature seems to be available only for the default frontend module.
-> Is there a better way to switch versions?
To avoid thousands of Error 500 - "Request was aborted after waiting too long to attempt to service your request", we must use at least 3 resident (min-idle) instances. And as our traffic grows, even with 3, we sometimes still get massive numbers of Error 500. Am I supposed to go to 4 residents? I thought App Engine was nice because you only pay for the instances you use, so if, in order to work properly, we need at least half our running instances in idle mode, that's not great, is it? It's not really cost-effective: when the load is low, still having 4 idle instances is a big waste :( What's weird is that they seem to wait only 10 s before responding 500: pending_ms=10248
-> Do you have any advice on how to avoid that?
Quite often, we also get thousands of Error 500 - A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104). I don't understand: there aren't any exceptions, and we get hundreds of these errors within a few seconds.
-> Do you have any advice on how to avoid that?
Thanks a lot in advance for your help! ;)
Those error messages are mostly related to loading requests that take too long to load and therefore end in something similar to a DeadlineExceededException, which dramatically affects performance and user experience, as you probably already know.
This is a very common issue, especially when using DI frameworks with Google App Engine, and so far it's an unavoidable and serious unfixed issue when using automatic scaling, which is the scaling policy App Engine has provided for handling public requests since its inception.
Try changing the frontend instance class to F2, especially if your memory consumption is higher than 128 MB per instance, and set the min/max pending latency to 15 s so your requests get more chances to be processed by a resident instance. However, you will still get long response times for some requests, since Google App Engine may not issue a warmup request every time your application needs a new instance, and I understand that F4 would break the bank.
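Applied to a configuration like the one you posted, those two suggestions would look roughly like this (a sketch: only the instance class and pending-latency lines change, and the values still need tuning against your real traffic):

<instance-class>F2</instance-class>
<automatic-scaling>
  <min-idle-instances>3</min-idle-instances>
  <max-idle-instances>3</max-idle-instances>
  <min-pending-latency>15s</min-pending-latency>
  <max-pending-latency>15s</max-pending-latency>
  <max-concurrent-requests>100</max-concurrent-requests>
</automatic-scaling>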
I'm hosting a Java app on App Engine. For some reason I see that, from today, the response times are extremely slow - 10k ms and more! The GAE status page shows everything is OK. Does anyone have an answer or similar experience?
Second issue: I see that many requests start only a few seconds after they have been received; there is a delay in executing the request. Does anyone know how I can fix it?
P.S. I changed my instances from F1 to F2 to see if maybe it would help, but the result is the same.
Thank you
The GAE Google group is likely still the best place to ask questions like this.
Could it be that you are just seeing an increased number of warmup requests? In this case, going from F1 to F2 will not make a huge difference. Depending on your application, instance startup time can be reduced by changing the instance class, but this change alone will not bring you down to a more reasonable response time of ~1 second.
The following best practices allow you to reduce the duration of loading requests:
Load only the code needed for startup (see the lazy-initialization sketch after this list).
Access the disk as little as possible.
In some cases, loading code from a zip or jar file is faster than loading from many separate files.
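As an illustration of the first point, deferring expensive initialization until the first request that actually needs it keeps loading requests short. This is only a sketch with a hypothetical servlet and client class:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReportServlet extends HttpServlet {
    // Hypothetical heavyweight dependency (e.g. a fully configured API client)
    static class ExpensiveClient {
        ExpensiveClient() { /* slow construction, lots of classes to load */ }
        String buildReport() { return "ok"; }
    }

    private static volatile ExpensiveClient client;    // deliberately not created at startup

    private static ExpensiveClient client() {
        // double-checked lazy init: the cost is paid by the first request that
        // needs the dependency, not by the loading request that starts the instance
        if (client == null) {
            synchronized (ReportServlet.class) {
                if (client == null) {
                    client = new ExpensiveClient();
                }
            }
        }
        return client;
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.getWriter().print(client().buildReport());
    }
}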
You can also try to add a few resident instances. The GAE scheduler will then put extra traffic on the resident instances and launch new dynamic instances in the background. Since residents are started ahead of time, this will hide some latency from users.