How can I reduce Google App Engine datastore latency? - java

Through Appstats, I can see that my datastore queries take about 125ms (API and CPU combined), but there are often long latencies (e.g. up to 12000ms) before the queries are executed.
I can see that my latency from the datastore is not related to my query (e.g. the same query/data has vastly different latencies), so I'm assuming it's a scheduling issue with App Engine.
Are other people seeing this same problem?
Is there some way to reduce the latency (e.g. an admin console setting)?
Here's a screenshot from Appstats. This servlet does very little CPU processing: it does a getObjectById and then a datastore query. The query has an OR operator, so App Engine converts it into 3 queries.
As you can see, it takes 6000ms before the first getObjectById is even executed. There is no processing before the get operation (other than getting the PersistenceManager). I thought this 6000ms latency might be due to instance warm-up, so I increased my idle instances to 2 to prevent any warm-ups.
Then there's a second latency of around 1000ms between the getObjectById and the query. There are zero lines of code between the get and the query; the code simply takes the result of the getObjectById and uses the data as part of the query.
The grand total is 8097ms, yet my datastore operations (and 99.99% of the servlet) take only 514ms (45ms API), though the numbers change every time I run the servlet. Here is another Appstats screenshot from a run of the same servlet against the same data.
Here are the basics of my Java code. I had to remove some of the details for security purposes.
// Fetch the user by key
User user = pm.getObjectById(User.class, userKey);
// build the filter string: queryBuilder.append(...
final Query query = pm.newQuery(UserAccount.class, queryBuilder.toString());
query.setOrdering("rating descending");
query.executeWithArray(args);
Edited:
Using Pingdom, I can see that GAE latency varies from 450ms to 7,399ms, a 1,644% difference! This is with two idle instances and no users on the site.

I observed very similar latencies (in the 7000-10000ms range) in some of my apps. I don't think the bulk of the issue (those 6000ms) lies in your code.
In my observations, the issue is related to App Engine spinning up a new instance. Setting min idle instances may help mitigate it, but it will not solve it (I tried up to 2 idle instances), because even if you have N idle instances, App Engine will prefer spinning up dynamic ones when even a single request comes in, and will "save" the idle ones in case of crazy traffic spikes. This is highly counter-intuitive, because you'd expect it to use the instances that are already around and spin up dynamic ones for future requests.
Anyway, in my experience this issue (10000ms latency) very rarely happens under any non-zero amount of load, and many people have had to resort to some kind of pinging (possibly via cron jobs) every couple of minutes (a ping every 5 minutes used to work, but lately instances are dying faster, so it's more like a ping every 2 minutes) to keep dynamic instances around to serve users who hit the site when no one else is on. This pinging is not ideal because it will eat away at your free quota (pinging every 5 minutes will eat away more than half of it), but I really haven't found a better alternative so far.
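As a sketch of the pinging approach on GAE Java, a cron.xml along these lines would hit the app every 2 minutes; the /ping URL is a hypothetical lightweight servlet you would add yourself, not something App Engine provides:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <!-- Hit a trivial handler every 2 minutes to keep an instance warm -->
    <url>/ping</url>
    <description>Keep-alive ping to avoid cold starts</description>
    <schedule>every 2 minutes</schedule>
  </cron>
</cronentries>
```

Note the quota trade-off described above: each ping counts as a regular request.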
To recap: in general I have found that App Engine is awesome under load, but not outstanding when you have very few (1-3) users on the site.

Appstats only helps diagnose performance issues when you make GAE API/RPC calls.
In the case of your diagram, the "blank" time is spent running your code on your instance. It's not going to be scheduling time.
Your guess that the initial delay is due to instance warm-up is very likely correct. It may be framework code that is executing.
I can't guess at the delay between the Get and the Query. It may be that there are 0 lines of code between them, but you may have called some function on the Query that takes time to process.
Without knowledge of the language, framework or the actual code, no one will be able to help you.
You'll need to add some sort of performance tracing on your own in order to diagnose this. The simplest (but not highly accurate) way to do this is to add timers and log timer values as your code executes.
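A minimal sketch of that timer-and-log approach, with sleeps standing in for the getObjectById and query calls from the question (in the real servlet you would log through your usual logger instead of System.out):

```java
public class TimerLog {
    public static void main(String[] args) throws InterruptedException {
        long t0 = System.nanoTime();
        Thread.sleep(50); // stand-in for pm.getObjectById(User.class, userKey)
        long t1 = System.nanoTime();
        Thread.sleep(50); // stand-in for query.executeWithArray(args)
        long t2 = System.nanoTime();
        // Log the elapsed wall-clock time of each step
        System.out.println("get took   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("query took " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

System.nanoTime() is preferable to System.currentTimeMillis() here because it is monotonic and not affected by clock adjustments.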

Related

Hazelcast Java - Query latency maximum optimization

I am currently working in a context where the application uses Hazelcast. The paradigm used is not embedded; therefore, client-server mode is used.
I have a flow where a query is executed on a distributed map.
After all the optimizations I could think of (different combinations of in-memory format, query cache, indexes, etc.), the best I could achieve was around ~10 milliseconds of latency, which I know sounds fast for a single operation.
The issue is that the current application bases some flows on microsecond latency.
So my question is: is that kind of optimization possible for Hazelcast's query engine? Or should I try to focus on updating the business code instead?
I am using Hazelcast 4.2 with a map of around 14,000 items and a total memory footprint of around 10 MB, so not that big.
The testing is done on a local workstation.
So after all the debugging, it seems that query latency is capped in the milliseconds range; there doesn't seem to be a way to get to microseconds in the 4.2 version. When using a continuous query cache, there seems to be some unnecessary serialization carried out, which in certain cases can take 30-40 percent of the total latency, but even without that, the total latency stays in the milliseconds range.

Reaching memory limit launching a task

I have a long-running task in my App Engine application with a lot of datastore entries to compute. It worked well with a small amount of data, but since yesterday I'm suddenly getting more than a million datastore entries to compute per day. After running for a while (around 2 minutes), the task fails with a 202 exit code (HTTP error 500). I really can't get past this issue, and it is pretty much undocumented. The only information I was able to find is that it probably means my app is running out of memory.
The task is simple. Each entry in the datastore contains a non-unique string identifier and a long number. The task sums the numbers and stores the identifiers into a set.
My budget is really low since my app is entirely free and without ads, and I would like to keep the app's cost from soaring. I'm looking for a cheap and simple solution to this issue.
Edit:
I read the Objectify documentation thoroughly tonight, and I found that the session cache (which ensures consistency of entity references) can consume a lot of memory and should be cleared regularly when performing many requests (which is my case). Unfortunately, this didn't help.
It's possible to stay within the free quota but it will require a little extra work.
In your case you should split this operation into smaller batches (e.g. process 1000 entities per batch) and queue those smaller tasks to run sequentially during off hours. That should save you from the memory issue and allow you to scale beyond your current entity count.
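A minimal sketch of the batch-splitting idea in plain Java. Entity keys are represented here by integers, and the actual task-queue enqueueing (one task per batch) is left out, since that part depends on your queue setup:

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a list of entity keys into fixed-size batches so that each
    // queued task processes a bounded amount of data (and memory).
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) keys.add(i);
        List<List<Integer>> batches = split(keys, 1000);
        // 2500 keys at 1000 per batch -> 3 batches (1000, 1000, 500)
        System.out.println(batches.size());
        System.out.println(batches.get(2).size());
    }
}
```

In a real task you would typically use a datastore query cursor rather than loading all keys up front, so that each batch task fetches only its own slice.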

Achieving consistent response times in GAE?

When running load tests against my app I see very consistent response times. Once there is a constant level of load on GAE, the mean response times get smaller and smaller. But I want the same consistency on other apps that receive far fewer requests per second; in those, I never need to support more than ~3 requests/second.
Reading the docs makes me think that increasing the number of minimum idle instances should result in more consistent response times. But even then, clients will still see higher response times every time GAE's scheduler thinks more instances are required. I am looking for a setup where users do not see those initial slow requests.
When I increase the number of minimum idle instances to 1, I want GAE to use the one resident instance only. As load increases, it should bring up and warm up new (dynamic) instances. Only once they are warmed up, GAE should send requests to them. But judging from the response times it seems as if client requests arrive in dynamic instances as they are brought up. As a result, those requests take a long time (up to 30 seconds).
Could this happen if my warmup code is incomplete?
Could the first calls on the dynamic instances be so slow because they involve code paths that have not been warmed up yet?
I do not experience this problem during load tests or when enough people are using the app. But my testing environments are practically unusable by clients when nobody is using the app yet, e.g. in the morning.
Thanks!
Some generic thoughts:
30 seconds of startup time per instance seems like a lot. We do a lot of initialization (including database hits), and we have around 5 seconds of overhead.
Warmup requests aren't guaranteed. If all instances are busy, and the scheduler believes that the request will be answered faster by starting a new instance instead of queuing it on a busy one, it will do so without wasting time on a warmup request.
I don't think this is an issue of a cold code path (though I don't know Java's HotSpot in detail); it's probably the (mem-) cache which needs to fill first.
I don't know what you meant by "incomplete warmup code"; just check your logs for requests to /_ah/warmup - if there are any, warmup requests are enabled and working.
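For reference, and assuming the GAE Java standard environment: warmup requests to /_ah/warmup are enabled in appengine-web.xml by declaring the warmup inbound service, roughly like this:

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- Ask the scheduler to send /_ah/warmup before routing live traffic -->
  <inbound-services>
    <service>warmup</service>
  </inbound-services>
</appengine-web-app>
```

Your warmup servlet (mapped to /_ah/warmup) should then perform the expensive initialization, e.g. priming caches and opening datastore connections.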
Increasing the amount of idle instances beyond the 1-instance mark probably won't help here.
Sadly, there aren't any generic tricks to avoid that, but you could try to
defer initialization code (do only the absolute required minimum at instance startup)
start a backend that keeps the (mem-) cache hot
If you don't mind the costs (and don't need automatic scaling for your low-volume application), you could even have all requests served by always-on backends

App Engine app performance test

I have used JMeter to test my App Engine app's performance.
I created a thread group with:
500 users,
ramp-up period: 0 seconds,
loop count: 1,
and ran the test.
It created 4 instances in App Engine. But the interesting thing is that more than 450 requests were processed by a single instance.
I ran the test again with these instances up, and still most of the requests (> 90%) were going to the same instance.
Instance type: F1 Class
Max Idle Instances: ( Automatic )
Min Pending Latency: ( Automatic )
I'm getting much higher latency.
What's going wrong here?
Is generating load from a single IP a problem?
Your problem is that you are not using a realistic ramp-up value. App Engine, like most auto-scaling solutions, requires a reasonable amount of time to spin up new instances. While it is creating the new instances, latency can increase if there was a large and sudden increase in traffic.
Choose a ramp-up value that is representative of the sort of spikes/surges you realistically expect to see in production, then run the test. Use the values from this test to decide how many App Engine instances you would like to be 'always on'; the higher this value, the lower the impact from a surge, but obviously the higher your costs.
When you say "I'm getting much higher latency" what exactly are you getting? Do you consider it to be too slow?
If latency is an issue then you can reduce the max pending latency in the application settings. If you try this I imagine you will see your requests spread across the instances more.
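As a hedged sketch of that setting: on newer Java standard environment runtimes, pending latency can be configured in appengine-web.xml (on the older runtimes this was an Admin Console setting, as described above). A lower max-pending-latency makes the scheduler start new instances sooner instead of queuing requests on a busy one:

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <automatic-scaling>
    <!-- Start a new instance if a request waits in the queue longer than this -->
    <max-pending-latency>1s</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```

The trade-off is more instance spin-ups, and therefore higher cost, in exchange for requests being spread across instances more aggressively.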
My guess is simply that the 2-3 idle instances have spun up in anticipation of increased load but are actually not needed for your test.
It was totally app engine's issue...
see this issue reported at appengine's issue tracker
Spread your requests across different thread groups, and the instances will be utilised. I'm not sure why this happens; I wasn't able to find any definitive information that explains it.
(I wonder if maybe App Engine sees the requests from a single thread group as requests originating from a common origin, so it places all of the utilised resources in the same instance, so that the output can be most efficiently passed back to the originator of the requests.)

How to increase the performance of a loop which runs for every 'n' minutes

Some background on my requirement and what I have accomplished so far:
There are 18 scheduler tasks that run at regular intervals (the shortest being 30 mins). Each takes an input of nearly 5000 eligible employees, runs them through a static method that iterates over them, generates mail content for each employee, and sends it. An average task takes about 9 minutes; multiplied by 18 that is roughly 162 minutes, during which the next tasks will be waiting in the queue (I assume).
So my plan is built around a loop like the one below:
try {
    // Iterate over the ArrayList of alert-eligible employee IDs
    Iterator<Long> employee = empIDs.iterator();
    while (employee.hasNext()) {
        ScheduledImplementation.getInstance()
                .sendAlertInfoToEmpForGivenAlertType(employee.next(), configType, schedType);
    }
} catch (Exception vEx) {
    _log.error("Exception Caught During sending " + configType + " messages:" + configType, vEx);
}
Since I know how many employees will come into my method, I could divide the while loop into two and perform simultaneous operations on two or three employees at a time. Is this possible? Or are there other ways I can improve the performance?
Some of the things I have implemented so far:
1. Wherever possible, made methods static, and variables too.
2. Didn't bother to catch exceptions and send them back, because these are background tasks. (And I assume this improves performance.)
3. Got the DB values in one query instead of multiple hits.
If I'm successful in optimizing the while loop, I think I can save a couple of minutes.
Thanks
"Wherever possible make methods static and variables too"
"Don't bother to catch exceptions and send back because these are background tasks. (And I assume this improves performance)"
That just makes for very bad code style; it has nothing to do with performance.
"Get the DB values in one query instead of multiple hits."
Yes, this can be quite essential.
But in general, learn how to use a profiler to see where your code actually spends its time so that you can improve those parts, rather than making random "optimizations" based on hearsay, as you seem to be doing now.
Using static and final improves performance only when you are not using a decent JIT compiler. Since most JREs already ship a good JIT compiler, you can ignore final and static for performance purposes.
Better to check your locking and synchronization style. A good indicator of locking problems is the CPU usage of your application: if it is low when the application should be working hard, there may be locks or DB queries blocking it.
Maybe you can use techniques like Copy-On-Write or ReadWriteLocks.
Also check out the concurrent and atomic packages for some ideas how to improve or eliminate locking.
If there's one CPU-core with high load while others are idling around, try to do things in parallel. An ExecutorService may be helpful here.
Please don't fix anything until you know what needs to be fixed. Michael is right - profile - but I prefer this method.
Get a sense of time scales. When people optimize while loops, they are working at the level of nanoseconds. When you do a DB query, it is at the level of milliseconds. When you send an email and wait for the I/O to complete, you're talking seconds. If you have delays on the order of seconds, doing something that only saves milliseconds or nanoseconds will leave you disappointed in the results. **
So find out (don't guess) what is causing most of the delay, and work from there.
** To give a sense of the time scale, in 1 second a photon can travel most of the way to the moon; in 1 millisecond it can travel across a medium-sized country, and in 1 nanosecond it can travel the length of your foot.
Assuming you have profiled your application first and found that most of the delay is not in your program but somewhere else, e.g. the network/mail server...
You can create a fixed-size thread pool, e.g. 2-4 threads, and submit a task for each mail you would otherwise have sent on the current thread.
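A minimal sketch of that thread-pool idea using ExecutorService; sendAlert here is a stand-in for the sendAlertInfoToEmpForGivenAlertType call from the question:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelAlerts {
    public static void main(String[] args) throws InterruptedException {
        List<Long> empIDs = List.of(1L, 2L, 3L, 4L, 5L);
        // Fixed-size pool: at most 4 mails are being sent concurrently
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (Long id : empIDs) {
            pool.submit(() -> sendAlert(id)); // each mail runs on a pool thread
        }
        pool.shutdown();                       // no new tasks accepted
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for in-flight mails
    }

    static void sendAlert(Long empId) {
        // Stand-in for sendAlertInfoToEmpForGivenAlertType(empId, configType, schedType)
        System.out.println("sent alert to employee " + empId);
    }
}
```

Since the bottleneck is almost certainly I/O (mail server, DB), a pool of 2-4 threads can cut wall-clock time substantially without adding much CPU load.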
Review your code, removing constructions which are knowingly slow.
The garbage collector is your friend, but it charges a very high price. Do your best to avoid generating garbage where possible, and prefer simple data structures made of primitive types.
Exceptions are another very handy thing that charges a very high price.
They are expensive because each time an exception is thrown, the stack trace must be created and populated. Imagine a balance-transfer operation that fails in 1% of cases due to lack of funds: even with this relatively low failure rate, performance may be severely impacted. See this benchmark.
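A rough, self-contained illustration of this cost (this is not the linked benchmark, and absolute numbers will vary by JVM; it just contrasts signalling failure via a return value against throwing):

```java
public class ExceptionCost {
    // Signal failure with a return value: cheap
    static boolean transferWithReturn(boolean hasFunds) {
        return hasFunds;
    }

    // Signal failure by throwing: fills in a stack trace every time
    static void transferWithException(boolean hasFunds) {
        if (!hasFunds) throw new IllegalStateException("insufficient funds");
    }

    public static void main(String[] args) {
        int failures = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            if (!transferWithReturn(i % 100 != 0)) failures++; // ~1% failure rate
        }
        long plainNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            try {
                transferWithException(i % 100 != 0);
            } catch (IllegalStateException e) {
                failures++; // same ~1% failure rate, but each throw builds a stack trace
            }
        }
        long exNs = System.nanoTime() - t0;

        System.out.println("failures: " + failures);
        System.out.println("return-based:    " + plainNs + " ns");
        System.out.println("exception-based: " + exNs + " ns");
    }
}
```

For an honest measurement you would use a proper harness such as JMH; a naive loop like this is only indicative, since JIT warm-up skews the first iterations.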
Review your logic and algorithms. Try to perform operations in parallel.
Use bulk data moves, when possible.
Use bulk database operations and always use PreparedStatements.
Only after that should you use profiling techniques.
Profiling the code should be the last resort: if you've done your homework properly, it will contribute only marginally.
