My AppEngine project retrieves XML data from a particular link using the GAE URL Fetch API. I have used the sample from here, except that mine looks like this:
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

URLConnection connection = new URL(url).openConnection();
connection.setConnectTimeout(0); // 0 means no connect timeout
connection.setReadTimeout(0);    // 0 means no read timeout
InputStream stream = connection.getInputStream();
This takes more than 60 seconds (the maximum allowed by the API) and hence causes a DeadlineExceededException. Using Task Queues for this purpose is also not an option, as mentioned here.
Is there any other way someone might have achieved this until now?
Thanks!
Task Queues can be active longer than the App Engine automatic-scaling request deadline of 1 minute. Under automatic scaling, a task can run for 10 minutes; under basic or manual scaling, it can run for up to 24 hours. See the docs here. (Note that although the linked page is for Python, the same limits apply to Java, Go, and PHP on GAE.)
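For completeness, deferring the fetch to a push queue in Java is only a few lines. A minimal sketch using the default queue, where the worker servlet path /tasks/fetch-xml and the url parameter name are made up for illustration:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class FetchEnqueuer {
    // Enqueue the slow fetch on the default push queue; the worker servlet
    // mapped at /tasks/fetch-xml performs the actual URL fetch and gets the
    // 10-minute limit under automatic scaling.
    public static void enqueueFetch(String url) {
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
                .withUrl("/tasks/fetch-xml")
                .param("url", url));
    }
}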
Finally, I have to echo what the other users said: the latency is almost certainly caused by the endpoint of your URL fetch, not by the network or App Engine. You can check this for sure by looking at the App Engine log lines for the failing requests. The cpu_millis field tells you how long the GAE-side process actually worked on the request, while the millis field is the total time for the request. If the total time is much higher than the CPU time, the cost was elsewhere in the network.
It might be related to bandwidth exhaustion from multiple connections hitting the endpoint's limited resources. If the endpoint is muc2014.communitymashup.net/x3/mashup, as you added in a comment, it might help to know that at the time I posted this comment (approx. 1424738921 in Unix time), the average latency on that endpoint (including the full response, not just time to the beginning of the response) was ~6 seconds, though that could plausibly exceed 60 seconds under heavy load if no scaling is set up for the endpoint. The observed latency is already quite high, but it will vary with the kind of work that has to be done server-side, the volume of requests and data being handled, and so on.
The problem lay in the stream being consumed by a function from the EMF library, which took a lot of time (this wasn't the case previously).
Instead, loading the contents of the URL into a StringBuilder, converting that into a separate InputStream, and passing it to the function worked, with all of this done in a cron job.
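In rough terms, the working approach looked like the sketch below (a reconstruction, not the exact code): read the body into a StringBuilder, then wrap it in a ByteArrayInputStream before handing it to the EMF loader.

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class UrlToStream {
    // Read the whole response into memory first, so the slow EMF parsing no
    // longer holds the network connection open.
    public static InputStream fetchAsStream(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}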
My question may sound very basic, but I am a bit confused: Jackson's writeValueAsString(..) is referred to as serialization, and converting a Java object to a byte stream is also called serialization.
Can anyone help me understand how the two are different?
The reason I am asking is that I ran into a scenario where I call a REST service. The REST service responds in 10 seconds with JSON. However, if I log the time taken by writeValueAsString(..) on the server side, it hardly takes a second.
UPDATE 1: This is what I am observing
The last log from the invoked REST service (which returns a collection) prints at 9:10:10 UTC, and data simultaneously starts streaming into Git Bash on my machine, as I am using curl to call the service.
Ten seconds later, the last log from my servlet filter (which intercepts requests to the REST API URI) prints at 9:10:20 UTC, and at the very same time the data streaming stops in Git Bash (nearly 35 MB downloaded). So, what could be the reason for this behavior?
Did Jackson start sending bytes over the network while serialization was still in progress?
Is Jackson serialization slow, or is the network bandwidth low?
Note that I tried running the serialization and deserialization alone, using only the writeValueAsString(..)/readValue(..) operations without any network call, through JUnit with the same set of data, and they execute within a second.
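For reference, the isolated timing check looked roughly like the sketch below (a hypothetical reconstruction with placeholder data, written as a plain main method rather than the original JUnit test):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

public class SerializationTimingTest {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        List<String> payload = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            payload.add("item-" + i); // placeholder data standing in for the real collection
        }

        long start = System.nanoTime();
        String json = mapper.writeValueAsString(payload); // serialization only, no network involved
        List<?> back = mapper.readValue(json, List.class); // deserialization only
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Round trip took " + elapsedMs + " ms for " + back.size() + " items");
    }
}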
Thanks
The server response time of 10 seconds is not just the serialization time; it includes:
the total time for the request to reach the REST service's server over the network
the internal processing in the REST service app
the time for the response to reach your application over the network
(Additionally, the time taken at various other layers, which I'm not including for the sake of simplicity.)
And for serialization, adding the comment from @Lino here:
In computing, serialization (or serialisation) is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment)
Source: Wikipedia
I noticed that a blocking gRPC call might be blocked for a very long time, if not forever.
I checked and found the following page: https://grpc.io/docs/guides/concepts.html#deadlines
However, the page does not give the default deadline/timeout value for Java.
So I am wondering whether there is a default value in Java.
If not, I will probably have to set a deadline value for all the calls, which is inconvenient...
There is no default deadline, in gRPC for any language. If there are network failures and keepalive is enabled on client-side, the call will eventually fail. But if the server takes an unbounded amount of time, then the client may wait an unbounded amount of time.
It is equivalent to "infinity" according to this issue https://github.com/grpc/grpc-java/issues/1495
As @Eric Anderson said, there's no default deadline. However, it's highly recommended to set one for each RPC on the client side, and service providers should also specify the longest deadline they support, as mentioned in the blog: https://grpc.io/blog/deadlines
In general, when you don’t set a deadline, resources will be held for all in-flight requests, and all requests can potentially reach the maximum timeout. This puts the service at risk of running out of resources, like memory, which would increase the latency of the service, or could crash the entire process in the worst case.
To avoid this, services should specify the longest default deadline they technically support, and clients should wait until the response is no longer useful to them. For the service this can be as simple as providing a comment in the .proto file. For the client this involves setting useful deadlines.
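In grpc-java, setting a per-call deadline is a single call on the stub. A minimal sketch; the Greeter service, HelloRequest type, address, and 5-second value are placeholders rather than anything mandated by gRPC:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.TimeUnit;

public class DeadlineExample {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 50051)
                .usePlaintext()
                .build();

        // GreeterGrpc and HelloRequest stand in for your own generated stubs.
        GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
        try {
            // Without withDeadlineAfter, the blocking call may wait indefinitely.
            stub.withDeadlineAfter(5, TimeUnit.SECONDS)
                .sayHello(HelloRequest.newBuilder().setName("world").build());
        } catch (StatusRuntimeException e) {
            // The call fails with DEADLINE_EXCEEDED once the deadline elapses.
            System.out.println("RPC failed: " + e.getStatus());
        } finally {
            channel.shutdownNow();
        }
    }
}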
I use AWS API Gateway integrated with AWS Lambda (Java), but I'm seeing some serious problems with this approach. The concept of removing the server and having your app scale out of the box is really nice, but here are the problems I'm facing. My Lambda does two simple things: it validates the payload received from the client and then sends it to a Kinesis stream for further processing by another Lambda. (You will ask why I don't send directly to the stream and use only one Lambda for all of the operations. Let's just say that I want to separate the logic, have a layer of abstraction, and also be able to tell the client that he's sending invalid data.)
In the implementation of the Lambda I integrated Spring DI. So far so good. I started doing performance testing. I simulated 50 concurrent users making 4 requests each, with 5 seconds between the requests. So what happened: on the Lambda's cold start I initialize Spring's application context, but it seems that having so many simultaneous requests while the Lambda was not yet started does some strange things. Here's a screenshot of the times the context was initialized for.
What we can see from the screenshot is that the context initialization times differ a lot. My assumption of what is happening is that when so many requests arrive and there's no "active" Lambda, it initializes a Lambda container for every one of them and at the same time "blocks" some of them (the ones with the big times of 18 s) until the ones already started are ready. So maybe it has some internal limit on the number of containers it can start at the same time. The problem is that if you don't have evenly distributed traffic, this will happen from time to time and some of the requests will time out. We don't want that to happen.
So the next thing was to run some tests without the Spring container, as my thought was "OK, the initialization is heavy, let's just use plain old Java object initialization." Unfortunately, the same thing happened (maybe it just reduced the 3 s of container initialization for some of the requests). Here is a more detailed screenshot of the test data:
So I logged the whole Lambda execution time (from construction to the end), the Kinesis client initialization, and the actual sending of the data to the stream, as these are the heaviest operations in the Lambda. We still have these big times of 18 s or so, but the interesting thing is that the times are somehow proportional. So if the whole Lambda takes 18 s, around 7-8 s is the client initialization, 6-7 s is sending the data to the stream, and the remaining 4-5 s is the other operations in the Lambda, which for the moment is only validation. On the other hand, if we take one of the small times (which means it reuses an already started Lambda), e.g. 820 ms, 100 ms goes to the Kinesis client initialization, 340 ms to sending the data, and 400 ms to validation. So this again pushes me toward the thought that internally it sleeps because of some limits. The next screenshot shows what happens on the next round of requests, when the Lambda is already started:
So we don't have these big times; yes, we still have a relatively big delta in some of the requests (which is also strange to me), but things look much better.
So I'm looking for clarification from someone who actually knows what is happening under the hood, because this is not good behavior for a serious application that uses the cloud for its "unlimited" possibilities.
And another question is related to another limit of Lambda: 200 concurrent invocations across all Lambdas within an account in a region. For me this is also a big limitation for a big application with lots of traffic. Since my business case at the moment (I don't know about the future) is more or less fire-and-forget for the request, I'm starting to think of changing the logic so that the gateway sends the data directly to the stream and the other Lambda takes care of the validation and the further processing. Yes, I'm losing the current abstraction (which I don't need at the moment), but I'm increasing the application's availability many times over. What do you think?
The lambda execution time spikes to 18s because AWS launches new containers w/ your code to handle the incoming requests. The bootstrap time is ~18s.
Assigning more RAM can significantly improve the performance of your lambda function, because you have more RAM, CPU and networking throughput!
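One common mitigation for the Kinesis client part of the cold start is to build the client once, outside the handler method, so that warm invocations reuse it. A rough AWS SDK v1 style sketch; the stream name, partition key, and String payload type are placeholders rather than details from your code:

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class IngestHandler implements RequestHandler<String, String> {

    // Built once per container during cold start, then reused on every warm invocation.
    private static final AmazonKinesis KINESIS = AmazonKinesisClientBuilder.standard().build();

    @Override
    public String handleRequest(String payload, Context context) {
        // Payload validation would go here before forwarding to the stream.
        PutRecordRequest request = new PutRecordRequest()
                .withStreamName("my-stream")      // placeholder stream name
                .withPartitionKey("partition-1")  // placeholder partition key
                .withData(ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8)));
        KINESIS.putRecord(request);
        return "ok";
    }
}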
And another question is related to another limit of the lambda-200 concurrent invocations in all lambdas within an account in a region.
You can ask AWS Support to increase that limit. I asked for an increase to 10,000 invocations/second and AWS Support did it quickly!
You can proxy straight to the Kinesis stream via API Gateway. You would lose some control in terms of validation and transformation, but you won't have the cold start latency that you're seeing from Lambda.
You can use the API Gateway mapping template to transform the data and if validation is important, you could potentially do that at the processing Lambda on the other side of the stream.
I have a web service for which I need to limit the number of transactions a client can perform. A transaction is a hit on the URL with correct parameters. Every client will have a different number of transactions it can perform per second. A client will be identified based on its IP address or a parameter in the URL.
The maximum TPS a client may perform will be kept in a database or in some other configurable manner. I understand it would be possible to write a servlet filter to do this. The filter would calculate requests per second, make a database connection to get the client's maximum TPS, and reject the request when that TPS is reached, since letting it through would further slow down the application's response. But that will not help during a DoS attack. Is there a better way?
I had to do the same thing. This is how I did it.
1) I had a data model for tracking an IP's requests. It mainly tracked the rate of requests, using some math that allowed me to add a new request and have the new rate for that IP recalculated quickly. Let's call this class IpRequestRate.
2) For each unique IP that made a request, an instance of IpRequestRate was instantiated. Only one instance was required per IP. They were put into a HashMap for fast retrieval. If a new IP came in, a new IpRequestRate instance was created for it.
3) When a request came in, if there was already an instance of IpRequestRate in the HashMap, I would add the new request to that instance and get the new rate. If the rate was above a certain threshold, the request would not be processed.
4) If the requester accidentally went above that threshold, the rate would quickly dip back below it. But if it was a real DoS, or in my case too many attempts to access an account (due to hackers), it would take much longer for the rate to dip below the threshold, which is what I wanted.
5) I do not recall whether I had a cleanup thread to remove old IPs, but that is something that would be needed at some point. You could use Ehcache as your HashMap so it does this for you automatically.
6) It worked really well and I thought about open-sourcing it, but it was really simple and easily reproducible. You just have to get the math right. The math for getting the rate is easy to get accurate, but a little tricky if you want it to be fast, so that not a lot of CPU is spent calculating the new rate when a new request is added to the IpRequestRate.
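For what it's worth, the math can be as simple as an exponentially decaying counter. A rough reconstruction of the idea (not the original code; the time constant is an arbitrary choice):

public class IpRequestRate {
    private final double tauMillis;    // time constant of the decay window, e.g. 10_000 for ~10 s
    private double decayedCount = 0.0; // exponentially decayed count of recent requests
    private long lastMillis = System.currentTimeMillis();

    public IpRequestRate(double tauMillis) {
        this.tauMillis = tauMillis;
    }

    // Record one request and return the current rate in requests per second.
    public synchronized double addRequestAndGetRate() {
        long now = System.currentTimeMillis();
        decayedCount = decayedCount * Math.exp(-(now - lastMillis) / tauMillis) + 1.0;
        lastMillis = now;
        // decayedCount approximates the number of requests seen in the last ~tau milliseconds.
        return decayedCount / (tauMillis / 1000.0);
    }
}

A filter would keep one instance per IP in the map from step 2 and reject the request whenever addRequestAndGetRate() exceeds that client's threshold.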
Does that answer your question, or would you need more info on how to set up the filter in your server?
Edit: Regarding DoS, during a DoS attack we want to waste as few resources as possible. If at all possible, DoS detection should be done in a load balancer, reverse proxy, gateway, or firewall.
If we want a per-IP maximum transmission rate that is stored in a database, I would just cache the maximum transmission rates. This can be done without a DB lookup per request; instead, I would load the table into a HashMap.
1) At the start of the application, say in the init() method, I would load the table into a HashMap that maps IP to maxTransmissionRate.
2) When a request comes in, try to get the maxTransmissionRate from the HashMap. If it's not there, use a default maxTransmissionRate.
3) During init(), kick off a ScheduledExecutorService to update the HashMap at some desired interval, to keep it fresh. It's not that hard; here is the link: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html
4) Which HashMap implementation should we use? If we use a regular HashMap, we will have problems when it is updated by the ScheduledExecutorService. We could use a synchronized HashMap, but that locks the whole map and hurts performance under concurrent requests. So I would go with ConcurrentHashMap, which was designed for speed and multithreaded environments. You can safely update a ConcurrentHashMap from a separate thread without worry.
If you apply this technique, it is still a viable solution for DoS prevention while supporting a per-client maxTransmissionRate.
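Steps 1) to 4) might look roughly like the sketch below; the database access is stubbed out with a placeholder method:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MaxRateCache {
    private static final int DEFAULT_MAX_TPS = 10; // fallback when an IP is not in the table

    private final Map<String, Integer> maxTpsByIp = new ConcurrentHashMap<>();
    private final ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();

    // Call this from the filter's init(): load once, then refresh every minute.
    public void start() {
        reload();
        refresher.scheduleAtFixedRate(this::reload, 1, 1, TimeUnit.MINUTES);
    }

    public int maxTpsFor(String ip) {
        return maxTpsByIp.getOrDefault(ip, DEFAULT_MAX_TPS);
    }

    private void reload() {
        Map<String, Integer> fromDb = loadRatesFromDatabase();
        maxTpsByIp.putAll(fromDb);
        maxTpsByIp.keySet().retainAll(fromDb.keySet()); // drop IPs removed from the table
    }

    private Map<String, Integer> loadRatesFromDatabase() {
        // Placeholder standing in for the real database query returning IP -> max TPS.
        return new ConcurrentHashMap<>();
    }
}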
I am building an application that reaches out to a FHIR API that implements paging and only gives me a maximum of 100 results per page. However, our app needs to aggregate these pages in order to hand the UI metadata about the entire result set.
When I loop through the pages of a large result set, I get HTTP status 429 - Too Many Requests. I am wondering whether handing these requests off to a Kafka service will help me get around this issue and maybe improve performance. I've read through the Intro and Use Cases sections of the Kafka documentation, but I am still unclear on whether this tool will help.
You're getting 429 errors because you're making too many requests too quickly; you need to implement rate limiting.
As far as whether to use Kafka, a big part of that is whether your result set can fit in memory. If it can, then I would really suggest avoiding bringing in a separate service (KISS). If not, then yes, you can use Kafka, but I'd suggest taking a long think about whether you could use a relational datastore instead, because relational stores are much more flexible. Or maybe even read/write directly to disk.
If I were you, before looking into Kafka, I would try to solve why you are getting the 429 error. I would not leave that unaddressed; I would work out how to solve it first.
I would look into the following:
1) Sleep your process. The server response usually includes a Retry-After header with the number of seconds you are supposed to wait before retrying.
2) Exponential backoff. If the server's response does not tell you how long to wait, you can retry your request, inserting pauses yourself in between.
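A rough sketch combining both ideas, honoring Retry-After when the server sends it and otherwise backing off exponentially (the attempt limit and initial pause are arbitrary, and it assumes Retry-After carries seconds rather than an HTTP date):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class BackoffFetcher {
    // Fetch with retries: honor Retry-After on 429, otherwise back off exponentially.
    public static HttpURLConnection fetchWithBackoff(String url, int maxAttempts)
            throws IOException, InterruptedException {
        long backoffMillis = 1000; // initial pause, doubled after each rate-limited attempt
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            if (conn.getResponseCode() != 429) {
                return conn; // success, or an error that is not rate limiting
            }
            String retryAfter = conn.getHeaderField("Retry-After"); // assumed to be in seconds
            long waitMillis = (retryAfter != null)
                    ? Long.parseLong(retryAfter.trim()) * 1000
                    : backoffMillis;
            conn.disconnect();
            Thread.sleep(waitMillis);
            backoffMillis *= 2;
        }
        throw new IOException("Still rate limited after " + maxAttempts + " attempts");
    }
}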
Do keep in mind that implementing sleep warrants extensive testing beforehand; you would have to make sure that your existing functionality is not impacted.
To answer whether Kafka would help you or not: it may or may not, given the limited info in your question. Do understand that introducing Kafka would change your network architecture, as you would be bringing a streaming platform into the equation. You would most probably implement caching to aggregate your results. But at the moment all these concepts are at a very holistic level. I would suggest that you first solve the 429 error and then assess whether there is a proper technical reason to bring in Kafka to improve your application's performance.