Implementing idempotency for AWS Spot Instance Requests


I'm using the Java AWS SDK to make EC2 spot instance requests. As opposed to on demand instances, the API for spot requests does not have anything similar to ClientToken and thus does not support idempotency out of the box.
The most straightforward way I could think of to do this was to set the LaunchGroup property to a unique UUID; to check for an existing request, I call DescribeSpotInstanceRequests and see whether I already have a request with the same launch group.
To my surprise, it seems that there's a delay before the describe call returns the spot requests sent before. I wrote a JUnit test for this and it seems that in order for it to be consistent I would have to set a timeout of at least 60s between the two calls (request spot instance and describe spot instance requests). I need to have a granularity of 10s, because my requests can get repeated by the application at this interval, in case of any failure - i.e. something breaks after I sent the request but before I could read the result I got back from Amazon. In that case I don't want to have the request repeated, I just want to see that it got registered and move on.
@Test
public void testRunSpotInstances() throws Exception {
    // First execution submits the spot request.
    activity.execute(execution);
    timeout(TIMEOUT);
    // Second execution shouldn't do anything because the request already exists.
    activity.execute(execution);
    timeout(TIMEOUT);
    // Exactly one request should be registered under this launch group.
    DescribeSpotInstanceRequestsResult result = client.describeSpotInstanceRequests(
            new DescribeSpotInstanceRequestsRequest().withFilters(new Filter()
                    .withName("launch-group").withValues(BUSINESS_KEY)));
    assertThat(result.getSpotInstanceRequests()).hasSize(1);
    timeout(TIMEOUT);
}
The test works every time if TIMEOUT is set to 60s; for 40-50s it works intermittently. Anything below this fails every time.
Has anyone managed to work around this delay? Is implementing idempotency for spot requests possible using just the AWS API and not having state saved in the client application?

"In that case I don't want to have the request repeated, I just want to see that it got registered and move on."
If you got a 200 back, then it's registered. It may not show up right away, but it's registered and you can move on in your flow.
"Is implementing idempotency for spot requests possible using just the AWS API and not having state saved in the client application?"
I don't believe so. I have the same sort of issue with Amazon's EMR. The way that I work around it is to have a component whose job it is to observe clusters. When I make a request for an EMR cluster, I get back a cluster id, which I then pass off to some observer. The observer will then call my other components when that cluster changes state. Not being acknowledged by EMR right away is a valid case and is not treated like an exception.
I have no idea if that's appropriate for you. Perhaps you could try maintaining the SpotInstanceRequestId. In my case, I only keep them in memory, but you could keep them somewhere persistent if need be.
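A minimal sketch of the "maintain the SpotInstanceRequestId" idea, assuming the business key is known before the request is sent. SpotRequestRegistry and requestOnce are hypothetical names, and the Supplier stands in for the real RequestSpotInstances call; the map is in-memory only, so it would need a persistent store to survive a restart, and it does not cover a crash between sending the request and recording the id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Remembers which business keys already have a spot request id (hypothetical helper). */
class SpotRequestRegistry {
    // businessKey -> SpotInstanceRequestId; swap for a persistent store if needed.
    private final Map<String, String> requestIdsByKey = new ConcurrentHashMap<>();

    /**
     * Returns the known request id for the key, or invokes submitRequest once
     * and records the id it returns. submitRequest is an assumption standing in
     * for the real SDK call.
     */
    String requestOnce(String businessKey, Supplier<String> submitRequest) {
        return requestIdsByKey.computeIfAbsent(businessKey, key -> submitRequest.get());
    }
}
```

A repeated execution with the same business key then gets the stored id back instead of sending a second request, without having to wait for DescribeSpotInstanceRequests to catch up.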

Related

Request atomicity within microservices bounded context

Our project consists of multiple microservices. These microservices form a boundary to which the entry point is not strictly defined meaning each of microservices can be requested and can request other services.
The situation we need to handle in this bounded microservice context is following:
client (other application) makes the request to perform some logic and change the data (PATCH),
request times out,
while request is being processed client fires the same request to repeat the operation,
operation successfully completes,
the second request is processed the same way, completes within its time limit, and the client gets a response.
The result is that the same request was processed twice because of the first timeout.
We need to make sure the same request is not processed twice and that the application responds to the repeat with the original response and status code.
The subsequent request is identified by the same uuid.
Now, I understand that it's the client that should issue requests more carefully, or that we should have a single request entry point into our microservices bounded context, but in enterprise projects the team doesn't own the whole system, so we are somewhat constrained in the solutions we can propose. With this in mind, and trying not to reinvent the wheel, this is what comes to mind:
The microservices should utilize some kind of session sharing (spring-session?) with the ability to look up a request by its id before it gets processed. In the described case, when the first request is still being processed and the second arrives, the service would wait for the first to complete and respond to the second with the data of the first, which has already timed out for the client.
What I am struggling with is imagining handling the asynchronicity of replying to the second one and how to listen for session state of the first request.
If spring-session would be used (for example with hazelcast) I'm lacking some kind of concrete session state handler which would get fired when request ends. Is there something like this to listen for?
No code written yet. It's an architectural thought experiment that I want to discuss.
If unsure of understanding, read second time please, then I'm happy to expand.
EDIT: first idea:
process would be as follows (with numbering on the image):
(1) first request fired
(3) processing started; (2) request timed out meanwhile;
(4) client repeats the same request; program knows it has received the same request before because it knows the req. id.
program checks the cache and the state of that request id 'pending' so it WAITS (async).
computed result of first request is saved into the cache - orange square
(5) program responds to the repeated request with the data that was meant for the first one
idea is that result checking and responding to the repeated request would be done in the filter chain so it won't actually hit the controller when the second request is asynchronously waiting for the operation triggered by the first request to be done (I see hazelcast has some events when rows are added/updated/evicted from the cache - dunno if it's working yet) and when complete just respond (somehow write to the HttpServletResponse). result would be saved into the cache in postHandling filter.
Thanks for insights.

I'd consider this more of a caching paradigm. Stick your requests/responses into an external cache provider (Redis or similar), indexed by uuid. Having a TTL will allow your responses to automatically get cleaned up for requests that are never coming back, and the high-speed O(1) lookup should allow this to scale nicely. It will also give you an asynchronous model out of the box (not a stated goal, but always a nice option).
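As a rough illustration of the cache-by-uuid idea, here is a sketch that uses an in-memory map as a stand-in for Redis. IdempotentResponseCache and handle are hypothetical names, and responses are plain strings for brevity. The first request with a given id runs the work; a repeat arriving while it is pending, or within the TTL, gets the same future and never triggers the work again.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** In-memory stand-in for an external cache such as Redis (hypothetical class). */
class IdempotentResponseCache {
    private static final class Entry {
        final CompletableFuture<String> response = new CompletableFuture<>();
        final long expiresAtMillis;
        Entry(long expiresAtMillis) { this.expiresAtMillis = expiresAtMillis; }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    IdempotentResponseCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /**
     * Runs work for the first request with this id; repeats receive the same
     * (possibly still pending) future instead of running the work again.
     */
    CompletableFuture<String> handle(String requestId, Supplier<String> work) {
        long now = System.currentTimeMillis();
        // Drop an expired entry so the id can be processed afresh.
        entries.computeIfPresent(requestId, (id, e) -> e.expiresAtMillis <= now ? null : e);

        Entry fresh = new Entry(now + ttlMillis);
        Entry existing = entries.putIfAbsent(requestId, fresh);
        if (existing != null) {
            return existing.response; // duplicate: reuse the first request's result
        }
        try {
            fresh.response.complete(work.get());
        } catch (RuntimeException ex) {
            fresh.response.completeExceptionally(ex);
            entries.remove(requestId, fresh); // let a later retry run the work again
        }
        return fresh.response;
    }
}
```

In a servlet filter, the repeated request could attach a callback to the returned future and write the stored response when it completes. With several service instances the map would have to live in the shared cache rather than in process memory.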

How to make multiple calls of a @Transactional method run in a single transaction

I have a method
@Transactional
public void updateSharedStateByCommunity(List[] idList)
This method is called from the following REST API:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    // call updateSharedStateByCommunity
}
Now the ID lists are very large, around 200000 entries. Processing a list takes a long time, and a timeout error occurs on the client side.
So, I want to split it into two calls with a list size of 100000 each.
But the problem is that these are treated as 2 independent transactions.
NB: The 2 calls are just an example; the list could be divided into more calls if the number of ids is larger.
I need the separate calls to run in a single transaction. If any one of the calls fails, all operations should be rolled back.
Also, on the client side we need to show a progress dialog, so I can't rely on a longer timeout alone.

The most obvious direct answer to your question IMO is to slightly change the code:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    updateSharedStateByCommunityBlocks(resolveIds);
}
...
And in Service introduce a new method (if you can't change the code of the service provide an intermediate class that you'll call from controller with the following functionality):
@Transactional
public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
    // Process the ids block by block; the whole loop runs in one transaction.
    List<String>[] blocks = split(resolveIds, 100000); // 100000 - bulk size
    for (List<String> block : blocks) {
        updateSharedStateByCommunity(block);
    }
}
If this method is in the same service, the @Transactional on the original updateSharedStateByCommunity won't do anything (the call does not go through the Spring proxy), so it will work. If you put this code into some other class, it will still work, since the default propagation level of a Spring transaction is REQUIRED.
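The split helper is not shown in the answer. A minimal sketch of what it could look like follows; the class name Blocks is hypothetical, and it returns a List of lists rather than the array used in the snippet, because generic arrays are awkward to create in Java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical utility for cutting a large id list into fixed-size blocks. */
final class Blocks {
    private Blocks() {}

    /** Splits ids into consecutive sublists of at most blockSize elements. */
    static <T> List<List<T>> split(List<T> ids, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<T>> blocks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += blockSize) {
            int end = Math.min(start + blockSize, ids.size());
            // Copy the view so each block is independent of the source list.
            blocks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return blocks;
    }
}
```

With 200000 ids and a block size of 100000 this yields two blocks; the last block is simply shorter when the sizes do not divide evenly.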
So it addresses the strict requirements: you wanted a single transaction, and you've got it. All the code now runs in the same transaction, each call handles 100000 ids rather than all of them, and everything is synchronous :)
However, this design is problematic for many different reasons.
It doesn't allow tracking the progress (to show it to the user), as you've stated yourself in the last sentence of the question. The REST call is synchronous.
It assumes that the network is reliable and that waiting for 30 minutes is technically not a problem (leaving aside the UX and the 'nervous' user who will have to wait :) )
In addition to that, network equipment can force the connection closed (for example, load balancers with a pre-configured request timeout).
That's why people suggest some kind of asynchronous flow.
You can still use the async flow: spawn the task and, after each bulk, update some shared state (in-memory in the case of a single instance, or persistent, like a database, in the case of a cluster).
So that the interaction with the client will change:
Client calls "updateUser" with 200000 ids
Service responds "immediately" with something like "I've got your request, here is a request id, ping me once in a while to see what happens."
Service starts an async task and processes the data chunk by chunk in a single transaction
Client calls "get" method with that id and server reads the progress from the shared state.
Once ready, the "get" method will respond "done".
If something fails during the transaction execution, the rollback is done, and the process updates the database status with "failure".
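The interaction above could be sketched roughly as follows for the single-instance case. BulkJobTracker, Job and the method names are hypothetical; the transaction handling is only indicated in comments, and a clustered setup would keep the job state in a database instead of a map.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Tracks background jobs in memory (single instance only; hypothetical class). */
class BulkJobTracker {
    enum Status { RUNNING, DONE, FAILED }

    static final class Job {
        final int totalBlocks;
        final AtomicInteger processedBlocks = new AtomicInteger();
        volatile Status status = Status.RUNNING;
        Job(int totalBlocks) { this.totalBlocks = totalBlocks; }
    }

    private final Map<String, Job> jobs = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Starts processing in the background and returns a job id the client can poll. */
    <T> String submit(List<List<T>> blocks, Consumer<List<T>> processBlock) {
        String jobId = UUID.randomUUID().toString();
        Job job = new Job(blocks.size());
        jobs.put(jobId, job);
        executor.execute(() -> {
            try {
                // In the real service this loop would run inside one transaction.
                for (List<T> block : blocks) {
                    processBlock.accept(block);
                    job.processedBlocks.incrementAndGet(); // progress for the client
                }
                job.status = Status.DONE;
            } catch (RuntimeException ex) {
                job.status = Status.FAILED; // the transaction would roll back here
            }
        });
        return jobId;
    }

    /** Backs the "get" endpoint: returns the job for the id, or null if unknown. */
    Job get(String jobId) { return jobs.get(jobId); }

    void shutdown() { executor.shutdown(); }
}
```

The controller would return the job id from submit right away, and the polling endpoint would report processedBlocks out of totalBlocks together with the status.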
You can also use more modern technologies to notify the client (web sockets, for example), but that's kind of out of scope for this question.
Another thing to consider here: from what I know, processing 200000 objects should take much less than 30 minutes; it's not that much for modern RDBMSs.
Of course, without knowing your use case it's hard to tell what happens there, but maybe you can optimize the flow itself (using bulk operations, reducing the number of requests to the db, caching and so forth).

My preferred approach in those scenarios is to make the call asynchronous (Spring Boot allows this using the @Async annotation), so the client won't wait for any HTTP response. The notification could be done via a WebSocket that pushes a message to the client with the progress after every X items processed.
Surely it will add more complexity to your application, but if you design the mechanism properly, you'll be able to reuse it for any other similar operation you may face in the future.

The @Transactional annotation accepts a timeout (although not all underlying implementations will support it). I would argue against trying to split the IDs into two calls, and instead try to fix the timeout (after all, what you really want is a single, all-or-nothing transaction). You can set timeouts for the whole application instead of on a per-method basis.

From a technical point of view, it can be done with the org.springframework.transaction.annotation.Propagation#NESTED propagation. The NESTED behavior makes nested Spring transactions use the same physical transaction but sets savepoints between nested invocations, so inner transactions may roll back independently of outer transactions or let the rollback propagate. The limitation is that it only works with org.springframework.jdbc.datasource.DataSourceTransactionManager.
But for a really large dataset it still needs more time for processing and keeps the client waiting, so from a solution point of view an async approach may be better, depending on your requirements.

Scalable Way to Combine Same Requests Within Certain Time Threshold

I have an application, call it Service 1, that potentially makes a lot of the same requests to another application, call it Service 2. As an example, x number of people use Service 1 and that results in x requests (which are the exact same request) to Service 2. Each response is cached in Service 1.
Currently, we have a synchronized method that checks whether or not the same request has been made within a certain time threshold. The problem we are having is that when the server is under a heavy load, that synchronized method locks up the threads and Kubernetes can't perform liveness checks, so Kubernetes restarts the service. The reason we want to prevent duplicate requests is twofold: 1) we don't want to hammer Service 2, and 2) if we are already making the request, we don't want to make it again, just wait for the result that will already be coming back.
What is the fastest, most scalable solution to not making duplicate requests without locking up and taking down the server?

FWIW, my experience with rx-java specifically is very limited, so I'm not entirely confident how applicable this is for your case. This is a solution I've used several times with Scala and I know Java itself does have analogous constructs that would allow the same approach.
A solution I have used in the past that has worked very well for me involves using Futures. It does not reduce duplication entirely, but it does remove duplication per requesting server. The approach involves using a TTL Cache in which we stored the Future object that does or will contain the result of a request we want to deduplicate on. It is stored under a key that can determine uniqueness of the request such as the different parameters that might be applicable.
So let's say you have a method that you call to fetch the response from Service 2 and returns it as a Future. As an example we'll say getPage which has one parameter, an integer, which is the page you'd like to fetch.
When a request begins and we're about to call getPage with the page number of 2, we check the cache for a key like "getPage:2". This won't contain anything for the first request, so we call getPage(2) which returns a Future[SomeResponseObject]. We set "getPage:2" in the TTL Cache to the Future object. When another request comes in that may spawn a duplicate request, the same cache check happens, however, there's a Future object already in the cache. We get this future and add a response listener to be invoked when the response is available, or in Scala, simply .map() on it.
This has a few advantages. If your request is slow or there's highly duplicative requests even in a small time frame, many requests to Service 1 are serviced by a single response from Service 2.
Secondarily, once the request to Service 2 has come back, assuming you have a window in which the response is still valid, the response is already available immediately and no request is necessary at all.
If your Service 2 request takes 50ms, and your response can be considered valid for 5 seconds, all requests happening to the same server in the first 50ms are serviced at ms 50 when the response is returned, and from that point forward for the remaining 4950 ms already have access to the response.
As I alluded to earlier, the effectiveness here is tied to how many instances of Service 1 are running. The number of duplicate requests at any time is linear in the number of servers running.
This is a mostly lock-free way to achieve this. I say mostly because some synchronization is necessary in the TTL Cache itself to make sure the request is only started once, but this has never been an issue for performance in my experience.
As an extension of this, you can potentially use something like redis to cache responses from Service 2 if it has long-ish response times, and have your getPage equivalent first check a redis cache for the serialized response (and write an expiring value if one wasn't there). This allows you to further reduce requests to Service 2 by having a more global value cached, but having a second caching layer does add some complexity and potential for issues.
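A Java counterpart of the Future-in-a-TTL-cache approach could look roughly like this, using CompletableFuture in place of the Scala Future. FutureTtlCache is a hypothetical name, and the loader stands in for the getPage call; it must only start the asynchronous call and return quickly, because it runs while the map entry is locked.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Shares one in-flight or recent future per key for a limited time (hypothetical class). */
class FutureTtlCache<V> {
    private static final class Timed<V> {
        final CompletableFuture<V> future;
        final long expiresAtNanos;
        Timed(CompletableFuture<V> future, long expiresAtNanos) {
            this.future = future;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<String, Timed<V>> cache = new ConcurrentHashMap<>();
    private final long ttlNanos;

    FutureTtlCache(long ttlMillis) { this.ttlNanos = ttlMillis * 1_000_000L; }

    /** Returns the cached future for key, or starts the call via loader and caches its future. */
    CompletableFuture<V> get(String key, Supplier<CompletableFuture<V>> loader) {
        long now = System.nanoTime();
        // compute() is atomic per key, so the loader is started at most once per window.
        Timed<V> timed = cache.compute(key, (k, current) -> {
            boolean usable = current != null
                    && now - current.expiresAtNanos < 0          // not yet expired
                    && !current.future.isCompletedExceptionally(); // don't keep failures
            return usable ? current : new Timed<V>(loader.get(), now + ttlNanos);
        });
        return timed.future;
    }
}
```

Callers attach their continuation with thenApply or thenAccept on the returned future, which corresponds to the .map() mentioned above. Expired entries are only replaced on the next lookup for the same key, so a production version would also need some eviction for keys that are never requested again.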

Akka: Is this the correct way to build REST web service with Actors?

What I am doing:
I am using play 2.5.7 (java) and trying to build a REST application.
When I get a call on my controller I ask the first actor, this actor can only solve part of the problem (getting additional data), which needs to be forwarded to another actor which uses the request data and additional data to update some more data, send an async void call (tell) to another actor and respond to the controller. All these (4) actors are @Injected into other actors or the controller with Guice.
Flow of calls:
controller --(Patterns.ask)--> actor1 --(actor.forward)--> actor2 --(actor.forward)--> actor3 (-tell-> actor4) and --(sender().tell)--> controller.
Issue:
This works for the first 4 calls. Then actor1.forward keeps failing on every subsequent request; Patterns.ask times out. A System.out on the line before actor1.forward prints, but the forward itself does not go through, no matter the timeout value (I tried even 20s). Nothing changes in the request; I just hit the send button in Postman every time.
I have two questions:
Why 4? Why does it fail after 4th request? Is it some config? What should I look for in config?
Is what I am doing with actors the correct way to build a REST web service?
Update: I found the issue; it was caused due to consumption of Redis connections through the pool and never freeing them. But the second question I had still remains, is what I am doing here advisable?

Sure, this could be a reasonable design. But I would consider though whether it would be more maintainable to work with Future returning methods, unless your workflow requires some complex protocol between multiple moving pieces or internal state. It may also be worth considering Akka Streams, if your processing doesn't map well to async method calls.
Basically, actors are a pretty low-level tool. To the extent that you need them, I would try to minimize the surface area of your application where they are being directly used. Higher-level abstractions are better, where possible.
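To make the "Future returning methods" alternative concrete, here is a sketch of the same flow written with CompletableFuture. UpdateWorkflow and its method names are hypothetical, and the string manipulation merely stands in for the real work each actor did.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical service replacing the actor chain with future-returning methods. */
class UpdateWorkflow {
    // Step formerly done by actor1: fetch additional data for the request.
    CompletableFuture<String> loadAdditionalData(String request) {
        return CompletableFuture.supplyAsync(() -> "extra(" + request + ")");
    }

    // Step formerly done by actor2 and actor3: combine request and extra data.
    CompletableFuture<String> applyUpdate(String request, String extra) {
        return CompletableFuture.supplyAsync(() -> request + "+" + extra);
    }

    // Fire-and-forget side effect, the counterpart of the tell to actor4.
    void notifyAsync(String result) {
        CompletableFuture.runAsync(() -> { /* e.g. publish an event about result */ });
    }

    /** What the controller would call instead of Patterns.ask. */
    CompletableFuture<String> handle(String request) {
        return loadAdditionalData(request)
                .thenCompose(extra -> applyUpdate(request, extra))
                .thenApply(result -> {
                    notifyAsync(result); // does not delay the response
                    return result;
                });
    }
}
```

The whole pipeline is visible in handle, with no message protocol or per-actor timeouts to reason about, which is the maintainability argument made above.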

Is there a way to tell the servlet container to spawn one instance of a resource at a time?

I have a resource, say a @POST method serving the clients. It doesn't run on any external parameters, not even the caller URL (we're leaving that to the firewall) or the user authentication.
However, we don't want to handle user requests simultaneously. When a request1 is being processed and the method hasn't just yet returned, a request2 coming in should receive a response of status 309 (or whatever status code applies) and shouldn't get served.
Is there a way of doing this without getting into anything on the server back-end side like multithreading?
I'm using Tomcat 8. The application will be deployed on JBoss, however this wouldn't affect the outcome(?) I used Jersey 1.19 for coding the resource.
This is a Q relevant to How to ignore multiple clicks from an impatient user?.
TIA.

Depending on what you want to achieve, yes, it is possible to reject additional requests while a service is "in use." I don't know if it's possible at the servlet level; servlets are designed to spin up processes for as many requests as possible so that, say, if one user requests something simple and another requests something difficult, the simple request can get handled while the difficult request is processing.
The primary reason you would probably NOT want to return an HTTP error code simply because a service is in use is that the service didn't error; it was simply in use. Imagine trying to use a restroom that someone else was using and instead of "in use" the restroom said "out of order."
Another reason to think twice about a service that rejects requests while it is processing any other request is that it will not scale. Period. You will have some users have their requests accepted and others have their requests rejected, seemingly at random, and the ratio will tilt toward more rejections the more users the service has. Think of calling into the radio station to try to be the 9th caller, getting a busy tone, and then calling back again and again until you get through. This works for trying to win free tickets to a concert, but would not work well for a business you were a customer of.
That said, here are some ways I might approach handling expensive, possibly duplicate, requests.
If you're trying to avoid multiple identical/simultaneous requests from an impatient user, you most likely have a UX problem (e.g. a web button doesn't seem to respond when clicked because of processing lag). I'd implement a loading mask or something similar to prevent multiple clicks and to communicate that the user's request has been received and is processing. Loading/processing masks have the added benefit of giving users an abstract feeling of ease and confidence that the service is indeed working as expected.
If there is some reason out of your control why multiple identical requests might get triggered coming from the same source, I'd opt for a cache that returns the processed result to all requests, but only processes the first request (and retrieves the response from the cache for all other requests).
If you really really want to return errors, implement a singleton service that remembers a cache of some number of requests, detects duplicates, and handles them appropriately.
Remember that if your use case is indeed multiple clicks from a browser, you likely want to respond to the last request sent, not the first. If a user has clicked twice, the browser will register the error response first (it will come back immediately as a response to the last click). This can further undermine the UX: a single click results in a delay, but two clicks results in an error.
But before implementing a service that returns an error, consider the following: what if two different users request the same resource at the same time? Should one really get an error response? What if the quantity of requests increases during certain times? Do you really want to return errors to what amounts to a random selection of consumers of the service?
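If rejecting concurrent requests is still what you want, the singleton idea could be sketched as a small gate shared by all requests. SingleFlightGate is a hypothetical name; the resource method would call runOrReject and map the busy value to whatever status code you settle on.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Lets one caller in at a time; others are turned away immediately (hypothetical class). */
class SingleFlightGate {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    /**
     * Runs work and returns its result if nothing else is running;
     * returns busyResponse without running work if the gate is taken.
     */
    <T> T runOrReject(Supplier<T> work, T busyResponse) {
        if (!busy.compareAndSet(false, true)) {
            return busyResponse; // another request is in progress
        }
        try {
            return work.get();
        } finally {
            busy.set(false); // always reopen the gate, even if work throws
        }
    }
}
```

The gate has to be a single shared instance (for example a static field), since the container may create a new resource object for every request. It never blocks a thread, but it only covers one JVM, so it would not help if the application ran on several nodes.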
