Request atomicity within a microservices bounded context - Java

Our project consists of multiple microservices. These microservices form a boundary whose entry point is not strictly defined, meaning each of the microservices can be requested and can request other services.
The situation we need to handle in this bounded microservice context is the following:
a client (another application) makes a request to perform some logic and change data (PATCH),
the request times out,
while the request is being processed, the client fires the same request to repeat the operation,
the operation completes successfully,
the second request is processed the same way, completes within its time, and the client gets a response.
So the same request was processed twice because of the first timeout.
We need to make sure the same request won't get processed again and that the application responds with the former response and status code.
The subsequent request is identified by the same UUID.
Now, I understand that the client should be more careful with its requests, or that we should have a single request entry point into our microservices bounded context. But in enterprise projects the team doesn't own the whole system, so we are somewhat constrained in the solutions we can propose for the problem. With this in mind, and trying not to reinvent the wheel, this comes to my mind:
The microservices should use some kind of session sharing (spring-session?) with the ability to look up a request by its ID before it gets processed. In the described case, when the first request is being processed and the second arrives, the service should wait for the completion of the first and respond to the second with the data of the first, which timed out for the client.
What I am struggling with is how to handle the asynchronicity of replying to the second request, and how to listen for the session state of the first.
If spring-session were used (for example with Hazelcast), I'm missing some kind of concrete session-state handler that would get fired when a request ends. Is there something like this to listen for?
No code has been written yet. This is an architectural thought experiment that I want to discuss.
If anything is unclear, I'm happy to expand.
EDIT: first idea:
The process would be as follows (with numbering as on the image):
(1) the first request is fired
(3) processing starts; (2) the request times out in the meantime;
(4) the client repeats the same request; the program knows it has received this request before because it knows the request ID.
The program checks the cache, sees that the state of that request ID is 'pending', so it WAITS (asynchronously).
The computed result of the first request is saved into the cache - the orange square.
(5) The program responds to the repeated request with the data that was computed for the first one.
The idea is that the result check and the reply to the repeated request would be done in the filter chain, so the second request won't actually hit the controller while it asynchronously waits for the operation triggered by the first request to finish (I see Hazelcast has events fired when entries are added/updated/evicted from the cache - I don't know yet whether that works). When complete, the filter would just respond (somehow write to the HttpServletResponse). The result would be saved into the cache in a post-handling filter.
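To make the waiting part concrete, the pending-request idea could be sketched like this. This is an illustrative single-JVM stand-in; the IdempotentExecutor name and the ConcurrentHashMap are my assumptions - in the real setup the map would be a shared structure (e.g. a Hazelcast map) visible to all service instances:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: deduplicate requests by id. The first caller starts the
// operation; a repeated caller with the same id gets the SAME future and waits
// on it instead of re-running the operation.
public class IdempotentExecutor {
    private final ConcurrentMap<String, CompletableFuture<String>> inFlight =
            new ConcurrentHashMap<>();

    public CompletableFuture<String> execute(String requestId, Supplier<String> operation) {
        // computeIfAbsent guarantees only the first arrival triggers the work
        return inFlight.computeIfAbsent(requestId,
                id -> CompletableFuture.supplyAsync(operation));
    }
}
```

A repeated PATCH with the same UUID would then simply `join()` the future the first request created, getting the former response instead of re-executing the logic.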
Thanks for insights.

I'd consider this more of a caching paradigm. Stick your requests/responses into an external cache provider (Redis or similar), indexed by UUID. A TTL will let responses be cleaned up automatically for requests that are never coming back, and the high-speed O(1) lookups should allow this to scale nicely. It will also give you an asynchronous model out of the box (not a stated goal, but always a nice option).
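As a rough illustration of the shape of that cache (not Redis itself - the ResponseCache class and its lazy TTL handling are hypothetical stand-ins for the external provider):

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the external cache: responses stored by
// request UUID with a TTL, so retries find the stored response and entries for
// clients that never retry expire on their own.
public class ResponseCache {
    private static final class Entry {
        final String response;
        final long expiresAtMillis;
        Entry(String response, long expiresAtMillis) {
            this.response = response;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public ResponseCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(String uuid, String response) {
        cache.put(uuid, new Entry(response, System.currentTimeMillis() + ttlMillis));
    }

    public Optional<String> get(String uuid) {
        Entry e = cache.get(uuid);
        if (e == null || System.currentTimeMillis() > e.expiresAtMillis) {
            cache.remove(uuid); // expired entries are evicted lazily on access
            return Optional.empty();
        }
        return Optional.of(e.response);
    }
}
```

With Redis the TTL would be handled server-side via an expiring SET, and the value would be the serialized response plus status code.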

Related

Waiting for an HTTP request in the middle of the main thread

I have a queue and a consumer written in Java for this queue. After consuming an item, we execute an HTTP call to a downstream partner; this is a one-way asynchronous call. After we execute this request, the downstream partner sends an HTTP request back to our system with the response to the initial asynchronous call. This response is needed by the same thread that executed the initial asynchronous call. This means we need to expose an endpoint the downstream system can call to send the response back. I would like to know how I can implement a requirement like this.
PS: We could also receive the same response through a different web service and update a database row with it. But I'm not sure how to stop the main thread and watch that database row until the response arrives.
I hope this makes the requirement clear.
My response is based on some assumptions. (I didn't wait for you to respond to my comment, since I found the problem had some other interesting features anyhow.)
the downstream partner will send an HTTP request back to our system
This necessitates that you have a listening port (i.e., a server) running on your side. This server could be in the same JVM or a different one. But...
This response is needed for the same thread
This is a little confusing, because at a high level it is usually not the thread we want to reuse programmatically but the object (no matter in which thread). To reuse threads, you may consider using an ExecutorService. What you may try to do, I have tried to depict in this diagram.
Here are the steps:
"Queue Item Consumer" consumes item from the queue and sends the request to the downstream system.
This instance of the "Queue Item Consumer" is cached for handling the request from the downstream system.
There is a listener running at some port within the same JVM to which the downstream system sends its request.
The listener forwards this request to the "right" cached instance of the "Queue Item Consumer" (you have to figure out a way to do this based on your caching mechanism). Maybe some header has to be present in the request from the downstream system to identify the right handler on this side.
Hope this works for you.
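The correlation step above could be sketched as follows, assuming an in-JVM listener. The CallbackCorrelator class and its method names are invented for illustration; the correlation id would come from the header mentioned above:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: the queue item consumer registers a future under a
// correlation id before sending the downstream request; the HTTP listener on
// our side completes that future when the downstream system calls back with
// the matching id. The consumer thread can then block on the future (or chain
// continuations on it) instead of polling a database row.
public class CallbackCorrelator {
    private final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Called by the queue item consumer before it fires the downstream request.
    public CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    // Called by the listener endpoint when the downstream response arrives.
    public boolean complete(String correlationId, String response) {
        CompletableFuture<String> future = pending.remove(correlationId);
        if (future == null) {
            return false; // unknown or already-handled id
        }
        return future.complete(response);
    }
}
```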

Scalable Way to Combine Same Requests Within Certain Time Threshold

I have an application, call it Service 1, that potentially makes a lot of the same requests to another application, call it Service 2. As an example, x number of people use Service 1 and that results in x requests (which are the exact same request) to Service 2. Each response is cached in Service 1.
Currently, we have a synchronized method that checks whether the same request has been made within a certain time threshold. The problem we are having is that when the server is under heavy load, that synchronized method locks up the threads, Kubernetes can't perform liveness checks, and so Kubernetes restarts the service. The reason we want to prevent duplicate requests is twofold: 1) we don't want to hammer Service 2, and 2) if we are already making the request, we don't want to make it again; we just want to wait for the result that is already coming back.
What is the fastest, most scalable solution to not making duplicate requests without locking up and taking down the server?
FWIW, my experience with RxJava specifically is very limited, so I'm not entirely confident how applicable this is to your case. This is a solution I've used several times with Scala, and I know Java itself has analogous constructs that allow the same approach.
A solution I have used in the past that has worked very well for me involves Futures. It does not eliminate duplication entirely, but it does remove duplication per requesting server. The approach uses a TTL cache in which we store the Future object that does or will contain the result of a request we want to deduplicate. It is stored under a key that determines the uniqueness of the request, such as the parameters that apply to it.
So let's say you have a method that you call to fetch the response from Service 2 and returns it as a Future. As an example we'll say getPage which has one parameter, an integer, which is the page you'd like to fetch.
When a request begins and we're about to call getPage with the page number 2, we check the cache for a key like "getPage:2". This won't contain anything for the first request, so we call getPage(2), which returns a Future[SomeResponseObject], and we set "getPage:2" in the TTL cache to that Future object. When another request comes in that would spawn a duplicate request, the same cache check happens; this time, however, there's already a Future object in the cache. We get this Future and add a response listener to be invoked when the response is available, or in Scala, simply .map() on it.
This has a few advantages. If your request is slow or requests are highly duplicative even within a small time frame, many requests to Service 1 are serviced by a single response from Service 2.
Secondarily, once the request to Service 2 has come back, assuming you have a window in which the response is still valid, the response is already available immediately and no request is necessary at all.
If your Service 2 request takes 50ms, and your response can be considered valid for 5 seconds, all requests happening to the same server in the first 50ms are serviced at ms 50 when the response is returned, and from that point forward for the remaining 4950 ms already have access to the response.
As I alluded to earlier, the effectiveness here is tied to how many instances of Service 1 are running: the number of duplicate requests at any time is linear in the number of servers running.
This is a mostly lock-free way to achieve this. I say mostly because some synchronization is necessary in the TTL cache itself to make sure the request is only started once, but this has never been a performance issue in my experience.
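A rough Java sketch of that Future-per-key TTL cache, using CompletableFuture in place of Scala Futures. DedupingClient, the injected fetch supplier, and the lazy expiry-on-access are simplifications of mine, not a production design:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch of the Future-per-key deduplication described above. The Future is
// cached the moment the request STARTS, so concurrent duplicates share the
// in-flight request, and once it completes the cached (already-resolved)
// Future serves responses until the TTL window closes.
public class DedupingClient {
    private static final class Cached {
        final CompletableFuture<String> future;
        final long expiresAtMillis;
        Cached(CompletableFuture<String> future, long expiresAtMillis) {
            this.future = future;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<String, Cached> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public DedupingClient(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public CompletableFuture<String> getPage(int page, Supplier<String> fetch) {
        String key = "getPage:" + page;
        // compute gives us the per-key synchronization mentioned above:
        // only one caller starts the fetch, the rest reuse the Future.
        return cache.compute(key, (k, cached) -> {
            if (cached != null && System.currentTimeMillis() < cached.expiresAtMillis) {
                return cached; // in-flight or still-valid response: reuse it
            }
            return new Cached(CompletableFuture.supplyAsync(fetch),
                    System.currentTimeMillis() + ttlMillis);
        }).future;
    }
}
```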
As an extension of this, you can potentially use something like Redis to cache responses from Service 2 if it has longish response times, and have your getPage equivalent first check a Redis cache for the serialized response (and write an expiring value if one wasn't there). This lets you reduce requests to Service 2 further by having a more globally shared cached value, but a second caching layer does add complexity and potential for issues.

Is there a way to tell the servlet container to spawn one instance of a resource at a time?

I have a resource, say a #POST method, serving clients. It doesn't depend on any external parameters, not even the caller URL (we're leaving that to the firewall) or user authentication.
However, we don't want to handle user requests simultaneously. When request1 is being processed and the method hasn't returned yet, an incoming request2 should receive a response with status 309 (or whatever status code applies) and shouldn't get served.
Is there a way of doing this without getting into anything on the server back-end side like multithreading?
I'm using Tomcat 8. The application will be deployed on JBoss; however, this shouldn't affect the outcome(?). I used Jersey 1.19 to code the resource.
This question is related to How to ignore multiple clicks from an impatient user?.
TIA.
Depending on what you want to achieve, yes, it is possible to reject additional requests while a service is "in use." I don't know if it's possible at the servlet level; servlet containers are designed to handle as many requests concurrently as possible so that, say, if one user requests something simple and another requests something difficult, the simple request can be handled while the difficult request is still processing.
The primary reason you would probably NOT want to return an HTTP error code simply because a service is in use is that the service didn't error; it was simply in use. Imagine trying to use a restroom that someone else was using and instead of "in use" the restroom said "out of order."
Another reason to think twice about a service that rejects requests while it is processing any other request is that it will not scale. Period. You will have some users have their requests accepted and others have their requests rejected, seemingly at random, and the ratio will tilt toward more rejections the more users the service has. Think of calling into the radio station to try to be the 9th caller, getting a busy tone, and then calling back again and again until you get through. This works for trying to win free tickets to a concert, but would not work well for a business you were a customer of.
That said, here are some ways I might approach handling expensive, possibly duplicate, requests.
If you're trying to avoid multiple identical/simultaneous requests from an impatient user, you most likely have a UX problem (e.g. a web button doesn't seem to respond when clicked because of processing lag). I'd implement a loading mask or something similar to prevent multiple clicks and to communicate that the user's request has been received and is processing. Loading/processing masks have the added benefit of giving users an abstract feeling of ease and confidence that the service is indeed working as expected.
If there is some reason out of your control why multiple identical requests might get triggered coming from the same source, I'd opt for a cache that returns the processed result to all requests, but only processes the first request (and retrieves the response from the cache for all other requests).
If you really really want to return errors, implement a singleton service that remembers a cache of some number of requests, detects duplicates, and handles them appropriately.
Remember that if your use case is indeed multiple clicks from a browser, you likely want to respond to the last request sent, not the first. If a user has clicked twice, the browser will register the error response first (it will come back immediately as a response to the last click). This can further undermine the UX: a single click results in a delay, but two clicks results in an error.
But before implementing a service that returns an error, consider the following: what if two different users request the same resource at the same time? Should one really get an error response? What if the quantity of requests increases at certain times? Do you really want to return errors to what amounts to a random selection of consumers of the service?
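If, after those caveats, the reject-while-busy behavior is still what's wanted, here is a framework-agnostic sketch of the gate. BusyGate is a hypothetical name, and in an actual Jersey resource the int codes would become Response statuses (409 Conflict is my substitution for the "309" in the question):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the "singleton gate" idea: a single atomic busy flag guards the
// expensive operation. The first caller flips the flag and runs; concurrent
// callers are turned away with a conflict status instead of queueing.
public class BusyGate {
    public static final int OK = 200;
    public static final int CONFLICT = 409; // "already being processed"

    private final AtomicBoolean busy = new AtomicBoolean(false);

    public int handle(Runnable work) {
        // compareAndSet makes the check-and-acquire a single atomic step,
        // avoiding the synchronized-method bottleneck.
        if (!busy.compareAndSet(false, true)) {
            return CONFLICT; // another request is in flight
        }
        try {
            work.run();
            return OK;
        } finally {
            busy.set(false); // always release, even if work throws
        }
    }
}
```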

Session Oriented Asynchronous Architecture using Actors/AKKA

The application we are building has a very simple concept: it receives incoming events from a database, and for each event it opens an interactive session with the clients (named in the event) by showing a menu. Based on the client's response, we move to the next state or take some concrete action (e.g., transferring funds).
Sessions are independent of one another. For example, suppose we get two events from the database saying clients A and B have reached a zero account balance. In response to these events, we establish two connections to A and B and show a menu that looks like the following:
Please select an option:
1. Get $5
2. Get $10
3. Ignore
For options 1 and 2, we ask for confirmation in the form of second menu.
Are you sure?
1. yes
2. no
In this case, we'll have two sessions. Client A might choose option 1 (1. Get $5), whereas Client B chooses option 3 [in the first menu]. In the case of Client A, we'll present the second menu (above), and if the response is 1. yes, we'll take some concrete action such as transferring funds and closing the session.
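The per-session flow described above can be modeled as a small state machine. This is only an illustrative sketch: ClientSession and its states are my naming, and transferFunds is a placeholder for the concrete action:

```java
import java.util.Objects;

// Illustrative state machine for one client session: MENU -> CONFIRM -> DONE,
// with option 3 ("Ignore") or a "no" confirmation closing the session.
public class ClientSession {
    public enum State { MENU, CONFIRM, DONE }

    private State state = State.MENU;
    private int chosenAmount; // 5 or 10 once selected

    public State getState() { return state; }
    public int getChosenAmount() { return chosenAmount; }

    /** Feed one client response ("1", "2" or "3") into the session. */
    public void onResponse(String option) {
        Objects.requireNonNull(option);
        switch (state) {
            case MENU:
                if (option.equals("1")) { chosenAmount = 5; state = State.CONFIRM; }
                else if (option.equals("2")) { chosenAmount = 10; state = State.CONFIRM; }
                else { state = State.DONE; } // 3. Ignore
                break;
            case CONFIRM:
                if (option.equals("1")) { transferFunds(chosenAmount); } // 1. yes
                state = State.DONE; // yes or no, the session ends either way
                break;
            case DONE:
                break; // session already closed; ignore stray responses
        }
    }

    private void transferFunds(int amount) {
        // placeholder for the concrete action (e.g., calling the payment system)
    }
}
```

Each actor (or SEDA stage) would own one such session object and feed it the correlated responses from the 3rd-party system.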
All client communication is done by a 3rd-party system which takes JSON (including the client address and menu text) and returns the response back to us. It takes care of actually maintaining the session on the wire, whereas we only need to do response correlation and manage session states.
We're expected to handle 50,000 such sessions in parallel.
Earlier, we designed the system in Java using the SEDA model. Having heard of actors, we are willing to check them out and write a quick PoC project (Java/Akka). My questions are:
Has anyone had experience building this kind of application? Are 50,000 simultaneous sessions too much for Akka to handle? (Note: we are only waiting for the response. When the response comes, we jump to the next stage based on the answer, so it should be possible.)
Which architectural style/paradigm would best suit this problem in Akka? Are there any frameworks out there for this kind of problem?
This is actually a reasonably easy use case for Akka's clustering. 50K sessions, represented as one actor instance each, is not a very high load. The reason to use clustering is only fault tolerance.
The idea behind the architecture would be to have a web tier for handling RESTful requests that correspond to the sessions. These requests would be sent to the Akka cluster and routed to the appropriate session Actor by session ID, or a new one would be created. When a session is done, you stop the actor that is associated with it.
Note that the session actors should send themselves timeout messages via the scheduler. Upon completing the handling of a new message, the actor should schedule itself a message via the ActorSystem scheduler for 15 minutes later (or whatever your timeout is). When a new session message is received, that scheduled task should be cancelled, the new update handled, and a new timeout scheduled. There is a plausible race condition here, in that a timeout message may sit in your session actor's mailbox behind a session message; but if your timeout message includes the time at which it was scheduled, you can check that, ignore the stale timeout, and reschedule another (just as a safety mechanism to avoid a memory leak). If the scheduled time really is 15 or more minutes ago, you stop the actor.
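The stale-timeout check can be isolated from Akka entirely; here is a plain-Java sketch of just that logic (SessionTimeout and its method names are illustrative, not an Akka API):

```java
// Sketch of the stale-timeout check described above: each scheduled timeout
// message carries the timestamp at which it was armed. When it fires, the
// session compares that against the actual last-activity time; an outdated
// timeout is ignored because a newer message arrived in between.
public class SessionTimeout {
    private long lastActivityMillis;
    private final long timeoutMillis;

    public SessionTimeout(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = nowMillis;
    }

    public void onSessionMessage(long nowMillis) {
        lastActivityMillis = nowMillis; // a real actor would also reschedule here
    }

    /** Returns true if the session should be stopped when a timeout message
     *  armed at armedAtMillis is delivered at nowMillis. */
    public boolean shouldStop(long armedAtMillis, long nowMillis) {
        if (armedAtMillis < lastActivityMillis) {
            return false; // stale timeout: activity happened after it was armed
        }
        return nowMillis - lastActivityMillis >= timeoutMillis;
    }
}
```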
To see how the distribution of work to the session actors would be implemented, please see the "Distributed Workers with Akka and Java" template in Typesafe's Activator. You will have a fully running clustered Akka application that you can tailor to do the session management as I've described above. You can then export the project and work on it in Eclipse/IntelliJ/Sublime/TextMate/etc. To download Activator, see here.

Implementing idempotency for AWS Spot Instance Requests

I'm using the Java AWS SDK to make EC2 spot instance requests. As opposed to on-demand instances, the API for spot requests does not have anything similar to a ClientToken and thus does not support idempotency out of the box.
The most straightforward approach I could think of was to set the LaunchGroup property to a unique UUID; to check for it, I call DescribeSpotInstanceRequests and see whether I already have a request with the same launch group.
To my surprise, there seems to be a delay before the describe call returns previously sent spot requests. I wrote a JUnit test for this, and it seems that for it to pass consistently I would have to put a timeout of at least 60s between the two calls (request spot instance and describe spot instance requests). I need a granularity of 10s, because my requests can be repeated by the application at this interval in case of any failure - i.e., something breaks after I sent the request but before I could read the result I got back from Amazon. In that case I don't want the request repeated; I just want to see that it got registered and move on.
@Test
public void testRunSpotInstances() throws Exception {
    activity.execute(execution);
    timeout(TIMEOUT);

    // shouldn't do anything
    activity.execute(execution);
    timeout(TIMEOUT);

    DescribeSpotInstanceRequestsResult result = client.describeSpotInstanceRequests(
            new DescribeSpotInstanceRequestsRequest().withFilters(new Filter()
                    .withName("launch-group").withValues(BUSINESS_KEY)));
    assertThat(result.getSpotInstanceRequests()).hasSize(1);
    timeout(TIMEOUT);
}
The test passes every time if TIMEOUT is set to 60s; with 40-50s it passes intermittently. Anything below that fails every time.
Has anyone managed to work around this delay? Is implementing idempotency for spot requests possible using just the AWS API and not having state saved in the client application?
In that case I don't want to have the request repeated, I just want to see that it got registered and move on.
If you got a 200 back, then it's registered. It may not show up right away, but it's registered and you can move on in your flow.
Is implementing idempotency for spot requests possible using just the AWS API and not having state saved in the client application?
I don't believe so. I have the same sort of issue with Amazon's EMR. The way I work around it is to have a component whose job it is to observe clusters. When I make a request for an EMR cluster, I get back a cluster ID, which I then pass off to an observer. The observer then calls my other components when that cluster changes state. Not being acknowledged by EMR right away is a valid case and is not treated as an exception.
I have no idea if that's appropriate for you. Perhaps you could try maintaining the SpotInstanceRequestId. In my case, I only keep them in memory, but you could keep them somewhere persistent if need be.
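For keeping the SpotInstanceRequestId around, a minimal sketch of a client-side registry might look like this. SpotRequestRegistry is a hypothetical name, and the in-memory map could be swapped for a persistent store if the IDs need to survive restarts:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: the business key the application retries on is mapped
// to the id returned by the first successful requestSpotInstances call, so a
// retry within the eventual-consistency window can detect the duplicate and
// skip the second AWS call.
public class SpotRequestRegistry {
    private final ConcurrentMap<String, String> requestIdsByBusinessKey =
            new ConcurrentHashMap<>();

    /** Record the id returned by the first request; returns false if the
     *  business key was already registered (i.e. this is a duplicate). */
    public boolean register(String businessKey, String spotInstanceRequestId) {
        // putIfAbsent is atomic, so concurrent retries race safely
        return requestIdsByBusinessKey
                .putIfAbsent(businessKey, spotInstanceRequestId) == null;
    }

    public Optional<String> lookup(String businessKey) {
        return Optional.ofNullable(requestIdsByBusinessKey.get(businessKey));
    }
}
```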
