Identifying duplicate requests fired from back end - java

I have a use case where I need to detect duplicate requests fired through REST API calls from the back end. Each request writes to the database, so a duplicate request must not be processed again.
The duplicate requests may arrive on different threads within the same VM, or on different VMs altogether. The problem is: how do I identify these duplicate requests?
Approaches that I can think of :
Before processing each incoming request, check in the database whether the outcome the request would produce is already there. If it is, ignore the request; otherwise process it.
Store every processed request in serialized form in the database, keyed by something like a hash of the request. Then, for each incoming request, check whether the database already contains it. If it does, ignore it; otherwise process it.
Both approaches require a database read per request, though. Can I do better?

I don't think you can avoid DB operations in this case.
Your first approach is a very project-specific one.
The second approach also cannot be applied universally, because there may be cases where users legitimately send several identical requests and all of them have to be processed.
A more general approach is for the server to issue tokens, which the client then passes along with every request. While processing a request, the server checks whether the token passed with it has already been used. If not, it marks the token as used in the DB and processes the request; otherwise it ignores the request or returns an error.
A client can obtain such a token by calling a dedicated server method (for that particular call there is no token to check), or the server can optionally send a new token each time it responds to a query.
You should also clean up outdated tokens once in a while to avoid polluting the database and, if you generate tokens randomly, to avoid collisions when generating new ones (see the birthday paradox).
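A minimal sketch of the check-and-mark step over JDBC; the used_tokens table, its columns, and the TokenRegistry class are invented for illustration, and the snippet assumes the JDBC driver maps unique-key violations to SQLIntegrityConstraintViolationException. Relying on a unique constraint makes the claim atomic even when duplicates arrive on different threads or on different VMs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import javax.sql.DataSource;

public class TokenRegistry {

    private final DataSource dataSource;

    public TokenRegistry(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /**
     * Marks the token as used. Returns true if this call claimed the token,
     * false if another request (possibly on another VM) already used it.
     * The used_tokens table is assumed to have a PRIMARY KEY / UNIQUE constraint on token.
     */
    public boolean claim(String token) throws SQLException {
        String sql = "INSERT INTO used_tokens (token, used_at) VALUES (?, CURRENT_TIMESTAMP)";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, token);
            ps.executeUpdate();
            return true;                      // first writer wins
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false;                     // token already used -> duplicate request
        }
    }
}
```

The request handler would call claim(token) before doing any real work and skip processing (or return an error) when it gets false; a scheduled job can delete rows older than the token lifetime to keep the table small.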

The "double submit" is a common problem with web development. With standard forms a common idiom is submit-redirect-get which avoids a lot of problems.
I assume you're using javascript to fire requests to a REST backend? A simple approach to prevent one user from duplicating a request is to use javascript to disable the button for a small period of time after it's clicked.
However if you have to prevent this for multiple users, it is highly dependent on your model and other project details.

Related

Persisting related data after client has completed some web service calls in a chain

Our application has some configs that the user sets, and we need a backup of that data so it can be restored later.
The configs are lists of different objects. I have created a web service for each list of objects, and the application calls them in a chain: after getting a success response from one service, it calls the next one.
Now, here is the problem...
I need to store each service's data somewhere, and once the last service call finishes on the front end, build the final object from the data received from the client and persist it in the database (here MongoDB).
What is the best way to implement this strategy? Note that I don't want to persist each list of objects per service; I need to persist the whole object at once.
Is there a way to store the body of a request somewhere until the other services have been called?
What is the best approach for that?
I will appreciate any clue or solution that helps!
BEST WAY:
Store all the objects on the client side and send only one request to the server.
This reduces resource usage on the server side.
ALTERNATIVE:
If you really want to handle it with several requests (which I do not recommend), one strategy is: store the objects of each request under an identifier tied to that session (the best candidate is the JSESSIONID) in a temporary_objects_table, and after the final request move them into the main tables.
If any service in the chain fails for that session, remove the records with that session id from temporary_objects_table.
This has considerably more complexity than the first approach.
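A rough sketch of the alternative, assuming the session id is available to each service; the in-memory map and the ConfigStagingStore class stand in for temporary_objects_table and only work on a single node, so a deployment with several instances would need a real table or collection instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ConfigStagingStore {

    // session id -> partial config objects collected so far
    // (stands in for temporary_objects_table; single-node only)
    private final Map<String, List<Object>> staged = new ConcurrentHashMap<>();

    /** Called by each intermediate service with the caller's JSESSIONID. */
    public void stage(String sessionId, Object partialConfig) {
        staged.computeIfAbsent(sessionId, id -> new CopyOnWriteArrayList<>())
              .add(partialConfig);
    }

    /** Called by the final service: take everything staged for this session and clear it. */
    public List<Object> collectAndClear(String sessionId) {
        return staged.remove(sessionId);
    }

    /** Called when any service in the chain fails for this session. */
    public void discard(String sessionId) {
        staged.remove(sessionId);
    }
}
```

The final service would call collectAndClear, assemble the complete object, and persist it to MongoDB in one write.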
After some research I found my answer:
REST and transaction rollbacks
and
https://stackoverflow.com/a/1390393/607033
You cannot use transactions, because in REST the client maintains the client state and the server maintains the resource state. If you want the resource state to be maintained by the client, then it is not REST, because that would violate the stateless constraint. Violating the stateless constraint usually hurts scalability; in this case it hurts horizontal scalability, because you would have to sync ongoing transactions between instances. So please don't try to build multi-phase commits on top of REST services.
Possible solutions:
You can stick with immediate consistency and use only a single web service instead of two. With resources like a database, filesystem, etc., multi-phase commit is a necessity. When you break a bigger REST service up and move the usage of those resources into multiple smaller REST services, problems can occur if you do the split wrongly: one of the REST services will require a resource it does not have access to, so it has to go through another REST service to reach that resource. This forces the multi-phase commit code up an abstraction level, to the level of the REST services. You can fix it by merging the two REST services and moving the code back down to the abstraction level where it belongs.
Another workaround is to use REST with eventual consistency: you respond with 202 Accepted immediately and process the accepted request later. If you choose this solution, you must keep in mind while developing your application that the REST services are not always in sync. Of course, this approach only works for internal REST services, where you can be sure the client retries if a REST service is unavailable, i.e. where you write and run the client code yourself.
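As an illustration of the 202 route, a minimal Spring sketch; the /config-backups path, the BackupWriter interface, and the payload shape are assumptions, not part of the original question. The request is acknowledged immediately and the actual persistence happens later on a worker thread.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ConfigBackupController {

    /** Placeholder for whatever actually persists the assembled object (e.g. a MongoDB repository). */
    public interface BackupWriter {
        void persist(Map<String, Object> payload);
    }

    // a single worker keeps the sketch simple; a real service would use a managed pool or a message queue
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final BackupWriter writer;

    public ConfigBackupController(BackupWriter writer) {
        this.writer = writer;
    }

    @PostMapping("/config-backups")
    public ResponseEntity<Void> accept(@RequestBody Map<String, Object> payload) {
        worker.submit(() -> writer.persist(payload));  // processed later, possibly after the response is sent
        return ResponseEntity.accepted().build();      // 202 Accepted: acknowledged, not yet applied
    }
}
```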

Best option to cache information in order to have a fast http response

I am developing a web service which listens for HTTP requests. That service has to respond very fast and handle the maximum number of concurrent requests possible.
That service has one endpoint with 2 request parameters (param1 and param2), which are used to validate some information and then send a response.
I have some information predefined that I will use to validate the request parameters. For example:
I know that the param1 is assigned to assign1 and has the property prop1. Param1 is also assigned to assign3 which has the property prop3.
And that param2 is assigned to assign2 and has the property prop2.
With that information, when I get an HTTP request, I need to validate that param1 is assigned with assign1 or assign2, otherwise, I should return an empty response.
Obviously, the idea is to avoid hard-coding it, and it would be nice if I could modify that information while the server is running.
To achieve that, I know of two existing solutions (there are probably more):
Use Spring Boot and put that information in a .yml file, reading it through a Spring bean with placeholders like @Value("${param1}"), etc. The problem is that I would have to restart the application to pick up changes to that .yml file, so technically I cannot modify the cached info while the server is running.
Use an embedded database like H2 and fetch the information for every request with a select. The problem is that I then run a select for every request (and any join would increase the response time), but I could modify the info while the server is running.
I would like to know the best way to "cache" that information and keep the service fast, given that every request is blocking.
I'm also using Spring Boot for this service, but if you know of a better web container I'm willing to try it.
You are prematurely optimizing.
For one: “I need to process requests very fast” and “handle as many requests as possible” are rather vague performance criteria.
You could express your goals more concretely, for example:
I need my service to handle 99% of all requests, each hour, with a latency of 50 ms or less.
And I need my service to be able to handle 1000 requests per second (or whatever number you need).
Then, you can begin to measure performance and optimize as necessary.
Of course caching results is a great idea and you should do it. But if you’re worried that a local database, running on the same server is going to add significant latency to your service, you may find that that is not the case.
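If the predefined validation data does end up cached in memory, one simple pattern is to keep it in an immutable snapshot behind a volatile reference and reload it from the database (H2 or anything else) on a schedule, so edits become visible without a restart. A hedged sketch; the param_assignments table, its columns, and the 30-second interval are invented for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class ValidationRuleCache {

    private final DataSource dataSource;
    private volatile Map<String, String> assignments = Map.of();   // param -> assignment, immutable snapshot

    public ValidationRuleCache(DataSource dataSource) {
        this.dataSource = dataSource;
        reload();                                                  // initial load at startup
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::reload, 30, 30, TimeUnit.SECONDS);
    }

    /** Request threads only read the in-memory snapshot: no DB access on the hot path. */
    public String assignmentFor(String param) {
        return assignments.get(param);
    }

    private void reload() {
        Map<String, String> fresh = new HashMap<>();
        String sql = "SELECT param, assignment FROM param_assignments";   // hypothetical table
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                fresh.put(rs.getString("param"), rs.getString("assignment"));
            }
            assignments = Map.copyOf(fresh);                       // atomic swap of the snapshot
        } catch (SQLException e) {
            // keep serving the previous snapshot if the reload fails
        }
    }
}
```

Validation then costs one map lookup per request, while editing the H2 table changes behavior within one refresh interval.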

Optimizing async multi-request operations calling the same service

We are developing a document management web application and right now we are thinking about how to handle actions on multiple documents. For example, let's say a user multi-selects 100 documents and wants to delete all of them. Until now (when we did not support multiple selection) the deleteDoc action made an AJAX request to a deleteDocument service with the docId. The service in turn calls the corresponding utility function, which does the required permission checking and then deletes the document from the database. When it comes to multiple deletion we are not sure of the best way to proceed. We have come up with several solutions but do not know which one is best practice, so I'm looking for advice. Mind you, we are keen on keeping the back end code as intact as possible:
Creating a new multipleDeleteDocument service which calls the single-doc delete utility function once per document to be deleted (ugly in my opinion and counter-intuitive with modern practices).
Keep the back end code as is and instead, for every document, make an ajax request on the service.
Somehow (I have no idea if this is even possible) batch the requests into one but still have the server execute the deleteDocument service X times.
Use WebSockets for the multi-delete action, essentially cutting down on communication overhead and time. Our application generally runs over LAN networks with low latency, which is optimal for WebSockets (when latency is introduced, WebSockets tend to match HTTP request speeds).
Something we haven't thought of?
Sending N AJAX calls or N WebSocket messages when all the data could be combined into a single call or message is never the most optimal solution, so options 2 and 4 are certainly not ideal. I see no particular reason to use a WebSocket over an AJAX call here. If you already have a WebSocket connection, you can certainly send a single delete message with a list of document IDs over it, but an AJAX call works just as well, so I wouldn't create a WebSocket connection just for this purpose.
Options 1 and 3 both require a new service endpoint that lets you make a single call to delete multiple documents. That is what I would recommend.
If I were designing an API like this, I'd design a single delete endpoint that takes one or more document IDs. That way the same API call can be used whether deleting a single document or multiple documents.
Then, from the client anytime you have multiple documents to delete, always collect them together and make one API call to delete all of them at once.
Internal to the server, how you implement that API depends upon your data store. If your data store also permits sending multiple documents to delete, then you would likewise call the data store that way. If it only supports single deletes, then you would just loop and delete each one individually.
Option 3 would be the most elegant solution for me.
Assuming you currently send requests like POST /deleteDocument with docId as a parameter, you could instead pass an array of document ids to remove.
Then the back end only has to iterate through the list of ids and perform each deletion. You should be able to keep the deletion code relatively intact; a possible shape for the endpoint is sketched below.
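A hedged Spring sketch of such a bulk endpoint; the /documents/delete path, the request body shape, and the DocumentDeleter interface are assumptions standing in for the existing single-document service.

```java
import java.util.List;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DocumentBulkDeleteController {

    /** Placeholder for the existing single-document delete utility (permission check + delete). */
    public interface DocumentDeleter {
        void delete(long docId);
    }

    private final DocumentDeleter deleter;

    public DocumentBulkDeleteController(DocumentDeleter deleter) {
        this.deleter = deleter;
    }

    /** One call deletes one or many documents; the client always sends a list of ids. */
    @PostMapping("/documents/delete")
    public ResponseEntity<Void> deleteDocuments(@RequestBody List<Long> docIds) {
        for (Long id : docIds) {
            deleter.delete(id);                // reuses the existing back end logic unchanged
        }
        return ResponseEntity.noContent().build();
    }
}
```

If the data store later supports bulk deletes, the loop can be replaced by a single bulk call without changing the API.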

how to deal with race conditions in a RESTful application?

Here is what's happening in my RESTful web app:
HTTP request comes in
The app starts to build a response, with some initial portion of data
Another request changes the data that were used in step 2
The first request realizes that the data are now stale
What should it do? Fail the request and return an error to the client? Or start over from scratch (taking more time than the client expects)?
IMHO you should treat a REST request very close to how you treat a DB transaction:
Either make sure you lock what you need to lock before doing the real work
Or prepare to fail/retry on a concurrency issue
Very often this can actually be handed down to a DB transaction - depending on how much and what non-DB work your request does.
I think a good starting point is the concurrency model used by CouchDB. In essence:
Each request is handled in isolation, i.e. it is not affected by other concurrent requests. This implies that you need to be able to take a consistent snapshot of the database when you begin processing a request, which most DBMSs support through some notion of transaction.
GET requests always succeed, and return the state of the system at the point when they were submitted, ignoring any subsequent updates.
GET requests return a revision ID for the resource in question, which must be included as a field in any subsequent PUT request.
In a PUT request, the submitted revision ID is checked against the latest revision ID in the database. If they don't match then an error code is returned, in which case the client must re-fetch the latest version and re-apply any changes that they made.
More reading:
http://wiki.apache.org/couchdb/Technical%20Overview#ACID_Properties
http://wiki.apache.org/couchdb/HTTP_Document_API#PUT
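Translated to a typical Java persistence stack, the same idea is optimistic locking: the revision id can be a JPA @Version column that the client echoes back (for example in an If-Match header or a request field). A hedged sketch, not tied to CouchDB; the entity and field names are invented.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class DocumentResource {

    @Id
    private Long id;

    private String content;

    @Version                    // JPA increments this on every successful update
    private long revision;

    public long getRevision()              { return revision; }
    public void setContent(String content) { this.content = content; }
}

// In the PUT handler (names are illustrative):
//   DocumentResource current = em.find(DocumentResource.class, id);
//   if (current.getRevision() != revisionSubmittedByClient) {
//       // stale revision: reply 409/412 so the client re-fetches and re-applies its change
//   } else {
//       current.setContent(newContent);   // @Version also rejects concurrent lost updates at commit time
//   }
```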
Assuming this is not about DB transactions and that, say, distributed long-running processes are involved in each step:
In that scenario the client should be sent an appropriate response (something like a 409/410 HTTP code) with details indicating that the request is no longer valid and that the client should try again. Silently retrying on the server could end up in loops, or in the worst case do something the client did not intend.
For example, when you book a hotel or a ticket online, you may get a response saying the price has changed in the meantime and you need to submit again to buy at the new price.
From my point of view your question is the same as asking:
"If I do a read from the database while another transaction does a write, my read will block. But when my read finishes, I will have missed the new data populated by the transaction that came in after my read."
That is a bad way to think about it. You should make sure that clients get consistent data in the responses. If the data have been updated by the time they receive the response, that is not a problem of the original request.
Your concern is that the data are being updated right now and you happen to know about it. But what if the data are updated right after the response has gone out over the network?
IMHO choose the simplest solution that fits your requirements.
The clients should "poll" more frequently to make sure they always have the most recent copy of the data.
Well, strictly speaking a race condition is a bug. The solution to a race condition is to not have shared data. If that cannot be avoided for a given use case, then a first-come, first-served approach usually helps:
The first request locks the shared data and the second waits until the first is done with it.

Best performing way to guarantee data consistency between concurrent web service calls?

Multiple clients are concurrently accessing a JAX-WS web service running on GlassFish or some other application server. Persistence is provided by something like Hibernate or OpenJPA. The database is Microsoft SQL Server 2005.
The service takes a few input parameters, some "magic" occurs, and then returns what is basically a transformed version of the next available value in a sequence, with the particular sequence and transformation being determined by the inputs. The "magic" that performs the transformation depends on the input parameters and various database tables (describing the relationship between the input parameters, the transformation, the sequence to get the next base value from, and the list of already served values for a particular sequence). Not sure if this could all be wrapped up in a stored procedure (probably), but also not sure if the client wants it there.
What is the best way to ensure consistency (i.e. each value is unique and values are consumed in order, with no opportunity for a value to reach a client without also being stored in the database) while maintaining performance?
It's hard to provide a complete answer without a full description (table schemas, etc.), but my best guess at how it works is that you need a transaction around your "magic" which marks the next value in the sequence as in use before returning it. If you want to reuse sequence numbers you can later unflag them (for example, if the user cancels what they were doing), or you can simply consider them lost.
One warning: you want that transaction to be as short and as fast as possible, especially if this is a high-throughput system; otherwise your sequence tables could quickly become a bottleneck. Analyze the process, find the shortest transaction window that still guarantees a sequence value isn't reused, and use that.
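One common way to implement the reservation is to claim the next value inside a short transaction with a pessimistic row lock, so two concurrent calls can never hand out the same value. A hedged JPA sketch; the SequenceSlot entity and its columns are invented, and it assumes a container-managed transaction (Spring or Java EE).

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.PersistenceContext;
import javax.transaction.Transactional;

@Entity
class SequenceSlot {                       // one row per named sequence; table layout is an assumption
    @Id String name;
    long nextValue;
}

public class SequenceReserver {

    @PersistenceContext
    private EntityManager em;

    /**
     * Claims the next value of the named sequence inside a short transaction.
     * The row stays locked (FOR UPDATE / UPDLOCK, depending on the SQL dialect) until commit,
     * so two concurrent calls cannot receive the same value.
     */
    @Transactional
    public long reserveNext(String sequenceName) {
        SequenceSlot slot = em.find(SequenceSlot.class, sequenceName,
                                    LockModeType.PESSIMISTIC_WRITE);
        long value = slot.nextValue;
        slot.nextValue = value + 1;        // the value is marked consumed before it leaves the transaction
        return value;                      // the caller applies the "magic" transformation afterwards
    }
}
```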
It sounds like you have most of the elements you need here. One thing that might pose difficulty, depending on how you've implemented your service, is that you don't want to write any response to the browser until your database transaction has been safely committed without errors.
A lot of web frameworks keep the persistence session open (and uncommitted) until the response has been rendered, to support lazy loading of persistent objects by the view. If that's true in your case, you'll need to make sure that none of the rendered view is delivered to the client until you're sure the transaction has committed.
One approach is a servlet Filter that buffers the output of the servlet or web service framework you're using until it has completed its work.
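A rough sketch of such a filter with the plain Servlet API: the body is buffered in memory and only written to the real response after the rest of the chain has completed without throwing. This simplified version only wraps getWriter(); a complete filter would also have to wrap getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

public class BufferedResponseFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {

        HttpServletResponse response = (HttpServletResponse) res;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter bufferedWriter = new PrintWriter(buffer);

        HttpServletResponseWrapper wrapper = new HttpServletResponseWrapper(response) {
            @Override
            public PrintWriter getWriter() {
                return bufferedWriter;          // downstream code writes into the buffer, not the socket
            }
        };

        chain.doFilter(req, wrapper);           // if this throws, nothing has reached the client yet

        bufferedWriter.flush();
        response.getOutputStream().write(buffer.toByteArray());   // deliver only after the work succeeded
        response.getOutputStream().flush();
    }

    @Override public void init(FilterConfig filterConfig) { }
    @Override public void destroy() { }
}
```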
