REST service and race conditions


Let's imagine a problem:
I have a REST service, which is implemented using Java/MySQL/Spring and HTTP/JSON technologies.
The clients of the REST service are mobile applications.
So it's possible that someone will decompile the code and will get the API of the REST service.
( Yes, the code is obfuscated etc, but anyway ).
Problem: there is a POST method to send money to another user of the application.
I am worried that someone can get the API, write a bot and make this POST request 500 or 5,000 or even 50,000 times per second.
As a result, he might send more money than he actually has, because if 1000 requests are processed simultaneously then the balance check might
succeed for all 1000 requests, even though the real amount of money in the account might be enough only for, let's say, 50 requests.
So, basically, it's more like the standard "race" condition with multiple threads.
The problem is that I have multiple servers and they are not connected to each other in any way.
So 300 requests can come to server A, 300 requests can come to server B and the rest can come to server C.
The best idea I have is to use something like "SELECT ... FOR UPDATE" and synchronize at the database level.
However, I would like to consider other solutions.
Any ideas or suggestions?

You have a few options:
Rely on the ACID guarantees of the database (MySQL in your case). Assuming you are using the InnoDB engine, you need to choose the right transaction isolation level (see the SET TRANSACTION syntax) in combination with the right locking reads (SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE). You need to understand these concepts well in order to make the right choice. Simply using the right isolation level may already prevent the race condition, even without locking reads. The downsides are that you trade scalability for consistency and tie your application to an RDBMS, which makes it harder to move to NoSQL, for example.
Decompose your back end into web tier and service tier (option suggested by atk in the comments). This will allow you to scale the web tier instances independently while keeping a single service tier instance. Having a single service tier instance makes it possible to use Java synchronization mechanisms such as synchronised blocks or ReadWriteLock. Although this solution will work I wouldn't recommend it since it reduces the scalability of your service tier.
This is an enhancement of the previous option. You can use a Distributed lock manager instead of built-in java synchronization mechanisms. It will allow you to scale your web tier and service tier independently.
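To make the second option concrete, here is a minimal in-memory sketch of a single service tier instance that checks and debits the balance under one lock. The Account and TransferFlood classes are hypothetical names invented for illustration; a real service would load and store the balance in MySQL.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical account held by the single service tier instance. */
class Account {
    private long balance;

    Account(long initialBalance) {
        this.balance = initialBalance;
    }

    /** Check and debit happen under one lock, so no two requests can both pass the check. */
    synchronized boolean withdraw(long amount) {
        if (amount <= 0 || balance < amount) {
            return false; // reject instead of overdrawing
        }
        balance -= amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}

/** Fires many concurrent transfer requests at one account and counts the accepted ones. */
class TransferFlood {
    static int run(Account account, int requests, long amount) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        AtomicInteger accepted = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                if (account.withdraw(amount)) {
                    accepted.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return accepted.get();
    }
}
```

With a balance of 500 and 1000 concurrent requests of 10 each, only 50 requests are accepted. The same idea carries over to the database option if the check and the debit are executed as one locked step.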

For mission-critical applications, it's best to have multiple levels of locking mechanisms.
"SELECT ... FOR UPDATE" is a good way to do so, but it is pretty expensive. When you bombard the endpoint with Charles, you'll see that your upper API stack suffers: that simple mechanism can cripple your infrastructure pretty easily, much like a DDoS event.
Implement throttling at the load balancer/proxy layer first, limiting each IP address to N requests per specified interval.
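Load balancers and proxies usually offer this kind of throttling as configuration, but the logic itself is small. The following is only a sketch of a fixed-window counter per IP address; FixedWindowThrottle is a hypothetical helper, not the API of any real proxy, and the caller passes in the current time so the behaviour is easy to follow.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Fixed-window request throttle keyed by client IP (hypothetical helper). */
class FixedWindowThrottle {
    private static final class Window {
        final long start;
        final long count;

        Window(long start, long count) {
            this.start = start;
            this.count = count;
        }
    }

    private final long maxRequests;
    private final long windowMillis;
    private final ConcurrentMap<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowThrottle(long maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns true if this request is within the limit for the IP's current window. */
    boolean allow(String ip, long nowMillis) {
        Window w = windows.compute(ip, (key, old) -> {
            if (old == null || nowMillis - old.start >= windowMillis) {
                return new Window(nowMillis, 1); // start a fresh window
            }
            return new Window(old.start, old.count + 1); // count this request atomically
        });
        return w.count <= maxRequests;
    }
}
```

A real deployment would also evict entries for IPs that have gone quiet, which this sketch leaves out.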
Then apply a shared cache layer lock, where all your boxes go to synchronize on certain keys depending on which critical transaction you want to lockdown. For example, you can use Redis GETSET or INCR functionality to atomically set a flag, before entering the critical code path. Reject anything else quickly to avoid those bad actors from holding on to CPU/memory.
You can also implement things like an APC cache (before hitting your Redis / Memcache cluster) to do similar locking on a per box basis. This is quicker as there's no network latency involved.
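As a rough Java analogue of such a per-box lock, the sketch below keeps an in-process set of "busy" keys and rejects a second request for the same key immediately. InFlightGuard is a hypothetical name, and the key (for example the sender's account id) is an assumption; the Redis-based flag would follow the same test-and-set shape across boxes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Per-box "busy" flags keyed by, for example, the sender's account id. */
class InFlightGuard {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Runs the action only if no other request holds the key; otherwise rejects at once. */
    <T> T runExclusive(String key, Supplier<T> action, T rejected) {
        if (!inFlight.add(key)) { // atomic test-and-set of the flag
            return rejected;      // fail fast, do not wait for the holder
        }
        try {
            return action.get();  // critical code path
        } finally {
            inFlight.remove(key); // always release the flag
        }
    }
}
```

Because a rejected request returns immediately, a flood of duplicates does not pile up threads waiting for the lock.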
The above are necessary on top of using "SELECT ... FOR UPDATE"

Related

Serving single HTTP request with multiple threads

An Angular 4 application sends a list of records to a Java Spring MVC application deployed in a WebSphere 8 servlet container. The list is then inserted into a temp table. After the batch insert, a procedure call is made to do some calculations and return results. Depending on the size of the list inserted into the temp table, this takes roughly 3,000 ms (N ~ 500), 6,000 ms (N ~ 1000), or 50,000+ ms (N > 2000).
My approach would be to split the data into chunks and send them to the database for processing simultaneously. After the threads (Futures) return results, I would aggregate them and return them to the client. To sum up, I would split a synchronous call into multiple asynchronous processes (executed simultaneously) and respond to the client on the same thread that received the HTTP call in my controller.
Everything would be fine, and I would not be asking this question, if a more experienced colleague of mine did not strongly disagree with this approach. His reasoning is that this approach is prone to exceptions due to thread interrupts, timeouts, semaphores and so on. He goes as far as saying that multithreading should be avoided within a web container because it can crash the servlet container if it runs out of threads.
He proposes that we have the browser send multiple AJAX requests and aggregate/present the data in chunks.
Can you please help me understand which approach is better and why?

I would say that your approach is much better.
Threads created by application logic aren't application container threads and are limited only by the operating system, whereas each AJAX request uses a thread from the application container. So the second approach reduces throughput and increases the chance of hitting the container's thread limit, while the first one does not. Performance should also be considered, because it's much cheaper to create a thread than to send a request over the network. Plus, each network request uses additional resources for authentication, authorization, encryption, etc.
It's definitely harder to write correct multithreaded code, and it is easily prone to errors. However, that shouldn't stop you, because concurrency can significantly increase your performance. It's pretty straightforward to handle interrupts and timeouts using Future, and you certainly don't need semaphores here.
Exposing this logic to the client looks like a breach of encapsulation. Imagine using a REST API that forces you to send multiple requests by splitting your data into chunks. What chunk size should I use? How do I deal with timeouts and interrupts? How many requests should I send? You will face almost the same challenges with both approaches, but they are much easier to deal with using tools designed for the purpose, such as ExecutorService and Future.
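Here is a minimal sketch of the chunked approach with ExecutorService and Future. ChunkedProcessor and its parameters are hypothetical, and the chunkWorker function stands in for the database call. For brevity the sketch creates a pool per call; inside a container you would more likely share one bounded pool, which also addresses the worry about running out of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class ChunkedProcessor {
    /** Splits records into chunks, processes them in parallel, returns results in chunk order. */
    static <T, R> List<R> process(List<T> records, int chunkSize, int threads,
                                  Function<List<T>, R> chunkWorker, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (int from = 0; from < records.size(); from += chunkSize) {
                List<T> chunk = records.subList(from, Math.min(from + chunkSize, records.size()));
                tasks.add(() -> chunkWorker.apply(chunk));
            }
            // Blocks the request thread until every chunk is done or the timeout expires;
            // chunks still running at the timeout are cancelled.
            List<Future<R>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // CancellationException if this chunk timed out
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The controller thread calls process(...) once and returns the aggregated list, so the client still sees a single synchronous request.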

Single Transaction in multiple java jvms

One Spring service is implemented in one Java deployment unit (JVM), and another Spring service is implemented in a second JVM. The first JVM makes a service call to the second. The service interface could be either REST or SOAP over HTTP. I need to keep a single transaction across both JVMs, meaning that if any service fails, everything must be rolled back. How can I do this? Are there any code examples?

Use global transactions (i.e., JTA).
Use XA resources (RDBMS and JMS connections) and do "Full XA with 2PC".
For further reference on the Spring transaction management, including the JTA/XA scenario, read: http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#transaction

REST faces the exact same problem as SOAP-based web services with regards to atomic transactions. There is no stateful connection, and every operation is immediately committed; performing a series of operations means other clients can see interim states.
Unless, of course, you take care of this by design. First, ask yourself: do I have a standard set of atomic operations? This is commonly the case. For example, for a banking operation, removing a sum from one account and adding the same sum to a different account is often a required atomic operation. But rather than exporting just the primitive building blocks, the REST API should provide a single "transfer" operation, which encapsulates the entire process. This provides the desired atomicity, while also making client code much simpler. This approach is known as low granularity services, or high-level batch operations.
If there is no simple, pre-defined set of desired atomic operation sequences, the problem is more severe. A common solution is the batch command pattern. Define one REST method to demarcate the beginning of a transaction, and another to demarcate its end (a 'commit' request). Anything sent between these sets of operations is queued by the server but not committed, until the commit request is sent.
This pattern complicates the server significantly -- it must maintain a state per client. Normally, the first operation ('begin transaction') returns a transaction ID (TID), and all subsequent operations, up to and including the commit, must include this TID as a parameter.
It is a good idea to enforce a timeout on transactions: if too much time has passed since the initial 'begin transaction' request, or since the last step, the server has the right to abort the transaction. This prevents a potential DoS attack that causes the server to waste resources by keeping too many transactions open. The client design must keep in mind that each operation must be checked for a timeout response.
It is also a good idea to allow the client to abort a transaction, by providing a 'rollback' API.
The usual care required in designing code that uses multiple concurrent transactions applies as usual in this complex design scenario. If at all possible, try to limit the use of transactions, and support high-level batch operations instead.
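The batch command pattern described above can be sketched in memory as follows. BatchTransactions and all of its method names are hypothetical; the REST layer would map 'begin', each queued operation, 'commit' and 'rollback' onto these calls. The sketch only shows the queue-until-commit and timeout mechanics. It does not undo operations that fail halfway through a commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory sketch of the server side of the batch command pattern. */
class BatchTransactions {
    private static final class Pending {
        final List<Runnable> operations = new ArrayList<>();
        long lastTouched;

        Pending(long now) {
            this.lastTouched = now;
        }
    }

    private final Map<String, Pending> open = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    BatchTransactions(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** 'begin transaction': returns the TID the client must send with every later call. */
    String begin(long now) {
        String tid = UUID.randomUUID().toString();
        open.put(tid, new Pending(now));
        return tid;
    }

    /** Queues an operation without applying it; false means unknown or timed-out TID. */
    boolean enqueue(String tid, Runnable operation, long now) {
        Pending p = live(tid, now);
        if (p == null) {
            return false;
        }
        synchronized (p) {
            p.operations.add(operation);
            p.lastTouched = now;
        }
        return true;
    }

    /** 'commit': applies the queued operations in order; false means nothing was applied. */
    boolean commit(String tid, long now) {
        Pending p = live(tid, now);
        if (p == null || open.remove(tid) == null) {
            return false;
        }
        synchronized (p) {
            p.operations.forEach(Runnable::run);
        }
        return true;
    }

    /** 'rollback': discards the queued operations. */
    boolean rollback(String tid) {
        return open.remove(tid) != null;
    }

    /** Looks up a transaction and aborts it if it has been idle for too long. */
    private Pending live(String tid, long now) {
        Pending p = open.get(tid);
        if (p == null) {
            return null;
        }
        synchronized (p) {
            if (now - p.lastTouched > timeoutMillis) {
                open.remove(tid);
                return null;
            }
        }
        return p;
    }
}
```

The per-client state held in the map is exactly the server-side complication mentioned above, and the idle check is what keeps abandoned transactions from accumulating.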
I take no credit for this information; I'm just pointing you to it. Credit goes to this article.
Also please read Transactions in REST?
You can get some handy code samples here http://www.it-soa.eu/en/resp/atomicrest/userguide/index.html

Large number of single threaded task queues

At our company we have a server that is deployed as a few instances. The server handles user requests. Requests from different users can be processed in parallel. Requests from the same user must be executed strictly sequentially, but they can arrive at different instances due to load balancing. Currently we use Redis-based distributed locks, but this is error-prone and requires more work on concurrency than on business logic.
What I want is something like this (more like a concept):
Distinct queue for each user
Queue is named after user id
Each request is identified by a request id
Imagine two requests from the same user arriving at two different instances concurrently:
1. Each instance puts its request id into this user's queue.
2. Additionally, both instances store their request ids locally.
3. Then some broker takes the request id from the top of "some_user_queue" and moves it into "some_user_queue_processing".
4. Both instances listen for "some_user_queue_processing". They peek into it and check whether it holds the request id they stored locally. If yes, they do the processing. If not, they ignore it and wait.
5. When the work is done, the server deletes this id from "some_user_queue_processing".
6. Then step 3 again.
And all of this happens concurrently for a lot (thousands of them) of different users (and their queues).
Now, I know this sounds a lot like actors, but:
We need a solution that requires as few changes as possible, so that we can move away from locks quickly. Akka would force us to rewrite almost everything from scratch.
We need a production-ready solution. Quasar sounds good, but is not production ready yet (more precisely, its Galaxy cluster is not).
Management at my work is very conservative; they simply don't want another dependency that we'll need to support. But we already use Redis (for distributed locks), so I thought maybe it could help with this too.
Thanks

The best solution that matches the description of your problem is Redis Cluster.
Basically, the cluster solves your concurrency problem, in the following way:
Two (or more) requests from the same user, will always go to the same instance, assuming that you use the user-id as a key and the request as a value. The value must be actually a list of requests. When you receive one, you will append it to that list. In other words, that is your queue of requests (a single one for every user).
This mapping is made possible by the design of the cluster implementation. It is based on a range of hash slots spread over all the instances.
When a set command is executed, the cluster performs a hashing operation, which results in a value (the hash-slot that we are going to write on), which is located on a specific instance. The cluster finds the instance that contains the right range, and then performs the writing procedure.
Also, when a get is performed, the cluster does the same procedure: it finds the instance that contains the key, and then it gets the value.
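To illustrate why the same user id always lands on the same instance, here is a small sketch of the key-to-slot calculation. Redis Cluster maps a key to one of 16384 hash slots using CRC16 of the key modulo 16384. The HashSlots class is a hypothetical helper written for this answer, and it ignores the hash tag ({...}) rule of the real implementation.

```java
import java.nio.charset.StandardCharsets;

class HashSlots {
    static final int SLOT_COUNT = 16384;

    /** CRC16, XMODEM variant: polynomial 0x1021, initial value 0. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Slot for a key; hash tags are deliberately not handled in this sketch. */
    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }
}
```

Since the slot depends only on the key, every request that uses the same user id as its key is routed to the instance that owns that slot.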
The transition from locks is very easy to perform because you only need to have the instances ready (with the cluster-enabled directive set to "yes") and then run the cluster creation command from the redis-trib.rb script.
I worked with the cluster in a production environment last summer and it behaved very well.

Concurrency : Handling multiple submits in a web application

This is a recent interview question to my friend:
How would you handle a situation where users enter some data on the screen and, let's say, 5 of them click on the Submit button at the SAME time?
(By same time, the interviewer insisted that they are the same down to the level of nanoseconds.)
My answer was just to make the method that handles the request synchronized, so that only one request can acquire the lock on the method at a given time.
But it looks like the interviewer kept insisting there was a "better way" to handle it.
One other approach is to handle locking at the database level, but I don't think it is "better".
Are there any other approaches? This seems to be a fairly common problem.

If you have only one network card, you can only have one request coming down it at once. ;)
The answer he is probably looking for is something like
Make the servlet stateless so they can be executed concurrently.
Use components which allow thread safe concurrent access like Atomic* or Concurrent*
Use locks only where you absolutely have to.
What I prefer to do is to make the service so fast it can respond before the next request can come in. ;) Though I don't have the overhead of Java EE or databases to worry about.
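As a small illustration of the first two points, the handler below keeps no per-request fields and touches shared state only through Atomic* and Concurrent* classes, so any number of submits can run at once without an explicit lock. SubmitHandler and its counters are made up for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stateless handler: no per-request fields, only thread-safe shared state. */
class SubmitHandler {
    private final AtomicLong totalSubmits = new AtomicLong();
    private final ConcurrentMap<String, Long> submitsPerUser = new ConcurrentHashMap<>();

    /** Safe to call from many request threads at once; returns the running total. */
    long handle(String userId) {
        submitsPerUser.merge(userId, 1L, Long::sum); // atomic per-key update
        return totalSubmits.incrementAndGet();       // atomic counter
    }

    long total() {
        return totalSubmits.get();
    }

    long countFor(String userId) {
        return submitsPerUser.getOrDefault(userId, 0L);
    }
}
```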

Does it matter that they click at the same time e.g. are they both updating the same record on a database?
A synchronized method will not cut it, especially if it's a webapp distributed amongst multiple JVMs. Also the synchronized method may block, but then the other threads would just fire after the first completes and you'd have lost writes.
So locking at database level seems to be the option here i.e. if the record has been updated, report an error back to the users whose updates were serviced after the first.
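One common way to get this "report an error to the later writers" behaviour is a version check, often called optimistic locking. The sketch below shows the idea in memory; VersionedRecord is a hypothetical class, and in a database the same check would typically be an UPDATE that only matches the version the user originally read (that SQL shape is an assumption here, not something from the question).

```java
import java.util.concurrent.atomic.AtomicReference;

/** In-memory stand-in for a database row that carries a version number. */
class VersionedRecord {
    static final class Snapshot {
        final long version;
        final String value;

        Snapshot(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedRecord(String initialValue) {
        this.current = new AtomicReference<>(new Snapshot(0, initialValue));
    }

    /** What a user sees when the form is loaded. */
    Snapshot read() {
        return current.get();
    }

    /** Applies the update only if nobody changed the record since 'seen' was read. */
    boolean update(Snapshot seen, String newValue) {
        return current.compareAndSet(seen, new Snapshot(seen.version + 1, newValue));
    }
}
```

If two users read the same snapshot and both submit, the first update succeeds and the second returns false, so the application can tell that user the record has changed instead of silently losing a write.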

You do not have to worry about this, as the web server runs each request in its own thread and manages it.
But if you have a shared resource, such as a log file, you need to control concurrent access and put a lock on it, both within a request and across requests.

Use threads as "sessions"

I am developing a text-based game, a MUD. I have the base functions of the program ready, and now I would like to allow more than one client to connect at a time. I plan to use threads to accomplish that.
In my game I need to store information such as the current position or health points for each player. I could hold it in the database, but as it will change very quickly, sometimes every second, using the database would be inefficient (am I right?).
My question is: can threads behave as "sessions", ie hold some data unique to each user?
If yes, could you direct me to some resources that I could use to help me understand how it works?
If no, what do you suggest? Is database a good option or would you recommend something else?
Cheers,
Eleeist

Yes, they can, but this is a mind-bogglingly stupid way to do things. For one thing, it permanently locks you into a "one thread per client" model. For another thing, it makes it difficult (maybe even impossible) to implement interactions between users, which I'm sure your MUD has.
Instead, have a collection of some kind that stores your users, with data on each user. Save persistent data to the database, but you don't need to update ephemeral data on every change.
One way to handle this is to have a "changed" boolean in each user. When you make a critical change to a user, write them to the database immediately. But if it's a routine, non-critical change, just set the "changed" flag. Then have a thread come along every once in a while and write out changed users to the database (and clear the "changed" flag).
Use appropriate synchronization, of course!
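A minimal sketch of the "changed" flag idea is below. Player, PlayerRegistry and the writer callback are hypothetical names; the writer stands in for whatever code saves a player to the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical player with one piece of fast-changing state. */
class Player {
    final String name;
    private volatile int healthPoints;
    private final AtomicBoolean changed = new AtomicBoolean();

    Player(String name, int healthPoints) {
        this.name = name;
        this.healthPoints = healthPoints;
    }

    /** Routine change: update memory and just mark the player as changed. */
    void setHealthPoints(int healthPoints) {
        this.healthPoints = healthPoints;
        changed.set(true);
    }

    int healthPoints() {
        return healthPoints;
    }

    /** Clears the flag and reports whether it was set. */
    boolean clearChanged() {
        return changed.getAndSet(false);
    }
}

/** The collection of users, shared by all connection-handling code. */
class PlayerRegistry {
    private final Map<String, Player> players = new ConcurrentHashMap<>();

    void add(Player player) {
        players.put(player.name, player);
    }

    /** Writes out every changed player through the writer; returns how many were written. */
    int flush(Consumer<Player> writer) {
        int written = 0;
        for (Player player : players.values()) {
            if (player.clearChanged()) {
                writer.accept(player);
                written++;
            }
        }
        return written;
    }
}
```

The periodic thread could be a ScheduledExecutorService that calls flush(...) at a fixed rate. The flag is cleared before writing, so a change that arrives during the write is picked up on the next pass.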

A thread per connection / user session won't scale. Only N threads can actually execute at the same moment, where N is the number of physical cores / processors in your machine; any further threads just take turns. The amount of memory in your machine also limits how many threads you can create at a time, and some operating systems impose arbitrary limits as well.
There is nothing magical about Threads in handling multiple clients. They will just make your code more complicated and less deterministic and thus harder to reason about what is actually happening when you start hunting logic errors.
A Thread per connection / user session would be an anti-pattern!
Threads should be stateless workers that pull things off concurrent queues and process the data.
Look at concurrent maps for caching ( or use some appropriate caching solution ) and process them and then do something else. See java.util.concurrent for all the primitive classes you need to implement something correctly.

Instead of worrying about threads and thread-safety, I'd use an in-memory SQL database like HSQLDB to store session information. Among other benefits, if your MUD turns out to be the next Angry Birds, you could more easily scale the thing up.

Definitely you can use threads as sessions. But it's a bit off the mark.
The main point of threads is the ability to execute concurrently and asynchronously. Most probably, you don't want events received from your MUD clients to happen in a parallel, uncontrolled order.
To ensure consistency of the world I'd use an in-memory database to store the game world. I'd serialize updates to it, or at least some updates to it. Imagine two players in parallel hitting a monster with HP 100. Each deals 100 damage. If you don't serialize the updates, you could end up giving credit for 100 damage to both players. Imagine two players simultaneously taking loot from the monster. Without proper serialization they could end up each with their own copy of the loot.
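One simple way to serialize updates is to funnel them through a single "world" thread. The sketch below does that with a single-threaded executor; World, Monster and attack are hypothetical names, and a plain object stands in for the in-memory database.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Monster state; only ever touched from the world thread. */
class Monster {
    private int hp;

    Monster(int hp) {
        this.hp = hp;
    }

    /** Returns the damage actually credited, never more than the remaining HP. */
    int hit(int damage) {
        int dealt = Math.min(damage, hp);
        hp -= dealt;
        return dealt;
    }
}

/** All world updates run one at a time on a single daemon thread. */
class World {
    private final ExecutorService worldThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "world");
        t.setDaemon(true);
        return t;
    });
    private final Monster monster;

    World(Monster monster) {
        this.monster = monster;
    }

    /** Client threads call this; the hit itself is applied later, in submission order. */
    Future<Integer> attack(int damage) {
        Callable<Integer> update = () -> monster.hit(damage);
        return worldThread.submit(update);
    }

    void shutdown() {
        worldThread.shutdown();
    }

    /** Waits for an update and wraps the checked exceptions to keep callers simple. */
    static int await(Future<Integer> result) {
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("update failed", e.getCause());
        }
    }
}
```

With a 100 HP monster and two attacks of 100 damage each, the first attack is credited with 100 and the second with 0, however many client threads submit them.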
Threads, on the other hand, are good for asynchronous communication with clients. Use threads for that, unless something else (like a web server) does that for you already.

ThreadLocal is your friend! :)
http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
ThreadLocal provides storage on the Thread itself. So the exact same call from 2 different threads will return/store different data.
The biggest danger is having a leak between Threads. You would have to be absolutely sure that if a different user used a Thread that someone else used, you would reset/clear the data.
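A small sketch of how the leak can be avoided: bind the user to the thread only for the duration of one unit of work and always clear it in a finally block. SessionContext and runAs are hypothetical names used for illustration.

```java
/** Binds a user name to the current thread for the duration of one unit of work. */
class SessionContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Runs the work with the user bound to this thread, then always clears the slot. */
    static void runAs(String user, Runnable work) {
        CURRENT_USER.set(user);
        try {
            work.run();
        } finally {
            CURRENT_USER.remove(); // prevents a leak when the thread is reused
        }
    }

    /** Returns the user bound to the calling thread, or null if none is bound. */
    static String currentUser() {
        return CURRENT_USER.get();
    }
}
```

Code running inside runAs sees its own user through currentUser(), and once the work finishes the thread carries no leftover data for the next user.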
