We have an application which is currently threaded (about 50 threads) to process transactions.
We have set up a Redis database and are using DECRBY to deduct credits from a user's account.
Here is an example of the process:
1. Get amount of credits for this transaction
2. Get current credit amount from Redis: GET <key>
3. If the credit amount exceeds the cost of the transaction, continue
4. DECRBY the transaction amount from Redis.
The issue I have here is obvious: when the user's credits reach 0, the transaction does fail (good), but because of the threading it lets about 10-20 transactions through first.
I have thought of setting up WATCH, MULTI, EXEC with Redis and then retrying, but won't the race between threads cause a bottleneck, since they will be constantly fighting to complete the transaction?
Any suggestions ?
Locking is what you need. Since DB locks are expensive, you can implement a simple locking scheme in Redis using SETNX and avoid the race condition that way. It's well explained here: http://redis.io/commands/setnx. But you still need to implement retries at the application level.
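A minimal sketch of such a lock, assuming the Jedis client (3.x) and an illustrative 30-second expiry so a crashed lock holder can't block everyone forever:

import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLockSketch {
    // Acquire: SET key token NX PX ttl succeeds only if the key does not exist yet.
    static String tryAcquire(Jedis jedis, String lockKey) {
        String token = UUID.randomUUID().toString();
        String reply = jedis.set(lockKey, token, SetParams.setParams().nx().px(30_000));
        return "OK".equals(reply) ? token : null; // null: someone else holds the lock
    }

    // Release only if we still own the lock. Note the check-then-delete here is
    // not atomic; a small Lua script would be needed for a fully correct release.
    static void release(Jedis jedis, String lockKey, String token) {
        if (token.equals(jedis.get(lockKey))) {
            jedis.del(lockKey);
        }
    }
}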
It isn't the most conventional way of doing it IMO (the most usual way is probably a lock in an RDBMS), but using WATCH, MULTI, EXEC is akin to CAS and it doesn't seem too weird to me.
I'd assume that the author of Redis intended WATCH to be used like this. The performance implications obviously depend on how it is implemented (which I don't know), but my bet is that it will perform pretty well.
That is because there is likely to be little to no contention on the same keys in your situation (what are the chances of a user frantically issuing transactions for him/herself?), so the success rate of the first swap attempt will be very good and a retry will only happen in rare cases. And since Redis is a credible piece of software, its authors presumably know what they are doing: low contention is an easy job for Redis, so it can probably handle it.
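A rough sketch of the check-and-decrement as a WATCH/MULTI/EXEC retry loop, assuming the Jedis client (where exec() returns null when a watched key changed under you):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class CreditDeduction {
    // Returns true if the credits were deducted, false if the balance was insufficient.
    static boolean deductCredits(Jedis jedis, String key, long cost) {
        while (true) {
            jedis.watch(key);                 // watch the balance key
            long balance = Long.parseLong(jedis.get(key)); // assumes the key exists
            if (balance < cost) {
                jedis.unwatch();              // not enough credits: give up cleanly
                return false;
            }
            Transaction tx = jedis.multi();
            tx.decrBy(key, cost);
            if (tx.exec() != null) {          // null means the key changed: retry
                return true;                  // committed atomically
            }
        }
    }
}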
You could try the Redis-based Lock object implementation for Java provided by the Redisson framework instead of retrying with WATCH/MULTI commands. Working with WATCH/MULTI involves extra requests to Redis on each attempt, which is much slower than working under an already acquired lock.
Here is the code sample:
Lock lock = redisson.getLock("transactionLock");
lock.lock();
try {
    // ... instructions
} finally {
    lock.unlock();
}
Related
I am working on a Java web application that uses Weblogic to connect to an Informix database. In the application we have multiple threads creating records in a table.
It happens pretty often that it fails and the following error is thrown:
java.sql.SQLException: Could not do a physical-order read to fetch next row....
Caused by: java.sql.SQLException: ISAM error: record is locked.
I am assuming that both threads are trying to insert or update when the record is locked.
I did some research and found that there is an option to set the database that instead of throwing an error, it should wait for the lock to be released.
SET LOCK MODE TO WAIT;
SET LOCK MODE TO WAIT 17;
I don't think that there is an option in JDBC to use this setting. How do I go about using this setting in my Java web app?
You can always send that SQL straight up: create a Statement with createStatement() and execute the exact statement above.
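A sketch of that, to be run once per connection right after it is opened:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class LockWaitSetup {
    // Tell Informix to wait up to 17 seconds for a lock instead of erroring out.
    static void setLockWait(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SET LOCK MODE TO WAIT 17");
        }
    }
}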
The more 'normal' / modern approach to this problem is a combination of MVCC, the transaction level 'SERIALIZABLE', retry, and random backoff.
I have no idea whether Informix is anywhere near that advanced, though. Modern DBs such as Postgres are (MySQL does not count as modern for the purposes of MVCC/serializable/retry/backoff and transactional safety).
Doing MVCC/Serializable/Retry/Backoff in raw JDBC is very complicated; use a library such as JDBI or JOOQ.
MVCC: A mechanism whereby transactions are shallow clones of the underlying data. Two separate transactions can both read and write the same records in the same table without getting in each other's way. Nothing is 'saved' until you commit the transaction.
SERIALIZABLE: A transaction level (also called isolation level), settable with jdbcDbObj.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); this is the safest level. If you know how version control systems work: you're asking the database to aggressively rebase everything so that the entire chain of commits is ordered into a single long line of events, with each transaction acting as if it ran after the previous transaction completed. The simplest way to implement this level is to globally lock everything, which is of course very detrimental to multithreaded performance. In practice, good DB engines (such as Postgres) are smarter than that: multiple threads can run transactions simultaneously without just being frozen and waiting for locks; the DB engine instead checks whether the things the transaction did (not just writing, also reading) are conflict-free with the simultaneous transactions. If yes, it's all allowed. If not, all but one of the conflicting transactions throw a retry exception. This is the only level that lets you do this sequence of events safely:
Fetch the balance of isaace's bank account.
Fetch the balance of rzwitserloot's bank account.
subtract €10,- from isaace's number, failing if the balance is insufficient.
add €10,- to rzwitserloot's number.
Write isaace's new balance to the db.
Write rzwitserloot's new balance to the db.
commit the transaction.
Any level less than SERIALIZABLE will silently do the wrong thing: if multiple threads do the above simultaneously, no SQLExceptions occur, but the sum of the balances of isaace and rzwitserloot will change over time (money is lost or created; in between steps 1 & 2 and steps 5/6/7, another thread sets new balances, and those new balances are then lost to the writes in steps 5/6/7). With SERIALIZABLE, that cannot happen.
RETRY: The way smart DBs solve the problem is by failing all but one of the conflicting transactions (with a 'retry' error), by checking that none of the SELECTs done by the entire transaction were affected by transactions that have been committed to the db after this transaction was opened. If some selects would have gone differently, the transaction fails. The point of this error is to tell the code that ran the transaction to just start from the top and do it again; most likely this time there won't be a conflict and it will work. The assumption is that conflicts CAN occur but usually do not, so it is better to assume fair weather (no locks, just do your stuff), check afterwards, and retry in the exotic scenario of a conflict, than to lock rows and tables up front. Note that Ethernet, for example, works the same way (assume fair weather, recover from errors afterwards).
BACKOFF: One problem with retry is that computers are too consistent: If 2 threads get in the way of each other, they can both fail, both try again, just to fail again, forever. The solution is that the threads twiddle their thumbs for a random amount of time, to guarantee that at some point, one of the two conflicting retriers 'wins'.
In other words, if you want to do it 'right' (see the bank account example) but also relatively 'fast' (no global locking), get a DB that can do this and use JDBI or JOOQ. Otherwise, you'd have to write code that runs all DB work in a lambda block, catches the SQLException, checks the SQLState to see if it indicates a retry (SQLState codes are DB-engine specific), and if so reruns the lambda after waiting an exponentially increasing amount of time that also includes a random factor. That's fairly complicated, which is why I strongly advise you to rely on JOOQ or JDBI to take care of this for you.
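A bare-bones sketch of that retry loop in raw JDBC; SQLSTATE 40001 is the standard serialization-failure code, but check what your engine actually returns:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;
import javax.sql.DataSource;

public class SerializableRetry {
    interface SqlWork { void run(Connection conn) throws SQLException; }

    static void runSerializable(DataSource ds, SqlWork work) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try (Connection conn = ds.getConnection()) {
                conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                conn.setAutoCommit(false);
                try {
                    work.run(conn);   // all the reads and writes of the transaction
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    conn.rollback();
                    if (!"40001".equals(e.getSQLState()) || attempt >= 5) throw e;
                    // exponential backoff plus a random component to break symmetry
                    Thread.sleep((1L << attempt) * 10 + ThreadLocalRandom.current().nextInt(50));
                }
            }
        }
    }
}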
If you aren't ready for that level of DB usage, just create a statement and send "SET LOCK MODE TO WAIT 17;" as an SQL statement straight up, at the start of opening any connection. If you're using a connection pool there is usually a place where you can configure SQL statements to be run on connection start.
The Informix JDBC driver does allow you to automatically set the lock wait mode when you connect to the server.
Simply pass the following parameter via the DataSource or connection URL:
IFX_LOCK_MODE_WAIT=17
The values for JDBC are:
(-1) wait forever
(0) do not wait (the default)
(> 0) wait this many seconds
See https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.jdbc.doc/ids_jdbc_040.htm
Connection conn = DriverManager.getConnection(
    "jdbc:informix-sqli://cleo:1550:IFXHOST=cleo;PORTNO=1550;user=rdtest;password=my_passwd;IFX_LOCK_MODE_WAIT=17");
There is a bus booking site with around 100k threads acting on it, but there are only 20 seats in total. How would you control the performance, and what would your approach be in Java multithreading?
I replied that I would go for synchronized methods or blocks, as that controls the concurrent threads and the lock prevents unsynchronized execution.
But the interviewer interrupted me and said that synchronized is a bogus idea: performance will degrade and it's not helpful.
Please let me know if I am missing any other useful multithreading concept that could fix this issue.
You could use a Semaphore with 20 permits: if each passenger is a thread, then at most 20 passengers can occupy seats at a time. This may be the answer only if the question is about core Java multithreading.
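A minimal sketch of that idea (the seat assignment and persistence themselves are elided):

import java.util.concurrent.Semaphore;

public class SeatBooking {
    private final Semaphore seats = new Semaphore(20); // 20 seats, 20 permits

    // Returns true if a seat was obtained, false if the bus is full.
    boolean tryBook() {
        if (!seats.tryAcquire()) {
            return false;      // fail fast instead of blocking 100k threads
        }
        try {
            // ... persist the booking ...
            return true;
        } catch (RuntimeException e) {
            seats.release();   // give the seat back if the booking failed
            throw e;
        }
    }
}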
This question is not about thread concepts. The interviewer actually wants you not to use locking, because that will make the application very slow when 100k threads are accessing the same object. It can be achieved with optimistic locking (in JPA or Hibernate).
This is accomplished by a version column in the database table, with a corresponding version attribute (@Version annotation) in the entity class. When a row is modified, the version value is incremented. The original transaction checks the version attribute, and if the data has been modified by another transaction, a javax.persistence.OptimisticLockException will be thrown and the original transaction rolled back. A Date or timestamp column can be used as well.
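A sketch of what that looks like on the entity (the names are illustrative):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Seat {
    @Id
    private Long id;

    private String passengerName;

    @Version                  // JPA increments this on every update and throws
    private long version;     // OptimisticLockException on a stale concurrent write
}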
Please have a look at the Java EE entity locking documentation on docs.oracle.com.
An interviewer who asks a question like this is not looking for some specific solution or idea. They're looking to hear how you reason about synchronization. A sensible answer might have been:
With so many threads, clearly nobody cares about performance. If you care about performance, reduce the number of threads. They're just fighting over the same resource.
But if you must have so many threads, using locks makes a lot of sense as it will negate some of the harm from having so many threads by keeping most of the threads out of the ready-to-run state for as long as possible. While some method other than locks might keep more threads running at the same time, it would likely decrease performance as the threads would contend heavily causing extensive cache ping-ponging and other issues. With so many threads, blocking most of them is the best strategy to avoid contention among the executing threads and allow this poorly-designed application to actually at least get some work done.
Again, if you really want this to work sensibly, you have to rearchitect it out of the insane thread count issue.
Example Scenario:
Using a threadpool in java where each thread gets a new connection from the connectionpool and then all threads proceed to do some db transaction in parallel. For example inserting 100 values into the same table.
Will this somehow mess with the table/database or is it entirely safe without any kind of synchronization required between the threads?
I find it hard to find reliable information about this subject. From what I gather, DB engines handle this on their own, if at all (PostgreSQL apparently since version 9.x). Are there any well-written articles explaining this further?
Bonus question: Is there even a point to make use of parallel transactions when the DB runs on a single hdd?
As long as the database itself is conforming to ACID you are fine (although every now and then someone finds a bug in some really strange situation).
To the bonus question: for PostgreSQL it totally does make sense as long as you have some time for collecting concurrent transactions (increase value for commit_delay), which then can help combining disk I/O's into batches. There are also other parameters for transaction throughput tuning, most of which can be pretty dangerous if Durability is one of your major concerns.
Also, please keep in mind that the database client also needs to do some work between database calls which, when executed sequentially, just adds idle time for the database. So even here, parallelism helps (as long as you have the actual resources for it: CPU, ...).
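A sketch of the scenario from the question: a fixed thread pool where every task takes its own connection from the pool (any javax.sql.DataSource-backed pool) and inserts one row. Table and column names are made up; no synchronization between the threads is needed, since the database isolates the transactions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

public class ParallelInserts {
    static void insertAll(DataSource pool, int rows) {
        ExecutorService threads = Executors.newFixedThreadPool(8);
        for (int i = 0; i < rows; i++) {
            final int value = i;
            threads.submit(() -> {
                // each task gets its own connection, hence its own transaction
                try (Connection conn = pool.getConnection();
                     PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO t (v) VALUES (?)")) {
                    ps.setInt(1, value);
                    ps.executeUpdate();
                } catch (Exception e) {
                    e.printStackTrace(); // real code would log and/or retry
                }
            });
        }
        threads.shutdown();
    }
}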
As I understand it, all transactions are thread-bound (i.e. the context is stored in a ThreadLocal). For example, if:
I start a transaction in a transactional parent method
Make database insert #1 in an asynchronous call
Make database insert #2 in another asynchronous call
Then that will yield two different transactions (one for each insert) even though they share the same "transactional" parent.
For example, let's say I perform two inserts (and using a very simple sample, i.e. not using an executor or completable future for brevity, etc.):
@Transactional
public void addInTransactionWithAnnotation() {
    addNewRow();
    addNewRow();
}
Will perform both inserts, as desired, as part of the same transaction.
However, if I wanted to parallelize those inserts for performance:
@Transactional
public void addInTransactionWithAnnotation() {
    new Thread(this::addNewRow).start();
    new Thread(this::addNewRow).start();
}
Then each one of those spawned threads will not participate in the transaction at all because transactions are Thread-bound.
Key Question: Is there a way to safely propagate the transaction to the child threads?
The only solutions I've thought of to solve this problem:
Use JTA or some XA manager, which by definition should be able to do this. However, I ideally don't want to use XA for my solution because of its overhead.
Pipe all of the transactional work I want performed (in the above example, the addNewRow() function) to a single thread, and do all of the prior work in the multithreaded fashion.
Figuring out some way to leverage InheritableThreadLocal on the Transaction status and propagate it to the child threads. I'm not sure how to do this.
Are there any more possible solutions, even if they taste a little like workarounds (like my solutions above)?
The JTA API has several methods that operate implicitly on the current Thread's Transaction, but it doesn't prevent you moving or copying a Transaction between Threads, or performing certain operations on a Transaction that's not bound to the current (or any other) Thread. This causes no end of headaches, but it's not the worst part...
For raw JDBC, you don't have a JTA Transaction at all. You have a JDBC Connection, which has its own ideas about transaction context. In which case, the transaction is Connection bound, not thread bound. Pass the Connection around and the tx goes with it. But Connections aren't necessarily threadsafe and are probably a performance bottleneck anyhow, so sharing one between multiple concurrent threads doesn't really help you. You likely need multiple Connections that think they are in the same Transaction, which means you need XA, since that's how the db identifies such cases. At which point you're back to JTA, but now with a JCA in the picture to handle the Connection management properly. In short, you've reinvented the JavaEE application server.
For frameworks that layer on JDBC e.g. ORMs like Hibernate, you have an additional complication: their abstractions are not necessarily threadsafe. So you can't have a Session that is bound to multiple Threads concurrently. But you can have multiple concurrent Sessions that each participate in the same XA transaction.
As usual it boils down to Amdahl's law. If the speedup you get from using multiple Connections per tx to allow for multiple concurrent Threads to share the db I/O work is large relative to what you get from batching, then the overhead of XA is worthwhile. If the speedup is in local computation and the db I/O is a minor concern, then a single Thread that handles the JDBC Connection and offloads non-IO computation work to a Thread pool is the way to go.
First, a clarification: if you want to speed up several inserts of the same kind, as your example suggests, you will probably get the best performance by issuing the inserts from the same thread and using some type of batch inserting (a plain JDBC sketch follows the links below). Depending on your DBMS there are several techniques available; look at:
Efficient way to do batch INSERTS with JDBC
What's the fastest way to do a bulk insert into Postgres?
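For the plain JDBC route, batching is just addBatch()/executeBatch(); a sketch with made-up table and column names:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    static void insertAll(Connection conn, List<String> names) throws SQLException {
        conn.setAutoCommit(false); // one transaction for the whole batch
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO person (name) VALUES (?)")) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();     // queue the row instead of one round trip per insert
            }
            ps.executeBatch();
            conn.commit();
        }
    }
}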
As for your actual question, I would personally try to pipe all the work to a worker thread. It is the simplest option as you don't need to mess with either ThreadLocals or transaction enlistment/delistment. Furthermore, once you have your units of work in the same thread, if you are smart you might be able to apply the batching techniques above for better performance.
Lastly, piping work to worker threads does not mean that you must have a single worker thread; you could have a pool of workers and achieve some parallelism if it is really beneficial to your application. Think in terms of producers/consumers.
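One way to pipe the work, sketched with a single-thread executor: the worker owns the connection (and therefore the transaction), and producers on any thread just submit units of work:

import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TransactionWorker {
    interface Work { void run(Connection conn) throws Exception; }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Connection conn; // owned exclusively by the worker thread

    public TransactionWorker(Connection conn) { this.conn = conn; }

    // Producers call this from any thread; the work itself runs serially
    // on the worker, so the Connection is never shared between threads.
    public Future<?> submit(Work work) {
        return worker.submit(() -> {
            work.run(conn);
            return null;
        });
    }
}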
I read the App Engine wiki on datastore contention when writes are too frequent (more than 5 times in 1 second). The wiki introduced the "shard" approach as a workaround. May I know: if we use Spring @Transactional on this, can that prevent the datastore contention timeout, since the writing is done concurrently?
No, you can't do that. Whether or not you use @Transactional, it will not make the problem go away: the fact remains that you have one object that you need to keep on writing to. The contention limit will remain whatever approach you use.
The answer to this problem is actually deciding what it is you want to do, and how important accuracy is to you. Take the case of a simple counter, which is a common example of this problem. If you think accuracy is very important, then you will have to have a list of counters that you choose from, either sequentially or at random, and write into. If you have ten counters in this list, that gives you ten times more writes per second, even for transactional writes. You need to write code to choose which counter you want to write to, though.
On the other hand, if you don't require too much precision, you could try writing to memcache very often. The write contention limits are much higher when writing to memcache or incrementing a counter there. You can then write out and reset the counter at a set interval.
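The shard-picking logic is only a few lines. Here is a sketch with the datastore access abstracted behind hypothetical incrementShard()/readShard() methods (real App Engine code would do a transactional get/put on the shard entity):

import java.util.concurrent.ThreadLocalRandom;

public class ShardedCounter {
    private static final int NUM_SHARDS = 10; // ten shards, roughly 10x the write rate

    // Pick a random shard so concurrent writers rarely collide on one entity.
    public void increment(String counterName) {
        int shard = ThreadLocalRandom.current().nextInt(NUM_SHARDS);
        incrementShard(counterName + "-" + shard); // hypothetical datastore write
    }

    // The total is the sum over all shards.
    public long total(String counterName) {
        long sum = 0;
        for (int i = 0; i < NUM_SHARDS; i++) {
            sum += readShard(counterName + "-" + i); // hypothetical datastore read
        }
        return sum;
    }

    void incrementShard(String key) { /* transactional get/put on the shard entity */ }
    long readShard(String key) { return 0; /* fetch the shard entity's value */ }
}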
When I was on a project that needed to store a lot of individual records in the DB, I found the system could not handle all the concurrent transactions. Instead, I built the object up in memory and then saved it to the DB all at once.