How to implement transaction with rollback in Redis - java

My program needs to add data to two lists in Redis as a transaction, and the data should be consistent in both lists. If there is an exception or system failure and the program has thus only added data to one list, the system should be able to recover and roll back. But based on the Redis docs, Redis doesn't support rollback. How can I implement this? The language I use is Java.

If you need transaction rollback, I recommend using something other than Redis. Redis transactions are not the same as in other datastores. Even MULTI/EXEC doesn't work for what you want - first because there is no rollback. If you want rollback you will have to pull down both lists so you can restore them - and hope that between your error condition and the "rollback" no other client also modified either of the lists. Doing this in a sane and reliable way is neither trivial nor simple. It would also probably not be a good question for SO, as it would be very broad and not Redis-specific.
Now as to why EXEC doesn't do what one might think. In your proposed scenario MULTI/EXEC only handles the cases of:
You set up WATCHes to ensure no other changes happened
Your client dies before issuing EXEC
Redis is out of memory
It is entirely possible to get errors as a result of issuing the EXEC command. When you issue EXEC, Redis will execute all commands in the queue and return a list of results, any of which may be an error. It will not handle the case of add-to-list-1 working and add-to-list-2 failing; you would still have your two lists out of sync. When you issue, say, an LPUSH after issuing MULTI, you will always get back an OK unless you:
a) previously added a watch and something in that list changed or
b) Redis returns an OOM condition in response to a queued push command
DISCARD does not work like some might think. DISCARD is used instead of EXEC, not as a rollback mechanism. Once you issue EXEC, your transaction is completed. Redis does not have any rollback mechanism at all - that isn't what Redis' transactions are about.
The key to understanding what Redis calls transactions is to realize they are essentially a command queue at the client connection level. They are not a database state machine.
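To make that concrete, here is a minimal sketch using the Jedis client (the client choice is an assumption; any Java Redis client exposes the same commands). EXEC returns null when a watched key changed, meaning the queued commands never ran - but once EXEC has run them, there is nothing to roll back:

import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class WatchedPush {
    // Push the same value to both lists; returns false if a watched key
    // changed and the queued commands were therefore never executed.
    public static boolean pushBoth(Jedis jedis, String value) {
        jedis.watch("list1", "list2");      // abort if either list changes
        Transaction tx = jedis.multi();
        tx.lpush("list1", value);           // queued, not executed yet
        tx.lpush("list2", value);           // queued, not executed yet
        List<Object> result = tx.exec();    // null => aborted by WATCH
        return result != null;
    }
}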

Redis transactions are different. They guarantee two things:
All or none of the commands are executed
The commands are executed sequentially and without interruption
Having said that, if you have control over your code and know when the system failure would happen (by catching the exception), you can achieve your requirement in this way:
MULTI -> Start transaction
LPUSH queue1 1 -> pushing in queue 1
LPUSH queue2 1 -> pushing in queue 2
EXEC/DISCARD
In the 4th step, do EXEC if there is no error; if you encounter an error or exception and you want to roll back, do DISCARD.
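A minimal sketch of that recipe in Java, assuming the Jedis client (any client with MULTI/EXEC/DISCARD support works the same way). Note that DISCARD only helps for failures your own code detects before EXEC:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TwoListWrite {
    public static void pushBoth(Jedis jedis, String value) {
        Transaction tx = jedis.multi();     // MULTI: start queuing commands
        try {
            tx.lpush("queue1", value);      // pushing in queue 1 (queued)
            tx.lpush("queue2", value);      // pushing in queue 2 (queued)
        } catch (RuntimeException e) {
            tx.discard();                   // DISCARD: abandon the queued commands
            throw e;
        }
        tx.exec();                          // EXEC: both pushes run atomically
    }
}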
Hope it makes sense.

Related

How to SET LOCK MODE in java application

I am working on a Java web application that uses Weblogic to connect to an Informix database. In the application we have multiple threads creating records in a table.
It happens pretty often that it fails and the following error is thrown:
java.sql.SQLException: Could not do a physical-order read to fetch next row....
Caused by: java.sql.SQLException: ISAM error: record is locked.
I am assuming that both threads are trying to insert or update when the record is locked.
I did some research and found that there is an option to tell the database that, instead of throwing an error, it should wait for the lock to be released.
SET LOCK MODE TO WAIT;
SET LOCK MODE TO WAIT 17;
I don't think that there is an option in JDBC to use this setting. How do I go about using this setting in my java web app?
You can always just send that SQL straight up: use createStatement() and execute that exact SQL with it.
The more 'normal' / modern approach to this problem is a combination of MVCC, the transaction level 'SERIALIZABLE', retry, and random backoff.
I have no idea if Informix is anywhere near that advanced, though. Modern DBs such as Postgres are (mysql does not count as modern for the purposes of MVCC/serializable/retry/backoff, and transactional safety).
Doing MVCC/Serializable/Retry/Backoff in raw JDBC is very complicated; use a library such as JDBI or JOOQ.
MVCC: A mechanism whereby transactions are shallow clones of the underlying data. 2 separate transactions can both read and write to the same records in the same table without getting in each other's way. Things aren't 'saved' until you commit the transaction.
SERIALIZABLE: A transaction level (also called isolation level), settable with jdbcDbObj.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE); - the safest level. If you know how version control systems work: You're asking the database to aggressively rebase everything so that the entire chain of commits is ordered into a single long line of events: Each transaction acts as if it was done after the previous transaction was completed. The simplest way to implement this level is to globally lock all the things. This is, of course, very detrimental to multithread performance. In practice, good DB engines (such as postgres) are smarter than that: Multiple threads can simultaneously run transactions without just being frozen and waiting for locks; the DB engine instead checks if the things that the transaction did (not just writing, also reading) are conflict-free with simultaneous transactions. If yes, it's all allowed. If not, all but one of the simultaneous transactions throw a retry exception. This is the only level that lets you do this sequence of events safely:
Fetch the balance of isaace's bank account.
Fetch the balance of rzwitserloot's bank account.
subtract €10,- from isaace's number, failing if the balance is insufficient.
add €10,- to rzwitserloot's number.
Write isaace's new balance to the db.
Write rzwitserloot's new balance to the db.
commit the transaction.
Any level less than SERIALIZABLE will silently fail the job; if multiple threads do the above simultaneously, no SQLExceptions occur but the sum of the balance of isaace and rzwitserloot will change over time (money is lost or created – in between steps 1 & 2 vs. step 5/6/7, another thread sets new balances, but these new balances are lost due to the update in 5/6/7). With serializable, that cannot happen.
RETRY: The way smart DBs solve the problem is by failing (with a 'retry' error) all but one transaction, by checking whether any SELECT done by the entire transaction would be affected by transactions that have been committed to the db after this transaction was opened. If the answer is yes (some selects would have gone differently), the transaction fails. The point of this error is to tell the code that ran the transaction to just.. start from the top and do it again. Most likely this time there won't be a conflict and it will work. The assumption is that conflicts CAN occur but usually do not occur, so it is better to assume 'fair weather' (no locks, just do your stuff), check afterwards, and try again in the exotic scenario that it conflicted, vs. trying to lock rows and tables. Note that, for example, ethernet works the same way (assume fair weather, recover from errors afterwards).
BACKOFF: One problem with retry is that computers are too consistent: If 2 threads get in the way of each other, they can both fail, both try again, just to fail again, forever. The solution is that the threads twiddle their thumbs for a random amount of time, to guarantee that at some point, one of the two conflicting retriers 'wins'.
In other words, if you want to do it 'right' (see the bank account example), but also relatively 'fast' (no global locking), get a DB that can do this, and use JDBI or JOOQ; otherwise, you'd have to write code to run all DB stuff in a lambda block, catch the SQLException, check the SqlState to see if it is indicating that you should retry (SqlState codes are DB-engine specific), and if yes, rerun that lambda, after waiting an exponentially increasing amount of time that also includes a random factor. That's fairly complicated, which is why I strongly advise you to rely on JOOQ or JDBI to take care of this for you.
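For illustration, a hand-rolled sketch of that retry/backoff loop (assuming PostgreSQL's SQLSTATE 40001, "serialization_failure"; other engines use different codes, which is part of why JDBI/JOOQ are recommended):

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;

public final class SerializableRetry {

    @FunctionalInterface
    public interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    public static void runWithRetry(Connection conn, SqlWork work)
            throws SQLException, InterruptedException {
        long delayMs = 50;                          // initial backoff
        for (int attempt = 1; ; attempt++) {
            try {
                conn.setAutoCommit(false);
                conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                work.run(conn);
                conn.commit();
                return;
            } catch (SQLException e) {
                conn.rollback();
                boolean retryable = "40001".equals(e.getSQLState());
                if (!retryable || attempt >= 5) {
                    throw e;                        // genuine failure, or too many conflicts
                }
                // Exponential backoff with a random factor to break livelock.
                Thread.sleep(delayMs + ThreadLocalRandom.current().nextLong(delayMs));
                delayMs *= 2;
            }
        }
    }
}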
If you aren't ready for that level of DB usage, just create a statement and send "SET LOCK MODE TO WAIT 17;" straight up as a SQL statement at the start of opening any connection. If you're using a connection pool there is usually a place where you can configure SQL statements to be run on connection start.
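A minimal sketch of that simple approach (URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LockModeSetup {
    public static Connection open(String url, String user, String password) throws Exception {
        Connection conn = DriverManager.getConnection(url, user, password);
        try (Statement st = conn.createStatement()) {
            st.execute("SET LOCK MODE TO WAIT 17"); // wait up to 17 seconds for locks
        }
        return conn;
    }
}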
The Informix JDBC driver does allow you to automatically set the lock wait mode when you connect to the server.
Simply pass the following parameter via the DataSource or connection URL:
IFX_LOCK_MODE_WAIT=17
The values for JDBC are
(-1) Wait forever
(0) not wait (default)
(> 0) wait this many seconds
See https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.jdbc.doc/ids_jdbc_040.htm
Connection conn = DriverManager.getConnection(
    "jdbc:Informix-sqli://cleo:1550:IFXHOST=cleo;PORTNO=1550;user=rdtest;password=my_passwd;IFX_LOCK_MODE_WAIT=17");

Mysql transactions and RabbitMq

I have to do a save in the mysql db, and then send an event over rabbitmq to my clients.
Now when pushing to RMQ, if it fails for whatever reason, I need to roll back the save. It should be all or nothing; there cannot be data in the db for which events have not gone out.
So essentially,
Begin Transaction
Save to db
Push to queue
If exception rollback else commit
End transaction
Another approach
Now, I can do only the save operation in the transaction, and then afterwards have some way of retrying the queueing if it fails, but that becomes overly complex.
Are there any best practices around this? Any suggestions regarding which approach to follow?
PS: the events over rmq contain ids for which some data has changed. The clients are expected to do a http get for the changed ids to perform their actions.
Well, you are pretty much doing everything right. The RabbitMQ team published a good article about semaphore queues: https://www.rabbitmq.com/blog/2014/02/19/distributed-semaphores-with-rabbitmq/
I would consider this a "best practice", since it is written by people involved in developing RabbitMQ.
If I understood correctly, your main concern is whether you published the message to the exchange successfully. You can use Confirms in RabbitMQ to be sure that your message was accepted by the queue.
Depending on how you designed the updates in your DB, and how much time difference is tolerated from the DB update to your clients getting the ID, a few things come to mind:
You can add a new field in the DB that you use as a flag to check whether the message for a specific update was sent to RabbitMQ. When you do the update you set the flag to 0, try to publish the message to the exchange, and if that succeeds you update the flag to 1. This can be complicated if you need to use the old values until the message to the clients has been sent, and it requires you to go through the database at some interval and retry sending to RabbitMQ for all rows whose flag is not set to 1.
Publish the message to RabbitMQ and, once you get the confirm, commit the DB update. If the commit fails for any reason your clients will do the update but get the old values, which usually won't be problematic, but it really depends on your application. This is probably the better/easier way, especially since there is no reason to expect anything to fail constantly. You should also consider adding some kind of short delay on the client side between receiving the ID for an update and actually doing the update (to give the DB update some time).
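A sketch of the second option with the RabbitMQ Java client (exchange and routing-key names are illustrative): publish with publisher confirms first, and only commit the DB update once the broker has accepted the message:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class PublishThenCommit {
    public static void saveAndNotify(java.sql.Connection db, String changedId) throws Exception {
        ConnectionFactory factory = new ConnectionFactory(); // assumes a local broker
        try (Connection mq = factory.newConnection();
             Channel channel = mq.createChannel()) {
            channel.confirmSelect();                 // enable publisher confirms
            db.setAutoCommit(false);
            // ... run the DB update for changedId here, but do not commit yet ...
            channel.basicPublish("updates", "entity.changed", null,
                    changedId.getBytes(StandardCharsets.UTF_8));
            channel.waitForConfirmsOrDie(5000);      // throws if the broker did not confirm
            db.commit();                             // clients will now fetch the new values
        } catch (Exception e) {
            if (!db.getAutoCommit()) {
                db.rollback();                       // message not confirmed: keep DB unchanged
            }
            throw e;
        }
    }
}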
In such a heterogeneous environment you can use a two-phase commit: you add a "dirty" record to the db, publish the message, then edit the record to mark it "clean" and ready.
Also, when you deal with messaging middleware you have to be prepared for some messages to potentially be lost, or for their consumer to fail to process them in full, so some feedback from the consumer may be required - say, you publish the message, and when the consumer receives it, it then adds the record to the db.

XA/JTA transaction: JMS message arrives before DB changes are visible

Context is:
producer (JTA transaction PT) is both sending message to JMS queue and making DB update;
consumer (JTA transaction CT) listens on same queue and reads DB when message is received;
application server - WebLogic, DB - Oracle.
I've observed that sometimes CT is not (yet?) able to see the DB changes of PT, even if the corresponding JMS message has already been received (PT is committed?).
It seems that JTA can't guarantee consistency of such kind (this was also confirmed in Jurgen Holler's presentation "Transaction Choices for Performance").
What is the best way to avoid such problem (except obvious - not using JTA)?
Thanks.
So it seems there is no simple, elegant and fail-proof solution for that. In our case it was decided to rely on a simple redelivery mechanism (throwing an exception and letting the JMS message be redelivered after a certain amount of time).
Also considered:
Marking the DB datasource as a Last Resource and expecting the Last Resource Commit Optimization (LRCO) to kick in (thus partially controlling the order of commits inside the XA transaction). Rejected due to the dependency on internals of the application server (WL).
Setting a DeliveryDelay on the JMS message, so it can be consumed only after some time, when (supposedly) the DB sync is over. Rejected due to the lack of a guarantee and the need to fine-tune it for different environments.
The blog post mentioned in the other answer indeed covers all of these and several other options (but no definitive one).
some options are outlined here:
http://jbossts.blogspot.co.uk/2011/04/messagingdatabase-race-conditions.html
Concerning the Answer:
"So it seems there is no simple, elegant and fail-proof solution for
that. In our case it was decided to rely on simple redelivery
mechanism (throwing exception and letting JMS message to be
redelivered after certain amount of time)."
This is only fail-proof if your second transaction, which starts after Transaction 1 logically ends, has a way of detecting that the Transaction 1 changes are not yet visible, and can blow itself up with a technical exception.
When Transaction 2 is a different process than Transaction 1, this is likely possible to check. Most likely the output of Transaction 1 is necessary for Transaction 2 to go forward. You can only make french fries if you have potatoes... If you have no potatoes you can blow up and try again next time.
However, if the process that is breaking due to the DB appearing stale is the exact same process that ran Transaction 1 itself - you are just adding potatoes into a bowl (e.g. a db table), failing to detect that your bowl is overflowing, and continuing to run transactions to pump it up - then such a check may be out of your hands.
Something of the sort happens to be my case.
A theoretical solution for this might very well be to try to induce a dirty read on the DB by creating an artificial entity equivalent to the @Version field of JPA, forcing each process that needs to run serially to hammer an update on a common entity. If both Transaction 2 and Transaction 1 update a common field on a common entity, the process will have to break: either you get a JPA optimistic lock exception on the second transaction, or you get a dirty-read update exception from the DB.
I have not tested this approach yet, but it is likely going to be the needed workaround, sadly enough.
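For reference, a sketch of such an artificial versioned entity (all names are illustrative, and as said above the approach is untested). Every transaction that must be ordered relative to the others loads this one well-known row and touches it, so a concurrent conflict surfaces as an OptimisticLockException instead of silently acting on stale data:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class SerializationToken {

    @Id
    private long id;            // a single well-known row, e.g. id = 1

    @Version
    private long version;       // bumped by JPA on every UPDATE

    private long lastTouched;   // dummy column so each transaction issues an UPDATE

    public void touch() {
        this.lastTouched = System.currentTimeMillis();
    }
}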

Two threads reading from the same table: how do I make both threads not read the same set of data from the TASKS table

I have a task thread running in two separate instances of Tomcat.
The task threads concurrently read (using SELECT) the TASKS table on a certain WHERE condition and then do some processing.
The issue is that sometimes both threads pick the same task, because of which the task is executed twice.
My question is: how do I make both threads not read the same set of data from the TASKS table?
It is just because your DAO function (the code which is accessing the database) is not synchronized. Make it synchronized; I think your problem will be solved.
If the TASKS table you mention is a database table then I would use transaction isolation.
As a suggestion: within a transaction, set an attribute of the TASKS table to some unique identifiable value if it is not set, then commit the transaction. If all is OK then the task has been selected by the thread.
I haven't come across this use case, so treat my suggestion with caution.
I think you should look at how an enterprise job scheduler works, for example Quartz.
For your use case there is a better tool for the job - and that's messaging. You are persisting items that need to be worked on, and then attempting to synchronise access between workers. There are a number of issues that you would need to resolve in making this work - in general updating a table and selecting from it should not be mixed (it locks), so storing state there doesn't work; neither would synchronization in your Java code, as that wouldn't survive a server restart.
Using the JMS API with a message broker like ActiveMQ, you would publish a message to a queue. This message would contain the details of the task to be executed. The message broker would persist this somewhere (either in its own message store, or a database). Worker threads would then subscribe to the queue on the message broker, and each message would only be handed off to one of them. This is quite a powerful model, as you can have hundreds of message consumers all acting on tasks so it scales nicely. You can also make this as resilient as it needs to be, so tasks can survive both Tomcat and broker restarts.
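A minimal sketch of that model with ActiveMQ (broker URL and queue name are illustrative). Producer and consumer are shown in one class for brevity; in practice each Tomcat instance would run the consumer part:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class TaskQueueSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection conn = factory.createConnection();
        try {
            conn.start();
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("tasks");

            // Producer side: enqueue a task id instead of inserting a row to poll.
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("task-42"));

            // Consumer side: the broker hands each message to exactly one
            // consumer, so no task is processed twice.
            MessageConsumer consumer = session.createConsumer(queue);
            TextMessage msg = (TextMessage) consumer.receive(1000);
            if (msg != null) {
                System.out.println("Processing " + msg.getText());
            }
        } finally {
            conn.close();
        }
    }
}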
Whether the database can provide graceful management of this will depend largely on whether it is using strict two-phase locking (S2PL) or multi-version concurrency control (MVCC) techniques to manage concurrency. Under MVCC reads don't block writes, and vice versa, so it is very possible to manage this with relatively simple logic. Under S2PL you would spend too much time blocking for the database to be a good mechanism for managing this, so you would probably want to look at external mechanisms. Of course, an external mechanism can work regardless of the database, it's just not really necessary with MVCC.
Databases using MVCC are PostgreSQL, Oracle, MS SQL Server (in certain configurations), InnoDB (except at the SERIALIZABLE isolation level), and probably many others. (These are the ones I know of off-hand.)
I didn't pick up any clues in the question as to which database product you are using, but if it is PostgreSQL you might want to consider using advisory locks. http://www.postgresql.org/docs/current/interactive/explicit-locking.html#ADVISORY-LOCKS I suspect many of the other products have some similar mechanism.
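A PostgreSQL-specific sketch of claiming a task with an advisory lock (pg_try_advisory_lock returns immediately with true/false, so two workers can race for the same task id without blocking):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AdvisoryLockDao {
    // Returns true if this worker now owns the task.
    public boolean tryLockTask(Connection conn, long taskId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT pg_try_advisory_lock(?)")) {
            ps.setLong(1, taskId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getBoolean(1);
            }
        }
    }

    public void unlockTask(Connection conn, long taskId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT pg_advisory_unlock(?)")) {
            ps.setLong(1, taskId);
            ps.execute();
        }
    }
}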
I think you need to have some variable (column) where you keep the last modified date of the rows. Your threads can then read the same set of data with the same modified-date limitation.
Edit:
I did not see "not to read"
In this case you need to have another table, TaskExecutor (taskId, executorId); when some thread runs a task you put data into TaskExecutor, and when you start another thread it just checks whether the task is already executing (SELECT ... FROM TaskExecutor WHERE taskId = ...).
You also need to take care of the isolation level for transactions.
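To make the check atomic rather than a racy SELECT-then-run, one option (a sketch under the assumption that taskId carries a unique constraint) is to treat the INSERT into TaskExecutor itself as the claim, so only one executor's INSERT can succeed:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TaskClaimDao {
    // Returns true if this executor won the claim for the given task.
    public boolean tryClaim(Connection conn, long taskId, String executorId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO TaskExecutor (taskId, executorId) VALUES (?, ?)")) {
            ps.setLong(1, taskId);
            ps.setString(2, executorId);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            // SQLState class 23 = integrity constraint violation:
            // another executor already claimed this task.
            if (e.getSQLState() != null && e.getSQLState().startsWith("23")) {
                return false;
            }
            throw e;
        }
    }
}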

Multithread-safe JDBC Save or Update

We have a JMS queue of job statuses, and two identical processes pulling from the queue to persist the statuses via JDBC. When a job status is pulled from the queue, the database is checked to see if there is already a row for the job. If so, the existing row is updated with new status. If not, a row is created for this initial status.
What we are seeing is that a small percentage of new jobs are being added to the database twice. We are pretty sure this is because the job's initial status is quickly followed by a status update - one process gets one, another process the other. Both processes check to see if the job is new, and since it has not been recorded yet, both create a record for it.
So, my question is, how would you go about preventing this in a vendor-neutral way? Can it be done without locking the entire table?
EDIT: For those saying the "architecture" is unsound - I agree, but am not at liberty to change it.
Create a unique constraint on JOB_ID, and retry to persist the status in the event of a constraint violation exception.
That being said, I think your architecture is unsound: if two processes are pulling messages from the queue, it is not guaranteed that they will write them to the database in queue order: one consumer might be a bit slower, a packet might be dropped, ..., causing the other consumer to persist the later message first, which is then overridden with the earlier state.
One way to guard against that is to include sequence numbers in the messages, update the row only if the sequence number is as expected, and delay the update otherwise (this is vulnerable to lost messages, though ...).
Of course, the easiest way would be to have only one consumer ...
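A sketch of the insert-then-fall-back-to-update flow (table and column names are illustrative; checking SQLState class 23, the standard integrity-constraint-violation class, keeps it reasonably vendor-neutral):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JobStatusDao {
    public void saveOrUpdate(Connection conn, String jobId, String status) throws SQLException {
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO JOB_STATUS (JOB_ID, STATUS) VALUES (?, ?)")) {
            insert.setString(1, jobId);
            insert.setString(2, status);
            insert.executeUpdate();
        } catch (SQLException e) {
            // Another consumer inserted the row first: retry as an UPDATE.
            if (e.getSQLState() == null || !e.getSQLState().startsWith("23")) {
                throw e;
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE JOB_STATUS SET STATUS = ? WHERE JOB_ID = ?")) {
                update.setString(1, status);
                update.setString(2, jobId);
                update.executeUpdate();
            }
        }
    }
}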
JDBC connections are not thread safe, so there's nothing to be done about that.
"...two identical processes pulling from the queue to persist the statuses via JDBC..."
I don't understand this at all. Why two identical processes? Wouldn't it be better to have a pool of message queue listeners, each of which would handle messages landing on the queue? Each listener would have its own thread; each one would be its own transaction. A Java EE app server allows you to configure the size of the message listener pool to match the load.
I think a design that duplicates a process like this is asking for trouble.
You could also change the isolation level on the JDBC connection. If you make it SERIALIZABLE you'll ensure ACID at the price of slower performance.
Since it's an asynchronous process, performance will only be an issue if you find that the listeners can't keep up with the messages landing on the queue. If that's the case, you can try increasing the size of the listener pool until you have adequate capacity to process the incoming messages.
