Cassandra - Distributed Row Locking for Select AND Update - java

I have a micro-service which runs on multiple machines in two regions which connects to Cassandra DB. I have the following table structure.
CREATE TABLE gifts (
gift_id text,
user_id text,
PRIMARY KEY (gift_id)
);
The table contains a list of gift IDs.
Under multiple concurrent requests, each user must be assigned a unique gift, i.e. select a random gift_id and update its user_id.
Will LWT be helpful to solve this problem?
Limitations:
Cannot use zookeeper for locking
Cannot use any relational database.

Note that LWT doesn't affect SELECTs. The usual last-writer-wins and tunable consistency semantics apply, though running a SELECT at consistency level ALL guarantees that you see the write from the latest INSERT/UPDATE (which in Cassandra are best considered the same operation, since INSERT has upsert semantics).
You can use LWTs to ensure that no two users share a gift_id. If a gift, once associated with a user, can never be reassigned, simply INSERT with IF NOT EXISTS. Alternatively, do a SELECT user_id FROM gifts WHERE gift_id = ... and then a conditional UPDATE with IF user_id = the_user_id_from_the_select (INSERT only supports IF NOT EXISTS; conditions on column values require UPDATE).
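For illustration, a minimal sketch of the conditional-UPDATE variant with the DataStax Java driver 4.x (GiftAssigner and tryClaim are hypothetical names; it assumes unclaimed rows are stored with a null user_id):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class GiftAssigner {

    private final CqlSession session;

    public GiftAssigner(CqlSession session) {
        this.session = session;
    }

    /** Tries to claim giftId for userId; true only if this caller won the race. */
    public boolean tryClaim(String giftId, String userId) {
        ResultSet rs = session.execute(SimpleStatement.newInstance(
                "UPDATE gifts SET user_id = ? WHERE gift_id = ? IF user_id = null",
                userId, giftId));
        // wasApplied() is false when another request claimed this gift first;
        // the caller should pick a different random gift_id and try again.
        return rs.wasApplied();
    }
}

The Paxos round under the hood is what serializes the competing requests, so no external lock (ZooKeeper or otherwise) is needed.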

Related

Hibernate @DynamicUpdate

We have a table say 'A' in our system that has around 120 columns (I know that the table is not normalised and table normalisation exists as a backlog item in our roadmap).
Currently, we are facing a problem where multiple threads updating the same table row overwrite columns that another thread has just updated. We do not have the @DynamicUpdate annotation on our table entity.
I was thinking of adding that annotation, since it could solve our problem: different threads work on different columns of the table, and if the SQL is generated at runtime to update only the changed columns, one thread's update will no longer overwrite columns written by another.
But I read that, without dynamic update, Hibernate caches the INSERT and UPDATE SQL for a table with all columns included.
So, will using @DynamicUpdate actually be beneficial here, or will the cost of generating dynamic SQL at runtime cause a performance slow-down?
The table in question has millions of records and an insert or update happens every other second (updates being more frequent).
I would definitely recommend @DynamicUpdate here, and let me explain why:
a) DB performance: by lowering the number of columns in each UPDATE, you cut the DB server's overhead of checking referential integrity and constraints for each column (primary key, foreign key, unique key, not null, etc.) and of maintaining the indexes that include the updated columns.
b) The trade-off is rebuilding the cached query plan every time, but in a multi-threaded environment this makes little difference: cached query plans are only valid for the life of a given session, and are thus only useful for consecutive updates in the same DB session.
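For reference, the mapping itself is a single class-level annotation. A minimal sketch (the entity name A and its columns stand in for your real 120-column table):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.DynamicUpdate;

@Entity
@DynamicUpdate // UPDATE SQL is generated at flush time with only the dirty columns
public class A {

    @Id
    private Long id;

    @Column(name = "col_1")
    private String col1;

    @Column(name = "col_2")
    private String col2;

    // ... remaining columns, getters and setters
}

Note that @DynamicUpdate only narrows each UPDATE statement; if two threads can touch the same column, you still need optimistic locking (a @Version field) on top of it.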

How to execute batch statements and LWT as a transaction in Cassandra

I have two table with below model:
CREATE TABLE IF NOT EXISTS INV (
CODE TEXT,
PRODUCT_CODE TEXT,
LOCATION_NUMBER TEXT,
QUANTITY DECIMAL,
CHECK_INDICATOR BOOLEAN,
VERSION BIGINT,
PRIMARY KEY ((LOCATION_NUMBER, PRODUCT_CODE)));
CREATE TABLE IF NOT EXISTS LOOK_INV (
LOCATION_NUMBER TEXT,
CHECK_INDICATOR BOOLEAN,
PRODUCT_CODE TEXT,
CHECK_INDICATOR_DDTM TIMESTAMP,
PRIMARY KEY ((LOCATION_NUMBER), CHECK_INDICATOR, PRODUCT_CODE))
WITH CLUSTERING ORDER BY (CHECK_INDICATOR ASC, PRODUCT_CODE ASC);
I have a business operation where I need to update CHECK_INDICATOR in both tables and QUANTITY in the INV table.
As CHECK_INDICATOR is part of the key in the LOOK_INV table, I need to delete the row first and insert a new one.
Below are the three operations I need to perform in batch fashion (either all are executed successfully or none is):
Delete row from LOOK_INV table.
Insert row in LOOK_INV table.
Update QUANTITY and CHECK_INDICATOR in INV table.
As the INV table is accessed by multiple threads, I need to make sure, before updating an INV row, that it has not been changed since it was last read.
I am using an LWT to update the INV table via the VERSION column, and a batch operation for the deletion and insertion in the LOOK_INV table. I want to put all three operations in one batch, but since a batch with LWT conditions cannot span multiple tables, I have to execute them separately as described above.
The problem with this approach is that in some scenarios the batch executes successfully but the INV table update fails with a timeout exception, leaving the data inconsistent between the two tables.
Is there any feature provided by cassandra to handle these type of scenario elegantly?
Caution with Lightweight Transactions (LWT)
Lightweight Transactions are currently considered a Cassandra anti-pattern because of the performance issues you are suffering.
Here is a bit of context to explain.
Cassandra does not use RDBMS ACID transactions with rollback or locking mechanisms. It does not provide locking because of a fundamental constraint on all kinds of distributed data stores, called the CAP theorem. It states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Because of this, Cassandra is not good for atomic operations and you should not use Cassandra for this purpose.
It does provide lightweight transactions, which can replace locking in some cases. But because the Paxos protocol (the basis for LWT) involves a series of actions that occur between nodes, there will be multiple round trips between the node that proposes an LWT and the other replicas that are part of the transaction.
This has an adverse impact on performance and is one reason for the WriteTimeoutException error. In this situation you can't know whether the LWT operation has been applied, so you need to retry it in order to fall back to a stable state. Because LWTs are so expensive, the driver will not automatically retry them for you.
LWTs come with big performance penalties if used frequently, and we see some clients with big timeout issues due to using LWTs.
Lightweight transactions are generally a bad idea and should be used infrequently.
If you do require ACID properties on part of your workload but still need it to scale, consider shifting that part of your load to CockroachDB.
In summary, if you do need ACID transactions it is generally a lot easier to bring a second technology in.
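If you do stay on Cassandra, the version-guarded update from the question typically looks like the sketch below (DataStax Java driver 4.x; table and column names come from the question, InvUpdater is a hypothetical name, and the re-read/retry policy is yours to define):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import java.math.BigDecimal;

public class InvUpdater {

    private final CqlSession session;

    public InvUpdater(CqlSession session) {
        this.session = session;
    }

    /**
     * Returns true if the row still had the expected version and was updated.
     * On a WriteTimeoutException the outcome is unknown: re-read the row and
     * compare versions before retrying, as explained above.
     */
    public boolean updateIfUnchanged(String location, String product,
                                     BigDecimal newQty, boolean checkIndicator,
                                     long expectedVersion) {
        ResultSet rs = session.execute(SimpleStatement.newInstance(
                "UPDATE inv SET quantity = ?, check_indicator = ?, version = ? "
                        + "WHERE location_number = ? AND product_code = ? "
                        + "IF version = ?",
                newQty, checkIndicator, expectedVersion + 1,
                location, product, expectedVersion));
        // wasApplied() == false means another writer got in first: re-read, then retry.
        return rs.wasApplied();
    }
}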

Hibernate concurrency creating a duplicate record on saveOrUpdate

I'm trying to implement a counter with Java, Spring, Hibernate and Oracle SQL. Each record represents a count for a given timestamp. Let's say each record is uniquely identified by the minute, and each record holds a count column. The service should expect a ton of concurrent requests, possibly updating the counter column of the same record.
In my table, if the record does not exist, just insert it with its count set to 1. Otherwise, find the record by timestamp and increase its existing counter column by 1.
In order to ensure data consistency and integrity, I'm using pessimistic locking. For example, if 20 counts come in at the same time, not necessarily from the same user, it's possible that we would otherwise overwrite the record from a stale read before updating. With locking, I'm ensuring that if 20 counts come in, the net effect on the database is a count of 20.
So locking is fine, but the problem is that if the record never existed in the first place, and two or more concurrent requests come in trying to update the not-yet-existent record, I've observed that a duplicate record gets inserted, because we cannot lock on a record that doesn't exist yet. How can we ensure that no duplicates get created in the table? Should it be controlled via Oracle? Or can I manage this via my app and Hibernate?
Thank you.
One way to avoid this sort of problem altogether would be to just generate the count at the time you actually query the data. Oracle has an analytic function ROW_NUMBER() which can assign a row number to each record in the result set of a query. As a rough example, consider the following query:
SELECT
ts,
ROW_NUMBER() OVER (ORDER BY ts) rn
FROM yourTable
The count you want would be in the rn column, representing the number of records appearing since the first entry in the table. Of course, you could further restrict the query.
This approach is robust to removing records, as the count would always start with 1. One drawback is that row number functionality is not supported by Hibernate. You would have to run this either as a native query or a stored proc.
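As an illustration of the native-query route (a sketch; yourTable is the placeholder from the query above, and the Session is assumed to be already open):

import java.util.List;
import org.hibernate.Session;

public class CountQuery {

    /** Runs the ROW_NUMBER() query natively, since HQL cannot express it. */
    @SuppressWarnings("unchecked")
    public static List<Object[]> countsByTimestamp(Session session) {
        // Each element is an Object[] holding { ts, rn }.
        return session.createNativeQuery(
                "SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) rn FROM yourTable")
                .getResultList();
    }
}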

Locking Tables with postgres in JDBC

Just a quick question about locking tables in a Postgres database using JDBC. I have a table to which I want to add a new record; for the primary key, I use an increasing integer value.
I want to be able to retrieve the max value of this column in Java and store it as a variable to be used as a new primary key when adding a new row.
This gives me a small problem, as this is going to be modelled as a multi-user system, what happens when 2 locations request the same max value? This will of course create a problem when trying to add the same primary key.
I realise that I should be using an EXCLUSIVE lock on the table to prevent reading or writing while getting the key and adding a new row. However, I can't seem to find any way to deal with table locking in JDBC, just standard transactions.
pseudocode as such:
int id = queryForInt("SELECT MAX(id) FROM table1");
id++;
// a second client may have retrieved the same max id by this point
execute("INSERT INTO table1 VALUES (" + id + ", value1, value2)");
You're absolutely right, if two locations request at around the same time, you'll run into a race condition.
The way to handle this is to create a sequence in postgres and select the nextval as the primary key.
I don't know exactly what direction you're heading or how you handle your data, but you could also declare the column as serial and not even include it in your insert query; the column will then auto-increment automatically.
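A minimal JDBC sketch of the sequence approach (table1, its columns, and table1_id_seq are assumed names; create the sequence once up front):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class InsertWithSequence {

    // One-time setup, e.g. in a migration script:
    //   CREATE SEQUENCE table1_id_seq;

    /** Inserts a row, letting Postgres draw the id atomically from the sequence. */
    public static void insert(Connection conn, String value1, String value2)
            throws SQLException {
        String sql = "INSERT INTO table1 (id, value1, value2) "
                + "VALUES (nextval('table1_id_seq'), ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, value1);
            ps.setString(2, value2);
            ps.executeUpdate();
        }
    }
}

Because nextval() is atomic across sessions, two concurrent clients can never receive the same value, so no table lock is needed.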

Using trigger to generate ID vs creating IDs manually

If we have a sequence to generate unique ID fields for a table, which of the 2 approaches is more efficient:
1. Create a trigger on insert that populates the ID field by fetching nextval from the sequence.
2. Call nextval on the sequence in the application layer before inserting the object (or tuple) into the DB.
EDIT: The application performs a mass upload, so assume thousands or a few million rows are inserted each time the app runs. Would the trigger from #1 be more efficient than calling the sequence from the app as in #2?
Since you are inserting a large number of rows, the most efficient approach would be to include the sequence.nextval as part of the SQL statement itself, i.e.
INSERT INTO table_name( table_id, <<other columns>> )
VALUES( sequence_name.nextval, <<bind variables>> )
or
INSERT INTO table_name( table_id, <<other columns>> )
SELECT sequence_name.nextval, <<other values>>
FROM some_other_table
If you use a trigger, you will force a context shift from the SQL engine to the PL/SQL engine (and back again) for every row you insert. If you get the nextval separately, you'll force an additional round-trip to the database server for every row. Neither of these are particularly costly if you do them once or twice. If you do them millions of times, though, the milliseconds add up to real time.
If you're only concerned about performance, on Oracle it'll generally be a bit faster to populate the ID with a sequence in your INSERT statement, rather than use a trigger, as triggers add a bit of overhead.
However (as Justin Cave says), the performance difference will probably be insignificant unless you're inserting millions of rows at a time. Test it to see.
What is a key? One or more fields that uniquely identify a record; a key should be final and never change over the course of an application.
I distinguish between technical and business keys. Technical keys are defined on the database and are generated (sequence, UUID, etc.); business keys are defined by your domain model.
That's why I suggest
always generate technical PKs with a sequence/trigger on the database
never use this PK field in your application (tip: mark getId() and setId() @Deprecated)
define business fields which uniquely identify your entity and use these in equals/hashCode methods, as in the sketch below
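A minimal sketch of that last point (Product and its code field are hypothetical; code is the business key, while the generated id stays out of equals/hashCode):

import java.util.Objects;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Product {

    @Id
    @GeneratedValue
    private Long id; // technical key: database-generated, never used for identity below

    @Column(unique = true, nullable = false)
    private String code; // business key: identifies the entity in the domain

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Product)) return false;
        return Objects.equals(code, ((Product) o).code);
    }

    @Override
    public int hashCode() {
        return Objects.hash(code);
    }
}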
I'd say if you already use Hibernate, then let it control how the IDs are created with @SequenceGenerator and @GeneratedValue. It is more transparent, and Hibernate can reserve blocks of IDs for itself, so it may be more efficient than doing it by hand or from a trigger.
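A sketch of that mapping (Item, item_seq and item_id_seq are assumed names; allocationSize is what lets Hibernate hand out IDs from a reserved block without a round trip per row):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Item {

    @Id
    @SequenceGenerator(name = "item_seq", sequenceName = "item_id_seq",
                       allocationSize = 50)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "item_seq")
    private Long id;

    // ... other columns, getters and setters
}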
