I am writing an application that will be deployed on n nodes. The application's entity classes use the SEQUENCE generation strategy to generate primary keys. Since there will be bulk inserts, we will also configure an allocation size.
The concern is: when the application is deployed on n nodes and two nodes simultaneously request the next value from the sequence defined in the database:
Wouldn't there be any race condition?
Or does the sequence have some lightweight locking mechanism to serve the requests sequentially, as happens with the IDENTITY strategy?
Or is a sequence not the right solution to this problem?
Kindly help. Thank you!
Think of a sequence as a table with one column storing an integer that represents the current id. Each time you insert a new entry, the following operations happen in a transaction:
The current value from the SEQUENCE table is read.
That value is assigned as the ID of the new entry.
The value in the SEQUENCE table is incremented.
To answer your questions:
The concurrency issues are addressed by the database.
Since inserts happen in a transaction (both simple and bulk inserts), the consistency of ID generation is enforced by the database engine via transactions (by the isolation level of the transaction, to be more precise). Make sure your database engine supports transactions.
Sequence is the right solution, assuming your database engine supports transactions.
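For completeness, here is a minimal sketch of a JPA mapping using the SEQUENCE strategy with an allocation size, as described in the question; the entity, generator, and sequence names are made up:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Shipment {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "shipment_seq_gen")
    @SequenceGenerator(
        name = "shipment_seq_gen",
        sequenceName = "shipment_seq", // the database sequence shared by all nodes
        allocationSize = 50            // each node reserves 50 ids per round trip
    )
    private Long id;

    // other fields, getters and setters omitted
}

Because each request for the next sequence value is atomic on the database side, two nodes asking at the same time simply receive different values, and therefore different, non-overlapping id blocks.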
Related
I would like to have a persistent, distributed counter. My idea is to use a database sequence. Only sequence. I do not want to have a table, because I will not populate the table. I just need a sequence of unique numbers.
I don't want to use a naive select mys-seq.nextval from dual (or org.springframework.jdbc.support.incrementer.OracleSequenceMaxValueIncrementer) because I would like to use the sequence's caching ability: I do not want to hit the database every time I need a new number.
I guess I should use org.hibernate.id.enhanced.SequenceStyleGenerator, but I cannot find any example of how to use it "standalone", without an entity.
Unfortunately, all the examples I found describe how to configure entity id generation with a sequence.
PS: It's a Spring Boot app.
I found a simple solution to my problem: I can treat each number from the database sequence as the base of a range of numbers to use. For example, if the sequence returns 5, the reserved range for my counter is 5000 - 5999.
With this solution I hit the database once per thousand numbers.
Initially I thought I had to use the database's internal sequence-number caching, but the same result can be achieved with trivial application-level caching.
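A minimal sketch of that range trick in a Spring service; the sequence name, the Oracle-style nextval query, and the range size of 1000 are assumptions:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class RangeCounter {

    private static final long RANGE_SIZE = 1000;

    private final JdbcTemplate jdbcTemplate;
    private long next;  // next number to hand out
    private long limit; // first number outside the reserved range

    public RangeCounter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public synchronized long nextValue() {
        if (next >= limit) {
            // one database hit reserves RANGE_SIZE numbers
            long block = jdbcTemplate.queryForObject(
                    "select my_seq.nextval from dual", Long.class);
            next = block * RANGE_SIZE;
            limit = next + RANGE_SIZE;
        }
        return next++;
    }
}

The synchronized method keeps the counter consistent within one node; uniqueness across nodes comes from the sequence itself, since every node gets a different block.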
I have two tables with the model below:
CREATE TABLE IF NOT EXISTS INV (
CODE TEXT,
PRODUCT_CODE TEXT,
LOCATION_NUMBER TEXT,
QUANTITY DECIMAL,
CHECK_INDICATOR BOOLEAN,
VERSION BIGINT,
PRIMARY KEY ((LOCATION_NUMBER, PRODUCT_CODE)));
CREATE TABLE IF NOT EXISTS LOOK_INV (
LOCATION_NUMBER TEXT,
CHECK_INDICATOR BOOLEAN,
PRODUCT_CODE TEXT,
CHECK_INDICATOR_DDTM TIMESTAMP,
PRIMARY KEY ((LOCATION_NUMBER), CHECK_INDICATOR, PRODUCT_CODE))
WITH CLUSTERING ORDER BY (CHECK_INDICATOR ASC, PRODUCT_CODE ASC);
I have a business operation where I need to update CHECK_INDICATOR in both tables and QUANTITY in the INV table.
As CHECK_INDICATOR is part of the primary key in the LOOK_INV table, I need to delete the row first and insert a new one.
Below are the three operations I need to perform in a batch fashion (either all are executed successfully or none are):
Delete row from LOOK_INV table.
Insert row in LOOK_INV table.
Update QUANTITY and CHECK_INDICATOR in INV table.
As the INV table is accessed by multiple threads, I need to make sure, before updating an INV row, that it has not changed since it was last read.
I am using an LWT to update the INV table via the VERSION column, and a batch operation for the deletion and insertion in the LOOK_INV table. I want to put all three operations in one batch, but since a conditional (LWT) statement cannot be combined with statements on other partitions in a batch, I have to execute them in the aforesaid fashion, as sketched below.
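For reference, the delete + insert pair runs as a single logged batch, roughly like this with the DataStax Java driver (the wiring is illustrative):

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class LookInvBatch {

    static void moveRow(Session session, String location, String product,
                        boolean oldCheck, boolean newCheck) {
        BatchStatement batch = new BatchStatement(); // LOGGED by default
        batch.add(new SimpleStatement(
                "DELETE FROM LOOK_INV WHERE LOCATION_NUMBER = ? "
              + "AND CHECK_INDICATOR = ? AND PRODUCT_CODE = ?",
                location, oldCheck, product));
        batch.add(new SimpleStatement(
                "INSERT INTO LOOK_INV (LOCATION_NUMBER, CHECK_INDICATOR, "
              + "PRODUCT_CODE, CHECK_INDICATOR_DDTM) VALUES (?, ?, ?, toTimestamp(now()))",
                location, newCheck, product));
        session.execute(batch);
        // The INV update with IF VERSION = ? must run separately,
        // because the conditional (LWT) statement can't join this batch.
    }
}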
The problem with this approach is that in some scenarios the batch executes successfully but the update of the INV table results in a timeout exception, leaving the data inconsistent across the two tables.
Is there any feature provided by Cassandra to handle this type of scenario elegantly?
Caution with Lightweight Transactions (LWT)
Lightweight Transactions are currently considered a Cassandra anti-pattern because of the performance issues you are suffering.
Here is a bit of context to explain.
Cassandra does not use RDBMS ACID transactions with rollback or locking mechanisms. It does not provide locking because of a fundamental constraint on all kinds of distributed data stores called the CAP theorem, which states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Because of this, Cassandra is not good for atomic operations and you should not use Cassandra for this purpose.
It does provide lightweight transactions, which can replace locking in some cases. But because the Paxos protocol (the basis for LWT) involves a series of actions that occur between nodes, there will be multiple round trips between the node that proposes a LWT and the other replicas that are part of the transaction.
This has an adverse impact on performance and is one reason for the WriteTimeoutException error. In this situation you can't know whether the LWT operation has been applied, so you need to retry it in order to fall back to a stable state. Because LWTs are so expensive, the driver will not automatically retry them for you.
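A rough sketch of such a manual retry with the DataStax Java driver; the statement, the version handling, and the retry policy are assumptions, not a recommended recipe:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.WriteTimeoutException;
import java.math.BigDecimal;

public class InvUpdater {

    private final Session session;

    public InvUpdater(Session session) {
        this.session = session;
    }

    /** Returns true if the conditional update was applied. */
    public boolean updateWithRetry(String location, String product,
                                   BigDecimal quantity, boolean check,
                                   long expectedVersion, int maxRetries) {
        String cql = "UPDATE INV SET QUANTITY = ?, CHECK_INDICATOR = ?, VERSION = ? "
                   + "WHERE LOCATION_NUMBER = ? AND PRODUCT_CODE = ? IF VERSION = ?";
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                ResultSet rs = session.execute(cql, quantity, check,
                        expectedVersion + 1, location, product, expectedVersion);
                return rs.wasApplied(); // false => someone else changed the row
            } catch (WriteTimeoutException e) {
                // The LWT outcome is unknown: re-read the row (omitted here)
                // before the next attempt to decide whether to retry or stop.
            }
        }
        return false;
    }
}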
LWT comes with big performance penalties if used frequently, and we see some clients with big timeout issues due to using LWTs.
Lightweight transactions are generally a bad idea and should be used infrequently.
If you do require ACID properties on part of your workload but still need it to scale, consider shifting that part of your load to CockroachDB.
In summary, if you do need ACID transactions it is generally a lot easier to bring a second technology in.
I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate's merge works on the PK only, so it's inappropriate. I guess a native query might or might not help, as it may or may not be committed by the time the second INSERT takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the existing row.
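A minimal sketch of that pattern with plain Hibernate; the SomeText entity (id plus a unique someText column) and the session handling are assumptions:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.exception.ConstraintViolationException;

@Entity
class SomeText {
    @Id @GeneratedValue Long id;
    @Column(unique = true) String someText;
    SomeText() {}
    SomeText(String t) { this.someText = t; }
    Long getId() { return id; }
}

public class SomeTextDao {

    private final SessionFactory sessionFactory;

    public SomeTextDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    /** Returns the id for the given text, inserting a new row if necessary. */
    public long findOrCreate(String someText) {
        // Try to insert in a short, separate transaction.
        Session session = sessionFactory.openSession();
        try {
            Transaction tx = session.beginTransaction();
            try {
                SomeText entity = new SomeText(someText);
                session.save(entity);
                tx.commit();
                return entity.getId();
            } catch (ConstraintViolationException e) {
                // A concurrent request inserted the same text first.
                tx.rollback();
            }
        } finally {
            session.close();
        }
        // The row now exists: read it in a fresh session.
        Session reader = sessionFactory.openSession();
        try {
            SomeText existing = (SomeText) reader
                    .createQuery("from SomeText where someText = :t")
                    .setParameter("t", someText)
                    .uniqueResult();
            return existing.getId();
        } finally {
            reader.close();
        }
    }
}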
Not sure this scales well if the likelihood of a duplicate is high; it's a lot of extra work if some percentage (depending on the database) of transactions have to fail the insert and then re-select.
A secondary transaction minimizes the length of time the transaction that adds the new text stays open. Even assuming the database supports this correctly, it is still possible for thread 1's transaction to cause thread 2's select/insert to hang until thread 1's transaction is committed or rolled back. The overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be a PK; I wonder why you need to break it out at all. Of course, at large volumes you might save substantial space if the sometext records are large; it almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.
This article says:
Unlike identity, the next number for the column value will be retrieved from memory rather than from the disk – this makes Sequence significantly faster than Identity
Does it mean that the ID comes from disk in the case of identity? If yes, then which disk and how?
Using sequence, I can see in the log an extra select query to the DB while inserting a new record. But I didn't find that extra select query in the log in the case of identity.
Then how is sequence faster than identity?
Strategy used by sequence:
Before inserting a new row, ask the database for the next sequence value, then insert this row with the returned sequence value as ID.
Strategy used by identity:
Insert a row without specifying a value for the ID. After inserting the row, ask the database for the last generated ID.
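At the JDBC level, the two strategies look roughly like this (a sketch; the table, columns, and the Postgres-style nextval call are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class IdStrategies {

    // SEQUENCE style: fetch the id first, then insert it explicitly.
    static long insertWithSequence(Connection con, String name) throws SQLException {
        long id;
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select nextval('person_seq')")) {
            rs.next();
            id = rs.getLong(1);
        }
        try (PreparedStatement ps =
                 con.prepareStatement("insert into person (id, name) values (?, ?)")) {
            ps.setLong(1, id);
            ps.setString(2, name);
            ps.executeUpdate();
        }
        return id;
    }

    // IDENTITY style: insert without an id, then read back the generated key.
    static long insertWithIdentity(Connection con, String name) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "insert into person (name) values (?)",
                Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, name);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }
}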
The number of queries is thus the same in both cases. But Hibernate by default uses a strategy that is more efficient for the sequence generator: when it asks for the next sequence value, it keeps the next 50 values in memory (50 is the default, IIRC, and it's configurable) and uses those values for the next 50 inserts. Only after 50 inserts does it go back to the database to get the next 50 values. This tremendously reduces the number of SQL queries needed for automatic ID generation.
The identity strategy doesn't allow for such an optimization.
The IDENTITY generator will always require a database hit for fetching the primary key value without waiting for the flush to synchronize the current entity state transitions with the database.
So the IDENTITY generator doesn't play well with Hibernate write-behind first level cache strategy, therefore JDBC batching is disabled for the IDENTITY generator.
The sequence generator can benefit from database value preallocation and you can even employ a hi/lo optimization strategy.
In my opinion, the best generators are the pooled and pooled-lo sequence generators. These generators combine the batch-friendly sequence generator with a client-side value generation optimization that's compatible with other DB clients that may insert rows without knowing anything about our generation strategy.
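As an illustration, a pooled-lo sequence generator can be configured through Hibernate's SequenceStyleGenerator roughly like this; the entity and sequence names are made up:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "post_seq_gen")
    @GenericGenerator(
        name = "post_seq_gen",
        strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
        parameters = {
            @Parameter(name = "sequence_name", value = "post_seq"),
            @Parameter(name = "increment_size", value = "50"),
            // pooled-lo: the fetched value is the low end of the next block,
            // so other clients calling the sequence directly still get safe ids
            @Parameter(name = "optimizer", value = "pooled-lo")
        }
    )
    private Long id;
}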
Anyway, you should never choose the TABLE generator because it performs really badly.
Though I'm personally new to Hibernate, from what I can recall, using identity basically means that Hibernate will check what the next possible id value in your DB is and keep a value for it.
For sequence, you basically tell Hibernate to generate the next value based on a particular sequence you provide. So it has to actually calculate the next id by looking at the next possible value of that sequence; hence the extra query is fired.
Maybe this will answer your question:
Unlike identity column values, which are generated when rows are inserted, an application can obtain the next sequence number before inserting the row by calling the NEXT VALUE FOR function. The sequence number is allocated when NEXT VALUE FOR is called even if the number is never inserted into a table. The NEXT VALUE FOR function can be used as the default value for a column in a table definition. Use sp_sequence_get_range to get a range of multiple sequence numbers at once.
You can find the details here.
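For instance, the next number can be fetched ahead of the insert; a small JDBC sketch against SQL Server, with a made-up sequence name:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NextValueForExample {

    // Fetch the next sequence number before any row is inserted.
    static long nextOrderId(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT NEXT VALUE FOR dbo.order_seq")) {
            rs.next();
            return rs.getLong(1); // allocated even if never used in an insert
        }
    }
}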
Identity doesn't need that extra select query because identity is table-dependent, while a sequence is independent of the table. Because of this, we can get a sequence value even before creating a row: when you call session.save(T entity), the sequence value is generated even before you commit the transaction.
Sequence:
You create or update entities -> each time you save an entity, Hibernate gets the next sequence value -> your program completes all its processing without exception or rollback -> you commit the transaction -> Hibernate inserts all the completed entities.
Identity: when the transaction is committed, an incomplete entity is inserted (its id must be fetched from the identity column). So the insert flow with sequence is definitely slower, but the advantage is that if you cancel the insert, the count doesn't increase.
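The timing difference shows up at save(); a sketch, with MyEntity being a hypothetical mapped entity and the session setup assumed:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.Transaction;

@Entity
class MyEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    Long id;
}

public class SaveTiming {

    static void demo(Session session) {
        Transaction tx = session.beginTransaction();

        MyEntity e = new MyEntity();
        session.save(e);

        // With a SEQUENCE generator the id is already assigned here:
        // save() only ran a "select next value", not the INSERT itself.
        // With IDENTITY, save() must execute the INSERT immediately
        // to obtain the generated key.
        System.out.println("id after save(): " + e.id);

        tx.commit(); // in the sequence case, the INSERT is flushed here
    }
}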
I am using Hibernate 3.0 in my application with a Postgres database. It is a monitoring application that gathers data every minute, so some tables get thousands of new rows every month.
Currently I am using a sequence for generating ids in Hibernate. Is there any better option for this scenario?
Any suggestion will be appreciated.
IMHO sequence is the best approach because it gives you more flexibility, although you may also use an identity (auto-increment) column; in Postgres I think it is called serial. There is also a way to store ids in a separate table. To use these three approaches, apply, respectively:
@GeneratedValue(strategy=GenerationType.TABLE)
@GeneratedValue(strategy=GenerationType.SEQUENCE)
@GeneratedValue(strategy=GenerationType.IDENTITY)
As for your previous question, whether it is good to use a single sequence for all tables: I wouldn't recommend this approach, because the db must assert that all sequence numbers are unique, which is why each sequence-generated value needs to be synchronized by the db server. With a single sequence per db, this may cause performance issues when multiple requests from multiple tables ask for the next id value. I would rather recommend a single sequence per table.
While I am not sure there is a better alternative to using a sequence, I am pretty sure you would want to look at using a StatelessSession if this is just for gathering data. You can get rid of all the overhead of, e.g., the 1st level cache, transactional write-behind, etc.
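A minimal sketch of inserting gathered data through a StatelessSession; the Measurement entity is hypothetical:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

@Entity
class Measurement {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    Long id;
    double value;
}

public class MeasurementWriter {

    private final SessionFactory sessionFactory;

    public MeasurementWriter(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void write(List<Measurement> batch) {
        // No first-level cache, no dirty checking, no write-behind:
        // each insert goes straight to JDBC.
        StatelessSession session = sessionFactory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        try {
            for (Measurement m : batch) {
                session.insert(m);
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}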