Hibernate IDENTITY vs SEQUENCE entity identifier generators

This article says:
Unlike identity, the next number for the column value will be retrieved from memory rather than from the disk – this makes Sequence significantly faster than Identity
Does it mean that ID comes from disk in case of identity? If yes, then which disk and how?
Using sequence, I can see in the log, an extra select query to DB while inserting a new record. But I didn't find that extra select query in the log in case of identity.
So how can sequence be faster than identity?

Strategy used by sequence:
Before inserting a new row, ask the database for the next sequence value, then insert this row with the returned sequence value as ID.
Strategy used by identity:
Insert a row without specifying a value for the ID. After inserting the row, ask the database for the last generated ID.
The number of queries is thus the same in both cases. But Hibernate by default uses a strategy that is more efficient for the sequence generator. When it asks for the next sequence value, it keeps the next 50 values (that's the default, IIRC, and it's configurable) in memory and uses them for the next 50 inserts. Only after those 50 inserts does it go back to the database to get the next 50 values. This tremendously reduces the number of SQL queries needed for automatic ID generation.
The identity strategy doesn't allow for such an optimization.
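
To make the round-trip arithmetic concrete, here is a minimal sketch of block allocation in plain Java. It is not Hibernate's actual implementation: PooledIdAllocator and FakeSequence are hypothetical names, and FakeSequence is an in-memory stand-in for a database sequence whose increment equals the allocation size.

```java
import java.util.function.LongSupplier;

/** Hands out IDs from an in-memory block and only asks the "database" when the block is used up. */
class PooledIdAllocator {
    private final LongSupplier nextBlockStart; // hypothetical stand-in for "select nextval"
    private final int allocationSize;
    private long next;      // next ID to hand out
    private long blockEnd;  // first ID that is NOT part of the current block
    private int roundTrips; // how often the "database" was asked

    PooledIdAllocator(LongSupplier nextBlockStart, int allocationSize) {
        this.nextBlockStart = nextBlockStart;
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (next == blockEnd) { // block exhausted, or nothing fetched yet
            next = nextBlockStart.getAsLong();
            blockEnd = next + allocationSize;
            roundTrips++;
        }
        return next++;
    }

    synchronized int roundTrips() {
        return roundTrips;
    }
}

/** In-memory stand-in for a sequence created with INCREMENT BY incrementBy. */
class FakeSequence implements LongSupplier {
    private final int incrementBy;
    private long current;

    FakeSequence(long startWith, int incrementBy) {
        this.current = startWith;
        this.incrementBy = incrementBy;
    }

    @Override
    public synchronized long getAsLong() {
        long value = current;
        current += incrementBy;
        return value;
    }
}
```

With an allocation size of 50, 150 calls to nextId() touch the fake sequence 3 times; with an allocation size of 1 they touch it 150 times. An identity column has no equivalent, because the value only exists once the row has been inserted.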

The IDENTITY generator will always require a database hit for fetching the primary key value without waiting for the flush to synchronize the current entity state transitions with the database.
So the IDENTITY generator doesn't play well with Hibernate's write-behind first-level cache strategy, and therefore JDBC batching is disabled for the IDENTITY generator.
The sequence generator can benefit from database value preallocation and you can even employ a hi/lo optimization strategy.
In my opinion, the best generators are the pooled and pooled-lo sequence generators. These generators combine the batch-friendly sequence generator with a client-side value generation optimization that's compatible with other DB clients that may insert rows without knowing anything about our generation strategy.
Anyway, you should never choose the TABLE generator because it performs really badly.

Though I'm personally new to Hibernate, from what I can recall, using Identity basically means that Hibernate will check what is the next possible id value from your DB and keep a value for it.
For sequence, you basically tell Hibernate to generate the next value based on a particular sequence you provide it. So it has to actually calculate the next id by looking at the next possible id value. Hence, the extra query is fired.

Maybe this will answer your question:
Unlike identity column values, which are generated when rows are
inserted, an application can obtain the next sequence number before
inserting the row by calling the NEXT VALUE FOR function. The sequence
number is allocated when NEXT VALUE FOR is called even if the number
is never inserted into a table. The NEXT VALUE FOR function can be
used as the default value for a column in a table definition. Use
sp_sequence_get_range to get a range of multiple sequence numbers at
once.
you can find the detail here
Identity doesn't need that extra select query because an identity column is tied to its table, whereas a sequence is independent of any table. Because of this, a sequence value can be obtained even before the row is created (when you call session.save(T entity), the sequence value is generated even before you commit the transaction).
Sequence:
You create or update entities -> each time you save an entity, Hibernate gets the next sequence value -> your program finishes all its processing without an exception or rollback -> you commit the transaction -> Hibernate inserts all the complete entities.
Identity: when the transaction commits, the entity is inserted incomplete (its ID must come from the identity column). So the INSERT command with a sequence is definitely slower, but the advantage is that if you cancel the insert, the count doesn't increase.

Related

Oracle autoincrement with sequence and trigger issue

I've got a weird situation here. I've used triggers and sequences to implement auto-increment. I insert the data into my tables from my web app which uses Hibernate. I test the web app in my machine (Netbeans) as well as on my office network (the web app is also deployed on our server with Wildfly).
It has always worked fine, until I started getting exceptions due to the unique constraint (primary key). Then I discovered that the problem was the sequence that generates values for the ids. For example, for my table xtable, the sequence's last_number is 78400 and the max id in xtable is 78308, but the sequence's nextval is 78304. I have no idea how that happens, because I created the sequence with the following:
CREATE SEQUENCE XTABLE_SEQUENCE INCREMENT BY 1 START WITH 1;
I tried the following to update the sequence and make its NEXTVAL greater than the max(id) in the table, but I'm still getting the same result after n inserts
declare
  maxval number(10);
begin
  select max(ID) into maxval from XTABLE;
  maxval := maxval+1;
  execute immediate 'DROP SEQUENCE XTABLE_SEQUENCE';
  execute immediate 'CREATE SEQUENCE XTABLE_SEQUENCE START WITH '|| (maxval+50) ||' INCREMENT BY 1';
end;
Here is the trigger statement:
create or replace TRIGGER xtable_sequence_tr
BEFORE INSERT ON xtable FOR EACH ROW
WHEN (NEW.id IS NULL)
BEGIN
SELECT xtable_sequence.NEXTVAL INTO :NEW.id FROM DUAL;
END;
Or what is the proper way to implement autoincrement in Oracle in order to avoid the issue I am facing? At some point, I start getting unique key constraint violation on the primary key due to the fact that (I don't know for what reason) the max id in the table happens to be greater than the sequence.nextval used in the trigger. What is causing that and how to fix it?

To be honest this post is quite confusing on its own.
You state that,
"For my table xtable, Its sequence's last_number is 78400, the max id in xtable is 78308, but the sequence's nextval is 78304."
What this tells me is that, with the sequence's last number at 78400, 100 sequence values were cached in memory, starting at 78300. Once 100 values are cached, the sequence's last value changes to show 78400 in your case, but that does not mean that many values have already been used; they are just cached in memory to be used by the next inserts. The cached values can only be used as long as the server is not restarted; if the database is restarted, you lose the sequence numbers that were cached. BTW, the sequence cache is shared among different sessions.
"but the sequence's nextval didn't change" Again, you are assuming that the last value of the sequence is the same as sequence.nextval; that is not the case. When you query the dba_sequences view and look at the LAST_NUMBER column, it represents the last value CACHED, not the last value generated by sequence.nextval or used in the table.
To be honest, resolving this shouldn't take much effort.
A. Verify that every insert uses the sequence directly, rather than going through procedures or triggers in some places and then coming back to the sequence in others; don't mix and match. (Remember that one drawback of using sequences directly in inserts is that consecutive values are not guaranteed: the ids could be 1, 2, 3 and the next one 10, because the server was restarted and you lost the unused cached sequence values. If you really always need strict order, don't use a sequence; use a procedure or other means instead.)
B. Instead of first querying the max id in the table, then dropping the sequence and recreating it,
drop the sequence first, then get the max value from the table, and then create the sequence from that point onward. This saves you from losing track of sequence values that may already have been used by transactions in other sessions that committed right while you were querying the max id in the table... but it is still not safe.
For safer results, I would create the new sequence starting from a value above the one returned by the query below, which should be run right before dropping the sequence.
select LAST_NUMBER from dba_sequences where sequence_name='YOUR_SEQUENCE_NAME'
Basically, what I am saying is that, to be safe, you should create the new sequence with a greater value than the one currently cached.

I figured out the condition under which I was getting that problem. While I was loading tens of thousands of records, for example executing a file containing 250000 insert queries, someone would try to insert records (through my web app) at the same time. So the problem probably occurred when two insert queries were about to be executed at the same time.

Prevent violating of UNIQUE constraint with Hibernate

I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on it. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate merge works on PK only, so it's inappropriate. I guess, a native query might help or not, as it may or may not be committed when the second INSERT takes place.

Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
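
The control flow of this approach can be sketched in plain Java without a database. Everything here is an assumption for illustration: FakeTextTable is an in-memory stand-in for the table with its UNIQUE constraint, DuplicateKeyException stands in for Hibernate's ConstraintViolationException, and insert() plays the role of the short secondary transaction.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for Hibernate's ConstraintViolationException. */
class DuplicateKeyException extends RuntimeException {
    DuplicateKeyException(String message) {
        super(message);
    }
}

/** In-memory stand-in for the table with a UNIQUE constraint on sometext. */
class FakeTextTable {
    private final Map<String, Long> rows = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    Optional<Long> findId(String sometext) {
        return Optional.ofNullable(rows.get(sometext));
    }

    /** Behaves like an INSERT in its own short transaction: it commits or fails on the constraint. */
    long insert(String sometext) {
        long id = ids.incrementAndGet();
        if (rows.putIfAbsent(sometext, id) != null) {
            throw new DuplicateKeyException("duplicate sometext: " + sometext);
        }
        return id;
    }
}

class TextIdResolver {
    private final FakeTextTable table;

    TextIdResolver(FakeTextTable table) {
        this.table = table;
    }

    long idFor(String sometext) {
        Optional<Long> existing = table.findId(sometext);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            return table.insert(sometext);
        } catch (DuplicateKeyException e) {
            // Another request inserted the same text first; its row exists now, so read it.
            return table.findId(sometext)
                    .orElseThrow(() -> new IllegalStateException("row not visible after duplicate", e));
        }
    }
}
```

In a real application the catch branch would also roll back the secondary transaction before re-reading, and the re-read has to run where the other request's committed row is visible.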

I'm not sure this scales well if the likelihood of a duplicate is high: it is a lot of extra work if some percentage of transactions (how many depends on the database) have to fail the insert and then reselect.
A secondary transaction minimizes the length of time the transaction that adds the new text takes. Even assuming the database supports it correctly, the thread 1 transaction might cause the thread 2 select/insert to hang until the thread 1 transaction is committed or rolled back. Overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be a PK; I'm wondering why you need to break it out at all. Of course, with large volumes it might save substantial space if the sometext values are large. It almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.

Get multiple Oracle sequences in one roundtrip

We have an "audit" table that we create lots of rows in. Our persistence layer queries the audit table sequence to create a new row in the audit table. With millions of rows being created daily, the select statement to get the next value from the sequence is one of our top ten most executed queries. We would like to reduce the number of database roundtrips made just to get the sequence next value (primary key) before inserting a new row in the audit table. We know you can't batch select statements from JDBC. Are there any common techniques for reducing database roundtrips to get a sequence next value?

Get a batch (e.g. 1000) of sequence values in advance with a single select:
select your_sequence.nextval
from dual
connect by level <= 1000
Cache the obtained sequence values and use them for the next 1000 audit inserts.
Repeat this when you have run out of cached sequence values.
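
On the Java side, the caching could look roughly like the sketch below. The names SequenceBatchSource, OracleSequenceBatchSource and CachedSequence are hypothetical, your_sequence is the placeholder from the query above, and the JDBC part is untested against a real Oracle instance. It binds the batch size and uses level <= ? so that exactly the requested number of values comes back.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Fetches a batch of sequence values; abstracted so the cache can be tried without a database. */
interface SequenceBatchSource {
    List<Long> fetch(int count);
}

/** Assumed Oracle implementation: one round trip returns 'count' values. */
class OracleSequenceBatchSource implements SequenceBatchSource {
    private final Connection connection;

    OracleSequenceBatchSource(Connection connection) {
        this.connection = connection;
    }

    @Override
    public List<Long> fetch(int count) {
        String sql = "select your_sequence.nextval from dual connect by level <= ?";
        List<Long> values = new ArrayList<>(count);
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setInt(1, count);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    values.add(rs.getLong(1));
                }
            }
        } catch (SQLException e) {
            throw new IllegalStateException("could not fetch sequence values", e);
        }
        return values;
    }
}

/** Serves values from memory and refills from the source only when the cache is empty. */
class CachedSequence {
    private final SequenceBatchSource source;
    private final int batchSize;
    private final Deque<Long> cache = new ArrayDeque<>();

    CachedSequence(SequenceBatchSource source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    synchronized long next() {
        if (cache.isEmpty()) {
            cache.addAll(source.fetch(batchSize));
        }
        return cache.removeFirst(); // throws NoSuchElementException if the source returned nothing
    }
}
```

Values still sitting in the cache when the application stops are lost, so gaps in the ids are to be expected.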

Skip the select statement for the sequence and generate the sequence value in the insert statement itself.
insert into your_table (ID,..) values (my_sequence.nextval,..)
No need for an extra select. If you need the sequence value, get it by adding a returning clause.
insert into your_table (ID,..) values (my_sequence.nextval,..) returning ID into ..
Save some extra time by specifying a cache value for the sequence.

I suggest you change the "INCREMENT BY" option of the sequence and set it to a number like 100 (you have to decide what step size your sequence should use; 100 is just an example).
Then implement a class called SequenceGenerator. This class has a property that holds the nextValue, and once every 100 values it calls sequence.nextVal in order to keep the db sequence up to date.
This way you only go to the db once every 100 inserts for the sequence nextVal.
Every time the application starts, you have to initialize the SequenceGenerator class with sequence.nextVal.
The only downside of this approach is that if your application stops for any reason, you will lose some of the sequence values and there will be gaps in your ids. That should not be a logical problem as long as you don't have any business logic that depends on the id values.

increment or sequence instead of table generation JPA

I am developing an application that supports multiple databases, and Hibernate fulfills that requirement.
Now the issue is with auto-generated primary keys. Some databases support auto increment and some require a sequence to increment the identity. To solve this issue, I use the following strategy:
strategy = GenerationType.TABLE (javax.persistence)
This fulfills my requirement.
In this post, a user commented that
it's always better to use increment or sequence instead of table generation if you need the ids to be in sequence
If I use auto increment or sequence, it requires some changes at the annotation level when I move from one database to another (an extra burden).
Please tell me: is it really better to use increment or sequence instead of table generation, or is it just a statement?

Auto increment drawbacks: You don't know the id until the transaction has committed (which can be a problem in JPA, since some EntityManager operations rely on Id's). Not all databases support auto increment fields.
Sequence drawbacks: Not all databases have sequences.
Table drawbacks: Id's are not necessarily consecutive.
Since it is very unlikely that you run out of Id's, using Table generation remains a good option. You can even tweak the id allocation size in order to use more consecutive id's (default size is 50):
@TableGenerator(name="myGenerator", allocationSize=1)
However, this will result in at least two queries to the id allocation table for each insert: one to step the value of the latest id, and one to retrieve it.

Using trigger to generate ID vs creating IDs manually

If we have a sequence to generate unique ID fields for a table, which of the 2 approaches is more efficient:
Create a trigger on insert, to populate the ID field by fetching nextval from sequence.
Calling nextval on the sequence in the application layer before inserting the object (or tuple) in the db.
EDIT: The application performs a mass upload. So assume thousands or a few millions of rows to be inserted each time the app runs. Would triggers from #1 be more efficient than calling the sequence within the app as mentioned in #2?

Since you are inserting a large number of rows, the most efficient approach would be to include the sequence.nextval as part of the SQL statement itself, i.e.
INSERT INTO table_name( table_id, <<other columns>> )
VALUES( sequence_name.nextval, <<bind variables>> )
or
INSERT INTO table_name( table_id, <<other columns>> )
SELECT sequence_name.nextval, <<other values>>
FROM some_other_table
If you use a trigger, you will force a context shift from the SQL engine to the PL/SQL engine (and back again) for every row you insert. If you get the nextval separately, you'll force an additional round-trip to the database server for every row. Neither of these is particularly costly if you do it once or twice. If you do them millions of times, though, the milliseconds add up to real time.

If you're only concerned about performance, on Oracle it'll generally be a bit faster to populate the ID with a sequence in your INSERT statement, rather than use a trigger, as triggers add a bit of overhead.
However (as Justin Cave says), the performance difference will probably be insignificant unless you're inserting millions of rows at a time. Test it to see.

What is a key? One or more fields to uniquely identify records, should be final and never change in the course of an application.
I make a difference between technical and business keys. Technical keys are defined on the database and are generated (sequence, uuid, etc ); business keys are defined by your domain model.
That's why I suggest
always generate technical PKs with a sequence/trigger on the database
never use this PK field in your application (tip: mark getId() and setId() as @Deprecated)
define business fields which uniquely identify your entity and use these in the equals/hashCode methods
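
A minimal sketch of what that could look like follows. The Customer class and its email business key are made up for illustration; mapping annotations are left out so that the example compiles on its own.

```java
import java.util.Objects;

class Customer {
    private Long id;            // technical key, assigned by the database (sequence/trigger)
    private final String email; // hypothetical business key that never changes

    Customer(String email) {
        this.email = Objects.requireNonNull(email, "email");
    }

    /** Discouraged on purpose: application code should not rely on the technical key. */
    @Deprecated
    Long getId() {
        return id;
    }

    @Deprecated
    void setId(Long id) {
        this.id = id;
    }

    String getEmail() {
        return email;
    }

    // Equality is based on the business key only, so it does not change
    // when the database assigns the technical id.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Customer)) {
            return false;
        }
        return email.equals(((Customer) other).email);
    }

    @Override
    public int hashCode() {
        return email.hashCode();
    }
}
```

Because the id takes no part in equals/hashCode, an instance that sits in a HashSet before it is persisted is still found there after the id has been assigned.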

I'd say if you already use Hibernate, then let it control how the ids are created with @SequenceGenerator and @GeneratedValue. It will be more transparent, and Hibernate can reserve ids for itself, so it might be more efficient than doing it by hand or from a trigger.
