We have a "audit" table that we create lots of rows in. Our persistence layer queries the audit table sequence to create a new row in the audit table. With millions of rows being created daily the select statement to get the next value from the sequence is one of our top ten most executed queries. We would like to reduce the number of database roundtrips just to get the sequence next value (primary key) before inserting a new row in the audit table. We know you can't batch select statements from JDBC. Are there any common techniques for reducing database roundtrips to get a sequence next value?
Get a batch of sequence values (e.g. 1000) in advance with a single select:
select your_sequence.nextval
from dual
connect by level <= 1000
Cache the obtained values and use them for the next 1000 audit inserts.
Repeat this when you have run out of cached sequence values.
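A minimal sketch of this approach in Java/JDBC, assuming an Oracle sequence named your_sequence and a batch size of 1000 (both illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

// Client-side cache that prefetches a batch of sequence values in one roundtrip
// and hands them out one at a time for subsequent audit inserts.
public class SequenceValueCache {
    private static final int BATCH_SIZE = 1000;
    private final Deque<Long> cached = new ArrayDeque<>();

    public synchronized long next(Connection conn) throws SQLException {
        if (cached.isEmpty()) {
            refill(conn);
        }
        return cached.pop();
    }

    private void refill(Connection conn) throws SQLException {
        String sql = "select your_sequence.nextval from dual connect by level <= " + BATCH_SIZE;
        try (PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                cached.push(rs.getLong(1));
            }
        }
    }
}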
Skip the select statement for the sequence and generate the sequence value in the insert statement itself.
insert into audit (id, ..) values (my_sequence.nextval, ..)
No need for an extra select. If you need the sequence value get it by adding a returning clause.
insert into audit (id, ..) values (my_sequence.nextval, ..) returning id into ..
Save some extra time by specifying a cache value for the sequence.
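For illustration, a hedged JDBC sketch of this second approach: the table and column names (audit, message) and the generated-key column ("ID") are assumptions, and the JDBC generated-keys API is used here as the client-side equivalent of the RETURNING clause, so the insert and the id retrieval happen in one roundtrip:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// The sequence is evaluated inside the INSERT itself, and the generated id is read
// back through the JDBC generated-keys API in the same roundtrip.
public class AuditDao {

    public long insertAuditRow(Connection conn, String message) throws SQLException {
        String sql = "insert into audit (id, message) values (my_sequence.nextval, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql, new String[] {"ID"})) {
            ps.setString(1, message);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }
}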
I suggest you change the "INCREMENT BY" option of the sequence and set it to a number like 100 (you have to decide what step size your sequence should take; 100 is just an example).
Then implement a class called SequenceGenerator. This class holds the next value as a property and only calls sequence.nextval every 100 ids, in order to keep the db sequence up to date.
This way you go to the db for the sequence nextval only once every 100 inserts.
Every time the application starts, you have to initialize the SequenceGenerator class with sequence.nextval.
The only downside of this approach is that if your application stops for any reason, you will lose some of the sequence values and there will be gaps in your ids. That should not be a logical problem as long as you don't have any business logic tied to the id values.
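A rough sketch of such a SequenceGenerator, assuming the database sequence (here called audit_seq, an illustrative name) was created with INCREMENT BY 100:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hands out ids locally and only hits the database once every STEP inserts.
// STEP must match the sequence's INCREMENT BY.
public class SequenceGenerator {
    private static final int STEP = 100;
    private final DataSource dataSource;
    private long nextValue;
    private long remaining = 0;   // forces a database fetch on first use

    public SequenceGenerator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public synchronized long nextValue() throws SQLException {
        if (remaining == 0) {
            nextValue = fetchFromDatabase();   // e.g. 500, then 600, then 700, ...
            remaining = STEP;
        }
        remaining--;
        return nextValue++;
    }

    private long fetchFromDatabase() throws SQLException {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement("select audit_seq.nextval from dual");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getLong(1);
        }
    }
}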
Related
I have got a table with an auto-increment primary key. This table is meant to store millions of records, and I don't need to delete anything for now. The problem is that when new rows are inserted, because of some error, the auto-increment key leaves gaps in the ids. For example, after 5 the next id is 8, leaving a gap of 6 and 7. The result is that when I count the rows I get 28000, but the max id is 58000. What can be the reason? I am not deleting anything. And how can I fix this issue?
P.S. I am using insert ignore when inserting records so that it doesn't give an error when I try to insert a duplicate entry in a unique column.
This is by design and will always happen.
Why?
Let's take 2 overlapping transactions that are doing INSERTs
Transaction 1 does an INSERT, gets the value (let's say 42), does more work
Transaction 2 does an INSERT, gets the value 43, does more work
Then
Transaction 1 fails. Rolls back. 42 stays unused
Transaction 2 completes with 43
If consecutive values were guaranteed, every transaction would have to happen one after the other. Not very scalable.
Also see Do Inserted Records Always Receive Contiguous Identity Values (SQL Server but same principle applies)
You can create a trigger to handle the auto increment as:
CREATE DEFINER=`root`@`localhost` TRIGGER `mytable_before_insert` BEFORE INSERT ON `mytable` FOR EACH ROW
BEGIN
SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM mytable);
END
This comes from InnoDB, the default storage engine of MySQL.
It really isn't a problem: when you check the docs on "AUTO_INCREMENT Handling in InnoDB", they basically say that InnoDB initializes the auto-increment counter in memory at startup,
and the query it uses for that is something like
SELECT MAX(ai_col) FROM t FOR UPDATE;
This improves concurrency without really having an effect on your data.
If you don't want this behavior, use MyISAM instead of InnoDB as the storage engine.
Perhaps (I haven't tested this) a solution is to set innodb_autoinc_lock_mode to 0.
According to http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html this might make things a bit slower (if you perform inserts of multiple rows in a single query) but should remove gaps.
You can try an insert like:
insert ignore into table select (select max(id)+1 from table), "value1", "value2";
This will try to
insert the new data with the last unused id (not auto-increment),
ignore it if a duplicate entry is found in the unique fields,
else insert the new data normally.
(But this method does not support updating fields if a duplicate entry is found.)
I'm trying to implement a counter with Java, Spring, Hibernate and Oracle SQL. Each record represents a count by a given timestamp. Let's say each record is uniquely identified by the minute, and each record holds a count column. The service should expect to receive a ton of concurrent requests that may update the counter column of possibly the same record.
In my table, if the record does not exist, just insert the record in and set its count to 1. Otherwise, find the record by timestamp and increase its existing counter column by 1.
In order to ensure that we maintain data consistency and integrity, I'm using pessimistic locking. For example, if 20 counts come in at the same time, and not necessarily from the same user, it's possible that we overwrite the record from a stale read of it before updating. With locking, I'm ensuring that if 20 counts come in, the net effect on the database represents all 20 counts.
So locking is fine, but the problem is that if the record never existed in the first place, and two or more concurrent requests come in trying to update the not-yet-existent record, I've observed that a duplicate record gets inserted because we cannot lock a record that doesn't exist yet. How can we ensure that no duplicates get created in the table? Should it be controlled via Oracle? Or can I manage this via my app and Hibernate?
Thank you.
One way to avoid this sort of problem altogether would be to just generate the count at the time you actually query the data. Oracle has an analytic function, ROW_NUMBER(), which can assign a row number to each record in the result set of a query. As a rough example, consider the following query:
SELECT
ts,
ROW_NUMBER() OVER (ORDER BY ts) rn
FROM yourTable
The count you want would be in the rn column, representing the number of records appearing since the first entry in the table. Of course, you could further restrict the query.
This approach is robust to removing records, as the count would always start with 1. One drawback is that row number functionality is not supported by Hibernate. You would have to run this either as a native query or a stored proc.
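As a sketch of the native-query route (the table and column names follow the example above and are illustrative, and Hibernate 5.x's createNativeQuery is assumed):

import java.util.List;
import org.hibernate.Session;

// Runs the ROW_NUMBER() query as a native query, since HQL has no direct
// support for analytic functions.
public class CounterQuery {

    @SuppressWarnings("unchecked")
    public List<Object[]> countsByTimestamp(Session session) {
        String sql =
            "SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) rn " +
            "FROM yourTable";
        // With the Oracle driver, each Object[] typically holds [Timestamp ts, BigDecimal rn].
        return session.createNativeQuery(sql).getResultList();
    }
}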
I've got a weird situation here. I've used triggers and sequences to implement auto-increment. I insert the data into my tables from my web app, which uses Hibernate. I test the web app on my machine (NetBeans) as well as on my office network (the web app is also deployed on our server with Wildfly).
It has always worked fine, until I started getting exceptions due to a unique constraint violation (primary key). Then I discovered that the problem was the sequence that generates values for the ids. For example, for my table xtable, the sequence's last_number is 78400, the max id in xtable is 78308, but the sequence's nextval is 78304. I have no idea how that happens because I created the sequence with the following:
CREATE SEQUENCE XTABLE_SEQUENCE INCREMENT BY 1 START WITH 1;
I tried the following to update the sequence and make its NEXTVAL greater than the max(id) in the table, but I'm still getting the same result after n inserts:
declare
maxval number(10);
begin
select max(ID) into maxval from XTABLE;
maxval := maxval+1;
execute immediate 'DROP SEQUENCE XTABLE_SEQUENCE';
execute immediate 'CREATE SEQUENCE XTABLE_SEQUENCE START WITH '|| maxval+50 ||' INCREMENT BY 1';
end;
Here is the trigger statement:
create or replace TRIGGER xtable_sequence_tr
BEFORE INSERT ON xtable FOR EACH ROW
WHEN (NEW.id IS NULL)
BEGIN
SELECT xtable_sequence.NEXTVAL INTO :NEW.id FROM DUAL;
END;
Or, what is the proper way to implement auto-increment in Oracle in order to avoid the issue I am facing? At some point, I start getting a unique key constraint violation on the primary key because (I don't know for what reason) the max id in the table happens to be greater than the sequence.nextval used in the trigger. What is causing that, and how can I fix it?
To be honest this post is quite confusing on its own.
You state that,
"For my table xtable, Its sequence's last_number is 78400, the max id in xtable is 78308, but the sequence's nextval is 78304."
What this tells me is that, with a sequence last_number of 78400, there were 100 sequence values cached in memory, and that cache must have started at 78300. Once 100 values are cached, they can only be used as long as the server is not restarted. Caching them moves the sequence's last value to 78400 in your case, but that doesn't mean that many values have already been used; they are just values cached in memory for the next inserts. If the database is restarted, you lose whatever cached values were not used. BTW, the sequence cache is shared among different sessions.
"but the sequence's nextval didn't change" - again, you are assuming that the last value of the sequence is the same as sequence.nextval; it is not. When you query the dba_sequences view and look at the LAST_NUMBER column, it represents the last value CACHED, not the last value generated by sequence.nextval or used in the table.
To be honest, resolving this shouldn't take much effort.
A. Make sure that every time you insert a row you use the sequence; don't mix and match between direct sequence use, procedures, and triggers. (Remember, one drawback of using the sequence directly in the insert is that order is not guaranteed: you could see ids 1, 2, 3 and then 10, for example because the server was restarted and you lost the unused cached values. If you really always need strict ordering, don't use a sequence; use a procedure or some other means.)
B. Instead of first querying the max id in the table and then dropping and recreating the sequence:
drop the sequence first, then get the max value from the table, and then create the sequence from that point onward. This saves you from losing track of sequence values that may already have been used by a transaction from another session committed right while you were querying the max id on the table... but it is still not safe.
To get better results, I would create the new sequence starting one above the value shown by the query below, which should be run right before dropping the sequence:
select LAST_NUMBER from dba_sequences where sequence_name = 'YOUR_SEQUENCE_NAME'
Basically, what I am saying is: to be safe, create the new sequence with a value greater than the one currently cached.
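A possible JDBC sketch of that last suggestion; the sequence name is taken from the question, and access to dba_sequences is assumed (substitute user_sequences if you don't have the DBA views):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Recreates the sequence starting just above the value currently cached,
// as read from dba_sequences.LAST_NUMBER right before dropping it.
public class SequenceRebuilder {

    public void rebuild(Connection conn) throws SQLException {
        long lastCached;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "select last_number from dba_sequences where sequence_name = 'XTABLE_SEQUENCE'")) {
            rs.next();
            lastCached = rs.getLong(1);
        }
        try (Statement st = conn.createStatement()) {
            st.execute("drop sequence XTABLE_SEQUENCE");
            st.execute("create sequence XTABLE_SEQUENCE start with " + (lastCached + 1)
                       + " increment by 1");
        }
    }
}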
I figured out the condition in which I was getting that problem. While I was loading tens of thousands of records, for example executing a file containing 250000 insert queries, someone would try to insert records (through my webapp) at the same time. So the problem probably occurred when two insert queries were executed at the same time.
We want to programmatically copy all records from one table to another periodically.
Currently I use SELECT * FROM users LIMIT 2 OFFSET <offset> to fetch records.
The table records like below:
user_1
user_2
user_3
user_4
user_5
user_6
When I fetched the first page (user_1, user_2), the record "user_2" was deleted from the source table.
So the second page I fetch is (user_4, user_5), and the third page is (user_6).
As a result, the record "user_3" is missing from the destination table.
The real source table may have 1,000,000 records. How can I solve this problem efficiently?
First, you should have a unique index on the source table and use it in an ORDER BY clause to make sure that the order of the rows is consistent over time. Next, do not use offsets; instead, start after the last element fetched.
Something like:
SELECT * FROM users ORDER BY id LIMIT 2;
for the first time, and then
SELECT * FROM users WHERE id > last_received_id ORDER BY id LIMIT 2;
for the next ones.
This will be immune to asynchronous deletions.
If you have no unique index but do have a non-unique one on your table, you can still apply the above solution with a non-strict comparison operator. You will consistently re-fetch the last rows, and it would certainly break with a limit of 2, but it could work for reasonable page sizes.
If you have no index at all - which is known to cause various other problems - the only reliable way is to do one single big select and use a SQL cursor to page through it.
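For illustration, a rough JDBC sketch of this keyset-style copy loop; the column names, the destination table, and the page size are assumptions, and ids are assumed to be positive:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Copies rows page by page, where each page starts after the last id already copied,
// so concurrent deletions in the source table cannot shift the pages.
public class UserCopier {
    private static final int PAGE_SIZE = 1000;   // 2 in the example above, larger in practice

    public void copyAll(Connection source, Connection destination) throws SQLException {
        long lastReceivedId = 0;
        while (true) {
            int rowsInPage = 0;
            try (PreparedStatement ps = source.prepareStatement(
                    "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?")) {
                ps.setLong(1, lastReceivedId);
                ps.setInt(2, PAGE_SIZE);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rowsInPage++;
                        lastReceivedId = rs.getLong("id");
                        insertIntoDestination(destination, lastReceivedId, rs.getString("name"));
                    }
                }
            }
            if (rowsInPage < PAGE_SIZE) {
                break;   // last page reached
            }
        }
    }

    private void insertIntoDestination(Connection destination, long id, String name)
            throws SQLException {
        try (PreparedStatement ps = destination.prepareStatement(
                "INSERT INTO users_copy (id, name) VALUES (?, ?)")) {
            ps.setLong(1, id);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }
}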
This article says:
Unlike identity, the next number for the column value will be retrieved from memory rather than from the disk – this makes Sequence significantly faster than Identity
Does it mean that ID comes from disk in case of identity? If yes, then which disk and how?
Using a sequence, I can see in the log an extra select query against the DB while inserting a new record. But I didn't find that extra select query in the log in the case of identity.
So how is sequence faster than identity?
Strategy used by sequence:
Before inserting a new row, ask the database for the next sequence value, then insert this row with the returned sequence value as ID.
Strategy used by identity:
Insert a row without specifying a value for the ID. After inserting the row, ask the database for the last generated ID.
The number of queries is thus the same in both cases. But Hibernate by default uses a strategy that is more efficient for the sequence generator. In fact, when it asks for the next sequence value, it keeps the 50 (that's the default, IIRC, and it's configurable) next values in memory, and uses these 50 values for the next 50 inserts. Only after 50 inserts does it go to the database to get the next 50 values. This tremendously reduces the number of SQL queries needed for automatic ID generation.
The identity strategy doesn't allow for such an optimization.
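For example, a mapping sketch that enables this optimization (entity, sequence, and generator names are illustrative, and the javax.persistence/JPA 2 namespace is assumed):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

// Hibernate allocates 50 ids per database roundtrip for this generator; with the
// default pooled optimizer, the database sequence is typically expected to have a
// matching INCREMENT BY.
@Entity
public class AuditEvent {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "audit_seq_gen")
    @SequenceGenerator(name = "audit_seq_gen", sequenceName = "audit_seq", allocationSize = 50)
    private Long id;

    // other columns omitted
}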
The IDENTITY generator will always require a database hit for fetching the primary key value without waiting for the flush to synchronize the current entity state transitions with the database.
So the IDENTITY generator doesn't play well with Hibernate write-behind first level cache strategy, therefore JDBC batching is disabled for the IDENTITY generator.
The sequence generator can benefit from database value preallocation and you can even employ a hi/lo optimization strategy.
In my opinion, the best generators are the pooled and pooled-lo sequence generators. These generators combine the batch-friendly sequence generator with a client-side value generation optimization that's compatible with other DB clients that may insert rows without knowing anything about our generation strategy.
Anyway, you should never choose the TABLE generator because it performs really badly.
Though I'm personally new to Hibernate, from what I can recall, using identity basically means that Hibernate will check what the next possible id value in your DB is and keep a value for it.
For a sequence, you basically tell Hibernate to generate the next value based on a particular sequence you provide. So it has to actually calculate the next id by looking at the next possible value of that sequence. Hence, the extra query is fired.
Maybe this will answer your question:
Unlike identity column values, which are generated when rows are inserted, an application can obtain the next sequence number before inserting the row by calling the NEXT VALUE FOR function. The sequence number is allocated when NEXT VALUE FOR is called even if the number is never inserted into a table. The NEXT VALUE FOR function can be used as the default value for a column in a table definition. Use sp_sequence_get_range to get a range of multiple sequence numbers at once.
you can find the detail here
Identity doesn't need that extra select query because identity is table-dependent while a sequence is independent of the table. Because of this, with a sequence we can get the id even before creating the row (when you call session.save(entity), the sequence value is fetched even before you commit the transaction).
Sequence:
you create or update entities -> each time you save an entity -> Hibernate gets the next sequence value -> your program returns the value after all processing completes without exception or rollback -> you commit the transaction -> Hibernate inserts all the complete entities.
Identity: when the transaction is committed, the incomplete entity is inserted (the id must be obtained from the identity column). So the INSERT with a sequence is definitely slower, but the advantage is that if you cancel the insert, the count does not increase.
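A small, hypothetical sketch of the timing difference being described (the entity classes and field names are made up for illustration):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.Transaction;

// Hypothetical entities, one mapped with a SEQUENCE generator and one with IDENTITY.
@Entity
class SequencedEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    Long id;
}

@Entity
class IdentityEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
}

public class SaveTimingDemo {

    public void demo(Session session) {
        Transaction tx = session.beginTransaction();

        SequencedEntity s = new SequencedEntity();
        session.save(s);
        // With SEQUENCE, the id is already populated here; the INSERT itself is
        // deferred until flush/commit and can be JDBC-batched.
        System.out.println("sequence id before commit: " + s.id);

        IdentityEntity i = new IdentityEntity();
        session.save(i);
        // With IDENTITY, Hibernate has to execute the INSERT right away in order
        // to learn the generated id, so it cannot defer or batch it.
        System.out.println("identity id before commit: " + i.id);

        tx.commit();
    }
}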